https://github.com/bernard-ng/drc-legal-ner
Towards a Congolese Legal Knowledge Graph: LLM-Enhanced NER for Citation Detection
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
○DOI references
-
○Academic publication links
-
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (11.6%) to scientific vocabulary
Keywords
Repository
Towards a Congolese Legal Knowledge Graph: LLM-Enhanced NER for Citation Detection
Basic Info
Statistics
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Topics
Metadata Files
README.md
Automated Citation Detection in Congolese Legal Texts: Leveraging LLM-Based NER for Knowledge Graph Construction
This paper builds upon our previous work on Juro, an AI-powered chatbot designed to improve legal information access in the Democratic Republic of Congo (DRC), by ad- dressing the specific challenge of automated citation detection in unstructured legal texts. We propose an end-to-end approach that combines Large Language Model (LLM)-based annotation and Named Entity Recognition (NER) for extracting key entities critical to constructing a legal knowledge graph. Over 8,400 Congolese legal document titles were scraped and annotated via the GPT-4o-mini model, with subsequent training implemented in spaCy under two distinct configurations emphasizing accuracy and efficiency. We evaluated the system using both a split dataset and a human-annotated benchmark, demonstrating robust per- formance in identifying document types, reference numbers, and publication dates. An initial mapping algorithm connected documents based on annotated entities, revealing a preliminary citation graph of over 1,400 relationships. While the current methodology shows promise in automating entity extraction and preliminary graph construction, future developments will explore deeper relationship modeling, improved type coverage, and integration into the Juro framework to provide enhanced legal support.
Usage
```bash git clone https://github.com/bernard-ng/drc-legal-ner.git cd drc-legal-ner
python3 -m venv .venv source .venv/bin/activate pip install -r requirements.txt docker compose up ```
- Annotation
Will generate a dataset of Congolese legal texts and annotate it using OpenAI's GPT-4o-mini you can do it synchronously or asynchronously (with batch API).
```bash python -m processing.batch.requests --build python -m processing.batch.requests --upload python -m processing.batch.requests --create python -m processing.batch.response # 24h later
python -m process.annotate --method=async
python -m processing.format --label-studio # for Human feedback and validation python -m processing.format --spacy-binary # Spacy compatible format for training ```
- Tasks
bash
make train_efficiency # Train the model with efficiency
make train_accuracy # Train the model with accuracy
make evaluate # Evaluate the model
make benchmark # Benchmark the model
make visualize # Visualize NER
make clean # Clean the model and results
Owner
- Name: Bernard Ngandu
- Login: bernard-ng
- Kind: user
- Location: Lubumbashi RDC
- Company: @devscast
- Website: https://devscast.tech
- Twitter: BernardNgandu
- Repositories: 7
- Profile: https://github.com/bernard-ng
Building a community of skilled developers : @devscast
GitHub Events
Total
- Watch event: 3
- Push event: 10
- Create event: 2
Last Year
- Watch event: 3
- Push event: 10
- Create event: 2
Issues and Pull Requests
Last synced: 12 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- Deprecated ==1.2.18
- Django ==4.2.19
- Faker ==36.1.1
- GitPython ==3.1.44
- Jinja2 ==3.1.5
- MarkupSafe ==3.0.2
- PyYAML ==6.0.2
- Pygments ==2.19.1
- altair ==5.5.0
- annotated-types ==0.7.0
- anyio ==4.8.0
- appdirs ==1.4.4
- argcomplete ==3.5.3
- asgiref ==3.8.1
- attr ==0.3.1
- attrs ==25.1.0
- azure-core ==1.32.0
- azure-storage-blob ==12.24.1
- black ==25.1.0
- bleach ==5.0.1
- blinker ==1.9.0
- blis ==1.2.0
- boto ==2.49.0
- boto3 ==1.36.21
- botocore ==1.36.21
- cachetools ==5.5.1
- catalogue ==2.0.10
- certifi ==2025.1.31
- cffi ==1.17.1
- charset-normalizer ==3.4.1
- click ==8.1.8
- cloudpathlib ==0.20.0
- colorama ==0.4.6
- confection ==0.1.5
- cryptography ==44.0.1
- cymem ==2.0.11
- datamodel-code-generator ==0.26.1
- defusedxml ==0.7.1
- distro ==1.9.0
- django-annoying ==0.10.6
- django-cors-headers ==3.6.0
- django-csp ==3.7
- django-debug-toolbar ==3.2.1
- django-environ ==0.10.0
- django-extensions ==3.2.3
- django-filter ==2.4.0
- django-migration-linter ==5.1.0
- django-model-utils ==4.1.1
- django-ranged-fileresponse ==0.1.2
- django-rq ==2.5.1
- django-storages ==1.12.3
- django-user-agents ==0.4.0
- djangorestframework ==3.15.2
- dnspython ==2.7.0
- drf-dynamic-fields ==0.3.0
- drf-flex-fields ==0.9.5
- drf-generators ==0.3.0
- email_validator ==2.2.0
- exceptiongroup ==1.2.2
- expiringdict ==1.2.2
- genson ==1.3.0
- gitdb ==4.0.12
- google-api-core ==2.24.1
- google-auth ==2.38.0
- google-cloud-appengine-logging ==1.6.0
- google-cloud-audit-log ==0.3.0
- google-cloud-core ==2.4.1
- google-cloud-logging ==3.11.4
- google-cloud-storage ==2.19.0
- google-crc32c ==1.6.0
- google-resumable-media ==2.7.2
- googleapis-common-protos ==1.67.0
- grpc-google-iam-v1 ==0.14.0
- grpcio ==1.70.0
- grpcio-status ==1.70.0
- h11 ==0.14.0
- httpcore ==1.0.7
- httpx ==0.28.1
- humansignal-drf-yasg ==1.21.10.post1
- idna ==3.10
- ijson ==3.3.0
- importlib_metadata ==8.5.0
- inflect ==5.6.2
- inflection ==0.5.1
- isodate ==0.7.2
- isort ==5.13.2
- jiter ==0.8.2
- jmespath ==1.0.1
- joblib ==1.4.2
- jsf ==0.11.2
- jsonschema ==4.23.0
- jsonschema-specifications ==2024.10.1
- label-studio ==1.15.0
- label-studio-sdk ==1.0.8
- langcodes ==3.5.0
- language_data ==1.3.0
- launchdarkly-server-sdk ==8.2.1
- lockfile ==0.12.2
- lxml ==5.3.1
- lxml_html_clean ==0.4.1
- marisa-trie ==1.2.1
- markdown-it-py ==3.0.0
- mdurl ==0.1.2
- murmurhash ==1.0.12
- mypy-extensions ==1.0.0
- narwhals ==1.27.1
- nltk ==3.9.1
- numpy ==1.26.4
- ollama ==0.4.7
- openai ==1.61.1
- opentelemetry-api ==1.30.0
- ordered-set ==4.0.2
- packaging ==24.2
- pandas ==2.2.3
- pathspec ==0.12.1
- pillow ==10.4.0
- platformdirs ==4.3.6
- preshed ==3.0.9
- proto-plus ==1.26.0
- protobuf ==5.29.3
- psycopg2-binary ==2.9.10
- pyRFC3339 ==2.0.1
- pyarrow ==19.0.0
- pyasn1 ==0.6.1
- pyasn1_modules ==0.4.1
- pyboxen ==1.3.0
- pycparser ==2.22
- pydantic ==2.10.6
- pydantic_core ==2.27.2
- pydeck ==0.9.1
- python-dateutil ==2.9.0.post0
- python-dotenv ==1.0.1
- python-json-logger ==2.0.4
- pytz ==2022.7.1
- redis ==3.5.3
- referencing ==0.36.2
- regex ==2024.11.6
- requests ==2.32.3
- requests-mock ==1.12.1
- rich ==13.9.4
- rpds-py ==0.22.3
- rq ==1.10.1
- rsa ==4.9
- rstr ==3.2.2
- rules ==3.4
- s3transfer ==0.11.2
- semver ==3.0.4
- sentry-sdk ==2.21.0
- shellingham ==1.5.4
- six ==1.17.0
- smart-open ==7.1.0
- smmap ==5.0.2
- sniffio ==1.3.1
- spacy ==3.8.3
- spacy-legacy ==3.0.12
- spacy-loggers ==1.0.5
- spacy-streamlit ==1.0.6
- sqlparse ==0.5.3
- srsly ==2.5.1
- streamlit ==1.36.0
- tenacity ==8.5.0
- thinc ==8.3.4
- toml ==0.10.2
- tomli ==2.2.1
- tornado ==6.4.2
- tqdm ==4.67.1
- typer ==0.15.1
- typing_extensions ==4.12.2
- tzdata ==2025.1
- ua-parser ==1.0.1
- ua-parser-builtins ==0.18.0.post1
- ujson ==5.10.0
- uritemplate ==4.1.1
- urllib3 ==1.26.20
- user-agents ==2.2.0
- wasabi ==1.1.3
- weasel ==0.4.1
- webencodings ==0.5.1
- wrapt ==1.17.2
- xmljson ==0.2.1
- zipp ==3.21.0