https://github.com/conect2ai/legislative-texts-rn
Repository
Basic Info
- Host: GitHub
- Owner: conect2ai
- License: mit
- Language: Jupyter Notebook
- Default Branch: main
- Size: 5.49 MB
Statistics
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Exploring Legislative Textual Data in Brazilian Portuguese: Readability Analysis and Knowledge Graph Generation
✍🏾 Authors: Gisliany Alves, Breno Santana Santos, Marianne Diniz, Ivanovitch Silva
1. Abstract/Overview
Legislative documents are key to democratic societies, defining the legal framework for social life. In Brazil, legislative texts are particularly complex due to extensive technical jargon, intricate sentence structures, and frequent references to prior legislation. The country’s civil law tradition and multicultural context introduce further interpretative and linguistic challenges. Moreover, the study of Brazilian Portuguese legislative texts remains underexplored, lacking legal-specific models and datasets. To address these issues, a data-driven approach using Large Language Models (LLMs) was proposed to analyze these documents and extract Knowledge Graphs (KGs). A case study using 1,869 proposals from the Legislative Assembly of Rio Grande do Norte (ALRN), spanning January 2019 to April 2024, was conducted. The Llama 3.2 3B Instruct model was applied to extract KGs representing entities and their relationships. The findings support the method’s effectiveness in producing coherent graphs faithful to the original content. Nonetheless, challenges remain in resolving entity ambiguity and achieving full relationship coverage. Additionally, readability analyses using metrics for Brazilian Portuguese revealed that ALRN proposals require high-level reading skills due to their technical style. Finally, this study advances Legal Artificial Intelligence by offering insights into Brazilian legislative texts and promoting transparency and accessibility through natural language processing techniques.
2. Artifacts
The results can be found in the notebook: 📕 #01 Analyzing Legislative Data
The remainder of this repository holds the folders:
- data: This folder serves as the destination for downloading the dataset used in the project. It also contains an intermediate dataset that includes readability metrics.
- images: Stores visual assets related to the project, such as figures generated during analysis or for documentation purposes.
- models: This folder is where the Llama 3.2 3B Instruct model will be downloaded.
- utils: Contains utility scripts or helper functions used throughout the project, such as data preprocessing scripts and reusable code.
3. Environment Setup
Requirements
- Python 3.12
- Miniconda 23.11 or greater
- Visual Studio Code
Setup
- Clone this repository:
bash
git clone https://github.com/conect2ai/legislative-texts-rn.git
- Create a file named .env based on .env_template. BASE_DIR specifies the absolute path to the project folder (set it to that path), while HUGGING_FACE_TOKEN is the token used to authenticate and download the required model. You can generate your token from your Hugging Face account under Settings > Access Tokens.
- Create your conda environment named legislative by executing:
bash
conda env create -f environment.yml
- Activate the environment using conda activate legislative. You can now run the notebook in it.
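Before running the notebook, it can help to confirm that the .env file defines both variables described above. The following is a minimal sketch, not part of the repository: load_env and check_env are hypothetical helper names, and the parser assumes plain KEY=VALUE lines rather than the full dotenv syntax.

```python
from __future__ import annotations

from pathlib import Path


def load_env(path: str | Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env-style file (hypothetical helper)."""
    values: dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything without an '=' sign.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def check_env(values: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the file looks usable."""
    problems = []
    for name in ("BASE_DIR", "HUGGING_FACE_TOKEN"):
        if not values.get(name):
            problems.append(f"{name} is missing or empty")
    base_dir = values.get("BASE_DIR", "")
    if base_dir and not Path(base_dir).is_absolute():
        problems.append("BASE_DIR should be an absolute path")
    return problems
```

Calling check_env(load_env(".env")) from the project folder returns an empty list when both variables are set and BASE_DIR is absolute. It does not verify that the token itself is valid.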
4. Additional Notes
Hugging Face Token Permissions: Ensure your Hugging Face token has the appropriate permission (READ) to download the model.
Variability of the Generated Knowledge Graph (KG): Even with a fixed random seed, the generated KG may vary slightly between runs due to factors beyond direct control. Some potential reasons include:
- Non-Deterministic Operations: Certain GPU operations, such as matrix multiplications or cumulative sums, may use non-deterministic algorithms depending on your hardware and software configuration.
- CUDA and cuBLAS Behavior: GPU operations often rely on cuBLAS, which can introduce slight variations in computation.
- Floating-Point Precision: The inherent nature of floating-point arithmetic on GPUs may produce slightly different results on different hardware or configurations.
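The sources of variation listed above can be reduced, though not necessarily eliminated, by seeding and by requesting deterministic kernels. The sketch below assumes the notebook runs the model through PyTorch, which this README does not state; seed_everything is a hypothetical helper, not a function from this repository. The last lines illustrate the floating-point point: addition is not associative, so a different reduction order on a GPU can change the low-order bits of a result.

```python
import os
import random


def seed_everything(seed: int = 42) -> bool:
    """Seed Python and, if installed, PyTorch. Returns True if PyTorch was configured."""
    # cuBLAS reads this variable to select a deterministic workspace;
    # it has to be set before the first CUDA call in the process.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    random.seed(seed)
    try:
        import torch  # assumption: the model is run through PyTorch
    except ImportError:
        return False
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable
    # warn_only=True emits a warning instead of raising an error for
    # operations that have no deterministic implementation.
    torch.use_deterministic_algorithms(True, warn_only=True)
    torch.backends.cudnn.benchmark = False
    return True


# Floating-point addition is not associative: grouping changes the result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False
```

Even with these settings, results may still differ across GPU models, driver versions, or library versions, so small differences in the extracted KG remain possible.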
License
This project is licensed under the MIT License - see the LICENSE file for details.
About us
The research group Conect2AI consists of undergraduate and graduate students from the Federal University of Rio Grande do Norte (UFRN) and aims to apply Artificial Intelligence (AI) and machine learning in emerging fields. Our expertise includes Embedded Intelligence and IoT, optimizing resource management and energy efficiency, contributing to sustainable cities. In energy transition and mobility, we apply AI to optimize energy use in connected vehicles and promote more sustainable mobility.
Owner
- Name: conect2ai
- Login: conect2ai
- Kind: organization
- Repositories: 1
- Profile: https://github.com/conect2ai
GitHub Events
Total
- Watch event: 2
- Push event: 2
- Public event: 1
Last Year
- Watch event: 2
- Push event: 2
- Public event: 1