https://github.com/conect2ai/legislative-texts-rn



Exploring Legislative Textual Data in Brazilian Portuguese: Readability Analysis and Knowledge Graph Generation

✍🏾 Authors: Gisliany Alves, Breno Santana Santos, Marianne Diniz, Ivanovitch Silva

1. Abstract/Overview

Legislative documents are key to democratic societies, defining the legal framework for social life. In Brazil, legislative texts are particularly complex due to extensive technical jargon, intricate sentence structures, and frequent references to prior legislation. The country’s civil law tradition and multicultural context introduce further interpretative and linguistic challenges. Moreover, the study of Brazilian Portuguese legislative texts remains underexplored, lacking legal-specific models and datasets. To address these issues, a data-driven approach using Large Language Models (LLMs) was proposed to analyze these documents and extract Knowledge Graphs (KGs). A case study using 1,869 proposals from the Legislative Assembly of Rio Grande do Norte (ALRN), spanning January 2019 to April 2024, was conducted. The Llama 3.2 3B Instruct model was applied to extract KGs representing entities and their relationships. The findings support the method’s effectiveness in producing coherent graphs faithful to the original content. Nonetheless, challenges remain in resolving entity ambiguity and achieving full relationship coverage. Additionally, readability analyses using metrics for Brazilian Portuguese revealed that ALRN proposals require high-level reading skills due to their technical style. Finally, this study advances Legal Artificial Intelligence by offering insights into Brazilian legislative texts and promoting transparency and accessibility through natural language processing techniques.

2. Artifacts

The results can be found in the notebook: 📕 #01 Analyzing Legislative Data

The remainder of this repository holds the following folders:
- data: This folder serves as the destination for downloading the dataset used in the project. It also contains an intermediate dataset that includes readability metrics.
- images: Stores visual assets related to the project, such as figures generated during analysis or for documentation purposes.
- models: This folder is where Llama 3.2 3B Instruct will be downloaded.
- utils: Contains utility scripts or helper functions used throughout the project, such as data preprocessing scripts and reusable code.

3. Environment Setup

Requirements

Python 3.12
Miniconda 23.11 or greater
Visual Studio Code

Setup

Clone this repository:

git clone https://github.com/conect2ai/legislative-texts-rn.git

Create a file named .env based on .env_template. Set BASE_DIR to the absolute path of the project folder and HUGGING_FACE_TOKEN to the token used to authenticate and download the required model. You can generate a token in your Hugging Face account under Settings > Access Tokens.
Create the conda environment, named legislative, by running:

conda env create -f environment.yml

Activate the environment with conda activate legislative. You can then run the notebook inside it.
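As a quick sanity check on the .env step above, the following minimal sketch parses a .env file and derives the repository folders from BASE_DIR. It assumes .env_template uses plain KEY=VALUE lines, which this README does not confirm; the helper names parse_env and check_config are hypothetical and are not part of the repository.

```python
from pathlib import Path

# Keys named in the setup step; folder names taken from the Artifacts section.
REQUIRED_KEYS = ("BASE_DIR", "HUGGING_FACE_TOKEN")
PROJECT_FOLDERS = ("data", "images", "models", "utils")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Assumption: the .env file uses this plain format (hypothetical helper).
    """
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def check_config(values: dict[str, str]) -> dict[str, Path]:
    """Validate the required keys and return the expected project folders."""
    missing = [key for key in REQUIRED_KEYS if not values.get(key)]
    if missing:
        raise ValueError(f"Missing .env entries: {', '.join(missing)}")
    base = Path(values["BASE_DIR"])
    if not base.is_absolute():
        raise ValueError("BASE_DIR must be an absolute path")
    return {name: base / name for name in PROJECT_FOLDERS}


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.exists():
        folders = check_config(parse_env(env_file.read_text()))
        for name, path in folders.items():
            print(f"{name}: {path} (exists: {path.exists()})")
```

Running the script from the project root prints each expected folder and whether it exists, which helps catch a mistyped BASE_DIR before the notebook tries to download the model.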

4. Additional Notes

Hugging Face Token Permissions: Ensure your Hugging Face token has the appropriate permission (READ) to download the model.
Variability of the Generated Knowledge Graph (KG): Even when a random seed is set, the generated KG may vary slightly between runs due to factors beyond direct control. Potential reasons include:
- Non-Deterministic Operations: Certain GPU operations, such as matrix multiplications or cumulative sums, may use non-deterministic algorithms depending on your hardware and software configuration.
- CUDA and cuBLAS Behavior: GPU operations often rely on cuBLAS, which can introduce slight variations in computation.
- Floating-Point Precision: The inherent nature of floating-point arithmetic on GPUs may produce slightly different results on different hardware or configurations.
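The sources of variation listed above can be reduced, though not necessarily eliminated, by requesting deterministic behavior where the framework supports it. The sketch below assumes the model runs on PyTorch, which this README does not state; seed_everything is a hypothetical helper, not a function from the repository. It also shows why summation order matters: floating-point addition is not associative.

```python
import os
import random


def seed_everything(seed: int = 42) -> dict:
    """Seed Python's RNG and, if PyTorch is installed, request determinism.

    Hypothetical helper; assumes a PyTorch backend.
    """
    applied = {"seed": seed, "torch": False}
    random.seed(seed)
    # cuBLAS needs this workspace setting for deterministic results on
    # CUDA 10.2+; it must be set before the first CUDA operation runs.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    try:
        import torch
    except ImportError:
        return applied
    torch.manual_seed(seed)
    # warn_only=True emits a warning instead of raising when an operation
    # has no deterministic implementation.
    torch.use_deterministic_algorithms(True, warn_only=True)
    torch.backends.cudnn.benchmark = False
    applied["torch"] = True
    return applied


# Floating-point addition is not associative, so a different reduction
# order on the GPU can change the last bits of a result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

if __name__ == "__main__":
    print(seed_everything(42))
    print(left == right)  # False
```

Even with these settings, results may still differ across GPU models, driver versions, or library versions, so small differences in the generated KG remain possible.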

License

This project is licensed under the MIT License - see the LICENSE file for details.

About us

The research group Conect2AI consists of undergraduate and graduate students from the Federal University of Rio Grande do Norte (UFRN) and aims to apply Artificial Intelligence (AI) and machine learning in emerging fields. Our expertise includes Embedded Intelligence and IoT, optimizing resource management and energy efficiency, contributing to sustainable cities. In energy transition and mobility, we apply AI to optimize energy use in connected vehicles and promote more sustainable mobility.

