legalkit-pipeline
Publication pipeline for French legal codes on 🤗 Datasets from LegiFrance with concurrent upload and dynamic README.md.
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: louisbrulenaudet
- License: apache-2.0
- Language: Python
- Default Branch: main
- Homepage: https://huggingface.co/louisbrulenaudet
- Size: 51.8 KB
Statistics
- Stars: 3
- Watchers: 1
- Forks: 1
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
LegalKit Pipeline: Open Access to French Legal Codes on 🤗 Datasets
The LegalKit Pipeline project aims to provide open access to French legal codes on the 🤗 Datasets platform, thereby democratizing access to legal information and promoting transparency and understanding of the French legal system. Our mission is to compile and publish a comprehensive collection of French legal codes, spanning civil law, criminal law, and administrative regulations, among other areas, to cater to the diverse needs of legal professionals, researchers, students, and enthusiasts alike.
With LegalKit Pipeline, individuals can explore, analyze, and leverage French legal texts for various purposes, helping them navigate and interpret the law. By facilitating access to this valuable resource, we aim to foster greater transparency and knowledge accessibility in the legal domain, enabling stakeholders to make informed decisions and advance legal scholarship and practice.
Join us in our commitment to advancing legal transparency and knowledge accessibility through the LegalKit Pipeline project, as we strive to make French legal codes accessible to everyone on the 🤗 Datasets platform.
Inspiration and Ideas
The LegalKit Pipeline project draws inspiration from cutting-edge techniques such as fine-tuning and Retrieval-Augmented Generation (RAG) to create efficient and accurate language models tailored for legal practice.
Tech Stack
Language: Python 3.9.0+
Installation
Clone the repo:
git clone https://github.com/louisbrulenaudet/legalkit-pipeline.git
Concurrent reading of the LegalKit
To use all the legal data published on LegalKit, you can use this code snippet:

```python
# -*- coding: utf-8 -*-

import concurrent.futures
import logging
from typing import List, Optional, Union

import datasets
from tqdm.auto import tqdm  # notebook widget in Jupyter, plain progress bar elsewhere


def dataset_loader(
    name: str,
    streaming: bool = True
) -> Optional[Union[datasets.Dataset, datasets.IterableDataset]]:
    """
    Helper function to load a single dataset in parallel.

    Parameters
    ----------
    name : str
        Name of the dataset to be loaded.

    streaming : bool, optional
        Determines if datasets are streamed. Default is True.

    Returns
    -------
    dataset : datasets.Dataset, datasets.IterableDataset or None
        Loaded dataset object (an IterableDataset when streaming), or None
        if an error occurred. Errors are logged rather than raised.
    """
    try:
        return datasets.load_dataset(
            name,
            split="train",
            streaming=streaming
        )

    except Exception as exc:
        logging.error(f"Error loading dataset {name}: {exc}")

        return None


def load_datasets(
    req: List[str],
    streaming: bool = True
) -> list:
    """
    Downloads datasets specified in a list and creates a list of loaded datasets.

    Parameters
    ----------
    req : list
        A list containing the names of datasets to be downloaded.

    streaming : bool, optional
        Determines if datasets are streamed. Default is True.

    Returns
    -------
    datasets_list : list
        A list containing the loaded datasets requested in 'req'. Datasets that
        failed to load are skipped (and logged); results are collected in
        completion order, not in the order of 'req'.

    Examples
    --------
    >>> datasets_list = load_datasets(["dataset1", "dataset2"], streaming=False)
    """
    datasets_list = []

    with concurrent.futures.ThreadPoolExecutor() as executor:
        # Forward `streaming` so the flag actually reaches each loader call.
        future_to_dataset = {
            executor.submit(dataset_loader, name, streaming): name for name in req
        }

        for future in tqdm(concurrent.futures.as_completed(future_to_dataset), total=len(req)):
            name = future_to_dataset[future]

            try:
                dataset = future.result()

                # Compare to None explicitly: an empty (non-streaming) Dataset is falsy.
                if dataset is not None:
                    datasets_list.append(dataset)

            except Exception as exc:
                logging.error(f"Error processing dataset {name}: {exc}")

    return datasets_list


req = [
    "louisbrulenaudet/code-artisanat",
    "louisbrulenaudet/code-action-sociale-familles",
    # ...
]

datasets_list = load_datasets(
    req=req,
    streaming=True
)

dataset = datasets.concatenate_datasets(
    datasets_list
)
```
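Because `concurrent.futures.as_completed` yields futures as they finish, the list passed to `concatenate_datasets` above is in completion order, so the order of the codes in the combined dataset can change from run to run. If a stable order matters, `ThreadPoolExecutor.map` returns results in input order. The sketch below is a hypothetical variant, not part of the project: `load_in_order` and its `loader` and `max_workers` parameters are illustrative names, and the loader is injected so the pattern can be exercised without network access.

```python
import concurrent.futures
import logging
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def load_in_order(
    names: List[str],
    loader: Callable[[str], Optional[T]],
    max_workers: int = 4,
) -> List[T]:
    """Hypothetical helper: load `names` concurrently, keeping the order of `names`.

    `loader` is expected to return None on failure (as `dataset_loader` does);
    those entries are skipped. An exception escaping `loader` is logged and
    the entry is skipped as well.
    """

    def safe_load(name: str) -> Optional[T]:
        try:
            return loader(name)
        except Exception as exc:  # same log-and-skip behaviour as load_datasets
            logging.error(f"Error loading dataset {name}: {exc}")
            return None

    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the order of `names`, not completion order.
        results = list(executor.map(safe_load, names))

    return [result for result in results if result is not None]


if __name__ == "__main__":
    import time

    def fake_loader(name: str) -> Optional[str]:
        # Stand-in for dataset_loader: later names finish first, one name fails.
        time.sleep({"a": 0.03, "b": 0.02, "c": 0.01}.get(name, 0.0))
        return None if name == "broken" else name.upper()

    print(load_in_order(["a", "b", "broken", "c"], fake_loader))  # ['A', 'B', 'C']
```

With the README's code, this could be called as `load_in_order(req, lambda name: dataset_loader(name, streaming=True))`; `max_workers` caps how many loads run at the same time.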
Citing this project
If you use this code in your research, please use the following BibTeX entry.
BibTeX
@misc{louisbrulenaudet2024,
author = {Louis Brulé Naudet},
title = {LegalKit Pipeline: Open Access to French Legal Codes on 🤗 Datasets},
howpublished = {\url{https://github.com/louisbrulenaudet/legalkit-pipeline}},
year = {2024}
}
Feedback
If you have any feedback, please reach out at louisbrulenaudet@icloud.com.
Owner
- Name: Louis Brulé Naudet
- Login: louisbrulenaudet
- Kind: user
- Location: Paris
- Company: Université Paris-Dauphine (Paris Sciences et Lettres - PSL)
- Website: https://louisbrulenaudet.com
- Twitter: BruleNaudet
- Repositories: 81
- Profile: https://github.com/louisbrulenaudet
Research in business taxation and development (NLP, LLM, Computer vision...), University Dauphine-PSL 📖 | Backed by the Microsoft for Startups Hub program
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Brulé Naudet"
    given-names: "Louis"
    orcid: "https://orcid.org/0000-0001-9111-4879"
title: "LegalKit Pipeline: Open Access to French Legal Codes on 🤗 Datasets"
version: 1.0.0
date-released: 2024-03-31
GitHub Events
Total
- Watch event: 3
- Push event: 1
- Fork event: 1
Last Year
- Watch event: 3
- Push event: 1
- Fork event: 1
Issues and Pull Requests
Last synced: 11 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0