llm_gov_assistant

An agent-based system for context-based document filtering and information aggregation.

https://github.com/marios-mamalis/llm_gov_assistant

Science Score: 39.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.8%) to scientific vocabulary

Keywords

information-retrieval large-language-models llm-agent retrieval-augmented-generation
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: Marios-Mamalis
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 28.3 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
Readme Citation

README.md

An LLM agent-based governance assistant

Official implementation for the systems presented in "A Large Language Model Agent Based Legal Assistant for Governance Applications".

Overview

This repository includes two systems: a basic retrieval-augmented generation pipeline for answering questions based on an external corpus, and an agent-based system specialized in answering questions about aggregated metrics across multiple documents.

RAG is implemented in a standard manner by using the initial query's embeddings to retrieve relevant documents for inclusion in the question prompt. The agent-based subsystem operates in three steps: first, it defines the information to be extracted from each document based on the user's query. Next, it extracts this information from each document, homogenizes the results, and stores them in a structured format. Finally, a Python agent is used to answer the user's queries based on the structured data. The system also supports integrating existing metadata alongside the extracted, structured data, and is optimized for batched inference. Both systems utilize OpenAI models for text and embeddings generation.
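The retrieval step of the RAG pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: a bag-of-words count vector stands in for the OpenAI embeddings so that the example runs offline, and the names `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, corpus: list, k: int = 2) -> list:
    """Rank documents by similarity to the query embedding and keep the top k."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, corpus: list, k: int = 2) -> str:
    """Place the retrieved documents in the prompt ahead of the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, corpus, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    # Hypothetical three-document corpus.
    corpus = [
        "Law 100 regulates data protection for citizens",
        "The budget law sets tax rates",
        "Municipal parking rules apply downtown",
    ]
    print(build_prompt("Which law covers data protection?", corpus, k=2))
```

In a real setup the prompt would then be sent to a text generation model, and `embed` would call an embedding model, typically with the document vectors stored in a vector database.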

The case presented in the paper is contained in case.py.
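For orientation, the three steps of the agent-based subsystem described in the overview can be sketched as below. This is an assumption-laden toy and not the contents of case.py: fixed rules and a regular expression stand in for the LLM calls, the final function stands in for code a Python agent might generate, and every name and data value (`DOCS`, `METADATA`, `define_fields`, `extract`, `homogenize`, `build_table`, `average_fine`) is hypothetical.

```python
import re
from typing import Optional

# Hypothetical corpus: each document states a fine in a different format.
DOCS = {
    "law_a.txt": "Violations are punished with a fine of 1,500 euros.",
    "law_b.txt": "The penalty is set at EUR 500.",
    "law_c.txt": "This act introduces no monetary penalty.",
}
# Hypothetical pre-existing metadata, merged with the extracted fields.
METADATA = {
    "law_a.txt": {"year": 2019},
    "law_b.txt": {"year": 2021},
    "law_c.txt": {"year": 2021},
}


def define_fields(query: str) -> list:
    """Step 1: decide what to extract from every document (LLM stand-in)."""
    return ["fine_eur"] if "fine" in query.lower() else []


def extract(doc_text: str, fields: list) -> dict:
    """Step 2a: pull raw values from one document (a regex replaces the LLM)."""
    raw = {}
    if "fine_eur" in fields:
        match = re.search(r"(\d[\d,]*)\s*euros|EUR\s*(\d[\d,]*)", doc_text)
        raw["fine_eur"] = (match.group(1) or match.group(2)) if match else None
    return raw


def homogenize(raw: dict) -> dict:
    """Step 2b: bring values to a common type, e.g. '1,500' becomes 1500.0."""
    return {
        key: float(value.replace(",", "")) if value is not None else None
        for key, value in raw.items()
    }


def build_table(query: str) -> list:
    """Steps 1 and 2 combined: one structured row per document."""
    fields = define_fields(query)
    return [
        {"document": name, **METADATA.get(name, {}), **homogenize(extract(text, fields))}
        for name, text in DOCS.items()
    ]


def average_fine(rows: list, year: Optional[int] = None) -> Optional[float]:
    """Step 3: an aggregate a Python agent could compute over the table."""
    values = [
        row["fine_eur"]
        for row in rows
        if row.get("fine_eur") is not None and (year is None or row.get("year") == year)
    ]
    return sum(values) / len(values) if values else None


if __name__ == "__main__":
    table = build_table("What is the average fine across the laws?")
    print(table)
    print(average_fine(table), average_fine(table, year=2021))
```

Keeping extraction separate from aggregation means the per-document work happens once, after which any number of aggregate questions can be answered from the same table.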

Installation

Requires Python 3.9. Install dependencies with: pip install -r requirements.txt

Citation

If you use the code, please cite the corresponding paper:
@inproceedings{mamalis2024large,
  title={A Large Language Model Agent Based Legal Assistant for Governance Applications},
  author={Mamalis, Marios Evangelos and Kalampokis, Evangelos and Fitsilis, Fotios and Theodorakopoulos, Georgios and Tarabanis, Konstantinos},
  booktitle={International Conference on Electronic Government},
  pages={286--301},
  year={2024},
  organization={Springer}
}

Owner

  • Name: Marios Mamalis
  • Login: Marios-Mamalis
  • Kind: user

Data Scientist

Issues and Pull Requests

Last synced: 12 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Dependencies

requirements.txt pypi
  • beautifulsoup4 ==4.12.2
  • chromadb ==0.4.3
  • langchain ==0.1.12
  • langchain_community ==0.0.28
  • langchain_experimental ==0.0.54
  • numpy ==1.23.5
  • openai ==0.27.8
  • pandas ==1.5.3
  • pypdf ==4.2.0
  • python-dotenv ==1.0.0
  • tabulate ==0.9.0
  • tiktoken ==0.6.0