bluebook
Science Score: 44.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (14.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: JG21243
- License: mit
- Language: Python
- Default Branch: main
- Size: 44.9 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 2
- Releases: 0
Metadata Files
README.md
Bluebook Legal Citation Processor
A comprehensive Python toolkit for extracting, validating, and enriching legal citations with AI-powered analysis and external data integration.
Overview
The Bluebook Legal Citation Processor is a powerful tool designed for legal professionals, researchers, and developers who need to:
- Extract legal citations from text documents and PDFs using the industry-standard `eyecite` library
- Validate citation accuracy using OpenAI's GPT-4 for factual verification and 21st edition Legal Bluebook compliance
- Enrich citations with comprehensive case data from the Court Listener API
- Process documents efficiently using multi-threaded batch processing for high-performance workflows
Features
- 🔍 Smart Citation Extraction: Advanced text processing with `eyecite` for accurate citation detection
- 🤖 AI-Powered Validation: GPT-4 integration for factual accuracy and Legal Bluebook compliance checking
- 📚 Data Enrichment: Automatic case data retrieval from Court Listener API
- ⚡ High Performance: Multi-threaded batch processing for efficient large-document handling
- 📄 PDF Support: Extract citations directly from PDF documents using PyMuPDF
- 🔧 Modular Design: Multiple specialized scripts for different use cases
- 📊 Comprehensive Logging: Detailed logging for debugging and process tracking
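The multi-threaded batch processing mentioned above can be sketched with the standard library's `concurrent.futures`. This is a minimal illustration, not the code in `arm.py`: `analyze_citation` and `process_batch` are hypothetical names, and the worker is a stand-in for whatever per-citation step (GPT-4 check, Court Listener lookup) the real scripts perform.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_citation(citation_text):
    """Hypothetical stand-in for a per-citation step (GPT-4 check, API lookup)."""
    return {"citation": citation_text, "length": len(citation_text)}


def process_batch(citation_texts, worker=analyze_citation, max_workers=4):
    """Run `worker` over every citation on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of the inputs,
        # even when individual calls finish out of order.
        return list(pool.map(worker, citation_texts))


batch = ["410 U.S. 113", "347 U.S. 483", "533 U.S. 27"]
results = process_batch(batch)
for row in results:
    print(row)
```

Threads suit this workload because the expensive steps are network calls that spend most of their time waiting; `max_workers` should stay small enough to respect the rate limits of the APIs involved.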
Project Structure
├── arm.py # Main citation processor with threading and API integration
├── eye.py # Alternative citation processor with GPT-4 analysis
├── citation_extractor.py # PDF and general citation extraction utilities
├── Citation_Extractor_Eyecite.py # Basic eyecite citation extraction example
├── legal_citation_extractor.py # Additional citation extraction utilities
├── pdf_citation_extractor.py # PDF-specific citation extraction tools
├── api_gov # Congressional API integration example
├── README.md # This documentation
├── CODEBASE_GUIDE.md # Detailed codebase overview
└── LICENSE.txt # MIT License
Requirements
- Python 3.7+ (tested with Python 3.12)
- Required Libraries:
  - `eyecite` - Legal citation extraction and parsing
  - `openai` - OpenAI API integration for GPT-4 analysis
  - `requests` - HTTP requests for API communication
  - `PyMuPDF` (`fitz`) - PDF document processing
  - `PyPDF2` - Alternative PDF processing library
  - `pandas` - Data manipulation and analysis
Installation
Clone the repository:
```bash
git clone https://github.com/JG21243/Bluebook.git
cd Bluebook
```
Install required dependencies:
```bash
pip install eyecite openai requests PyMuPDF PyPDF2 pandas
```
API Configuration
⚠️ Security Warning
Do not commit API keys to version control. The current codebase contains hardcoded API keys that should be removed and replaced with environment variables.
Required API Keys
OpenAI API Key (for GPT-4 analysis):
- Sign up at OpenAI
- Create an API key and set it as an environment variable:
```bash
export OPENAI_API_KEY="your-openai-api-key-here"
```
Court Listener API Token (for case data enrichment):
- Register at Court Listener
- Set your token as an environment variable:
```bash
export COURT_LISTENER_TOKEN="your-court-listener-token-here"
```
Congressional API Key (optional, for the `api_gov` script):
- Register at Congress.gov API
- Set your key as an environment variable:
```bash
export CONGRESS_API_KEY="your-congress-api-key-here"
```
Updating the Code for Environment Variables
Before using the scripts, update the hardcoded API keys to use environment variables:
```python
import os
from openai import OpenAI

# Replace hardcoded keys with environment variables
client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))
court_listener_token = os.getenv('COURT_LISTENER_TOKEN')
```
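Because `os.getenv` silently returns `None` for an unset variable, a missing key usually surfaces later as a confusing API error. One way to fail early is a small loader like the sketch below; `load_api_config` is a hypothetical helper, not part of the repository, and the variable names are the ones used in the export commands above.

```python
import os


def load_api_config(env=None, required=("OPENAI_API_KEY", "COURT_LISTENER_TOKEN")):
    """Return the required settings, or raise one error naming every missing variable."""
    env = os.environ if env is None else env
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in required}


# Usage with an explicit mapping, so the example does not depend on the real environment.
config = load_api_config({"OPENAI_API_KEY": "test-key", "COURT_LISTENER_TOKEN": "test-token"})
print(sorted(config))
```

In a script, calling `load_api_config()` with no arguments reads from `os.environ` and stops at startup if anything is missing.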
Usage
Basic Citation Extraction
```python
from Citation_Extractor_Eyecite import extract_citations

text = """As held in Roe v. Wade, 410 U.S. 113 (1973), privacy rights are fundamental. See also Planned Parenthood v. Casey, 505 U.S. 833 (1992)."""

citations = extract_citations(text)
for citation in citations:
    print(f"Found citation: {citation}")
```
Advanced Processing with AI Analysis
```python
from arm import check_citations

example_text = """The Supreme Court in Brown v. Board of Education, 347 U.S. 483 (1954), held that racial segregation in public schools violates the Equal Protection Clause."""

# Process citations with GPT-4 analysis and Court Listener data
citation_results = check_citations(example_text)

for citation_text, gpt4_feedback, court_listener_data in citation_results:
    print(f"Citation: {citation_text}")
    print(f"GPT-4 Analysis: {gpt4_feedback}")
    print(f"Case Data: {court_listener_data}")
    print("-" * 50)
```
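If the results need to be kept rather than printed, the three-element tuples from the loop above can be written out as CSV. The sketch below uses only the standard library; `results_to_csv` is a hypothetical helper, the column names are illustrative, and the case data is assumed to be JSON-serializable (for example a dict or `None`).

```python
import csv
import io
import json


def results_to_csv(citation_results):
    """Serialize (citation, feedback, case_data) tuples to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["citation", "gpt4_feedback", "court_listener_data"])
    for citation_text, gpt4_feedback, court_listener_data in citation_results:
        # Case data is assumed to be JSON-serializable (e.g. a dict or None).
        writer.writerow([citation_text, gpt4_feedback, json.dumps(court_listener_data)])
    return buffer.getvalue()


# Made-up placeholder row, shaped like the tuples in the example above.
sample_results = [("347 U.S. 483", "placeholder feedback", {"id": 1})]
print(results_to_csv(sample_results))
```

Encoding the case data as a JSON string keeps each citation on one CSV row while preserving nested fields for later parsing.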
PDF Citation Extraction
```python
from citation_extractor import extract_text_from_pdf
from eyecite import get_citations

# Extract text from PDF and find citations
pdf_text = extract_text_from_pdf("legal_document.pdf")
citations = get_citations(pdf_text)

for citation in citations:
    print(f"PDF Citation: {citation}")
```
Script Descriptions
| Script | Purpose | Key Features |
|--------|---------|--------------|
| arm.py | Main processing engine | Multi-threading, GPT-4 analysis, Court Listener integration |
| eye.py | Alternative processor | Focused GPT-4 citation validation |
| citation_extractor.py | PDF utilities | PDF text extraction, citation conversion utilities |
| Citation_Extractor_Eyecite.py | Basic example | Simple eyecite usage demonstration |
| legal_citation_extractor.py | Utility functions | Additional citation processing tools |
| pdf_citation_extractor.py | PDF specialist | Dedicated PDF citation extraction |
| api_gov | Government data | Congressional API integration example |
Development
Running the Scripts
Basic citation extraction:
```bash
python Citation_Extractor_Eyecite.py
```
Advanced processing:
```bash
python arm.py
```
PDF processing:
```bash
python pdf_citation_extractor.py
```
Contributing
- Fork the repository
- Create a feature branch: `git checkout -b feature-name`
- Make your changes with proper testing
- Remove any hardcoded API keys
- Submit a pull request
Security Best Practices
- Never commit API keys or sensitive credentials
- Use environment variables for all API configurations
- Regularly rotate API keys
- Review code for security vulnerabilities before committing
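A quick check for leftover credentials can be automated before committing. The sketch below scans source text for lines that look like hardcoded secrets. The patterns are assumptions chosen for illustration (an `sk-` prefix as used by OpenAI-style keys, plus long quoted literals assigned to names containing `key`, `token`, or `secret`); `find_suspect_lines` is a hypothetical helper and not a replacement for a dedicated secret scanner.

```python
import re

# Assumed patterns: OpenAI-style keys start with "sk-"; the second pattern flags
# any long quoted literal assigned to a name containing key/token/secret.
PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{20,}"),
    re.compile(r"(?i)\b\w*(?:key|token|secret)\w*\s*=\s*['\"][^'\"]{16,}['\"]"),
]


def find_suspect_lines(source):
    """Return (line_number, line) pairs that look like they contain a credential."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if any(pattern.search(line) for pattern in PATTERNS):
            hits.append((number, line.strip()))
    return hits


# The key below is a made-up placeholder, not a real credential.
sample = '''import os
api_key = "sk-abcdefghijklmnopqrstuvwxyz123456"
token = os.getenv("COURT_LISTENER_TOKEN")
'''
for number, line in find_suspect_lines(sample):
    print(f"line {number}: {line}")
```

Only the hardcoded assignment is flagged; the line that reads the token from the environment passes, which is the pattern the guidelines above recommend.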
Troubleshooting
Common Issues:
- Import Errors: Ensure all required libraries are installed
- API Errors: Verify API keys are set correctly as environment variables
- PDF Processing Issues: Install PyMuPDF with: `pip install PyMuPDF`
- Rate Limiting: Implement delays between API calls if needed
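For the rate-limiting case, one common way to add delays is to retry a failed call with exponentially growing waits. The sketch below is a generic illustration, not code from the repository: `call_with_backoff` is a hypothetical helper, and it retries on any exception, which a real script would narrow to the rate-limit errors its API client actually raises.

```python
import time


def call_with_backoff(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call `func`, retrying with exponentially growing delays when it raises."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...


# Demo with a hypothetical flaky call that fails twice before succeeding.
attempts = {"count": 0}


def flaky_lookup():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("rate limited")
    return "ok"


delays = []  # record the requested delays instead of actually sleeping
result = call_with_backoff(flaky_lookup, sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the helper testable; in normal use the default `time.sleep` applies the real delays.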
License
This project is licensed under the MIT License - see the LICENSE.txt file for details.
Acknowledgments
- eyecite - Legal citation extraction
- Court Listener - Legal case data API
- OpenAI - GPT-4 AI analysis
- Contributors and the legal technology community
Owner
- Name: Josh Glen
- Login: JG21243
- Kind: user
- Repositories: 1
- Profile: https://github.com/JG21243
Citation (Citation_Extractor_Eyecite.py)
from eyecite import get_citations
from eyecite.clean import clean_text
from eyecite.models import FullCaseCitation


def extract_citations(text):
    # Names of the built-in cleaners from eyecite.clean to apply, in order
    steps = [
        'html',               # Removes HTML markup
        'inline_whitespace',  # Collapses multiple spaces or tabs into one space
        'all_whitespace',     # Collapses multiple whitespace characters into one space
        'underscores',        # Removes strings of two or more underscores
    ]
    # Clean the text and extract citations
    cleaned_text = clean_text(text, steps)
    citations = get_citations(cleaned_text)
    return citations


def find_first_full_case_citation(citations):
    # Return the first full case citation, or None if there is none
    for citation in citations:
        if isinstance(citation, FullCaseCitation):
            return citation
    return None


def print_citation_details(citation):
    # print(dir(citation))  # Lists all attributes and methods
    # corrected_citation is a method, so it must be called to get the string
    print(citation.corrected_citation())
    # print(citation.groups['reporter'])


text = "We conclude that this approach was error. The law has long accommodated new technologies within existing legal frameworks. See, e.g., Kyllo v. United States, 533 U.S. 27, 33-40 (2001) (holding that the use of thermal imaging technology can constitute a search under the Fourth Amendment); Thyroff v. Nationwide Mut. Ins. Co., 8 N.Y.3d 283, 292-93 (2007) (treating electronic records as property equivalent to physical records for the purposes of conversion)."
citations = extract_citations(text)
full_case_citation = find_first_full_case_citation(citations)
if full_case_citation is not None:
    print_citation_details(full_case_citation)
GitHub Events
Total
- Issues event: 1
- Delete event: 1
- Push event: 3
- Pull request review event: 4
- Pull request event: 2
Last Year
- Issues event: 1
- Delete event: 1
- Push event: 3
- Pull request review event: 4
- Pull request event: 2
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 1
- Total pull requests: 2
- Average time to close issues: N/A
- Average time to close pull requests: 2 minutes
- Total issue authors: 1
- Total pull request authors: 2
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 2
- Average time to close issues: N/A
- Average time to close pull requests: 2 minutes
- Issue authors: 1
- Pull request authors: 2
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- tsrrrrr (1)
Pull Request Authors
- Copilot (1)