bluebook

https://github.com/jg21243/bluebook

Last synced: 6 months ago · JSON representation ·

Repository

Basic Info

Host: GitHub
Owner: JG21243
License: mit
Language: Python
Default Branch: main
Size: 44.9 KB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 2
Releases: 0

Created about 2 years ago · Last pushed 6 months ago

Metadata Files

Readme License Citation

Bluebook Legal Citation Processor

A comprehensive Python toolkit for extracting, validating, and enriching legal citations with AI-powered analysis and external data integration.

Overview

The Bluebook Legal Citation Processor is a powerful tool designed for legal professionals, researchers, and developers who need to:

Extract legal citations from text documents and PDFs using the industry-standard eyecite library
Validate citation accuracy using OpenAI's GPT-4 for factual verification and 21st edition Legal Bluebook compliance
Enrich citations with comprehensive case data from the Court Listener API
Process documents efficiently using multi-threaded batch processing for high-performance workflows

Features

🔍 Smart Citation Extraction: Advanced text processing with eyecite for accurate citation detection
🤖 AI-Powered Validation: GPT-4 integration for factual accuracy and Legal Bluebook compliance checking
📚 Data Enrichment: Automatic case data retrieval from Court Listener API
⚡ High Performance: Multi-threaded batch processing for efficient large-document handling
📄 PDF Support: Extract citations directly from PDF documents using PyMuPDF
🔧 Modular Design: Multiple specialized scripts for different use cases
📊 Comprehensive Logging: Detailed logging for debugging and process tracking

Project Structure

├── arm.py # Main citation processor with threading and API integration ├── eye.py # Alternative citation processor with GPT-4 analysis ├── citation_extractor.py # PDF and general citation extraction utilities ├── Citation_Extractor_Eyecite.py # Basic eyecite citation extraction example ├── legal_citation_extractor.py # Additional citation extraction utilities ├── pdf_citation_extractor.py # PDF-specific citation extraction tools ├── api_gov # Congressional API integration example ├── README.md # This documentation ├── CODEBASE_GUIDE.md # Detailed codebase overview └── LICENSE.txt # MIT License

Requirements

Python 3.7+ (tested with Python 3.12)
Required Libraries:
- eyecite - Legal citation extraction and parsing
- openai - OpenAI API integration for GPT-4 analysis
- requests - HTTP requests for API communication
- PyMuPDF (fitz) - PDF document processing
- PyPDF2 - Alternative PDF processing library
- pandas - Data manipulation and analysis

Installation

Clone the repository: bash git clone https://github.com/JG21243/Bluebook.git cd Bluebook
Install required dependencies: bash pip install eyecite openai requests PyMuPDF PyPDF2 pandas

API Configuration

⚠️ Security Warning

Do not commit API keys to version control. The current codebase contains hardcoded API keys that should be removed and replaced with environment variables.

Required API Keys

OpenAI API Key (for GPT-4 analysis):
- Sign up at OpenAI
- Create an API key and set it as an environment variable: bash export OPENAI_API_KEY="your-openai-api-key-here"
Court Listener API Token (for case data enrichment):
- Register at Court Listener
- Set your token as an environment variable: bash export COURT_LISTENER_TOKEN="your-court-listener-token-here"
Congressional API Key (optional, for api_gov script):
- Register at Congress.gov API
- Set your key as an environment variable: bash export CONGRESS_API_KEY="your-congress-api-key-here"

Updating the Code for Environment Variables

Before using the scripts, update the hardcoded API keys to use environment variables:

```python import os from openai import OpenAI

Replace hardcoded keys with environment variables

client = OpenAI(apikey=os.getenv('OPENAIAPIKEY')) courtlistenertoken = os.getenv('COURTLISTENER_TOKEN') ```

Usage

Basic Citation Extraction

```python from CitationExtractorEyecite import extract_citations

text = """As held in Roe v. Wade, 410 U.S. 113 (1973), privacy rights are fundamental. See also Planned Parenthood v. Casey, 505 U.S. 833 (1993)."""

citations = extract_citations(text) for citation in citations: print(f"Found citation: {citation}") ```

Advanced Processing with AI Analysis

```python from arm import check_citations

example_text = """The Supreme Court in Brown v. Board of Education, 347 U.S. 483 (1954), held that racial segregation in public schools violates the Equal Protection Clause."""

Process citations with GPT-4 analysis and Court Listener data

citationresults = checkcitations(example_text)

for citationtext, gpt4feedback, courtlistenerdata in citationresults: print(f"Citation: {citationtext}") print(f"GPT-4 Analysis: {gpt4feedback}") print(f"Case Data: {courtlistener_data}") print("-" * 50) ```

PDF Citation Extraction

```python from citationextractor import extracttextfrompdf from eyecite import get_citations

Extract text from PDF and find citations

pdftext = extracttextfrompdf("legaldocument.pdf") citations = getcitations(pdf_text)

for citation in citations: print(f"PDF Citation: {citation}") ```

Script Descriptions

| Script | Purpose | Key Features | |--------|---------|--------------| | arm.py | Main processing engine | Multi-threading, GPT-4 analysis, Court Listener integration | | eye.py | Alternative processor | Focused GPT-4 citation validation | | citation_extractor.py | PDF utilities | PDF text extraction, citation conversion utilities | | Citation_Extractor_Eyecite.py | Basic example | Simple eyecite usage demonstration | | legal_citation_extractor.py | Utility functions | Additional citation processing tools | | pdf_citation_extractor.py | PDF specialist | Dedicated PDF citation extraction | | api_gov | Government data | Congressional API integration example |

Development

Running the Scripts

Basic citation extraction: bash python Citation_Extractor_Eyecite.py
Advanced processing: bash python arm.py
PDF processing: bash python pdf_citation_extractor.py

Contributing

Fork the repository
Create a feature branch: git checkout -b feature-name
Make your changes with proper testing
Remove any hardcoded API keys
Submit a pull request

Security Best Practices

Never commit API keys or sensitive credentials
Use environment variables for all API configurations
Regularly rotate API keys
Review code for security vulnerabilities before committing

Troubleshooting

Common Issues:

Import Errors: Ensure all required libraries are installed
API Errors: Verify API keys are set correctly as environment variables
PDF Processing Issues: Install PyMuPDF with: pip install PyMuPDF
Rate Limiting: Implement delays between API calls if needed

License

This project is licensed under the MIT License - see the LICENSE.txt file for details.

Acknowledgments

eyecite - Legal citation extraction
Court Listener - Legal case data API
OpenAI - GPT-4 AI analysis
Contributors and the legal technology community

Owner

Name: Josh Glen
Login: JG21243
Kind: user

Repositories: 1
Profile: https://github.com/JG21243

Citation (Citation_Extractor_Eyecite.py)

from eyecite import get_citations, resolve_citations
from eyecite.clean import clean_text

def extract_citations(text):
    # Define the cleaning steps using actual functions from eyecite.clean
    steps = [
        'html',                 # Removes HTML markup
        'inline_whitespace',    # Collapses multiple spaces or tabs into one space
        'all_whitespace',       # Collapses multiple whitespace characters into one space
        'underscores'           # Removes strings of two or more underscores
    ]

    # Clean the text and extract citations
    cleaned_text = clean_text(text, steps)
    citations = get_citations(cleaned_text)

    return citations

def find_first_full_case_citation(citations):
    for citation in citations:
        if citation.__class__.__name__ == "FullCaseCitation":
            return citation

    return None

def print_citation_details(citation):
    #print(dir(citation))  # Lists all attributes and methods

    # To see the value of a specific attribute, for example, 'reporter'
    print(getattr(citation, 'corrected_citation', 'Attribute not found'))

    # Or simply using dot notation if you know the attribute exists
    #print(citation.corrected_citation())

    #print(citation.groups['reporter'])

text = "We conclude that this approach was error. The law has long accommodated new technologies within existing legal frameworks. See, e.g., Kyllo v. United States, 533 U.S. 27, 33-40 (2001) (holding that the use of thermal imaging technology can constitute a search under the Fourth Amendment); Thyroff v. Nationwide Mut. Ins. Co., 8 N.Y.3d 283, 292-93 (2007) (treating electronic records as property equivalent to physical records for the purposes of conversion)."

citations = extract_citations(text)
full_case_citation = find_first_full_case_citation(citations)
if full_case_citation is not None:
    print_citation_details(full_case_citation)

GitHub Events

Total

Issues event: 1
Delete event: 1
Push event: 3
Pull request review event: 4
Pull request event: 2

Last Year

Issues event: 1
Delete event: 1
Push event: 3
Pull request review event: 4
Pull request event: 2

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 1
Total pull requests: 2
Average time to close issues: N/A
Average time to close pull requests: 2 minutes
Total issue authors: 1
Total pull request authors: 2
Average comments per issue: 0.0
Average comments per pull request: 0.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 1
Pull requests: 2
Average time to close issues: N/A
Average time to close pull requests: 2 minutes
Issue authors: 1
Pull request authors: 2
Average comments per issue: 0.0
Average comments per pull request: 0.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0