citation-decoder

An AI-powered tool that extracts citations from research papers and explains their context, purpose, and relationship to the main paper. Built with PyMuPDF for PDF parsing, OpenAI GPT-4 for citation analysis, and Streamlit for the web interface.

https://github.com/naveenvasou/citation-decoder

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: naveenvasou
License: apache-2.0
Language: Python
Default Branch: main
Size: 21.5 KB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created about 1 year ago · Last pushed about 1 year ago

Metadata Files

Readme License Citation

README.md

Citation Decoder

Citation Decoder is an AI-powered tool that helps researchers understand the citation network within academic papers. It extracts in-text citations from research papers and provides insights about each citation's context, purpose, and relationship to the main paper.

Features

Citation Extraction: Automatically identifies and extracts in-text citations from PDF research papers
Context Analysis: Analyzes each citation to determine:
- What the cited paper contributes to the current paper
- The purpose of the citation (supporting evidence, contrast, background, etc.)
- Whether the authors agree, critique, or extend the cited work
Multiple Input Options: Upload PDFs directly or provide arXiv links
Interactive Interface: Clean Streamlit UI for easy exploration of citation insights
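
The citation extraction feature above can be illustrated with a small regex-based sketch. The repository's `pdf_parser.py` is not shown here, so the patterns, the `extract_citations` helper, and the `window` parameter are all assumptions made for illustration; only the `citation` and `context` keys mirror what `CitationAnalyzer.analyze_citation` reads.

```python
import re
from typing import Dict, List

# Hypothetical patterns; the repository's pdf_parser.py may use different ones.
NUMERIC = r"\[\d+(?:\s*[-,]\s*\d+)*\]"  # [3], [1, 2], [4-6]
AUTHOR_YEAR = (
    r"\([A-Z][A-Za-z\-]+(?: et al\.| (?:and|&) [A-Z][A-Za-z\-]+)?,? \d{4}[a-z]?\)"
)  # (Smith, 2020), (Smith et al., 2020), (Smith and Jones, 2020)
CITATION_RE = re.compile(f"{NUMERIC}|{AUTHOR_YEAR}")


def extract_citations(text: str, window: int = 150) -> List[Dict[str, str]]:
    """Return each in-text citation with up to `window` characters of context per side."""
    results = []
    for match in CITATION_RE.finditer(text):
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        results.append(
            {"citation": match.group(0), "context": text[start:end].strip()}
        )
    return results


sample = (
    "Transformers (Vaswani et al., 2017) replaced recurrence. "
    "Later work [2, 3] scaled them."
)
for item in extract_citations(sample, window=40):
    print(item["citation"])
```

Two patterns cover only numeric and simple author-year styles; other styles (footnotes, multiple author-year entries inside one pair of parentheses) would need further patterns.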

Installation

```bash
# Clone the repository
git clone https://github.com/naveenvasou/citation-decoder.git
cd citation-decoder

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Create a .env file and add your API key
echo "OPENAI_API_KEY=your_openai_api_key" > .env
```

Usage

```bash
# Run the Streamlit app
streamlit run app.py
```

Then open your browser and navigate to the displayed URL (usually http://localhost:8501).
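
One of the input options listed under Features is an arXiv link. A minimal sketch of how such a link could be turned into a direct PDF URL follows. The `arxiv_pdf_url` helper is hypothetical and not taken from the repository, and it only handles new-style identifiers such as `1706.03762`.

```python
import re
from typing import Optional

# Matches new-style arXiv identifiers such as 1706.03762 or 1706.03762v5.
ARXIV_ID_RE = re.compile(
    r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5}(?:v\d+)?)", re.IGNORECASE
)


def arxiv_pdf_url(link: str) -> Optional[str]:
    """Return the direct PDF URL for an arXiv abs/pdf link, or None if no ID is found."""
    match = ARXIV_ID_RE.search(link)
    if match is None:
        return None
    return f"https://arxiv.org/pdf/{match.group(1)}"


print(arxiv_pdf_url("https://arxiv.org/abs/1706.03762"))
```

Returning `None` for unrecognized links lets the caller show a validation message instead of attempting a download.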

Requirements

Python 3.7+
PyMuPDF
OpenAI API key
Streamlit
Pandas
Requests
python-dotenv

Project Structure

citation-decoder/
├── src/
│   ├── utils/
│   │   ├── pdf_parser.py         # PDF extraction functionality
│   │   ├── citation_analyzer.py  # Citation analysis using GPT-4
│   │   └── api_client.py         # API clients for paper metadata
│   └── main.py                   # Core application logic
├── app.py                        # Streamlit web interface
├── requirements.txt              # Project dependencies
└── .env                          # Environment variables (API keys)

Future Improvements

Citation network visualization
Integration with more academic paper databases
Batch processing of multiple papers
Enhanced pattern matching for more citation styles
Caching of results for improved performance
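
As an illustration of the caching item above, here is one possible in-memory approach. It is a sketch under assumptions, not part of the repository: `cache_key` and `CachedAnalyzer` are hypothetical names, and the cache lives only for the lifetime of the process.

```python
import hashlib
import json
from typing import Any, Callable, Dict


def cache_key(citation: Dict[str, Any]) -> str:
    """Derive a stable key from the citation text and its context."""
    payload = json.dumps(
        [citation["citation"], citation["context"]], ensure_ascii=False
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CachedAnalyzer:
    """Wrap any analyze function so repeated citations skip the expensive call."""

    def __init__(self, analyze: Callable[[Dict[str, Any]], Dict[str, Any]]):
        self._analyze = analyze
        self._cache: Dict[str, Dict[str, Any]] = {}

    def analyze_citation(self, citation: Dict[str, Any]) -> Dict[str, Any]:
        key = cache_key(citation)
        if key not in self._cache:
            # Pass a copy so the wrapped function cannot mutate the caller's dict
            self._cache[key] = self._analyze(dict(citation))
        # Return a copy so callers cannot alter the cached entry
        return dict(self._cache[key])
```

Under these assumptions the wrapper could be built as `CachedAnalyzer(CitationAnalyzer().analyze_citation)`, so that identical citation and context pairs trigger only one API request.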

License

Apache 2.0

Acknowledgements

OpenAI for providing the GPT-4 API
PyMuPDF for PDF parsing capabilities
Streamlit for the web application framework

Owner

Name: Naveen Kumar
Login: naveenvasou
Kind: user
Location: Pondicherry, India

Repositories: 1
Profile: https://github.com/naveenvasou

Happy day!

Citation (citation_analyzer.py)

import openai
import os
import logging
from typing import List, Dict, Any

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class CitationAnalyzer:
    def __init__(self, api_key=None):
        """Initialize the CitationAnalyzer with an API key."""
        # Fall back to the OPENAI_API_KEY environment variable when no key is passed
        api_key = api_key or os.getenv("OPENAI_API_KEY")

        if not api_key:
            raise ValueError("OpenAI API key is required. Please provide it or set OPENAI_API_KEY environment variable.")
        # Hand the resolved key to the client explicitly instead of setting module-level state
        self.client = openai.OpenAI(api_key=api_key)
    
    def analyze_citation(self, citation: Dict[str, Any]) -> Dict[str, Any]:
        """
        Analyze a single citation with its context
        
        Args:
            citation: Dictionary with citation text and context
            
        Returns:
            Dictionary with analysis results
        """

        try:
            prompt = f"""
            Analyze the following citation in its context:
            
            Citation: {citation['citation']}
            
            Context:
            "{citation['context']}"
            
            Please provide:
            1. What this cited paper contributes to the current paper
            2. The purpose of this citation (e.g., supporting evidence, contrasting view, background information)
            3. The authors' stance towards the cited work (agree, critique, extend, or neutral)
            
            Format your response as a JSON with these keys: "contribution", "purpose", "stance"
            """
            
            response = self.client.chat.completions.create(
                model="gpt-4",
                messages=[
                    {"role": "system", "content": "You are a research assistant that analyzes academic citations."},
                    {"role": "user", "content": prompt}
                ],
                temperature=0.3,
                max_tokens=500
            )
            
            # Extract and clean the response
            result = response.choices[0].message.content.strip()

            # Parse the JSON response; invalid JSON is handled by the except branch below
            import json
            try:
                analysis = json.loads(result)
                citation.update(analysis)
            except json.JSONDecodeError:
                # If the response isn't valid JSON, store placeholders and keep the raw text
                citation.update({
                    "contribution": "Unable to parse analysis",
                    "purpose": "Unknown",
                    "stance": "Unknown",
                    "raw_analysis": result
                })

            return citation
        
        except Exception as e:
            logger.error(f"Error analyzing citation: {e}")
            citation.update({
                "contribution": f"Error during analysis: {str(e)}",
                "purpose": "Error",
                "stance": "Error"
            })
            return citation
        
    def batch_analyze_citations(self, citations: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        """
        Analyze a batch of citations
        
        Args:
            citations: List of citation dictionaries
            
        Returns:
            List of dictionaries with analysis results
        """

        results = []
        total = len(citations)

        for i, citation in enumerate(citations):
            logger.info(f"Analyzing citation {i+1}/{total}: {citation['citation']}")
            result = self.analyze_citation(citation)
            results.append(result)

        return results
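
`analyze_citation` falls back to placeholder values whenever `json.loads` rejects the model's reply. Chat models sometimes wrap JSON in Markdown code fences, which triggers that fallback even though the payload is usable. The sketch below shows one way the parsing step could tolerate this; `parse_analysis` is a hypothetical helper, not part of the repository, and it reuses the same fallback shape as the code above.

```python
import json
import re
from typing import Any, Dict

FENCE = "`" * 3  # Markdown code-fence marker
FENCED_JSON_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)
REQUIRED_KEYS = ("contribution", "purpose", "stance")


def parse_analysis(raw: str) -> Dict[str, Any]:
    """Parse the model reply into the three expected keys, tolerating code fences."""
    text = raw.strip()
    fenced = FENCED_JSON_RE.search(text)
    if fenced:
        # Keep only the content between the fences
        text = fenced.group(1)
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        data = None
    if not isinstance(data, dict):
        # Same fallback shape as analyze_citation uses for unparseable replies
        return {
            "contribution": "Unable to parse analysis",
            "purpose": "Unknown",
            "stance": "Unknown",
            "raw_analysis": raw,
        }
    # Fill any key the model omitted so downstream code can rely on all three
    return {key: data.get(key, "Unknown") for key in REQUIRED_KEYS}
```

With such a helper, the body of the inner `try` block could become `citation.update(parse_analysis(result))`, which also guards against replies that are valid JSON but not an object.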
