citation-decoder

An AI-powered tool that extracts citations from research papers and explains their context, purpose, and relationship to the main paper. Built with PyMuPDF for PDF parsing, OpenAI GPT-4 for citation analysis, and Streamlit for the web interface.

https://github.com/naveenvasou/citation-decoder

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.6%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: naveenvasou
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 21.5 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 10 months ago · Last pushed 10 months ago
Metadata Files
Readme License Citation

README.md

Citation Decoder

Citation Decoder is an AI-powered tool that helps researchers understand the citation network within academic papers. It extracts in-text citations from research papers and provides insights about each citation's context, purpose, and relationship to the main paper.

Features

  • Citation Extraction: Automatically identifies and extracts in-text citations from PDF research papers
  • Context Analysis: Analyzes each citation to determine:
    • What the cited paper contributes to the current paper
    • The purpose of the citation (supporting evidence, contrast, background, etc.)
    • Whether the authors agree, critique, or extend the cited work
  • Multiple Input Options: Upload PDFs directly or provide arXiv links
  • Interactive Interface: Clean Streamlit UI for easy exploration of citation insights
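The repository's `src/utils/pdf_parser.py` is not reproduced on this page, so the following is only a minimal sketch of how in-text citations and their surrounding context might be pulled from a paper. It assumes just two citation styles (numeric brackets such as `[3]` or `[1, 4]`, and single author-year parentheticals such as `(Smith et al., 2020)`); the names `extract_citations` and `pdf_to_text`, the regex patterns, and the 200-character context window are all hypothetical. The output dictionaries use the `citation` and `context` keys that `CitationAnalyzer.analyze_citation` reads.

```python
import re
from typing import Dict, List

# Hypothetical patterns covering two common styles only; real papers use many more.
NUMERIC = re.compile(r"\[\d+(?:\s*[,\u2013-]\s*\d+)*\]")  # [3], [1, 4], [2-5]
AUTHOR_YEAR = re.compile(
    r"\([A-Z][A-Za-z'\-]+(?: et al\.)?(?: (?:and|&) [A-Z][A-Za-z'\-]+)?,? \d{4}[a-z]?\)"
)  # (Smith et al., 2020), (Lee and Park, 2019)


def pdf_to_text(path: str) -> str:
    """Concatenate the text of every page using PyMuPDF."""
    import fitz  # PyMuPDF; imported lazily so the regex helpers work without it

    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)


def extract_citations(text: str, window: int = 200) -> List[Dict[str, object]]:
    """Return one dict per in-text citation, with `window` characters of context on each side."""
    found = []
    for pattern in (NUMERIC, AUTHOR_YEAR):
        for match in pattern.finditer(text):
            start = max(0, match.start() - window)
            end = min(len(text), match.end() + window)
            found.append({
                "citation": match.group(0),
                # Collapse PDF line breaks and repeated spaces into single spaces
                "context": " ".join(text[start:end].split()),
                "position": match.start(),
            })
    # Report citations in reading order rather than grouped by pattern
    found.sort(key=lambda c: c["position"])
    return found
```

For a real PDF, `extract_citations(pdf_to_text("paper.pdf"))` would yield a list that can be passed straight to `batch_analyze_citations`. A fixed character window is the simplest choice; sentence-level context would likely give the model cleaner input.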

Installation

```bash
# Clone the repository
git clone https://github.com/naveenvasou/citation-decoder.git
cd citation-decoder

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Create a .env file and add your API key
echo "OPENAI_API_KEY=your_openai_api_key" > .env
```
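The analyzer looks the key up with `os.getenv("OPENAI_API_KEY")`, so the `.env` file only helps if something copies its contents into the process environment first; python-dotenv, listed under Requirements, exists for that purpose. The sketch below is a simplified, stdlib-only stand-in (the name `load_env_file` is hypothetical) that illustrates the mechanism: read `KEY=VALUE` lines and set them without overriding variables that are already defined.

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Simplified stand-in for python-dotenv: load KEY=VALUE lines into os.environ.

    Variables that are already set are not overridden. Returns the parsed pairs.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return {}
    parsed: Dict[str, str] = {}
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and lines without a value
        key, value = line.split("=", 1)
        key = key.strip()
        value = value.strip().strip('"').strip("'")  # drop surrounding quotes
        parsed[key] = value
        os.environ.setdefault(key, value)  # keep any value already in the environment
    return parsed
```

In the actual project, `from dotenv import load_dotenv` followed by `load_dotenv()` does the same job and handles more of the `.env` syntax.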

Usage

```bash
# Run the Streamlit app
streamlit run app.py
```

Then open your browser and navigate to the displayed URL (usually http://localhost:8501).

Requirements

  • Python 3.7+
  • PyMuPDF
  • OpenAI Python library (v1.0+, which provides the `openai.OpenAI` client) and an OpenAI API key
  • Streamlit
  • Pandas
  • Requests
  • python-dotenv

Project Structure

citation-decoder/
├── src/
│   ├── utils/
│   │   ├── pdf_parser.py         # PDF extraction functionality
│   │   ├── citation_analyzer.py  # Citation analysis using GPT-4
│   │   └── api_client.py         # API clients for paper metadata
│   └── main.py                   # Core application logic
├── app.py                        # Streamlit web interface
├── requirements.txt              # Project dependencies
└── .env                          # Environment variables (API keys)
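The Features list mentions arXiv links as an input option, and `api_client.py` is described only as holding API clients for paper metadata; its code is not shown here. As a hypothetical sketch (the function name `arxiv_pdf_url` and the URL pattern are assumptions, not taken from the repository), an arXiv abstract or PDF link could be normalized to a direct PDF URL before parsing:

```python
import re
from typing import Optional

# Hypothetical pattern: new-style arXiv IDs (e.g. 1706.03762, optionally with a
# version suffix like v7) in abs/ or pdf/ URLs. Old-style IDs such as
# hep-th/9901001 are deliberately not handled in this sketch.
ARXIV_URL = re.compile(
    r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5}(?:v\d+)?)(?:\.pdf)?",
    re.IGNORECASE,
)


def arxiv_pdf_url(link: str) -> Optional[str]:
    """Return a canonical https PDF URL for an arXiv link, or None if it is not one."""
    match = ARXIV_URL.search(link)
    if match is None:
        return None
    return f"https://arxiv.org/pdf/{match.group(1)}"
```

The resulting URL could then be downloaded with `requests.get(url, timeout=30)` and the bytes opened directly with `fitz.open(stream=response.content, filetype="pdf")`, avoiding a temporary file.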

Future Improvements

  • Citation network visualization
  • Integration with more academic paper databases
  • Batch processing of multiple papers
  • Enhanced pattern matching for more citation styles
  • Caching of results for improved performance
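One way the "Caching of results" item could work is sketched below; it is not part of the repository, and the helper name `make_cached_analyzer` is hypothetical. Because the prompt depends only on a citation's text and context, those two fields make a natural cache key. Results marked as failures (the analyzer sets `"purpose": "Error"` when an API call raises) are not cached, so a transient error can be retried.

```python
import hashlib
import json
from typing import Any, Callable, Dict

Citation = Dict[str, Any]


def make_cached_analyzer(analyze: Callable[[Citation], Citation]) -> Callable[[Citation], Citation]:
    """Wrap an analyze function with an in-memory cache keyed on citation text + context."""
    cache: Dict[str, Citation] = {}

    def cached(citation: Citation) -> Citation:
        # Hash exactly the inputs the prompt depends on
        key = hashlib.sha256(
            json.dumps([citation["citation"], citation["context"]]).encode("utf-8")
        ).hexdigest()
        if key in cache:
            return dict(cache[key])  # return a copy so callers cannot corrupt the cache
        result = analyze(dict(citation))  # pass a copy; analyze_citation mutates its input
        # Do not cache failures, which analyze_citation marks with purpose == "Error"
        if result.get("purpose") != "Error":
            cache[key] = result
        return dict(result)

    return cached
```

Usage would look like `analyze = make_cached_analyzer(CitationAnalyzer().analyze_citation)`. The cache lives only as long as the process; inside the Streamlit app, Streamlit's own `st.cache_data` decorator or an on-disk store would be the next step.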

License

Apache 2.0

Acknowledgements

  • OpenAI for providing the GPT-4 API
  • PyMuPDF for PDF parsing capabilities
  • Streamlit for the web application framework

Owner

  • Name: Naveen Kumar
  • Login: naveenvasou
  • Kind: user
  • Location: Pondicherry, India

Happy day!

Citation (citation_analyzer.py)

import json
import logging
import os
from typing import Any, Dict, List

import openai

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class CitationAnalyzer:
    def __init__(self, api_key=None):
        """Initialize the CitationAnalyzer with an API key.

        Falls back to the OPENAI_API_KEY environment variable when no key is passed.
        """
        api_key = api_key or os.getenv("OPENAI_API_KEY")
        if not api_key:
            raise ValueError("OpenAI API key is required. Please provide it or set the OPENAI_API_KEY environment variable.")
        # openai>=1.0 client; the key is passed explicitly instead of via the global openai.api_key
        self.client = openai.OpenAI(api_key=api_key)
    
    def analyze_citation(self, citation: Dict[str, Any]) -> Dict[str, Any]:
        """
        Analyze a single citation with its context
        
        Args:
            citation: Dictionary with citation text and context
            
        Returns:
            Dictionary with analysis results
        """

        try:
            prompt = f"""
            Analyze the following citation in its context:
            
            Citation: {citation['citation']}
            
            Context:
            "{citation['context']}"
            
            Please provide:
            1. What this cited paper contributes to the current paper
            2. The purpose of this citation (e.g., supporting evidence, contrasting view, background information)
            3. The authors' stance towards the cited work (agree, critique, extend, or neutral)
            
            Format your response as a JSON with these keys: "contribution", "purpose", "stance"
            """
            
            response = self.client.chat.completions.create(
                model="gpt-4",
                messages=[
                    {"role": "system", "content": "You are a research assistant that analyzes academic citations."},
                    {"role": "user", "content": prompt}
                ],
                temperature=0.3,
                max_tokens=500
            )
            
            # Extract the response text (message content can be None in the v1 SDK)
            result = (response.choices[0].message.content or "").strip()

            # The model sometimes wraps its JSON in prose or a code fence,
            # so parse only the outermost {...} span
            start, end = result.find("{"), result.rfind("}")
            json_text = result[start:end + 1] if start != -1 and end > start else result
            try:
                analysis = json.loads(json_text)
                if not isinstance(analysis, dict):
                    raise ValueError("Expected a JSON object")
                citation.update(analysis)
            except ValueError:  # json.JSONDecodeError is a subclass of ValueError
                # Keep the raw text so the caller can inspect what the model returned
                citation.update({
                    "contribution": "Unable to parse analysis",
                    "purpose": "Unknown",
                    "stance": "Unknown",
                    "raw_analysis": result
                })

            return citation
        
        except Exception as e:
            logger.error(f"Error analyzing citation: {e}")
            citation.update({
                "contribution": f"Error during analysis: {str(e)}",
                "purpose": "Error",
                "stance": "Error"
            })
            return citation
        
    def batch_analyze_citations(self, citations: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        """
        Analyze a batch of citations
        
        Args:
            citations: List of citation dictionaries
            
        Returns:
            List of dictionaries with analysis results
        """

        results = []
        total = len(citations)

        for i, citation in enumerate(citations):
            logger.info(f"Analyzing citation {i+1}/{total}: {citation['citation']}")
            result = self.analyze_citation(citation)
            results.append(result)

        return results
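`batch_analyze_citations` sends one API request at a time, so a paper with many citations spends most of its runtime waiting on the network. A thread pool can overlap those waits; the sketch below is a hypothetical alternative (the name `batch_analyze_concurrently` and the default of 4 workers are assumptions), and it assumes the OpenAI client can be shared across threads and that the account's rate limits tolerate a few parallel requests. Since `analyze_citation` catches its own exceptions, a failed call comes back as an `"Error"` result rather than aborting the batch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Citation = Dict[str, Any]


def batch_analyze_concurrently(
    citations: List[Citation],
    analyze: Callable[[Citation], Citation],
    max_workers: int = 4,
) -> List[Citation]:
    """Run analyze() over citations in a thread pool, returning results in input order."""
    # API calls are I/O-bound, so threads overlap network waits despite the GIL.
    # Executor.map yields results in the order of the inputs, not completion order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(analyze, citations))
```

Usage would be `batch_analyze_concurrently(citations, CitationAnalyzer().analyze_citation)`. Keeping `max_workers` small is the simplest guard against hitting rate limits.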

GitHub Events

Total
  • Push event: 9
  • Create event: 2
Last Year
  • Push event: 9
  • Create event: 2