citation-decoder
An AI-powered tool that extracts citations from research papers and explains their context, purpose, and relationship to the main paper. Built with PyMuPDF for PDF parsing, OpenAI GPT-4 for citation analysis, and Streamlit for the web interface.
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (14.6%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: naveenvasou
- License: apache-2.0
- Language: Python
- Default Branch: main
- Size: 21.5 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Citation Decoder
Citation Decoder is an AI-powered tool that helps researchers understand the citation network within academic papers. It extracts in-text citations from research papers and provides insights about each citation's context, purpose, and relationship to the main paper.
Features
- Citation Extraction: Automatically identifies and extracts in-text citations from PDF research papers
- Context Analysis: Analyzes each citation to determine:
- What the cited paper contributes to the current paper
- The purpose of the citation (supporting evidence, contrast, background, etc.)
- Whether the authors agree, critique, or extend the cited work
- Multiple Input Options: Upload PDFs directly or provide arXiv links
- Interactive Interface: Clean Streamlit UI for easy exploration of citation insights
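The extraction step listed above can be sketched with regular expressions. This is a minimal illustration, not the project's actual `pdf_parser.py` logic (which is not shown here): the two patterns, the `extract_citations` name, and the 150-character context window are all assumptions. The output uses the `citation` and `context` keys that `CitationAnalyzer.analyze_citation` reads.

```python
import re
from typing import Dict, List

# Hypothetical patterns for two common citation styles; the project's real
# patterns are not shown in this document.
NUMERIC = r"\[\d+(?:\s*[,-]\s*\d+)*\]"  # e.g. [3], [1, 2], [4-6]
AUTHOR_YEAR = (
    r"\([A-Z][A-Za-z-]+"
    r"(?: et al\.| (?:and|&) [A-Z][A-Za-z-]+)?"
    r",? \d{4}[a-z]?\)"
)  # e.g. (Smith, 2020), (Smith et al., 2020), (Smith and Lee, 2019a)
CITATION_RE = re.compile(f"{NUMERIC}|{AUTHOR_YEAR}")


def extract_citations(text: str, window: int = 150) -> List[Dict[str, str]]:
    """Return each in-text citation with up to `window` characters of context per side."""
    found = []
    for match in CITATION_RE.finditer(text):
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        found.append({
            "citation": match.group(0),
            # Collapse line breaks and repeated spaces left over from PDF text
            "context": " ".join(text[start:end].split()),
        })
    return found


sample = "Transformers [12] outperform recurrent models (Smith et al., 2020) on this task."
for item in extract_citations(sample):
    print(item["citation"])
```

A fixed character window is the simplest choice; splitting on sentence boundaries would probably give the model cleaner context, at the cost of a more fragile parser.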
Installation
```bash
# Clone the repository
git clone https://github.com/yourusername/citation-decoder.git
cd citation-decoder

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Create a .env file and add your API keys
echo "OPENAI_API_KEY=your_openai_api_key" > .env
```
Usage
```bash
# Run the Streamlit app
streamlit run app.py
```
Then open your browser and navigate to the displayed URL (usually http://localhost:8501).
Requirements
- Python 3.7+
- PyMuPDF
- OpenAI API key
- Streamlit
- Pandas
- Requests
- python-dotenv
Project Structure
citation-decoder/
├── src/
│ ├── utils/
│ │ ├── pdf_parser.py # PDF extraction functionality
│ │ ├── citation_analyzer.py # Citation analysis using GPT-4
│ │ └── api_client.py # API clients for paper metadata
│ └── main.py # Core application logic
├── app.py # Streamlit web interface
├── requirements.txt # Project dependencies
└── .env # Environment variables (API keys)
Future Improvements
- Citation network visualization
- Integration with more academic paper databases
- Batch processing of multiple papers
- Enhanced pattern matching for more citation styles
- Caching of results for improved performance
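As an illustration of the caching item above, the following is one possible in-memory approach, not something the project currently implements. `CachedAnalyzer` and the stub `fake_analyze` are hypothetical names; in practice the wrapped callable would be `CitationAnalyzer.analyze_citation`.

```python
import hashlib
import json
from typing import Any, Callable, Dict

Citation = Dict[str, Any]


class CachedAnalyzer:
    """Hypothetical wrapper that reuses results for identical citation/context pairs."""

    def __init__(self, analyze: Callable[[Citation], Citation]):
        self._analyze = analyze
        self._cache: Dict[str, Citation] = {}

    def __call__(self, citation: Citation) -> Citation:
        # Key on the two fields that end up in the prompt
        payload = json.dumps([citation["citation"], citation["context"]])
        key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        if key not in self._cache:
            # Pass a copy because the analyzer updates its argument in place
            self._cache[key] = self._analyze(dict(citation))
        return dict(self._cache[key])


# Stub standing in for the real, API-backed analyzer
calls = []

def fake_analyze(citation: Citation) -> Citation:
    calls.append(citation["citation"])
    citation.update({"contribution": "n/a", "purpose": "background", "stance": "neutral"})
    return citation


cached = CachedAnalyzer(fake_analyze)
item = {"citation": "[1]", "context": "As shown in [1], attention scales well."}
first = cached(item)
second = cached(item)
print(len(calls))
```

The cache lives only for the lifetime of the process; persisting it to disk would be a separate design decision.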
License
Apache 2.0
Acknowledgements
- OpenAI for providing the GPT-4 API
- PyMuPDF for PDF parsing capabilities
- Streamlit for the web application framework
Owner
- Name: Naveen Kumar
- Login: naveenvasou
- Kind: user
- Location: Pondicherry, India
- Repositories: 1
- Profile: https://github.com/naveenvasou
Happy day!
Citation (citation_analyzer.py)
import json
import logging
import os
from typing import Any, Dict, List, Optional

import openai

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class CitationAnalyzer:
    def __init__(self, api_key: Optional[str] = None):
        """Initialize the CitationAnalyzer with an API key"""
        # Use the explicit key if given, otherwise fall back to the environment
        api_key = api_key or os.getenv("OPENAI_API_KEY")
        if not api_key:
            raise ValueError("OpenAI API key is required. Please provide it or set OPENAI_API_KEY environment variable.")
        # Pass the resolved key so the client and the check above agree
        self.client = openai.OpenAI(api_key=api_key)

    def analyze_citation(self, citation: Dict[str, Any]) -> Dict[str, Any]:
        """
        Analyze a single citation with its context

        Args:
            citation: Dictionary with citation text and context

        Returns:
            Dictionary with analysis results
        """
        try:
            prompt = f"""
            Analyze the following citation in its context:

            Citation: {citation['citation']}

            Context:
            "{citation['context']}"

            Please provide:
            1. What this cited paper contributes to the current paper
            2. The purpose of this citation (e.g., supporting evidence, contrasting view, background information)
            3. The authors' stance towards the cited work (agree, critique, extend, or neutral)

            Format your response as a JSON with these keys: "contribution", "purpose", "stance"
            """
            response = self.client.chat.completions.create(
                model="gpt-4",
                messages=[
                    {"role": "system", "content": "You are a research assistant that analyzes academic citations."},
                    {"role": "user", "content": prompt}
                ],
                temperature=0.3,
                max_tokens=500
            )
            # Extract and clean the response
            result = response.choices[0].message.content.strip()
            # Parse the JSON response, keeping the raw text if it is malformed
            try:
                analysis = json.loads(result)
                citation.update(analysis)
            except json.JSONDecodeError:
                # If response isn't valid JSON, record placeholders and the raw reply
                citation.update({
                    "contribution": "Unable to parse analysis",
                    "purpose": "Unknown",
                    "stance": "Unknown",
                    "raw_analysis": result
                })
            return citation
        except Exception as e:
            logger.error(f"Error analyzing citation: {e}")
            citation.update({
                "contribution": f"Error during analysis: {str(e)}",
                "purpose": "Error",
                "stance": "Error"
            })
            return citation

    def batch_analyze_citations(self, citations: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        """
        Analyze a batch of citations

        Args:
            citations: List of citation dictionaries

        Returns:
            List of dictionaries with analysis results
        """
        results = []
        total = len(citations)
        for i, citation in enumerate(citations):
            logger.info(f"Analyzing citation {i+1}/{total}: {citation['citation']}")
            result = self.analyze_citation(citation)
            results.append(result)
        return results
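
The analyzer above falls back to placeholder values whenever `json.loads` fails on the model's reply. Chat models sometimes wrap JSON in a Markdown fence or surround it with prose, so a more tolerant parser may recover more results. The sketch below is one possible approach and is not part of the project; `parse_model_json` is a hypothetical helper name.

```python
import json
import re
from typing import Any, Dict, Optional

FENCE = "`" * 3  # built at runtime so this example contains no literal fence


def parse_model_json(raw: str) -> Optional[Dict[str, Any]]:
    """Best-effort parse of a model reply that should contain one JSON object."""
    text = raw.strip()
    # Unwrap a reply that is entirely enclosed in a Markdown code fence
    fenced = re.match(rf"^{FENCE}[a-zA-Z]*\s*(.*?)\s*{FENCE}$", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to the outermost {...} span, if there is one
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end <= start:
            return None
        try:
            parsed = json.loads(text[start:end + 1])
        except json.JSONDecodeError:
            return None
    # The prompt asks for an object, so reject lists, strings, and numbers
    return parsed if isinstance(parsed, dict) else None


reply = FENCE + 'json\n{"contribution": "x", "purpose": "background", "stance": "neutral"}\n' + FENCE
print(parse_model_json(reply))
```

A caller could try this helper first and keep the existing `raw_analysis` fallback for replies where it returns `None`.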
GitHub Events
Total
- Push event: 9
- Create event: 2
Last Year
- Push event: 9
- Create event: 2