deep_research_final

https://github.com/saiteja12-g/deep_research_final

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: found
○ .zenodo.json file
○ DOI references
✓ Academic publication links: links to arxiv.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (14.6%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: saiteja12-g
Language: Python
Default Branch: main
Size: 309 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created over 1 year ago · Last pushed about 1 year ago

Metadata Files

Readme, Citation

README.md

Research Paper Assistant

A system for extracting and analyzing academic research papers and for generating review papers from them using AI.

Overview

This application provides a complete workflow for academic research paper processing:

Research Paper Extraction: Automatically fetch papers from arXiv based on your query and follow citation networks
Knowledge Base Integration: Load extracted papers into a Neo4j graph database and a ChromaDB vector database
Contextual Analysis: Process papers to extract key themes, methodologies, strengths, and limitations
Review Paper Generation: Generate comprehensive review papers using AI agents
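The knowledge-base integration step stores papers and the citations between them as a graph. Below is a minimal sketch of such a load using the official `neo4j` Python driver (5.x). It assumes a hypothetical schema of `Paper` nodes keyed by `arxiv_id` and linked by `CITES` relationships, so the actual schema in knowledge_base.py may differ. The connection defaults match the `docker run` command under "Running Components Individually" (Bolt port 7687, user `neo4j`, password `research123`).

```python
"""Hypothetical sketch: load papers and citation edges into Neo4j."""

# Assumed schema (not taken from knowledge_base.py): (:Paper {arxiv_id, title})
# nodes connected by [:CITES] relationships.
MERGE_PAPER = "MERGE (p:Paper {arxiv_id: $arxiv_id}) SET p.title = $title"
MERGE_CITATION = (
    "MATCH (a:Paper {arxiv_id: $citing}), (b:Paper {arxiv_id: $cited}) "
    "MERGE (a)-[:CITES]->(b)"
)


def build_statements(papers, citations):
    """Turn paper dicts and (citing, cited) ID pairs into (query, params) tuples.

    Paper MERGEs come first so that both endpoints exist before the
    citation MATCH runs.
    """
    statements = [
        (MERGE_PAPER, {"arxiv_id": p["arxiv_id"], "title": p["title"]})
        for p in papers
    ]
    statements += [
        (MERGE_CITATION, {"citing": citing, "cited": cited})
        for citing, cited in citations
    ]
    return statements


def load_into_neo4j(statements, uri="bolt://localhost:7687",
                    auth=("neo4j", "research123")):
    """Execute each statement against a running Neo4j instance."""
    from neo4j import GraphDatabase  # requires `pip install neo4j` (5.x driver)

    with GraphDatabase.driver(uri, auth=auth) as driver:
        driver.verify_connectivity()  # fail fast if the container is not up
        for query, params in statements:
            driver.execute_query(query, params)
```

For example, `load_into_neo4j(build_statements([{"arxiv_id": "A", "title": "Paper A"}, {"arxiv_id": "B", "title": "Paper B"}], [("A", "B")]))` would create two papers and one `CITES` edge. Using `MERGE` rather than `CREATE` keeps the load idempotent, so rerunning it does not duplicate nodes.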

Features

Intelligent Paper Discovery: Breadth-first search (BFS) traversal of citation networks, starting from the initial query results
Graph-based Knowledge Representation: Store papers and their relationships in Neo4j
Semantic Search: Find related papers using vector embeddings in ChromaDB
Image Processing: Extract and analyze figures from research papers
Citation Mapping: Identify and map citations between papers
AI-Powered Review Generation: Generate structured review papers with proper citations using LLM agents
Interactive UI: Streamlit-based frontend for easy interaction
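BFS discovery can be illustrated with a small breadth-first traversal over a citation graph. The traversal is bounded by a maximum depth and a maximum number of papers, which mirrors the `--max-depth` and `--max-papers` options of workflow.py. This is a sketch and not the code in papers_extractor_bfs.py. The `get_references` callback is hypothetical and stands in for whatever fetches a paper's citations, and an in-memory dictionary replaces the arXiv calls.

```python
from collections import deque
from typing import Callable, Iterable, List


def bfs_citations(seeds: Iterable[str],
                  get_references: Callable[[str], List[str]],
                  max_depth: int = 2,
                  max_papers: int = 5) -> List[str]:
    """Collect up to max_papers paper IDs, visiting citations level by level."""
    visited: List[str] = []
    seen = set()
    queue = deque()
    for paper_id in seeds:  # seed papers (e.g. query results) are depth 0
        if paper_id not in seen:
            seen.add(paper_id)
            queue.append((paper_id, 0))

    while queue and len(visited) < max_papers:
        paper_id, depth = queue.popleft()
        visited.append(paper_id)  # a real extractor would download/summarize here
        if depth >= max_depth:
            continue  # do not expand beyond the depth limit
        for ref in get_references(paper_id):
            if ref not in seen:  # skip papers already queued or visited
                seen.add(ref)
                queue.append((ref, depth + 1))
    return visited


# Hypothetical toy citation graph: paper -> papers it cites.
toy_graph = {"seed": ["a", "b"], "a": ["c", "b"], "b": ["d"], "c": ["e"]}
print(bfs_citations(["seed"], lambda p: toy_graph.get(p, []), max_depth=2, max_papers=5))
# ['seed', 'a', 'b', 'c', 'd']
```

Paper `e` is never reached because it sits at depth 3. The `seen` set prevents a paper cited by several others (here `b`) from being fetched twice.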

Setup and Installation

Prerequisites

Python 3.9+
Docker (for running Neo4j)
OpenAI API key

Installation

Clone this repository: git clone <your-repository-url> && cd <repository-directory>
Create a virtual environment (optional but recommended): python -m venv .venv, then activate it with source .venv/bin/activate (on Windows: .venv\Scripts\activate)
Install the required Python packages: pip install -r requirements.txt
Create a .env file in the project root with your OpenAI API key: OPENAI_API_KEY=your-api-key-here
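Before running any step, you can confirm that the key is actually visible. The sketch below uses only the standard library. It checks the environment first and then a simple `KEY=VALUE` `.env` file. The helper names are hypothetical, and the application's own loading logic may differ. The parser also ignores `export` prefixes and inline comments.

```python
import os
from pathlib import Path
from typing import Dict


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")  # drop surrounding quotes
    return values


def openai_key_configured(path: str = ".env") -> bool:
    """True if OPENAI_API_KEY is set in the environment or in the .env file."""
    if os.environ.get("OPENAI_API_KEY"):
        return True
    env_file = Path(path)
    if not env_file.is_file():
        return False
    return bool(parse_dotenv(env_file.read_text(encoding="utf-8")).get("OPENAI_API_KEY"))


if __name__ == "__main__":
    if not openai_key_configured():
        raise SystemExit("OPENAI_API_KEY not found; create the .env file described above.")
    print("OPENAI_API_KEY is configured.")
```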

Running the App

Using the Workflow Manager (Recommended)

The workflow.py script provides a simplified way to run the complete pipeline or individual steps:

Run the complete workflow: python workflow.py --query "Single image to 3D" --full-workflow
Extract papers only: python workflow.py --query "Single image to 3D" --extract-only --max-depth 2 --max-papers 5
Load extracted papers to database: python workflow.py --load-only
Generate a review paper: python workflow.py --query "Single image to 3D" --generate-review
Continue a previously started paper: python workflow.py --generate-review --continue
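The individual workflow.py steps can also be chained from a script when you want control between stages, for example to rerun extraction with different limits before loading. The sketch below uses only the command-line flags listed above, and the function names are hypothetical. Each step runs through `subprocess.run(..., check=True)`, so a failing step raises `CalledProcessError` and stops the chain.

```python
import subprocess
import sys
from typing import List


def build_steps(query: str, max_depth: int = 2, max_papers: int = 5) -> List[List[str]]:
    """Return the workflow.py invocations for extract -> load -> review."""
    python = sys.executable  # use the interpreter of the active virtual environment
    return [
        [python, "workflow.py", "--query", query, "--extract-only",
         "--max-depth", str(max_depth), "--max-papers", str(max_papers)],
        [python, "workflow.py", "--load-only"],
        [python, "workflow.py", "--query", query, "--generate-review"],
    ]


def run_steps(steps: List[List[str]]) -> None:
    """Run each step in order; check=True stops at the first non-zero exit."""
    for argv in steps:
        print("Running:", " ".join(argv[1:]))
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_steps(build_steps("Single image to 3D"))
```

Passing arguments as a list rather than a shell string means the multi-word query needs no extra quoting.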

Using the Streamlit UI

Start the Streamlit app: streamlit run frontend.py
Open your browser and navigate to http://localhost:8501
In the Streamlit interface:
- Configure your environment settings
- Start the Neo4j database
- Run paper extraction, loading, or review generation processes

Running Components Individually

Alternatively, you can run each component separately:

Start Neo4j (required for knowledge storage): docker run -p 7474:7474 -p 7687:7687 --env NEO4J_AUTH=neo4j/research123 neo4j:latest
Extract papers from arXiv: python papers_extractor_bfs.py
Process and load papers into knowledge base: python knowledge_base.py
Generate a review paper: python main.py --query "Your research topic"
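knowledge_base.py needs Neo4j to be accepting connections. A small readiness check between starting the container and loading papers avoids connection errors while Neo4j is still booting. The sketch below uses only the standard library and polls the Bolt port published by the `docker run` command above (7687). The function name and the timeout values are arbitrary choices, not part of the project.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 7687,
                  timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True  # something is listening on the port
        except OSError:  # refused, unreachable, or timed out
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    if wait_for_port():
        print("Neo4j Bolt port is open; knowledge_base.py can connect.")
    else:
        raise SystemExit("Neo4j did not start listening on port 7687 in time.")
```

An open port is a necessary but not sufficient sign of readiness. If queries still fail right after startup, a driver-level check such as `verify_connectivity()` in the `neo4j` package is stricter.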

Running with Docker (Not Tested)

The application includes Docker support for easy deployment:

Build the Docker image: docker build -t research-paper-assistant .
Run using docker-compose (handles both the app and Neo4j): docker-compose up
Access the Streamlit interface at http://localhost:8501

Project Structure

frontend.py: Streamlit application
workflow.py: Complete workflow manager
papers_extractor_bfs.py: arXiv paper extraction with BFS traversal
knowledge_base.py: Database and knowledge storage integrations
citation_mapper.py: Handles paper citations
processing_pipeline.py: Text and image processing
review_writer.py: AI-powered paper generation
main.py: Command-line interface for review generation

Folder Structure

/papers - Downloaded PDF files
/papers_summary - Extracted metadata in JSON format
/output - Generated review papers and figures
/chroma_db - Vector embeddings database
/neo4j - Graph database files

Troubleshooting

Docker Issues: Ensure Docker is running and you have permission to create containers
API Rate Limits: If you encounter OpenAI API rate limits, add waiting periods or implement retries
Memory Issues: Reduce batch sizes in the extraction and processing pipelines for lower memory usage
Neo4j Connection: Ensure the Neo4j container is running before running knowledge base operations
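For the rate-limit case, a common pattern is exponential backoff with jitter around each API call. The sketch below is generic and not tied to the project's code. `with_backoff` is a hypothetical helper, and `RateLimitError` is a local placeholder. With the `openai` v1 Python package you would pass `retry_on=(openai.RateLimitError,)` instead.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Placeholder; substitute the real exception, e.g. openai.RateLimitError."""


def with_backoff(call: Callable[[], T],
                 retry_on: Tuple[Type[BaseException], ...] = (RateLimitError,),
                 max_retries: int = 5,
                 base_delay: float = 1.0,
                 max_delay: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `call()`, retrying on `retry_on` with exponential backoff and full jitter."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    for attempt in range(max_retries + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)  # 1, 2, 4, ... capped
            sleep(random.uniform(0, delay))  # full jitter spreads out concurrent retries
    raise AssertionError("unreachable")  # the loop always returns or raises
```

A typical call would wrap a request in a lambda, e.g. `with_backoff(lambda: client.chat.completions.create(...), retry_on=(openai.RateLimitError,))`. The `sleep` parameter exists so tests can skip real waiting.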

Video Demo

[Video: demo of the Research Paper Assistant in action]

Flowcharts

[Figure: Model Architecture]

[Figure: Agent Workflow]

Acknowledgements

arXiv API for paper access
OpenAI for natural language processing
Neo4j for graph database functionality
ChromaDB for vector database
Streamlit for the user interface

Owner

Name: SAI TEJA GILUKARA
Login: saiteja12-g
Kind: user

Repositories: 1
Profile: https://github.com/saiteja12-g

Coding maniac

GitHub Events

Total

Push event: 20
Create event: 1

Last Year

Push event: 20
Create event: 1
