Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.5%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: kavya-sree-chandhi
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 4.93 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 7 months ago · Last pushed 7 months ago
Metadata Files
Readme License Citation

README.md

🧠 AI Research Agent for Healthcare Diagnostics

This project is an intelligent research assistant that autonomously conducts research, synthesizes information, and produces a comprehensive report on a given research question. It automates literature review and question answering for medical diagnostics, focusing on domain-specific topics such as breast cancer and brain tumor detection and leveraging technologies such as LLMs, vector stores, and academic/web scraping.

📌 Example Question: "How does AI help to detect brain tumors?"

🚀 Features

  • 🔍 Automatically generates sub-questions from user queries using an LLM
  • 🌐 Gathers information from:
    • Web sources (via DuckDuckGo/Google search)
    • Academic sources (arXiv and PubMed)
    • Local PDF files (/docs folder)
  • 📚 Performs document chunking, embedding, and vector search
  • 🧠 Synthesizes high-quality answers for each sub-question
  • 📝 Generates a final report with:
    • Findings (Finding 1, Finding 2...)
    • Executive Summary
  • 📄 Export options: PDF, Word (.docx), Markdown, and Text
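The chunk → embed → retrieve flow in the features above can be sketched as follows. This is a minimal stand-in, not the project's code: the real pipeline uses LangChain's RecursiveCharacterTextSplitter, nomic-embed-text embeddings, and a FAISS vector store, while here a naive character chunker and bag-of-words cosine similarity substitute so the sketch runs with no dependencies.

```python
# Simplified sketch of the chunk -> embed -> retrieve flow.
# Stand-ins: a character-window chunker (for RecursiveCharacterTextSplitter)
# and bag-of-words cosine similarity (for nomic-embed-text + FAISS).
import math
from collections import Counter

def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def embed(text):
    """Toy 'embedding': a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query, chunks, k=2):
    """Rank chunks by similarity to the query, best first."""
    qv = embed(query)
    return sorted(chunks, key=lambda c: cosine(qv, embed(c)), reverse=True)[:k]
```

In the real pipeline, the top-ranked chunks for each sub-question would be passed to the LLM, which synthesizes an answer from them.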

🛠️ Tech Stack

| Component       | Technology                                |
|-----------------|-------------------------------------------|
| UI              | Streamlit                                 |
| LLM             | Ollama + LLaMA3                           |
| Text Splitter   | LangChain RecursiveCharacterTextSplitter  |
| Embedding       | nomic-embed-text                          |
| Vector DB       | In-memory FAISS via LangChain             |
| PDF Parsing     | PyMuPDF (fitz)                            |
| Word/PDF Export | python-docx, fpdf                         |


Architecture diagram

(Diagram: Research_Agent_Architecture.drawio)

The system is designed in 4 layers, implemented as shown in the architecture diagram.

1️⃣ User Interface — implemented with Streamlit

  • Accepts the user's research topic
  • Displays real-time progress of research nodes
  • Shows the final report and allows export

2️⃣ Orchestration Layer — implemented using LangGraph

  • Defines the workflow as a graph of nodes and edges
  • Nodes:
    • 📋 Planner Node: breaks the research topic down into sub-questions
    • 🔍 Information Gatherer Node: queries multiple sources
    • 📝 Synthesis Node: organizes and drafts findings
    • ✅ Verifier Node: fact-checks and assigns confidence scores
    • 📄 Report Generator Node: formats the final report, adds citations & summary
  • Handles state management, retries, and conditional flows
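The node pipeline above can be sketched as a shared state threaded through each node in turn. This is a toy stand-in for illustration only: the real project builds a langgraph StateGraph with edges and conditional flows, and the node bodies below are placeholders, not the project's implementations.

```python
# Stand-in for the LangGraph workflow: a state dict flows through the
# five nodes in sequence. Node bodies are illustrative placeholders.
def planner(state):
    topic = state["topic"]
    state["sub_questions"] = [f"What is known about {topic}?",
                              f"What are the limitations of {topic}?"]
    return state

def gatherer(state):
    # The real node queries web search, arXiv, PubMed, and local PDFs.
    state["evidence"] = {q: [f"evidence for: {q}"] for q in state["sub_questions"]}
    return state

def synthesizer(state):
    state["findings"] = [f"Finding {i}: summary of {q}"
                         for i, q in enumerate(state["sub_questions"], 1)]
    return state

def verifier(state):
    # Placeholder scores; the real node fact-checks each finding.
    state["confidence"] = {f: 0.9 for f in state["findings"]}
    return state

def report_generator(state):
    state["report"] = "\n".join(state["findings"])
    return state

def run_workflow(topic):
    state = {"topic": topic}
    for node in (planner, gatherer, synthesizer, verifier, report_generator):
        state = node(state)
    return state
```

LangGraph adds what this sketch omits: persistent state between retries, conditional edges (e.g. re-running the gatherer when the verifier's confidence is low), and streaming of per-node progress to the UI.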

3️⃣ Data & Knowledge Sources

  • 🌐 Web Search: DDGS (DuckDuckGo Search API)
  • 📚 Academic Papers: PubMed, arXiv
  • 📄 Local Documents: PDF/text parser (PyMuPDF)

4️⃣ LLM & Reasoning — powered by a local model served through Ollama (mistral:latest)

  • Responsible for planning, summarizing, verifying, and writing
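Each of those roles is driven by a prompt sent to the local model. The template and parsing helper below are illustrative assumptions (the README does not show the project's actual prompts), sketching what a planner prompt might look like:

```python
# Illustrative planner prompt for the local Ollama model. The template
# and parse helper are assumptions for demonstration, not project code.
PLANNER_PROMPT = """You are a medical research planner.
Break the research topic below into 3-5 focused sub-questions,
one per line, each ending with a question mark.

Topic: {topic}
"""

def build_planner_prompt(topic):
    return PLANNER_PROMPT.format(topic=topic)

def parse_sub_questions(llm_output):
    """Keep only lines of the model's reply that look like questions."""
    return [line.strip() for line in llm_output.splitlines()
            if line.strip().endswith("?")]
```

With langchain-ollama, the prompt would typically be passed to a chat model bound to the local Ollama server, and the reply parsed into the sub-question list that drives the rest of the graph.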

📝 Setup & Installation

Prerequisites:

  • Python 3.10+
  • pip
  • virtualenv

Clone the repository:

```bash
git clone https://github.com//intelligent-research-assistant.git
cd intelligent-research-assistant
```

Create a virtual environment and activate it:

```bash
python -m venv venv
source venv/bin/activate   # Linux/macOS
venv\Scripts\activate      # Windows
```

Install dependencies:

```bash
pip install -r requirements.txt
```

Run the application:

```bash
streamlit run app.py
```

📷 Demo

https://github.com/user-attachments/assets/bfa77ba6-315c-433a-b86a-7ce84abc1b3c

Owner

  • Login: kavya-sree-chandhi
  • Kind: user

Citation (citations.py)

```python
# agent/citations.py

def extract_sources_from_chunks(chunks):
    """
    Collects all unique sources (URLs, DOIs, arXiv IDs, etc.) from chunk metadata.
    Returns a list of unique sources.
    """
    sources = set()
    for chunk in chunks:
        # Chunk metadata is expected to include a 'source' key.
        source = chunk.metadata.get('source') if hasattr(chunk, 'metadata') else None
        if source:
            sources.add(source)
    return list(sources)


def render_citations(sources):
    """
    Formats a list of sources as a citation section (Markdown or plain text).
    """
    if not sources:
        return "No citations found."
    out = "## References\n"
    for idx, source in enumerate(sources, 1):
        out += f"[{idx}]: {source}\n"
    return out


# Example usage
if __name__ == "__main__":
    # Dummy chunks with metadata for demonstration
    class DummyChunk:
        def __init__(self, meta):
            self.metadata = meta

    chunks = [DummyChunk({'source': 'https://arxiv.org/abs/2007.07892'}),
              DummyChunk({'source': 'https://pubmed.ncbi.nlm.nih.gov/123456/'}),
              DummyChunk({'source': 'https://arxiv.org/abs/2007.07892'})]  # duplicate
    sources = extract_sources_from_chunks(chunks)
    print(render_citations(sources))
```

Dependencies

requirements.txt pypi
  • arxiv *
  • beautifulsoup4 *
  • chromadb *
  • ddgs *
  • fpdf *
  • langchain *
  • langchain-chroma *
  • langchain-ollama *
  • langchain-text-splitters *
  • langgraph *
  • newspaper3k *
  • ollama *
  • pubmed *
  • pymupdf *
  • pytest *
  • python-docx *
  • python-dotenv *
  • requests *
  • streamlit *
  • tqdm *