Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.5%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: kavya-sree-chandhi
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 4.93 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 7 months ago · Last pushed 7 months ago
Metadata Files
Readme License Citation

README.md

🧠 AI Research Agent for Healthcare Diagnostics

This project is an intelligent research assistant that autonomously conducts research, synthesizes information, and produces a comprehensive report on a given research question. It automates literature review and question answering for medical diagnostics, focusing on domain-specific topics such as breast cancer and brain tumor detection and leveraging technologies such as LLMs, vector stores, and academic/web scraping.

📌 Example Question: "How does AI help to detect brain tumors?"

🚀 Features

  • 🔍 Automatically generates sub-questions from user queries using an LLM
  • 🌐 Gathers information from:
    • Web sources (via DuckDuckGo/Google search)
    • Academic sources (arXiv and PubMed)
    • Local PDF files (/docs folder)
  • 📚 Performs document chunking, embedding, and vector search
  • 🧠 Synthesizes high-quality answers for each sub-question
  • 📝 Generates a final report with:
    • Findings (Finding 1, Finding 2...)
    • Executive Summary
  • 📄 Export options: PDF, Word (.docx), Markdown, and Text
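The chunk → embed → retrieve flow in the features above can be sketched as follows. This is a minimal stand-in, not the project's code: the real pipeline uses LangChain's RecursiveCharacterTextSplitter, nomic-embed-text embeddings, and a FAISS vector store, while here a naive character chunker and bag-of-words cosine similarity substitute so the sketch runs with no dependencies.

```python
# Simplified sketch of the chunk -> embed -> retrieve flow.
# Stand-ins: a character-window chunker (for RecursiveCharacterTextSplitter)
# and bag-of-words cosine similarity (for nomic-embed-text + FAISS).
import math
from collections import Counter

def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def embed(text):
    """Toy 'embedding': a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query, chunks, k=2):
    """Rank chunks by similarity to the query, best first."""
    qv = embed(query)
    return sorted(chunks, key=lambda c: cosine(qv, embed(c)), reverse=True)[:k]
```

In the real pipeline, the top-ranked chunks for each sub-question would be passed to the LLM, which synthesizes an answer from them.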

🛠️ Tech Stack

| Component       | Technology                                |
|-----------------|-------------------------------------------|
| UI              | Streamlit                                 |
| LLM             | Ollama + LLaMA3                           |
| Text Splitter   | LangChain RecursiveCharacterTextSplitter  |
| Embedding       | nomic-embed-text                          |
| Vector DB       | In-memory FAISS via LangChain             |
| PDF Parsing     | PyMuPDF (fitz)                            |
| Word/PDF Export | python-docx, fpdf                         |


Architecture diagram

(Diagram: Research_Agent_Architecture.drawio)

The system is designed in 4 layers, implemented as shown in the architecture diagram.

1️⃣ User Interface — implemented with Streamlit

  • Accepts the user's research topic
  • Displays real-time progress of research nodes
  • Shows the final report and allows export

2️⃣ Orchestration Layer — implemented using LangGraph

  • Defines the workflow as a graph of nodes and edges
  • Nodes:
    • 📋 Planner Node: breaks the research topic down into sub-questions
    • 🔍 Information Gatherer Node: queries multiple sources
    • 📝 Synthesis Node: organizes and drafts findings
    • ✅ Verifier Node: fact-checks and assigns confidence scores
    • 📄 Report Generator Node: formats the final report, adds citations & summary
  • Handles state management, retries, and conditional flows
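The node pipeline above can be sketched as a shared state threaded through each node in turn. This is a toy stand-in for illustration only: the real project builds a langgraph StateGraph with edges and conditional flows, and the node bodies below are placeholders, not the project's implementations.

```python
# Stand-in for the LangGraph workflow: a state dict flows through the
# five nodes in sequence. Node bodies are illustrative placeholders.
def planner(state):
    topic = state["topic"]
    state["sub_questions"] = [f"What is known about {topic}?",
                              f"What are the limitations of {topic}?"]
    return state

def gatherer(state):
    # The real node queries web search, arXiv, PubMed, and local PDFs.
    state["evidence"] = {q: [f"evidence for: {q}"] for q in state["sub_questions"]}
    return state

def synthesizer(state):
    state["findings"] = [f"Finding {i}: summary of {q}"
                         for i, q in enumerate(state["sub_questions"], 1)]
    return state

def verifier(state):
    # Placeholder scores; the real node fact-checks each finding.
    state["confidence"] = {f: 0.9 for f in state["findings"]}
    return state

def report_generator(state):
    state["report"] = "\n".join(state["findings"])
    return state

def run_workflow(topic):
    state = {"topic": topic}
    for node in (planner, gatherer, synthesizer, verifier, report_generator):
        state = node(state)
    return state
```

LangGraph adds what this sketch omits: persistent state between retries, conditional edges (e.g. re-running the gatherer when the verifier's confidence is low), and streaming of per-node progress to the UI.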

3️⃣ Data & Knowledge Sources

  • 🌐 Web Search: DDGS (DuckDuckGo Search API)
  • 📚 Academic Papers: PubMed, arXiv
  • 📄 Local Documents: PDF/text parser (PyMuPDF)

4️⃣ LLM & Reasoning — powered by a local model served through Ollama (mistral:latest)

  • Responsible for planning, summarizing, verifying, and writing
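Each of those roles is driven by a prompt sent to the local model. The template and parsing helper below are illustrative assumptions (the README does not show the project's actual prompts), sketching what a planner prompt might look like:

```python
# Illustrative planner prompt for the local Ollama model. The template
# and parse helper are assumptions for demonstration, not project code.
PLANNER_PROMPT = """You are a medical research planner.
Break the research topic below into 3-5 focused sub-questions,
one per line, each ending with a question mark.

Topic: {topic}
"""

def build_planner_prompt(topic):
    return PLANNER_PROMPT.format(topic=topic)

def parse_sub_questions(llm_output):
    """Keep only lines of the model's reply that look like questions."""
    return [line.strip() for line in llm_output.splitlines()
            if line.strip().endswith("?")]
```

With langchain-ollama, the prompt would typically be passed to a chat model bound to the local Ollama server, and the reply parsed into the sub-question list that drives the rest of the graph.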

📝 Setup & Installation

Prerequisites:

  • Python 3.10+
  • pip
  • virtualenv

Clone the repository:

```bash
git clone https://github.com//intelligent-research-assistant.git
cd intelligent-research-assistant
```

Create a virtual environment and activate it:

```bash
python -m venv venv
source venv/bin/activate   # Linux/macOS
venv\Scripts\activate      # Windows
```

Install dependencies:

```bash
pip install -r requirements.txt
```

Run the application:

```bash
streamlit run app.py
```

📷 Demo

https://github.com/user-attachments/assets/bfa77ba6-315c-433a-b86a-7ce84abc1b3c

Owner

  • Login: kavya-sree-chandhi
  • Kind: user

Citation (citations.py)

```python
# agent/citations.py

def extract_sources_from_chunks(chunks):
    """
    Collects all unique sources (URLs, DOIs, arXiv IDs, etc.) from chunk metadata.
    Returns a list of unique sources.
    """
    sources = set()
    for chunk in chunks:
        # Chunk metadata is expected to include a 'source' key.
        source = chunk.metadata.get('source') if hasattr(chunk, 'metadata') else None
        if source:
            sources.add(source)
    return list(sources)


def render_citations(sources):
    """
    Formats a list of sources as a citation section (Markdown or plain text).
    """
    if not sources:
        return "No citations found."
    out = "## References\n"
    for idx, source in enumerate(sources, 1):
        out += f"[{idx}]: {source}\n"
    return out


# Example usage
if __name__ == "__main__":
    # Dummy chunks with metadata for demonstration
    class DummyChunk:
        def __init__(self, meta):
            self.metadata = meta

    chunks = [DummyChunk({'source': 'https://arxiv.org/abs/2007.07892'}),
              DummyChunk({'source': 'https://pubmed.ncbi.nlm.nih.gov/123456/'}),
              DummyChunk({'source': 'https://arxiv.org/abs/2007.07892'})]  # duplicate
    sources = extract_sources_from_chunks(chunks)
    print(render_citations(sources))
```

Dependencies

requirements.txt pypi
  • arxiv *
  • beautifulsoup4 *
  • chromadb *
  • ddgs *
  • fpdf *
  • langchain *
  • langchain-chroma *
  • langchain-ollama *
  • langchain-text-splitters *
  • langgraph *
  • newspaper3k *
  • ollama *
  • pubmed *
  • pymupdf *
  • pytest *
  • python-docx *
  • python-dotenv *
  • requests *
  • streamlit *
  • tqdm *