AI_Research-Agent-for-Healthcare-Diagnostics
https://github.com/kavya-sree-chandhi/AI_Research-Agent-for-Healthcare-Diagnostics
Science Score: 44.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references: not found
- ○ Academic publication links: not found
- ○ Academic email domains: not found
- ○ Institutional organization owner: not found
- ○ JOSS paper metadata: not found
- ○ Scientific vocabulary similarity: low (11.5%)
Repository
Basic Info
- Host: GitHub
- Owner: kavya-sree-chandhi
- License: MIT
- Language: Python
- Default Branch: main
- Size: 4.93 MB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
🧠 AI Research Agent for Healthcare Diagnostics
This project is an intelligent research assistant that autonomously conducts research, synthesizes information, and produces a comprehensive report on a given research question. It automates literature review and question answering for medical diagnostics, focusing on domain-specific topics such as breast cancer and brain tumor detection, and leverages LLMs, vector stores, and academic/web scraping.
📌 Example Question: "How does AI help to detect brain tumors?"
🚀 Features
- 🔍 Automatically generates sub-questions from user queries using an LLM
- 🌐 Gathers information from:
  - Web sources (via DuckDuckGo/Google search)
  - Academic sources (arXiv and PubMed)
  - Local PDF files (`/docs` folder)
- 📚 Performs document chunking, embedding, and vector search
- 🧠 Synthesizes high-quality answers for each sub-question
- 📝 Generates a final report with:
  - Findings (Finding 1, Finding 2, ...)
  - Executive Summary
- 📄 Export options: PDF, Word (.docx), Markdown, and Text
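The report-building step above (numbered findings plus an executive summary, rendered as Markdown before export) can be sketched in plain Python. `build_report` and its arguments are illustrative names, not the repository's actual API:

```python
def build_report(topic, findings, summary):
    """Render numbered findings plus an executive summary as Markdown."""
    lines = [f"# Research Report: {topic}", "", "## Executive Summary", summary, ""]
    for i, finding in enumerate(findings, 1):
        lines.append(f"## Finding {i}")
        lines.append(finding)
        lines.append("")
    return "\n".join(lines)

report = build_report(
    "How does AI help to detect brain tumors?",
    ["CNNs segment tumors on MRI scans.", "LLMs summarize radiology literature."],
    "AI assists detection via imaging models and literature synthesis.",
)
print(report)
```

A Markdown string assembled this way can then be handed to fpdf or python-docx for the PDF and Word export options listed above.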
🛠️ Tech Stack
| Component | Technology |
|----------------|---------------------------|
| UI | Streamlit |
| LLM | Ollama + LLaMA3 |
| Text Splitter | LangChain RecursiveCharacterTextSplitter |
| Embedding | nomic-embed-text |
| Vector DB | In-memory FAISS via LangChain |
| PDF Parsing | PyMuPDF (fitz) |
| Word/PDF Export| python-docx, fpdf |
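As a rough illustration of the chunking stage in the stack above: the project uses LangChain's RecursiveCharacterTextSplitter, but the core idea (fixed-size character windows with overlap so context is not lost at chunk boundaries) can be shown without any dependency. `chunk_text` is a hypothetical stand-in, not the library's implementation:

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping character windows."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Step forward by less than chunk_size so consecutive chunks overlap
        start += chunk_size - overlap
    return chunks

pieces = chunk_text("a" * 1200, chunk_size=500, overlap=50)
print(len(pieces))
```

Each chunk would then be embedded (here, with nomic-embed-text via Ollama) and stored in the in-memory FAISS index for vector search.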
Architecture diagram
The system is designed in four layers, as shown in the architecture diagram.
1️⃣ User Interface Implemented with Streamlit
Accepts user research topic
Displays real-time progress of research nodes
Shows final report and allows export
2️⃣ Orchestration Layer Implemented using LangGraph
Defines workflow as a graph of nodes and edges
Nodes:
📋 Planner Node: breaks down research topic into sub-questions
🔍 Information Gatherer Node: queries multiple sources
📝 Synthesis Node: organizes and drafts findings
✅ Verifier Node: fact-checks and assigns confidence scores
📄 Report Generator Node: formats final report, adds citations & summary
Handles state management, retries, and conditional flows
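The node order can be approximated in plain Python: the real orchestration is a LangGraph graph with retries and conditional edges, but the way a shared state dict flows through the five nodes looks roughly like this (all function names and payloads below are hypothetical placeholders):

```python
def planner(state):
    # Break the topic into sub-questions (the real node calls the LLM)
    state["sub_questions"] = [f"{state['topic']} - aspect {i}" for i in (1, 2)]
    return state

def gatherer(state):
    # Query web, academic, and local sources per sub-question
    state["documents"] = {q: [f"doc about {q}"] for q in state["sub_questions"]}
    return state

def synthesizer(state):
    # Draft an answer for each sub-question from its documents
    state["findings"] = [f"Answer to: {q}" for q in state["sub_questions"]]
    return state

def verifier(state):
    # Attach a confidence score to each finding (placeholder values)
    state["confidence"] = {f: 0.9 for f in state["findings"]}
    return state

def report_generator(state):
    state["report"] = "\n".join(state["findings"])
    return state

state = {"topic": "AI for brain tumor detection"}
for node in (planner, gatherer, synthesizer, verifier, report_generator):
    state = node(state)
print(state["report"])
```

In LangGraph these functions would be registered as graph nodes with edges between them, which is what enables the retries and conditional flows mentioned above.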
3️⃣ Data & Knowledge Sources 🌐 Web Search: DDGS (DuckDuckGo Search API)
📚 Academic Papers: PubMed, arXiv
📄 Local Documents: PDF/text parser (PyMuPDF)
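Whatever a source returns (a web hit, an arXiv/PubMed paper, or a local PDF page), it is useful to normalize it into a chunk carrying a `source` field so the citation step can later deduplicate references. This is a hedged sketch; the `Chunk` class and field names are illustrative, not the repository's actual types:

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)

def normalize(raw_results):
    """Turn raw (text, origin) pairs from any source into uniform chunks."""
    return [Chunk(text=t, metadata={"source": origin}) for t, origin in raw_results]

chunks = normalize([
    ("CNNs detect tumors...", "https://arxiv.org/abs/2007.07892"),
    ("MRI dataset notes...", "docs/brain_mri.pdf"),
])
print(chunks[0].metadata["source"])
```

Chunks shaped like this are exactly what the `extract_sources_from_chunks` helper in `agent/citations.py` (shown further below) expects.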
4️⃣ LLM & Reasoning Powered by a local model served via Ollama (LLaMA3 / mistral:latest)
Responsible for planning, summarizing, verifying, and writing
📝 Setup & Installation
Prerequisites:
Python 3.10+
pip
virtualenv
Clone the repository

```shell
git clone https://github.com/kavya-sree-chandhi/AI_Research-Agent-for-Healthcare-Diagnostics.git
cd AI_Research-Agent-for-Healthcare-Diagnostics
```

Create virtual environment & activate

```shell
python -m venv venv
source venv/bin/activate   # Linux/macOS
venv\Scripts\activate      # Windows
```

Install dependencies

```shell
pip install -r requirements.txt
```

Run the application

```shell
streamlit run app.py
```
📷 Demo
https://github.com/user-attachments/assets/bfa77ba6-315c-433a-b86a-7ce84abc1b3c
Owner
- Login: kavya-sree-chandhi
- Kind: user
- Repositories: 1
- Profile: https://github.com/kavya-sree-chandhi
Citation (citations.py)
```python
# agent/citations.py

def extract_sources_from_chunks(chunks):
    """
    Collects all unique sources (URLs, DOIs, arXiv IDs, etc.) from chunk metadata.
    Returns a list of unique sources.
    """
    sources = set()
    for chunk in chunks:
        # Standardize: chunk metadata should include a 'source' key
        source = chunk.metadata.get('source') if hasattr(chunk, 'metadata') else None
        if source:
            sources.add(source)
    return list(sources)


def render_citations(sources):
    """
    Formats a list of sources as a citation section (Markdown or plain text).
    """
    if not sources:
        return "No citations found."
    out = "## References\n"
    for idx, source in enumerate(sources, 1):
        out += f"[{idx}]: {source}\n"
    return out


# Example usage
if __name__ == "__main__":
    # Dummy chunks with metadata for demonstration
    class DummyChunk:
        def __init__(self, meta):
            self.metadata = meta

    chunks = [
        DummyChunk({'source': 'https://arxiv.org/abs/2007.07892'}),
        DummyChunk({'source': 'https://pubmed.ncbi.nlm.nih.gov/123456/'}),
        DummyChunk({'source': 'https://arxiv.org/abs/2007.07892'}),  # duplicate
    ]
    sources = extract_sources_from_chunks(chunks)
    print(render_citations(sources))
```
Dependencies
- arxiv *
- beautifulsoup4 *
- chromadb *
- ddgs *
- fpdf *
- langchain *
- langchain-chroma *
- langchain-ollama *
- langchain-text-splitters *
- langgraph *
- newspaper3k *
- ollama *
- pubmed *
- pymupdf *
- pytest *
- python-docx *
- python-dotenv *
- requests *
- streamlit *
- tqdm *