ai-research-assistant

An AI-powered research assistant that answers academic questions from uploaded PDFs or links (arXiv, PubMed) and returns context-rich answers with citation support using LangChain, LLaMA 3 (Groq), and FAISS.

https://github.com/atifanawaz/ai-research-assistant

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.9%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: atifanawaz
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 1.8 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 11 months ago · Last pushed 11 months ago
Metadata Files
Readme Citation

README.md

AI Research Assistant with Accurate Citation Support

A fully featured, citation-aware AI research assistant powered by LangChain, Groq (LLaMA 3), and FAISS. It allows users to upload academic papers or link to online research (like arXiv or PubMed), ask questions, and receive grounded, page-referenced answers — with optional inline citation tags like [1].
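Link support begins by recognizing which service a pasted URL points to. The minimal Python sketch below shows one way to do that; it is not the repository's own code. The names `classify_link` and `arxiv_pdf_url` are hypothetical. The sketch only recognizes new-style arXiv IDs (for example `2401.12345`) and `pubmed.ncbi.nlm.nih.gov` article URLs, and it relies on arXiv serving a paper's PDF at `https://arxiv.org/pdf/<id>`.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns: new-style arXiv IDs (optionally versioned) and PubMed PMIDs
ARXIV_RE = re.compile(r"arxiv\.org/(?:abs|pdf)/([0-9]{4}\.[0-9]{4,5}(?:v[0-9]+)?)")
PUBMED_RE = re.compile(r"pubmed\.ncbi\.nlm\.nih\.gov/([0-9]+)")


def classify_link(url: str) -> Optional[Tuple[str, str]]:
    """Return (source, identifier) for a supported link, or None if unsupported."""
    match = ARXIV_RE.search(url)
    if match:
        return ("arxiv", match.group(1))
    match = PUBMED_RE.search(url)
    if match:
        return ("pubmed", match.group(1))
    return None


def arxiv_pdf_url(arxiv_id: str) -> str:
    # arXiv serves the PDF of a paper under /pdf/<id>
    return f"https://arxiv.org/pdf/{arxiv_id}"
```

For example, `classify_link("https://arxiv.org/abs/2401.12345")` yields `("arxiv", "2401.12345")`, which can then be turned into a PDF URL for fetching and parsing.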


Live Demo

Try it live here:
https://askmyresearch.streamlit.app/


Features

  • Upload research papers in PDF, DOCX, or TXT formats
  • Paste public links from arXiv or PubMed (auto-fetch and parsing supported)
  • Ask custom research questions related to uploaded papers or URLs
  • Get contextual, grounded answers using LLaMA 3 via Groq API
  • Automatically extract citations including:
    • Page number (Page X)
    • Matching content snippet
    • Source (file name or URL)
  • Inject inline citation tags such as [1] when an answer draws on a document's content
  • Citations are only shown if the answer overlaps the source content
  • Works on any research domain — no fixed keywords or filters
  • Processes multiple files and URLs together
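The citation features above can be sketched as follows: page, snippet and source per citation, inline [n] tags, and citations shown only when the answer overlaps a source. This is a hypothetical illustration, not the repository's implementation. It assumes a simple word-overlap score, defined as the share of an answer sentence's words longer than three characters that also appear in a chunk. It also uses an arbitrary 0.5 threshold. `Chunk`, `overlap` and `inject_citations` are illustrative names.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Chunk:
    text: str
    page: int
    source: str  # file name or URL


def content_words(text: str) -> Set[str]:
    # Lowercased words longer than 3 characters, so short function words don't count as overlap
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}


def overlap(sentence: str, chunk: Chunk) -> float:
    """Share of the sentence's content words that also occur in the chunk."""
    words = content_words(sentence)
    if not words:
        return 0.0
    return len(words & content_words(chunk.text)) / len(words)


def inject_citations(answer: str, chunks: List[Chunk],
                     threshold: float = 0.5) -> Tuple[str, List[str]]:
    """Tag answer sentences with [n] and return (tagged answer, citation summary)."""
    numbers: Dict[int, int] = {}  # chunk index -> citation number, in order of first use
    tagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        scores = [overlap(sentence, c) for c in chunks]
        best = max(range(len(chunks)), key=lambda i: scores[i], default=None)
        # Only cite when the best-matching chunk overlaps enough with the sentence
        if best is not None and scores[best] >= threshold:
            n = numbers.setdefault(best, len(numbers) + 1)
            # Insert the tag before any final punctuation: "... recurrence [1]."
            body, punct = re.match(r"(.*?)([.!?]?)$", sentence, re.S).groups()
            sentence = f"{body} [{n}]{punct}"
        tagged.append(sentence)
    # Summary lists only chunks that were actually cited: page + source + snippet
    summary = [
        f'[{n}] Page {chunks[i].page}, {chunks[i].source}: "{chunks[i].text[:80]}"'
        for i, n in sorted(numbers.items(), key=lambda item: item[1])
    ]
    return " ".join(tagged), summary
```

Sentences with no sufficiently overlapping chunk are left untagged, so unsupported statements never receive a citation. A production version would likely use embedding similarity or n-gram matching rather than bare word sets.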

Technologies Used

  • Streamlit — for building the interactive web interface
  • LangChain — for orchestrating the retrieval-augmented generation (RAG) pipeline
  • Groq (LLaMA 3) — for generating answers with LLaMA 3 served through Groq's hosted, low-latency inference API
  • FAISS — for storing and retrieving semantic document chunks using vector similarity
  • Hugging Face Sentence Transformers — used for generating document embeddings (all-MiniLM-L6-v2)
  • PyMuPDF and docx2txt — for extracting text from PDF and DOCX files
  • tiktoken — used for token-aware chunking of long texts to fit LLM context
  • Regex Matching and String Inference — for inline citation injection based on content similarity
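The token-aware chunking mentioned above can be approximated by packing whole sentences into a chunk until adding the next one would exceed a token budget. This minimal sketch is not the repository's code. The function name and the 200-token default are assumptions. The default counter simply splits on whitespace, so the example runs without dependencies. With tiktoken you could instead pass `count_tokens=lambda s: len(enc.encode(s))`, where `enc = tiktoken.get_encoding("cl100k_base")` (the encoding choice is also an assumption).

```python
import re
from typing import Callable, List


def count_tokens_ws(text: str) -> int:
    # Stand-in token counter (whitespace split); swap in tiktoken for real token counts
    return len(text.split())


def chunk_text(text: str, max_tokens: int = 200,
               count_tokens: Callable[[str], int] = count_tokens_ws) -> List[str]:
    """Split text at sentence boundaries into chunks of at most max_tokens tokens."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []
    for sentence in sentences:
        candidate = current + [sentence]
        if current and count_tokens(" ".join(candidate)) > max_tokens:
            # Budget would be exceeded: close the current chunk and start a new one
            chunks.append(" ".join(current))
            current = [sentence]
        else:
            current = candidate
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Because chunks never split a sentence, a single sentence longer than `max_tokens` becomes its own oversized chunk. A stricter version would fall back to splitting such sentences by tokens.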

How It Works

  1. Upload Files or Paste Links

    • Supports PDF, DOCX, TXT, arXiv.org and PubMed URLs.
  2. Document Parsing and Chunking

    • Text is extracted and split into chunks at sentence boundaries, with each chunk kept under a token limit.
  3. Embedding and Vector Storage

    • Each chunk is embedded with all-MiniLM-L6-v2 and stored in FAISS.
  4. Question Answering with Citations

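    • Your question is matched to relevant chunks using maximal marginal relevance (MMR), which favors chunks that are relevant to the question but not redundant with each other.
    • The answer is generated with Groq's LLaMA 3, and citations are attached only if the answer overlaps the source content.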
  5. Citation Injection

    • Citation numbers like [1] are shown inline if document content matches.
    • A full citation summary is appended at the end (page number + snippet + source).
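The retrieval step above uses maximal marginal relevance. Each round picks the candidate that maximizes `lambda_mult * sim(query, doc) - (1 - lambda_mult) * max sim(doc, already_selected)`. In LangChain this is typically requested with `search_type="mmr"` on a retriever or `max_marginal_relevance_search` on the FAISS store. The exact settings this app uses are not stated. The sketch below re-implements the selection rule in plain Python on made-up 2-D vectors, purely to show the idea.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(query: Sequence[float], docs: List[Sequence[float]],
        k: int = 4, lambda_mult: float = 0.5) -> List[int]:
    """Return indices of up to k docs chosen by maximal marginal relevance."""
    relevance = [cosine(query, d) for d in docs]
    selected: List[int] = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        # Relevance to the query minus redundancy with anything already picked
        scores = {
            i: lambda_mult * relevance[i]
            - (1 - lambda_mult) * max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            for i in candidates
        }
        best = max(candidates, key=scores.__getitem__)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs = [[1.0, 0.0], [1.0, 0.0], [0.6, 0.8]]  # docs 0 and 1 are duplicates
print(mmr(query, docs, k=2, lambda_mult=0.3))  # [0, 2]: skips the duplicate
print(mmr(query, docs, k=2, lambda_mult=1.0))  # [0, 1]: pure relevance ranking
```

Lower `lambda_mult` values trade relevance for diversity. This helps when a paper repeats the same passage (for example, in the abstract and the conclusion).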

Setup Instructions

```bash
# 1. Clone the repo
git clone https://github.com/atifanawaz/ai-research-assistant.git
cd ai-research-assistant

# 2. Create a virtual environment
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows

# 3. Install dependencies
pip install -r requirements.txt

# 4. Set your API key in the environment (or in config.py)
export GROQ_API_KEY=your_groq_api_key

# 5. Run the app
streamlit run app.py
```

Owner

  • Name: Atifa
  • Login: atifanawaz
  • Kind: user

Hi there! I'm Atifa Nawaz, a CS graduate from BUKC (2026) and an aspiring Python, JavaScript, C++ and Java developer, eager to learn new skills.

GitHub Events

Total
  • Push event: 14
  • Create event: 2
Last Year
  • Push event: 14
  • Create event: 2

Dependencies

requirements.txt pypi
  • PyMuPDF *
  • docx2txt *
  • faiss-cpu *
  • langchain >=0.1.14
  • langchain-community *
  • langchain-groq *
  • pypdf *
  • requests *
  • sentence-transformers *
  • streamlit *
  • tf-keras *
  • tiktoken *
  • transformers *