ai-research-assistant

An AI-powered research assistant that answers academic questions from uploaded PDFs or links (arXiv, PubMed) and returns context-rich answers with citation support using LangChain, LLaMA 3 (Groq), and FAISS.

https://github.com/atifanawaz/ai-research-assistant

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

○
CITATION.cff file
✓
codemeta.json file
Found codemeta.json file
✓
.zenodo.json file
Found .zenodo.json file
○
DOI references
✓
Academic publication links
Links to: arxiv.org
○
Academic email domains
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (14.9%) to scientific vocabulary

Last synced: 10 months ago · JSON representation

Repository

Basic Info

Host: GitHub
Owner: atifanawaz
Language: Python
Default Branch: main
Homepage:
Size: 1.8 MB

Statistics

Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 0

Created 11 months ago · Last pushed 11 months ago

Metadata Files

Readme Citation

README.md

AI Research Assistant with Accurate Citation Support

A fully featured, citation-aware AI research assistant powered by LangChain, Groq (LLaMA 3), and FAISS. It allows users to upload academic papers or link to online research (like arXiv or PubMed), ask questions, and receive grounded, page-referenced answers — with optional inline citation tags like [1].

Live Demo

Try it live here:
https://askmyresearch.streamlit.app/

Features

Upload research papers in PDF, DOCX, or TXT formats
Paste public links from arXiv or PubMed (auto-fetch and parsing supported)
Ask custom research questions related to uploaded papers or URLs
Get contextual, grounded answers using LLaMA 3 via Groq API
Automatically extract citations including:
- Page number (Page X)
- Matching content snippet
- Source (file name or URL)
Injects inline citation tags like [1] if content from a document is used
Citations are only shown if the answer overlaps the source content
Works on any research domain — no fixed keywords or filters
Processes multiple files and URLs together

Technologies Used

Streamlit — for building the interactive web interface
LangChain — for orchestrating the retrieval-augmented generation (RAG) pipeline
Groq (LLaMA 3) — used for generating language model responses with high speed and accuracy
FAISS — for storing and retrieving semantic document chunks using vector similarity
Hugging Face Sentence Transformers — used for generating document embeddings (all-MiniLM-L6-v2)
PyMuPDF and docx2txt — for extracting text from PDF and DOCX files
tiktoken — used for token-aware chunking of long texts to fit LLM context
Regex Matching and String Inference — for inline citation injection based on content similarity

How It Works

Upload Files or Paste Links
- Supports PDF, DOCX, TXT, arXiv.org and PubMed URLs.
Document Parsing and Chunking
- Text is extracted and chunked intelligently using sentence boundaries and token-aware limits.
Embedding and Vector Storage
- Each chunk is embedded with all-MiniLM-L6-v2 and stored in FAISS.
Question Answering with Citations
- Your question is matched to relevant chunks using max marginal relevance.
- Answer is generated with Groq's LLaMA 3 and cited only if source content overlaps.
Citation Injection
- Citation numbers like [1] are shown inline if document content matches.
- A full citation summary is appended at the end (page number + snippet + source).

Setup Instructions

```bash

1. Clone the repo

git clone https://github.com/your-username/ai-research-assistant.git cd ai-research-assistant

2. Create virtual environment

python -m venv venv source venv/bin/activate # or venv\Scripts\activate on Windows

3. Install dependencies

pip install -r requirements.txt

4. Set your API key in environment (or config.py)

export GROQAPIKEY=yourgroqapi_key

5. Run the app

streamlit run app.py

Owner

Name: Atifa
Login: atifanawaz
Kind: user

Repositories: 1
Profile: https://github.com/atifanawaz

Hi there! I'm Atifa Nawaz, a CS graduate from BUKC (2026), aspiring and Python, Javascript, C++ & Java Developer Eager to learn new Skills.

GitHub Events

Total

Push event: 14
Create event: 2

Last Year

Push event: 14
Create event: 2

Dependencies

requirements.txt pypi

PyMuPDF *
docx2txt *
faiss-cpu *
langchain >=0.1.14
langchain-community *
langchain-groq *
pypdf *
requests *
sentence-transformers *
streamlit *
tf-keras *
tiktoken *
transformers *

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Open Source Science