fyirp-realignqa
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 2 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.2%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: LI3ARA
- Language: Jupyter Notebook
- Default Branch: main
- Size: 42.5 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Multimodal RAG & VQA Research Repository - ReAlignQA
This repository explores the use of multimodal data (images and text) for question answering over a given set of text and image inputs.
Project Structure
```bash
.
├── Notebooks/             # Experiments organized by date or topic
│   ├── 24/                # Earlier experiments (LangChain-based RAG)
│   └── 25/
│       ├── LayoutLM+RAG/  # Layout-aware document QA
│       ├── VQA_RAG/       # LLaVA-based VQA pipelines <--- Main experiments
│       └── ...            # Other vision-based extraction tools
├── src/                   # Core Python modules
├── Utils/                 # Helper scripts, e.g., notebook outlining
├── notebooks/en/          # Cleaned examples for multimodal RAG
├── docs/                  # Project documentation / UI
├── requirements.txt       # Python dependencies
└── .gitignore
```
Installation
Using a conda environment is recommended:
```bash
conda create -n multimodalrag python=3.11
conda activate multimodalrag
pip install -r requirements.txt
```
Also configure your environment variables:
- Create a `.env` file in the root directory.
- Add any API keys for the models called through the OpenAI wrapper.

`.env` example:
```
REMOTE_URL=your-remote-llm-url
LOCAL_URL=your-local-llm-url
API_KEY=your-api-key
```
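As a minimal sketch of how such a `.env` might be consumed, assuming the `python-dotenv` and `openai` (v1) packages; the README does not show the actual loading code, so the pattern below is an illustration, not the repository's implementation:

```python
# Sketch: load .env and point an OpenAI-compatible client at a local or
# remote LLM endpoint. Assumes python-dotenv and openai>=1.0 are installed;
# variable names mirror the .env example above.
import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # reads .env from the current working directory

client = OpenAI(
    base_url=os.environ["LOCAL_URL"],  # or REMOTE_URL for a hosted model
    api_key=os.environ["API_KEY"],
)

response = client.chat.completions.create(
    model="local-model",  # hypothetical model name; depends on the server
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```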
Evaluation
- Metrics functions: see `src/eval_metrics_utils.py`
- Evaluation notebooks: `run_text_eval.ipynb`, `run_vision_only_eval.ipynb`
- Outputs are saved to structured CSVs in `../Eval_outputs/SPIQA/vision_only/` (a sketch of the format follows below)
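The following sketch illustrates what a structured per-question CSV of this kind could look like. The column names and the exact-match computation are assumptions for illustration, not taken from `src/eval_metrics_utils.py`; only the output directory mirrors the README:

```python
# Hypothetical sketch of a structured evaluation CSV: one row per question.
# Columns and the metric are assumed, not the repository's actual schema.
import os
import pandas as pd

out_dir = "../Eval_outputs/SPIQA/vision_only"
os.makedirs(out_dir, exist_ok=True)

results = [
    {"question_id": "q1", "prediction": "Figure 2", "reference": "Figure 2"},
    {"question_id": "q2", "prediction": "Table 1", "reference": "Table 3"},
]
df = pd.DataFrame(results)
df["exact_match"] = (df["prediction"] == df["reference"]).astype(float)

df.to_csv(os.path.join(out_dir, "results.csv"), index=False)
print("mean exact match:", df["exact_match"].mean())
```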
Datasets Used
| Dataset   | Purpose                                        |
| --------- | ---------------------------------------------- |
| SPIQA     | Visual QA dataset from structured documents    |
| PDF-VQA   | PDF-based VQA task for layout-sensitive models |
| VisDoMRAG | Extension of the SPIQA dataset                 |
Models Used
| Model | Use |
| ------------ | ----------------------------------------- |
| LLaVA | Vision-language inference and fine-tuning |
| LayoutLMv3 | OCR-aware document representation |
| Mistral | Text-only RAG baseline |
| LLaVA + Mistral | Combined for ReAlignQA |
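For orientation, here is a minimal LLaVA inference sketch using Hugging Face `transformers`. The checkpoint name and prompt template are assumptions based on the public `llava-hf` releases, not on this repository's notebooks, whose pipelines live under `Notebooks/25/VQA_RAG/`:

```python
# Minimal LLaVA VQA sketch with Hugging Face transformers.
# Checkpoint and prompt format are assumed (public llava-hf release).
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id)

image = Image.open("page.png")  # e.g., a rendered document page
prompt = "USER: <image>\nWhat does the figure on this page show? ASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```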
UI & Deployment
- Gradio demo: `src/multimodal_gradio_UI.ipynb` (a minimal wiring sketch follows below)
- Colab scripts for LLaVA fine-tuning and evaluation: `VQA_RAG/Googl_colab/soft_prompting/`
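As a rough sketch only, an image-plus-question Gradio interface of the kind such a demo typically wraps might look like this; the answer function is a placeholder, not the repository's pipeline:

```python
# Hypothetical Gradio wiring for an image+question demo; the actual demo
# lives in src/multimodal_gradio_UI.ipynb and differs from this sketch.
import gradio as gr

def answer(image, question):
    # Placeholder: a real demo would run the multimodal RAG/VQA pipeline here.
    return f"(model answer to: {question!r})"

demo = gr.Interface(
    fn=answer,
    inputs=[gr.Image(type="pil"), gr.Textbox(label="Question")],
    outputs=gr.Textbox(label="Answer"),
    title="Multimodal RAG & VQA demo",
)

if __name__ == "__main__":
    demo.launch()
```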
Cite the Work
Owner
- Name: Lisara Gajaweera
- Login: LI3ARA
- Kind: user
- Location: Sri Lanka
- Repositories: 1
- Profile: https://github.com/LI3ARA
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Gajaweera"
    given-names: "Lisara"
    orcid: "https://orcid.org/0009-0001-4251-2034"
title: "FYIRP-ReAlignQA"
version: 1.0.0
doi: 10.5281/zenodo.15471174
date-released: 2025-05-20
url: "https://github.com/LI3ARA/FYIRP-ReAlignQA"
```
GitHub Events
Total
- Push event: 2
Last Year
- Push event: 2