Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.2%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: LI3ARA
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 42.5 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed 10 months ago
Metadata Files
Readme Citation

README.md

Multimodal RAG & VQA Research Repository - ReAlignQA

This repository explores the use of multimodal data (images and text) for question answering over given text and image inputs.


Project Structure

```bash
.
├── Notebooks/            # Experiments organized by date or topic
│   ├── 24/               # Earlier experiments (LangChain-based RAG)
│   └── 25/
│       ├── LayoutLM+RAG/ # Layout-aware document QA
│       ├── VQA_RAG/      # LLaVA-based VQA pipelines <--- Main experiments
│       └── ...           # Other vision-based extraction tools
├── src/                  # Core Python modules
├── Utils/                # Helper scripts, e.g., notebook outlining
├── notebooks/en/         # Cleaned examples for multimodal RAG
├── docs/                 # Project documentation / UI
├── requirements.txt      # Python dependencies
└── .gitignore
```

Installation

Using a conda environment is recommended:

```bash
conda create -n multimodalrag python=3.11
conda activate multimodalrag
pip install -r requirements.txt
```

Also configure your environment variables:

- Create a `.env` file in the root directory.
- Add any API keys for the models called under the OpenAI wrapper.

`.env` example:

```bash
REMOTE_URL=your-remote\local-llm-url
LOCAL_URL=your-remote\local-lmm-url
API_KEY=your-api-key
```
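The variable names above come from the README; how the project actually loads them is not shown, so the loader below is a minimal, stdlib-only sketch (the repository may instead rely on a package such as python-dotenv, which is an assumption):

```python
import os
import tempfile

def load_env(path):
    """Parse KEY=VALUE lines from a .env file into os.environ (minimal sketch)."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            # skip blanks, comments, and malformed lines
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # existing environment variables take precedence
            os.environ.setdefault(key.strip(), value.strip())

# demo with a throwaway .env file
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as fh:
    fh.write("API_KEY=your-api-key\nLOCAL_URL=http://localhost:8000/v1\n")
    env_path = fh.name

load_env(env_path)
print(os.environ["API_KEY"])  # your-api-key
os.unlink(env_path)
```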

Evaluation

  • Metrics Functions: see src/eval_metrics_utils.py
  • Evaluation notebooks:
    • run_text_eval.ipynb
    • run_vision_only_eval.ipynb
  • Outputs are saved to structured CSVs in ../Eval_outputs/SPIQA/vision_only/
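The exact metrics in `src/eval_metrics_utils.py` are not reproduced here; as an illustration of the evaluation flow, the sketch below computes two common VQA answer metrics (exact match and token-level F1) and writes them as structured CSV rows using only the standard library. The function names and sample data are assumptions, not the project's actual API:

```python
import csv
import io
from collections import Counter

def exact_match(pred: str, gold: str) -> int:
    """1 if the normalized answers are identical, else 0."""
    return int(pred.strip().lower() == gold.strip().lower())

def token_f1(pred: str, gold: str) -> float:
    """Token-overlap F1, as used in SQuAD-style QA evaluation."""
    p, g = Counter(pred.lower().split()), Counter(gold.lower().split())
    overlap = sum((p & g).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / sum(p.values()), overlap / sum(g.values())
    return 2 * precision * recall / (precision + recall)

# hypothetical predictions vs. gold answers
rows = [("q1", "Paris", "paris"), ("q2", "a red car", "the red car")]

buf = io.StringIO()  # stand-in for a file under Eval_outputs/
writer = csv.writer(buf)
writer.writerow(["id", "exact_match", "token_f1"])
for qid, pred, gold in rows:
    writer.writerow([qid, exact_match(pred, gold), round(token_f1(pred, gold), 3)])
print(buf.getvalue())
```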

Datasets Used

| Dataset   | Purpose                                        |
| --------- | ---------------------------------------------- |
| SPIQA     | Visual QA dataset from structured documents    |
| PDF-VQA   | PDF-based VQA task for layout-sensitive models |
| VisDoMRAG | Extension for the SPIQA dataset                |

Models Used

| Model             | Use                                       |
| ----------------- | ----------------------------------------- |
| LLaVA             | Vision-language inference and fine-tuning |
| LayoutLMv3        | OCR-aware document representation         |
| Mistral           | Text-only RAG baseline                    |
| LLaVA and Mistral | For ReAlignQA                             |
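As context for the text-only RAG baseline, the retrieval step can be sketched in its most bare-bones form with bag-of-words cosine similarity. The earlier experiments here are LangChain-based and presumably use dense embeddings, so this is purely illustrative; the document strings and function names below are made up:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = Counter(query.lower().split())
    scored = sorted(docs, key=lambda d: cosine(q, Counter(d.lower().split())),
                    reverse=True)
    return scored[:k]

docs = [
    "LLaVA handles image and text inputs",
    "Mistral is a text-only language model",
    "LayoutLMv3 encodes document layout",
]
print(retrieve("text language model", docs))
```

In a real pipeline the retrieved passages (and, for the multimodal variants, the associated images) would then be packed into the model prompt.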

UI & Deployment

  • Gradio demo: src/multimodal_gradio_UI.ipynb
  • Colab scripts for LLaVA fine-tuning and evaluation: VQA_RAG/Googl_colab/soft_prompting/

Cite the Work

DOI: 10.5281/zenodo.15471174

Owner

  • Name: Lisara Gajaweera
  • Login: LI3ARA
  • Kind: user
  • Location: Sri Lanka

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Gajaweera"
  given-names: "Lisara"
  orcid: "https://orcid.org/0009-0001-4251-2034"
title: "FYIRP-ReAlignQA"
version: 1.0.0
doi: 10.5281/zenodo.15471174
date-released: 2025-05-20
url: "https://github.com/LI3ARA/FYIRP-ReAlignQA"

GitHub Events

Total
  • Push event: 2
Last Year
  • Push event: 2