https://github.com/ammarlodhi255/chest-xray-report-generation-app-with-chatbot-end-to-end-implementation

AI-powered Chest X-ray report generation app using VLM (Swin-T5) and LLM (LLaMA-3) for multilingual Q&A and medical education support.

Science Score: 26.0%

This score indicates how likely this project is to be science-related, based on the following indicators:

  • CITATION.cff file
  • codemeta.json file (found)
  • .zenodo.json file (found)
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity (low: 13.8%)

Keywords

chest-xray data-science-projects deep-learning-project deep-learning-projects end-to-end-machine-learning end-to-end-project huggingface large-language-models llm machine-learning-projects medical-imaging medical-report-generation pytorch radiology radiology-report-generation vision-language-model vlm
Last synced: 5 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: ammarlodhi255
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 25.2 MB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 1
  • Releases: 0
Created 11 months ago · Last pushed 9 months ago
Metadata Files
  • Readme: README.md

🩺 Chest X-ray Report Generation via Vision-Language Models

A modular monolithic web application that generates radiology-style reports from chest X-ray images using Vision-Language Models (VLMs) and supports multilingual, contextual question-answering via Large Language Models (LLMs).

Screenshots: interface in dark mode, interface in light mode, the notes panel, and the zoom functionality.


Overview

This project combines computer vision and natural language understanding to assist medical students and practitioners in interpreting chest X-rays. Users can:

  • Upload chest X-ray images.
  • Automatically generate medical-style reports using Swin-T5.
  • Ask contextual questions about the report.
  • Receive multilingual explanations (e.g., Hindi, Urdu, Norwegian).
  • Take structured notes as a student or educator.

Models and Data

  • VLMs used in this project: BLIP, Swin-BART, and Swin-T5
  • LLM used in this project: LLaMA-3-8B Instruct (https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)
  • Dataset: "CheXpert Plus"; only the first chunk (155 GB) is used (https://stanfordaimi.azurewebsites.net/datasets/5158c524-d3ab-4e02-96e9-6ee9efc110a1)
  • The weights of the best-performing model (Swin-T5) can be found here (https://studntnu-my.sharepoint.com/personal/aleksacentnuno/layouts/15/onedrive.aspx?id=/personal/aleksacentnuno/Documents/InnovationProject/swin-t5-model.pth&parent=/personal/aleksacentnu_no/Documents/InnovationProject&ga=1) — see the loading sketch below
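
For illustration, here is a minimal sketch of how such a checkpoint might be loaded for inference, assuming a Swin encoder paired with a T5 decoder through a linear projection. The SwinT5 class, the base checkpoints, and the state-dict format are all assumptions; the repository's actual architecture may differ.

```python
import torch
import torch.nn as nn
from transformers import SwinModel, T5ForConditionalGeneration
from transformers.modeling_outputs import BaseModelOutput

class SwinT5(nn.Module):
    """Hypothetical Swin encoder + T5 decoder; the repo's class may differ."""
    def __init__(self):
        super().__init__()
        self.encoder = SwinModel.from_pretrained("microsoft/swin-base-patch4-window7-224")
        self.decoder = T5ForConditionalGeneration.from_pretrained("t5-base")
        # Map Swin patch features into T5's embedding space.
        self.proj = nn.Linear(self.encoder.config.hidden_size, self.decoder.config.d_model)

    @torch.no_grad()
    def generate_report(self, pixel_values, **gen_kwargs):
        feats = self.encoder(pixel_values=pixel_values).last_hidden_state
        enc = BaseModelOutput(last_hidden_state=self.proj(feats))
        # Returns token ids; decode with a T5 tokenizer to get report text.
        return self.decoder.generate(encoder_outputs=enc, **gen_kwargs)

model = SwinT5()
# Assumes the .pth file holds a state_dict for this architecture.
model.load_state_dict(torch.load("swin-t5-model.pth", map_location="cpu"))
```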

Features

  • 🔍 Vision-Language Report Generation (Swin-T5, Swin-BART, BLIP)
  • 💬 Interactive Chatbot (LLaMA-3) with multilingual responses
  • 🖼️ Zoomable image preview
  • 📝 Note-taking section for medical education
  • 🌗 Dark/Light mode toggle
  • 🧪 ROUGE-1 metric evaluation (a minimal sketch follows this list)
  • 🔐 No external API dependencies (except Hugging Face for model access)
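
ROUGE-1 counts unigram (single-word) overlap between the generated report and a reference report. A minimal sketch of the F1 variant follows; the app may compute it differently, e.g. via the rouge-score package.

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """ROUGE-1 F1: unigram overlap between a reference and a generated report."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())   # number of matched unigrams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(rouge1_f1("no acute cardiopulmonary abnormality",
                "no acute abnormality is seen"))  # ~0.67
```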

Technology Stack

| Layer         | Technology                                        |
|---------------|---------------------------------------------------|
| Backend       | Python, Flask, PyTorch, Hugging Face Transformers |
| Frontend      | HTML5, CSS3, JavaScript, Bootstrap                |
| Deep Learning | Swin-T5, LLaMA-3, BLIP, Torchvision               |
| Deployment    | Docker, NVIDIA CUDA, Git, GitHub                  |
| Development   | VS Code                                           |


Application Architecture

This is a modular monolithic application organized into the following components (a minimal wiring sketch follows the list):

  • app.py: Main Flask entry point
  • vlm_utils.py: Vision-Language Model loading and inference
  • chat_utils.py: LLM-based contextual question answering
  • preprocess.py: Image transformations and metadata extraction
  • templates/: Jinja2 HTML files (frontend)
  • static/: CSS, JS, and assets
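
The sketch below shows one plausible way app.py could tie these modules together; the route paths and the helper names prepare_image, generate_report, and answer_question are assumptions for illustration, not the repository's actual code.

```python
# app.py (hypothetical wiring; actual routes and helpers may differ)
from flask import Flask, render_template, request, jsonify

from preprocess import prepare_image    # image transforms (assumed name)
from vlm_utils import generate_report   # VLM inference (assumed name)
from chat_utils import answer_question  # LLM Q&A (assumed name)

app = Flask(__name__)

@app.route("/")
def index():
    return render_template("index.html")

@app.route("/generate", methods=["POST"])
def generate():
    # Preprocess the uploaded X-ray, then run VLM report generation.
    pixel_values = prepare_image(request.files["xray"])
    return jsonify({"report": generate_report(pixel_values)})

@app.route("/chat", methods=["POST"])
def chat():
    # Answer a contextual question grounded in the generated report.
    data = request.get_json()
    return jsonify({"answer": answer_question(data["report"], data["question"])})

if __name__ == "__main__":
    app.run(debug=True)  # serves on http://127.0.0.1:5000
```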

Getting Started

Prerequisites

  • Python 3.9+
  • CUDA-enabled GPU (recommended; see the quick check below)
  • Docker (optional for containerized setup)
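
A quick way to confirm that PyTorch can see a CUDA GPU before running the app:

```python
import torch

# Prints True plus the device name if a CUDA GPU is visible to PyTorch.
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```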

Setup Instructions

```bash
# 1. Clone the repository
git clone https://github.com/ammarlodhi255/Chest-xray-report-generation-app-using-VLM-and-LLM.git
cd Chest-xray-report-generation-app-using-VLM-and-LLM

# 2. Create a virtual environment
python -m venv venv
source venv/bin/activate   # on Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. (Optional) Set a Hugging Face token for gated LLaMA access
export HF_TOKEN=your_token_here

# 5. Run the app
python app.py
```

Then visit: http://127.0.0.1:5000

LLM Interactions

Screenshots: chatbot initialization and responses in Hindi, Urdu, and Norwegian.
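
A minimal sketch of the contextual, multilingual Q&A step (presumably the role of chat_utils.py): the generated report is injected as context, and the model is asked to reply in the question's language. The prompt wording and pipeline usage here are assumptions, not the repository's actual code.

```python
from transformers import pipeline

# Requires access to the gated meta-llama checkpoint (HF_TOKEN set).
chat = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    device_map="auto",
)

report = "Findings: no focal consolidation. Impression: no acute cardiopulmonary disease."
messages = [
    {"role": "system",
     "content": f"You are a radiology tutor. Base every answer on this report:\n{report}\n"
                "Reply in the language of the question."},
    {"role": "user", "content": "Kya report mein koi gambhir samasya hai?"},  # Hindi (romanized)
]
out = chat(messages, max_new_tokens=200)
print(out[0]["generated_text"][-1]["content"])  # the assistant's reply
```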

Owner

  • Name: Ammar Ahmed
  • Login: ammarlodhi255
  • Kind: user
  • Location: Sukkur, Pakistan

A computer scientist at heart, interested in AI, software development, and space.

GitHub Events

Total
  • Issues event: 1
  • Watch event: 4
  • Push event: 2
  • Fork event: 1
Last Year
  • Issues event: 1
  • Watch event: 4
  • Push event: 2
  • Fork event: 1

Dependencies

requirements.txt (PyPI)
  • Flask >=2.0
  • Pillow >=8.0
  • python-dotenv *
  • sentencepiece >=0.1.90
  • torch >=1.8
  • torchvision >=0.9
  • tqdm *
  • transformers >=4.10
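
Since python-dotenv is listed, the Hugging Face token from the setup step can presumably also live in a .env file instead of being exported manually. A minimal sketch, assuming the variable is named HF_TOKEN as in the setup instructions:

```python
# Hypothetical: read HF_TOKEN from a .env file instead of the shell environment.
import os
from dotenv import load_dotenv

load_dotenv()                  # loads variables from ./.env into os.environ
token = os.getenv("HF_TOKEN")  # assumed variable name, matching the setup step
```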