https://github.com/ammarlodhi255/chest-xray-report-generation-app-with-chatbot-end-to-end-implementation

AI-powered Chest X-ray report generation app using VLM (Swin-T5) and LLM (LLaMA-3) for multilingual Q&A and medical education support.

Science Score: 26.0%

This score indicates how likely this project is to be science-related, based on the following indicators:

  • CITATION.cff file
  • codemeta.json file (found)
  • .zenodo.json file (found)
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity (low: 13.8%)

Keywords

chest-xray data-science-projects deep-learning-project deep-learning-projects end-to-end-machine-learning end-to-end-project huggingface large-language-models llm machine-learning-projects medical-imaging medical-report-generation pytorch radiology radiology-report-generation vision-language-model vlm
Last synced: 5 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: ammarlodhi255
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 25.2 MB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 1
  • Releases: 0
Created 11 months ago · Last pushed 9 months ago
Metadata Files
  • Readme: README.md

🩺 Chest X-ray Report Generation via Vision-Language Models

A modular monolithic web application that generates radiology-style reports from chest X-ray images using Vision-Language Models (VLMs) and supports multilingual, contextual question-answering via Large Language Models (LLMs).

Screenshots: interface in dark mode, interface in light mode, the notes panel, and the zoom functionality.


Overview

This project combines computer vision and natural language understanding to assist medical students and practitioners in interpreting chest X-rays. Users can:

  • Upload chest X-ray images.
  • Automatically generate medical-style reports using Swin-T5.
  • Ask contextual questions about the report.
  • Receive multilingual explanations (e.g., Hindi, Urdu, Norwegian).
  • Take structured notes as a student or educator.

Models and Data

  • VLMs used in this project: BLIP, Swin-BART, and Swin-T5
  • LLM used in this project: LLaMA-3-8B Instruct (https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)
  • Dataset: "CheXpert Plus"; only the first chunk (155 GB) is used (https://stanfordaimi.azurewebsites.net/datasets/5158c524-d3ab-4e02-96e9-6ee9efc110a1)
  • The weights of the best-performing model (Swin-T5) can be found here (https://studntnu-my.sharepoint.com/personal/aleksacentnuno/layouts/15/onedrive.aspx?id=/personal/aleksacentnuno/Documents/InnovationProject/swin-t5-model.pth&parent=/personal/aleksacentnu_no/Documents/InnovationProject&ga=1) — see the loading sketch below
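
For illustration, here is a minimal sketch of how such a checkpoint might be loaded for inference, assuming a Swin encoder paired with a T5 decoder through a linear projection. The SwinT5 class, the base checkpoints, and the state-dict format are all assumptions; the repository's actual architecture may differ.

```python
import torch
import torch.nn as nn
from transformers import SwinModel, T5ForConditionalGeneration
from transformers.modeling_outputs import BaseModelOutput

class SwinT5(nn.Module):
    """Hypothetical Swin encoder + T5 decoder; the repo's class may differ."""
    def __init__(self):
        super().__init__()
        self.encoder = SwinModel.from_pretrained("microsoft/swin-base-patch4-window7-224")
        self.decoder = T5ForConditionalGeneration.from_pretrained("t5-base")
        # Map Swin patch features into T5's embedding space.
        self.proj = nn.Linear(self.encoder.config.hidden_size, self.decoder.config.d_model)

    @torch.no_grad()
    def generate_report(self, pixel_values, **gen_kwargs):
        feats = self.encoder(pixel_values=pixel_values).last_hidden_state
        enc = BaseModelOutput(last_hidden_state=self.proj(feats))
        # Returns token ids; decode with a T5 tokenizer to get report text.
        return self.decoder.generate(encoder_outputs=enc, **gen_kwargs)

model = SwinT5()
# Assumes the .pth file holds a state_dict for this architecture.
model.load_state_dict(torch.load("swin-t5-model.pth", map_location="cpu"))
```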

Features

  • 🔍 Vision-Language Report Generation (Swin-T5, Swin-BART, BLIP)
  • 💬 Interactive Chatbot (LLaMA-3) with multilingual responses
  • 🖼️ Zoomable image preview
  • 📝 Note-taking section for medical education
  • 🌗 Dark/Light mode toggle
  • 🧪 ROUGE-1 metric evaluation (a minimal sketch follows this list)
  • 🔐 No external API dependencies (except Hugging Face for model access)
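
ROUGE-1 counts unigram (single-word) overlap between the generated report and a reference report. A minimal sketch of the F1 variant follows; the app may compute it differently, e.g. via the rouge-score package.

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """ROUGE-1 F1: unigram overlap between a reference and a generated report."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())   # number of matched unigrams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(rouge1_f1("no acute cardiopulmonary abnormality",
                "no acute abnormality is seen"))  # ~0.67
```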

Technology Stack

| Layer         | Technology                                        |
|---------------|---------------------------------------------------|
| Backend       | Python, Flask, PyTorch, Hugging Face Transformers |
| Frontend      | HTML5, CSS3, JavaScript, Bootstrap                |
| Deep Learning | Swin-T5, LLaMA-3, BLIP, Torchvision               |
| Deployment    | Docker, NVIDIA CUDA, Git, GitHub                  |
| Development   | VS Code                                           |


Application Architecture

This is a modular monolithic application organized into the following components (a minimal wiring sketch follows the list):

  • app.py: Main Flask entry point
  • vlm_utils.py: Vision-Language Model loading and inference
  • chat_utils.py: LLM-based contextual question answering
  • preprocess.py: Image transformations and metadata extraction
  • templates/: Jinja2 HTML files (frontend)
  • static/: CSS, JS, and assets
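
The sketch below shows one plausible way app.py could tie these modules together; the route paths and the helper names prepare_image, generate_report, and answer_question are assumptions for illustration, not the repository's actual code.

```python
# app.py (hypothetical wiring; actual routes and helpers may differ)
from flask import Flask, render_template, request, jsonify

from preprocess import prepare_image    # image transforms (assumed name)
from vlm_utils import generate_report   # VLM inference (assumed name)
from chat_utils import answer_question  # LLM Q&A (assumed name)

app = Flask(__name__)

@app.route("/")
def index():
    return render_template("index.html")

@app.route("/generate", methods=["POST"])
def generate():
    # Preprocess the uploaded X-ray, then run VLM report generation.
    pixel_values = prepare_image(request.files["xray"])
    return jsonify({"report": generate_report(pixel_values)})

@app.route("/chat", methods=["POST"])
def chat():
    # Answer a contextual question grounded in the generated report.
    data = request.get_json()
    return jsonify({"answer": answer_question(data["report"], data["question"])})

if __name__ == "__main__":
    app.run(debug=True)  # serves on http://127.0.0.1:5000
```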

Getting Started

Prerequisites

  • Python 3.9+
  • CUDA-enabled GPU (recommended; see the quick check below)
  • Docker (optional for containerized setup)
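
A quick way to confirm that PyTorch can see a CUDA GPU before running the app:

```python
import torch

# Prints True plus the device name if a CUDA GPU is visible to PyTorch.
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```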

Setup Instructions

```bash
# 1. Clone the repository
git clone https://github.com/ammarlodhi255/Chest-xray-report-generation-app-using-VLM-and-LLM.git
cd Chest-xray-report-generation-app-using-VLM-and-LLM

# 2. Create a virtual environment
python -m venv venv
source venv/bin/activate   # on Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. (Optional) Set a Hugging Face token for gated LLaMA access
export HF_TOKEN=your_token_here

# 5. Run the app
python app.py
```

Then visit: http://127.0.0.1:5000

LLM Interactions

Screenshots: chatbot initialization and responses in Hindi, Urdu, and Norwegian.
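
A minimal sketch of the contextual, multilingual Q&A step (presumably the role of chat_utils.py): the generated report is injected as context, and the model is asked to reply in the question's language. The prompt wording and pipeline usage here are assumptions, not the repository's actual code.

```python
from transformers import pipeline

# Requires access to the gated meta-llama checkpoint (HF_TOKEN set).
chat = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    device_map="auto",
)

report = "Findings: no focal consolidation. Impression: no acute cardiopulmonary disease."
messages = [
    {"role": "system",
     "content": f"You are a radiology tutor. Base every answer on this report:\n{report}\n"
                "Reply in the language of the question."},
    {"role": "user", "content": "Kya report mein koi gambhir samasya hai?"},  # Hindi (romanized)
]
out = chat(messages, max_new_tokens=200)
print(out[0]["generated_text"][-1]["content"])  # the assistant's reply
```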

Owner

  • Name: Ammar Ahmed
  • Login: ammarlodhi255
  • Kind: user
  • Location: Sukkur, Pakistan

A computer scientist at heart, interested in AI, software development, and space.

GitHub Events

Total
  • Issues event: 1
  • Watch event: 4
  • Push event: 2
  • Fork event: 1
Last Year
  • Issues event: 1
  • Watch event: 4
  • Push event: 2
  • Fork event: 1

Dependencies

requirements.txt (PyPI)
  • Flask >=2.0
  • Pillow >=8.0
  • python-dotenv *
  • sentencepiece >=0.1.90
  • torch >=1.8
  • torchvision >=0.9
  • tqdm *
  • transformers >=4.10
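
Since python-dotenv is listed, the Hugging Face token from the setup step can presumably also live in a .env file instead of being exported manually. A minimal sketch, assuming the variable is named HF_TOKEN as in the setup instructions:

```python
# Hypothetical: read HF_TOKEN from a .env file instead of the shell environment.
import os
from dotenv import load_dotenv

load_dotenv()                  # loads variables from ./.env into os.environ
token = os.getenv("HF_TOKEN")  # assumed variable name, matching the setup step
```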