vidore-benchmark
Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper.
colpali-engine
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
https://github.com/altunenes/calcarine
Desktop VLM: Real-time FastVLM analysis of video & textures with live compute shaders
https://github.com/ammarlodhi255/chest-xray-report-generation-app-with-chatbot-end-to-end-implementation
AI-powered Chest X-ray report generation app using VLM (Swin-T5) and LLM (LLaMA-3) for multilingual Q&A and medical education support.
https://github.com/astrazeneca/vlm
Official implementation for "Diffusion Instruction Tuning"
urban-worm
Urban-Worm is a Python library that integrates remote sensing imagery, street view data, and multimodal models to assess environments and urban units.
awesome-robotics-3d
A curated list of 3D vision papers related to the robotics domain in the era of large models (i.e., LLMs/VLMs), inspired by awesome-computer-vision, including papers, code, and related websites.
spatialfusion-lm
SpatialFusion-LM is a real-time spatial reasoning framework that combines neural depth, 3D reconstruction, and language-driven scene understanding.