bentoml
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
uform
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
swarms
The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework. Website: https://swarms.ai
https://github.com/bytedance/ui-tars-desktop
The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra
awesome-mmps
Corpus of resources for multimodal machine learning with physiological signals (mmps).
https://github.com/rerun-io/rerun
Visualize streams of multimodal data. Free, fast, easy to use, and simple to integrate. Built in Rust.
maestro
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
autodistill
Images to inference with no labeling (use foundation models to train supervised models).
lmms-finetune
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v etc.
ppdiffusers
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high performance and flexibility.
marqo-fashionclip
State-of-the-art CLIP/SigLIP embedding models finetuned for the fashion domain. +57% increase in evaluation metrics vs FashionCLIP 2.0.
lrv-instruction
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
https://github.com/hci-lab-um/cactus
Constraint-free multi-modal Access to Communication Technology for Users with Severe motor impairments. This proposal pushes the state of the art through novel eye-tracking interaction patterns along with the introduction of secondary input modalities to improve throughput and usability.
https://github.com/alleninstitute/coupledae-patchseq
Multimodal data alignment and cell type analysis with coupled autoencoders.
https://github.com/predict-idlab/tsflex
Flexible time series feature extraction & processing
https://github.com/aisuko/notebooks
Implementations of various ML tasks on the Kaggle platform with GPUs.
flair-2
Engage in a semantic segmentation challenge for land cover description using multimodal remote sensing earth observation data, delving into real-world scenarios with a dataset comprising 70,000+ aerial imagery patches and 50,000 Sentinel-2 satellite acquisitions.
https://github.com/aehrc/imageclefmedical_caption_23
MedICap: Code for the participation of team CSIRO at the ImageCLEFmedical Caption task of 2023.
nemo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://github.com/buaadreamer/chinese-llava-med
Chinese medical multimodal large model: Large Chinese Language-and-Vision Assistant for BioMedicine
2d3mf
Code and models for the paper "2D3MF: Deepfake Detection using Multi Modal Middle Fusion"
https://github.com/ai4healthuol/cardiolab
This is the official repository for CardioLab, a machine and deep learning framework for the estimation and monitoring of laboratory abnormalities through ECG data.
https://github.com/ai4healthuol/mds-ed
Repository for the paper 'MDS-ED: Multimodal Decision Support in the Emergency Department – a benchmark dataset based on MIMIC-IV'.
sutd-trafficqa
[CVPR2021] SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events
https://github.com/amazon-science/mix-generation
MixGen: A New Multi-Modal Data Augmentation
asaca-automatic-speech-analysis-for-cognitive-assessment
An automatic system that extracts PRAAT-like speech features from raw speech WAV files and, at the same time, produces high-quality transcriptions with low WER (<10).
xrayglm
🩺 The first Chinese multimodal medical large model that can read chest X-rays | The first Chinese medical multimodal model for chest radiograph summarization.
https://github.com/autodistill/autodistill-kosmos-2
Kosmos-2 base model for use with Autodistill.
https://github.com/aehrc/cvt2distilgpt2
Improving Chest X-Ray Report Generation by Leveraging Warm-Starting
https://github.com/awslabs/guidance-for-multi-omics-and-multi-modal-data-integration-and-analysis-on-aws
This guidance creates a scalable environment in AWS to prepare genomic, clinical, mutation, expression and imaging data for large-scale analysis and perform interactive queries against a data lake. The solution also demonstrates the use of Amazon Omics for multi-modal analysis.
asaca-automatic-speech-analysis-for-cognitive-assessment
Transform speech into cognitive assessments with ASACA. Achieve accurate predictions and low error rates using our end-to-end toolkit. 🚀🔧
jamie
Joint variational Autoencoders for Multimodal Imputation and Embedding (JAMIE)
https://github.com/alleninstitute/biomolvec
Notebooks and scripts used for the Nautilex Hackathon