Updated 9 months ago

bentoml • Rank 26.1 • Science 54%

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

Updated 9 months ago

uform • Rank 15.6 • Science 64%

Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️

Updated 9 months ago

https://github.com/bytedance/ui-tars-desktop • Rank 26.3 • Science 46%

The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra

Updated 9 months ago

https://github.com/rerun-io/rerun • Rank 31.1 • Science 36%

Visualize streams of multimodal data. Free, fast, easy to use, and simple to integrate. Built in Rust.

Updated 9 months ago

maestro • Rank 10.5 • Science 54%

Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL

Updated 9 months ago

lmms-finetune • Rank 7.1 • Science 54%

A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v, etc.

Updated 9 months ago

ppdiffusers • Rank 19.3 • Science 36%

Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high performance and flexibility.

Updated 9 months ago

llama_ros • Rank 7.2 • Science 44%

llama.cpp (GGUF LLMs) and llava.cpp (GGUF VLMs) for ROS 2

Updated 9 months ago

marqo-fashionclip • Rank 6.4 • Science 44%

State-of-the-art CLIP/SigLIP embedding models finetuned for the fashion domain. 57% increase in evaluation metrics vs FashionCLIP 2.0.

Updated 8 months ago

https://github.com/hci-lab-um/cactus • Rank 5.4 • Science 36%

Constraint-free multi-modal Access to Communication Technology for Users with Severe motor impairments. This proposal pushes the state of the art through novel eye-tracking interaction patterns along with the introduction of secondary input modalities to improve throughput and usability.

Updated 9 months ago

https://github.com/alleninstitute/coupledae-patchseq • Rank 2.4 • Science 36%

Multimodal data alignment and cell type analysis with coupled autoencoders.

Updated 9 months ago

flair-2 • Rank 3.5 • Science 26%

Engage in a semantic segmentation challenge for land cover description using multimodal remote sensing earth observation data, delving into real-world scenarios with a dataset comprising 70,000+ aerial imagery patches and 50,000 Sentinel-2 satellite acquisitions.

Updated 9 months ago

visualwebarena • Science 36%

VisualWebArena is a benchmark for multimodal agents.

Updated 9 months ago

https://github.com/aehrc/imageclefmedical_caption_23 • Science 13%

MedICap: Code for the participation of team CSIRO at the ImageCLEFmedical Caption task of 2023.

Updated 9 months ago

nemo • Science 13%

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Updated 9 months ago

https://github.com/buaadreamer/chinese-llava-med • Science 13%

Chinese medical multimodal large model: Large Chinese Language-and-Vision Assistant for BioMedicine

Updated 9 months ago

2d3mf • Science 36%

Code and models for the paper "2D3MF: Deepfake Detection using Multi Modal Middle Fusion"

Updated 9 months ago

https://github.com/ai4healthuol/cardiolab • Science 36%

This is the official repository for CardioLab, a machine and deep learning framework for the estimation and monitoring of laboratory abnormalities through ECG data.

Updated 9 months ago

https://github.com/ai4healthuol/mds-ed • Science 49%

Repository for the paper 'MDS-ED: Multimodal Decision Support in the Emergency Department – a benchmark dataset based on MIMIC-IV'.

Updated 9 months ago

sutd-trafficqa • Science 41%

[CVPR2021] SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events

Updated 9 months ago

asaca-automatic-speech-analysis-for-cognitive-assessment • Science 44%

An automatic system that extracts PRAAT-like speech features from raw speech WAV files and, at the same time, produces high-quality transcriptions with low WER (<10).

Updated 9 months ago

xrayglm • Science 54%

🩺 The first Chinese multimodal medical large model that can read chest X-rays | The first Chinese medical multimodal model for chest radiograph summarization.

Updated 9 months ago

https://github.com/awslabs/guidance-for-multi-omics-and-multi-modal-data-integration-and-analysis-on-aws • Science 26%

This guidance creates a scalable environment in AWS to prepare genomic, clinical, mutation, expression and imaging data for large-scale analysis and perform interactive queries against a data lake. The solution also demonstrates the use of Amazon Omics for multi-modal analysis.

Updated 9 months ago

NeMo • Science 44%

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Updated 9 months ago

jamie • Science 36%

Joint variational Autoencoders for Multimodal Imputation and Embedding (JAMIE)

Updated 9 months ago

https://github.com/alleninstitute/biomolvec • Science 13%

Notebooks and scripts used for the Nautilex Hackathon