Projects | Open Source Science

Updated 11 months ago

mmpretrain • Rank 23.1 • Science 64%

OpenMMLab Pre-training Toolbox and Benchmark

beit clip constrastive-learning convnext deep-learning image-classification mae masked-image-modeling mobilenet moco multimodal pretrained-models pytorch resnet self-supervised-learning swin-transformer vision-transformer

Engineering (40%)

Updated 10 months ago

bentoml • Rank 26.1 • Science 54%

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

ai-inference deep-learning generative-ai inference-platform llm llm-inference llm-serving llmops machine-learning ml-engineering mlops model-inference-service model-serving multimodal python

Updated 10 months ago

uform • Rank 15.6 • Science 64%

Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️

bert clip clustering contrastive-learning cross-attention huggingface-transformers image-search language-vision llava multi-lingual multimodal neural-network openai openclip pretrained-models pytorch representation-learning semantic-search transformer vector-search

Updated 10 months ago

swarms • Rank 22.4 • Science 54%

The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework. Website: https://swarms.ai

agents ai artificial-intelligence attention-mechanism chatgpt gpt4 gpt4all huggingface langchain langchain-python machine-learning multi-modal-imaging multi-modality multimodal prompt-engineering prompt-toolkit prompting swarms transformer-models tree-of-thoughts

Updated 10 months ago

https://github.com/bytedance/ui-tars-desktop • Rank 26.3 • Science 46%

The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra

agent agent-tars browser-use computer-use gui-agent gui-operator mcp mcp-server multimodal tars ui-tars vision vlm

Updated 10 months ago

awesome-mmps • Rank 4.7 • Science 67%

Corpus of resources for multimodal machine learning with physiological signals (mmps).

biosignals deep-learning machine-learning multimodal multimodal-data multimodal-deep-learning multimodal-learning physiological-signals signal-processing wearable wearable-devices

Updated 10 months ago

https://github.com/rerun-io/rerun • Rank 31.1 • Science 36%

Visualize streams of multimodal data. Free, fast, easy to use, and simple to integrate. Built in Rust.

computer-vision cpp multimodal python robotics rust visualization

Updated 10 months ago

biotrove • Rank 5.8 • Science 59%

NeurIPS 2024 Track on Datasets and Benchmarks (Spotlight)

animals clip image-classification multimodal rare-species species taxonomy zero-shot-classification

Updated 11 months ago

maestro • Rank 10.5 • Science 54%

Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL

captioning fine-tuning florence-2 multimodal objectdetection paligemma phi-3-vision qwen2-vl transformers vision-and-language vqa

Mathematics (40%)

Updated 10 months ago

autodistill • Rank 19.6 • Science 44%

Images to inference with no labeling (use foundation models to train supervised models).

auto-labeling computer-vision deep-learning foundation-models grounding-dino image-annotation image-classification instance-segmentation labeling-tool machine-learning model-distillation multimodal object-detection pytorch segment-anything yolov5 yolov8

Updated 10 months ago

lmms-finetune • Rank 7.1 • Science 54%

A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v etc.

finetuning foundation-models instruction-tuning large-language-model large-multimodal-models llava llava-next multimodal multimodal-large-language-models qwen-vl vision-language visual-instruction-tuning

Updated 10 months ago

ppdiffusers • Rank 19.3 • Science 36%

Paddle Multimodal Integration and eXploration, supporting mainstream multimodal tasks, including end-to-end large-scale multimodal pretrained models and a diffusion model toolbox. Equipped with high performance and flexibility.

aigc clip controlnet deepseek-vl dit eva-clip got-ocr20 image-to-text internvl2 llava minicpm-v multimodal ppdiffusers qwen2-vl sd-xl sora stable-diffusion stablevideodiffusion text-to-image text-to-video

Updated 10 months ago

llama_ros • Rank 7.2 • Science 44%

llama.cpp (GGUF LLMs) and llava.cpp (GGUF VLMs) for ROS 2

audio cpp embeddings ggml gguf gpt langchain llama llamacpp llava llavacpp llm multimodal rerank reranking ros2 vlm

Updated 10 months ago

marqo-fashionclip • Rank 6.4 • Science 44%

State-of-the-art CLIP/SigLIP embedding models finetuned for the fashion domain. +57% increase in evaluation metrics vs FashionCLIP 2.0.

clip embeddings fashion-classifier fashionclip informationretrieval multimodal recomendations search transformers vectorsearch vision-transformer

Updated 10 months ago

lrv-instruction • Rank 5.7 • Science 41%

[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

chatgpt evaluation evaluation-metrics foundation-models gpt gpt-4 hallucination iclr iclr2024 llama llava multimodal object-detection prompt-engineering vicuna vision vision-and-language vqa

Updated 9 months ago

https://github.com/hci-lab-um/cactus • Rank 5.4 • Science 36%

Constraint-free multi-modal Access to Communication Technology for Users with Severe motor impairments. This proposal pushes the state of the art through novel eye-tracking interaction patterns along with the introduction of secondary input modalities to improve throughput and usability.

assistive-technology browser eye-tracking multimodal

Updated 10 months ago

https://github.com/alleninstitute/coupledae-patchseq • Rank 2.4 • Science 36%

Multimodal data alignment and cell type analysis with coupled autoencoders.

autoencoders celltypes multimodal patchseq representation-learning

Updated 10 months ago

https://github.com/predict-idlab/tsflex • Rank 7.9 • Science 23%

Flexible time series feature extraction & processing

data-science feature-engineering feature-extraction multimodal multivariate pandas processing python time-series window-stride

Updated 10 months ago

https://github.com/aisuko/notebooks • Rank 4.6 • Science 26%

Implementations of various ML tasks on the Kaggle platform with GPUs.

accelerator computer-vision fine-tuning kaggle large-language-models multimodal natural-language-processing neural-network peft pytorch quantization renforcement-learning tensorboard transformers visulization wandb

Updated 10 months ago

flair-2 • Rank 3.5 • Science 26%

Engage in a semantic segmentation challenge for land cover description using multimodal remote sensing earth observation data, delving into real-world scenarios with a dataset comprising 70,000+ aerial imagery patches and 50,000 Sentinel-2 satellite acquisitions.

computer-vision cookiecutter-template deep-learning deeplearning image-processing lightning multiclass-segmentation multimodal multimodal-deep-learning pytorch pytorch-lightning sentinel-2 test-time-augmentation timm tta wandb

Updated 10 months ago

jamie • Science 36%

Joint variational Autoencoders for Multimodal Imputation and Embedding (JAMIE)

autoencoder imputation integration multimodal variational variational-autoencoder

Updated 10 months ago

asaca-automatic-speech-analysis-for-cognitive-assessment • Science 26%

Transform speech into cognitive assessments with ASACA. Achieve accurate predictions and low error rates using our end-to-end toolkit. 🚀🔧

ai classification deep-learning feature-engineering feature-extraction machine-learning multimodal praat python python-script shap speech speech-analysis speech-and-language-processing speech-to-text training wav2vec2 wav2vec2ctc

Updated 10 months ago

https://github.com/ai4healthuol/mds-ed • Science 49%

Repository for the paper 'MDS-ED: Multimodal Decision Support in the Emergency Department – a benchmark dataset based on MIMIC-IV'.

benchmark datasets deep-learning ecg healthcare medical-dataset multimodal waveforms

Updated 10 months ago

https://github.com/autodistill/autodistill-kosmos-2 • Science 13%

Kosmos-2 base model for use with Autodistill.

computer-vision kosmos2 multimodal object-detection

Updated 10 months ago

https://github.com/amazon-science/mix-generation • Science 10%

MixGen: A New Multi-Modal Data Augmentation

data-augmentation data-efficiency multimodal pretraining vision-language

Updated 10 months ago

asaca-automatic-speech-analysis-for-cognitive-assessment • Science 44%

An automatic system that extracts PRAAT-like speech features from raw speech WAV files while also producing high-quality, low-WER (<10) transcriptions.

ai classification deep-learning feature-engineering feature-extraction machine-learning multimodal natural-language-processing praat python python-script shap speech speech-analysis speech-and-language-processing speech-to-text training wav2vec2 wav2vec2ctc

Updated 10 months ago

NeMo • Science 44%

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

asr deeplearning generative-ai large-language-models machine-translation multimodal neural-networks speaker-diariazation speaker-recognition speech-synthesis speech-translation tts

Mathematics (40%)

Updated 10 months ago

xrayglm • Science 54%

🩺 The first Chinese multimodal medical large model that can read chest X-rays | The first Chinese medical multimodal model for chest radiograph summarization.

large-language-models llms medical multimodal visualglm-6b xray

Updated 10 months ago

https://github.com/alleninstitute/biomolvec • Science 13%

Notebooks and scripts used for the Nautilex Hackathon

foundation-models generative-model genes multimodal

Updated 10 months ago

2d3mf • Science 36%

Code and models for the paper "2D3MF: Deepfake Detection using Multi Modal Middle Fusion"

audio deep-learning deepfake-detection machine-learning multimodal pytorch video

Updated 10 months ago

sutd-trafficqa • Science 41%

[CVPR2021] SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events

annotations cvpr cvpr2021 dataset multimodal multimodal-deep-learning paper traffic-events video-qa video-reasoning vqa vqa-dataset

Updated 10 months ago

https://github.com/awslabs/guidance-for-multi-omics-and-multi-modal-data-integration-and-analysis-on-aws • Science 26%

This guidance creates a scalable environment in AWS to prepare genomic, clinical, mutation, expression and imaging data for large-scale analysis and perform interactive queries against a data lake. The solution also demonstrates the use of Amazon Omics for multi-modal analysis.

aws life-sciences multimodal multimodality

Updated 10 months ago

visualwebarena • Science 36%

VisualWebArena is a benchmark for multimodal agents.

agents llm multimodal

Updated 10 months ago

https://github.com/aehrc/imageclefmedical_caption_23 • Science 13%

MedICap: Code for the participation of team CSIRO at the ImageCLEFmedical Caption task of 2023.

image-captioning medical-image-captioning medical-imaging multimodal multimodal-learning report-generation

Updated 10 months ago

nemo • Science 13%

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

asr deeplearning generative-ai large-langage-models machine-translation multimodal neural-networks speaker-diariazation speaker-recognition speech-synthesis speech-translation tts

Updated 10 months ago

https://github.com/ai4healthuol/cardiolab • Science 36%

This is the official repository for CardioLab, a machine and deep learning framework for the estimation and monitoring of laboratory abnormalities through ECG data.

deep-learning ecg ecg-classification haematology laboratory-analysis multimodal patient-monitoring

Updated 10 months ago

https://github.com/aehrc/cvt2distilgpt2 • Science 49%

Improving Chest X-Ray Report Generation by Leveraging Warm-Starting

chest-xray-imaging distilgpt2 gpt-2 huggingface-transformers image-captioning medical-image-analysis mimic-cxr multimodal multimodal-deep-learning pytorch pytorch-lightning vision-transformer

Updated 10 months ago

https://github.com/buaadreamer/chinese-llava-med • Science 13%

Chinese medical multimodal large model: Large Chinese Language-and-Vision Assistant for BioMedicine

ai chinese gpt4v huggingface-datasets llama-factory llava medical minigpt4 mllm multimodal qwen1-5 transformers

Updated 10 months ago

awesome-japanese-llm • Science 75%

Overview of Japanese LLMs

foundation-models generative-ai generative-model generative-models japanese japanese-language japanese-language-model japanese-llm language-model language-models large-language-model large-language-models llm llm-japanese llms multimodal vision-and-language vision-language vision-language-model