Updated 10 months ago

uform • Rank 15.6 • Science 64%

Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️

Updated 10 months ago

lmms-finetune • Rank 7.1 • Science 54%

A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v etc.

Updated 10 months ago

ppdiffusers • Rank 19.3 • Science 36%

Paddle Multimodal Integration and eXploration, supporting mainstream multimodal tasks, including end-to-end large-scale multimodal pretrained models and a diffusion model toolbox, with high performance and flexibility.

Updated 10 months ago

llama_ros • Rank 7.2 • Science 44%

llama.cpp (GGUF LLMs) and llava.cpp (GGUF VLMs) for ROS 2

Updated 9 months ago

https://github.com/buaadreamer/mllm-finetuning-demo • Science 13%

Example code for fine-tuning multimodal large language models with LLaMA-Factory

Updated 9 months ago

https://github.com/buaadreamer/chinese-llava-med • Science 13%

Chinese medical multimodal large model (Large Chinese Language-and-Vision Assistant for BioMedicine)

Updated 10 months ago

spn4cir • Science 54%

[ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives