Projects

Updated 10 months ago

uform • Rank 15.6 • Science 64%

Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️

bert clip clustering contrastive-learning cross-attention huggingface-transformers image-search language-vision llava multi-lingual multimodal neural-network openai openclip pretrained-models pytorch representation-learning semantic-search transformer vector-search

Updated 10 months ago

lmms-finetune • Rank 7.1 • Science 54%

A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v etc.

finetuning foundation-models instruction-tuning large-language-model large-multimodal-models llava llava-next multimodal multimodal-large-language-models qwen-vl vision-language visual-instruction-tuning

Updated 10 months ago

ppdiffusers • Rank 19.3 • Science 36%

Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high performance and flexibility.

aigc clip controlnet deepseek-vl dit eva-clip got-ocr20 image-to-text internvl2 llava minicpm-v multimodal ppdiffusers qwen2-vl sd-xl sora stable-diffusion stablevideodiffusion text-to-image text-to-video

Updated 10 months ago

llama_ros • Rank 7.2 • Science 44%

llama.cpp (GGUF LLMs) and llava.cpp (GGUF VLMs) for ROS 2

audio cpp embeddings ggml gguf gpt langchain llama llamacpp llava llavacpp llm multimodal rerank reranking ros2 vlm

Updated 10 months ago

lrv-instruction • Rank 5.7 • Science 41%

[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

chatgpt evaluation evaluation-metrics foundation-models gpt gpt-4 hallucination iclr iclr2024 llama llava multimodal object-detection prompt-engineering vicuna vision vision-and-language vqa

Updated 9 months ago

https://github.com/buaadreamer/mllm-finetuning-demo • Science 13%

Example code for fine-tuning multimodal large language models with LLaMA-Factory

finetune-llm huggingface-datasets llama-factory llava lora mllm paligemma pretraining supervised-finetuning transformers yi-vl

Updated 9 months ago

https://github.com/buaadreamer/chinese-llava-med • Science 13%

Chinese medical multimodal large model: Large Chinese Language-and-Vision Assistant for BioMedicine

ai chinese gpt4v huggingface-datasets llama-factory llava medical minigpt4 mllm multimodal qwen1-5 transformers

Updated 10 months ago

spn4cir • Science 54%

[ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives

acmmm2024 blip blip2 clip composed-image-retrieval cross-modal-retrieval data-generation image-retrieval llama llava memory-bank multi-modal-retrieval multimodal-learning transformer

Updated 9 months ago

https://github.com/autodistill/autodistill-llava • Science 13%

LLaVA base model for use with Autodistill.

autodistill computer-vision llava multimodal-llm
