Updated 9 months ago
cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Updated 9 months ago
https://github.com/buaadreamer/qwen2-vl-history
A case study of fine-tuning Qwen2-VL with LLaMA-Factory for the culture and tourism domain (historical literature and museums).
Updated 9 months ago
https://github.com/buaadreamer/chinese-llava-med
A Chinese medical multimodal large model: Large Chinese Language-and-Vision Assistant for BioMedicine.
Updated 9 months ago
awesome-llms-meet-multimodal-generation
🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D, and audio).
Updated 9 months ago
https://github.com/buaadreamer/mllm-finetuning-demo
Example code for fine-tuning multimodal large language models with LLaMA-Factory.
Updated 9 months ago
spatialfusion-lm
SpatialFusion-LM is a real-time spatial reasoning framework that combines neural depth, 3D reconstruction, and language-driven scene understanding.