Projects

Updated 10 months ago

lmms-finetune • Rank 7.1 • Science 54%

A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v, etc.

finetuning foundation-models instruction-tuning large-language-model large-multimodal-models llava llava-next multimodal multimodal-large-language-models qwen-vl vision-language visual-instruction-tuning

Updated 10 months ago

vlm-captioning-tools • Rank 3.7 • Science 44%

Python scripts to use for captioning images with VLMs

cogvlm image-captioning llama3 llm mistral moondream text-summarization vision-language vlm

Updated 10 months ago

vision-ai-checkup • Rank 5.4 • Science 26%

Take your LLM to the optometrist.

llm llm-benchmarking vision-language vision-language-model vlm

Updated 9 months ago

https://github.com/ahwang16/grounded-intuition-gpt-vision • Rank 1.6 • Science 20%

Resources for Grounded Intuition of GPT-Vision's Abilities with Scientific Images

cv gpt-4 grounded-theory hci images llms nlp qualitative-analysis thematic-analysis vision-language

Updated 10 months ago

text2earth • Science 67%

[IEEE GRSM 2025 🔥] "Text2Earth: Unlocking Text-driven Remote Sensing Image Generation with a Global-Scale Dataset and a Foundation Model"

foundation-models image-generation remote-sensing vision-language

Updated 10 months ago

awesome-japanese-llm • Science 75%

Overview of Japanese LLMs

foundation-models generative-ai generative-model generative-models japanese japanese-language japanese-language-model japanese-llm language-model language-models large-language-model large-language-models llm llm-japanese llms multimodal vision-and-language vision-language vision-language-model

Updated 9 months ago

https://github.com/amazon-science/mix-generation • Science 10%

MixGen: A New Multi-Modal Data Augmentation

data-augmentation data-efficiency multimodal pretraining vision-language

Updated 9 months ago

https://github.com/chen-yang-liu/awesome-rs-spatiotemporal-vlms • Science 49%

🔥Remote Sensing Spatio-Temporal Vision-Language Models: A Comprehensive Survey

change-detetion foundation-models large-language-models remote-sensing spatio-temporal-analysis vision-language

Updated 9 months ago

https://github.com/bytedance/shot2story • Science 10%

A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.

benchmark dataset large-language-models research video-captioning video-language video-language-pretraining video-question-answering video-story video-story-generation video-summarization vision-language

Updated 10 months ago

drivelm • Science 54%

[ECCV 2024 Oral] DriveLM: Driving with Graph Visual Question Answering

autonomous-driving chain-of-thought graph-of-thoughts large-language-models llm prompt-engineering prompting tree-of-thoughts vision-language
