Projects

Updated 10 months ago

[CVPR2025] IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification

caption mllms multi-modal multi-modal-learning reid thermal-imaging

Updated 10 months ago

[CVPR2025] SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories

agent mllms segment-anything vlms

Updated 10 months ago

ACTIVE-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO

active-perception active-vision grpo mllms o3 rl thinking-with-image

Updated 10 months ago

Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration

grpo mllms omnimodal rl
