Updated 6 months ago
spatialfusion-lm
SpatialFusion-LM is a real-time spatial reasoning framework that combines neural depth, 3D reconstruction, and language-driven scene understanding.
Updated 6 months ago
hourvideo
[NeurIPS 2024] Official code for HourVideo: 1-Hour Video Language Understanding
1-hour-video-language-understanding
benchmark-dataset
egocentric-videos
evals
gemini-pro
gpt-4
long-context-understanding
long-form-video-language-understanding
multimodal-large-language-models
multiple-choice-questions
navigation
neurips-2024
perception
reasoning
spatial-intelligence
summarization
video-language-understanding
visual-reasoning