https://github.com/cv516buaa/udl

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.0%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: cv516Buaa
  • Language: Python
  • Default Branch: main
  • Size: 8.64 MB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 2 years ago · Last pushed almost 2 years ago
Metadata Files
Readme

README.md

UDL: Open Vocabulary Object Detection with LLM-based Unified Descriptive Language


Chunlei Wang · Wenquan Feng · Binghao Liu · Meng Li · Lijiang Chen · Qi Zhao

Highlight!!!!

UDL: Open Vocabulary Object Detection with LLM-based Unified Descriptive Language

Abstract

With the rapid development of vision-language approaches, a growing body of work focuses on the open vocabulary learning paradigm. These methods support a variety of vision-language tasks by aligning region features with language embeddings, which improves recognition of novel categories. However, existing methods neglect the diversity of input language across tasks, which makes it difficult for the model to understand textual context and leaves it heavily reliant on category names to detect objects. Capturing fine-grained features in images and text descriptions is also a challenge. To address these issues, we propose Open Vocabulary Object Detection with LLM-based Unified Descriptive Language (UDL), which uses Hierarchical Gated Cross Attention (HGCA) and Pixel-level Visual Language Attention (PVLA) for more comprehensive contextual understanding and better visual-language alignment. On the OmniLabel object detection benchmark, under the zero-shot detection setting, our approach improves open vocabulary object detection and achieves new SOTA results. Ablation studies and visualization experiments demonstrate the effectiveness of the proposed components. Code will be made publicly available at https://github.com/cv516Buaa/UDL.
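The region-to-language alignment mentioned in the abstract can be illustrated with a generic sketch. This is not the UDL model and does not implement HGCA or PVLA; it only shows the common open vocabulary idea of scoring each region feature against text embeddings by cosine similarity. The function names, the temperature value, and the toy 3-dimensional vectors are all assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_regions(region_feats, text_embeds, descriptions, temperature=0.07):
    """Assign each region the description whose embedding is most similar.

    Hypothetical helper: in a real detector, region_feats would come from a
    visual backbone and text_embeds from a language encoder.
    """
    text_unit = [l2_normalize(t) for t in text_embeds]
    results = []
    for feat in region_feats:
        unit = l2_normalize(feat)
        # Cosine similarity is the dot product of unit-length vectors.
        sims = [sum(a * b for a, b in zip(unit, t)) for t in text_unit]
        probs = softmax([s / temperature for s in sims])
        best = max(range(len(descriptions)), key=probs.__getitem__)
        results.append((descriptions[best], probs[best]))
    return results


# Toy data (made up): two free-form descriptions and two region features.
descriptions = ["a red car", "a sleeping dog"]
text_embeds = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
region_feats = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]]

predictions = classify_regions(region_feats, text_embeds, descriptions)
for description, prob in predictions:
    print(description, round(prob, 3))
```

Because the text side is an embedding rather than a fixed class index, the same scoring works for descriptions that were never seen as training labels, which is what makes a zero-shot setting possible.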

TODO

  • [x] Release demo
  • [x] Release checkpoints
  • [ ] Release training and inference code

Checkpoints

teaser

Citation

If you have any questions, please email wcl_buaa@buaa.edu.cn.

Owner

  • Name: cv516Buaa
  • Login: cv516Buaa
  • Kind: user
  • Location: Beijing, China
  • Company: Beihang University

Pattern Recognition and Artificial Intelligence Group · Prof. Qi Zhao & Lijiang Chen · Dr. Shuchang Lyu & Binghao Liu & Chunlei Wang