
[ECCV] HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning

https://github.com/controlnet/hydra

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, springer.com
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.9%) to scientific vocabulary
Last synced: 6 months ago

Repository

[ECCV] HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning

Basic Info
  • Host: GitHub
  • Owner: ControlNet
  • License: apache-2.0
  • Language: Python
  • Default Branch: master
  • Homepage: https://hydra-vl4ai.github.io
  • Size: 51.3 MB
Statistics
  • Stars: 17
  • Watchers: 3
  • Forks: 5
  • Open Issues: 0
  • Releases: 3
Created over 1 year ago · Last pushed 8 months ago
Metadata Files
Readme License Citation

README.md

HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning

This is the code for the paper HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning, accepted at ECCV 2024 [Project Page]. We have released the code that uses reinforcement learning (DQN) to fine-tune the LLM. 🔥🔥🔥

Release

  • [2025/02/11] 🤖 HYDRA with RL is released.
  • [2024/08/05] 🚀 PYPI package is released.
  • [2024/07/29] 🔥 HYDRA is open sourced in GitHub.

TODOs

We are aware that gpt-3.5-turbo-0613 is deprecated and that gpt-3.5 is being replaced by gpt-4o-mini, so we will release another version of HYDRA.

As of July 2024, gpt-4o-mini should be used in place of gpt-3.5-turbo, as it is cheaper, more capable, multimodal, and just as fast (OpenAI API page).

We also notice that the embedding model has been updated by OpenAI, as shown in this link. Due to the uncertainty of the embedding model updates from OpenAI, we suggest you train a new version of the RL controller yourself and update the RL models.

  • [x] GPT-4o-mini replacement.
  • [x] LLaMA3.1 (ollama) replacement.
  • [x] Gradio Demo.
  • [x] GPT-4o version.
  • [x] HYDRA with RL (DQN).
  • [ ] HYDRA with Deepseek R1.

https://github.com/user-attachments/assets/39a897ab-d457-49d2-8527-0d6fe3a3b922

Installation

Requirements

  • Python >= 3.10
  • conda

Please follow the instructions below to install the required packages and set up the environment.

1. Clone this repository.

```bash
git clone https://github.com/ControlNet/HYDRA
```

2. Setup conda environment and install dependencies.

Option 1: Using pixi (recommended):

```bash
pixi install
pixi shell
```

Option 2: Building from source:

```bash
bash -i build_env.sh
```

If you run into errors, please consider going through the build_env.sh file and installing the packages manually.

3. Configure the environments

Edit the .env file, or set the variables in your shell, to configure the environment.

```
OPENAI_API_KEY=your-api-key  # if you want to use OpenAI LLMs
OLLAMA_HOST=http://ollama.server:11434  # if you want to use your Ollama server for llama or deepseek

# do not change this TORCH_HOME variable
TORCH_HOME=./pretrained_models
```
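If you prefer the CLI route, the same variables can be exported in your shell before launching HYDRA. A minimal sketch, with placeholder values you would substitute:

```shell
# Placeholder values -- substitute your own API key and Ollama host.
export OPENAI_API_KEY="your-api-key"             # only needed for OpenAI LLMs
export OLLAMA_HOST="http://ollama.server:11434"  # only needed for an Ollama backend
export TORCH_HOME="./pretrained_models"          # keep this path as-is
```

Exported variables only persist for the current shell session, so the .env file is the more durable option.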

4. Download the pretrained models

Run the scripts to download the pretrained models to the ./pretrained_models directory.

```bash
python -m hydra_vl4ai.download_model --base_config <EXP-CONFIG-DIR> --model_config <MODEL-CONFIG-PATH>
```

For example:

```bash
python -m hydra_vl4ai.download_model --base_config ./config/okvqa.yaml --model_config ./config/model_config_1gpu.yaml
```

Inference

A worker is required to run the inference.

```bash
python -m hydra_vl4ai.executor --base_config <EXP-CONFIG-DIR> --model_config <MODEL-CONFIG-PATH>
```

Inference with a given image and prompt

```bash
python demo_cli.py \
    --image <IMAGE_PATH> \
    --prompt <PROMPT> \
    --base_config <YOUR-CONFIG-DIR> \
    --model_config <MODEL-PATH>
```

Inference with Gradio GUI

```bash
python demo_gradio.py \
    --base_config <YOUR-CONFIG-DIR> \
    --model_config <MODEL-PATH>
```


Inference dataset

```bash
python main.py \
    --data_root <YOUR-DATA-ROOT> \
    --base_config <YOUR-CONFIG-DIR> \
    --model_config <MODEL-PATH>
```

The inference results are then saved in the ./result directory for evaluation.
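The result files are JSON Lines (one JSON object per line, as the .jsonl extension in the evaluation example suggests). For a quick sanity check before running the evaluation, a minimal loader sketch, assuming nothing about the field names of each record:

```python
import json

def read_results(path: str) -> list[dict]:
    """Load a JSON Lines result file into a list of record dicts,
    skipping any blank lines."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]
```

For example, `records = read_results("result/result_okvqa.jsonl")` followed by `print(len(records), records[0])` shows how many samples were processed and what fields each record carries.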

Evaluation

```bash
python evaluate.py <RESULT_JSON_PATH> <DATASET_NAME>
```

For example:

```bash
python evaluate.py result/result_okvqa.jsonl okvqa
```

Training the Controller with RL (DQN)

```bash
python train.py \
    --data_root <IMAGE_PATH> \
    --base_config <YOUR-CONFIG-DIR> \
    --model_config <MODEL-PATH> \
    --dqn_config <YOUR-DQN-CONFIG-DIR>
```

For example:

```bash
python train.py \
    --data_root ../coco2014 \
    --base_config ./config/okvqa.yaml \
    --model_config ./config/model_config_1gpu.yaml \
    --dqn_config ./config/dqn_debug.yaml
```

Citation

```bibtex
@inproceedings{ke2024hydra,
  title={HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning},
  author={Ke, Fucai and Cai, Zhixi and Jahangard, Simindokht and Wang, Weiqing and Haghighi, Pari Delir and Rezatofighi, Hamid},
  booktitle={European Conference on Computer Vision},
  year={2024},
  organization={Springer},
  doi={10.1007/978-3-031-72661-3_8},
  isbn={978-3-031-72661-3},
  pages={132--149},
}
```

Acknowledgements

Some code and prompts are based on cvlab-columbia/viper.

Owner

  • Name: ControlNet
  • Login: ControlNet
  • Kind: user

Study on: Computer Vision | Artificial Intelligence

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you find this work useful in your research, please cite it."
preferred-citation:
  type: conference-paper
  title: "HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning"
  authors:
  - family-names: "Ke"
    given-names: "Fucai"
  - family-names: "Cai"
    given-names: "Zhixi"
  - family-names: "Jahangard"
    given-names: "Simindokht"
  - family-names: "Wang"
    given-names: "Weiqing"
  - family-names: "Haghighi"
    given-names: "Pari Delir"
  - family-names: "Rezatofighi"
    given-names: "Hamid"
  collection-title: "European Conference on Computer Vision"
  year: 2024
  start: 132
  end: 149
  doi: 10.1007/978-3-031-72661-3_8

GitHub Events

Total
  • Create event: 3
  • Release event: 2
  • Issues event: 9
  • Watch event: 6
  • Delete event: 1
  • Issue comment event: 13
  • Push event: 39
  • Pull request review comment event: 133
  • Pull request review event: 134
  • Pull request event: 6
  • Fork event: 5
Last Year
  • Create event: 3
  • Release event: 2
  • Issues event: 9
  • Watch event: 6
  • Delete event: 1
  • Issue comment event: 13
  • Push event: 39
  • Pull request review comment event: 133
  • Pull request review event: 134
  • Pull request event: 6
  • Fork event: 5

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 6
  • Total pull requests: 2
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 3 days
  • Total issue authors: 5
  • Total pull request authors: 2
  • Average comments per issue: 0.83
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 6
  • Pull requests: 2
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 3 days
  • Issue authors: 5
  • Pull request authors: 2
  • Average comments per issue: 0.83
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • hu-my (2)
  • period248650 (1)
  • Solus-sano (1)
  • YanhuiS (1)
  • HoangLayor (1)
Pull Request Authors
  • GreyElaina (1)
  • ControlNet (1)
  • HoangLayor (1)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

module_repos/Grounded-Segment-Anything/Dockerfile docker
  • pytorch/pytorch 1.13.1-cuda11.6-cudnn8-devel build
module_repos/LLaVA/.devcontainer/Dockerfile docker
  • mcr.microsoft.com/devcontainers/base ubuntu-20.04 build
module_repos/GLIP/requirements.txt pypi
  • Jinja2 ==3.1.2
  • Markdown ==3.4.1
  • MarkupSafe ==2.1.1
  • Pillow ==9.2.0
  • Pillow-SIMD ==9.0.0.post1
  • PyNaCl ==1.5.0
  • PyWavelets ==1.4.1
  • PyYAML ==6.0
  • Pygments ==2.13.0
  • Shapely ==1.8.4
  • Werkzeug ==2.2.2
  • absl-py ==1.2.0
  • aiohttp ==3.8.3
  • aiosignal ==1.2.0
  • anyio ==3.6.1
  • asttokens ==2.0.8
  • async-timeout ==4.0.2
  • attrs ==22.1.0
  • av ==9.2.0
  • backcall ==0.2.0
  • bcrypt ==4.0.0
  • cachetools ==5.2.0
  • certifi ==2022.9.14
  • cffi ==1.15.1
  • charset-normalizer ==2.1.1
  • click ==8.1.3
  • cloudpickle ==2.2.0
  • configobj ==5.0.6
  • contourpy ==1.0.5
  • cryptography ==38.0.1
  • cycler ==0.11.0
  • cytoolz ==0.12.0
  • debugpy ==1.6.3
  • decorator ==5.1.1
  • decord ==0.6.0
  • easydict ==1.10
  • einops ==0.4.1
  • entrypoints ==0.4
  • executing ==1.1.0
  • fairscale ==0.4.12
  • fastapi ==0.85.0
  • ffmpy ==0.3.0
  • filelock ==3.8.0
  • fonttools ==4.37.3
  • frozenlist ==1.3.1
  • fsspec ==2022.8.2
  • ftfy ==6.1.1
  • google-auth ==2.12.0
  • google-auth-oauthlib ==0.4.6
  • gradio ==3.4.0
  • grpcio ==1.49.1
  • h11 ==0.12.0
  • httpcore ==0.15.0
  • httpx ==0.23.0
  • huggingface-hub ==0.9.1
  • idna ==3.4
  • imageio ==2.22.1
  • importlib-metadata ==5.0.0
  • inflect ==6.0.0
  • ipdb ==0.13.9
  • ipykernel ==6.16.0
  • ipython ==8.5.0
  • jedi ==0.18.1
  • joblib ==1.2.0
  • jupyter-core ==4.11.1
  • jupyter_client ==7.3.5
  • kiwisolver ==1.4.4
  • linkify-it-py ==1.0.3
  • lmdb ==1.3.0
  • lz4 ==4.0.2
  • markdown-it-py ==2.1.0
  • matplotlib ==3.6.0
  • matplotlib-inline ==0.1.6
  • mdit-py-plugins ==0.3.1
  • mdurl ==0.1.2
  • msgpack ==1.0.4
  • msgpack-numpy ==0.4.8
  • multidict ==6.0.2
  • nest-asyncio ==1.5.6
  • networkx ==2.8.7
  • nltk ==3.7
  • numpy ==1.23.3
  • oauthlib ==3.2.1
  • opencv-python ==4.6.0.66
  • orjson ==3.8.0
  • packaging ==21.3
  • pandas ==1.5.0
  • paramiko ==2.11.0
  • parso ==0.8.3
  • pexpect ==4.8.0
  • pickleshare ==0.7.5
  • prettytable ==3.4.1
  • prompt-toolkit ==3.0.31
  • protobuf ==3.19.6
  • psutil ==5.9.2
  • ptyprocess ==0.7.0
  • pure-eval ==0.2.2
  • pyasn1 ==0.4.8
  • pyasn1-modules ==0.2.8
  • pycocotools ==2.0.5
  • pycparser ==2.21
  • pycryptodome ==3.15.0
  • pydantic ==1.10.2
  • pydub ==0.25.1
  • pymongo ==4.2.0
  • pyparsing ==3.0.9
  • python-dateutil ==2.8.2
  • python-multipart ==0.0.5
  • pytz ==2022.4
  • pyzmq ==24.0.1
  • regex ==2022.9.13
  • requests ==2.28.1
  • requests-oauthlib ==1.3.1
  • rfc3986 ==1.5.0
  • rsa ==4.9
  • ruamel.yaml ==0.17.21
  • ruamel.yaml.base ==0.3.0
  • ruamel.yaml.clib ==0.2.6
  • ruamel.yaml.cmd ==0.6.3
  • ruamel.yaml.convert ==0.3.2
  • sacremoses ==0.0.53
  • scikit-image ==0.19.3
  • scipy ==1.9.1
  • six ==1.16.0
  • sniffio ==1.3.0
  • stack-data ==0.5.1
  • starlette ==0.20.4
  • tensorboard ==2.10.1
  • tensorboard-data-server ==0.6.1
  • tensorboard-plugin-wit ==1.8.1
  • tensorboardX ==2.5.1
  • tifffile ==2022.8.12
  • timm ==0.6.7
  • tokenizers ==0.10.3
  • toml ==0.10.2
  • toolz ==0.12.0
  • tornado ==6.2
  • tqdm ==4.64.1
  • traitlets ==5.4.0
  • transformers ==4.11.3
  • typing_extensions ==4.3.0
  • uc-micro-py ==1.0.1
  • ujson ==5.5.0
  • urllib3 ==1.26.12
  • uvicorn ==0.18.3
  • wcwidth ==0.2.5
  • websockets ==10.3
  • yacs ==0.1.8
  • yarl ==1.8.1
  • zipp ==3.9.0
module_repos/GLIP/setup.py pypi
module_repos/Grounded-Segment-Anything/GroundingDINO/pyproject.toml pypi
module_repos/Grounded-Segment-Anything/GroundingDINO/requirements.txt pypi
  • addict *
  • numpy *
  • opencv-python *
  • pycocotools *
  • supervision *
  • timm *
  • torch *
  • torchvision *
  • transformers *
  • yapf *
module_repos/Grounded-Segment-Anything/GroundingDINO/setup.py pypi
module_repos/Grounded-Segment-Anything/requirements.txt pypi
  • Pillow *
  • PyYAML *
  • addict *
  • diffusers *
  • fairscale *
  • gradio *
  • huggingface_hub *
  • litellm *
  • matplotlib *
  • nltk *
  • numpy *
  • onnxruntime *
  • opencv_python *
  • pycocotools *
  • requests *
  • setuptools *
  • supervision *
  • termcolor *
  • timm *
  • torch *
  • torchvision *
  • transformers *
  • yapf *
module_repos/Grounded-Segment-Anything/segment_anything/setup.py pypi
module_repos/Grounded-Segment-Anything/voxelnext_3d_box/requirements.txt pypi
  • easydict *
  • matplotlib *
  • numpy *
  • onnx *
  • onnxruntime *
  • opencv-python *
  • pycocotools *
  • pyyaml *
  • torch *
  • torchvision *
.github/workflows/release.yaml actions
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
  • marvinpinto/action-automatic-releases latest composite
  • pypa/gh-action-pypi-publish release/v1 composite
.github/workflows/lint_check.yaml actions
  • actions/checkout v4 composite
  • actions/setup-python v4 composite
  • matias-martini/flake8-pr-comments-action main composite
module_repos/GLIP/pyproject.toml pypi
module_repos/LLaVA/pyproject.toml pypi
  • einops-exts ==0.0.4
  • fastapi *
  • gradio *
  • gradio_client *
  • markdown2 [all]
  • scikit-learn ==1.2.2
  • sentencepiece ==0.1.99
  • shortuuid *
  • uvicorn *
pyproject.toml pypi
requirements.txt pypi
  • accelerate *
  • bitsandbytes *
  • chardet *
  • fastapi *
  • ftfy *
  • gdown *
  • gradio >=5.0,<5.12
  • nltk *
  • numpy <2
  • ollama *
  • openai *
  • opencv-python *
  • pillow *
  • prettytable *
  • pycocotools *
  • pydantic *
  • python-dotenv *
  • python_dateutil *
  • requests *
  • rich *
  • scipy *
  • starlette *
  • tensorneko_util >=0.3.21,<0.4.0
  • timm ==0.9.16
  • tokenizers *
  • torch *
  • torchmetrics *
  • transformers *
  • uvicorn *
  • websockets *
  • word2number *
  • yacs *
setup.py pypi