hydra
[ECCV] HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning
Science Score: 67.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ✓ DOI references: found 2 DOI reference(s) in README
- ✓ Academic publication links: links to arxiv.org, springer.com
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (9.9%) to scientific vocabulary
Repository
[ECCV] HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning
Basic Info
- Host: GitHub
- Owner: ControlNet
- License: apache-2.0
- Language: Python
- Default Branch: master
- Homepage: https://hydra-vl4ai.github.io
- Size: 51.3 MB
Statistics
- Stars: 17
- Watchers: 3
- Forks: 5
- Open Issues: 0
- Releases: 3
Metadata Files
README.md
HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning
This is the code for the paper "HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning", accepted at ECCV 2024 [Project Page]. We have released the code that uses reinforcement learning (DQN) to fine-tune the LLM. 🔥🔥🔥
Release
- [2025/02/11] 🤖 HYDRA with RL is released.
- [2024/08/05] 🚀 PYPI package is released.
- [2024/07/29] 🔥 HYDRA is open sourced in GitHub.
TODOs
We note that gpt-3.5-turbo-0613 is deprecated and gpt-3.5 will be replaced by gpt-4o-mini, so we will release another version of HYDRA. As of July 2024, gpt-4o-mini should be used in place of gpt-3.5-turbo, as it is cheaper, more capable, multimodal, and just as fast (OpenAI API page).
We also notice that OpenAI has updated its embedding model, as shown in this link. Given the uncertainty of OpenAI's embedding model updates, we suggest training a new version of the RL controller yourself and updating the RL models.
- [x] GPT-4o-mini replacement
- [x] LLaMA3.1 (ollama) replacement
- [x] Gradio demo
- [x] GPT-4o version
- [x] HYDRA with RL (DQN)
- [ ] HYDRA with DeepSeek R1
https://github.com/user-attachments/assets/39a897ab-d457-49d2-8527-0d6fe3a3b922
Installation
Requirements
- Python >= 3.10
- conda
Please follow the instructions below to install the required packages and set up the environment.
1. Clone this repository.
Bash
git clone https://github.com/ControlNet/HYDRA
2. Setup conda environment and install dependencies.
Option 1: Using pixi (recommended):
Bash
pixi install
pixi shell
Option 2: Building from source:
Bash
bash -i build_env.sh
If you encounter errors, consider going through the build_env.sh file and installing the packages manually.
3. Configure the environment variables
Edit the .env file, or set the variables in your shell, to configure the environment.
```
OPENAI_API_KEY=your-api-key             # if you want to use OpenAI LLMs
OLLAMA_HOST=http://ollama.server:11434  # if you want to use your Ollama server for llama or deepseek
# do not change this TORCH_HOME variable
TORCH_HOME=./pretrained_models
```
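As a sanity check, the `key=value` lines in `.env` can be parsed with a few lines of standard-library Python. This is an illustrative sketch only; HYDRA itself loads these variables via python-dotenv (one of its listed dependencies), and `load_env_file` is a hypothetical helper, not part of the package.

```python
import os

# Hypothetical minimal .env parser, for illustration only.
def load_env_file(text: str) -> dict:
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

sample = """# comment lines are skipped
OPENAI_API_KEY=sk-example
TORCH_HOME=./pretrained_models
"""
print(load_env_file(sample))
```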
4. Download the pretrained models
Run the scripts to download the pretrained models to the ./pretrained_models directory.
Bash
python -m hydra_vl4ai.download_model --base_config <EXP-CONFIG-DIR> --model_config <MODEL-CONFIG-PATH>
For example,
Bash
python -m hydra_vl4ai.download_model --base_config ./config/okvqa.yaml --model_config ./config/model_config_1gpu.yaml
Inference
A worker is required to run the inference.
Bash
python -m hydra_vl4ai.executor --base_config <EXP-CONFIG-DIR> --model_config <MODEL-CONFIG-PATH>
Inference with a given image and prompt
Bash
python demo_cli.py \
--image <IMAGE_PATH> \
--prompt <PROMPT> \
--base_config <YOUR-CONFIG-DIR> \
--model_config <MODEL-PATH>
Inference with Gradio GUI
Bash
python demo_gradio.py \
--base_config <YOUR-CONFIG-DIR> \
--model_config <MODEL-PATH>
Inference dataset
Bash
python main.py \
--data_root <YOUR-DATA-ROOT> \
--base_config <YOUR-CONFIG-DIR> \
--model_config <MODEL-PATH>
Then the inference results are saved in the ./result directory for evaluation.
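The result file is JSONL, i.e. one JSON object per line. A minimal reader can be sketched as below; the field names here are hypothetical and only illustrate the format, so check an actual result file for the real keys.

```python
import json

# Read a JSONL string into a list of dicts (one object per non-empty line).
def read_jsonl(text: str) -> list:
    return [json.loads(line) for line in text.splitlines() if line.strip()]

# Hypothetical records; real HYDRA outputs may use different field names.
sample = '{"question_id": 1, "answer": "cat"}\n{"question_id": 2, "answer": "dog"}\n'
records = read_jsonl(sample)
print(len(records))
```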
Evaluation
Bash
python evaluate.py <RESULT_JSON_PATH> <DATASET_NAME>
For example,
Bash
python evaluate.py result/result_okvqa.jsonl okvqa
Training Controller with RL(DQN)
Bash
python train.py \
--data_root <IMAGE_PATH> \
--base_config <YOUR-CONFIG-DIR> \
--model_config <MODEL-PATH> \
--dqn_config <YOUR-DQN-CONFIG-DIR>
For example,
Bash
python train.py \
--data_root ../coco2014 \
--base_config ./config/okvqa.yaml \
--model_config ./config/model_config_1gpu.yaml \
--dqn_config ./config/dqn_debug.yaml
Citation
bibtex
@inproceedings{ke2024hydra,
title={HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning},
author={Ke, Fucai and Cai, Zhixi and Jahangard, Simindokht and Wang, Weiqing and Haghighi, Pari Delir and Rezatofighi, Hamid},
booktitle={European Conference on Computer Vision},
year={2024},
organization={Springer},
doi={10.1007/978-3-031-72661-3_8},
isbn={978-3-031-72661-3},
pages={132--149},
}
Acknowledgements
Some code and prompts are based on cvlab-columbia/viper.
Owner
- Name: ControlNet
- Login: ControlNet
- Kind: user
- Website: controlnet.space
- Repositories: 30
- Profile: https://github.com/ControlNet
Study on: Computer Vision | Artificial Intelligence
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you find this work useful in your research, please cite it."
preferred-citation:
type: conference-paper
title: "HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning"
authors:
- family-names: "Ke"
given-names: "Fucai"
- family-names: "Cai"
given-names: "Zhixi"
- family-names: "Jahangard"
given-names: "Simindokht"
- family-names: "Wang"
given-names: "Weiqing"
- family-names: "Haghighi"
given-names: "Pari Delir"
- family-names: "Rezatofighi"
given-names: "Hamid"
collection-title: "European Conference on Computer Vision"
year: 2024
start: 132
end: 149
doi: 10.1007/978-3-031-72661-3_8
GitHub Events
Total
- Create event: 3
- Release event: 2
- Issues event: 9
- Watch event: 6
- Delete event: 1
- Issue comment event: 13
- Push event: 39
- Pull request review comment event: 133
- Pull request review event: 134
- Pull request event: 6
- Fork event: 5
Last Year
- Create event: 3
- Release event: 2
- Issues event: 9
- Watch event: 6
- Delete event: 1
- Issue comment event: 13
- Push event: 39
- Pull request review comment event: 133
- Pull request review event: 134
- Pull request event: 6
- Fork event: 5
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 6
- Total pull requests: 2
- Average time to close issues: about 1 month
- Average time to close pull requests: 3 days
- Total issue authors: 5
- Total pull request authors: 2
- Average comments per issue: 0.83
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 6
- Pull requests: 2
- Average time to close issues: about 1 month
- Average time to close pull requests: 3 days
- Issue authors: 5
- Pull request authors: 2
- Average comments per issue: 0.83
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- hu-my (2)
- period248650 (1)
- Solus-sano (1)
- YanhuiS (1)
- HoangLayor (1)
Pull Request Authors
- GreyElaina (1)
- ControlNet (1)
- HoangLayor (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- pytorch/pytorch 1.13.1-cuda11.6-cudnn8-devel build
- mcr.microsoft.com/devcontainers/base ubuntu-20.04 build
- Jinja2 ==3.1.2
- Markdown ==3.4.1
- MarkupSafe ==2.1.1
- Pillow ==9.2.0
- Pillow-SIMD ==9.0.0.post1
- PyNaCl ==1.5.0
- PyWavelets ==1.4.1
- PyYAML ==6.0
- Pygments ==2.13.0
- Shapely ==1.8.4
- Werkzeug ==2.2.2
- absl-py ==1.2.0
- aiohttp ==3.8.3
- aiosignal ==1.2.0
- anyio ==3.6.1
- asttokens ==2.0.8
- async-timeout ==4.0.2
- attrs ==22.1.0
- av ==9.2.0
- backcall ==0.2.0
- bcrypt ==4.0.0
- cachetools ==5.2.0
- certifi ==2022.9.14
- cffi ==1.15.1
- charset-normalizer ==2.1.1
- click ==8.1.3
- cloudpickle ==2.2.0
- configobj ==5.0.6
- contourpy ==1.0.5
- cryptography ==38.0.1
- cycler ==0.11.0
- cytoolz ==0.12.0
- debugpy ==1.6.3
- decorator ==5.1.1
- decord ==0.6.0
- easydict ==1.10
- einops ==0.4.1
- entrypoints ==0.4
- executing ==1.1.0
- fairscale ==0.4.12
- fastapi ==0.85.0
- ffmpy ==0.3.0
- filelock ==3.8.0
- fonttools ==4.37.3
- frozenlist ==1.3.1
- fsspec ==2022.8.2
- ftfy ==6.1.1
- google-auth ==2.12.0
- google-auth-oauthlib ==0.4.6
- gradio ==3.4.0
- grpcio ==1.49.1
- h11 ==0.12.0
- httpcore ==0.15.0
- httpx ==0.23.0
- huggingface-hub ==0.9.1
- idna ==3.4
- imageio ==2.22.1
- importlib-metadata ==5.0.0
- inflect ==6.0.0
- ipdb ==0.13.9
- ipykernel ==6.16.0
- ipython ==8.5.0
- jedi ==0.18.1
- joblib ==1.2.0
- jupyter-core ==4.11.1
- jupyter_client ==7.3.5
- kiwisolver ==1.4.4
- linkify-it-py ==1.0.3
- lmdb ==1.3.0
- lz4 ==4.0.2
- markdown-it-py ==2.1.0
- matplotlib ==3.6.0
- matplotlib-inline ==0.1.6
- mdit-py-plugins ==0.3.1
- mdurl ==0.1.2
- msgpack ==1.0.4
- msgpack-numpy ==0.4.8
- multidict ==6.0.2
- nest-asyncio ==1.5.6
- networkx ==2.8.7
- nltk ==3.7
- numpy ==1.23.3
- oauthlib ==3.2.1
- opencv-python ==4.6.0.66
- orjson ==3.8.0
- packaging ==21.3
- pandas ==1.5.0
- paramiko ==2.11.0
- parso ==0.8.3
- pexpect ==4.8.0
- pickleshare ==0.7.5
- prettytable ==3.4.1
- prompt-toolkit ==3.0.31
- protobuf ==3.19.6
- psutil ==5.9.2
- ptyprocess ==0.7.0
- pure-eval ==0.2.2
- pyasn1 ==0.4.8
- pyasn1-modules ==0.2.8
- pycocotools ==2.0.5
- pycparser ==2.21
- pycryptodome ==3.15.0
- pydantic ==1.10.2
- pydub ==0.25.1
- pymongo ==4.2.0
- pyparsing ==3.0.9
- python-dateutil ==2.8.2
- python-multipart ==0.0.5
- pytz ==2022.4
- pyzmq ==24.0.1
- regex ==2022.9.13
- requests ==2.28.1
- requests-oauthlib ==1.3.1
- rfc3986 ==1.5.0
- rsa ==4.9
- ruamel.yaml ==0.17.21
- ruamel.yaml.base ==0.3.0
- ruamel.yaml.clib ==0.2.6
- ruamel.yaml.cmd ==0.6.3
- ruamel.yaml.convert ==0.3.2
- sacremoses ==0.0.53
- scikit-image ==0.19.3
- scipy ==1.9.1
- six ==1.16.0
- sniffio ==1.3.0
- stack-data ==0.5.1
- starlette ==0.20.4
- tensorboard ==2.10.1
- tensorboard-data-server ==0.6.1
- tensorboard-plugin-wit ==1.8.1
- tensorboardX ==2.5.1
- tifffile ==2022.8.12
- timm ==0.6.7
- tokenizers ==0.10.3
- toml ==0.10.2
- toolz ==0.12.0
- tornado ==6.2
- tqdm ==4.64.1
- traitlets ==5.4.0
- transformers ==4.11.3
- typing_extensions ==4.3.0
- uc-micro-py ==1.0.1
- ujson ==5.5.0
- urllib3 ==1.26.12
- uvicorn ==0.18.3
- wcwidth ==0.2.5
- websockets ==10.3
- yacs ==0.1.8
- yarl ==1.8.1
- zipp ==3.9.0
- addict *
- numpy *
- opencv-python *
- pycocotools *
- supervision *
- timm *
- torch *
- torchvision *
- transformers *
- yapf *
- Pillow *
- PyYAML *
- addict *
- diffusers *
- fairscale *
- gradio *
- huggingface_hub *
- litellm *
- matplotlib *
- nltk *
- numpy *
- onnxruntime *
- opencv_python *
- pycocotools *
- requests *
- setuptools *
- supervision *
- termcolor *
- timm *
- torch *
- torchvision *
- transformers *
- yapf *
- easydict *
- matplotlib *
- numpy *
- onnx *
- onnxruntime *
- opencv-python *
- pycocotools *
- pyyaml *
- torch *
- torchvision *
- actions/checkout v4 composite
- actions/setup-python v5 composite
- marvinpinto/action-automatic-releases latest composite
- pypa/gh-action-pypi-publish release/v1 composite
- actions/checkout v4 composite
- actions/setup-python v4 composite
- matias-martini/flake8-pr-comments-action main composite
- einops-exts ==0.0.4
- fastapi *
- gradio *
- gradio_client *
- markdown2 [all]
- scikit-learn ==1.2.2
- sentencepiece ==0.1.99
- shortuuid *
- uvicorn *
- accelerate *
- bitsandbytes *
- chardet *
- fastapi *
- ftfy *
- gdown *
- gradio >=5.0,<5.12
- nltk *
- numpy <2
- ollama *
- openai *
- opencv-python *
- pillow *
- prettytable *
- pycocotools *
- pydantic *
- python-dotenv *
- python_dateutil *
- requests *
- rich *
- scipy *
- starlette *
- tensorneko_util >=0.3.21,<0.4.0
- timm ==0.9.16
- tokenizers *
- torch *
- torchmetrics *
- transformers *
- uvicorn *
- websockets *
- word2number *
- yacs *