
[ECCV] HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning

https://github.com/controlnet/hydra

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, springer.com
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.9%) to scientific vocabulary
Last synced: 6 months ago

Repository

[ECCV] HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning

Basic Info
  • Host: GitHub
  • Owner: ControlNet
  • License: apache-2.0
  • Language: Python
  • Default Branch: master
  • Homepage: https://hydra-vl4ai.github.io
  • Size: 51.3 MB
Statistics
  • Stars: 17
  • Watchers: 3
  • Forks: 5
  • Open Issues: 0
  • Releases: 3
Created over 1 year ago · Last pushed 8 months ago
Metadata Files
Readme License Citation

README.md

HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning

This is the code for the paper HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning, accepted at ECCV 2024 [Project Page]. We have released the code that uses reinforcement learning (DQN) to fine-tune the LLM. 🔥🔥🔥

Release

  • [2025/02/11] 🤖 HYDRA with RL is released.
  • [2024/08/05] 🚀 PYPI package is released.
  • [2024/07/29] 🔥 HYDRA is open sourced in GitHub.

TODOs

We are aware that gpt-3.5-turbo-0613 is deprecated and that gpt-3.5 is being replaced by gpt-4o-mini, so we will release another version of HYDRA.

As of July 2024, gpt-4o-mini should be used in place of gpt-3.5-turbo, as it is cheaper, more capable, multimodal, and just as fast (OpenAI API page).

We also notice that the embedding model has been updated by OpenAI, as shown in this link. Due to the uncertainty of the embedding model updates from OpenAI, we suggest you train a new version of the RL controller yourself and update the RL models.

  • [x] GPT-4o-mini replacement.
  • [x] LLaMA3.1 (ollama) replacement.
  • [x] Gradio Demo.
  • [x] GPT-4o version.
  • [x] HYDRA with RL (DQN).
  • [ ] HYDRA with Deepseek R1.

https://github.com/user-attachments/assets/39a897ab-d457-49d2-8527-0d6fe3a3b922

Installation

Requirements

  • Python >= 3.10
  • conda

Please follow the instructions below to install the required packages and set up the environment.

1. Clone this repository.

```bash
git clone https://github.com/ControlNet/HYDRA
```

2. Setup conda environment and install dependencies.

Option 1: Using pixi (recommended):

```bash
pixi install
pixi shell
```

Option 2: Building from source:

```bash
bash -i build_env.sh
```

If you run into errors, please consider going through the build_env.sh file and installing the packages manually.

3. Configure the environments

Edit the .env file, or set the variables in your shell, to configure the environment.

```
OPENAI_API_KEY=your-api-key  # if you want to use OpenAI LLMs
OLLAMA_HOST=http://ollama.server:11434  # if you want to use your Ollama server for llama or deepseek

# do not change this TORCH_HOME variable
TORCH_HOME=./pretrained_models
```
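If you prefer the CLI route, the same variables can be exported in your shell before launching HYDRA. A minimal sketch, with placeholder values you would substitute:

```shell
# Placeholder values -- substitute your own API key and Ollama host.
export OPENAI_API_KEY="your-api-key"             # only needed for OpenAI LLMs
export OLLAMA_HOST="http://ollama.server:11434"  # only needed for an Ollama backend
export TORCH_HOME="./pretrained_models"          # keep this path as-is
```

Exported variables only persist for the current shell session, so the .env file is the more durable option.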

4. Download the pretrained models

Run the scripts to download the pretrained models to the ./pretrained_models directory.

```bash
python -m hydra_vl4ai.download_model --base_config <EXP-CONFIG-DIR> --model_config <MODEL-CONFIG-PATH>
```

For example:

```bash
python -m hydra_vl4ai.download_model --base_config ./config/okvqa.yaml --model_config ./config/model_config_1gpu.yaml
```

Inference

A worker is required to run the inference.

```bash
python -m hydra_vl4ai.executor --base_config <EXP-CONFIG-DIR> --model_config <MODEL-CONFIG-PATH>
```

Inference with a given image and prompt

```bash
python demo_cli.py \
    --image <IMAGE_PATH> \
    --prompt <PROMPT> \
    --base_config <YOUR-CONFIG-DIR> \
    --model_config <MODEL-PATH>
```

Inference with Gradio GUI

```bash
python demo_gradio.py \
    --base_config <YOUR-CONFIG-DIR> \
    --model_config <MODEL-PATH>
```


Inference dataset

```bash
python main.py \
    --data_root <YOUR-DATA-ROOT> \
    --base_config <YOUR-CONFIG-DIR> \
    --model_config <MODEL-PATH>
```

The inference results are then saved in the ./result directory for evaluation.
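The result files are JSON Lines (one JSON object per line, as the .jsonl extension in the evaluation example suggests). For a quick sanity check before running the evaluation, a minimal loader sketch, assuming nothing about the field names of each record:

```python
import json

def read_results(path: str) -> list[dict]:
    """Load a JSON Lines result file into a list of record dicts,
    skipping any blank lines."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]
```

For example, `records = read_results("result/result_okvqa.jsonl")` followed by `print(len(records), records[0])` shows how many samples were processed and what fields each record carries.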

Evaluation

```bash
python evaluate.py <RESULT_JSON_PATH> <DATASET_NAME>
```

For example:

```bash
python evaluate.py result/result_okvqa.jsonl okvqa
```

Training the Controller with RL (DQN)

```bash
python train.py \
    --data_root <IMAGE_PATH> \
    --base_config <YOUR-CONFIG-DIR> \
    --model_config <MODEL-PATH> \
    --dqn_config <YOUR-DQN-CONFIG-DIR>
```

For example:

```bash
python train.py \
    --data_root ../coco2014 \
    --base_config ./config/okvqa.yaml \
    --model_config ./config/model_config_1gpu.yaml \
    --dqn_config ./config/dqn_debug.yaml
```

Citation

```bibtex
@inproceedings{ke2024hydra,
  title={HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning},
  author={Ke, Fucai and Cai, Zhixi and Jahangard, Simindokht and Wang, Weiqing and Haghighi, Pari Delir and Rezatofighi, Hamid},
  booktitle={European Conference on Computer Vision},
  year={2024},
  organization={Springer},
  doi={10.1007/978-3-031-72661-3_8},
  isbn={978-3-031-72661-3},
  pages={132--149},
}
```

Acknowledgements

Some code and prompts are based on cvlab-columbia/viper.

Owner

  • Name: ControlNet
  • Login: ControlNet
  • Kind: user

Study on: Computer Vision | Artificial Intelligence

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you find this work useful in your research, please cite it."
preferred-citation:
  type: conference-paper
  title: "HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning"
  authors:
  - family-names: "Ke"
    given-names: "Fucai"
  - family-names: "Cai"
    given-names: "Zhixi"
  - family-names: "Jahangard"
    given-names: "Simindokht"
  - family-names: "Wang"
    given-names: "Weiqing"
  - family-names: "Haghighi"
    given-names: "Pari Delir"
  - family-names: "Rezatofighi"
    given-names: "Hamid"
  collection-title: "European Conference on Computer Vision"
  year: 2024
  start: 132
  end: 149
  doi: 10.1007/978-3-031-72661-3_8

GitHub Events

Total
  • Create event: 3
  • Release event: 2
  • Issues event: 9
  • Watch event: 6
  • Delete event: 1
  • Issue comment event: 13
  • Push event: 39
  • Pull request review comment event: 133
  • Pull request review event: 134
  • Pull request event: 6
  • Fork event: 5
Last Year
  • Create event: 3
  • Release event: 2
  • Issues event: 9
  • Watch event: 6
  • Delete event: 1
  • Issue comment event: 13
  • Push event: 39
  • Pull request review comment event: 133
  • Pull request review event: 134
  • Pull request event: 6
  • Fork event: 5

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 6
  • Total pull requests: 2
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 3 days
  • Total issue authors: 5
  • Total pull request authors: 2
  • Average comments per issue: 0.83
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 6
  • Pull requests: 2
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 3 days
  • Issue authors: 5
  • Pull request authors: 2
  • Average comments per issue: 0.83
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • hu-my (2)
  • period248650 (1)
  • Solus-sano (1)
  • YanhuiS (1)
  • HoangLayor (1)
Pull Request Authors
  • GreyElaina (1)
  • ControlNet (1)
  • HoangLayor (1)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

module_repos/Grounded-Segment-Anything/Dockerfile docker
  • pytorch/pytorch 1.13.1-cuda11.6-cudnn8-devel build
module_repos/LLaVA/.devcontainer/Dockerfile docker
  • mcr.microsoft.com/devcontainers/base ubuntu-20.04 build
module_repos/GLIP/requirements.txt pypi
  • Jinja2 ==3.1.2
  • Markdown ==3.4.1
  • MarkupSafe ==2.1.1
  • Pillow ==9.2.0
  • Pillow-SIMD ==9.0.0.post1
  • PyNaCl ==1.5.0
  • PyWavelets ==1.4.1
  • PyYAML ==6.0
  • Pygments ==2.13.0
  • Shapely ==1.8.4
  • Werkzeug ==2.2.2
  • absl-py ==1.2.0
  • aiohttp ==3.8.3
  • aiosignal ==1.2.0
  • anyio ==3.6.1
  • asttokens ==2.0.8
  • async-timeout ==4.0.2
  • attrs ==22.1.0
  • av ==9.2.0
  • backcall ==0.2.0
  • bcrypt ==4.0.0
  • cachetools ==5.2.0
  • certifi ==2022.9.14
  • cffi ==1.15.1
  • charset-normalizer ==2.1.1
  • click ==8.1.3
  • cloudpickle ==2.2.0
  • configobj ==5.0.6
  • contourpy ==1.0.5
  • cryptography ==38.0.1
  • cycler ==0.11.0
  • cytoolz ==0.12.0
  • debugpy ==1.6.3
  • decorator ==5.1.1
  • decord ==0.6.0
  • easydict ==1.10
  • einops ==0.4.1
  • entrypoints ==0.4
  • executing ==1.1.0
  • fairscale ==0.4.12
  • fastapi ==0.85.0
  • ffmpy ==0.3.0
  • filelock ==3.8.0
  • fonttools ==4.37.3
  • frozenlist ==1.3.1
  • fsspec ==2022.8.2
  • ftfy ==6.1.1
  • google-auth ==2.12.0
  • google-auth-oauthlib ==0.4.6
  • gradio ==3.4.0
  • grpcio ==1.49.1
  • h11 ==0.12.0
  • httpcore ==0.15.0
  • httpx ==0.23.0
  • huggingface-hub ==0.9.1
  • idna ==3.4
  • imageio ==2.22.1
  • importlib-metadata ==5.0.0
  • inflect ==6.0.0
  • ipdb ==0.13.9
  • ipykernel ==6.16.0
  • ipython ==8.5.0
  • jedi ==0.18.1
  • joblib ==1.2.0
  • jupyter-core ==4.11.1
  • jupyter_client ==7.3.5
  • kiwisolver ==1.4.4
  • linkify-it-py ==1.0.3
  • lmdb ==1.3.0
  • lz4 ==4.0.2
  • markdown-it-py ==2.1.0
  • matplotlib ==3.6.0
  • matplotlib-inline ==0.1.6
  • mdit-py-plugins ==0.3.1
  • mdurl ==0.1.2
  • msgpack ==1.0.4
  • msgpack-numpy ==0.4.8
  • multidict ==6.0.2
  • nest-asyncio ==1.5.6
  • networkx ==2.8.7
  • nltk ==3.7
  • numpy ==1.23.3
  • oauthlib ==3.2.1
  • opencv-python ==4.6.0.66
  • orjson ==3.8.0
  • packaging ==21.3
  • pandas ==1.5.0
  • paramiko ==2.11.0
  • parso ==0.8.3
  • pexpect ==4.8.0
  • pickleshare ==0.7.5
  • prettytable ==3.4.1
  • prompt-toolkit ==3.0.31
  • protobuf ==3.19.6
  • psutil ==5.9.2
  • ptyprocess ==0.7.0
  • pure-eval ==0.2.2
  • pyasn1 ==0.4.8
  • pyasn1-modules ==0.2.8
  • pycocotools ==2.0.5
  • pycparser ==2.21
  • pycryptodome ==3.15.0
  • pydantic ==1.10.2
  • pydub ==0.25.1
  • pymongo ==4.2.0
  • pyparsing ==3.0.9
  • python-dateutil ==2.8.2
  • python-multipart ==0.0.5
  • pytz ==2022.4
  • pyzmq ==24.0.1
  • regex ==2022.9.13
  • requests ==2.28.1
  • requests-oauthlib ==1.3.1
  • rfc3986 ==1.5.0
  • rsa ==4.9
  • ruamel.yaml ==0.17.21
  • ruamel.yaml.base ==0.3.0
  • ruamel.yaml.clib ==0.2.6
  • ruamel.yaml.cmd ==0.6.3
  • ruamel.yaml.convert ==0.3.2
  • sacremoses ==0.0.53
  • scikit-image ==0.19.3
  • scipy ==1.9.1
  • six ==1.16.0
  • sniffio ==1.3.0
  • stack-data ==0.5.1
  • starlette ==0.20.4
  • tensorboard ==2.10.1
  • tensorboard-data-server ==0.6.1
  • tensorboard-plugin-wit ==1.8.1
  • tensorboardX ==2.5.1
  • tifffile ==2022.8.12
  • timm ==0.6.7
  • tokenizers ==0.10.3
  • toml ==0.10.2
  • toolz ==0.12.0
  • tornado ==6.2
  • tqdm ==4.64.1
  • traitlets ==5.4.0
  • transformers ==4.11.3
  • typing_extensions ==4.3.0
  • uc-micro-py ==1.0.1
  • ujson ==5.5.0
  • urllib3 ==1.26.12
  • uvicorn ==0.18.3
  • wcwidth ==0.2.5
  • websockets ==10.3
  • yacs ==0.1.8
  • yarl ==1.8.1
  • zipp ==3.9.0
module_repos/GLIP/setup.py pypi
module_repos/Grounded-Segment-Anything/GroundingDINO/pyproject.toml pypi
module_repos/Grounded-Segment-Anything/GroundingDINO/requirements.txt pypi
  • addict *
  • numpy *
  • opencv-python *
  • pycocotools *
  • supervision *
  • timm *
  • torch *
  • torchvision *
  • transformers *
  • yapf *
module_repos/Grounded-Segment-Anything/GroundingDINO/setup.py pypi
module_repos/Grounded-Segment-Anything/requirements.txt pypi
  • Pillow *
  • PyYAML *
  • addict *
  • diffusers *
  • fairscale *
  • gradio *
  • huggingface_hub *
  • litellm *
  • matplotlib *
  • nltk *
  • numpy *
  • onnxruntime *
  • opencv_python *
  • pycocotools *
  • requests *
  • setuptools *
  • supervision *
  • termcolor *
  • timm *
  • torch *
  • torchvision *
  • transformers *
  • yapf *
module_repos/Grounded-Segment-Anything/segment_anything/setup.py pypi
module_repos/Grounded-Segment-Anything/voxelnext_3d_box/requirements.txt pypi
  • easydict *
  • matplotlib *
  • numpy *
  • onnx *
  • onnxruntime *
  • opencv-python *
  • pycocotools *
  • pyyaml *
  • torch *
  • torchvision *
.github/workflows/release.yaml actions
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
  • marvinpinto/action-automatic-releases latest composite
  • pypa/gh-action-pypi-publish release/v1 composite
.github/workflows/lint_check.yaml actions
  • actions/checkout v4 composite
  • actions/setup-python v4 composite
  • matias-martini/flake8-pr-comments-action main composite
module_repos/GLIP/pyproject.toml pypi
module_repos/LLaVA/pyproject.toml pypi
  • einops-exts ==0.0.4
  • fastapi *
  • gradio *
  • gradio_client *
  • markdown2 [all]
  • scikit-learn ==1.2.2
  • sentencepiece ==0.1.99
  • shortuuid *
  • uvicorn *
pyproject.toml pypi
requirements.txt pypi
  • accelerate *
  • bitsandbytes *
  • chardet *
  • fastapi *
  • ftfy *
  • gdown *
  • gradio >=5.0,<5.12
  • nltk *
  • numpy <2
  • ollama *
  • openai *
  • opencv-python *
  • pillow *
  • prettytable *
  • pycocotools *
  • pydantic *
  • python-dotenv *
  • python_dateutil *
  • requests *
  • rich *
  • scipy *
  • starlette *
  • tensorneko_util >=0.3.21,<0.4.0
  • timm ==0.9.16
  • tokenizers *
  • torch *
  • torchmetrics *
  • transformers *
  • uvicorn *
  • websockets *
  • word2number *
  • yacs *
setup.py pypi