https://github.com/cyberagentailab/flex-dm
Towards Flexible Multi-modal Document Models [Inoue+, CVPR2023]
Science Score: 10.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links
  Links to: arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity
  Low similarity (13.3%) to scientific vocabulary
Keywords
cvpr2023
generative-ai
tensorflow
Last synced: 9 months ago
Repository
Basic Info
- Host: GitHub
- Owner: CyberAgentAILab
- License: apache-2.0
- Language: Python
- Default Branch: main
- Homepage: https://cyberagentailab.github.io/flex-dm
- Size: 11.7 MB
Statistics
- Stars: 35
- Watchers: 1
- Forks: 0
- Open Issues: 1
- Releases: 0
Topics
cvpr2023
generative-ai
tensorflow
Created about 3 years ago · Last pushed over 2 years ago
https://github.com/CyberAgentAILab/flex-dm/blob/main/
# Towards Flexible Multi-modal Document Models (CVPR2023)

This repository is an official implementation of the paper titled above. Please refer to the [project page](https://cyberagentailab.github.io/flex-dm/) or the [paper](https://arxiv.org/abs/2303.18248) for more details.

## Setup

### Requirements

We have verified reproducibility under the following environment.

- Python 3.7
- CUDA 11.3
- TensorFlow 2.8

### How to install

Install the Python dependencies, preferably inside a `venv`.

```bash
pip install -r requirements.txt
```

Note that TensorFlow has version-specific system requirements for GPU environments. Check that a [compatible CUDA/CuDNN runtime](https://www.tensorflow.org/install/source#gpu) is installed.

## Crello experiments

To try the demo with the pre-trained models:

- download the pre-processed datasets for [crello](https://storage.googleapis.com/ailab-public/flexdm/preprocessed_data/crello.zip) / [rico](https://storage.googleapis.com/ailab-public/flexdm/preprocessed_data/rico.zip) and unzip them under `./data`.
- download the pre-trained checkpoints for [crello](https://storage.googleapis.com/ailab-public/flexdm/pretrained_weights/crello.zip) / [rico](https://storage.googleapis.com/ailab-public/flexdm/pretrained_weights/rico.zip) and unzip them under `./results`.

### DEMO

You can test some tasks using the pre-trained models in the [notebook](./notebooks/demo_crello.ipynb).

### Training

You can train your own model. The trainer script takes a few arguments to control hyperparameters. See `src/mfp/mfp/args.py` for the list of available options. If the script throws an out-of-memory error, make sure that other processes do not occupy GPU memory, and adjust `--batch_size`.

```bash
bin/train_mfp.sh crello --masking_method random  # Ours-IMP
bin/train_mfp.sh crello --masking_method elem_pos_attr_img_txt  # Ours-EXP
bin/train_mfp.sh crello --masking_method elem_pos_attr_img_txt --weights  # Ours-EXP-FT
```

The trainer outputs logs, evaluation results, and checkpoints to `tmp/mfp/jobs/ `.
The training progress can be monitored via `tensorboard`.

### Evaluation

You can run the quantitative evaluation as follows.

```bash
bin/eval_mfp.sh --job_dir ( )
```

See [eval.py](https://github.com/CyberAgentAILab/flex-dm/blob/main/eval.py#L122-L134) for ` `.

## RICO experiments

### DEMO

You can test some tasks using the pre-trained models in the [notebook](./notebooks/demo_rico.ipynb).

### Training

The process is almost the same as above.

```bash
bin/train_mfp.sh rico --masking_method random  # Ours-IMP
bin/train_mfp.sh rico --masking_method elem_pos_attr  # Ours-EXP
bin/train_mfp.sh rico --masking_method elem_pos_attr --weights  # Ours-EXP-FT
```

### Evaluation

The process is the same as above.

## Citation

If you find this code useful for your research, please cite our paper.

```
@inproceedings{inoue2023document,
  title={{Towards Flexible Multi-modal Document Models}},
  author={Naoto Inoue and Kotaro Kikuchi and Edgar Simo-Serra and Mayu Otani and Kota Yamaguchi},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2023},
  pages={14287-14296},
}
```
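Before running the demo or the trainer, it may help to confirm that the local environment roughly matches the one listed under Requirements (Python 3.7, TensorFlow 2.8, a GPU visible to TensorFlow). The following is a minimal sketch and is not part of the repository: the function name `check_environment` and the layout of its report are hypothetical, and it only relies on `tf.__version__` and `tf.config.list_physical_devices`.

```python
import importlib
import sys


def check_environment(expected_python=(3, 7), expected_tf="2.8"):
    """Compare the running interpreter and TensorFlow install with the README's versions.

    Hypothetical helper for illustration; the defaults are the versions
    listed in the Requirements section.
    """
    report = {
        "python": "%d.%d" % sys.version_info[:2],
        "python_matches": tuple(sys.version_info[:2]) == tuple(expected_python),
        "tensorflow": None,
        "tensorflow_matches": False,
        "gpus": [],
    }
    try:
        tf = importlib.import_module("tensorflow")
    except ImportError:
        # TensorFlow is not installed; leave the defaults in place.
        return report

    report["tensorflow"] = tf.__version__
    report["tensorflow_matches"] = tf.__version__.startswith(expected_tf + ".")
    # An empty list here usually points to a CUDA/CuDNN mismatch or a CPU-only install.
    report["gpus"] = [device.name for device in tf.config.list_physical_devices("GPU")]
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A mismatch does not necessarily mean the code will fail; it only indicates that the setup differs from the one the authors report having checked.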
Owner
- Name: CyberAgent AI Lab
- Login: CyberAgentAILab
- Kind: organization
- Location: Japan
- Website: https://cyberagent.ai/ailab/
- Twitter: cyberagent_ai
- Repositories: 7
- Profile: https://github.com/CyberAgentAILab
GitHub Events
Total
- Issues event: 2
- Watch event: 4
- Issue comment event: 2
- Fork event: 1
Last Year
- Issues event: 2
- Watch event: 4
- Issue comment event: 2
- Fork event: 1
Dependencies
requirements.txt
pypi
- PyYAML *
- dacite *
- einops *
- faiss-gpu *
- fsspec *
- httplib2 ==0.19.0
- jupyter *
- matplotlib *
- numpy *
- selenium *
- tensorflow-gpu *
- tensorflow_probability *
- tinycss *
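Since every dependency above except `httplib2` is unpinned, the versions that end up installed can differ from one machine to another. As a rough aid for recording what a given environment actually resolved to, here is a small sketch; the helper `installed_versions` is hypothetical, and on Python 3.7 it assumes the `importlib_metadata` backport is available.

```python
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    # Python 3.7: assumes the importlib_metadata backport is installed.
    import importlib_metadata as metadata

# Distribution names copied from the requirements.txt listing above.
REQUIREMENTS = [
    "PyYAML", "dacite", "einops", "faiss-gpu", "fsspec", "httplib2",
    "jupyter", "matplotlib", "numpy", "selenium", "tensorflow-gpu",
    "tensorflow_probability", "tinycss",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIREMENTS).items():
        print(f"{name}=={version}" if version else f"# {name}: not installed")
```

The printed `name==version` lines can be kept alongside experiment logs so that a working environment can be recreated later.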