https://github.com/cyberagentailab/flex-dm

Towards Flexible Multi-modal Document Models [Inoue+, CVPR2023]

Keywords

cvpr2023 generative-ai tensorflow

# Towards Flexible Multi-modal Document Models (CVPR2023)
This repository is the official implementation of the paper titled above. Please refer to the [project page](https://cyberagentailab.github.io/flex-dm/) or the [paper](https://arxiv.org/abs/2303.18248) for more details.

## Setup

### Requirements
We have verified reproducibility in the following environment.
- Python 3.7
- CUDA 11.3
- TensorFlow 2.8

### How to install
Install the Python dependencies, preferably inside a virtual environment such as `venv`.

```bash
pip install -r requirements.txt
```

Note that TensorFlow has version-specific system requirements for GPU environments.
Check that a
[compatible CUDA/cuDNN runtime](https://www.tensorflow.org/install/source#gpu) is installed.
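
A quick way to confirm that the installed TensorFlow build can actually see a GPU is sketched below. It is not part of this repository; the helper name `gpu_report` is hypothetical, and the only TensorFlow call used is `tf.config.list_physical_devices`.

```python
import importlib.util


def gpu_report():
    """Return the installed TensorFlow version and the GPUs it can see."""
    if importlib.util.find_spec("tensorflow") is None:
        # TensorFlow is not installed in this environment.
        return {"tensorflow": None, "gpus": []}
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    return {"tensorflow": tf.__version__, "gpus": [gpu.name for gpu in gpus]}


if __name__ == "__main__":
    report = gpu_report()
    print("TensorFlow:", report["tensorflow"])
    print("Visible GPUs:", report["gpus"] or "none")
```

An empty GPU list usually means that the CUDA/cuDNN runtime does not match the installed TensorFlow version.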


## Crello experiments
To try the demo with pre-trained models:
- download the pre-processed datasets for [crello](https://storage.googleapis.com/ailab-public/flexdm/preprocessed_data/crello.zip) / [rico](https://storage.googleapis.com/ailab-public/flexdm/preprocessed_data/rico.zip) and unzip them under `./data`.
- download the pre-trained checkpoints for [crello](https://storage.googleapis.com/ailab-public/flexdm/pretrained_weights/crello.zip) / [rico](https://storage.googleapis.com/ailab-public/flexdm/pretrained_weights/rico.zip) and unzip them under `./results`.
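
The two download steps above can also be scripted. The following is a minimal sketch using only the Python standard library; the helper names `asset_url` and `fetch` are hypothetical and not part of this repository, while the URLs and target directories are the ones listed above.

```python
import urllib.request
import zipfile
from pathlib import Path

BASE_URL = "https://storage.googleapis.com/ailab-public/flexdm"
# Archive kind -> directory it is unzipped under, as listed above.
TARGET_DIRS = {"preprocessed_data": "data", "pretrained_weights": "results"}
DATASETS = ("crello", "rico")


def asset_url(kind, dataset):
    """Build the download URL for one archive."""
    if kind not in TARGET_DIRS or dataset not in DATASETS:
        raise ValueError(f"unknown asset: {kind}/{dataset}")
    return f"{BASE_URL}/{kind}/{dataset}.zip"


def fetch(kind, dataset, root="."):
    """Download one archive and unzip it under ./data or ./results."""
    url = asset_url(kind, dataset)
    target = Path(root) / TARGET_DIRS[kind]
    target.mkdir(parents=True, exist_ok=True)
    archive = target / f"{dataset}.zip"
    urllib.request.urlretrieve(url, str(archive))
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    archive.unlink()  # remove the zip once it has been extracted
    return target
```

For example, calling `fetch("preprocessed_data", "crello")` followed by `fetch("pretrained_weights", "crello")` from the repository root prepares everything for the Crello demo.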

### DEMO
You can test some tasks using the pre-trained models in the [notebook](./notebooks/demo_crello.ipynb).

### Training
You can train your own model.
The trainer script takes a few arguments to control hyperparameters.
See `src/mfp/mfp/args.py` for the list of available options.
If the script throws an out-of-memory error, please make sure that other processes are not occupying GPU memory, and reduce `--batch_size` if necessary.

```bash
bin/train_mfp.sh crello --masking_method random  # Ours-IMP
bin/train_mfp.sh crello --masking_method elem_pos_attr_img_txt  # Ours-EXP
bin/train_mfp.sh crello --masking_method elem_pos_attr_img_txt --weights    # Ours-EXP-FT
```

The trainer outputs logs, evaluation results, and checkpoints to `tmp/mfp/jobs/`.
The training progress can be monitored via `tensorboard`.
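
As a rough sketch of how monitoring might look, the helper below picks the most recently modified job directory and builds a `tensorboard --logdir` command. It assumes that each run writes to its own subdirectory of `tmp/mfp/jobs/`, which is not confirmed here; the function names are hypothetical.

```python
from pathlib import Path


def latest_job_dir(jobs_root="tmp/mfp/jobs"):
    """Return the most recently modified subdirectory, or None if there is none."""
    root = Path(jobs_root)
    if not root.is_dir():
        return None
    # Assumption: one subdirectory per training job.
    job_dirs = [p for p in root.iterdir() if p.is_dir()]
    if not job_dirs:
        return None
    return max(job_dirs, key=lambda p: p.stat().st_mtime)


def tensorboard_command(logdir="tmp/mfp/jobs"):
    """Build the argument list that launches TensorBoard on a log directory."""
    return ["tensorboard", "--logdir", str(logdir)]


if __name__ == "__main__":
    job = latest_job_dir()
    print("Latest job:", job)
    print("Run:", " ".join(tensorboard_command(job or "tmp/mfp/jobs")))
```

Pointing TensorBoard at the parent directory instead of a single job lets it show all runs side by side.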

### Evaluation
You can run a quantitative evaluation.
```bash
bin/eval_mfp.sh --job_dir  ()
```
See [eval.py](https://github.com/CyberAgentAILab/flex-dm/blob/main/eval.py#L122-L134) for ``.

## RICO experiments

### DEMO
You can test some tasks using the pre-trained models in the [notebook](./notebooks/demo_rico.ipynb).

### Training
The process is almost the same as above.
```bash
bin/train_mfp.sh rico --masking_method random  # Ours-IMP
bin/train_mfp.sh rico --masking_method elem_pos_attr  # Ours-EXP
bin/train_mfp.sh rico --masking_method elem_pos_attr --weights   # Ours-EXP-FT
```

### Evaluation
The process is the same as above.

## Citation

If you find this code useful for your research, please cite our paper.

```
@inproceedings{inoue2023document,
    title={{Towards Flexible Multi-modal Document Models}},
    author={Naoto Inoue and Kotaro Kikuchi and Edgar Simo-Serra and Mayu Otani and Kota Yamaguchi},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2023},
    pages={14287-14296},
}
```

Owner

  • Name: CyberAgent AI Lab
  • Login: CyberAgentAILab
  • Kind: organization
  • Location: Japan

Dependencies

requirements.txt pypi
  • PyYAML *
  • dacite *
  • einops *
  • faiss-gpu *
  • fsspec *
  • httplib2 ==0.19.0
  • jupyter *
  • matplotlib *
  • numpy *
  • selenium *
  • tensorflow-gpu *
  • tensorflow_probability *
  • tinycss *