updown-baseline

Baseline model for nocaps benchmark, ICCV 2019 paper "nocaps: novel object captioning at scale".

https://github.com/nocaps-org/updown-baseline

Science Score: 28.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org, aps.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.4%) to scientific vocabulary

Keywords

computer-vision iccv iccv-2019 iccv2019 image-captioning pytorch
Last synced: 4 months ago

Repository

Baseline model for nocaps benchmark, ICCV 2019 paper "nocaps: novel object captioning at scale".

Basic Info
  • Host: GitHub
  • Owner: nocaps-org
  • License: mit
  • Language: Python
  • Default Branch: master
  • Homepage: https://nocaps.org
  • Size: 633 KB
Statistics
  • Stars: 73
  • Watchers: 7
  • Forks: 12
  • Open Issues: 7
  • Releases: 0
Topics
computer-vision iccv iccv-2019 iccv2019 image-captioning pytorch
Created over 6 years ago · Last pushed over 2 years ago
Metadata Files
Readme License Citation

README.md

UpDown Captioner Baseline for nocaps

Baseline model for the nocaps benchmark: a re-implementation of the UpDown image captioning model trained on the COCO dataset (only), with added support for decoding using Constrained Beam Search.

[Figure: predictions generated by the UpDown model]
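The UpDown architecture (Anderson et al., 2018, cited below) combines "bottom-up" region features from an object detector with a "top-down" two-LSTM decoder: an attention LSTM attends over the regions and a language LSTM predicts the next word. The following is a minimal PyTorch sketch of one decoding step under that general structure; the layer names, dimensions, and details are illustrative and do not mirror this repository's implementation.

```python
import torch
import torch.nn as nn

class UpDownCellSketch(nn.Module):
    """Simplified sketch of one UpDown decoding step (Anderson et al., 2018).

    An attention LSTM attends over pre-extracted region features ("bottom-up"),
    then a language LSTM predicts the next word ("top-down"). Sizes and names
    are illustrative only.
    """

    def __init__(self, vocab_size, embed_dim=1000, feat_dim=2048, hidden_dim=1024):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.attn_lstm = nn.LSTMCell(hidden_dim + feat_dim + embed_dim, hidden_dim)
        self.lang_lstm = nn.LSTMCell(feat_dim + hidden_dim, hidden_dim)
        self.att_proj = nn.Linear(feat_dim, hidden_dim)
        self.att_query = nn.Linear(hidden_dim, hidden_dim)
        self.att_score = nn.Linear(hidden_dim, 1)
        self.output = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, feats, states):
        # feats: (batch, num_regions, feat_dim) bottom-up region features
        (h1, c1), (h2, c2) = states
        mean_feat = feats.mean(dim=1)
        x1 = torch.cat([h2, mean_feat, self.embed(tokens)], dim=1)
        h1, c1 = self.attn_lstm(x1, (h1, c1))

        # Soft attention over regions, queried by the attention LSTM state.
        scores = self.att_score(torch.tanh(self.att_proj(feats) + self.att_query(h1).unsqueeze(1)))
        alpha = torch.softmax(scores, dim=1)       # (batch, num_regions, 1)
        attended = (alpha * feats).sum(dim=1)      # (batch, feat_dim)

        x2 = torch.cat([attended, h1], dim=1)
        h2, c2 = self.lang_lstm(x2, (h2, c2))
        logits = self.output(h2)                   # next-word distribution
        return logits, ((h1, c1), (h2, c2))
```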

Citation

If you find this code useful, please consider citing our paper, the paper which proposed the original model, and EvalAI, the platform which hosts our evaluation server. All BibTeX entries are available in CITATION.md.

Usage Instructions

  1. How to setup this codebase?
  2. How to train your captioner?
  3. How to evaluate or run inference?

Extensive documentation is available at nocaps.org/updown-baseline. Use it as an API reference to navigate and build on top of our code.
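Training and inference are driven by config files in the configs directory; yacs, which reads YAML configs, is pinned in requirements.txt. Below is a rough sketch of how such a yacs config might be defined, loaded, and overridden. The keys, defaults, and file path are hypothetical placeholders, not the repository's actual schema.

```python
from yacs.config import CfgNode as CN

# Hypothetical defaults; the real keys live in the repository's config module.
_C = CN()
_C.DATA = CN()
_C.DATA.VOCABULARY = "data/vocabulary"
_C.MODEL = CN()
_C.MODEL.EMBEDDING_SIZE = 1000
_C.OPTIM = CN()
_C.OPTIM.BATCH_SIZE = 150

def get_config(config_file=None, overrides=None):
    """Load defaults, then merge a YAML file and command-line style overrides."""
    cfg = _C.clone()
    if config_file is not None:
        cfg.merge_from_file(config_file)   # e.g. a YAML file from configs/
    if overrides:
        cfg.merge_from_list(overrides)     # e.g. ["OPTIM.BATCH_SIZE", "64"]
    cfg.freeze()
    return cfg

# Example (hypothetical path):
# cfg = get_config("configs/updown_nocaps_val.yaml", ["OPTIM.BATCH_SIZE", "64"])
```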

Results

Pre-trained checkpoints, trained with the configs provided in the configs directory, are available for download:

UpDown Captioner (no CBS):

Note: Although CBS is an inference-only technique, it cannot be used with this checkpoint. CBS requires models with frozen 300-dimensional GloVe embeddings, whereas this checkpoint uses 1000-dimensional word embeddings learned during training.

| Metric | in-domain | near-domain | out-of-domain | overall |
|--------|-----------|-------------|---------------|---------|
| CIDEr  | 78.1      | 57.7        | 31.3          | 55.3    |
| SPICE  | 11.6      | 10.3        | 8.3           | 10.1    |
| BLEU-1 |           |             |               | 73.7    |
| BLEU-4 |           |             |               | 18.3    |
| METEOR |           |             |               | 22.7    |
| ROUGE  |           |             |               | 50.4    |
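To make the embedding distinction in the note above concrete, here is a minimal PyTorch sketch contrasting a frozen 300-dimensional GloVe embedding table (what CBS requires) with a 1000-dimensional embedding learned during training (what this checkpoint uses). The vocabulary size and the random stand-in for the GloVe vectors are illustrative assumptions, not the repository's code.

```python
import torch
import torch.nn as nn

vocab_size = 10000  # illustrative

# Learned embeddings (this checkpoint): 1000-dim, updated by the optimizer.
learned_embedding = nn.Embedding(vocab_size, 1000)

# Frozen GloVe embeddings (required for CBS): 300-dim, never updated.
# `glove_vectors` would be a (vocab_size, 300) tensor loaded from pre-trained
# GloVe files and aligned to the model vocabulary; random here as a stand-in.
glove_vectors = torch.randn(vocab_size, 300)
frozen_embedding = nn.Embedding.from_pretrained(glove_vectors, freeze=True)

assert not frozen_embedding.weight.requires_grad
assert learned_embedding.weight.requires_grad
```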

UpDown Captioner + Constrained Beam Search:

Note: Since CBS is an inference-only technique, this particular checkpoint can also be used without CBS decoding; doing so yields results similar to the UpDown Captioner trained with learned word embeddings (above).

With CBS Decoding:

| Metric | in-domain | near-domain | out-of-domain | overall |
|--------|-----------|-------------|---------------|---------|
| CIDEr  | 78.6      | 73.5        | 68.8          | 73.3    |
| SPICE  | 12.1      | 11.5        | 9.8           | 11.3    |
| BLEU-1 |           |             |               | 75.8    |
| BLEU-4 |           |             |               | 17.5    |
| METEOR |           |             |               | 22.7    |
| ROUGE  |           |             |               | 51.1    |

Without CBS Decoding:

| Metric | in-domain | near-domain | out-of-domain | overall |
|--------|-----------|-------------|---------------|---------|
| CIDEr  | 75.7      | 58.0        | 32.9          | 55.4    |
| SPICE  | 11.7      | 10.3        | 8.2           | 10.1    |
| BLEU-1 |           |             |               | 73.1    |
| BLEU-4 |           |             |               | 18.0    |
| METEOR |           |             |               | 22.7    |
| ROUGE  |           |             |               | 50.2    |
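Constrained Beam Search itself is a decoding-time technique: it forces specified words (for example, novel objects detected in the image) to appear in the generated caption by tracking, for each partial hypothesis, which constraint words it has already produced, and only accepting completed captions that satisfy all of them. The toy sketch below illustrates that idea with a dummy scorer; it is not the repository's implementation (the full method is formulated with a finite-state machine over constraint states), and every name in it is ours.

```python
import heapq
import math

def constrained_beam_search(log_prob_fn, constraints, eos, beam_size=3, max_len=8):
    """Toy constrained beam search.

    Hypotheses are grouped by which constraint tokens they have already
    emitted, so captions containing the required (possibly low-probability)
    words are not pruned away by higher-scoring unconstrained captions.
    Only hypotheses that end with `eos` AND contain every constraint count
    as valid outputs.

    log_prob_fn(prefix) -> {token: log-probability of that token coming next}
    """
    constraints = frozenset(constraints)
    beams = {frozenset(): [(0.0, ())]}   # satisfied-constraint set -> hypotheses
    finished = []

    for _ in range(max_len):
        candidates = {}
        for satisfied, hyps in beams.items():
            for score, seq in hyps:
                for tok, logp in log_prob_fn(seq).items():
                    new_seq = seq + (tok,)
                    new_satisfied = satisfied | ({tok} & constraints)
                    cand = (score + logp, new_seq)
                    if tok == eos:
                        if new_satisfied == constraints:
                            finished.append(cand)
                        continue
                    candidates.setdefault(new_satisfied, []).append(cand)
        # Keep the top `beam_size` hypotheses per constraint state.
        beams = {s: heapq.nlargest(beam_size, h) for s, h in candidates.items()}

    if not finished:   # nothing satisfied every constraint within max_len
        finished = [h for hyps in beams.values() for h in hyps]
    return max(finished)

# Toy usage: a fixed unigram scorer; "zebra" is rare but must appear.
PROBS = {"a": 0.4, "dog": 0.3, "runs": 0.15, "zebra": 0.05, "<eos>": 0.1}
score, caption = constrained_beam_search(
    lambda seq: {w: math.log(p) for w, p in PROBS.items()},
    constraints={"zebra"}, eos="<eos>", beam_size=2, max_len=6,
)
print(caption)   # ('zebra', '<eos>') -- the constraint word is forced in
```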

Owner

  • Name: nocaps
  • Login: nocaps-org
  • Kind: organization

Citation (CITATION.md)

Citation
========

If you find this code useful, consider citing our `nocaps` paper:

```bibtex
@inproceedings{nocaps2019,
  author    = {Harsh Agrawal* and Karan Desai* and Yufei Wang and Xinlei Chen and Rishabh Jain and
             Mark Johnson and Dhruv Batra and Devi Parikh and Stefan Lee and Peter Anderson},
  title     = {{nocaps}: {n}ovel {o}bject {c}aptioning {a}t {s}cale},
  booktitle = {International Conference on Computer Vision (ICCV)},
  year      = {2019}
}
```

As well as the paper that proposed this model: 

```bibtex
@inproceedings{Anderson2017up-down,
  author    = {Peter Anderson and Xiaodong He and Chris Buehler and Damien Teney and Mark Johnson
               and Stephen Gould and Lei Zhang},
  title     = {Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering},
  booktitle = {Computer Vision and Pattern Recognition (CVPR)},
  year      = {2018}
}
```


If you evaluate your models on our `nocaps` benchmark, please consider citing
[EvalAI](https://evalai.cloudcv.org) — the platform which hosts our evaluation server:

```bibtex
@inproceedings{evalai,
    title   =  {EvalAI: Towards Better Evaluation Systems for AI Agents},
    author  =  {Deshraj Yadav and Rishabh Jain and Harsh Agrawal and Prithvijit
                Chattopadhyay and Taranjeet Singh and Akash Jain and Shiv Baran
                Singh and Stefan Lee and Dhruv Batra},
    booktitle = {Workshop on AI Systems at SOSP 2019},
    year    =  {2019},
}
```

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Dependencies

requirements.txt pypi
  • allennlp ==0.8.4
  • anytree ==2.6.0
  • cython ==0.29.1
  • evalai ==1.3.0
  • h5py ==2.8.0
  • mypy_extensions ==0.4.1
  • nltk ==3.4.3
  • numpy ==1.15.4
  • pillow ==6.2.0
  • tb-nightly *
  • tensorboardX ==1.7
  • torch ==1.1.0
  • tqdm ==4.28.1
  • yacs ==0.1.6