https://github.com/astorfi/show-attend-and-tell

TensorFlow Implementation of "Show, Attend and Tell"


Repository


Basic Info

Host: GitHub
Owner: astorfi
License: mit
Language: Jupyter Notebook
Default Branch: master
Homepage:
Size: 49 MB

Statistics

Stars: 0
Watchers: 2
Forks: 1
Open Issues: 0
Releases: 0

Fork of yunjey/show-attend-and-tell

Created almost 9 years ago · Last pushed about 9 years ago

https://github.com/astorfi/show-attend-and-tell/blob/master/

# Show, Attend and Tell 
Update (December 2, 2016): TensorFlow implementation of [Show, Attend and Tell: Neural Image Caption Generation with Visual Attention](http://arxiv.org/abs/1502.03044), which introduces an attention-based image caption generator. The model shifts its attention to the relevant part of the image as it generates each word.




![alt text](jpg/attention_over_time.jpg "soft attention")
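As a rough illustration of the soft attention idea, the sketch below turns one relevance score per image region into weights with a softmax, then forms a context vector as the weighted sum of the region features. It is a minimal pure-Python sketch, not this repository's code: the function names and the hard-coded `scores` are assumptions for illustration. In the paper the scores are computed from the region features and the previous decoder state.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(features, scores):
    """Weighted sum of region features.

    features: list of L region vectors, each of length D
    scores:   list of L relevance scores (hypothetical inputs here)
    """
    alpha = softmax(scores)
    dim = len(features[0])
    context = [sum(a * f[d] for a, f in zip(alpha, features)) for d in range(dim)]
    return alpha, context


# Three toy "regions" with 2-dimensional features; the third scores highest.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.0, 0.0, 5.0]
alpha, context = soft_attention(features, scores)
```

Because the third region has by far the largest score, almost all of the weight lands on it and the context vector is close to that region's features. Recomputing the weights at every decoding step is what lets the focus move across the image from word to word.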





## References

Authors' Theano code: https://github.com/kelvinxu/arctic-captions

Another TensorFlow implementation: https://github.com/jazzsaxmafia/show_attend_and_tell.tensorflow





## Getting Started

### Prerequisites

First, clone this repo and [pycocoevalcap](https://github.com/tylin/coco-caption.git) into the same directory.

```bash
$ git clone https://github.com/yunjey/show-attend-and-tell-tensorflow.git
$ git clone https://github.com/tylin/coco-caption.git
```

This code is written in Python 2.7 and requires [TensorFlow](https://www.tensorflow.org/versions/r0.11/get_started/os_setup.html#anaconda-installation). You also need to install a few more packages to process the [MSCOCO data set](http://mscoco.org/home/). I have provided a script to download the MSCOCO image dataset and the [VGGNet19 model](http://www.vlfeat.org/matconvnet/pretrained/). Downloading the data may take several hours, depending on your network speed. Run the commands below; the images will be downloaded to the `image/` directory and the VGGNet19 model to the `data/` directory.

```bash
$ cd show-attend-and-tell-tensorflow
$ pip install -r requirements.txt
$ chmod +x ./download.sh
$ ./download.sh
```


To feed the images to the VGGNet, you need to resize the MSCOCO images to a fixed size of 224x224. Run the command below; the resized images will be stored in the `image/train2014_resized/` and `image/val2014_resized/` directories.

```bash
$ python resize.py
```
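To make the fixed-size resize concrete, here is a minimal sketch of nearest-neighbour resizing on an image stored as nested lists. It only illustrates how every output pixel maps back to a source pixel. It is not the contents of `resize.py`, whose interpolation method is not described here, and `resize_nearest` is a hypothetical name. In practice an imaging library would do this step.

```python
def resize_nearest(image, out_h=224, out_w=224):
    """Resize a 2-D image (list of rows) to out_h x out_w by nearest neighbour.

    Each output pixel (y, x) copies the source pixel at the proportionally
    scaled position, using integer arithmetic only.
    """
    in_h, in_w = len(image), len(image[0])
    return [
        [image[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


# A toy 2x2 "image" blown up to the 224x224 input size the VGGNet expects.
small = [[0, 1],
         [2, 3]]
resized = resize_nearest(small)
```

Each quadrant of `resized` repeats one of the four source pixels, which shows that the output always has the same shape regardless of the input size.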

Before training the model, you have to preprocess the MSCOCO caption dataset.
To generate the caption dataset and the image feature vectors, run the command below.

```bash
$ python prepro.py
```
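Caption preprocessing of this kind typically builds a word-to-index vocabulary and converts each caption into a fixed-length sequence of integer ids. The sketch below shows that idea under stated assumptions: the special tokens `<NULL>`, `<START>`, `<END>`, the `threshold` and `max_len` parameters, and both function names are illustrative and not taken from `prepro.py`. The image feature extraction step is not covered.

```python
from collections import Counter


def build_vocab(captions, threshold=1):
    """Map each word seen at least `threshold` times to an integer id."""
    counter = Counter(word for cap in captions for word in cap.lower().split())
    word_to_idx = {"<NULL>": 0, "<START>": 1, "<END>": 2}  # assumed special tokens
    for word in sorted(w for w, c in counter.items() if c >= threshold):
        word_to_idx[word] = len(word_to_idx)
    return word_to_idx


def encode(caption, word_to_idx, max_len=8):
    """Convert a caption to ids, wrapped in start/end markers and padded."""
    tokens = ["<START>"] + caption.lower().split() + ["<END>"]
    ids = [word_to_idx[t] for t in tokens if t in word_to_idx][:max_len]
    return ids + [word_to_idx["<NULL>"]] * (max_len - len(ids))


captions = ["a plane flying", "a zebra standing"]
vocab = build_vocab(captions)
encoded = encode("a plane flying", vocab)
```

Padding every caption to the same length lets captions be stacked into a single integer array for batched training. Words below the frequency threshold are simply dropped in this sketch.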



### Train the model 

To train the image captioning model, run the command below.

```bash
$ python train.py
```



### (optional) Tensorboard visualization

I have provided a TensorBoard visualization for real-time debugging.
Open a new terminal, run the command below, and open `http://localhost:6005/` in your web browser.

```bash
$ tensorboard --logdir='./log' --port=6005 
```



### Evaluate the model 

To generate captions, visualize attention weights and evaluate the model, please see `evaluate_model.ipynb`.





## Results
 



#### Training data

##### (1) Generated caption: A plane flying in the sky with a landing gear down.
![alt text](jpg/train2.jpg "train image")

##### (2) Generated caption: A giraffe and two zebra standing in the field.
![alt text](jpg/train.jpg "train image")

#### Validation data

##### (1) Generated caption: A large elephant standing in a dry grass field.
![alt text](jpg/val.jpg "val image")

##### (2) Generated caption: A baby elephant standing on top of a dirt field.
![alt text](jpg/val2.jpg "val image")

#### Test data

##### (1) Generated caption: A plane flying over a body of water.
![alt text](jpg/test.jpg "test image")

##### (2) Generated caption: A zebra standing in the grass near a tree.
![alt text](jpg/test2.jpg "test image")

Owner

Name: Sina Torfi
Login: astorfi
Kind: user
Location: San Jose
Company: Meta
