https://github.com/astorfi/show-attend-and-tell
TensorFlow Implementation of "Show, Attend and Tell"
Repository
Basic Info
Statistics
- Stars: 0
- Watchers: 2
- Forks: 1
- Open Issues: 0
- Releases: 0
Fork of yunjey/show-attend-and-tell
Created almost 9 years ago
· Last pushed about 9 years ago
https://github.com/astorfi/show-attend-and-tell/blob/master/
# Show, Attend and Tell

Update (December 2, 2016)

TensorFlow implementation of [Show, Attend and Tell: Neural Image Caption Generation with Visual Attention](http://arxiv.org/abs/1502.03044), which introduces an attention-based image caption generator. The model shifts its attention to the relevant part of the image as it generates each word.
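To make the attention idea concrete, here is a minimal, dependency-free sketch of one soft-attention step as described in the paper: each image region is scored against the previous decoder state, the scores are normalized with a softmax, and the context vector is the weighted sum of the region features. This is not the code from this repository. The function names, the parameter names `w_feat`, `w_hid`, and `v`, and the toy dimensions are all assumptions chosen for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def project(vec, mat):
    """Multiply a length-n vector by an n x K matrix (list of rows)."""
    return [sum(vec[i] * mat[i][k] for i in range(len(vec)))
            for k in range(len(mat[0]))]


def soft_attention(features, hidden, w_feat, w_hid, v):
    """One soft-attention step (illustrative, hypothetical parameters).

    features: L region vectors of size D
    hidden:   previous decoder state of size H
    w_feat:   D x K matrix, w_hid: H x K matrix, v: length-K vector
    Returns (alpha, context): L weights summing to 1 and a D-dim context.
    """
    h_proj = project(hidden, w_hid)
    scores = []
    for region in features:
        a_proj = project(region, w_feat)
        act = [math.tanh(x + y) for x, y in zip(a_proj, h_proj)]
        scores.append(sum(vi * ai for vi, ai in zip(v, act)))
    alpha = softmax(scores)
    dim = len(features[0])
    context = [sum(alpha[i] * features[i][d] for i in range(len(features)))
               for d in range(dim)]
    return alpha, context


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy sizes; the paper uses far larger region and feature counts.
rng = random.Random(0)
L, D, H, K = 4, 3, 5, 2
features = random_matrix(L, D, rng)
hidden = random_matrix(1, H, rng)[0]
alpha, context = soft_attention(
    features, hidden,
    random_matrix(D, K, rng), random_matrix(H, K, rng),
    random_matrix(1, K, rng)[0],
)
print(len(alpha), len(context))
```

In a trained model the weights `alpha` are recomputed at every decoding step, which is why the highlighted image region moves from word to word.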

## References

Author's Theano code: https://github.com/kelvinxu/arctic-captions

Another TensorFlow implementation: https://github.com/jazzsaxmafia/show_attend_and_tell.tensorflow
## Getting Started

### Prerequisites

First, clone this repo and [pycocoevalcap](https://github.com/tylin/coco-caption.git) into the same directory.

```bash
$ git clone https://github.com/yunjey/show-attend-and-tell-tensorflow.git
$ git clone https://github.com/tylin/coco-caption.git
```

This code is written in Python 2.7 and requires [TensorFlow](https://www.tensorflow.org/versions/r0.11/get_started/os_setup.html#anaconda-installation). You also need to install a few more packages to process the [MSCOCO data set](http://mscoco.org/home/). I have provided a script to download the MSCOCO image dataset and the [VGGNet19 model](http://www.vlfeat.org/matconvnet/pretrained/). Downloading the data may take several hours depending on your network speed. Run the commands below; the images will be downloaded to the `image/` directory and the VGGNet19 model to the `data/` directory.

```bash
$ cd show-attend-and-tell-tensorflow
$ pip install -r requirements.txt
$ chmod +x ./download.sh
$ ./download.sh
```

To feed the images to VGGNet, you need to resize the MSCOCO images to a fixed size of 224x224. Run the command below; the resized images will be stored in the `image/train2014_resized/` and `image/val2014_resized/` directories.

```bash
$ python resize.py
```

Before training the model, you have to preprocess the MSCOCO caption dataset. To generate the caption dataset and image feature vectors, run the command below.

```bash
$ python prepro.py
```
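As a rough illustration of what caption preprocessing typically involves, the sketch below builds a word-to-index vocabulary from raw captions and encodes each caption as a fixed-length list of integer ids. It is not taken from `prepro.py`, whose actual behavior may differ. The special tokens `<NULL>`, `<START>`, and `<END>`, the `threshold` and `max_length` parameters, and both function names are assumptions made for this example.

```python
from collections import Counter

# Hypothetical special tokens; the repository may use different ones.
SPECIAL_TOKENS = ["<NULL>", "<START>", "<END>"]


def build_vocab(captions, threshold=1):
    """Map every word seen at least `threshold` times to an integer id."""
    counter = Counter()
    for caption in captions:
        counter.update(caption.lower().split())
    word_to_idx = {token: i for i, token in enumerate(SPECIAL_TOKENS)}
    # Sort for a deterministic id assignment.
    for word in sorted(w for w, c in counter.items() if c >= threshold):
        word_to_idx[word] = len(word_to_idx)
    return word_to_idx


def encode(caption, word_to_idx, max_length=15):
    """Encode a caption as <START> + word ids + <END>, padded with <NULL>.

    Words missing from the vocabulary are dropped; the result always has
    max_length + 2 entries.
    """
    tokens = [w for w in caption.lower().split() if w in word_to_idx]
    ids = [word_to_idx["<START>"]]
    ids += [word_to_idx[w] for w in tokens[:max_length]]
    ids.append(word_to_idx["<END>"])
    ids += [word_to_idx["<NULL>"]] * (max_length + 2 - len(ids))
    return ids


captions = [
    "A plane flying in the sky",
    "A zebra standing in the grass",
]
vocab = build_vocab(captions, threshold=2)
print(encode("a plane in the sky", vocab, max_length=5))
```

Padding every caption to the same length lets a whole batch be stored as one integer array, which is the usual reason for a `<NULL>` token.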
### Train the model

To train the image captioning model, run the command below.

```bash
$ python train.py
```
### (optional) Tensorboard visualization

I have provided a TensorBoard visualization for real-time debugging. Open a new terminal, run the command below, and open `http://localhost:6005/` in your web browser.

```bash
$ tensorboard --logdir='./log' --port=6005
```
### Evaluate the model

To generate captions, visualize attention weights, and evaluate the model, see `evaluate_model.ipynb`.
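For readers curious how attention weights can be drawn over an image, the sketch below expands a flat list of per-region weights into a full-resolution heatmap by nearest-neighbor repetition. It is an illustration only and does not reproduce the notebook, which may use smoother interpolation. The 14x14 grid is an assumption that matches the convolutional feature map size reported in the paper for 224x224 inputs, and the function name `upsample_attention` is hypothetical.

```python
def upsample_attention(alpha, grid=14, image_size=224):
    """Expand grid*grid attention weights to an image_size x image_size map.

    Each weight is repeated over a square block of pixels (nearest-neighbor
    upsampling). Returns a list of rows, each a list of floats.
    """
    if len(alpha) != grid * grid:
        raise ValueError("expected %d weights, got %d" % (grid * grid, len(alpha)))
    if image_size % grid != 0:
        raise ValueError("image_size must be a multiple of grid")
    scale = image_size // grid
    heatmap = []
    for r in range(grid):
        row = alpha[r * grid:(r + 1) * grid]
        # Repeat each weight horizontally, then repeat the row vertically.
        wide = [w for w in row for _ in range(scale)]
        heatmap.extend(list(wide) for _ in range(scale))
    return heatmap


# Uniform attention over 196 regions, upsampled to 224x224.
uniform = [1.0 / 196] * 196
heat = upsample_attention(uniform)
print(len(heat), len(heat[0]))
```

The resulting grid can be passed to any plotting library as a semi-transparent overlay on the resized input image.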
## Results
#### Training data

##### (1) Generated caption: A plane flying in the sky with a landing gear down.

##### (2) Generated caption: A giraffe and two zebra standing in the field.

#### Validation data

##### (1) Generated caption: A large elephant standing in a dry grass field.

##### (2) Generated caption: A baby elephant standing on top of a dirt field.

#### Test data

##### (1) Generated caption: A plane flying over a body of water.

##### (2) Generated caption: A zebra standing in the grass near a tree.
Owner
- Name: Sina Torfi
- Login: astorfi
- Kind: user
- Location: San Jose
- Company: Meta
- Website: https://astorfi.github.io/
- Repositories: 196
- Profile: https://github.com/astorfi
PhD & Developer working on Deep Learning, Computer Vision & NLP