platalea

Library for training visually-grounded models of spoken language understanding.

https://github.com/spokenlanguage/platalea

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.9%) to scientific vocabulary

Keywords

deep-neural-networks flickr8k multi-tasking multimodal-learning pytorch speech-processing spoken-language-understanding spokencoco visually-grounded-speech weakly-supervised-learning
Last synced: 6 months ago

Repository

Library for training visually-grounded models of spoken language understanding.

Basic Info
  • Host: GitHub
  • Owner: spokenlanguage
  • License: apache-2.0
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 3.03 MB
Statistics
  • Stars: 3
  • Watchers: 0
  • Forks: 1
  • Open Issues: 17
  • Releases: 4
Topics
deep-neural-networks flickr8k multi-tasking multimodal-learning pytorch speech-processing spoken-language-understanding spokencoco visually-grounded-speech weakly-supervised-learning
Created about 6 years ago · Last pushed almost 4 years ago
Metadata Files
Readme Changelog Contributing License Citation

README.md

Platalea

Understanding visually grounded spoken language via multi-tasking


Installation

Clone this repo and cd into it:

```sh
git clone https://github.com/spokenlanguage/platalea.git
cd platalea
```

To install in a conda environment, assuming conda has already been installed, run the following to download and install dependencies:

```sh
conda create -n platalea python==3.8 pytorch -c conda-forge -c pytorch
conda activate platalea
pip install torchvision
```

Then install platalea with:

```sh
pip install .
```

Experiment dependencies

Different experiments may have different additional dependencies. The basic experiment needs the following:

```sh
pip install sklearn python-Levenshtein
```

Datasets

Flickr8K

The repository has been developed to work with the Flickr8K dataset. The code can be made to work with other datasets, but this will require some adaptations.

To use Flickr8K, you need to download:
  • Flickr8K [1]. Note that downloading from the official website seems broken at the moment. Alternatively, the dataset can be obtained from here.
  • The Flickr Audio Caption Corpus [2].
  • Some additional metadata files.

Create a folder to store the dataset (we will assume here that the folder is ~/corpora/flickr8k), move all the files you downloaded there, then extract the content of the archives. You can now set up the environment and start preprocessing the data.

Configuration

We use ConfigArgParse for setting the necessary input variables, including the location of the dataset. This means you can use a configuration file (config.ini or config.yml), environment variables, or command-line arguments to specify the necessary configuration parameters.

To specify the location of the dataset, one option is to create a configuration file under your home directory (~/.config/platalea/config.yml) with the following content:

```yaml
flickr8k_root: /home/<user>/corpora/flickr8k
```

The same result can be achieved with an environment variable:

```sh
export FLICKR8K_ROOT=/home/<user>/corpora/flickr8k
```

You could also specify this option directly on the command line when running an experiment (the respective options would be --flickr8k_root=...).
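As an illustration of how these three mechanisms fit together, here is a minimal ConfigArgParse sketch; it is not platalea's actual argument-parsing code, just an example of declaring one option that can be supplied from a config file, an environment variable, or the command line:

```python
# Illustrative only: declare an option that ConfigArgParse can read from a
# default config file, an environment variable, or a command-line argument.
import configargparse

parser = configargparse.ArgParser(
    default_config_files=["~/.config/platalea/config.yml"])
parser.add("--flickr8k_root", env_var="FLICKR8K_ROOT",
           help="root folder of the Flickr8K dataset")
options = parser.parse_args()
print(options.flickr8k_root)
```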

Preprocessing

Run the preprocessing script to extract input features:

```bash
python platalea/utils/preprocessing.py flickr8k
```

Howto100Men-cc

This repository has support for a subset of the HowTo100M dataset. The subset contains all videos with a Creative Commons license that claim to be in English according to their metadata.

  • The sampling rate of the audio features is 100 Hz.
  • The sampling rate of the video features is 1 Hz.
  • A dataset item is defined as the combination of video and audio for a fragment of 3 seconds.
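Under these rates, one item therefore pairs roughly 300 audio feature frames with 3 video feature frames; the short Python sketch below just spells out that arithmetic (it is not platalea code).

```python
# Illustrative arithmetic: feature frames per 3-second item at the rates above.
fragment_seconds = 3
audio_frames = fragment_seconds * 100   # audio features at 100 Hz -> 300 frames
video_frames = fragment_seconds * 1     # video features at 1 Hz   -> 3 frames
print(audio_frames, video_frames)       # 300 3
```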

Preprocessing

This code contains functionality for extracting audio features from the videos. These files need to be in the data folder before preprocessing the dataset. Preprocessing will create an index file with references to the video feature files. The video features (S3D) need to be acquired elsewhere. Videos from HowTo100M need to be downloaded from YouTube using the metadata from the HowTo100M dataset mentioned above.

To start preprocessing, run the following:

```sh
python -m platalea.utils.preprocessing howto100m-encc --howto100m_root /corpora/howto100m/
```

Running/Training

Running experiments on the HowTo100M dataset has not yet been implemented.

Training

You can now train a model using one of the examples provided under platalea/experiments, e.g.:

```sh
cd platalea/experiments/flickr8k
mkdir -p runs/test
cd runs/test
python -m platalea.experiments.flickr8k.basic
```

After the model is trained, results are available in results.json.
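The exact fields depend on the experiment; the snippet below is a minimal, hedged sketch for inspecting the file, assuming it contains either a single JSON document or one JSON object per line.

```python
# Minimal sketch for inspecting results.json after training.
# Assumption: the file holds either one JSON document or one JSON object per line.
import json

with open("results.json") as f:
    text = f.read().strip()

try:
    results = json.loads(text)            # single JSON document
except json.JSONDecodeError:
    results = [json.loads(line)           # one JSON object per line
               for line in text.splitlines() if line.strip()]

print(results)
```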

Weights and Biases (wandb)

Some experiments support the use of wandb for cloud logging of results. In the examples we provide under platalea/experiments, this option is disabled by default. To force-enable it, change the call to experiment() from experiment(..., wandb_mode='disabled') to experiment(..., wandb_mode='online'). To fall back to wandb's normal behavior (where the mode can be set through a command-line option or environment variable), use wandb_mode=None.

Contributing

If you want to contribute to the development of platalea, have a look at the contribution guidelines.

Changelog

We keep track of what is added, changed and removed in releases in the changelog.

References

[1] Hodosh, M., Young, P., & Hockenmaier, J. (2013). Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics. Journal of Artificial Intelligence Research, 47, 853–899. https://doi.org/10.1613/jair.3994.

[2] Harwath, D., & Glass, J. (2015). Deep multimodal semantic embeddings for speech and images. 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 237–244. https://doi.org/10.1109/ASRU.2015.7404800.

Owner

  • Name: Spoken Language
  • Login: spokenlanguage
  • Kind: organization

NLeSC & UvT Project: Understanding visually grounded spoken language via multi-tasking

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Platalea
message: >-
  Please cite this software when using it for your
  research.
type: software
authors:
  - given-names: Grzegorz
    family-names: Chrupała
    email: g.a.chrupala@tilburguniversity.edu
    affiliation: Tilburg University
    orcid: 'https://orcid.org/0000-0001-9498-6912'
  - given-names: ' Bertrand'
    family-names: Higy
    email: b.j.r.higy@tilburguniversity.edu
    affiliation: Tilburg University
    orcid: 'https://orcid.org/0000-0002-8198-8676'
  - given-names: E. G. Patrick
    family-names: Bos
    email: p.bos@esciencecenter.nl
    affiliation: Netherlands eScience Center
    orcid: 'https://orcid.org/0000-0002-6033-960X'
  - given-names: Christiaan
    family-names: Meijer
    email: c.meijer@esciencecenter.nl
    affiliation: Netherlands eScience Center
    orcid: 'https://orcid.org/0000-0002-5529-5761'
  - given-names: Marvin
    family-names: Lavechin
  - given-names: Lieke
    family-names: Gelderloos
    email: l.j.gelderloos@tilburguniversity.edu
    affiliation: Tilburg University
doi: 10.5281/zenodo.4311601

Dependencies

requirements.txt pypi
  • ConfigArgParse ==1.2.3
  • Cython ==0.29.17
  • Pillow ==8.1.1
  • PyYAML ==5.3.1
  • click ==7.1.2
  • editdistance ==0.5.3
  • h5py ==2.10.0
  • intervaltree ==3.0.2
  • joblib ==0.14.1
  • moviepy *
  • nltk ==3.5
  • numexpr ==2.7.1
  • numpy ==1.18.4
  • pandas ==1.0.3
  • python-Levenshtein ==0.12.0
  • python-dateutil ==2.8.1
  • pytz ==2020.1
  • regex ==2020.4.4
  • scikit-learn ==0.22.1
  • scipy ==1.4.1
  • six ==1.14.0
  • sortedcontainers ==2.1.0
  • soundfile ==0.10.3.post1
  • tables ==3.6.1
  • torch ==1.8.1
  • torchvision ==0.4.0
  • tqdm ==4.46.0
  • traitlets ==4.3.3
  • ursa *
  • wandb ==0.10.10
  • wcwidth ==0.1.9
setup.py pypi
  • configargparse >=1.0
  • nltk >=3.4.5
  • numpy >=1.17.2
  • python-Levenshtein >=0.12.0
  • scikit-learn ==0.22.1
  • scipy >=1.3.1
  • soundfile >=0.10.3
  • torch ==1.8.1
  • torchvision >=0.4.0
  • wandb >=0.10.10