https://github.com/amacaluso/ssb-vae

Self-Supervised Bernoulli Autoencoders for Semi-Supervised Hashing: we investigate the robustness of hashing methods based on variational autoencoders to the lack of supervision, focusing on two semi-supervised approaches currently in use. In addition, we propose a novel supervision approach in which the model uses its own predictions of the label distribution to implement the pairwise objective. Compared to the best baseline, this procedure yields similar performance in fully-supervised settings but significantly improves results when labelled data is scarce.


Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.5%) to scientific vocabulary

Keywords

binary-variational-autoencoder deep-learning dimension-reduction neural-networks variational-autoencoder
Last synced: 5 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: amacaluso
  • License: apache-2.0
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 35 MB
Statistics
  • Stars: 4
  • Watchers: 2
  • Forks: 3
  • Open Issues: 0
  • Releases: 0
Topics
binary-variational-autoencoder deep-learning dimension-reduction neural-networks variational-autoencoder
Created over 5 years ago · Last pushed almost 5 years ago

https://github.com/amacaluso/SSB-VAE/blob/master/

# SSB-VAE: Self-Supervised Bernoulli Autoencoders for Semi-Supervised Hashing

This repository contains the code to reproduce the results presented in the paper 
[*Self-Supervised Bernoulli Autoencoders for Semi-Supervised Hashing*](https://arxiv.org/abs/2007.08799).

# Description

We investigate the robustness of hashing methods based on variational autoencoders 
to the lack of supervision, focusing on two semi-supervised approaches currently in use. 
In addition, we propose a novel supervision approach in which the model uses 
its own predictions of the label distribution to implement the pairwise objective. Compared to the best 
baseline, this procedure yields similar performance in 
fully-supervised settings but significantly improves results when labelled data is scarce.
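
To make the self-supervision idea concrete, here is a minimal NumPy sketch (all names are hypothetical and this is not the paper's exact loss): the model's own predicted label distributions decide which pairs of codes are pulled together and which are pushed apart.

```python
import numpy as np

def self_supervised_pairwise_loss(codes, pred_probs, margin=1.0):
    """Contrastive-style pairwise objective driven by the model's own
    label predictions. A generic sketch, not the paper's exact loss.

    codes:      (n, l) relaxed binary codes produced by the encoder
    pred_probs: (n, c) predicted label distributions (e.g. softmax outputs)
    """
    codes = np.asarray(codes, dtype=float)
    pseudo = np.asarray(pred_probs).argmax(axis=1)   # self-predicted labels
    # L1 distance between every pair of (relaxed) binary codes
    d = np.abs(codes[:, None, :] - codes[None, :, :]).sum(axis=2)
    same = (pseudo[:, None] == pseudo[None, :]).astype(float)
    # pull together pairs predicted to share a label, push apart the rest
    pair_loss = same * d + (1.0 - same) * np.maximum(0.0, margin - d)
    n = len(codes)
    return pair_loss[~np.eye(n, dtype=bool)].mean()  # average over off-diagonal pairs
```

Because the pseudo-labels come from the model itself, this pairwise term needs no labelled pairs at training time.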


# Usage

The code is organised in four different scripts, one per dataset. 
Specifically, the script *test_model_[**data**].py* considers the dataset **data** and takes 
the following parameters as input:


- *M* is the index of the model considered. In particular, we compare three semi-supervised
 methods based on variational autoencoders: *(M=1)* **VDSH-S** is a variational autoencoder 
 proposed in [[1]](#1) that employs Gaussian latent variables, unsupervised learning and pointwise supervision; 
 *(M=2)* **PHS-GS** is a variational autoencoder proposed in [[2]](#2) that assumes Bernoulli latent variables, 
 unsupervised learning, and both pointwise and pairwise supervision; 
 and *(M=3)* **SSB-VAE** is our proposed method based on Bernoulli latent variables, unsupervised learning, pointwise 
 supervision and self-supervision.

- *p* is the level (percentage) of supervision used when training the autoencoder with a semi-supervised approach.
- *a*, *b* and *g* are the hyperparameters associated with the different components of the semi-supervised
 loss. In particular, *a* is the coefficient of the pointwise component, *g* is associated with the pairwise component, 
 and *b* is the weight of the KL divergence in the unsupervised loss.
- *r* is the number of experiments to perform for a given set of parameters. This is used to compute an average performance
over multiple initialisations of the same neural network. Note that the results reported in the paper are 
computed by averaging *r=5* experiments.
- *l* is the size of the latent sub-space generated by the encoder, which also corresponds to the number of bits of 
the generated hash codes.
- *o* is the file where the results are stored.
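
As a rough illustration of the interface described above (the actual flag names in the repository's scripts may differ), the parameters could be parsed as follows:

```python
import argparse

# Hypothetical reconstruction of the command-line interface described above;
# the real test_model_[data].py scripts may name their flags differently.
parser = argparse.ArgumentParser(description="test_model_[data].py sketch")
parser.add_argument("-M", type=int, choices=[1, 2, 3],
                    help="model index: 1=VDSH-S, 2=PHS-GS, 3=SSB-VAE")
parser.add_argument("-p", type=float, default=1.0,
                    help="level (fraction) of supervision")
parser.add_argument("-a", type=float, help="weight of the pointwise component")
parser.add_argument("-b", type=float, help="weight of the KL divergence")
parser.add_argument("-g", type=float, help="weight of the pairwise component")
parser.add_argument("-r", type=int, default=5,
                    help="number of repetitions to average over")
parser.add_argument("-l", type=int, help="latent size = hash code length in bits")
parser.add_argument("-o", type=str, help="output file for the results")

# Example invocation: SSB-VAE, 50% supervision, 16-bit codes
args = parser.parse_args(["-M", "3", "-p", "0.5", "-l", "16", "-o", "out.csv"])
```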

The script utils.py imports the needed packages and contains all of the custom routines for performance evaluation.

The script base_networks.py contains the custom routines to define all the components of a neural network.

The script supervised_BAE.py defines the three types of autoencoder (*VDSH, PHS, SSB-VAE*).
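
Whatever the encoder, Bernoulli-style latents are typically turned into hash codes by thresholding the latent probabilities; a generic sketch (not the repository's exact routine) is:

```python
import numpy as np

def to_hash_codes(bernoulli_probs, threshold=0.5):
    """Binarize encoder outputs (Bernoulli probabilities) into hash bits.
    Generic sketch; the repository's own binarization may differ."""
    return (np.asarray(bernoulli_probs) >= threshold).astype(np.uint8)

def hamming_distance(c1, c2):
    """Number of differing bits between two hash codes."""
    return int(np.count_nonzero(c1 != c2))
```

Retrieval then amounts to ranking stored codes by Hamming distance to the query's code.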

The *.sh files allow one to run all the experiments reported in the paper. In particular, 
 test_all_[**data**]-[**n**]bits.sh computes *r* repetitions of the predictions of the three methods (*VDSH, PHS, SSB-VAE*), 
 given a dataset (**data**) and a number of bits **n**, for supervision levels *p = 0.1, 0.2, ... , 0.9, 1.0*.

The script post_processing.py collects all the results produced by the *.sh files and computes the
 tables reported in the paper.
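
Conceptually, this post-processing amounts to averaging a metric over the per-run CSV files; a stdlib-only sketch (the column names and file layout are assumptions, not the repository's actual format) could be:

```python
import csv
from collections import defaultdict

def average_results(paths):
    """Average a hypothetical 'score' column per (method, p) across CSV files.
    Column names and layout are assumptions, not the repo's actual format."""
    sums, counts = defaultdict(float), defaultdict(int)
    for path in paths:
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                key = (row["method"], float(row["p"]))
                sums[key] += float(row["score"])
                counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```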


## Requirements

Python 3.7

Tensorflow 2.1

## Execution

In order to obtain the results reported in the paper, it is necessary to execute all the *.sh files as follows:
```
# run all *.sh files
./test_all_20news-16bits.sh
./test_all_20news-32bits.sh
./test_all_snippets-16bits.sh
./test_all_snippets-32bits.sh
./test_all_TMC-16bits.sh
./test_all_TMC-32bits.sh
./test_all_cifar-16bits.sh
./test_all_cifar-32bits.sh

```

At the end of the computation, the CSV files containing the results are generated according to the *-o*
parameter. Finally, the script post_processing.py collects all the CSV files and saves a new CSV with the same format 
 as the two tables reported in the paper.

## References
[1] S. Chaidaroon and Y. Fang. Variational deep semantic hashing for text documents. Proc. SIGIR. 2017, pp. 75–84.

[2] S. Z. Dadaneh et al. Pairwise Supervised Hashing with Bernoulli Variational Auto-Encoder and Self-Control Gradient Estimator. Proc. UAI. 2020.

Owner

  • Name: Antonio Macaluso
  • Login: amacaluso
  • Kind: user
  • Location: Saarbrücken, Germany
  • Company: German Research Center for Artificial Intelligence

Senior Researcher in Quantum Artificial Intelligence | PhD in Computer Science and Engineering

GitHub Events

Issues and Pull Requests

Last synced: 5 months ago

All Time
  • Total issues: 1
  • Total pull requests: 0
  • Average time to close issues: 25 days
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • ShahrzadZolghadr (1)