https://github.com/chris-santiago/vime

Reproducing the VIME framework for self- and semi-supervised learning in the tabular domain.

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    1 of 1 committers (100.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.4%) to scientific vocabulary

Keywords

hydra pytorch pytorch-lightning self-supervised-learning semi-supervised-learning taskfile
Last synced: 5 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: chris-santiago
  • License: mit
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 13.2 MB
Statistics
  • Stars: 3
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed over 2 years ago

# VIME - PyTorch

This repo reproduces the VIME framework for self- and semi-supervised learning in the tabular domain.

*Authors: Jinsung Yoon, Yao Zhang, James Jordon, Mihaela van der Schaar*

*Reference: Jinsung Yoon, Yao Zhang, James Jordon, Mihaela van der Schaar, "VIME: Extending the Success of Self- and Semi-supervised Learning to Tabular Domain," Neural Information Processing Systems (NeurIPS), 2020.*

Original paper: https://proceedings.neurips.cc/paper/2020/hash/7d97667a3e056acab9aaf653807b4a03-Abstract.html

Original repo: https://github.com/jsyoon0823/VIME/tree/master

---------
## About

### Initial Implementation

This initial implementation follows the VIME self-supervised framework to train an encoder on
unlabeled MNIST data, which is then used to train a semi-supervised MLP on a much smaller portion
of labeled MNIST data. The final model is tested against the standard MNIST test set.


![](static/self-sl-block.png)
*Block diagram of the proposed self-supervised learning framework on tabular data. Credit: Yoon et al.*
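
The pretext task in the diagram above can be illustrated with a minimal NumPy sketch. It follows the mask-and-corrupt procedure described by Yoon et al. and is not taken from this repository's code; the function names `mask_generator` and `pretext_generator`, the masking probability `0.3`, and the toy data shape are assumptions chosen for illustration.

```python
import numpy as np


def mask_generator(p_m, x, rng):
    """Draw a binary mask shaped like x; each entry is 1 with probability p_m."""
    return rng.binomial(1, p_m, size=x.shape)


def pretext_generator(m, x, rng):
    """Corrupt x where m == 1 using values from the same column of other rows."""
    n, d = x.shape
    # Shuffle each column independently so replacements follow that feature's
    # empirical marginal distribution.
    x_bar = np.empty_like(x)
    for j in range(d):
        x_bar[:, j] = x[rng.permutation(n), j]
    x_tilde = x * (1 - m) + x_bar * m
    # A swapped-in value can equal the original, so recompute the effective mask.
    m_new = (x != x_tilde).astype(int)
    return m_new, x_tilde


rng = np.random.default_rng(0)
x = rng.random((8, 5))          # toy stand-in for a batch of tabular rows
m = mask_generator(0.3, x, rng)
m_new, x_tilde = pretext_generator(m, x, rng)
```

In the paper's setup, the encoder receives `x_tilde` and is trained to recover both the mask `m_new` and the original features `x`.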


The final model used only 10% of the MNIST training set (n=6,000) as labeled data for the semi-supervised
learning and reached 93% classification accuracy on the test set. None of the hyperparameters were
optimized for this initial work.
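
The split sizes follow from the 60,000-sample MNIST training set. The sketch below assumes a uniform random split with a fixed seed; the repository may select its labeled subset differently (for example, stratified by class), and the variable names are hypothetical.

```python
import numpy as np

N_TRAIN = 60_000          # size of the standard MNIST training set
LABELED_FRACTION = 0.10   # fraction kept as labeled data, per the text above

rng = np.random.default_rng(0)
indices = rng.permutation(N_TRAIN)

# Assumption: a plain random split; the labeled part gets the first 10% of indices.
n_labeled = round(N_TRAIN * LABELED_FRACTION)
labeled_idx = indices[:n_labeled]
unlabeled_idx = indices[n_labeled:]
```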


![](static/semi-sl-block.png)
*Block diagram of the proposed semi-supervised learning framework on tabular data. Credit: Yoon et al.*
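
As a rough illustration of the semi-supervised objective in the diagram above, the sketch below combines a supervised loss with a consistency term over K corrupted copies of each unlabeled sample, following the formulation in Yoon et al. The function names, the weight `beta`, and the toy predictions are assumptions; this is not the repository's training code.

```python
import numpy as np


def consistency_loss(pred_orig, pred_aug):
    """Mean squared difference between predictions on clean and corrupted inputs.

    pred_orig: (N, C) predictions for the original samples.
    pred_aug:  (K, N, C) predictions for K corrupted copies of each sample.
    """
    return float(np.mean((pred_aug - pred_orig[None, :, :]) ** 2))


def semi_supervised_loss(supervised_loss, pred_orig, pred_aug, beta=1.0):
    """Total objective: supervised loss plus beta-weighted consistency loss."""
    return supervised_loss + beta * consistency_loss(pred_orig, pred_aug)


rng = np.random.default_rng(0)
pred_orig = rng.random((4, 10))
# K = 3 corrupted copies whose predictions are each shifted by 0.1 (toy values).
pred_aug = pred_orig[None] + 0.1 * np.ones((3, 4, 10))
total = semi_supervised_loss(0.5, pred_orig, pred_aug, beta=2.0)
```

The consistency term is zero when the predictor gives identical outputs for the clean and corrupted inputs, so it rewards predictions that are stable under the masking corruption.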


The full configuration is listed in `outputs/vime-encoder/train_self/2023-05-26/10-09-22/.hydra/config.yaml`
for the self-supervised encoder and in `outputs/vime-learner/train_semi/2023-05-26/10-32-51/.hydra/config.yaml`
for the semi-supervised learner.

## Install

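Clone this repository, create a new Conda environment, and install the package in editable mode:

```bash
git clone https://github.com/chris-santiago/vime.git
cd vime
conda env create -f environment.yml
# Activate the new environment (its name is set in environment.yml) before installing
pip install -e .
```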

## Use

### Prerequisites

#### Task

This project uses [Task](https://taskfile.dev/) as a task runner. Though the underlying Python
commands can be executed without it, we recommend [installing Task](https://taskfile.dev/installation/)
for ease of use. Task definitions are located in `Taskfile.yml`.

#### Current commands

```bash
> task -l
task: Available tasks for this project:
* check-config:       Check Hydra configuration
* train-multi:        Launch multiple training jobs
* train-self:         Train the VIME encoder module
* train-semi:         Train the VIME semi-SL module
* wandb:              Login to Weights & Biases
```

#### PDM

This project was built using [this cookiecutter](https://github.com/chris-santiago/cookie) and is
set up to use [PDM](https://pdm.fming.dev/latest/) for dependency management, though it's not required
for package installation.

#### Hydra

This project uses [Hydra](https://hydra.cc/docs/intro/) for managing configuration and CLI arguments. See `vime/conf` for full
configuration details.

#### Weights and Biases

This project is set up to log experiment results with [Weights and Biases](https://wandb.ai/). It
expects an API key within a `.env` file in the root directory:

```toml
WANDB_KEY=
```
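
For readers who want to check the `.env` file from Python, here is a minimal standard-library sketch that reads `KEY=VALUE` lines and exports `WANDB_KEY` to the process environment. How this repository actually loads the key is not shown here (it may rely on Task or a dotenv package), so the helper names `load_env_file` and `export_wandb_key` are hypothetical.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blank lines and comments."""
    values = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def export_wandb_key(path=".env"):
    """Copy WANDB_KEY from the .env file into os.environ; return True if it was set."""
    key = load_env_file(path).get("WANDB_KEY", "")
    if key:
        os.environ["WANDB_KEY"] = key
    return bool(key)
```

An empty `WANDB_KEY=` line, as in the template above, leaves the environment untouched and returns `False`.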

Users can configure different logger(s) within the `conf/trainer/default.yaml` file.

### Training

- Run `task train-self` to train the self-supervised encoder. Once complete, check the `outputs/vime-encoder/train_self/../checkpoints`
directory for the path to the saved checkpoint.
- Copy this checkpoint path into the semi-supervised model
config, located at `conf/model/learner.yaml`, under the `nn.encoder_ckpt` key.
- Run `task train-semi` to train the semi-supervised learner.

All results are written to their respective output directories:

```
outputs
├── vime-encoder
│   └── train_self
│       └── 2023-05-26
│           └── 10-09-22
│               ├── .hydra
│               ├── checkpoints
│               └── wandb
└── vime-learner
    └── train_semi
        └── 2023-05-26
            └── 10-32-51
                ├── .hydra
                ├── checkpoints
                └── wandb
```
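
Given the layout above, the encoder checkpoint can be located programmatically instead of by browsing the directories. This is a minimal sketch, assuming checkpoints are saved with a `.ckpt` suffix directly inside each run's `checkpoints` directory; the helper name `latest_checkpoint` is hypothetical and not part of this repository.

```python
from pathlib import Path


def latest_checkpoint(root="outputs/vime-encoder/train_self"):
    """Return the newest checkpoint file under root, or None if there is none.

    Assumes the <date>/<time>/checkpoints layout shown above and a .ckpt suffix.
    """
    candidates = list(Path(root).glob("*/*/checkpoints/*.ckpt"))
    if not candidates:
        return None
    # Run directories are zero-padded <date>/<time>, so string order is
    # chronological; ties within one run are broken by modification time.
    return max(
        candidates,
        key=lambda p: (p.parts[-4], p.parts[-3], p.stat().st_mtime),
    )
```

The returned path is the value to place under the `nn.encoder_ckpt` key in `conf/model/learner.yaml` before running `task train-semi`.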

## Documentation

Documentation is hosted on GitHub Pages: [https://chris-santiago.github.io/vime/](https://chris-santiago.github.io/vime/)

Owner

  • Name: Chris Santiago
  • Login: chris-santiago
  • Kind: user

Committers

Last synced: over 1 year ago

All Time
  • Total Commits: 37
  • Total Committers: 1
  • Avg Commits per committer: 37.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
|------|-------|---------|
| chris-santiago | c****o@g****u | 37 |

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 0
  • Total pull requests: 5
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 5
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • chris-santiago (5)