https://github.com/chris-santiago/vime

Reproducing the VIME framework for self- and semi-supervised learning to tabular domain.

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
○ DOI references
○ Academic publication links
✓ Committers with academic emails: 1 of 1 committers (100.0%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (12.4%) to scientific vocabulary

Keywords

hydra pytorch pytorch-lightning self-supervised-learning semi-supervised-learning taskfile

Last synced: 5 months ago · JSON representation

Repository

Basic Info

Host: GitHub
Owner: chris-santiago
License: mit
Language: Python
Default Branch: master
Homepage:
Size: 13.2 MB

Statistics

Stars: 3
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created over 2 years ago · Last pushed over 2 years ago

https://github.com/chris-santiago/vime/blob/master/

# VIME - PyTorch

This repo reproduces the VIME framework for self- and semi-supervised learning in the tabular domain.

*Authors: Jinsung Yoon, Yao Zhang, James Jordon, Mihaela van der Schaar*

*Reference: Jinsung Yoon, Yao Zhang, James Jordon, Mihaela van der Schaar, "VIME: Extending the Success of Self- and Semi-supervised Learning to Tabular Domain," Neural Information Processing Systems (NeurIPS), 2020.*

Original paper: https://proceedings.neurips.cc/paper/2020/hash/7d97667a3e056acab9aaf653807b4a03-Abstract.html

Original repo: https://github.com/jsyoon0823/VIME/tree/master

---------
## About

### Initial Implementation

This initial implementation follows the VIME self-supervised framework to train an encoder on
unlabeled MNIST data, which is then used to train a semi-supervised MLP on a much smaller portion
of labeled MNIST data. The final model is tested against the standard MNIST test set.

![](static/self-sl-block.png)
*Block diagram of the proposed self-supervised learning framework on tabular data. Credit: Yoon et al.*
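The corruption step that feeds the encoder can be sketched without any dependencies. The sketch below follows the description in the original paper: sample a binary mask, replace masked entries with values drawn from the same column, and recompute the mask so it marks only entries that actually changed. The encoder is then trained to recover both the mask and the original features. The function names `mask_generator` and `pretext_generator`, the list-of-lists data layout, and the value `p_m = 0.3` are illustrative assumptions, not this repository's code.

```python
import random


def mask_generator(p_m, n_rows, n_cols, rng):
    """Sample a binary mask; each entry is 1 with probability p_m."""
    return [[1 if rng.random() < p_m else 0 for _ in range(n_cols)]
            for _ in range(n_rows)]


def pretext_generator(mask, x, rng):
    """Corrupt x by replacing masked entries with values from the same column."""
    n_rows, n_cols = len(x), len(x[0])
    # Shuffle each column independently to sample from its empirical marginal.
    x_bar = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        column = [row[j] for row in x]
        rng.shuffle(column)
        for i in range(n_rows):
            x_bar[i][j] = column[i]
    # Keep the original value where mask == 0, the shuffled value where mask == 1.
    x_tilde = [[x_bar[i][j] if mask[i][j] else x[i][j] for j in range(n_cols)]
               for i in range(n_rows)]
    # A shuffled value can equal the original, so recompute the effective mask.
    new_mask = [[int(x[i][j] != x_tilde[i][j]) for j in range(n_cols)]
                for i in range(n_rows)]
    return new_mask, x_tilde


rng = random.Random(0)  # fixed seed so the sketch is repeatable
x = [[float(i * 3 + j) for j in range(3)] for i in range(4)]  # toy 4x3 table
mask = mask_generator(0.3, 4, 3, rng)  # p_m = 0.3 is an arbitrary choice
new_mask, x_tilde = pretext_generator(mask, x, rng)
```

Here `new_mask` would serve as the target for mask estimation and `x` as the target for feature reconstruction, with `x_tilde` as the encoder input.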

The final model used only 10% of the MNIST training set (n=6,000) as labeled data for
semi-supervised learning and reached 93% classification accuracy on the test set. No
hyperparameters were tuned for this initial work.
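As a rough illustration of the 10% labeled split, the sketch below partitions the 60,000 MNIST training indices into a labeled and an unlabeled subset. The helper name `split_labeled`, the seed, and the choice to keep the two subsets disjoint are assumptions; the repository may build its splits differently.

```python
import random


def split_labeled(n_samples, labeled_frac, seed=0):
    """Shuffle sample indices and split them into labeled and unlabeled subsets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_labeled = int(round(n_samples * labeled_frac))
    return indices[:n_labeled], indices[n_labeled:]


# MNIST has 60,000 training images; 10% of them gives the n=6,000 labeled subset.
labeled_idx, unlabeled_idx = split_labeled(60_000, 0.10)
```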

![](static/semi-sl-block.png)
*Block diagram of the proposed semi-supervised learning framework on tabular data. Credit: Yoon et al.*
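The semi-supervised objective in the original paper combines a supervised loss on labeled samples with a consistency loss that penalizes disagreement between the prediction on an original sample and the predictions on several corrupted views of it, weighted by a coefficient `beta`. The dependency-free sketch below shows that arithmetic on plain lists. The function names, the use of class probabilities rather than logits, and the value `beta = 1.0` are assumptions for illustration only.

```python
import math


def cross_entropy(probs, label):
    """Supervised loss for one labeled sample: negative log-probability of its label."""
    return -math.log(probs[label])


def consistency_loss(pred_original, pred_views):
    """Mean squared difference between the original prediction and each corrupted view."""
    total = 0.0
    for view in pred_views:
        total += sum((v - o) ** 2 for v, o in zip(view, pred_original)) / len(pred_original)
    return total / len(pred_views)


def semi_supervised_loss(supervised, unsupervised, beta):
    """Combine both terms; beta controls the weight of the consistency term."""
    return supervised + beta * unsupervised


# Toy two-class example with K = 2 corrupted views of one unlabeled sample.
unsup = consistency_loss([0.5, 0.5], [[0.5, 0.5], [1.0, 0.0]])
sup = cross_entropy([0.25, 0.75], label=1)
loss = semi_supervised_loss(sup, unsup, beta=1.0)
```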

The full configuration is listed in `outputs/vime-encoder/train_self/2023-05-26/10-09-22/.hydra/config.yaml`
for the self-supervised encoder and in `outputs/vime-learner/train_semi/2023-05-26/10-32-51/.hydra/config.yaml`
for the semi-supervised learner.

## Install

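Clone this repository, create a new Conda environment, and install the package in editable mode:

```bash
git clone https://github.com/chris-santiago/vime.git
cd vime
# Run from the repository root, where environment.yml is expected
conda env create -f environment.yml
# Activate the environment named in environment.yml before installing
pip install -e .
```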

## Use

### Prerequisites

#### Task

This project uses [Task](https://taskfile.dev/) as a task runner. Though the underlying Python
commands can be executed without it, we recommend [installing Task](https://taskfile.dev/installation/)
for ease of use. Task definitions are located in `Taskfile.yml`.

#### Current commands

```bash
> task -l
task: Available tasks for this project:
* check-config: Check Hydra configuration
* train-multi: Launch multiple training jobs
* train-self: Train the VIME encoder module
* train-semi: Train the VIME semi-SL module
* wandb: Login to Weights & Biases
```

#### PDM

This project was built using [this cookiecutter](https://github.com/chris-santiago/cookie) and is
set up to use [PDM](https://pdm.fming.dev/latest/) for dependency management, though PDM is not
required for package installation.

#### Hydra

This project uses [Hydra](https://hydra.cc/docs/intro/) for managing configuration and CLI arguments. See `vime/conf` for full
configuration details.
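Hydra lets a value in the composed configuration be changed from the command line with a dotted `key=value` override, and it rejects overrides of keys that do not exist unless they are prefixed with `+`. The sketch below mimics that behavior on a plain nested dictionary to show how a dotted key addresses a nested entry. It is not Hydra's implementation, it keeps every value as a string, and the config contents and the `model.nn.encoder_ckpt` path are hypothetical.

```python
def apply_override(config, override):
    """Apply one `dotted.key=value` override to a nested dict in place."""
    dotted_key, value = override.split("=", 1)
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        node = node[name]  # raises KeyError if an intermediate group is missing
    if leaf not in node:
        # Mirrors Hydra's refusal to override a key that is not in the config.
        raise KeyError(f"cannot override missing key: {dotted_key}")
    node[leaf] = value
    return config


# Hypothetical composed config; the real structure lives in `vime/conf`.
config = {"model": {"nn": {"encoder_ckpt": None}}, "trainer": {"max_epochs": "10"}}
apply_override(config, "model.nn.encoder_ckpt=path/to/encoder.ckpt")
```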

#### Weights and Biases

This project is set up to log experiment results with [Weights and Biases](https://wandb.ai/). It
expects an API key within a `.env` file in the root directory:

```toml
WANDB_KEY=
```

Users can configure different logger(s) within the `conf/trainer/default.yaml` file.
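For reference, a `.env` file of the form shown above can be parsed with a few lines of standard-library Python. This is a minimal sketch, assuming simple `KEY=VALUE` lines; how this project actually reads the file is not shown here, and the helper name `parse_env` is hypothetical.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


example = parse_env("# Weights & Biases credentials\nWANDB_KEY=abc123\n")
```

A typical use would be `parse_env(Path(".env").read_text())["WANDB_KEY"]`, with the result exported as `WANDB_API_KEY`, which is the environment variable the `wandb` client reads.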

### Training

- Run `task train-self` to train the self-supervised encoder. Once complete, check the `outputs/vime-encoder/train_self/../checkpoints`
directory for the path to the saved checkpoint.
- Copy this checkpoint path into the semi-supervised model
config, located at `conf/model/learner.yaml`, under the `nn.encoder_ckpt` key.
- Run `task train-semi` to train the semi-supervised learner.

All results will populate to their respective output directories:

```
outputs
  vime-encoder
    train_self
      2023-05-26
        10-09-22
          .hydra
          checkpoints
          wandb
  vime-learner
    train_semi
      2023-05-26
        10-32-51
          .hydra
          checkpoints
          wandb
```
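Given this layout, the encoder checkpoint to paste into `conf/model/learner.yaml` can be located programmatically. The sketch below picks the most recently modified file under any `checkpoints` directory. The helper name `latest_checkpoint` is hypothetical, and the `*.ckpt` pattern assumes PyTorch Lightning's default checkpoint extension.

```python
from pathlib import Path


def latest_checkpoint(run_root, pattern="*.ckpt"):
    """Return the most recently modified checkpoint under run_root, or None."""
    candidates = list(Path(run_root).glob(f"**/checkpoints/{pattern}"))
    if not candidates:
        return None
    # Newest file by modification time wins.
    return max(candidates, key=lambda path: path.stat().st_mtime)
```

For example, `latest_checkpoint("outputs/vime-encoder/train_self")` would return the newest encoder checkpoint across all dated run directories, or `None` if no run has produced one yet.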

## Documentation

Documentation is hosted on GitHub Pages: [https://chris-santiago.github.io/vime/](https://chris-santiago.github.io/vime/)

Owner

Name: Chris Santiago
Login: chris-santiago
Kind: user

Repositories: 64
Profile: https://github.com/chris-santiago

Committers

Last synced: over 1 year ago

All Time

Total Commits: 37
Total Committers: 1
Avg Commits per committer: 37.0
Development Distribution Score (DDS): 0.0

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
chris-santiago	c**o@g**u	37

Committer Domains (Top 20 + Academic)

gatech.edu: 1

Issues and Pull Requests

Last synced: 11 months ago

All Time

Total issues: 0
Total pull requests: 5
Average time to close issues: N/A
Average time to close pull requests: less than a minute
Total issue authors: 0
Total pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 5
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0
