https://github.com/animesh/tabular-dl-pretrain-objectives

Revisiting Pretraining Objectives for Tabular Deep Learning

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
○ DOI references
✓ Academic publication links (links to: arxiv.org)
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity (low similarity, 12.9%, to scientific vocabulary)

Last synced: 10 months ago · JSON representation

Repository

Revisiting Pretraining Objectives for Tabular Deep Learning

Basic Info

Host: GitHub
Owner: animesh
Default Branch: master
Homepage: https://arxiv.org/abs/2207.03208
Size: 36.1 MB

Statistics

Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 0

Fork of puhsu/tabular-dl-pretrain-objectives

Created over 3 years ago · Last pushed almost 4 years ago

https://github.com/animesh/tabular-dl-pretrain-objectives/blob/master/

# Revisiting Pretraining Objectives for Tabular Deep Learning
This is the official code for our paper "Revisiting Pretraining Objectives for Tabular Deep Learning" ([paper](https://arxiv.org/abs/2207.03208)).

**Check out other projects on tabular Deep Learning:** [link](https://github.com/Yura52/rtdl#papers-and-projects).

Feel free to report [issues](https://github.com/puhsu/tabular-dl-pretrain-objectives/issues) and post [questions/feedback/ideas](https://github.com/puhsu/tabular-dl-pretrain-objectives/discussions).

## Results
You can view all the results and build your own tables with this [notebook](notebooks/Reports.ipynb).

## Set up the environment
1. Install [conda](https://docs.conda.io/en/latest/miniconda.html) (just to manage the env).
2. Run the following commands:
    ```bash
    export REPO_DIR=/path/to/the/code
    cd $REPO_DIR

    conda create -n tdl python=3.9.7
    conda activate tdl

    pip install torch==1.10.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
    pip install -r requirements.txt

    # if the following commands do not succeed, update conda
    conda env config vars set PYTHONPATH=${PYTHONPATH}:${REPO_DIR}
    conda env config vars set PROJECT_DIR=${REPO_DIR}
    
    conda activate tdl
    ```
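As a quick sanity check after the steps above, a small Python helper can confirm that the two environment variables point at the repository. This is only a sketch: `check_env` is a hypothetical helper and is not part of this repository; it only assumes the `PROJECT_DIR` and `PYTHONPATH` variables set by the commands above.

```python
import os


def check_env(environ, repo_dir):
    """Return a list of problems; an empty list means the setup looks fine.

    Hypothetical helper for illustration, not part of the repository.
    """
    problems = []
    # PROJECT_DIR is set by `conda env config vars set PROJECT_DIR=${REPO_DIR}`.
    if environ.get("PROJECT_DIR") != repo_dir:
        problems.append("PROJECT_DIR is not set to the repository directory")
    # PYTHONPATH should contain the repository so its modules can be imported.
    entries = environ.get("PYTHONPATH", "").split(os.pathsep)
    if repo_dir not in entries:
        problems.append("PYTHONPATH does not contain the repository directory")
    return problems
```

Calling `check_env(os.environ, "/path/to/the/code")` inside the activated `tdl` environment should return an empty list if both variables were set.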

## Running the experiments

Here we describe the information necessary for reproducing the experimental results.

### Datasets

We have uploaded the datasets used in the paper, with our train/val/test splits, [here](https://www.dropbox.com/s/cj9ex11u6ri0tdy/tabular-pretrains-data.tar?dl=1). We do not impose any restrictions beyond the original dataset licenses; the sources of the data are listed in the paper appendix.

You can download and unpack the datasets with the following commands:

``` bash
conda activate tdl
cd $PROJECT_DIR
wget "https://www.dropbox.com/s/cj9ex11u6ri0tdy/tabular-pretrains-data.tar?dl=1" -O tabular-pretrains-data.tar
tar -xvf tabular-pretrains-data.tar
```
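If `wget` is not available, the same download and unpack steps can be sketched with the Python standard library. The `download` and `extract` helpers below are hypothetical (they are not part of this repository); only the URL is taken from the commands above.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL taken from the wget command above.
DATA_URL = "https://www.dropbox.com/s/cj9ex11u6ri0tdy/tabular-pretrains-data.tar?dl=1"


def download(url, target):
    """Download `url` to `target` unless the file already exists."""
    target = Path(target)
    if not target.exists():
        urllib.request.urlretrieve(url, target)
    return target


def extract(archive, dest):
    """Extract a tar archive into `dest` and return the sorted member names."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:
        names = sorted(tar.getnames())
        tar.extractall(dest)
    return names
```

A typical call, run from `$PROJECT_DIR`, would be `extract(download(DATA_URL, "tabular-pretrains-data.tar"), ".")`.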

### File structure

The `bin` directory contains scripts for the various pretraining objectives, for finetuning from checkpoints (the same script is also used to train from scratch), and for the GBDT baselines.

Each pretraining script follows the same structure. It constructs a model from its config (MLP, MLP with numerical embeddings, ResNet, or Transformer) and pretrains it, periodically calling the finetune script for early stopping (or finetuning only at the end if `early_stop_type = "pretrain"` is specified in the config).
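The control flow of the periodic-finetuning case can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in `bin`: the callbacks `pretrain_step` and `finetune_score`, the parameters `eval_every` and `patience`, and the "higher is better" score convention are all hypothetical.

```python
def pretrain_with_early_stopping(pretrain_step, finetune_score,
                                 max_steps, eval_every, patience):
    """Pretrain and use a periodic finetuning score to decide when to stop.

    pretrain_step(step): runs one pretraining update (hypothetical callback).
    finetune_score(step): finetunes the current checkpoint and returns a
        validation score, higher is better (hypothetical callback).
    Returns the step and score of the best evaluation seen.
    """
    best_score, best_step, bad_evals = float("-inf"), None, 0
    for step in range(1, max_steps + 1):
        pretrain_step(step)
        if step % eval_every != 0:
            continue
        score = finetune_score(step)
        if score > best_score:
            # New best checkpoint: remember it and reset the patience counter.
            best_score, best_step, bad_evals = score, step, 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                break  # no improvement for `patience` evaluations in a row
    return best_step, best_score
```

The loop stops as soon as `patience` consecutive evaluations fail to improve on the best finetuning score, so the number of pretraining steps adapts to the downstream metric rather than to the pretraining loss.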

There are two variations of each script: single-GPU and DDP multi-GPU (used for large datasets and models with embeddings). They are identical except for the DDP-related modifications.

- `bin/finetune.py` -- trains models from scratch or finetunes pretrained checkpoints.
- `bin/contrastive.py` -- contrastive objective.
- `bin/[rec|mask]_(supervised)` -- self-prediction objective variations.

### Example
To run target-aware mask prediction pretraining on the California Housing dataset, you can run the following code snippet. It copies the tuning config, then tunes and evaluates mlp-plr with target-aware mask prediction pretraining, and creates the ensemble.

``` bash
conda activate tdl
cd $PROJECT_DIR
mkdir -p exp/draft
cp exp/mask-target/mlp-p-lr/california/3_tuning.toml exp/draft/example_tuning.toml

export CUDA_VISIBLE_DEVICES=0
python bin/tune.py exp/draft/example_tuning.toml
python bin/evaluate.py exp/draft/example_tuning 15
python bin/ensemble.py exp/draft/example_evaluation
```
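To launch the same three steps for several configs from Python, the commands can be generated programmatically. The sketch below only mirrors the naming pattern visible in the snippet above (`<name>_tuning.toml`, `<name>_tuning`, `<name>_evaluation`); `build_commands` is a hypothetical helper, and the `15` argument is passed through unchanged because its meaning is not described here.

```python
import shlex
from pathlib import Path


def build_commands(exp_dir="exp/draft", name="example", evaluate_arg="15"):
    """Return the three commands from the shell snippet as argument lists.

    Hypothetical helper; it assumes the file naming shown in the example.
    """
    exp = Path(exp_dir)
    return [
        ["python", "bin/tune.py", (exp / f"{name}_tuning.toml").as_posix()],
        ["python", "bin/evaluate.py", (exp / f"{name}_tuning").as_posix(), evaluate_arg],
        ["python", "bin/ensemble.py", (exp / f"{name}_evaluation").as_posix()],
    ]


# Print the commands in shell form for inspection before running them.
for command in build_commands():
    print(shlex.join(command))
```

Each list can then be passed to `subprocess.run(command, check=True)` from `$PROJECT_DIR`, which stops the sequence at the first failing step.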

Owner

Name: Ani
Login: animesh
Kind: user
Location: Norway
Company: Norwegian University of Science and Technology

Website: https://www.fuzzylife.org
Twitter: animesh1977
Repositories: 749
Profile: https://github.com/animesh

A medical graduate from Delhi University with post-graduation in bioinformatics from Jawaharlal Nehru University, India.
