Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.8%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: ytwang13
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 5.6 MB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme Contributing License Citation

README.md

Introduction

In this repo, we provide code for the project "Investigating masked models in representation learning and anti-forgetting". Here we provide guidance for installation, performing experiments, and some results.

Install

Please refer to MMPretrain for more details; the steps below are copied from their documentation.

Below are quick steps for installation:

```shell
conda create -n open-mmlab python=3.8 pytorch==1.10.1 torchvision==0.11.2 cudatoolkit=11.3 -c pytorch -y
conda activate open-mmlab
pip install openmim
git clone https://github.com/open-mmlab/mmpretrain.git
cd mmpretrain
mim install -e .
```

Please refer to installation documentation for more detailed installation and dataset preparation.

For multi-modality model support, please install the extra dependencies:

```shell
mim install -e ".[multimodal]"
```

For wandb logging, please install wandb:

```shell
pip install wandb
wandb login
```

and uncomment the following line in the corresponding config file in cifar-img/dlres18exp:

```python
vis_backends = dict(type='WandbVisBackend', xxxxxxxxx
```
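For reference, a wandb-enabled visualizer config in MMEngine-style projects usually looks roughly like the sketch below. This is an assumption based on the common MMPretrain pattern, not this repo's exact snippet; the truncated line above should be taken from the config file itself.

```python
# Illustrative MMEngine-style visualizer config (an assumption, not this
# repo's exact snippet): register wandb as an extra visualization backend.
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='WandbVisBackend'),
]
visualizer = dict(type='UniversalVisualizer', vis_backends=vis_backends)
```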


Experiments

All experiments can be found in the tools/res_trails directory.
All related configurations can be found in cifar-img/dlres18exp.

Baseline - naive training / KD training

The experiments of naive CE training can be found in tools/cl_trails/baselineCE.
Corresponding configs are in cifar-img/dlres18exp/baseline.

```shell
bash /---your---own---working--dir--/tools/cl_trails/baseline_CE/run_baselinece.sh
```

The experiments of KD training can be found in tools/cl_trails/baselineKD.
Corresponding configs are in cifar-img/dlres18exp/baseline.

```shell
# reference model: basic swap and freeze
bash /---your---own---working--dir--/tools/cl_trails/baselineKD/run_kdbaseline.sh

# change the reference model using EMA
bash /---your---own---working--dir--/tools/cl_trails/baselineKD/runkdemavar.sh 0.1  # ema_ratio
# use ${ema_ratios[$SLURM_ARRAY_TASK_ID]} for a Slurm array job

# for further classifier-only experiments (cls_only: we only use the classifier
# head as the reference model), just change the config: model.train_cfg.cls_only=True
```
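The EMA reference-model update used above can be sketched as follows. This is a minimal illustrative helper, not the repo's code; in MMEngine the same pattern is provided by `mmengine.model.ExponentialMovingAverage`, and whether the passed 0.1 is the momentum or its complement is an assumption here.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def ema_update(ref_model, model, ema_ratio=0.1):
    # Hypothetical sketch: pull each reference parameter toward the online
    # parameter, keeping (1 - ema_ratio) of the old reference weight.
    for ref_p, p in zip(ref_model.parameters(), model.parameters()):
        ref_p.mul_(1.0 - ema_ratio).add_(p, alpha=ema_ratio)
```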

Masked - Loss_msk type, Mode, Ratio

The masked model in this work can be illustrated in the following figure:

[Figure: masked model pipeline]

The backbone and classifier (task) head are denoted by #000000 black trapezoids, and a #000000 black rectangle represents the hidden/output feature. The #808080 gray arrow denotes the baseline CE pipeline, and the #9370DB light purple arrow shows the KD/Masked reference pipeline.

The #9370DB light purple rectangle denotes the Masked Model method, where we apply a mask to the hidden feature. Since the classifier head requires the input channel dimension to stay unchanged, we append mask tokens in place of the masked part, as shown in the #9370DB light purple dashed rectangle on the left.

Inside this rectangle, the #808080 gray part represents the appended mask tokens, and the #FFFFFF white part is the unmasked part. On the right, we have two inverse-masked representations. Inspired by diffusion models, where the model predicts the noise, we also feed the inverse-masked feature to the model, hoping it recovers better from the masked representation (the inverse forward pass has a detached gradient).

In addition to the base KD/Mask model, we further introduce the cls_only variant of the KD/Mask model, which uses the same masking strategy and is illustrated with a #ADD8E6 light blue arrow.

  • #000000 Black trapezoid: Backbone and classifier (task) head
  • #000000 Black rectangle: Hidden/output feature
  • #808080 Gray arrow: Baseline CE pipeline
  • #9370DB Light purple arrow: KD/Masked reference pipeline
  • #9370DB Light purple rectangle: Masked Model method
  • #808080 Gray part inside the light purple rectangle: Masked appended token
  • #FFFFFF White part inside the light purple rectangle: Unmasked part
  • #ADD8E6 Light blue arrow: cls_only line of KD/Mask model
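The channel masking described above (mask tokens appended so the classifier's input width is unchanged, plus a detached inverse-masked branch) can be sketched as follows. This is an illustrative sketch, not the repo's exact implementation; the function name and the zero-valued mask token are assumptions (in practice the token could be a learnable parameter).

```python
import torch

def mask_hidden(h, mask_ratio=0.9, mask_token=None):
    """Sketch: mask a fraction of the channels of a hidden feature (B, C)
    and fill them with a mask token, keeping the feature width C fixed.
    Also build the inverse-masked feature, with its gradient detached."""
    B, C = h.shape
    n_mask = int(C * mask_ratio)
    if mask_token is None:
        # single 1*1 token; a learnable nn.Parameter in practice
        mask_token = torch.zeros(1, 1)
    perm = torch.randperm(C)
    mask = torch.zeros(C, dtype=torch.bool)
    mask[perm[:n_mask]] = True

    h_masked = h.clone()
    h_masked[:, mask] = mask_token.expand(B, n_mask)      # fill masked channels
    h_inv = h.clone()
    h_inv[:, ~mask] = mask_token.expand(B, C - n_mask)    # inverse mask
    return h_masked, h_inv.detach()                       # inverse branch detached
```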

Masked classifier only - Loss_msk type, Mode, Ratio

The experiments of Masked model training can be found in tools/cl_trails/mask.
Corresponding configs are in cifar-img/dlres18exp/mask.

```shell
bash /---your---own---working--dir--/tools/cl_trails/mask/runmskallvar.sh 0.9 'v1' 'sgf'  # mask_ratio inv_mode mask_mode

# for inv_mode -- a selection between:
#  v1: we add the inv_mask output logits to the masked output
#  v2: we subtract the inv_mask output logits from the target outputs
#  p.s. target outputs [the lower rectangle logits in the figure]
#       masked output [the upper rectangle logits]

# for mask_mode -- which mask tokens are appended to fill the in_channel:
#  s: a single 1*1 torch tensor
#  a: all, a 1*(channels - channels*mask_ratio) torch tensor
#  sgd: single tensor with gradient=False
#  agd: all tensor with gradient=False
#  p.s. ["all" tokens are expanded to the same batch size as the hidden feature.]

bash /---your---own---working--dir--/tools/cl_trails/mask/runmskclsovar.sh 0.9 'v1' 'sgf'  # mask_ratio inv_mode mask_mode

bash /---your---own---working--dir--/tools/cl_trails/mask/runmskclsovar.sh 0.9 'v1' 'sgf' 0.1 'ema'  # mask_ratio inv_mode mask_mode ema_ratio ema_mode

# for ema_mode -- a selection of the EMA implementation:
#  'ema': ExponentialMovingAverage
#  'anne': MomentumAnnealingEMA
#  'sto': StochasticWeightAverage
```
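The mask_mode options above differ only in the shape and trainability of the mask token; a minimal sketch of the four variants, with shapes mirroring the comment above (variable names are illustrative):

```python
import torch

C, mask_ratio = 512, 0.9

# 's': a single 1*1 tensor, shared across all masked channels
tok_s = torch.zeros(1, 1, requires_grad=True)
# 'sgd': the same single token, but frozen
tok_sgd = torch.zeros(1, 1, requires_grad=False)
# 'a': one token per appended channel, shape 1*(channels - channels*mask_ratio)
tok_a = torch.zeros(1, C - int(C * mask_ratio), requires_grad=True)
# 'agd': the same per-channel tokens, frozen
tok_agd = torch.zeros(1, C - int(C * mask_ratio), requires_grad=False)

# "all" tokens are expanded to the same batch size as the hidden feature
batch = 8
expanded = tok_a.expand(batch, -1)
```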

Results

KD, Mask

All baseline and Self-KD hyperparameter results tables can be found in tools/cl_trails/testcl.sh and tools/cl_trails/baselineKD/run_SD.md

All-method comparison using wandb

RankMe (details: https://arxiv.org/abs/2210.02885) can be viewed as a feature-learning quality metric, similar to the rank of the representation. From the wandb plot of all methods we can clearly see:

  • So far the KD-EMA results have slightly better accuracy/top1 and a larger RankMe value (not fully shown/explained in the note) [purple vs. blue]
  • Naive training [blue] shows a tendency of constantly dropping rank, which may be due to the influence of the classifier head (a large-to-small compression; this does not show up for SSL methods, whose reconstruction heads are small-to-large [or roughly equivalent])
  • KD/Mask can considerably increase the rank during feature learning, but for Mask the accuracy does not surpass the baseline (maybe try a linear-probing accuracy?)
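RankMe can be computed directly from the singular values of the embedding matrix; a minimal sketch following the definition in the paper linked above (the exponential of the entropy of the normalized singular values):

```python
import torch

def rankme(features, eps=1e-7):
    """RankMe (arXiv:2210.02885): a soft effective rank of an (N, D)
    embedding matrix. Ranges from 1 (collapsed) to min(N, D)."""
    s = torch.linalg.svdvals(features)      # singular values, descending
    p = s / s.sum() + eps                   # normalized spectrum
    return torch.exp(-(p * torch.log(p)).sum()).item()
```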

Note: the HPC enforces storage limits and wandb's artifacts kept filling the cache, so while deleting the wandb cache I accidentally ran rm -rf * on some of the wandb result logs.

Comparison on CIFAR-10.1 evaluation

  • The KD/Mask methods do show a smaller performance gap; is this a trade-off, or can it be alleviated by LP (given the good representation learnt) or some other method yet to be proposed?

Comparison of Linear Probing results

  • Is good representation reflected by RankMe?
  • Is LP properly implemented?
  • What experiments should follow?

Rank log:

  • KD-EMA LP and CIFAR10.1 results, step-25 overview [most EMA ratios and loss weights]:
    • LP:
      1. The LP results and the original classifier predictions are exactly the same on the CIFAR10.1 dataset.
      2. However, the LP results on the CIFAR10 test set are sometimes slightly better than on the original dataset.
    • Rank:
      3. Based on rank, as stated before, the loss weight on the auxiliary loss plays the main part (the distillation loss helps improve rank).
      4. Higher rank (>=59) tends to exhibit higher accuracy/top1, but not quite as high as the ensemble-distill results. (Wandb results: kdema-1)

  • KD-EMA LP and CIFAR10.1 Results Step1 Overview

    • Compared with all the above LP results, the difference is that the LP results of this step-1-started version degrade slightly less on CIFAR10.1 vs CIFAR10 (step 1: 90->82.55 vs step 25: 91->81).
    • Rank also exhibits a similar result. This time the bottom line is higher and the rise of rank starts almost in line with the start step of the method.
  • KD-Base LP and CIFAR10.1 Results Overview [Most EMA ratio and loss weight]:

    • LP:
    • The KD-base LP results and the original classifier-head predictions are all the same.
    • The margin between the original CIFAR test set and CIFAR10.1 is larger than for the KD-EMA mode.
    • The highest rank value is roughly the lowest rank value of the KD-EMA mode.
    • Testing a higher LP learning-rate multiplier is still pending. (Wandb results: kdema-1)
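The linear probing (LP) protocol used above can be sketched as follows: freeze the backbone and train only a fresh linear head on its features. This is a minimal illustrative sketch, not the repo's implementation; the function name and signature are assumptions.

```python
import torch
import torch.nn as nn

def linear_probe(backbone, feat_dim, num_classes, loader, epochs=10, lr=0.1):
    """Sketch of linear probing: the backbone is never updated; only a
    newly initialized linear head is trained on its (detached) features."""
    backbone.eval()
    head = nn.Linear(feat_dim, num_classes)
    opt = torch.optim.SGD(head.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for x, y in loader:
            with torch.no_grad():            # no gradients reach the backbone
                f = backbone(x)
            loss = nn.functional.cross_entropy(head(f), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return head
```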

TODO

  • Test masked results performance on the CIFAR-10.1 dataset. ✅ ✅
  • Finish EMA masked model results. ✅ ❓
    • EMA on the backbone
    • EMA based on the best LP multiplier
    • mskall baseline based on the best kdema config
    • try the diffusion thing
  • Test longer epochs for key hyperparameter selections. ❓ ❓
  • Finish masked experiments in the continual learning setting. ❓ ❓

Owner

  • Login: ytwang13
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "OpenMMLab's Pre-training Toolbox and Benchmark"
authors:
  - name: "MMPreTrain Contributors"
version: 0.15.0
date-released: 2023-04-06
repository-code: "https://github.com/open-mmlab/mmpretrain"
license: Apache-2.0


Dependencies

tests/data/meta.yml cpan
docker/Dockerfile docker
  • pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build
docker/serve/Dockerfile docker
  • pytorch/torchserve latest-gpu build
projects/internimage_classification/ops_dcnv3/setup.py pypi
requirements/docs.txt pypi
  • docutils ==0.18.1
  • modelindex *
  • myst-parser *
  • pytorch_sphinx_theme *
  • sphinx ==6.1.3
  • sphinx-copybutton *
  • sphinx-notfound-page *
  • sphinx-tabs *
  • sphinxcontrib-jquery *
  • tabulate *
requirements/mminstall.txt pypi
  • mmcv >=2.0.0,<2.4.0
  • mmengine >=0.8.3,<1.0.0
requirements/multimodal.txt pypi
  • pycocotools *
  • transformers >=4.28.0
requirements/optional.txt pypi
  • albumentations >=0.3.2
  • grad-cam >=1.3.7,<1.5.0
  • requests *
  • scikit-learn *
requirements/readthedocs.txt pypi
  • mmcv-lite >=2.0.0rc4
  • mmengine *
  • pycocotools *
  • torch *
  • torchvision *
  • transformers *
requirements/runtime.txt pypi
  • einops *
  • importlib-metadata *
  • mat4py *
  • matplotlib *
  • modelindex *
  • numpy *
  • rich *
requirements/tests.txt pypi
  • coverage * test
  • interrogate * test
  • pytest * test
requirements.txt pypi
setup.py pypi