https://github.com/chenliu-1996/corrupteddataloader

PyTorch DataLoader wrapper to intentionally mess up, corrupt, shuffle, or randomize the input/label correspondence.

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
○ DOI references
✓ Academic publication links (links to: arxiv.org)
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity (low similarity, 11.7%, to scientific vocabulary)

Keywords

dataloader deep-learning deep-neural-networks label mismatch overfitting pytorch random-labelling

Last synced: 5 months ago

Repository

Basic Info

Host: GitHub
Owner: ChenLiu-1996
Language: Python
Default Branch: main
Homepage:
Size: 22.5 KB

Statistics

Stars: 3
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created almost 3 years ago · Last pushed almost 3 years ago

https://github.com/ChenLiu-1996/CorruptedDataLoader/blob/main/

# CorruptedDataLoader
Chen Liu (chen.liu.cl2482@yale.edu)

Please kindly **Star** [![Github Stars](https://img.shields.io/github/stars/ChenLiu-1996/CorruptedDataLoader.svg?style=social&label=Stars)](https://github.com/ChenLiu-1996/CorruptedDataLoader/) this repo for better reach if you find it useful.

## Contributions
We provide a simple wrapper around the PyTorch `DataLoader` to **intentionally mess up the input/label correspondence**.

## Motivation
Most of the time, when we train a machine learning model, we take extra care to make sure the inputs and labels are correctly matched. Occasionally, however, we want the opposite. One such case, as outlined in the paper ["Understanding deep learning requires rethinking generalization"](https://arxiv.org/abs/1611.03530), is when we want to **corrupt the training set and intentionally overfit a model on random labels**.

Despite a careful search, we were unable to find an existing open-source implementation that serves this purpose. We therefore wrote our own and are sharing it with those who may have a similar need.

## Example
```
train_loader = ...  # define `train_loader` as you normally would
train_loader = CorruptedLabelDataLoader(train_loader)
for (x, y) in train_loader:
    ...  # train on inputs `x` and corrupted labels `y`
```
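To make the wrapping pattern concrete without requiring PyTorch, here is a minimal sketch of the idea. It is not the repository's code: `ToyDataset`, `ToyLoader` and `ShuffledLabelLoader` are hypothetical names, and the sketch assumes the labels live in a `targets` attribute on the wrapped loader's `dataset`. The labels are permuted once at construction, so the corrupted correspondence stays fixed across epochs, which is what a model needs in order to overfit it.

```python
import random


class ToyDataset:
    """Hypothetical stand-in for a dataset with `data` and `targets`."""

    def __init__(self, n):
        self.data = list(range(n))
        self.targets = [i % 3 for i in range(n)]


class ToyLoader:
    """Hypothetical stand-in for a DataLoader: has `.dataset`, yields batches."""

    def __init__(self, dataset, batch_size):
        self.dataset = dataset
        self.batch_size = batch_size

    def __iter__(self):
        data, targets = self.dataset.data, self.dataset.targets
        for i in range(0, len(data), self.batch_size):
            yield data[i:i + self.batch_size], targets[i:i + self.batch_size]


class ShuffledLabelLoader:
    """Hypothetical wrapper: permutes the labels once, then delegates."""

    def __init__(self, loader, seed=0):
        labels = list(loader.dataset.targets)
        random.Random(seed).shuffle(labels)
        loader.dataset.targets = labels  # inputs stay where they are
        self.loader = loader

    def __iter__(self):
        return iter(self.loader)


loader = ShuffledLabelLoader(ToyLoader(ToyDataset(12), batch_size=4))
epoch_1 = [y for _, y in loader]
epoch_2 = [y for _, y in loader]
```

Because the permutation happens once rather than per batch, `epoch_1` and `epoch_2` contain the same labels in the same order, and the overall label distribution is unchanged.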

## Details
This repository currently contains a single file, which in turn contains a single class called `CorruptedDataLoader`. `CorruptedDataLoader` is a wrapper around a PyTorch `DataLoader`. A `DataLoader` may hold an arbitrary `dataset`, but the current implementation only supports the following `dataset`s:

1. `torchvision.datasets.MNIST`
2. `torchvision.datasets.CIFAR10`
3. `torchvision.datasets.CIFAR100`
4. `torchvision.datasets.STL10`

It can nevertheless be easily adapted to any custom `dataset`, as long as you know under what key the `labels` are stored.
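As an illustration of what "knowing the key" can look like, the sketch below looks up the label attribute by dataset class name and lets a custom dataset register its own key. It is an assumption-laden example rather than the repository's implementation: `LABEL_KEYS`, `find_label_key` and `MyDataset` are hypothetical names. The attribute names in the table reflect recent torchvision versions, where MNIST and CIFAR keep labels under `targets` and STL10 under `labels`; check the version you have installed.

```python
import random

# Attribute names under which the four supported torchvision datasets
# keep their labels (assumed from recent torchvision versions).
LABEL_KEYS = {
    "MNIST": "targets",
    "CIFAR10": "targets",
    "CIFAR100": "targets",
    "STL10": "labels",
}


def find_label_key(dataset, extra_keys=None):
    """Return the attribute name that holds the labels of `dataset`."""
    keys = {**LABEL_KEYS, **(extra_keys or {})}
    name = type(dataset).__name__
    if name not in keys:
        raise ValueError(f"No label key registered for dataset type {name!r}")
    return keys[name]


class MyDataset:
    """Hypothetical custom dataset that keeps its labels under `y`."""

    def __init__(self):
        self.x = ["a", "b", "c", "d"]
        self.y = [0, 1, 2, 3]


dataset = MyDataset()
key = find_label_key(dataset, extra_keys={"MyDataset": "y"})

# Permute the labels in place; the inputs in `dataset.x` are untouched.
labels = list(getattr(dataset, key))
random.Random(0).shuffle(labels)
setattr(dataset, key, labels)
```

The sketch converts the labels to a plain list for simplicity. If your dataset stores labels as a tensor or array, you would likely want to index it with a permutation instead so that the original type is preserved.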

## Usage
To use, simply copy `CorruptedDataLoader` to an appropriate location in your codebase and modify as you need. Don't forget to give us a **star** if you use it and find it helpful.

## Citation
To be added

Owner

Name: Chen Liu
Login: ChenLiu-1996
Kind: user
Location: New Haven
Company: Yale University

Website: https://chenliu-1996.github.io/
Twitter: ChenLiu_1996
Repositories: 5
Profile: https://github.com/ChenLiu-1996

CS PhD student at @KrishnaswamyLab, @YaleUniversity. Reviewing Committee member at NeurIPS, ICLR, ICML.
