https://github.com/chenliu-1996/corrupteddataloader
PyTorch DataLoader wrapper that intentionally corrupts (shuffles or randomizes) the input/label correspondence.
Keywords
dataloader
deep-learning
deep-neural-networks
label
mismatch
overfitting
pytorch
random-labelling
Statistics
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Created almost 3 years ago
· Last pushed almost 3 years ago
# CorruptedDataLoader
Chen Liu (chen.liu.cl2482@yale.edu)
Please **star** [this repo](https://github.com/ChenLiu-1996/CorruptedDataLoader/) for better reach if you find it useful.
## Contributions
We provide a simple wrapper around PyTorch DataLoader to **intentionally mess up the input/label correspondence**.
## Motivation
Most of the time, when we train a machine learning model, we take extra care to ensure that the inputs and labels are correctly matched. Occasionally, however, we want the opposite. For example, as outlined in the paper ["Understanding deep learning requires rethinking generalization"](https://arxiv.org/abs/1611.03530), we may want to **corrupt the training set and intentionally overfit a model on random labels**.
Despite a careful search online, we could not find an existing open-source implementation for this purpose. We therefore wrote our own and are sharing it with those who may have a similar need.
## Example
```python
train_loader = ...  # define `train_loader` as you normally would
train_loader = CorruptedLabelDataLoader(train_loader)
for (x, y) in train_loader:
    ...
```
## Details
This repository currently contains a single file, which defines a single class called `CorruptedDataLoader`. `CorruptedDataLoader` is a wrapper around a PyTorch `DataLoader`. Although a `DataLoader` may hold an arbitrary `dataset`, the current implementation only supports the following `dataset`s:
1. `torchvision.datasets.MNIST`
2. `torchvision.datasets.CIFAR10`
3. `torchvision.datasets.CIFAR100`
4. `torchvision.datasets.STL10`
It can also be easily adapted to any custom `dataset`, as long as you know under which key the `labels` are stored.
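To illustrate what adapting to a custom `dataset` could involve, here is a minimal sketch, not the code in this repository. It assumes integer class labels stored in a sequence under one attribute of `loader.dataset`. The helper `shuffle_labels_in_place`, its `label_key` parameter, and the `ToyDataset` stand-in are all hypothetical names introduced for this example. To stay free of dependencies, the "loader" is a plain object exposing only a `.dataset` attribute.

```python
import random
from types import SimpleNamespace


def shuffle_labels_in_place(loader, label_key="targets", seed=0):
    """Permute the labels of `loader.dataset` so inputs and labels no longer match.

    Hypothetical helper for illustration; not the repository's implementation.
    `label_key` is the attribute name under which the dataset stores its labels.
    """
    dataset = loader.dataset
    # Copy the labels into a list of plain ints (assumes integer class labels).
    labels = [int(v) for v in getattr(dataset, label_key)]
    rng = random.Random(seed)  # seeded so the corruption is reproducible
    rng.shuffle(labels)        # same label counts, broken input/label pairing
    setattr(dataset, label_key, labels)
    return loader


class ToyDataset:
    """Stand-in for a custom dataset that keeps its labels under `targets`."""

    def __init__(self, inputs, targets):
        self.inputs = inputs
        self.targets = targets

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, index):
        return self.inputs[index], self.targets[index]


# Stand-in for a DataLoader: only the `.dataset` attribute is used here.
toy_loader = SimpleNamespace(
    dataset=ToyDataset(list(range(10)), [i % 5 for i in range(10)])
)
original = list(toy_loader.dataset.targets)
shuffle_labels_in_place(toy_loader, label_key="targets", seed=0)
corrupted = toy_loader.dataset.targets
```

Because the labels are permuted rather than resampled, the class distribution is unchanged and only the pairing with the inputs is broken. Note that this sketch writes the labels back as a plain Python list, which may not suit a dataset that expects a tensor or array there. For a real dataset, check which attribute holds the labels before choosing `label_key`.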
## Usage
To use, simply copy `CorruptedDataLoader` to an appropriate location in your codebase and modify as you need. Don't forget to give us a **star** if you use it and find it helpful.
## Citation
To be added
Owner
- Name: Chen Liu
- Login: ChenLiu-1996
- Kind: user
- Location: New Haven
- Company: Yale University
- Website: https://chenliu-1996.github.io/
- Twitter: ChenLiu_1996
- Repositories: 5
- Profile: https://github.com/ChenLiu-1996
CS PhD student at @KrishnaswamyLab, @YaleUniversity. Reviewing Committee member at NeurIPS, ICLR, ICML.