https://github.com/chenliu-1996/corrupteddataloader

PyTorch DataLoader wrapper that intentionally corrupts, shuffles, or randomizes the input/label correspondence.

Keywords

dataloader deep-learning deep-neural-networks label mismatch overfitting pytorch random-labelling

Repository

Basic Info
  • Host: GitHub
  • Owner: ChenLiu-1996
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 22.5 KB
Statistics
  • Stars: 3
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 3 years ago · Last pushed almost 3 years ago

# CorruptedDataLoader
Chen Liu (chen.liu.cl2482@yale.edu)

Please kindly **Star** [![Github Stars](https://img.shields.io/github/stars/ChenLiu-1996/CorruptedDataLoader.svg?style=social&label=Stars)](https://github.com/ChenLiu-1996/CorruptedDataLoader/) this repo for better reach if you find it useful.

## Contributions
We provide a simple wrapper around PyTorch DataLoader to **intentionally mess up the input/label correspondence**.

## Motivation
Most of the time, when we train a machine learning model, we take care to ensure that the inputs and labels are correctly matched. Occasionally, however, we want the opposite. For example, as outlined in the paper ["Understanding deep learning requires rethinking generalization"](https://arxiv.org/abs/1611.03530), we may want to **corrupt the training set and intentionally overfit a model on random labels**.

Despite a careful search online, we could not find an existing open-source implementation for this purpose. We therefore wrote our own and are sharing it with anyone who has a similar need.

## Example
```
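# `CorruptedDataLoader` is the class from this repository; import it from wherever you copied it.
train_loader = ...  # define `train_loader` as you normally would
train_loader = CorruptedDataLoader(train_loader)  # wrap the loader to break the input/label pairing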
for (x, y) in train_loader:
    ...
```

## Details
This repository currently contains a single file, which defines a single class, `CorruptedDataLoader`. `CorruptedDataLoader` is a wrapper around a PyTorch `DataLoader`. A `DataLoader` may hold an arbitrary `dataset`, but the current implementation supports only the following `dataset`s:

1. `torchvision.datasets.MNIST`
2. `torchvision.datasets.CIFAR10`
3. `torchvision.datasets.CIFAR100`
4. `torchvision.datasets.STL10`

It can nevertheless be easily adapted to any custom `dataset`, as long as you know the key under which the labels are stored.
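
As a rough illustration of what such an adaptation involves (this is not the repository's actual code), the sketch below permutes the labels stored under a given attribute of a dataset-like object. The names `shuffle_labels`, `label_attr`, and `toy` are hypothetical, and the toy object stands in for a real `dataset` so that the example runs without PyTorch. In recent torchvision releases, `MNIST`, `CIFAR10`, and `CIFAR100` keep their labels in `targets`, while `STL10` uses `labels`; check your installed version before relying on this.

```python
import random
from collections import Counter
from types import SimpleNamespace


def shuffle_labels(dataset, label_attr="targets", seed=0):
    """Permute the labels stored in `dataset.<label_attr>` in place.

    Inputs are left untouched, so the input/label pairing is broken
    while the overall class histogram stays the same.
    """
    labels = list(getattr(dataset, label_attr))
    random.Random(seed).shuffle(labels)  # seeded for reproducibility
    setattr(dataset, label_attr, labels)
    return dataset


# Toy stand-in for a dataset (assumption): 1000 inputs, 10 balanced classes.
toy = SimpleNamespace(
    data=list(range(1000)),
    targets=[i % 10 for i in range(1000)],
)
original = list(toy.targets)

shuffle_labels(toy, label_attr="targets", seed=0)

# Sanity checks: same label counts, but most labels no longer match their inputs.
same_histogram = Counter(original) == Counter(toy.targets)
agreement = sum(a == b for a, b in zip(original, toy.targets)) / len(original)
print(f"label histogram preserved: {same_histogram}")
print(f"fraction of labels unchanged: {agreement:.2f}")
```

With a real loader, the same idea would presumably be applied to `train_loader.dataset` before iterating. Permuting the existing labels, rather than drawing fresh random ones, is a design choice made here for simplicity: it keeps the classes balanced exactly as in the original data.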

## Usage
To use it, simply copy `CorruptedDataLoader` into an appropriate location in your codebase and modify it as needed. Don't forget to give us a **star** if you use it and find it helpful.

## Citation
To be added

Owner

  • Name: Chen Liu
  • Login: ChenLiu-1996
  • Kind: user
  • Location: New Haven
  • Company: Yale University

CS PhD student at @KrishnaswamyLab, @YaleUniversity. Reviewing Committee member at NeurIPS, ICLR, ICML.
