awesome-dataset-distillation

A curated list of awesome papers on dataset distillation and related applications.

https://github.com/guang000/awesome-dataset-distillation

Science Score: 77.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, springer.com, ieee.org, acm.org
  • Committers with academic emails
    6 of 36 committers (16.7%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.7%) to scientific vocabulary

Keywords

awesome-list deep-learning
Last synced: 6 months ago

Repository

A curated list of awesome papers on dataset distillation and related applications.

Basic Info
Statistics
  • Stars: 1,763
  • Watchers: 35
  • Forks: 160
  • Open Issues: 0
  • Releases: 0
Topics
awesome-list deep-learning
Created over 3 years ago · Last pushed 6 months ago
Metadata Files
Readme Contributing License Citation

README.md

Awesome Dataset Distillation


Awesome Dataset Distillation provides the most comprehensive and detailed information on the Dataset Distillation field.

Dataset distillation is the task of synthesizing a small dataset such that models trained on it achieve high performance on the original large dataset. A dataset distillation algorithm takes as input a large real dataset to be distilled (training set) and outputs a small synthetic distilled dataset, which is evaluated by training models on the distilled dataset and testing them on a separate real dataset (validation/test set). A good small distilled dataset is not only useful for dataset understanding, but also has various applications (e.g., continual learning, privacy, and neural architecture search). The task was first introduced in the paper Dataset Distillation [Tongzhou Wang et al., '18], along with a proposed algorithm that backpropagates through optimization steps. The task was later extended to real-world datasets in the paper Medical Dataset Distillation [Guang Li et al., '19], which also explored the privacy-preservation possibilities of dataset distillation. The paper Dataset Condensation [Bo Zhao et al., '20] introduced gradient matching, which greatly advanced the development of the field.
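The inner/outer structure described above (train a model on the synthetic set, evaluate it on real data, then update the synthetic set) can be sketched on a toy problem. The code below is an illustrative NumPy sketch in the spirit of backpropagation through optimization steps, not any paper's actual algorithm: it distills 500 linear-regression examples into 5 learnable synthetic inputs by differentiating through a single inner gradient step. All names and hyperparameters here are made up for the example.

```python
import numpy as np

# Toy sketch: distill a large linear-regression set into a few synthetic
# points by backpropagating the real-data loss through one inner
# training step (illustrative only).
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0, 0.5])
X_real = rng.normal(size=(500, 3))              # large real training set
y_real = X_real @ w_true + 0.01 * rng.normal(size=500)

m, eta = 5, 0.5                                 # 5 synthetic points, inner LR
X_syn = rng.normal(size=(m, 3))                 # learnable synthetic inputs
y_syn = rng.normal(size=m)                      # fixed synthetic labels

def inner_train(X_s, y_s):
    # One inner gradient step of linear regression from w0 = 0
    # gives the closed form w = (eta / m) * X_s^T y_s.
    return (eta / m) * X_s.T @ y_s

def real_loss(w):
    # Outer objective: mean squared error on the real dataset.
    r = X_real @ w - y_real
    return (r @ r) / len(y_real)

loss0 = real_loss(inner_train(X_syn, y_syn))
for _ in range(2000):                           # outer loop: update X_syn
    w = inner_train(X_syn, y_syn)
    g = (2 / len(y_real)) * X_real.T @ (X_real @ w - y_real)  # dL/dw
    grad_X = (eta / m) * np.outer(y_syn, g)     # chain rule through inner step
    X_syn -= 0.1 * grad_X
loss1 = real_loss(inner_train(X_syn, y_syn))
print(f"real-data loss before/after distillation: {loss0:.4f} -> {loss1:.4f}")
```

After the outer loop, a model trained only on the 5 synthetic points fits the 500 real examples far better than one trained on the initial random synthetic set, which is exactly the evaluation protocol described above. Real methods replace the linear model with a neural network and unroll many inner steps (or use surrogate objectives such as gradient matching).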

In recent years (2022-now), dataset distillation has gained increasing attention in the research community, across many institutes and labs, and more papers are now published each year. This growing body of research has steadily improved dataset distillation and explored its many variants and applications.

This project is curated and maintained by Guang Li, Bo Zhao, and Tongzhou Wang.

How to submit a pull request?

  • :globewithmeridians: Project Page
  • :octocat: Code
  • :book: bibtex

Latest Updates

Contents

Main

Early Work

Gradient/Trajectory Matching Surrogate Objective

Distribution/Feature Matching Surrogate Objective

Kernel-Based Distillation

Distilled Dataset Parametrization

Generative Distillation

GAN

Diffusion

Better Optimization

Better Understanding

Label Distillation

Dataset Quantization

Decoupled Distillation

Multimodal Distillation

Self-Supervised Distillation

Universal Distillation

Benchmark

Survey

Ph.D. Thesis

Workshop

Challenge

Ranking

Applications

Continual Learning

Privacy

Medical

Federated Learning

Graph Neural Network

Survey

Benchmark

No further updates will be made on graph distillation topics, as sufficient papers and summary projects on the subject are already available.

Neural Architecture Search

Fashion, Art, and Design

Recommender Systems

Blackbox Optimization

Robustness

Fairness

Text

Video

Tabular

Retrieval

Domain Adaptation

Super Resolution

Time Series

Speech

Machine Unlearning

Reinforcement Learning

Long-Tail

Learning with Noisy Labels

Object Detection

Point Cloud

Media Coverage

Star History

Star History Chart

Citing Awesome Dataset Distillation

If you find this project useful for your research, please use the following BibTeX entry.

@misc{li2022awesome,
  author={Li, Guang and Zhao, Bo and Wang, Tongzhou},
  title={Awesome Dataset Distillation},
  howpublished={\url{https://github.com/Guang000/Awesome-Dataset-Distillation}},
  year={2022}
}

Acknowledgments

We would like to express our heartfelt thanks to Nikolaos Tsilivis, Wei Jin, Yongchao Zhou, Noveen Sachdeva, Can Chen, Guangxiang Zhao, Shiye Lei, Xinchao Wang, Dmitry Medvedev, Seungjae Shin, Jiawei Du, Yidi Jiang, Xindi Wu, Guangyi Liu, Yilun Liu, Kai Wang, Yue Xu, Anjia Cao, Jianyang Gu, Yuanzhen Feng, Peng Sun, Ahmad Sajedi, Zhihao Sui, Ziyu Wang, Haoyang Liu, Eduardo Montesuma, Shengbo Gong, Zheng Zhou, Zhenghao Zhao, Duo Su, Tianhang Zheng, Shijie Ma, Wei Wei, Yantai Yang, Shaobo Wang, Xinhao Zhong, Zhiqiang Shen, Cong Cong, Chun-Yin Huang, Dai Liu, Ruonan Yu, William Holland, Saksham Singh Kushwaha, Ping Liu, Wenliang Zhong, Ning Li, and Guochen Yan for their valuable suggestions and contributions.

The Homepage of Awesome Dataset Distillation was designed by Longzhen Li and maintained by Mingzhuo Li.

Owner

  • Name: Guang Li
  • Login: Guang000
  • Kind: user
  • Location: Sapporo, Hokkaido
  • Company: Hokkaido University

PhD Candidate at Hokkaido University

Citation (citations/bao2025ruo.txt)

@inproceedings{bao2025ruo,
  title={Dataset Distillation as Data Compression: A Rate-Utility Perspective}, 
  author={Bao, Youneng and Liu, Yiping and Chen, Zhuo and Liang, Yongsheng and Li, Mu and Ma, Kede},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2025}
}

GitHub Events

Total
  • Commit comment event: 1
  • Issues event: 3
  • Watch event: 344
  • Member event: 1
  • Issue comment event: 12
  • Push event: 270
  • Pull request event: 29
  • Fork event: 27
Last Year
  • Commit comment event: 1
  • Issues event: 3
  • Watch event: 344
  • Member event: 1
  • Issue comment event: 12
  • Push event: 270
  • Pull request event: 29
  • Fork event: 27

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 1,209
  • Total Committers: 36
  • Avg Commits per committer: 33.583
  • Development Distribution Score (DDS): 0.186
Past Year
  • Commits: 500
  • Committers: 18
  • Avg Commits per committer: 27.778
  • Development Distribution Score (DDS): 0.248
Top Committers
Name Email Commits
Guang Li 3****0 984
longzhen k****l@g****m 108
SumomoTaku l****t@g****m 33
Tongzhou Wang S****L 9
ZHAO, BO b****g@g****m 8
Shaobo (Steven) Wang s****9@s****n 6
GGchen1997 3****7 5
Zheng Zhou (Dylan ) z****d@1****m 5
Cong Cong 3****1 4
SJShin-AI 8****I 4
SsnL t****4@g****m 3
sp12138 s****8@1****m 3
Eduardo Fernandes Montesuma e****a@g****m 3
Ahmad Sajedi s****h@g****m 3
limingzhuot@gmail.com u****r@1****n 3
CAOANJIA c****7@g****m 2
Wang 1****6@q****m 2
Wei.Wei W****i@u****e 2
Yue Xu s****e@s****n 2
rayneholland 1****d 2
vimar-gu 5****8@q****m 2
Ning9319 l****9@g****m 2
ZhongWL C****g@1****m 1
Youth-49 y****9@g****m 1
fengyzpku y****1@n****u 1
lgy0404 2****1@q****m 1
Yongchao Zhou y****u@m****a 1
Xindi Wu w****9@g****m 1
Tianhang Zheng t****g@m****a 1
Shijie Ma m****1@i****n 1
and 6 more...
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 15
  • Total pull requests: 65
  • Average time to close issues: about 10 hours
  • Average time to close pull requests: about 4 hours
  • Total issue authors: 13
  • Total pull request authors: 33
  • Average comments per issue: 2.0
  • Average comments per pull request: 0.62
  • Merged pull requests: 54
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 3
  • Pull requests: 26
  • Average time to close issues: about 11 hours
  • Average time to close pull requests: about 8 hours
  • Issue authors: 3
  • Pull request authors: 10
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.65
  • Merged pull requests: 17
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • suizhihao (2)
  • yongchaoz (2)
  • Liareee (1)
  • dm-medvedev (1)
  • Liu-Hy (1)
  • Saramandaaa (1)
  • ChandlerBang (1)
  • Brian-Moser (1)
  • noveens (1)
  • Silung (1)
  • Guang000 (1)
  • zytx121 (1)
  • JLUssh (1)
Pull Request Authors
  • zhouzhengqd (14)
  • AhmadSajedii (6)
  • gszfwsb (6)
  • sp12138 (5)
  • GGchen1997 (5)
  • SJShin-AI (4)
  • Ning9319 (4)
  • Youth-49 (2)
  • sakshamsingh1 (2)
  • Hiter-Q (2)
  • Guang000 (2)
  • rayneholland (2)
  • vimar-gu (2)
  • WeiWeic6222848 (2)
  • thomascong121 (2)
Top Labels
Issue Labels
documentation (1) good first issue (1)
Pull Request Labels