lcc

Fine-tuning classifier NNs with Latent Cluster Correction

https://github.com/altaris/lcc

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 4 DOI reference(s) in README
✓ Academic publication links: Links to arxiv.org, zenodo.org
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (10.6%) to scientific vocabulary

Keywords

fine-tuning huggingface latent-space pytorch-lightning torch torchvision

Last synced: 6 months ago

Repository

Fine-tuning classifier NNs with Latent Cluster Correction

Basic Info

Host: GitHub
Owner: altaris
License: mit
Language: Jupyter Notebook
Default Branch: master
Homepage:
Size: 17.6 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Topics

fine-tuning huggingface latent-space pytorch-lightning torch torchvision

Created over 2 years ago · Last pushed 12 months ago

Metadata Files

Readme Changelog Contributing License Citation

LCC: Latent Cluster Correction

- Neural networks take input samples and transform them into latent representations.
- Semantically similar samples tend to aggregate into latent clusters.
- This repository implements Latent Cluster Correction, a new technique to improve said latent clusters.

Pretty images

These are examples of input datasets fed into image classifier models. Some selected latent representations are extracted and plotted in 2D (via dimensionality reduction). Initially, during the feature extraction phase, the samples are not clearly separated. But as the samples progressively get into the classification phase, visible latent clusters emerge. The goal of LCC is to help the formation of these clusters.
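
The kind of 2D projection described above can be sketched as follows. This is a minimal illustration rather than the repository's plotting code: it assumes PCA as the dimensionality reduction (the README does not say which method is used), and the `project_2d` helper and the synthetic two-class latents are hypothetical stand-ins for representations extracted from a real model.

```python
import numpy as np


def project_2d(latents: np.ndarray) -> np.ndarray:
    """Project (n_samples, n_features) latents onto their top two principal components."""
    centered = latents - latents.mean(axis=0, keepdims=True)
    # SVD of the centered data: the rows of vt are the principal directions
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


rng = np.random.default_rng(0)
# Hypothetical stand-in for latent representations: two classes in 64 dimensions
centers = np.stack([np.zeros(64), np.full(64, 3.0)])
labels = rng.integers(0, 2, size=200)
latents = centers[labels] + rng.normal(size=(200, 64))

points = project_2d(latents)
print(points.shape)  # (200, 2)
```

With well-separated classes such as these, the two groups land far apart along the first component, which is the "visible latent clusters" effect the plots illustrate.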

Installation

Make sure uv is installed. Then run:

```sh
uv python install 3.10
uv sync --all-extras
```

Usage

Fine-tuning with LCC: modify and run lcc.sh, or use the CLI directly:

```sh
uv run python -m lcc train --help
```

For example:

```sh
uv run python -m lcc train \
    microsoft/resnet-18 \
    PRESET:cifar100 \
    output_dir \
    --batch-size 256 \
    --head-name classifier.1 \
    --logit-key logits \
    --lcc-submodules resnet.encoder.stages.3 \
    --lcc-warmup 1 \
    --lcc-weight 0.01 \
    --seed 123
```
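
To compare several LCC weights, the command above can be generated from a script. The sketch below only builds and prints the command lines; it reuses the flags shown in the example, while the `build_train_command` helper, the swept weight values, and the per-run output directory names are assumptions made for illustration.

```python
import shlex


def build_train_command(model, dataset, output_dir, lcc_weight, seed=123):
    """Return the argument list for one `lcc train` run (flags taken from the example above)."""
    return [
        "uv", "run", "python", "-m", "lcc", "train",
        model, dataset, output_dir,
        "--batch-size", "256",
        "--head-name", "classifier.1",
        "--logit-key", "logits",
        "--lcc-submodules", "resnet.encoder.stages.3",
        "--lcc-warmup", "1",
        "--lcc-weight", str(lcc_weight),
        "--seed", str(seed),
    ]


# Hypothetical sweep: one output directory per weight value
commands = [
    build_train_command(
        "microsoft/resnet-18", "PRESET:cifar100", f"output_dir/weight_{w}", w
    )
    for w in (0.001, 0.01, 0.1)
]

for command in commands:
    print(shlex.join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` to actually launch the run.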

Pretty-print a model structure from HuggingFace: run ./pretty-print.sh HF_MODEL_NAME, e.g.

```sh
./pretty-print.sh microsoft/resnet-18
```

API overview

lcc.training: Training stuff
- lcc.training.train: Pulls a model from the HuggingFace model hub (presumably pretrained on ImageNet) and trains it on a dataset also pulled from HuggingFace. This method takes the model and dataset names as arguments, so it's pretty rigid.
lcc.datasets: Dataset stuff
- lcc.datasets.HuggingFaceDataset: A HuggingFace image classification dataset wrapped inside a Lightning Datamodule for easy use with PyTorch Lightning.
- lcc.datasets.get_dataset: Creating a HuggingFaceDataset requires a bunch of arguments. I was tired of copy-pasting them around, so I made this method to create classical datasets more quickly. See lcc.datasets.DATASET_PRESETS_CONFIGURATIONS for the list of available presets.
lcc.classifiers: Classifier models and wrappers
- lcc.classifiers.HuggingFaceClassifier: A HuggingFace image classification model wrapped inside a Lightning Module for easy use with PyTorch Lightning.
- lcc.classifiers.TimmClassifier: Same but for timm models, which, despite also coming from the HuggingFace hub, require some special considerations. See also timm.list_models.
lcc.correction: LCC stuff. You probably don't need to touch that directly since LCC is done automatically for classifier classes found in lcc.classifiers.
lcc.plotting: Cool plotting stuff.
- lcc.plotting.class_scatter: 2D scatter plot where samples are colored by class. Also supports "outliers", which are samples with a negative label.
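
The negative-label convention for outliers can be illustrated with a small standalone sketch. It does not call lcc.plotting.class_scatter, whose signature is not documented here; `split_outliers` is a hypothetical helper that only shows how samples might be separated before plotting.

```python
import numpy as np


def split_outliers(points, labels):
    """Separate 2D points into labeled samples and outliers (negative label)."""
    points = np.asarray(points)
    labels = np.asarray(labels)
    is_outlier = labels < 0
    return points[~is_outlier], labels[~is_outlier], points[is_outlier]


# Three samples; the second one is marked as an outlier
inliers, inlier_labels, outliers = split_outliers(
    [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]], [0, -1, 1]
)
print(inliers.shape, outliers.shape)  # (2, 2) (1, 2)
```

The labeled samples can then be colored by class, with the outliers drawn in a single neutral color.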

Cite

Preprint

```bibtex
@misc{hothanhImprovingFineTuningLatent2025,
  title = {Improving {{Fine-Tuning}} with {{Latent Cluster Correction}}},
  author = {Ho Thanh, C{\'e}dric},
  year = {2025},
  month = jan,
  number = {arXiv:2501.11919},
  eprint = {2501.11919},
  primaryclass = {cs},
  publisher = {arXiv},
  doi = {10.48550/arXiv.2501.11919},
  urldate = {2025-01-22},
  archiveprefix = {arXiv},
  keywords = {Computer Science - Machine Learning},
}
```

Code

```bibtex
@software{Ho_Thanh_LCC_Latent_Cluster_2025,
  author = {Ho Thanh, Cédric},
  license = {MIT},
  month = jan,
  title = {{LCC: Latent Cluster Correction}},
  url = {https://github.com/altaris/lcc},
  version = {1.0.0},
  year = {2025}
}
```

Owner

Name: Cédric
Login: altaris
Kind: user
Location: Japan
Company: RIKEN

Website: https://cedric.hothanh.fr
Repositories: 45
Profile: https://github.com/altaris

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: "LCC: Latent Cluster Correction"
message: "If you use this software, please cite it as below."
type: software
authors:
  - family-names: Ho Thanh
    given-names: Cédric
    orcid: "https://orcid.org/0000-0003-4476-2034"
identifiers:
  - type: doi
    value: 10.5281/zenodo.14934702
repository-code: "https://github.com/altaris/lcc"
url: "https://github.com/altaris/lcc"
license: MIT
commit: 880b2b16bf3ca39f77e19be4550db57a4d3ae79f
version: 1.0.0
date-released: "2025-01-27"

GitHub Events

Total

Delete event: 6
Public event: 1
Push event: 27
Create event: 7

Last Year

Delete event: 6
Public event: 1
Push event: 27
Create event: 7

Committers

Last synced: 7 months ago

All Time

Total Commits: 498
Total Committers: 1
Avg Commits per committer: 498.0
Development Distribution Score (DDS): 0.0

Past Year

Commits: 259
Committers: 1
Avg Commits per committer: 259.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Cédric HT	a****s	498

Issues and Pull Requests

Last synced: 7 months ago

All Time

Total issues: 0
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 0
Total pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Dependencies

.github/workflows/gh-pages.yml actions

actions/checkout v3 composite
actions/setup-python v3 composite
peaceiris/actions-gh-pages v3 composite

.etc/setup.py pypi

pyproject.toml pypi

click >=8.1.7
datasets [vision]>=3.0.0
faiss-cpu >=1.9.0
loguru >=0.7.2
more-itertools >=10.5.0
networkx >=3.3
numpy >=1.25
pillow >=10.4.0
pytorch-lightning >=2.4.0
regex >=2024.9.11
safetensors >=0.4.5
scikit-learn >=1.5.2
tensorboard >=2.17.1
timm >=1.0.9
torch >=2.4.1
transformers >=4.44.2
turbo-broccoli >=4.12.2

requirements.nocuda.txt pypi

faiss *
umap-learn *
