Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 4 DOI reference(s) in README
- ✓ Academic publication links: Links to: arxiv.org, zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.6%) to scientific vocabulary
Repository
Fine-tuning classifier NNs with Latent Cluster Correction
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
LCC: Latent Cluster Correction
- Neural networks take input samples and transform them into latent representations
- Semantically similar samples tend to aggregate into latent clusters
- This repository implements Latent Cluster Correction, a new technique to improve said latent clusters
Pretty images
These are examples of input datasets fed into image classifier models. A selection of latent representations is extracted and plotted in 2D (via dimensionality reduction). Early on, during the feature extraction phase, the samples are not clearly separated. As they progress toward the classification phase, visible latent clusters emerge. The goal of LCC is to help these clusters form.
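To make the idea of "clearly separated" clusters concrete, here is a minimal sketch of a toy separation score on 2D points. It is not the LCC algorithm and is not part of this repository: the function names, the score itself, and the sample points are all assumptions made up for illustration.

```python
import math
from collections import defaultdict


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def separation_ratio(points, labels):
    """Mean distance of samples to their own class centroid, divided by the
    mean distance between class centroids. Lower means tighter and better
    separated clusters. Requires at least two distinct labels."""
    by_class = defaultdict(list)
    for p, y in zip(points, labels):
        by_class[y].append(p)
    centroids = {y: centroid(ps) for y, ps in by_class.items()}

    intra = sum(math.dist(p, centroids[y]) for p, y in zip(points, labels))
    intra /= len(points)

    classes = sorted(centroids)
    pairs = [(a, b) for i, a in enumerate(classes) for b in classes[i + 1:]]
    inter = sum(math.dist(centroids[a], centroids[b]) for a, b in pairs)
    inter /= len(pairs)
    return intra / inter


# Made-up 2D "latent representations" of four samples from two classes.
labels = [0, 0, 1, 1]
early = [(0.0, 0.0), (1.0, 1.0), (0.5, 0.5), (0.5, 0.0)]  # classes overlap
late = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.2, 5.0)]   # two tight clusters

print("early layer:", separation_ratio(early, labels))
print("late layer: ", separation_ratio(late, labels))
```

The overlapping "early" points give a much larger ratio than the "late" ones, which matches the qualitative picture described above: clusters become more distinct closer to the classification head.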
Installation
Make sure uv is installed. Then run
sh
uv python install 3.10
uv sync --all-extras
Usage
- Fine-tuning with LCC: modify and run lcc.sh, or use the CLI directly:
sh
uv run python -m lcc train --help
For example:
sh
uv run python -m lcc train \
microsoft/resnet-18 \
PRESET:cifar100 \
output_dir \
--batch-size 256 \
--head-name classifier.1 \
--logit-key logits \
--lcc-submodules resnet.encoder.stages.3 \
--lcc-warmup 1 \
--lcc-weight 0.01 \
--seed 123
- Pretty-print a model structure from HuggingFace: run ./pretty-print.sh HF_MODEL_NAME, e.g.
sh
./pretty-print.sh microsoft/resnet-18
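When trying several values of a flag such as --lcc-weight, it can be convenient to generate the training commands from a script. The sketch below only assembles argument lists that mirror the example train command above; it uses no flags beyond those shown there. The helper name build_train_command, the per-run output directory layout, and the swept values are assumptions for illustration, not part of the lcc package.

```python
import shlex


def build_train_command(model, dataset, output_dir, **options):
    """Assemble the argv for `uv run python -m lcc train`.

    Keyword names are turned into flags by replacing underscores with
    dashes, e.g. lcc_weight=0.01 becomes `--lcc-weight 0.01`.
    """
    argv = ["uv", "run", "python", "-m", "lcc", "train",
            model, dataset, output_dir]
    for name, value in options.items():
        argv += ["--" + name.replace("_", "-"), str(value)]
    return argv


# One command per LCC weight; the weights and directory names are made up.
commands = [
    build_train_command(
        "microsoft/resnet-18",
        "PRESET:cifar100",
        f"output_dir/weight_{w}",
        batch_size=256,
        head_name="classifier.1",
        logit_key="logits",
        lcc_submodules="resnet.encoder.stages.3",
        lcc_warmup=1,
        lcc_weight=w,
        seed=123,
    )
    for w in (0.001, 0.01, 0.1)
]

for argv in commands:
    # Print a copy-pastable shell line instead of launching the training.
    print(shlex.join(argv))
```

To actually launch a run, each list could be passed to subprocess.run(argv, check=True) from the repository root.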
API overview
- lcc.training: Training stuff
  - lcc.training.train: Pulls a model from the HuggingFace model hub (presumably pretrained on ImageNet) and trains it on a dataset also pulled from HuggingFace. This method takes the model and dataset names as arguments, so it's pretty rigid.
- lcc.datasets: Dataset stuff
  - lcc.datasets.HuggingFaceDataset: A HuggingFace image classification dataset wrapped inside a Lightning Datamodule for easy use with PyTorch Lightning.
  - lcc.datasets.get_dataset: Creating a HuggingFaceDataset requires a bunch of arguments. I was tired of copy-pasting them around, so I made this method to create classical datasets more quickly. See nlnas.datasets.DATASET_PRESETS_CONFIGURATIONS for the list of available presets.
- lcc.classifiers: Classifier models and wrappers
  - lcc.classifiers.HuggingFaceClassifier: A HuggingFace image classification model wrapped inside a Lightning Module for easy use with PyTorch Lightning.
  - lcc.classifiers.TimmClassifier: Same but for timm models, which, despite also coming from the HuggingFace hub, require some special considerations. See also timm.list_models.
- lcc.correction: LCC stuff. You probably don't need to touch that directly since LCC is done automatically for classifier classes found in lcc.classifiers.
- lcc.plotting: Cool plotting stuff.
  - lcc.plotting.class_scatter: 2D scatter plot where samples are colored by class. Also supports "outliers", which are samples with a negative label.
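The labeling convention mentioned for lcc.plotting.class_scatter (non-negative labels are classes, negative labels are outliers) can be illustrated without any plotting library. The sketch below does not call class_scatter, whose signature is not given here; split_by_class and the sample data are hypothetical and only show how such a convention partitions 2D points before plotting.

```python
from collections import defaultdict


def split_by_class(points, labels):
    """Group 2D points by class label, setting aside outliers.

    Samples with a negative label are treated as outliers and returned
    separately, so that they can be drawn in a neutral color.
    """
    groups = defaultdict(list)
    outliers = []
    for point, label in zip(points, labels):
        if label < 0:
            outliers.append(point)
        else:
            groups[label].append(point)
    return dict(groups), outliers


# Made-up 2D embeddings: two samples of class 0, one of class 1, one outlier.
points = [(0.1, 0.2), (0.3, 0.1), (2.0, 2.1), (9.0, -4.0)]
labels = [0, 0, 1, -1]

groups, outliers = split_by_class(points, labels)
for label, members in sorted(groups.items()):
    print(f"class {label}: {members}")
print(f"outliers: {outliers}")
```

Each group could then be passed to a scatter call with its own color, with the outliers drawn last in gray.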
Cite
bibtex
@misc{hothanhImprovingFineTuningLatent2025,
title = {Improving {{Fine-Tuning}} with {{Latent Cluster Correction}}},
author = {Ho Thanh, C{\'e}dric},
year = {2025},
month = jan,
number = {arXiv:2501.11919},
eprint = {2501.11919},
primaryclass = {cs},
publisher = {arXiv},
doi = {10.48550/arXiv.2501.11919},
urldate = {2025-01-22},
archiveprefix = {arXiv},
keywords = {Computer Science - Machine Learning},
}
- Code
bibtex
@software{Ho_Thanh_LCC_Latent_Cluster_2025,
author = {Ho Thanh, Cédric},
license = {MIT},
month = jan,
title = {{LCC: Latent Cluster Correction}},
url = {https://github.com/altaris/lcc},
version = {1.0.0},
year = {2025}
}
Owner
- Name: Cédric
- Login: altaris
- Kind: user
- Location: Japan
- Company: RIKEN
- Website: https://cedric.hothanh.fr
- Repositories: 45
- Profile: https://github.com/altaris
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: "LCC: Latent Cluster Correction"
message: "If you use this software, please cite it as below."
type: software
authors:
  - family-names: Ho Thanh
    given-names: Cédric
    orcid: "https://orcid.org/0000-0003-4476-2034"
identifiers:
  - type: doi
    value: 10.5281/zenodo.14934702
repository-code: "https://github.com/altaris/lcc"
url: "https://github.com/altaris/lcc"
license: MIT
commit: 880b2b16bf3ca39f77e19be4550db57a4d3ae79f
version: 1.0.0
date-released: "2025-01-27"
GitHub Events
Total
- Delete event: 6
- Public event: 1
- Push event: 27
- Create event: 7
Last Year
- Delete event: 6
- Public event: 1
- Push event: 27
- Create event: 7
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Dependencies
- actions/checkout v3 composite
- actions/setup-python v3 composite
- peaceiris/actions-gh-pages v3 composite
- click >=8.1.7
- datasets [vision]>=3.0.0
- faiss-cpu >=1.9.0
- loguru >=0.7.2
- more-itertools >=10.5.0
- networkx >=3.3
- numpy >=1.25
- pillow >=10.4.0
- pytorch-lightning >=2.4.0
- regex >=2024.9.11
- safetensors >=0.4.5
- scikit-learn >=1.5.2
- tensorboard >=2.17.1
- timm >=1.0.9
- torch >=2.4.1
- transformers >=4.44.2
- turbo-broccoli >=4.12.2
- faiss *
- umap-learn *


