instances - Release v1.0.19

Patch release for Python 3.9 compat break in 1.0.18

July 23, 2025

Add set_input_size() method to EVA models, used by OpenCLIP 3.0.0 to allow resizing for timm based encoder models.
Release 1.0.18, needed for PE-Core S & T models in OpenCLIP 3.0.0
Fix small typing issue that broke Python 3.9 compat. 1.0.19 patch release.

July 21, 2025

ROPE support added to NaFlexViT. All models covered by the EVA base (eva.py), including EVA, EVA02, Meta PE ViT, timm SBB ViT w/ ROPE, and Naver ROPE-ViT, can now be loaded in NaFlexViT when use_naflex=True is passed at model creation time.
More Meta PE ViT encoders added, including small/tiny variants, lang variants w/ tiling, and more spatial variants.
PatchDropout fixed with NaFlexViT and also w/ EVA models (regression after adding Naver ROPE-ViT)
Fix XY order with grid_indexing='xy'. The bug affected non-square image use in 'xy' mode (only ROPE-ViT and PE were impacted).
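
To illustrate why an ordering bug like this only shows up on non-square inputs, here is a minimal pure-Python sketch of the two meshgrid conventions. The `meshgrid` and `shape` helpers are hypothetical stand-ins written for this note, not timm's ROPE code; they only mirror the documented 'ij' / 'xy' semantics of `torch.meshgrid` and `numpy.meshgrid`.

```python
def meshgrid(a, b, indexing="ij"):
    """Two-input meshgrid on plain lists (illustrative stand-in).

    'ij': outputs are shaped (len(a), len(b)).
    'xy': outputs are shaped (len(b), len(a)), i.e. inputs are read as (x, y).
    """
    if indexing == "ij":
        first = [[ai for _ in b] for ai in a]
        second = [[bj for bj in b] for _ in a]
    elif indexing == "xy":
        first = [[ai for ai in a] for _ in b]
        second = [[bj for _ in a] for bj in b]
    else:
        raise ValueError(f"unknown indexing: {indexing}")
    return first, second


def shape(grid):
    return (len(grid), len(grid[0]))


H, W = 2, 3  # a non-square patch grid
ys, xs = list(range(H)), list(range(W))

# Reference layout: rows follow height, columns follow width.
ij_y, ij_x = meshgrid(ys, xs, indexing="ij")

# Passing (ys, xs) unchanged in 'xy' mode transposes the grid to (W, H).
bad_y, bad_x = meshgrid(ys, xs, indexing="xy")

# Swapping the inputs to (xs, ys) restores the (H, W) layout.
xy_x, xy_y = meshgrid(xs, ys, indexing="xy")

print(shape(ij_y), shape(bad_y), shape(xy_x))
```

With H == W all three grids have the same shape, which is why a swapped order can go unnoticed on square images.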

What's Changed

Add ROPE support to NaFlexVit (axial and mixed), and support most (all?) EVA based vit models & weights by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2552
Support set_input_size() in EVA models by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2554

Full Changelog: https://github.com/huggingface/pytorch-image-models/compare/v1.0.17...v1.0.18

- Python
Published by rwightman 10 months ago

July 23, 2025

Add set_input_size() method to EVA models, used by OpenCLIP 3.0.0 to allow resizing for timm based encoder models.
Release 1.0.18, needed for PE-Core S & T models in OpenCLIP 3.0.0

July 21, 2025

ROPE support added to NaFlexViT. All models covered by the EVA base (eva.py), including EVA, EVA02, Meta PE ViT, timm SBB ViT w/ ROPE, and Naver ROPE-ViT, can now be loaded in NaFlexViT when use_naflex=True is passed at model creation time.
More Meta PE ViT encoders added, including small/tiny variants, lang variants w/ tiling, and more spatial variants.
PatchDropout fixed with NaFlexViT and also w/ EVA models (regression after adding Naver ROPE-ViT)
Fix XY order with grid_indexing='xy'. The bug affected non-square image use in 'xy' mode (only ROPE-ViT and PE were impacted).

What's Changed

Add ROPE support to NaFlexVit (axial and mixed), and support most (all?) EVA based vit models & weights by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2552
Support set_input_size() in EVA models by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2554

Full Changelog: https://github.com/huggingface/pytorch-image-models/compare/v1.0.17...v1.0.18

Published by rwightman 10 months ago

July 7, 2025

MobileNet-v5 backbone tweaks for improved Google Gemma 3n behaviour (to pair with updated official weights)
- Add stem bias (zero'd in updated weights, compat break with old weights)
- GELU -> GELU (tanh approx). A minor change to be closer to JAX
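
For a sense of how small this change is numerically, the sketch below compares the exact (erf-based) GELU with the standard tanh approximation using only the standard library. The function names are illustrative; in PyTorch the equivalent switch is the `approximate='tanh'` argument of GELU, as referenced in the PR title further down.

```python
import math


def gelu_exact(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x):
    """Tanh approximation of GELU."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


# Largest absolute gap between the two on a grid over [-5, 5].
xs = [i / 100.0 for i in range(-500, 501)]
max_gap = max(abs(gelu_exact(x) - gelu_tanh(x)) for x in xs)
print(f"max |exact - tanh| on [-5, 5]: {max_gap:.2e}")
```

The gap is small but non-zero, so weights tuned against one variant are not guaranteed to behave identically with the other.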
Add two arguments to layer-decay support, a min scale clamp and 'no optimization' scale threshold
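
A rough sketch of how the two new knobs might interact, under stated assumptions: the function and argument names (`layer_decay_scales`, `min_scale`, `no_opt_scale`) are hypothetical, the last layer is assumed to get scale 1.0, and the threshold is assumed to be checked after clamping. This is not timm's parameter-group code.

```python
def layer_decay_scales(num_layers, decay, min_scale=0.0, no_opt_scale=None):
    """Return per-layer LR scales and a flag saying whether each layer is optimized."""
    scales, optimize = [], []
    for i in range(num_layers):
        # Geometric decay: the last layer gets 1.0, earlier layers get smaller scales.
        raw = decay ** (num_layers - 1 - i)
        # Min scale clamp: never let a layer's LR scale fall below min_scale.
        scale = max(raw, min_scale)
        scales.append(scale)
        # 'No optimization' threshold: layers scaled below it are excluded.
        optimize.append(no_opt_scale is None or scale >= no_opt_scale)
    return scales, optimize


clamped, _ = layer_decay_scales(6, 0.5, min_scale=0.1)
_, optimize = layer_decay_scales(6, 0.5, no_opt_scale=0.1)
print(clamped)
print(optimize)
```

In this toy setup the clamp keeps the earliest layers training at a floor scale, while the threshold drops them from optimization entirely.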
Add 'Fp32' LayerNorm, RMSNorm, SimpleNorm variants that can be enabled to force computation of norm in float32
Some typing, argument cleanup for norm, norm+act layers done with above
Support Naver ROPE-ViT (https://github.com/naver-ai/rope-vit) in eva.py, add RotaryEmbeddingMixed module for mixed mode, weights on HuggingFace Hub

|model |img_size|top1 |top5 |param_count|
|--------------------------------------------------|--------|------|------|-----------|
|vit_large_patch16_rope_mixed_ape_224.naver_in1k |224 |84.84 |97.122|304.4 |
|vit_large_patch16_rope_mixed_224.naver_in1k |224 |84.828|97.116|304.2 |
|vit_large_patch16_rope_ape_224.naver_in1k |224 |84.65 |97.154|304.37 |
|vit_large_patch16_rope_224.naver_in1k |224 |84.648|97.122|304.17 |
|vit_base_patch16_rope_mixed_ape_224.naver_in1k |224 |83.894|96.754|86.59 |
|vit_base_patch16_rope_mixed_224.naver_in1k |224 |83.804|96.712|86.44 |
|vit_base_patch16_rope_ape_224.naver_in1k |224 |83.782|96.61 |86.59 |
|vit_base_patch16_rope_224.naver_in1k |224 |83.718|96.672|86.43 |
|vit_small_patch16_rope_224.naver_in1k |224 |81.23 |95.022|21.98 |
|vit_small_patch16_rope_mixed_224.naver_in1k |224 |81.216|95.022|21.99 |
|vit_small_patch16_rope_ape_224.naver_in1k |224 |81.004|95.016|22.06 |
|vit_small_patch16_rope_mixed_ape_224.naver_in1k |224 |80.986|94.976|22.06 |

Some cleanup of ROPE modules, helpers, and FX tracing leaf registration
Preparing version 1.0.17 release

What's Changed

Adding Naver rope-vit compatibility to EVA ViT by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2529
Update no_grad usage to inference_mode if possible by @GuillaumeErhard in https://github.com/huggingface/pytorch-image-models/pull/2534
Add a min layer-decay scale clamp, and no optimization threshold to exclude groups from optimization by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2537
Add stem_bias option to MNV5. Resolve the norm layer so can pass string. by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2538
Add flag to enable float32 computation for normalization (norm + affine) by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2536
fix: mnv5 conv_stem bias and GELU with approximate=tanh by @RyanMullins in https://github.com/huggingface/pytorch-image-models/pull/2533
Fixup casting issues for weights/bias in fp32 norm layers by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2539
Fix H, W ordering for xy indexing in ROPE by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2541
Fix 3 typos in README.md by @robin-ede in https://github.com/huggingface/pytorch-image-models/pull/2544

New Contributors

@GuillaumeErhard made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2534
@RyanMullins made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2533
@robin-ede made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2544

Full Changelog: https://github.com/huggingface/pytorch-image-models/compare/v1.0.16...v1.0.17

Published by rwightman 11 months ago

instances - Release v1.0.16

June 26, 2025

MobileNetV5 backbone (w/ encoder only variant) for Gemma 3n image encoder
Version 1.0.16 released

June 23, 2025

Add F.grid_sample based 2D and factorized pos embed resize to NaFlexViT. Faster when lots of different sizes (based on example by https://github.com/stas-sl).
Further speed up patch embed resample by replacing vmap with matmul (based on snippet by https://github.com/stas-sl).
Add 3 initial native aspect NaFlexViT checkpoints created while testing, ImageNet-1k and 3 different pos embed configs w/ same hparams.

| Model | Top-1 Acc | Top-5 Acc | Params (M) | Eval Seq Len |
|:---|:---:|:---:|:---:|:---:|
| naflexvit_base_patch16_par_gap.e300_s576_in1k | 83.67 | 96.45 | 86.63 | 576 |
| naflexvit_base_patch16_parfac_gap.e300_s576_in1k | 83.63 | 96.41 | 86.46 | 576 |
| naflexvit_base_patch16_gap.e300_s576_in1k | 83.50 | 96.46 | 86.63 | 576 |

Support gradient checkpointing for forward_intermediates and fix some checkpointing bugs. Thanks https://github.com/brianhou0208
Add 'corrected weight decay' (https://arxiv.org/abs/2506.02285) as option to AdamW (legacy), Adopt, Kron, Adafactor (BV), Lamb, LaProp, Lion, NadamW, RmsPropTF, SGDW optimizers
Switch PE (perception encoder) ViT models to use native timm weights instead of remapping on the fly
Fix cuda stream bug in prefetch loader

June 5, 2025

Initial NaFlexVit model code. NaFlexVit is a Vision Transformer with:
1. Encapsulated embedding and position encoding in a single module
2. Support for nn.Linear patch embedding on pre-patchified (dictionary) inputs
3. Support for NaFlex variable aspect, variable resolution (SigLip-2: https://arxiv.org/abs/2502.14786)
4. Support for FlexiViT variable patch size (https://arxiv.org/abs/2212.08013)
5. Support for NaViT fractional/factorized position embedding (https://arxiv.org/abs/2307.06304)
Existing vit models in vision_transformer.py can be loaded into the NaFlexVit model by adding the use_naflex=True flag to create_model
- Some native weights coming soon
A full NaFlex data pipeline is available that allows training / fine-tuning / evaluating with variable aspect / size images
- To enable in train.py and validate.py add the --naflex-loader arg, must be used with a NaFlexVit
To evaluate an existing (classic) ViT loaded in NaFlexVit model w/ NaFlex data pipe:
- python validate.py /imagenet --amp -j 8 --model vit_base_patch16_224 --model-kwargs use_naflex=True --naflex-loader --naflex-max-seq-len 256
Training has some extra arguments worth noting
- The --naflex-train-seq-lens argument specifies which sequence lengths to randomly pick from per batch during training
- The --naflex-max-seq-len argument sets the target sequence length for validation
- Adding --model-kwargs enable_patch_interpolator=True --naflex-patch-sizes 12 16 24 will enable random patch size selection per-batch w/ interpolation
- The --naflex-loss-scale arg changes loss scaling mode per batch relative to the batch size, timm NaFlex loading changes the batch size for each seq len
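
The arithmetic behind a sequence-length budget can be sketched as follows. This is an illustration only: `naflex_patch_grid` is a hypothetical helper, and the actual resize logic in SigLIP-2 and in timm's NaFlex loader may round or clamp differently. It picks a patch grid that roughly keeps the image aspect ratio while staying within `max_seq_len` patches.

```python
import math


def naflex_patch_grid(height, width, patch_size=16, max_seq_len=256):
    """Pick a (rows, cols) patch grid with rows * cols <= max_seq_len."""
    # Scale factor that would make the patch count hit the budget exactly.
    scale = math.sqrt(max_seq_len * patch_size ** 2 / (height * width))
    rows = max(1, math.floor(height * scale / patch_size))
    cols = max(1, math.floor(width * scale / patch_size))
    # Guard against extreme aspect ratios pushing the product over budget.
    rows = min(rows, max_seq_len)
    cols = min(cols, max_seq_len // rows)
    return rows, cols


rows, cols = naflex_patch_grid(480, 640, patch_size=16, max_seq_len=256)
print(rows, cols, rows * cols)  # 13 18 234
```

A 480x640 image keeps its 3:4 shape approximately (13x18 patches) instead of being squashed to a square 16x16 grid.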

May 28, 2025

Add a number of small/fast models thanks to https://github.com/brianhou0208
- SwiftFormer - (ICCV2023) SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications
- FasterNet - (CVPR2023) Run, Don’t Walk: Chasing Higher FLOPS for Faster Neural Networks
- SHViT - (CVPR2024) SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design
- StarNet - (CVPR2024) Rewrite the Stars
- GhostNet-V3 - GhostNetV3: Exploring the Training Strategies for Compact Models
Update EVA ViT (closest match) to support Perception Encoder models (https://arxiv.org/abs/2504.13181) from Meta, loading Hub weights but I still need to push dedicated timm weights
- Add some flexibility to ROPE impl
Big increase in number of models supporting forward_intermediates() and some additional fixes thanks to https://github.com/brianhou0208
- DaViT, EdgeNeXt, EfficientFormerV2, EfficientViT(MIT), EfficientViT(MSRA), FocalNet, GCViT, HGNet /V2, InceptionNeXt, Inception-V4, MambaOut, MetaFormer, NesT, Next-ViT, PiT, PVT V2, RepGhostNet, RepViT, ResNetV2, ReXNet, TinyViT, TResNet, VoV
TNT model updated w/ new weights and forward_intermediates() support thanks to https://github.com/brianhou0208
Add local-dir: pretrained schema. Use local-dir:/path/to/model/folder as the model name to source the model / pretrained cfg & weights of a Hugging Face Hub style model (config.json + weights file) from a local folder.
Fixes, improvements for onnx export

What's Changed

Fix arg merging of sknet, old seresnet. Fix #2470 by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2471
Fix onnx export by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2475
Add local-dir: schema support for model loading (config + weights) from folder by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2476
Fix: Allow img_size to be int or tuple in PatchEmbed by @sddongxh in https://github.com/huggingface/pytorch-image-models/pull/2477
Add LightlyTrain Integration for Pretraining Support by @yutong-xiang-97 in https://github.com/huggingface/pytorch-image-models/pull/2474
Check forward_intermediates features against forward_features output by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2483
More models support forward_intermediates by @brianhou0208 in https://github.com/huggingface/pytorch-image-models/pull/2482
Update README.md by @atharva-pathak in https://github.com/huggingface/pytorch-image-models/pull/2484
remove download argument from torch_kwargs for torchvision ImageNet class by @ryan-caesar-ramos in https://github.com/huggingface/pytorch-image-models/pull/2486
Update TNT-(S/B) model weights and add feature extraction support by @brianhou0208 in https://github.com/huggingface/pytorch-image-models/pull/2480
Add EVA ViT based PE (Perceptual Encoder) impl by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2487
Add SwiftFormer, SHViT, StarNet, FasterNet and GhostNetV3 by @brianhou0208 in https://github.com/huggingface/pytorch-image-models/pull/2499
A cleaned up beit3 remap onto vision_transformer.py vit by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2503
Initial NaFlex ViT model and training support by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2466
Forgot to compact attention pool branches after verifying by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2507
Throw exception on non-directory path for pretrained weights by @emmanuel-ferdman in https://github.com/huggingface/pytorch-image-models/pull/2510
Add corrected_weight decay to several optimizers by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2511
Doing some Claude enabled docstring, type annotation and other cleanup by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2504
Fix #2513, be explicit about stream devices by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2515
Update legacy AdamW impl so it has a multi-tensor impl like NAdamW (n… by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2517
Fix head_dim reference in AttentionRope class of attention.py by @amorehead in https://github.com/huggingface/pytorch-image-models/pull/2519
Refactor patch and pos embed resampling based on feedback from https://github.com/stas-sl by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2518
Add initial weights for my first 3 naflexvit_base models by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2523
Support gradient checkpointing in forward_intermediates() by @brianhou0208 in https://github.com/huggingface/pytorch-image-models/pull/2501
Update README: add references for additional supported models by @brianhou0208 in https://github.com/huggingface/pytorch-image-models/pull/2526
MobileNetV5 by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2527

New Contributors

@sddongxh made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2477
@yutong-xiang-97 made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2474
@atharva-pathak made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2484
@ryan-caesar-ramos made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2486
@emmanuel-ferdman made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2510
@amorehead made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2519

Full Changelog: https://github.com/huggingface/pytorch-image-models/compare/v1.0.15...v1.0.16

Published by rwightman 11 months ago

instances - Release v1.0.15

Feb 21, 2025

SigLIP 2 ViT image encoders added (https://huggingface.co/collections/timm/siglip-2-67b8e72ba08b09dd97aecaf9)
- Variable resolution / aspect NaFlex versions are a WIP
Add 'SO150M2' ViT weights trained with SBB recipes, great results, better for ImageNet than previous attempt w/ less training.
- vit_so150m2_patch16_reg1_gap_448.sbb_e200_in12k_ft_in1k - 88.1% top-1
- vit_so150m2_patch16_reg1_gap_384.sbb_e200_in12k_ft_in1k - 87.9% top-1
- vit_so150m2_patch16_reg1_gap_256.sbb_e200_in12k_ft_in1k - 87.3% top-1
- vit_so150m2_patch16_reg4_gap_256.sbb_e200_in12k
Updated InternViT-300M '2.5' weights
Release 1.0.15

Feb 1, 2025

FYI PyTorch 2.6 & Python 3.13 are tested and working w/ current main and released version of timm

Jan 27, 2025

Add Kron Optimizer (PSGD w/ Kronecker-factored preconditioner)
- Code from https://github.com/evanatyourservice/kron_torch
- See also https://sites.google.com/site/lixilinx/home/psgd

What's Changed

Fix metavar for --input-size by @JosuaRieder in https://github.com/huggingface/pytorch-image-models/pull/2417
Add arguments to the respective argument groups by @JosuaRieder in https://github.com/huggingface/pytorch-image-models/pull/2416
Add missing training flag to convert_sync_batchnorm by @collinmccarthy in https://github.com/huggingface/pytorch-image-models/pull/2423
Fix num_classes update in reset_classifier and RDNet forward head call by @brianhou0208 in https://github.com/huggingface/pytorch-image-models/pull/2421
timm: add __all__ to __init__ by @adamjstewart in https://github.com/huggingface/pytorch-image-models/pull/2399
Fiddling with Kron (PSGD) optimizer by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2427
Try to force numpy<2.0 for torch 1.13 tests, update newest tested torch to 2.5.1 by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2429
Kron flatten improvements + stochastic weight decay by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2431
PSGD: unify RNG by @ClashLuke in https://github.com/huggingface/pytorch-image-models/pull/2433
Add vit so150m2 weights by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2439
adapt_input_conv: add type hints by @adamjstewart in https://github.com/huggingface/pytorch-image-models/pull/2441
SigLIP 2 by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2440
timm.models: explicitly export attributes by @adamjstewart in https://github.com/huggingface/pytorch-image-models/pull/2442

New Contributors

@collinmccarthy made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2423
@ClashLuke made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2433

Full Changelog: https://github.com/huggingface/pytorch-image-models/compare/v1.0.14...v1.0.15

Published by rwightman over 1 year ago

instances - Release v1.0.14

Jan 19, 2025

Fix loading of LeViT safetensor weights, remove conversion code which should have been deactivated
Add 'SO150M' ViT weights trained with SBB recipes, decent results, but not optimal shape for ImageNet-12k/1k pretrain/ft
- vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k_ft_in1k - 86.7% top-1
- vit_so150m_patch16_reg4_gap_384.sbb_e250_in12k_ft_in1k - 87.4% top-1
- vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k
Misc typing, typo, etc. cleanup
1.0.14 release to get above LeViT fix out

What's Changed

Fix nn.Module type hints by @adamjstewart in https://github.com/huggingface/pytorch-image-models/pull/2400
Add missing paper title by @JosuaRieder in https://github.com/huggingface/pytorch-image-models/pull/2405
fix 'timm recipe scripts' link by @JosuaRieder in https://github.com/huggingface/pytorch-image-models/pull/2404
fix typo in EfficientNet docs by @JosuaRieder in https://github.com/huggingface/pytorch-image-models/pull/2403
disable abbreviating csv inference output with ellipses by @JosuaRieder in https://github.com/huggingface/pytorch-image-models/pull/2402
fix incorrect LaTeX formulas by @JosuaRieder in https://github.com/huggingface/pytorch-image-models/pull/2406
VGG ConvMlp: fix layer defaults/types by @adamjstewart in https://github.com/huggingface/pytorch-image-models/pull/2409
Implement --no-console-results in inference.py by @JosuaRieder in https://github.com/huggingface/pytorch-image-models/pull/2408
LeViT safetensors load is broken by conversion code that wasn't deactivated by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2412
A few more weights by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2413
Fix typos by @JosuaRieder in https://github.com/huggingface/pytorch-image-models/pull/2415

New Contributors

@adamjstewart made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2400

Full Changelog: https://github.com/huggingface/pytorch-image-models/compare/v1.0.13...v1.0.14

Published by rwightman over 1 year ago

instances - Release v1.0.13

Jan 9, 2025

Add support to train and validate in pure bfloat16 or float16
wandb project name arg added by https://github.com/caojiaolong, use arg.experiment for name
Fix old issue w/ checkpoint saving not working on filesystem w/o hard-link support (e.g. FUSE fs mounts)
1.0.13 release

Jan 6, 2025

Add torch.utils.checkpoint.checkpoint() wrapper in timm.models that defaults use_reentrant=False, unless TIMM_REENTRANT_CKPT=1 is set in env.
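
The default-with-env-override pattern described above can be sketched without PyTorch. `resolve_use_reentrant` is a hypothetical helper written for illustration; how timm actually parses the variable (and whether it reads it once at import time) is not shown here and should be treated as an assumption.

```python
import os


def resolve_use_reentrant(explicit=None, env=None):
    """Decide the use_reentrant value to pass to a checkpoint() call.

    explicit: a caller-supplied value, which always wins when given.
    env: mapping to read from (defaults to os.environ), injectable for testing.
    """
    if explicit is not None:
        return explicit
    env = os.environ if env is None else env
    # Default is non-reentrant unless TIMM_REENTRANT_CKPT=1 is set.
    return env.get("TIMM_REENTRANT_CKPT", "0") == "1"


print(resolve_use_reentrant(env={}))                            # False
print(resolve_use_reentrant(env={"TIMM_REENTRANT_CKPT": "1"}))  # True
```

A wrapper built this way would forward the resolved value as `use_reentrant=` to `torch.utils.checkpoint.checkpoint`.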

Dec 31, 2024

convnext_nano 384x384 ImageNet-12k pretrain & fine-tune. https://huggingface.co/models?search=convnext_nano%20r384
Add AIM-v2 encoders from https://github.com/apple/ml-aim, see on Hub: https://huggingface.co/models?search=timm%20aimv2
Add PaliGemma2 encoders from https://github.com/google-research/big_vision to existing PaliGemma, see on Hub: https://huggingface.co/models?search=timm%20pali2
Add missing L/14 DFN2B 39B CLIP ViT, vit_large_patch14_clip_224.dfn2b_s39b
Fix existing RmsNorm layer & fn to match standard formulation, use PT 2.5 impl when possible. Move old impl to SimpleNorm layer, it's LN w/o centering or bias. There were only two timm models using it, and they have been updated.
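
The difference between the two formulations can be shown on a plain list of floats. This is a sketch under assumptions: learnable weights are omitted, the function names are illustrative, and the "simple" variant uses the population variance (the exact variance estimator in timm's SimpleNorm is not stated here).

```python
import math


def rms_norm(x, eps=1e-6):
    """Standard RMSNorm: divide by the root mean square of x."""
    mean_sq = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(mean_sq + eps) for v in x]


def simple_norm(x, eps=1e-6):
    """LayerNorm without centering or bias: divide by the std, keep the mean."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [v / math.sqrt(var + eps) for v in x]


shifted = [1.0, 2.0, 3.0, 4.0]      # non-zero mean: the two norms disagree
centered = [-2.0, -1.0, 1.0, 2.0]   # zero mean: the two norms coincide

print(rms_norm(shifted))
print(simple_norm(shifted))
```

The two only agree when the input already has zero mean, which is why weights trained with one formulation should not be silently run with the other.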
Allow override of cache_dir arg for model creation
Pass through trust_remote_code for HF datasets wrapper
inception_next_atto model added by creator
Adan optimizer caution, and Lamb decoupled weight decay options
Some feature_info metadata fixed by https://github.com/brianhou0208
All OpenCLIP and JAX (CLIP, SigLIP, Pali, etc) model weights that used load time remapping were given their own HF Hub instances so that they work with hf-hub: based loading, and thus will work with new Transformers TimmWrapperModel

What's Changed

Punch cache_dir through model factory / builder / pretrain helpers by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2356
Yuweihao inception next atto merge by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2360
Dataset trust remote tweaks by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2361
Add --dataset-trust-remote-code to the train.py and validate.py scripts by @grodino in https://github.com/huggingface/pytorch-image-models/pull/2328
Fix feature_info.reduction by @brianhou0208 in https://github.com/huggingface/pytorch-image-models/pull/2369
Add caution to Adan. Add decouple decay option to LAMB. by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2357
Switching to timm specific weight instances for open_clip image encoders by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2376
Fix broken image link in Quickstart doc by @ariG23498 in https://github.com/huggingface/pytorch-image-models/pull/2381
Supporting aimv2 encoders by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2379
fix: minor typos in markdowns by @ruidazeng in https://github.com/huggingface/pytorch-image-models/pull/2382
Add 384x384 in12k pretrain and finetune for convnext_nano by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2384
Fixed unfused attn2d scale by @laclouis5 in https://github.com/huggingface/pytorch-image-models/pull/2387
Fix MQA V2 by @laclouis5 in https://github.com/huggingface/pytorch-image-models/pull/2388
Wrap torch checkpoint() fn to default use_reentrant flag to False and allow env var override by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2394
Add half-precision (bfloat16, float16) support to train & validate scripts by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2397
Merging wandb project name chages w/ addition by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2398

New Contributors

@brianhou0208 made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2369
@ariG23498 made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2381
@ruidazeng made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2382
@laclouis5 made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2387

Full Changelog: https://github.com/huggingface/pytorch-image-models/compare/v1.0.12...v1.0.13

Published by rwightman over 1 year ago

instances - Release v1.0.12

Nov 28, 2024

More optimizers
- Add MARS optimizer (https://arxiv.org/abs/2411.10438, https://github.com/AGI-Arena/MARS)
- Add LaProp optimizer (https://arxiv.org/abs/2002.04839, https://github.com/Z-T-WANG/LaProp-Optimizer)
- Add masking from 'Cautious Optimizers' (https://arxiv.org/abs/2411.16085, https://github.com/kyleliang919/C-Optim) to Adafactor, Adafactor Big Vision, AdamW (legacy), Adopt, Lamb, LaProp, Lion, NadamW, RMSPropTF, SGDW
- Cleanup some docstrings and type annotations re optimizers and factory
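
A minimal sketch of the masking idea from 'Cautious Optimizers', on plain lists: components of the update whose sign disagrees with the current gradient are zeroed, and the remainder is rescaled by the fraction kept. `cautious_update` is a hypothetical name, and the `eps` floor on the mask mean is an assumption chosen for illustration rather than a value taken from any particular optimizer.

```python
def cautious_update(update, grad, eps=1e-3):
    """Mask out update components that point against the gradient sign."""
    # 1.0 where update and gradient agree in sign, else 0.0.
    mask = [1.0 if u * g > 0 else 0.0 for u, g in zip(update, grad)]
    # Rescale by the kept fraction so the overall update magnitude is preserved.
    scale = max(sum(mask) / len(mask), eps)
    return [u * m / scale for u, m in zip(update, mask)]


update = [0.5, -0.2, 0.1, -0.4]   # e.g. a momentum-based update direction
grad = [1.0, 0.3, -0.2, -0.1]     # current gradient
masked = cautious_update(update, grad)
print(masked)
```

Here two of four components survive, so the surviving entries are doubled (0.5 becomes 1.0 and -0.4 becomes -0.8).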
Add MobileNet-V4 Conv Medium models pretrained on in12k and fine-tuned in1k @ 384x384
- https://huggingface.co/timm/mobilenetv4_conv_medium.e250_r384_in12k_ft_in1k
- https://huggingface.co/timm/mobilenetv4_conv_medium.e250_r384_in12k
- https://huggingface.co/timm/mobilenetv4_conv_medium.e180_ad_r384_in12k
- https://huggingface.co/timm/mobilenetv4_conv_medium.e180_r384_in12k
Add small cs3darknet, quite good for the speed
- https://huggingface.co/timm/cs3darknet_focus_s.ra4_e3600_r256_in1k

Nov 12, 2024

Optimizer factory refactor
- New factory works by registering optimizers using an OptimInfo dataclass w/ some key traits
- Add list_optimizers, get_optimizer_class, get_optimizer_info to reworked create_optimizer_v2 fn to explore optimizers, get info or class
- deprecate optim.optim_factory, move fns to optim/_optim_factory.py and optim/_param_groups.py and encourage import via timm.optim
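
The registration pattern can be sketched with a toy registry. Everything below is hypothetical: the `ToyOptimInfo` fields, the `register` helper, and the dummy optimizer class are invented for illustration and do not reflect the fields of timm's actual OptimInfo or the signatures of its `list_optimizers` / `get_optimizer_class` / `get_optimizer_info` functions.

```python
from dataclasses import dataclass
from typing import Dict, List, Type


class DummySGD:
    """Placeholder optimizer class, used only to populate the toy registry."""

    def __init__(self, params, lr=0.1):
        self.params, self.lr = list(params), lr


@dataclass(frozen=True)
class ToyOptimInfo:
    # Hypothetical traits; a real registry would track whatever it needs.
    name: str
    opt_class: Type
    description: str = ""
    has_momentum: bool = False


_REGISTRY: Dict[str, ToyOptimInfo] = {}


def register(info: ToyOptimInfo) -> None:
    _REGISTRY[info.name.lower()] = info


def list_toy_optimizers() -> List[str]:
    return sorted(_REGISTRY)


def get_toy_optimizer_info(name: str) -> ToyOptimInfo:
    return _REGISTRY[name.lower()]


def get_toy_optimizer_class(name: str) -> Type:
    return get_toy_optimizer_info(name).opt_class


register(ToyOptimInfo("sgd", DummySGD, "plain SGD placeholder", has_momentum=True))
opt = get_toy_optimizer_class("SGD")(params=[1.0, 2.0], lr=0.05)
print(list_toy_optimizers(), opt.lr)
```

Keeping traits in a dataclass lets a factory answer questions such as "does this optimizer take momentum?" without instantiating anything.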
Add Adopt (https://github.com/iShohei220/adopt) optimizer
Add 'Big Vision' variant of Adafactor (https://github.com/google-research/big_vision/blob/main/big_vision/optax.py) optimizer
Fix original Adafactor to pick better factorization dims for convolutions
Tweak LAMB optimizer with some improvements in torch.where functionality since original, refactor clipping a bit
dynamic img size support in vit, deit, eva improved to support resize from non-square patch grids, thanks https://github.com/wojtke

Oct 31, 2024

Add a set of new very well trained ResNet & ResNet-V2 18/34 (basic block) weights. See https://huggingface.co/blog/rwightman/resnet-trick-or-treat

Oct 19, 2024

Cleanup torch amp usage to avoid cuda specific calls, merge support for Ascend (NPU) devices from MengqingCao that should work now in PyTorch 2.5 w/ new device extension autoloading feature. Tested Intel Arc (XPU) in PyTorch 2.5 too and it (mostly) worked.

What's Changed

mambaout.py: fixed bug by @NightMachinery in https://github.com/huggingface/pytorch-image-models/pull/2305
Cleanup some amp related behaviour to better support different (non-cuda) devices by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2308
Add NPU backend support for val and inference by @MengqingCao in https://github.com/huggingface/pytorch-image-models/pull/2109
Update some clip pretrained weights to point to new hub locations by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2311
ResNet vs MNV4 v1/v2 18 & 34 weights by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2316
Replace deprecated positional argument with --data-dir by @JosuaRieder in https://github.com/huggingface/pytorch-image-models/pull/2322
Fix typo in train.py: bathes > batches by @JosuaRieder in https://github.com/huggingface/pytorch-image-models/pull/2321
Fix positional embedding resampling for non-square inputs in ViT by @wojtke in https://github.com/huggingface/pytorch-image-models/pull/2317
Add trust_remote_code argument to ReaderHfds by @grodino in https://github.com/huggingface/pytorch-image-models/pull/2326
Extend train epoch schedule by warmup_epochs if warmup_prefix enabled by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2325
Extend existing unit tests using Cover-Agent by @mrT23 in https://github.com/huggingface/pytorch-image-models/pull/2331
An impl of adafactor as per big vision (scaling vit) changes by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2320
Add py.typed file as recommended by PEP 561 by @antoinebrl in https://github.com/huggingface/pytorch-image-models/pull/2252
Add CODE_OF_CONDUCT.md and CITATION.cff files by @AlinaImtiaz018 in https://github.com/huggingface/pytorch-image-models/pull/2333
Add some 384x384 small model weights by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2334
In dist training, update loss running avg every step, sync on log by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2340
Improve WandB logging by @sinahmr in https://github.com/huggingface/pytorch-image-models/pull/2341
A few weights to merge Friday by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2343
Update timm torchvision resnet weight urls to the updated urls in torchvision by @JohannesTheo in https://github.com/huggingface/pytorch-image-models/pull/2346
More optimizer updates, add MARS, LaProp, add Adopt fix and more by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2347
Cautious optimizer impl plus some typing cleanup. by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2349
Add cautious mars, improve test reliability by skipping grad diff for… by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2351
See if we can avoid some model / layer pickle issues with the aa attr in ConvNormAct by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2353

New Contributors

@MengqingCao made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2109
@JosuaRieder made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2322
@wojtke made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2317
@grodino made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2326
@AlinaImtiaz018 made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2333
@sinahmr made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2341
@JohannesTheo made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2346

Full Changelog: https://github.com/huggingface/pytorch-image-models/compare/v1.0.11...v1.0.12

Published by rwightman over 1 year ago

instances - v1.0.11 Release

Quick turnaround from 1.0.10 to fix an error impacting 3rd party packages that still import through a deprecated path that isn't tested.

Oct 16, 2024

Fix error on importing from deprecated path timm.models.registry, increased priority of existing deprecation warnings to be visible
Port weights of InternViT-300M (https://huggingface.co/OpenGVLab/InternViT-300M-448px) to timm as vit_intern300m_patch14_448

Oct 14, 2024

Pre-activation (ResNetV2) version of 18/18d/34/34d ResNet model defs added by request (weights pending)
Release 1.0.10

Oct 11, 2024

MambaOut (https://github.com/yuweihao/MambaOut) model & weights added. A cheeky take on SSM vision models w/o the SSM (essentially ConvNeXt w/ gating). A mix of original weights + custom variations & weights.

|model |img_size|top1 |top5 |param_count|
|---------------------------------------------------|--------|------|------|-----------|
|mambaout_base_plus_rw.sw_e150_r384_in12k_ft_in1k |384 |87.506|98.428|101.66 |
|mambaout_base_plus_rw.sw_e150_in12k_ft_in1k |288 |86.912|98.236|101.66 |
|mambaout_base_plus_rw.sw_e150_in12k_ft_in1k |224 |86.632|98.156|101.66 |
|mambaout_base_tall_rw.sw_e500_in1k |288 |84.974|97.332|86.48 |
|mambaout_base_wide_rw.sw_e500_in1k |288 |84.962|97.208|94.45 |
|mambaout_base_short_rw.sw_e500_in1k |288 |84.832|97.27 |88.83 |
|mambaout_base.in1k |288 |84.72 |96.93 |84.81 |
|mambaout_small_rw.sw_e450_in1k |288 |84.598|97.098|48.5 |
|mambaout_small.in1k |288 |84.5 |96.974|48.49 |
|mambaout_base_wide_rw.sw_e500_in1k |224 |84.454|96.864|94.45 |
|mambaout_base_tall_rw.sw_e500_in1k |224 |84.434|96.958|86.48 |
|mambaout_base_short_rw.sw_e500_in1k |224 |84.362|96.952|88.83 |
|mambaout_base.in1k |224 |84.168|96.68 |84.81 |
|mambaout_small.in1k |224 |84.086|96.63 |48.49 |
|mambaout_small_rw.sw_e450_in1k |224 |84.024|96.752|48.5 |
|mambaout_tiny.in1k |288 |83.448|96.538|26.55 |
|mambaout_tiny.in1k |224 |82.736|96.1 |26.55 |
|mambaout_kobe.in1k |288 |81.054|95.718|9.14 |
|mambaout_kobe.in1k |224 |79.986|94.986|9.14 |
|mambaout_femto.in1k |288 |79.848|95.14 |7.3 |
|mambaout_femto.in1k |224 |78.87 |94.408|7.3 |

SigLIP SO400M ViT fine-tunes on ImageNet-1k @ 378x378, added 378x378 option for existing SigLIP 384x384 models
- vit_so400m_patch14_siglip_378.webli_ft_in1k - 89.42 top-1
- vit_so400m_patch14_siglip_gap_378.webli_ft_in1k - 89.03
SigLIP SO400M ViT encoder from recent multi-lingual (i18n) variant, patch16 @ 256x256 (https://huggingface.co/timm/ViT-SO400M-16-SigLIP-i18n-256). OpenCLIP update pending.
Add two ConvNeXt 'Zepto' models & weights (one w/ overlapped stem and one w/ patch stem). Uses RMSNorm, smaller than previous 'Atto', 2.2M params.
- convnext_zepto_rms_ols.ra4_e3600_r224_in1k - 73.20 top-1 @ 224
- convnext_zepto_rms.ra4_e3600_r224_in1k - 72.81 @ 224

Sept 2024

Add a suite of tiny test models for improved unit tests and niche low-resource applications (https://huggingface.co/blog/rwightman/timm-tiny-test)
Add MobileNetV4-Conv-Small (0.5x) model (https://huggingface.co/posts/rwightman/793053396198664)
- mobilenetv4_conv_small_050.e3000_r224_in1k - 65.81 top-1 @ 256, 64.76 @ 224
Add MobileNetV3-Large variants trained with MNV4 Small recipe
- mobilenetv3_large_150d.ra4_e3600_r256_in1k - 81.81 @ 320, 80.94 @ 256
- mobilenetv3_large_100.ra4_e3600_r224_in1k - 77.16 @ 256, 76.31 @ 224

- Python
Published by rwightman over 1 year ago

instances - Release v1.0.10

Oct 14, 2024

Pre-activation (ResNetV2) version of 18/18d/34/34d ResNet model defs added by request (weights pending)
Release 1.0.10

Oct 11, 2024

MambaOut (https://github.com/yuweihao/MambaOut) model & weights added. A cheeky take on SSM vision models w/o the SSM (essentially ConvNeXt w/ gating). A mix of original weights + custom variations & weights.

| model | img_size | top1 | top5 | param_count |
|---|---|---|---|---|
| mambaout_base_plus_rw.sw_e150_r384_in12k_ft_in1k | 384 | 87.506 | 98.428 | 101.66 |
| mambaout_base_plus_rw.sw_e150_in12k_ft_in1k | 288 | 86.912 | 98.236 | 101.66 |
| mambaout_base_plus_rw.sw_e150_in12k_ft_in1k | 224 | 86.632 | 98.156 | 101.66 |
| mambaout_base_tall_rw.sw_e500_in1k | 288 | 84.974 | 97.332 | 86.48 |
| mambaout_base_wide_rw.sw_e500_in1k | 288 | 84.962 | 97.208 | 94.45 |
| mambaout_base_short_rw.sw_e500_in1k | 288 | 84.832 | 97.27 | 88.83 |
| mambaout_base.in1k | 288 | 84.72 | 96.93 | 84.81 |
| mambaout_small_rw.sw_e450_in1k | 288 | 84.598 | 97.098 | 48.5 |
| mambaout_small.in1k | 288 | 84.5 | 96.974 | 48.49 |
| mambaout_base_wide_rw.sw_e500_in1k | 224 | 84.454 | 96.864 | 94.45 |
| mambaout_base_tall_rw.sw_e500_in1k | 224 | 84.434 | 96.958 | 86.48 |
| mambaout_base_short_rw.sw_e500_in1k | 224 | 84.362 | 96.952 | 88.83 |
| mambaout_base.in1k | 224 | 84.168 | 96.68 | 84.81 |
| mambaout_small.in1k | 224 | 84.086 | 96.63 | 48.49 |
| mambaout_small_rw.sw_e450_in1k | 224 | 84.024 | 96.752 | 48.5 |
| mambaout_tiny.in1k | 288 | 83.448 | 96.538 | 26.55 |
| mambaout_tiny.in1k | 224 | 82.736 | 96.1 | 26.55 |
| mambaout_kobe.in1k | 288 | 81.054 | 95.718 | 9.14 |
| mambaout_kobe.in1k | 224 | 79.986 | 94.986 | 9.14 |
| mambaout_femto.in1k | 288 | 79.848 | 95.14 | 7.3 |
| mambaout_femto.in1k | 224 | 78.87 | 94.408 | 7.3 |

SigLIP SO400M ViT fine-tunes on ImageNet-1k @ 378x378, added 378x378 option for existing SigLIP 384x384 models
- vit_so400m_patch14_siglip_378.webli_ft_in1k - 89.42 top-1
- vit_so400m_patch14_siglip_gap_378.webli_ft_in1k - 89.03
SigLIP SO400M ViT encoder from recent multi-lingual (i18n) variant, patch16 @ 256x256 (https://huggingface.co/timm/ViT-SO400M-16-SigLIP-i18n-256). OpenCLIP update pending.
Add two ConvNeXt 'Zepto' models & weights (one w/ overlapped stem and one w/ patch stem). Uses RMSNorm, smaller than previous 'Atto', 2.2M params.
- convnext_zepto_rms_ols.ra4_e3600_r224_in1k - 73.20 top-1 @ 224
- convnext_zepto_rms.ra4_e3600_r224_in1k - 72.81 @ 224

Sept 2024

Add a suite of tiny test models for improved unit tests and niche low-resource applications (https://huggingface.co/blog/rwightman/timm-tiny-test)
Add MobileNetV4-Conv-Small (0.5x) model (https://huggingface.co/posts/rwightman/793053396198664)
- mobilenetv4_conv_small_050.e3000_r224_in1k - 65.81 top-1 @ 256, 64.76 @ 224
Add MobileNetV3-Large variants trained with MNV4 Small recipe
- mobilenetv3_large_150d.ra4_e3600_r256_in1k - 81.81 @ 320, 80.94 @ 256
- mobilenetv3_large_100.ra4_e3600_r224_in1k - 77.16 @ 256, 76.31 @ 224

- Python
Published by rwightman over 1 year ago

instances - Release v1.0.9

Aug 21, 2024

Updated SBB ViT models trained on ImageNet-12k and fine-tuned on ImageNet-1k, challenging quite a number of much larger, slower models

| model | top1 | top5 | param_count | img_size |
|---|---|---|---|---|
| vit_mediumd_patch16_reg4_gap_384.sbb2_e200_in12k_ft_in1k | 87.438 | 98.256 | 64.11 | 384 |
| vit_mediumd_patch16_reg4_gap_256.sbb2_e200_in12k_ft_in1k | 86.608 | 97.934 | 64.11 | 256 |
| vit_betwixt_patch16_reg4_gap_384.sbb2_e200_in12k_ft_in1k | 86.594 | 98.02 | 60.4 | 384 |
| vit_betwixt_patch16_reg4_gap_256.sbb2_e200_in12k_ft_in1k | 85.734 | 97.61 | 60.4 | 256 |

MobileNet-V1 1.25, EfficientNet-B1, & ResNet50-D weights w/ MNV4 baseline challenge recipe

| model | top1 | top5 | param_count | img_size |
|---|---|---|---|---|
| resnet50d.ra4_e3600_r224_in1k | 81.838 | 95.922 | 25.58 | 288 |
| efficientnet_b1.ra4_e3600_r240_in1k | 81.440 | 95.700 | 7.79 | 288 |
| resnet50d.ra4_e3600_r224_in1k | 80.952 | 95.384 | 25.58 | 224 |
| efficientnet_b1.ra4_e3600_r240_in1k | 80.406 | 95.152 | 7.79 | 240 |
| mobilenetv1_125.ra4_e3600_r224_in1k | 77.600 | 93.804 | 6.27 | 256 |
| mobilenetv1_125.ra4_e3600_r224_in1k | 76.924 | 93.234 | 6.27 | 224 |

Add SAM2 (HieraDet) backbone arch & weight loading support
Add Hiera Small weights trained w/ abswin pos embed on in12k & fine-tuned on 1k

| model | top1 | top5 | param_count |
|---|---|---|---|
| hiera_small_abswin_256.sbb2_e200_in12k_ft_in1k | 84.912 | 97.260 | 35.01 |
| hiera_small_abswin_256.sbb2_pd_e200_in12k_ft_in1k | 84.560 | 97.106 | 35.01 |

Aug 8, 2024

Add RDNet ('DenseNets Reloaded', https://arxiv.org/abs/2403.19588), thanks Donghyun Kim

- Python
Published by rwightman almost 2 years ago

instances - Release v1.0.8

July 28, 2024

Add mobilenet_edgetpu_v2_m weights w/ ra4 mnv4-small based recipe. 80.1% top-1 @ 224 and 80.7 @ 256.
Release 1.0.8

July 26, 2024

More MobileNet-v4 weights, ImageNet-12k pretrain w/ fine-tunes, and anti-aliased ConvLarge models

| model | top1 | top1_err | top5 | top5_err | param_count | img_size |
|---|---|---|---|---|---|---|
| mobilenetv4_conv_aa_large.e230_r448_in12k_ft_in1k | 84.99 | 15.01 | 97.294 | 2.706 | 32.59 | 544 |
| mobilenetv4_conv_aa_large.e230_r384_in12k_ft_in1k | 84.772 | 15.228 | 97.344 | 2.656 | 32.59 | 480 |
| mobilenetv4_conv_aa_large.e230_r448_in12k_ft_in1k | 84.64 | 15.36 | 97.114 | 2.886 | 32.59 | 448 |
| mobilenetv4_conv_aa_large.e230_r384_in12k_ft_in1k | 84.314 | 15.686 | 97.102 | 2.898 | 32.59 | 384 |
| mobilenetv4_conv_aa_large.e600_r384_in1k | 83.824 | 16.176 | 96.734 | 3.266 | 32.59 | 480 |
| mobilenetv4_conv_aa_large.e600_r384_in1k | 83.244 | 16.756 | 96.392 | 3.608 | 32.59 | 384 |
| mobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k | 82.99 | 17.01 | 96.67 | 3.33 | 11.07 | 320 |
| mobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k | 82.364 | 17.636 | 96.256 | 3.744 | 11.07 | 256 |

Impressive MobileNet-V1 and EfficientNet-B0 baseline challenges (https://huggingface.co/blog/rwightman/mobilenet-baselines)

| model | top1 | top1_err | top5 | top5_err | param_count | img_size |
|---|---|---|---|---|---|---|
| efficientnet_b0.ra4_e3600_r224_in1k | 79.364 | 20.636 | 94.754 | 5.246 | 5.29 | 256 |
| efficientnet_b0.ra4_e3600_r224_in1k | 78.584 | 21.416 | 94.338 | 5.662 | 5.29 | 224 |
| mobilenetv1_100h.ra4_e3600_r224_in1k | 76.596 | 23.404 | 93.272 | 6.728 | 5.28 | 256 |
| mobilenetv1_100.ra4_e3600_r224_in1k | 76.094 | 23.906 | 93.004 | 6.996 | 4.23 | 256 |
| mobilenetv1_100h.ra4_e3600_r224_in1k | 75.662 | 24.338 | 92.504 | 7.496 | 5.28 | 224 |
| mobilenetv1_100.ra4_e3600_r224_in1k | 75.382 | 24.618 | 92.312 | 7.688 | 4.23 | 224 |

Prototype of set_input_size() added to vit and swin v1/v2 models to allow changing image size, patch size, window size after model creation.
Improved support in swin for different size handling, in addition to set_input_size, always_partition and strict_img_size args have been added to __init__ to allow more flexible input size constraints
Fix out of order indices info for intermediate 'Getter' feature wrapper, check out of range indices for same.
Add several tiny < .5M param models for testing that are actually trained on ImageNet-1k

| model | top1 | top1_err | top5 | top5_err | param_count | img_size | crop_pct |
|---|---|---|---|---|---|---|---|
| test_efficientnet.r160_in1k | 47.156 | 52.844 | 71.726 | 28.274 | 0.36 | 192 | 1.0 |
| test_byobnet.r160_in1k | 46.698 | 53.302 | 71.674 | 28.326 | 0.46 | 192 | 1.0 |
| test_efficientnet.r160_in1k | 46.426 | 53.574 | 70.928 | 29.072 | 0.36 | 160 | 0.875 |
| test_byobnet.r160_in1k | 45.378 | 54.622 | 70.572 | 29.428 | 0.46 | 160 | 0.875 |
| test_vit.r160_in1k | 42.0 | 58.0 | 68.664 | 31.336 | 0.37 | 192 | 1.0 |
| test_vit.r160_in1k | 40.822 | 59.178 | 67.212 | 32.788 | 0.37 | 160 | 0.875 |

Fix vit reg token init, thanks Promisery
Other misc fixes

June 24, 2024

3 more MobileNetV4 hybrid weights with different MQA weight init scheme

| model | top1 | top1_err | top5 | top5_err | param_count | img_size |
|---|---|---|---|---|---|---|
| mobilenetv4_hybrid_large.ix_e600_r384_in1k | 84.356 | 15.644 | 96.892 | 3.108 | 37.76 | 448 |
| mobilenetv4_hybrid_large.ix_e600_r384_in1k | 83.990 | 16.010 | 96.702 | 3.298 | 37.76 | 384 |
| mobilenetv4_hybrid_medium.ix_e550_r384_in1k | 83.394 | 16.606 | 96.760 | 3.240 | 11.07 | 448 |
| mobilenetv4_hybrid_medium.ix_e550_r384_in1k | 82.968 | 17.032 | 96.474 | 3.526 | 11.07 | 384 |
| mobilenetv4_hybrid_medium.ix_e550_r256_in1k | 82.492 | 17.508 | 96.278 | 3.722 | 11.07 | 320 |
| mobilenetv4_hybrid_medium.ix_e550_r256_in1k | 81.446 | 18.554 | 95.704 | 4.296 | 11.07 | 256 |

florence2 weight loading in DaViT model

- Python
Published by rwightman almost 2 years ago

instances - Release v1.0.7

June 12, 2024

MobileNetV4 models and initial set of timm trained weights added:

| model | top1 | top1_err | top5 | top5_err | param_count | img_size |
|---|---|---|---|---|---|---|
| mobilenetv4_hybrid_large.e600_r384_in1k | 84.266 | 15.734 | 96.936 | 3.064 | 37.76 | 448 |
| mobilenetv4_hybrid_large.e600_r384_in1k | 83.800 | 16.200 | 96.770 | 3.230 | 37.76 | 384 |
| mobilenetv4_conv_large.e600_r384_in1k | 83.392 | 16.608 | 96.622 | 3.378 | 32.59 | 448 |
| mobilenetv4_conv_large.e600_r384_in1k | 82.952 | 17.048 | 96.266 | 3.734 | 32.59 | 384 |
| mobilenetv4_conv_large.e500_r256_in1k | 82.674 | 17.326 | 96.31 | 3.69 | 32.59 | 320 |
| mobilenetv4_conv_large.e500_r256_in1k | 81.862 | 18.138 | 95.69 | 4.31 | 32.59 | 256 |
| mobilenetv4_hybrid_medium.e500_r224_in1k | 81.276 | 18.724 | 95.742 | 4.258 | 11.07 | 256 |
| mobilenetv4_conv_medium.e500_r256_in1k | 80.858 | 19.142 | 95.768 | 4.232 | 9.72 | 320 |
| mobilenetv4_hybrid_medium.e500_r224_in1k | 80.442 | 19.558 | 95.38 | 4.62 | 11.07 | 224 |
| mobilenetv4_conv_blur_medium.e500_r224_in1k | 80.142 | 19.858 | 95.298 | 4.702 | 9.72 | 256 |
| mobilenetv4_conv_medium.e500_r256_in1k | 79.928 | 20.072 | 95.184 | 4.816 | 9.72 | 256 |
| mobilenetv4_conv_medium.e500_r224_in1k | 79.808 | 20.192 | 95.186 | 4.814 | 9.72 | 256 |
| mobilenetv4_conv_blur_medium.e500_r224_in1k | 79.438 | 20.562 | 94.932 | 5.068 | 9.72 | 224 |
| mobilenetv4_conv_medium.e500_r224_in1k | 79.094 | 20.906 | 94.77 | 5.23 | 9.72 | 224 |
| mobilenetv4_conv_small.e2400_r224_in1k | 74.616 | 25.384 | 92.072 | 7.928 | 3.77 | 256 |
| mobilenetv4_conv_small.e1200_r224_in1k | 74.292 | 25.708 | 92.116 | 7.884 | 3.77 | 256 |
| mobilenetv4_conv_small.e2400_r224_in1k | 73.756 | 26.244 | 91.422 | 8.578 | 3.77 | 224 |
| mobilenetv4_conv_small.e1200_r224_in1k | 73.454 | 26.546 | 91.34 | 8.66 | 3.77 | 224 |

Apple MobileCLIP (https://arxiv.org/pdf/2311.17049, FastViT and ViT-B) image tower model support & weights added (part of OpenCLIP support).
ViTamin (https://arxiv.org/abs/2404.02132) CLIP image tower model & weights added (part of OpenCLIP support).
OpenAI CLIP Modified ResNet image tower modelling & weight support (via ByobNet). Refactor AttentionPool2d.
Refactoring & improvements, especially related to classifier_reset and num_features vs head_hidden_size for forward_features() vs pre_logits

- Python
Published by rwightman almost 2 years ago

instances - Release v1.0.3

May 14, 2024

Support loading PaliGemma jax weights into SigLIP ViT models with average pooling.
Add Hiera models from Meta (https://github.com/facebookresearch/hiera).
Add normalize= flag for transforms, return non-normalized torch.Tensor with original dtype (for chug)
Version 1.0.3 release

May 11, 2024

Searching for Better ViT Baselines (For the GPU Poor) weights and vit variants released. Exploring model shapes between Tiny and Base.

| model | top1 | top5 | param_count | img_size |
|---|---|---|---|---|
| vit_mediumd_patch16_reg4_gap_256.sbb_in12k_ft_in1k | 86.202 | 97.874 | 64.11 | 256 |
| vit_betwixt_patch16_reg4_gap_256.sbb_in12k_ft_in1k | 85.418 | 97.48 | 60.4 | 256 |
| vit_mediumd_patch16_rope_reg1_gap_256.sbb_in1k | 84.322 | 96.812 | 63.95 | 256 |
| vit_betwixt_patch16_rope_reg4_gap_256.sbb_in1k | 83.906 | 96.684 | 60.23 | 256 |
| vit_base_patch16_rope_reg1_gap_256.sbb_in1k | 83.866 | 96.67 | 86.43 | 256 |
| vit_medium_patch16_rope_reg1_gap_256.sbb_in1k | 83.81 | 96.824 | 38.74 | 256 |
| vit_betwixt_patch16_reg4_gap_256.sbb_in1k | 83.706 | 96.616 | 60.4 | 256 |
| vit_betwixt_patch16_reg1_gap_256.sbb_in1k | 83.628 | 96.544 | 60.4 | 256 |
| vit_medium_patch16_reg4_gap_256.sbb_in1k | 83.47 | 96.622 | 38.88 | 256 |
| vit_medium_patch16_reg1_gap_256.sbb_in1k | 83.462 | 96.548 | 38.88 | 256 |
| vit_little_patch16_reg4_gap_256.sbb_in1k | 82.514 | 96.262 | 22.52 | 256 |
| vit_wee_patch16_reg1_gap_256.sbb_in1k | 80.256 | 95.360 | 13.42 | 256 |
| vit_pwee_patch16_reg1_gap_256.sbb_in1k | 80.072 | 95.136 | 15.25 | 256 |
| vit_mediumd_patch16_reg4_gap_256.sbb_in12k | N/A | N/A | 64.11 | 256 |
| vit_betwixt_patch16_reg4_gap_256.sbb_in12k | N/A | N/A | 60.4 | 256 |

AttentionExtract helper added to extract attention maps from timm models. See example in https://github.com/huggingface/pytorch-image-models/discussions/1232#discussioncomment-9320949
forward_intermediates() API refined and added to more models including some ConvNets that have other extraction methods.
1017 of 1047 model architectures support features_only=True feature extraction. The remaining 34 architectures can be supported, but will be added based on priority requests.
Remove torch.jit.script annotated functions including old JIT activations. Conflict with dynamo and dynamo does a much better job when used.

April 11, 2024

Prepping for a long overdue 1.0 release, things have been stable for a while now.
Significant feature that's been missing for a while, features_only=True support for ViT models with flat hidden states or non-std module layouts (so far covering 'vit_*', 'twins_*', 'deit*', 'beit*', 'mvitv2*', 'eva*', 'samvit_*', 'flexivit*')
Above feature support achieved through a new forward_intermediates() API that can be used with a feature wrapping module or directly.

```python
import timm
import torch

# Dummy batch of two 224x224 RGB images
input = torch.randn(2, 3, 224, 224)

model = timm.create_model('vit_base_patch16_224')
final_feat, intermediates = model.forward_intermediates(input)
output = model.forward_head(final_feat)  # pooling + classifier head

print(final_feat.shape)
# torch.Size([2, 197, 768])

for f in intermediates:
    print(f.shape)
# torch.Size([2, 768, 14, 14])  (printed 12 times, once per block)

print(output.shape)
# torch.Size([2, 1000])
```

```python
import timm
import torch

# Feature-extraction wrapper returning the feature maps selected by out_indices
model = timm.create_model(
    'eva02_base_patch16_clip_224',
    pretrained=True,
    img_size=512,
    features_only=True,
    out_indices=(-3, -2,),
)
output = model(torch.randn(2, 3, 512, 512))

for o in output:
    print(o.shape)
# torch.Size([2, 768, 32, 32])
# torch.Size([2, 768, 32, 32])
```

TinyCLIP vision tower weights added, thx Thien Tran

- Python
Published by rwightman about 2 years ago

instances - Release v0.9.16

Feb 19, 2024

Next-ViT models added. Adapted from https://github.com/bytedance/Next-ViT
HGNet and PP-HGNetV2 models added. Adapted from https://github.com/PaddlePaddle/PaddleClas by SeeFun
Removed setup.py, moved to pyproject.toml based build supported by PDM
Add updated model EMA impl using foreach for less overhead
Support device args in train script for non GPU devices
Other misc fixes and small additions
Min supported Python version increased to 3.8
Release 0.9.16
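
To illustrate why a foreach-based EMA update has less overhead, here is a generic sketch (not timm's actual EMA code) that applies the standard update `ema = decay * ema + (1 - decay) * param` to a whole list of tensors with two batched calls instead of a Python loop. The helper name `ema_update_foreach` and the toy `Linear` model are assumptions for the example.

```python
import torch


def ema_update_foreach(ema_params, model_params, decay=0.999):
    """In-place EMA update over lists of tensors using batched foreach ops."""
    with torch.no_grad():
        # ema <- decay * ema, for every tensor in the list at once
        torch._foreach_mul_(ema_params, decay)
        # ema <- ema + (1 - decay) * param, again as one batched call
        torch._foreach_add_(ema_params, model_params, alpha=1.0 - decay)


model = torch.nn.Linear(4, 2)
# EMA copy starts out equal to the current weights.
ema = [p.detach().clone() for p in model.parameters()]

# Simulate a training step that shifts every parameter by +1.
with torch.no_grad():
    for p in model.parameters():
        p.add_(1.0)

current = [p.detach() for p in model.parameters()]
ema_update_foreach(ema, current, decay=0.9)

# With decay=0.9 the EMA moves 10% of the way towards the new weights.
print([float((p - e).mean()) for e, p in zip(ema, current)])
```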

Jan 8, 2024

Datasets & transform refactoring
- HuggingFace streaming (iterable) dataset support (--dataset hfids:org/dataset)
- Webdataset wrapper tweaks for improved split info fetching, can auto fetch splits from supported HF hub webdataset
- Tested HF datasets and webdataset wrapper streaming from HF hub with recent timm ImageNet uploads to https://huggingface.co/timm
- Make input & target column/field keys consistent across datasets and pass via args
- Full monochrome support when using e.g. --input-size 1 224 224 or --in-chans 1, sets PIL image conversion appropriately in dataset
- Improved several alternate crop & resize transforms (ResizeKeepRatio, RandomCropOrPad, etc) for use in PixParse document AI project
- Add SimCLR style color jitter prob along with grayscale and gaussian blur options to augmentations and args
- Allow train without validation set (--val-split '') in train script
- Add --bce-sum (sum over class dim) and --bce-pos-weight (positive weighting) args for training as they're common BCE loss tweaks I was often hard coding