Recent Releases of instances
instances - Release v1.0.19
Patch release for Python 3.9 compat break in 1.0.18
July 23, 2025
- Add `set_input_size()` method to EVA models, used by OpenCLIP 3.0.0 to allow resizing for timm based encoder models.
- Release 1.0.18, needed for PE-Core S & T models in OpenCLIP 3.0.0
- Fix small typing issue that broke Python 3.9 compat. 1.0.19 patch release.
July 21, 2025
- ROPE support added to NaFlexViT. All models covered by the EVA base (`eva.py`) including EVA, EVA02, Meta PE ViT, `timm` SBB ViT w/ ROPE, and Naver ROPE-ViT can now be loaded in NaFlexViT when `use_naflex=True` is passed at model creation time
- More Meta PE ViT encoders added, including small/tiny variants, lang variants w/ tiling, and more spatial variants.
- PatchDropout fixed with NaFlexViT and also w/ EVA models (regression after adding Naver ROPE-ViT)
- Fix XY order with grid_indexing='xy', impacted non-square image use in 'xy' mode (only ROPE-ViT and PE impacted).
What's Changed
- Add ROPE support to NaFlexVit (axial and mixed), and support most (all?) EVA based vit models & weights by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2552
- Support set_input_size() in EVA models by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2554
Full Changelog: https://github.com/huggingface/pytorch-image-models/compare/v1.0.17...v1.0.18
- Python
Published by rwightman 10 months ago
instances - Release v1.0.18
July 23, 2025
- Add `set_input_size()` method to EVA models, used by OpenCLIP 3.0.0 to allow resizing for timm based encoder models.
- Release 1.0.18, needed for PE-Core S & T models in OpenCLIP 3.0.0
July 21, 2025
- ROPE support added to NaFlexViT. All models covered by the EVA base (`eva.py`) including EVA, EVA02, Meta PE ViT, `timm` SBB ViT w/ ROPE, and Naver ROPE-ViT can now be loaded in NaFlexViT when `use_naflex=True` is passed at model creation time
- More Meta PE ViT encoders added, including small/tiny variants, lang variants w/ tiling, and more spatial variants.
- PatchDropout fixed with NaFlexViT and also w/ EVA models (regression after adding Naver ROPE-ViT)
- Fix XY order with grid_indexing='xy', impacted non-square image use in 'xy' mode (only ROPE-ViT and PE impacted).
What's Changed
- Add ROPE support to NaFlexVit (axial and mixed), and support most (all?) EVA based vit models & weights by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2552
- Support set_input_size() in EVA models by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2554
Full Changelog: https://github.com/huggingface/pytorch-image-models/compare/v1.0.17...v1.0.18
Published by rwightman 10 months ago
instances - Release v1.0.17
July 7, 2025
- MobileNet-v5 backbone tweaks for improved Google Gemma 3n behaviour (to pair with updated official weights)
- Add stem bias (zero'd in updated weights, compat break with old weights)
- GELU -> GELU (tanh approx). A minor change to be closer to JAX
- Add two arguments to layer-decay support, a min scale clamp and 'no optimization' scale threshold
- Add 'Fp32' LayerNorm, RMSNorm, SimpleNorm variants that can be enabled to force computation of norm in float32
- Some typing, argument cleanup for norm, norm+act layers done with above
- Support Naver ROPE-ViT (https://github.com/naver-ai/rope-vit) in `eva.py`, add RotaryEmbeddingMixed module for mixed mode, weights on HuggingFace Hub

| model | img_size | top1 | top5 | param_count |
|---|---|---|---|---|
| vit_large_patch16_rope_mixed_ape_224.naver_in1k | 224 | 84.84 | 97.122 | 304.4 |
| vit_large_patch16_rope_mixed_224.naver_in1k | 224 | 84.828 | 97.116 | 304.2 |
| vit_large_patch16_rope_ape_224.naver_in1k | 224 | 84.65 | 97.154 | 304.37 |
| vit_large_patch16_rope_224.naver_in1k | 224 | 84.648 | 97.122 | 304.17 |
| vit_base_patch16_rope_mixed_ape_224.naver_in1k | 224 | 83.894 | 96.754 | 86.59 |
| vit_base_patch16_rope_mixed_224.naver_in1k | 224 | 83.804 | 96.712 | 86.44 |
| vit_base_patch16_rope_ape_224.naver_in1k | 224 | 83.782 | 96.61 | 86.59 |
| vit_base_patch16_rope_224.naver_in1k | 224 | 83.718 | 96.672 | 86.43 |
| vit_small_patch16_rope_224.naver_in1k | 224 | 81.23 | 95.022 | 21.98 |
| vit_small_patch16_rope_mixed_224.naver_in1k | 224 | 81.216 | 95.022 | 21.99 |
| vit_small_patch16_rope_ape_224.naver_in1k | 224 | 81.004 | 95.016 | 22.06 |
| vit_small_patch16_rope_mixed_ape_224.naver_in1k | 224 | 80.986 | 94.976 | 22.06 |

- Some cleanup of ROPE modules, helpers, and FX tracing leaf registration
- Preparing version 1.0.17 release
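The two layer-decay arguments mentioned in this release (a min scale clamp and a 'no optimization' scale threshold) can be pictured with a small sketch. This is an illustration only, not the library's code: the function name `layer_decay_scales` and the parameter names `min_scale` and `no_opt_scale` are assumptions chosen for the example, as is the order in which the clamp and the threshold are applied.

```python
def layer_decay_scales(num_layers, layer_decay, min_scale=0.0, no_opt_scale=None):
    """Return a (lr_scale, trainable) pair per layer, from first layer to last.

    Hypothetical helper: the last layer gets scale 1.0 and earlier layers are
    scaled down geometrically by `layer_decay`.
    """
    last = num_layers - 1
    result = []
    for i in range(num_layers):
        # Clamp so that very deep stacks do not push early layers toward zero LR.
        scale = max(min_scale, layer_decay ** (last - i))
        # Layers whose scale falls below the threshold are excluded from optimization.
        trainable = no_opt_scale is None or scale >= no_opt_scale
        result.append((scale, trainable))
    return result


scales = layer_decay_scales(num_layers=6, layer_decay=0.5, min_scale=0.05, no_opt_scale=0.1)
for idx, (scale, trainable) in enumerate(scales):
    print(idx, scale, "train" if trainable else "frozen")
```

With these toy settings the first two layers are clamped or scaled below the threshold and would be left out of optimization, while the remaining layers train with progressively larger learning-rate scales.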
What's Changed
- Adding Naver rope-vit compatibility to EVA ViT by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2529
- Update no_grad usage to inference_mode if possible by @GuillaumeErhard in https://github.com/huggingface/pytorch-image-models/pull/2534
- Add a min layer-decay scale clamp, and no optimization threshold to exclude groups from optimization by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2537
- Add stem_bias option to MNV5. Resolve the norm layer so can pass string. by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2538
- Add flag to enable float32 computation for normalization (norm + affine) by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2536
- fix: mnv5 conv_stem bias and GELU with approximate=tanh by @RyanMullins in https://github.com/huggingface/pytorch-image-models/pull/2533
- Fixup casting issues for weights/bias in fp32 norm layers by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2539
- Fix H, W ordering for xy indexing in ROPE by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2541
- Fix 3 typos in README.md by @robin-ede in https://github.com/huggingface/pytorch-image-models/pull/2544
New Contributors
- @GuillaumeErhard made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2534
- @RyanMullins made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2533
- @robin-ede made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2544
Full Changelog: https://github.com/huggingface/pytorch-image-models/compare/v1.0.16...v1.0.17
Published by rwightman 11 months ago
instances - Release v1.0.16
June 26, 2025
- MobileNetV5 backbone (w/ encoder only variant) for Gemma 3n image encoder
- Version 1.0.16 released
June 23, 2025
- Add F.grid_sample based 2D and factorized pos embed resize to NaFlexViT. Faster when lots of different sizes (based on example by https://github.com/stas-sl).
- Further speed up patch embed resample by replacing vmap with matmul (based on snippet by https://github.com/stas-sl).
- Add 3 initial native aspect NaFlexViT checkpoints created while testing, ImageNet-1k and 3 different pos embed configs w/ same hparams.
| Model | Top-1 Acc | Top-5 Acc | Params (M) | Eval Seq Len |
|:---|:---:|:---:|:---:|:---:|
| naflexvit_base_patch16_par_gap.e300_s576_in1k | 83.67 | 96.45 | 86.63 | 576 |
| naflexvit_base_patch16_parfac_gap.e300_s576_in1k | 83.63 | 96.41 | 86.46 | 576 |
| naflexvit_base_patch16_gap.e300_s576_in1k | 83.50 | 96.46 | 86.63 | 576 |

- Support gradient checkpointing for `forward_intermediates` and fix some checkpointing bugs. Thanks https://github.com/brianhou0208
- Add 'corrected weight decay' (https://arxiv.org/abs/2506.02285) as option to AdamW (legacy), Adopt, Kron, Adafactor (BV), Lamb, LaProp, Lion, NadamW, RmsPropTF, SGDW optimizers
- Switch PE (perception encoder) ViT models to use native timm weights instead of remapping on the fly
- Fix cuda stream bug in prefetch loader
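As a rough illustration of the 'corrected weight decay' option, the sketch below compares the multiplicative shrink applied to a weight under plain decoupled decay and under a corrected variant that rescales the decay by the ratio of the current learning rate to the peak learning rate. The `lr ** 2 / max_lr` form is my reading of the referenced paper and is an assumption here; `decay_factor` and `max_lr` are names invented for the example.

```python
def decay_factor(lr, weight_decay, max_lr=None):
    """Multiplicative shrink applied to a weight in one decoupled-decay step.

    Plain decoupled decay uses lr * weight_decay. The corrected variant
    (assumed form) uses (lr ** 2 / max_lr) * weight_decay, so decay weakens
    as the schedule lowers lr below its peak.
    """
    wd_scale = lr if max_lr is None else lr ** 2 / max_lr
    return 1.0 - wd_scale * weight_decay


peak_lr = 1e-3
for lr in (1e-3, 5e-4, 1e-4):
    plain = decay_factor(lr, weight_decay=0.05)
    corrected = decay_factor(lr, weight_decay=0.05, max_lr=peak_lr)
    print(f"lr={lr:g} plain={plain:.8f} corrected={corrected:.8f}")
```

At the peak learning rate the two factors coincide; they diverge only as the learning rate decays.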
June 5, 2025
- Initial NaFlexVit model code. NaFlexVit is a Vision Transformer with:
  - Encapsulated embedding and position encoding in a single module
  - Support for nn.Linear patch embedding on pre-patchified (dictionary) inputs
  - Support for NaFlex variable aspect, variable resolution (SigLip-2: https://arxiv.org/abs/2502.14786)
  - Support for FlexiViT variable patch size (https://arxiv.org/abs/2212.08013)
  - Support for NaViT fractional/factorized position embedding (https://arxiv.org/abs/2307.06304)
- Existing vit models in `vision_transformer.py` can be loaded into the NaFlexVit model by adding the `use_naflex=True` flag to `create_model`
  - Some native weights coming soon
- A full NaFlex data pipeline is available that allows training / fine-tuning / evaluating with variable aspect / size images
  - To enable in `train.py` and `validate.py` add the `--naflex-loader` arg, must be used with a NaFlexVit
- To evaluate an existing (classic) ViT loaded in NaFlexVit model w/ NaFlex data pipe:
  - `python validate.py /imagenet --amp -j 8 --model vit_base_patch16_224 --model-kwargs use_naflex=True --naflex-loader --naflex-max-seq-len 256`
- The training has some extra args features worth noting
  - The `--naflex-train-seq-lens` argument specifies which sequence lengths to randomly pick from per batch during training
  - The `--naflex-max-seq-len` argument sets the target sequence length for validation
  - Adding `--model-kwargs enable_patch_interpolator=True --naflex-patch-sizes 12 16 24` will enable random patch size selection per-batch w/ interpolation
  - The `--naflex-loss-scale` arg changes loss scaling mode per batch relative to the batch size, `timm` NaFlex loading changes the batch size for each seq len
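The interaction between per-batch sequence length, batch size, and loss scaling described above can be sketched as follows. This is a toy model under stated assumptions, not the loader's implementation: it assumes a fixed token budget per batch, and the helper names (`batch_size_for`, `loss_scale`) and the `"linear"` / `"sqrt"` mode names are hypothetical.

```python
import math
import random


def batch_size_for(seq_len, max_tokens_per_batch):
    """Pick a batch size so that batch_size * seq_len stays within a token budget."""
    return max(1, max_tokens_per_batch // seq_len)


def loss_scale(batch_size, ref_batch_size, mode="linear"):
    """Scale the loss relative to a reference batch size (hypothetical modes)."""
    ratio = batch_size / ref_batch_size
    if mode == "linear":
        return ratio
    if mode == "sqrt":
        return math.sqrt(ratio)
    return 1.0  # no scaling


train_seq_lens = [128, 256, 576]   # candidate lengths, one is drawn per batch
token_budget = 256 * 64            # assumed budget: 64 samples at seq len 256
rng = random.Random(0)

for step in range(3):
    seq_len = rng.choice(train_seq_lens)
    bs = batch_size_for(seq_len, token_budget)
    print(step, seq_len, bs, loss_scale(bs, ref_batch_size=64))
```

Shorter sequences yield larger batches under the same budget, which is why some form of per-batch loss scaling becomes relevant.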
May 28, 2025
- Add a number of small/fast models thanks to https://github.com/brianhou0208
  - SwiftFormer - (ICCV2023) SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications
  - FasterNet - (CVPR2023) Run, Don’t Walk: Chasing Higher FLOPS for Faster Neural Networks
  - SHViT - (CVPR2024) SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design
  - StarNet - (CVPR2024) Rewrite the Stars
  - GhostNet-V3 - GhostNetV3: Exploring the Training Strategies for Compact Models
- Update EVA ViT (closest match) to support Perception Encoder models (https://arxiv.org/abs/2504.13181) from Meta, loading Hub weights but I still need to push dedicated `timm` weights
  - Add some flexibility to ROPE impl
- Big increase in number of models supporting `forward_intermediates()` and some additional fixes thanks to https://github.com/brianhou0208
  - DaViT, EdgeNeXt, EfficientFormerV2, EfficientViT(MIT), EfficientViT(MSRA), FocalNet, GCViT, HGNet /V2, InceptionNeXt, Inception-V4, MambaOut, MetaFormer, NesT, Next-ViT, PiT, PVT V2, RepGhostNet, RepViT, ResNetV2, ReXNet, TinyViT, TResNet, VoV
- TNT model updated w/ new weights `forward_intermediates()` thanks to https://github.com/brianhou0208
- Add `local-dir:` pretrained schema, can use `local-dir:/path/to/model/folder` for model name to source model / pretrained cfg & weights Hugging Face Hub models (config.json + weights file) from a local folder.
- Fixes, improvements for onnx export
What's Changed
- Fix arg merging of sknet, old seresnet. Fix #2470 by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2471
- Fix onnx export by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2475
- Add local-dir: schema support for model loading (config + weights) from folder by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2476
- Fix: Allow img_size to be int or tuple in PatchEmbed by @sddongxh in https://github.com/huggingface/pytorch-image-models/pull/2477
- Add LightlyTrain Integration for Pretraining Support by @yutong-xiang-97 in https://github.com/huggingface/pytorch-image-models/pull/2474
- Check forward_intermediates features against forward_features output by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2483
- More models support forward_intermediates by @brianhou0208 in https://github.com/huggingface/pytorch-image-models/pull/2482
- Update README.md by @atharva-pathak in https://github.com/huggingface/pytorch-image-models/pull/2484
- remove `download` argument from torch_kwargs for torchvision `ImageNet` class by @ryan-caesar-ramos in https://github.com/huggingface/pytorch-image-models/pull/2486
- Update TNT-(S/B) model weights and add feature extraction support by @brianhou0208 in https://github.com/huggingface/pytorch-image-models/pull/2480
- Add EVA ViT based PE (Perceptual Encoder) impl by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2487
- Add SwiftFormer, SHViT, StarNet, FasterNet and GhostNetV3 by @brianhou0208 in https://github.com/huggingface/pytorch-image-models/pull/2499
- A cleaned up beit3 remap onto vision_transformer.py vit by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2503
- Initial NaFlex ViT model and training support by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2466
- Forgot to compact attention pool branches after verifying by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2507
- Throw exception on non-directory path for pretrained weights by @emmanuel-ferdman in https://github.com/huggingface/pytorch-image-models/pull/2510
- Add corrected_weight decay to several optimizers by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2511
- Doing some Claude enabled docstring, type annotation and other cleanup by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2504
- Fix #2513, be explicit about stream devices by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2515
- Update legacy AdamW impl so it has a multi-tensor impl like NAdamW (n… by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2517
- Fix `head_dim` reference in `AttentionRope` class of `attention.py` by @amorehead in https://github.com/huggingface/pytorch-image-models/pull/2519
- Refactor patch and pos embed resampling based on feedback from https://github.com/stas-sl by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2518
- Add initial weights for my first 3 naflexvit_base models by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2523
- Support gradient checkpointing in `forward_intermediates()` by @brianhou0208 in https://github.com/huggingface/pytorch-image-models/pull/2501
- Update README: add references for additional supported models by @brianhou0208 in https://github.com/huggingface/pytorch-image-models/pull/2526
- MobileNetV5 by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2527
New Contributors
- @sddongxh made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2477
- @yutong-xiang-97 made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2474
- @atharva-pathak made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2484
- @ryan-caesar-ramos made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2486
- @emmanuel-ferdman made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2510
- @amorehead made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2519
Full Changelog: https://github.com/huggingface/pytorch-image-models/compare/v1.0.15...v1.0.16
Published by rwightman 11 months ago
instances - Release v1.0.15
Feb 21, 2025
- SigLIP 2 ViT image encoders added (https://huggingface.co/collections/timm/siglip-2-67b8e72ba08b09dd97aecaf9)
- Variable resolution / aspect NaFlex versions are a WIP
- Add 'SO150M2' ViT weights trained with SBB recipes, great results, better for ImageNet than previous attempt w/ less training.
  - `vit_so150m2_patch16_reg1_gap_448.sbb_e200_in12k_ft_in1k` - 88.1% top-1
  - `vit_so150m2_patch16_reg1_gap_384.sbb_e200_in12k_ft_in1k` - 87.9% top-1
  - `vit_so150m2_patch16_reg1_gap_256.sbb_e200_in12k_ft_in1k` - 87.3% top-1
  - `vit_so150m2_patch16_reg4_gap_256.sbb_e200_in12k`
- Updated InternViT-300M '2.5' weights
- Release 1.0.15
Feb 1, 2025
- FYI PyTorch 2.6 & Python 3.13 are tested and working w/ current main and released version of `timm`
Jan 27, 2025
- Add Kron Optimizer (PSGD w/ Kronecker-factored preconditioner)
- Code from https://github.com/evanatyourservice/kron_torch
- See also https://sites.google.com/site/lixilinx/home/psgd
What's Changed
- Fix metavar for `--input-size` by @JosuaRieder in https://github.com/huggingface/pytorch-image-models/pull/2417
- Add arguments to the respective argument groups by @JosuaRieder in https://github.com/huggingface/pytorch-image-models/pull/2416
- Add missing training flag to convert_sync_batchnorm by @collinmccarthy in https://github.com/huggingface/pytorch-image-models/pull/2423
- Fix num_classes update in reset_classifier and RDNet forward head call by @brianhou0208 in https://github.com/huggingface/pytorch-image-models/pull/2421
- timm: add `__all__` to `__init__` by @adamjstewart in https://github.com/huggingface/pytorch-image-models/pull/2399
- Fiddling with Kron (PSGD) optimizer by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2427
- Try to force numpy<2.0 for torch 1.13 tests, update newest tested torch to 2.5.1 by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2429
- Kron flatten improvements + stochastic weight decay by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2431
- PSGD: unify RNG by @ClashLuke in https://github.com/huggingface/pytorch-image-models/pull/2433
- Add vit so150m2 weights by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2439
- adapt_input_conv: add type hints by @adamjstewart in https://github.com/huggingface/pytorch-image-models/pull/2441
- SigLIP 2 by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2440
- timm.models: explicitly export attributes by @adamjstewart in https://github.com/huggingface/pytorch-image-models/pull/2442
New Contributors
- @collinmccarthy made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2423
- @ClashLuke made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2433
Full Changelog: https://github.com/huggingface/pytorch-image-models/compare/v1.0.14...v1.0.15
Published by rwightman over 1 year ago
instances - Release v1.0.14
Jan 19, 2025
- Fix loading of LeViT safetensor weights, remove conversion code which should have been deactivated
- Add 'SO150M' ViT weights trained with SBB recipes, decent results, but not optimal shape for ImageNet-12k/1k pretrain/ft
  - `vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k_ft_in1k` - 86.7% top-1
  - `vit_so150m_patch16_reg4_gap_384.sbb_e250_in12k_ft_in1k` - 87.4% top-1
  - `vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k`
- Misc typing, typo, etc. cleanup
- 1.0.14 release to get above LeViT fix out
What's Changed
- Fix nn.Module type hints by @adamjstewart in https://github.com/huggingface/pytorch-image-models/pull/2400
- Add missing paper title by @JosuaRieder in https://github.com/huggingface/pytorch-image-models/pull/2405
- fix 'timm recipe scripts' link by @JosuaRieder in https://github.com/huggingface/pytorch-image-models/pull/2404
- fix typo in EfficientNet docs by @JosuaRieder in https://github.com/huggingface/pytorch-image-models/pull/2403
- disable abbreviating csv inference output with ellipses by @JosuaRieder in https://github.com/huggingface/pytorch-image-models/pull/2402
- fix incorrect LaTeX formulas by @JosuaRieder in https://github.com/huggingface/pytorch-image-models/pull/2406
- VGG ConvMlp: fix layer defaults/types by @adamjstewart in https://github.com/huggingface/pytorch-image-models/pull/2409
- Implement --no-console-results in inference.py by @JosuaRieder in https://github.com/huggingface/pytorch-image-models/pull/2408
- LeViT safetensors load is broken by conversion code that wasn't deactivated by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2412
- A few more weights by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2413
- Fix typos by @JosuaRieder in https://github.com/huggingface/pytorch-image-models/pull/2415
New Contributors
- @adamjstewart made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2400
Full Changelog: https://github.com/huggingface/pytorch-image-models/compare/v1.0.13...v1.0.14
Published by rwightman over 1 year ago
instances - Release v1.0.13
Jan 9, 2025
- Add support to train and validate in pure `bfloat16` or `float16`
- `wandb` project name arg added by https://github.com/caojiaolong, use arg.experiment for name
- Fix old issue w/ checkpoint saving not working on filesystem w/o hard-link support (e.g. FUSE fs mounts)
- 1.0.13 release
Jan 6, 2025
- Add `torch.utils.checkpoint.checkpoint()` wrapper in `timm.models` that defaults `use_reentrant=False`, unless `TIMM_REENTRANT_CKPT=1` is set in env.
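The env-driven default described in this entry can be sketched as a generic wrapper pattern. To keep the example dependency-free, `fake_checkpoint` stands in for `torch.utils.checkpoint.checkpoint`, and `make_checkpoint` is a hypothetical helper; reading the environment variable at call time rather than at import time is also an assumption of this sketch.

```python
import os


def make_checkpoint(checkpoint_fn):
    """Wrap a checkpoint function so `use_reentrant` defaults to False
    unless TIMM_REENTRANT_CKPT=1 is set in the environment."""
    def wrapped(function, *args, use_reentrant=None, **kwargs):
        if use_reentrant is None:
            use_reentrant = os.environ.get("TIMM_REENTRANT_CKPT", "0") == "1"
        return checkpoint_fn(function, *args, use_reentrant=use_reentrant, **kwargs)
    return wrapped


def fake_checkpoint(function, *args, use_reentrant, **kwargs):
    # Stand-in for the real checkpoint call: runs the function and reports the flag.
    return function(*args, **kwargs), use_reentrant


checkpoint = make_checkpoint(fake_checkpoint)
print(checkpoint(lambda x: x + 1, 1))
```

An explicit `use_reentrant=...` argument still takes precedence over the environment default in this sketch.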
Dec 31, 2024
- `convnext_nano` 384x384 ImageNet-12k pretrain & fine-tune. https://huggingface.co/models?search=convnext_nano%20r384
- Add AIM-v2 encoders from https://github.com/apple/ml-aim, see on Hub: https://huggingface.co/models?search=timm%20aimv2
- Add PaliGemma2 encoders from https://github.com/google-research/big_vision to existing PaliGemma, see on Hub: https://huggingface.co/models?search=timm%20pali2
- Add missing L/14 DFN2B 39B CLIP ViT, `vit_large_patch14_clip_224.dfn2b_s39b`
- Fix existing `RmsNorm` layer & fn to match standard formulation, use PT 2.5 impl when possible. Move old impl to `SimpleNorm` layer, it's LN w/o centering or bias. There were only two `timm` models using it, and they have been updated.
- Allow override of `cache_dir` arg for model creation
- Pass through `trust_remote_code` for HF datasets wrapper
- `inception_next_atto` model added by creator
- Adan optimizer caution, and Lamb decoupled weight decay options
- Some feature_info metadata fixed by https://github.com/brianhou0208
- All OpenCLIP and JAX (CLIP, SigLIP, Pali, etc) model weights that used load time remapping were given their own HF Hub instances so that they work with `hf-hub:` based loading, and thus will work with new Transformers `TimmWrapperModel`
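The difference between the standard RMSNorm formulation and the older behaviour now called `SimpleNorm` ("LN w/o centering or bias") can be shown on a plain list of numbers. This is a sketch without learnable weights; the function names are illustrative, and using the population variance for the SimpleNorm case is an assumption.

```python
import math


def rms_norm(x, eps=1e-6):
    """Standard RMSNorm: divide by the root of the mean of squares."""
    mean_sq = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(mean_sq + eps) for v in x]


def simple_norm(x, eps=1e-6):
    """LayerNorm without centering or bias: divide by the standard deviation,
    but do not subtract the mean from the input."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [v / math.sqrt(var + eps) for v in x]


x = [1.0, 2.0, 3.0, 4.0]
print(rms_norm(x))
print(simple_norm(x))
```

The two only agree when the input already has zero mean, since the variance then equals the mean of squares.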
What's Changed
- Punch cache_dir through model factory / builder / pretrain helpers by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2356
- Yuweihao inception next atto merge by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2360
- Dataset trust remote tweaks by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2361
- Add --dataset-trust-remote-code to the train.py and validate.py scripts by @grodino in https://github.com/huggingface/pytorch-image-models/pull/2328
- Fix feature_info.reduction by @brianhou0208 in https://github.com/huggingface/pytorch-image-models/pull/2369
- Add caution to Adan. Add decouple decay option to LAMB. by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2357
- Switching to timm specific weight instances for open_clip image encoders by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2376
- Fix broken image link in `Quickstart` doc by @ariG23498 in https://github.com/huggingface/pytorch-image-models/pull/2381
- Supporting aimv2 encoders by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2379
- fix: minor typos in markdowns by @ruidazeng in https://github.com/huggingface/pytorch-image-models/pull/2382
- Add 384x384 in12k pretrain and finetune for convnext_nano by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2384
- Fixed unfused attn2d scale by @laclouis5 in https://github.com/huggingface/pytorch-image-models/pull/2387
- Fix MQA V2 by @laclouis5 in https://github.com/huggingface/pytorch-image-models/pull/2388
- Wrap torch checkpoint() fn to default use_reentrant flag to False and allow env var override by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2394
- Add half-precision (bfloat16, float16) support to train & validate scripts by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2397
- Merging wandb project name chages w/ addition by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2398
New Contributors
- @brianhou0208 made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2369
- @ariG23498 made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2381
- @ruidazeng made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2382
- @laclouis5 made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2387
Full Changelog: https://github.com/huggingface/pytorch-image-models/compare/v1.0.12...v1.0.13
Published by rwightman over 1 year ago
instances - Release v1.0.12
Nov 28, 2024
- More optimizers
- Add MARS optimizer (https://arxiv.org/abs/2411.10438, https://github.com/AGI-Arena/MARS)
- Add LaProp optimizer (https://arxiv.org/abs/2002.04839, https://github.com/Z-T-WANG/LaProp-Optimizer)
- Add masking from 'Cautious Optimizers' (https://arxiv.org/abs/2411.16085, https://github.com/kyleliang919/C-Optim) to Adafactor, Adafactor Big Vision, AdamW (legacy), Adopt, Lamb, LaProp, Lion, NadamW, RMSPropTF, SGDW
- Cleanup some docstrings and type annotations re optimizers and factory
- Add MobileNet-V4 Conv Medium models pretrained on in12k and fine-tuned in1k @ 384x384
- https://huggingface.co/timm/mobilenetv4_conv_medium.e250_r384_in12k_ft_in1k
- https://huggingface.co/timm/mobilenetv4_conv_medium.e250_r384_in12k
- https://huggingface.co/timm/mobilenetv4_conv_medium.e180_ad_r384_in12k
- https://huggingface.co/timm/mobilenetv4_conv_medium.e180_r384_in12k
- Add small cs3darknet, quite good for the speed
- https://huggingface.co/timm/cs3darknet_focus_s.ra4_e3600_r256_in1k
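The masking idea from 'Cautious Optimizers' can be sketched on plain lists: components of the proposed update whose sign disagrees with the current gradient are zeroed, and the surviving components are rescaled by the fraction kept. The function name and the `eps` floor are choices made for this example; it is a simplified illustration rather than any optimizer's actual step.

```python
def cautious_update(update, grad, eps=1e-3):
    """Zero update components that disagree in sign with the gradient,
    then rescale by the mean of the mask (floored at eps)."""
    mask = [1.0 if u * g > 0 else 0.0 for u, g in zip(update, grad)]
    scale = 1.0 / max(sum(mask) / len(mask), eps)
    return [u * m * scale for u, m in zip(update, mask)]


update = [0.5, -0.2, 0.1, -0.4]   # e.g. a momentum-based step direction
grad = [1.0, 0.3, -0.2, -0.1]     # current gradient
print(cautious_update(update, grad))
```

Here two of the four components agree in sign, so they are kept and doubled while the other two are masked out.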
Nov 12, 2024
- Optimizer factory refactor
- New factory works by registering optimizers using an OptimInfo dataclass w/ some key traits
- Add `list_optimizers`, `get_optimizer_class`, `get_optimizer_info` to reworked `create_optimizer_v2` fn to explore optimizers, get info or class
- deprecate `optim.optim_factory`, move fns to `optim/_optim_factory.py` and `optim/_param_groups.py` and encourage import via `timm.optim`
- Add Adopt (https://github.com/iShohei220/adopt) optimizer
- Add 'Big Vision' variant of Adafactor (https://github.com/google-research/big_vision/blob/main/big_vision/optax.py) optimizer
- Fix original Adafactor to pick better factorization dims for convolutions
- Tweak LAMB optimizer with some improvements in torch.where functionality since original, refactor clipping a bit
- dynamic img size support in vit, deit, eva improved to support resize from non-square patch grids, thanks https://github.com/wojtke
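The registry-style factory described in the Nov 12 notes can be pictured with a small stand-alone sketch. Only the names `OptimInfo`, `list_optimizers`, `get_optimizer_class`, and `get_optimizer_info` come from the notes; the dataclass fields, the `OptimizerRegistry` class, and the toy optimizer classes are hypothetical and do not reflect the library's actual definitions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class OptimInfo:
    """Metadata for a registered optimizer (fields are illustrative only)."""
    name: str
    opt_class: type
    description: str = ""
    has_momentum: bool = False


class OptimizerRegistry:
    def __init__(self) -> None:
        self._infos: Dict[str, OptimInfo] = {}

    def register(self, info: OptimInfo) -> None:
        self._infos[info.name.lower()] = info

    def list_optimizers(self) -> List[str]:
        return sorted(self._infos)

    def get_optimizer_info(self, name: str) -> OptimInfo:
        return self._infos[name.lower()]

    def get_optimizer_class(self, name: str) -> type:
        return self.get_optimizer_info(name).opt_class


class ToySGD:       # placeholder classes standing in for real optimizers
    pass


class ToyLion:
    pass


registry = OptimizerRegistry()
registry.register(OptimInfo("sgd", ToySGD, "plain SGD", has_momentum=True))
registry.register(OptimInfo("lion", ToyLion, "sign-based update"))
print(registry.list_optimizers())
```

Keeping the traits in a dataclass lets a factory function look up an optimizer by name and inspect its properties before constructing it.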
Oct 31, 2024
- Add a set of new very well trained ResNet & ResNet-V2 18/34 (basic block) weights. See https://huggingface.co/blog/rwightman/resnet-trick-or-treat
Oct 19, 2024
- Cleanup torch amp usage to avoid cuda specific calls, merge support for Ascend (NPU) devices from MengqingCao that should work now in PyTorch 2.5 w/ new device extension autoloading feature. Tested Intel Arc (XPU) in Pytorch 2.5 too and it (mostly) worked.
What's Changed
- mambaout.py: fixed bug by @NightMachinery in https://github.com/huggingface/pytorch-image-models/pull/2305
- Cleanup some amp related behaviour to better support different (non-cuda) devices by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2308
- Add NPU backend support for val and inference by @MengqingCao in https://github.com/huggingface/pytorch-image-models/pull/2109
- Update some clip pretrained weights to point to new hub locations by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2311
- ResNet vs MNV4 v1/v2 18 & 34 weights by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2316
- Replace deprecated positional argument with --data-dir by @JosuaRieder in https://github.com/huggingface/pytorch-image-models/pull/2322
- Fix typo in train.py: bathes > batches by @JosuaRieder in https://github.com/huggingface/pytorch-image-models/pull/2321
- Fix positional embedding resampling for non-square inputs in ViT by @wojtke in https://github.com/huggingface/pytorch-image-models/pull/2317
- Add trust_remote_code argument to ReaderHfds by @grodino in https://github.com/huggingface/pytorch-image-models/pull/2326
- Extend train epoch schedule by warmup_epochs if warmup_prefix enabled by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2325
- Extend existing unit tests using Cover-Agent by @mrT23 in https://github.com/huggingface/pytorch-image-models/pull/2331
- An impl of adafactor as per big vision (scaling vit) changes by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2320
- Add py.typed file as recommended by PEP 561 by @antoinebrl in https://github.com/huggingface/pytorch-image-models/pull/2252
- Add CODE_OF_CONDUCT.md and CITATION.cff files by @AlinaImtiaz018 in https://github.com/huggingface/pytorch-image-models/pull/2333
- Add some 384x384 small model weights by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2334
- In dist training, update loss running avg every step, sync on log by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2340
- Improve WandB logging by @sinahmr in https://github.com/huggingface/pytorch-image-models/pull/2341
- A few weights to merge Friday by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2343
- Update timm torchvision resnet weight urls to the updated urls in torchvision by @JohannesTheo in https://github.com/huggingface/pytorch-image-models/pull/2346
- More optimizer updates, add MARS, LaProp, add Adopt fix and more by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2347
- Cautious optimizer impl plus some typing cleanup. by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2349
- Add cautious mars, improve test reliability by skipping grad diff for… by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2351
- See if we can avoid some model / layer pickle issues with the aa attr in ConvNormAct by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2353
New Contributors
- @MengqingCao made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2109
- @JosuaRieder made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2322
- @wojtke made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2317
- @grodino made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2326
- @AlinaImtiaz018 made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2333
- @sinahmr made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2341
- @JohannesTheo made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2346
Full Changelog: https://github.com/huggingface/pytorch-image-models/compare/v1.0.11...v1.0.12
Published by rwightman over 1 year ago
instances - v1.0.11 Release
Quick turnaround from 1.0.10 to fix an error impacting 3rd party packages that still import through a deprecated path that isn't tested.
Oct 16, 2024
- Fix error on importing from deprecated path `timm.models.registry`, increased priority of existing deprecation warnings to be visible
- Port weights of InternViT-300M (https://huggingface.co/OpenGVLab/InternViT-300M-448px) to `timm` as `vit_intern300m_patch14_448`
Oct 14, 2024
- Pre-activation (ResNetV2) version of 18/18d/34/34d ResNet model defs added by request (weights pending)
- Release 1.0.10
Oct 11, 2024
- MambaOut (https://github.com/yuweihao/MambaOut) model & weights added. A cheeky take on SSM vision models w/o the SSM (essentially ConvNeXt w/ gating). A mix of original weights + custom variations & weights.
| model | img_size | top1 | top5 | param_count |
|---|---|---|---|---|
| mambaout_base_plus_rw.sw_e150_r384_in12k_ft_in1k | 384 | 87.506 | 98.428 | 101.66 |
| mambaout_base_plus_rw.sw_e150_in12k_ft_in1k | 288 | 86.912 | 98.236 | 101.66 |
| mambaout_base_plus_rw.sw_e150_in12k_ft_in1k | 224 | 86.632 | 98.156 | 101.66 |
| mambaout_base_tall_rw.sw_e500_in1k | 288 | 84.974 | 97.332 | 86.48 |
| mambaout_base_wide_rw.sw_e500_in1k | 288 | 84.962 | 97.208 | 94.45 |
| mambaout_base_short_rw.sw_e500_in1k | 288 | 84.832 | 97.27 | 88.83 |
| mambaout_base.in1k | 288 | 84.72 | 96.93 | 84.81 |
| mambaout_small_rw.sw_e450_in1k | 288 | 84.598 | 97.098 | 48.5 |
| mambaout_small.in1k | 288 | 84.5 | 96.974 | 48.49 |
| mambaout_base_wide_rw.sw_e500_in1k | 224 | 84.454 | 96.864 | 94.45 |
| mambaout_base_tall_rw.sw_e500_in1k | 224 | 84.434 | 96.958 | 86.48 |
| mambaout_base_short_rw.sw_e500_in1k | 224 | 84.362 | 96.952 | 88.83 |
| mambaout_base.in1k | 224 | 84.168 | 96.68 | 84.81 |
| mambaout_small.in1k | 224 | 84.086 | 96.63 | 48.49 |
| mambaout_small_rw.sw_e450_in1k | 224 | 84.024 | 96.752 | 48.5 |
| mambaout_tiny.in1k | 288 | 83.448 | 96.538 | 26.55 |
| mambaout_tiny.in1k | 224 | 82.736 | 96.1 | 26.55 |
| mambaout_kobe.in1k | 288 | 81.054 | 95.718 | 9.14 |
| mambaout_kobe.in1k | 224 | 79.986 | 94.986 | 9.14 |
| mambaout_femto.in1k | 288 | 79.848 | 95.14 | 7.3 |
| mambaout_femto.in1k | 224 | 78.87 | 94.408 | 7.3 |
- SigLIP SO400M ViT fine-tunes on ImageNet-1k @ 378x378, added 378x378 option for existing SigLIP 384x384 models
- vitso400mpatch14siglip378.webliftin1k - 89.42 top-1
- vitso400mpatch14siglipgap378.weblift_in1k - 89.03
- SigLIP SO400M ViT encoder from recent multi-lingual (i18n) variant, patch16 @ 256x256 (https://huggingface.co/timm/ViT-SO400M-16-SigLIP-i18n-256). OpenCLIP update pending.
- Add two ConvNeXt 'Zepto' models & weights (one w/ overlapped stem and one w/ patch stem). Uses RMSNorm, smaller than previous 'Atto', 2.2M params.
- convnextzeptormsols.ra4e3600r224in1k - 73.20 top-1 @ 224
- convnextzeptorms.ra4e3600r224_in1k - 72.81 @ 224
Sept 2024
- Add a suite of tiny test models for improved unit tests and niche low-resource applications (https://huggingface.co/blog/rwightman/timm-tiny-test)
- Add MobileNetV4-Conv-Small (0.5x) model (https://huggingface.co/posts/rwightman/793053396198664)
  - `mobilenetv4_conv_small_050.e3000_r224_in1k` - 65.81 top-1 @ 256, 64.76 @ 224
- Add MobileNetV3-Large variants trained with MNV4 Small recipe
  - `mobilenetv3_large_150d.ra4_e3600_r256_in1k` - 81.81 @ 320, 80.94 @ 256
  - `mobilenetv3_large_100.ra4_e3600_r224_in1k` - 77.16 @ 256, 76.31 @ 224
- Python
Published by rwightman over 1 year ago
instances - Release v1.0.10
Oct 14, 2024
- Pre-activation (ResNetV2) version of 18/18d/34/34d ResNet model defs added by request (weights pending)
- Release 1.0.10
Oct 11, 2024
- MambaOut (https://github.com/yuweihao/MambaOut) model & weights added. A cheeky take on SSM vision models w/o the SSM (essentially ConvNeXt w/ gating). A mix of original weights + custom variations & weights.
|model |img_size|top1 |top5 |param_count|
|---------------------------------------------------|--------|------|------|-----------|
|mambaout_base_plus_rw.sw_e150_r384_in12k_ft_in1k |384 |87.506|98.428|101.66 |
|mambaout_base_plus_rw.sw_e150_in12k_ft_in1k |288 |86.912|98.236|101.66 |
|mambaout_base_plus_rw.sw_e150_in12k_ft_in1k |224 |86.632|98.156|101.66 |
|mambaout_base_tall_rw.sw_e500_in1k |288 |84.974|97.332|86.48 |
|mambaout_base_wide_rw.sw_e500_in1k |288 |84.962|97.208|94.45 |
|mambaout_base_short_rw.sw_e500_in1k |288 |84.832|97.27 |88.83 |
|mambaout_base.in1k |288 |84.72 |96.93 |84.81 |
|mambaout_small_rw.sw_e450_in1k |288 |84.598|97.098|48.5 |
|mambaout_small.in1k |288 |84.5 |96.974|48.49 |
|mambaout_base_wide_rw.sw_e500_in1k |224 |84.454|96.864|94.45 |
|mambaout_base_tall_rw.sw_e500_in1k |224 |84.434|96.958|86.48 |
|mambaout_base_short_rw.sw_e500_in1k |224 |84.362|96.952|88.83 |
|mambaout_base.in1k |224 |84.168|96.68 |84.81 |
|mambaout_small.in1k |224 |84.086|96.63 |48.49 |
|mambaout_small_rw.sw_e450_in1k |224 |84.024|96.752|48.5 |
|mambaout_tiny.in1k |288 |83.448|96.538|26.55 |
|mambaout_tiny.in1k |224 |82.736|96.1 |26.55 |
|mambaout_kobe.in1k |288 |81.054|95.718|9.14 |
|mambaout_kobe.in1k |224 |79.986|94.986|9.14 |
|mambaout_femto.in1k |288 |79.848|95.14 |7.3 |
|mambaout_femto.in1k |224 |78.87 |94.408|7.3 |
- SigLIP SO400M ViT fine-tunes on ImageNet-1k @ 378x378, added 378x378 option for existing SigLIP 384x384 models
  - `vit_so400m_patch14_siglip_378.webli_ft_in1k` - 89.42 top-1
  - `vit_so400m_patch14_siglip_gap_378.webli_ft_in1k` - 89.03
- SigLIP SO400M ViT encoder from recent multi-lingual (i18n) variant, patch16 @ 256x256 (https://huggingface.co/timm/ViT-SO400M-16-SigLIP-i18n-256). OpenCLIP update pending.
- Add two ConvNeXt 'Zepto' models & weights (one w/ overlapped stem and one w/ patch stem). Uses RMSNorm, smaller than previous 'Atto', 2.2M params.
  - `convnext_zepto_rms_ols.ra4_e3600_r224_in1k` - 73.20 top-1 @ 224
  - `convnext_zepto_rms.ra4_e3600_r224_in1k` - 72.81 @ 224
Sept 2024
- Add a suite of tiny test models for improved unit tests and niche low-resource applications (https://huggingface.co/blog/rwightman/timm-tiny-test)
- Add MobileNetV4-Conv-Small (0.5x) model (https://huggingface.co/posts/rwightman/793053396198664)
  - `mobilenetv4_conv_small_050.e3000_r224_in1k` - 65.81 top-1 @ 256, 64.76 @ 224
- Add MobileNetV3-Large variants trained with MNV4 Small recipe
  - `mobilenetv3_large_150d.ra4_e3600_r256_in1k` - 81.81 @ 320, 80.94 @ 256
  - `mobilenetv3_large_100.ra4_e3600_r224_in1k` - 77.16 @ 256, 76.31 @ 224
- Python
Published by rwightman over 1 year ago
instances - Release v1.0.9
Aug 21, 2024
- Updated SBB ViT models trained on ImageNet-12k and fine-tuned on ImageNet-1k, challenging quite a number of much larger, slower models
| model | top1 | top5 | param_count | img_size |
| -------------------------------------------------- | ------ | ------ | ----------- | -------- |
| vit_mediumd_patch16_reg4_gap_384.sbb2_e200_in12k_ft_in1k | 87.438 | 98.256 | 64.11 | 384 |
| vit_mediumd_patch16_reg4_gap_256.sbb2_e200_in12k_ft_in1k | 86.608 | 97.934 | 64.11 | 256 |
| vit_betwixt_patch16_reg4_gap_384.sbb2_e200_in12k_ft_in1k | 86.594 | 98.02 | 60.4 | 384 |
| vit_betwixt_patch16_reg4_gap_256.sbb2_e200_in12k_ft_in1k | 85.734 | 97.61 | 60.4 | 256 |
- MobileNet-V1 1.25, EfficientNet-B1, & ResNet50-D weights w/ MNV4 baseline challenge recipe
| model | top1 | top5 | param_count | img_size |
|--------------------------------------|--------|--------|-------------|----------|
| resnet50d.ra4_e3600_r224_in1k | 81.838 | 95.922 | 25.58 | 288 |
| efficientnet_b1.ra4_e3600_r240_in1k | 81.440 | 95.700 | 7.79 | 288 |
| resnet50d.ra4_e3600_r224_in1k | 80.952 | 95.384 | 25.58 | 224 |
| efficientnet_b1.ra4_e3600_r240_in1k | 80.406 | 95.152 | 7.79 | 240 |
| mobilenetv1_125.ra4_e3600_r224_in1k | 77.600 | 93.804 | 6.27 | 256 |
| mobilenetv1_125.ra4_e3600_r224_in1k | 76.924 | 93.234 | 6.27 | 224 |
- Add SAM2 (HieraDet) backbone arch & weight loading support
- Add Hiera Small weights trained w/ abswin pos embed on in12k & fine-tuned on 1k
|model |top1 |top5 |param_count|
|---------------------------------|------|------|-----------|
|hiera_small_abswin_256.sbb2_e200_in12k_ft_in1k |84.912|97.260|35.01 |
|hiera_small_abswin_256.sbb2_pd_e200_in12k_ft_in1k |84.560|97.106|35.01 |
Aug 8, 2024
- Add RDNet ('DenseNets Reloaded', https://arxiv.org/abs/2403.19588), thanks Donghyun Kim
- Python
Published by rwightman almost 2 years ago
instances - Release v1.0.8
July 28, 2024
- Add `mobilenet_edgetpu_v2_m` weights w/ `ra4` mnv4-small based recipe. 80.1% top-1 @ 224 and 80.7 @ 256.
- Release 1.0.8
July 26, 2024
- More MobileNet-v4 weights, ImageNet-12k pretrain w/ fine-tunes, and anti-aliased ConvLarge models
| model |top1 |top1_err|top5 |top5_err|param_count|img_size|
|--------------------------------------------------|------|--------|------|--------|-----------|--------|
| mobilenetv4_conv_aa_large.e230_r448_in12k_ft_in1k|84.99 |15.01 |97.294|2.706 |32.59 |544 |
| mobilenetv4_conv_aa_large.e230_r384_in12k_ft_in1k|84.772|15.228 |97.344|2.656 |32.59 |480 |
| mobilenetv4_conv_aa_large.e230_r448_in12k_ft_in1k|84.64 |15.36 |97.114|2.886 |32.59 |448 |
| mobilenetv4_conv_aa_large.e230_r384_in12k_ft_in1k|84.314|15.686 |97.102|2.898 |32.59 |384 |
| mobilenetv4_conv_aa_large.e600_r384_in1k |83.824|16.176 |96.734|3.266 |32.59 |480 |
| mobilenetv4_conv_aa_large.e600_r384_in1k |83.244|16.756 |96.392|3.608 |32.59 |384 |
| mobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k|82.99 |17.01 |96.67 |3.33 |11.07 |320 |
| mobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k|82.364|17.636 |96.256|3.744 |11.07 |256 |
- Impressive MobileNet-V1 and EfficientNet-B0 baseline challenges (https://huggingface.co/blog/rwightman/mobilenet-baselines)
| model |top1 |top1_err|top5 |top5_err|param_count|img_size|
|--------------------------------------------------|------|--------|------|--------|-----------|--------|
| efficientnet_b0.ra4_e3600_r224_in1k |79.364|20.636 |94.754|5.246 |5.29 |256 |
| efficientnet_b0.ra4_e3600_r224_in1k |78.584|21.416 |94.338|5.662 |5.29 |224 |
| mobilenetv1_100h.ra4_e3600_r224_in1k |76.596|23.404 |93.272|6.728 |5.28 |256 |
| mobilenetv1_100.ra4_e3600_r224_in1k |76.094|23.906 |93.004|6.996 |4.23 |256 |
| mobilenetv1_100h.ra4_e3600_r224_in1k |75.662|24.338 |92.504|7.496 |5.28 |224 |
| mobilenetv1_100.ra4_e3600_r224_in1k |75.382|24.618 |92.312|7.688 |4.23 |224 |
- Prototype of `set_input_size()` added to vit and swin v1/v2 models to allow changing image size, patch size, window size after model creation.
- Improved support in swin for different size handling, in addition to `set_input_size`, `always_partition` and `strict_img_size` args have been added to `__init__` to allow more flexible input size constraints
- Fix out of order indices info for intermediate 'Getter' feature wrapper, check out of range indices for same.
- Add several `tiny` < .5M param models for testing that are actually trained on ImageNet-1k
|model |top1 |top1_err|top5 |top5_err|param_count|img_size|crop_pct|
|----------------------------|------|--------|------|--------|-----------|--------|--------|
|test_efficientnet.r160_in1k |47.156|52.844 |71.726|28.274 |0.36 |192 |1.0 |
|test_byobnet.r160_in1k |46.698|53.302 |71.674|28.326 |0.46 |192 |1.0 |
|test_efficientnet.r160_in1k |46.426|53.574 |70.928|29.072 |0.36 |160 |0.875 |
|test_byobnet.r160_in1k |45.378|54.622 |70.572|29.428 |0.46 |160 |0.875 |
|test_vit.r160_in1k |42.0 |58.0 |68.664|31.336 |0.37 |192 |1.0 |
|test_vit.r160_in1k |40.822|59.178 |67.212|32.788 |0.37 |160 |0.875 |
- Fix vit reg token init, thanks Promisery
- Other misc fixes
June 24, 2024
- 3 more MobileNetV4 hybrid weights with different MQA weight init scheme
| model |top1 |top1_err|top5 |top5_err|param_count|img_size|
|--------------------------------------------------|------|--------|------|--------|-----------|--------|
| mobilenetv4_hybrid_large.ix_e600_r384_in1k |84.356|15.644 |96.892|3.108 |37.76 |448 |
| mobilenetv4_hybrid_large.ix_e600_r384_in1k |83.990|16.010 |96.702|3.298 |37.76 |384 |
| mobilenetv4_hybrid_medium.ix_e550_r384_in1k |83.394|16.606 |96.760|3.240 |11.07 |448 |
| mobilenetv4_hybrid_medium.ix_e550_r384_in1k |82.968|17.032 |96.474|3.526 |11.07 |384 |
| mobilenetv4_hybrid_medium.ix_e550_r256_in1k |82.492|17.508 |96.278|3.722 |11.07 |320 |
| mobilenetv4_hybrid_medium.ix_e550_r256_in1k |81.446|18.554 |95.704|4.296 |11.07 |256 |
- florence2 weight loading in DaViT model
- Python
Published by rwightman almost 2 years ago
instances - Release v1.0.7
June 12, 2024
- MobileNetV4 models and initial set of `timm` trained weights added:
| model |top1 |top1_err|top5 |top5_err|param_count|img_size|
|--------------------------------------------------|------|--------|------|--------|-----------|--------|
| mobilenetv4_hybrid_large.e600_r384_in1k |84.266|15.734 |96.936|3.064 |37.76 |448 |
| mobilenetv4_hybrid_large.e600_r384_in1k |83.800|16.200 |96.770|3.230 |37.76 |384 |
| mobilenetv4_conv_large.e600_r384_in1k |83.392|16.608 |96.622|3.378 |32.59 |448 |
| mobilenetv4_conv_large.e600_r384_in1k |82.952|17.048 |96.266|3.734 |32.59 |384 |
| mobilenetv4_conv_large.e500_r256_in1k |82.674|17.326 |96.31 |3.69 |32.59 |320 |
| mobilenetv4_conv_large.e500_r256_in1k |81.862|18.138 |95.69 |4.31 |32.59 |256 |
| mobilenetv4_hybrid_medium.e500_r224_in1k |81.276|18.724 |95.742|4.258 |11.07 |256 |
| mobilenetv4_conv_medium.e500_r256_in1k |80.858|19.142 |95.768|4.232 |9.72 |320 |
| mobilenetv4_hybrid_medium.e500_r224_in1k |80.442|19.558 |95.38 |4.62 |11.07 |224 |
| mobilenetv4_conv_blur_medium.e500_r224_in1k |80.142|19.858 |95.298|4.702 |9.72 |256 |
| mobilenetv4_conv_medium.e500_r256_in1k |79.928|20.072 |95.184|4.816 |9.72 |256 |
| mobilenetv4_conv_medium.e500_r224_in1k |79.808|20.192 |95.186|4.814 |9.72 |256 |
| mobilenetv4_conv_blur_medium.e500_r224_in1k |79.438|20.562 |94.932|5.068 |9.72 |224 |
| mobilenetv4_conv_medium.e500_r224_in1k |79.094|20.906 |94.77 |5.23 |9.72 |224 |
| mobilenetv4_conv_small.e2400_r224_in1k |74.616|25.384 |92.072|7.928 |3.77 |256 |
| mobilenetv4_conv_small.e1200_r224_in1k |74.292|25.708 |92.116|7.884 |3.77 |256 |
| mobilenetv4_conv_small.e2400_r224_in1k |73.756|26.244 |91.422|8.578 |3.77 |224 |
| mobilenetv4_conv_small.e1200_r224_in1k |73.454|26.546 |91.34 |8.66 |3.77 |224 |
- Apple MobileCLIP (https://arxiv.org/pdf/2311.17049, FastViT and ViT-B) image tower model support & weights added (part of OpenCLIP support).
- ViTamin (https://arxiv.org/abs/2404.02132) CLIP image tower model & weights added (part of OpenCLIP support).
- OpenAI CLIP Modified ResNet image tower modelling & weight support (via ByobNet). Refactor AttentionPool2d.
- Refactoring & improvements, especially related to `classifier_reset` and `num_features` vs `head_hidden_size` for `forward_features()` vs `pre_logits`
- Python
Published by rwightman almost 2 years ago
instances - Release v1.0.3
May 14, 2024
- Support loading PaliGemma jax weights into SigLIP ViT models with average pooling.
- Add Hiera models from Meta (https://github.com/facebookresearch/hiera).
- Add `normalize=` flag for transforms, return non-normalized torch.Tensor with original dtype (for `chug`)
- Version 1.0.3 release
May 11, 2024
- Searching for Better ViT Baselines (For the GPU Poor) weights and vit variants released. Exploring model shapes between Tiny and Base.
| model | top1 | top5 | param_count | img_size |
| -------------------------------------------------- | ------ | ------ | ----------- | -------- |
| vit_mediumd_patch16_reg4_gap_256.sbb_in12k_ft_in1k | 86.202 | 97.874 | 64.11 | 256 |
| vit_betwixt_patch16_reg4_gap_256.sbb_in12k_ft_in1k | 85.418 | 97.48 | 60.4 | 256 |
| vit_mediumd_patch16_rope_reg1_gap_256.sbb_in1k | 84.322 | 96.812 | 63.95 | 256 |
| vit_betwixt_patch16_rope_reg4_gap_256.sbb_in1k | 83.906 | 96.684 | 60.23 | 256 |
| vit_base_patch16_rope_reg1_gap_256.sbb_in1k | 83.866 | 96.67 | 86.43 | 256 |
| vit_medium_patch16_rope_reg1_gap_256.sbb_in1k | 83.81 | 96.824 | 38.74 | 256 |
| vit_betwixt_patch16_reg4_gap_256.sbb_in1k | 83.706 | 96.616 | 60.4 | 256 |
| vit_betwixt_patch16_reg1_gap_256.sbb_in1k | 83.628 | 96.544 | 60.4 | 256 |
| vit_medium_patch16_reg4_gap_256.sbb_in1k | 83.47 | 96.622 | 38.88 | 256 |
| vit_medium_patch16_reg1_gap_256.sbb_in1k | 83.462 | 96.548 | 38.88 | 256 |
| vit_little_patch16_reg4_gap_256.sbb_in1k | 82.514 | 96.262 | 22.52 | 256 |
| vit_wee_patch16_reg1_gap_256.sbb_in1k | 80.256 | 95.360 | 13.42 | 256 |
| vit_pwee_patch16_reg1_gap_256.sbb_in1k | 80.072 | 95.136 | 15.25 | 256 |
| vit_mediumd_patch16_reg4_gap_256.sbb_in12k | N/A | N/A | 64.11 | 256 |
| vit_betwixt_patch16_reg4_gap_256.sbb_in12k | N/A | N/A | 60.4 | 256 |
- AttentionExtract helper added to extract attention maps from `timm` models. See example in https://github.com/huggingface/pytorch-image-models/discussions/1232#discussioncomment-9320949
- `forward_intermediates()` API refined and added to more models including some ConvNets that have other extraction methods.
- 1017 of 1047 model architectures support `features_only=True` feature extraction. Remaining 34 architectures can be supported but based on priority requests.
- Remove torch.jit.script annotated functions including old JIT activations. Conflict with dynamo and dynamo does a much better job when used.
April 11, 2024
- Prepping for a long overdue 1.0 release, things have been stable for a while now.
- Significant feature that's been missing for a while, `features_only=True` support for ViT models with flat hidden states or non-std module layouts (so far covering `'vit_*', 'twins_*', 'deit*', 'beit*', 'mvitv2*', 'eva*', 'samvit_*', 'flexivit*'`)
- Above feature support achieved through a new `forward_intermediates()` API that can be used with a feature wrapping module or directly.
```python
import timm
import torch

model = timm.create_model('vit_base_patch16_224')
input = torch.randn(2, 3, 224, 224)  # assumed batch of 2, consistent with the shapes below
final_feat, intermediates = model.forward_intermediates(input)
output = model.forward_head(final_feat)  # pooling + classifier head

print(final_feat.shape)
# torch.Size([2, 197, 768])

for f in intermediates:
    print(f.shape)
# torch.Size([2, 768, 14, 14])
# torch.Size([2, 768, 14, 14])
# torch.Size([2, 768, 14, 14])
# torch.Size([2, 768, 14, 14])
# torch.Size([2, 768, 14, 14])
# torch.Size([2, 768, 14, 14])
# torch.Size([2, 768, 14, 14])
# torch.Size([2, 768, 14, 14])
# torch.Size([2, 768, 14, 14])
# torch.Size([2, 768, 14, 14])
# torch.Size([2, 768, 14, 14])
# torch.Size([2, 768, 14, 14])

print(output.shape)
# torch.Size([2, 1000])
```
```python
import timm
import torch

model = timm.create_model('eva02_base_patch16_clip_224', pretrained=True, img_size=512, features_only=True, out_indices=(-3, -2,))
output = model(torch.randn(2, 3, 512, 512))

for o in output:
    print(o.shape)
# torch.Size([2, 768, 32, 32])
# torch.Size([2, 768, 32, 32])
```
* TinyCLIP vision tower weights added, thx Thien Tran
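The shapes printed in the two `forward_intermediates()` / `features_only=True` examples above follow from simple patch arithmetic. The sketch below is plain Python, not a `timm` API: `vit_feature_shapes` is a hypothetical helper, and it assumes a square input, a single class token and no register tokens.

```python
def vit_feature_shapes(img_size, patch_size, embed_dim, batch=2, num_prefix_tokens=1):
    """Return (token_shape, feature_map_shape) for a plain ViT.

    num_prefix_tokens=1 assumes one class token and no register tokens.
    """
    grid = img_size // patch_size                  # patches per side
    num_tokens = grid * grid + num_prefix_tokens   # flattened patches + prefix tokens
    token_shape = (batch, num_tokens, embed_dim)   # layout of the final hidden state
    fmap_shape = (batch, embed_dim, grid, grid)    # NCHW layout of each intermediate
    return token_shape, fmap_shape


tokens_224, fmap_224 = vit_feature_shapes(224, 16, 768)
print(tokens_224)  # (2, 197, 768)
print(fmap_224)    # (2, 768, 14, 14)

_, fmap_512 = vit_feature_shapes(512, 16, 768)
print(fmap_512)    # (2, 768, 32, 32)
```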
- Python
Published by rwightman about 2 years ago
instances - Release v0.9.16
Feb 19, 2024
- Next-ViT models added. Adapted from https://github.com/bytedance/Next-ViT
- HGNet and PP-HGNetV2 models added. Adapted from https://github.com/PaddlePaddle/PaddleClas by SeeFun
- Removed setup.py, moved to pyproject.toml based build supported by PDM
- Add updated model EMA impl using foreach for less overhead
- Support device args in train script for non GPU devices
- Other misc fixes and small additions
- Min supported Python version increased to 3.8
- Release 0.9.16
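For readers unfamiliar with model EMA, the per-parameter math is a simple interpolation toward the current weights. The sketch below is plain Python over lists of floats and only illustrates that update. It is not the `timm` implementation: the "foreach" version mentioned above applies the same step to whole lists of tensors in one multi-tensor call, and `ema_update` with its default decay is a hypothetical name and value.

```python
def ema_update(ema_params, model_params, decay=0.999):
    """One EMA step written as a lerp: ema <- ema + (1 - decay) * (param - ema)."""
    return [e + (1.0 - decay) * (p - e) for e, p in zip(ema_params, model_params)]


ema = [0.0, 1.0]
model = [1.0, 1.0]
ema = ema_update(ema, model, decay=0.9)
print(ema)  # first entry moves 10% of the way toward 1.0, second is unchanged
```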
Jan 8, 2024
Datasets & transform refactoring
* HuggingFace streaming (iterable) dataset support (--dataset hfids:org/dataset)
* Webdataset wrapper tweaks for improved split info fetching, can auto fetch splits from supported HF hub webdataset
* Tested HF datasets and webdataset wrapper streaming from HF hub with recent timm ImageNet uploads to https://huggingface.co/timm
* Make input & target column/field keys consistent across datasets and pass via args
* Full monochrome support when using e.g. --input-size 1 224 224 or --in-chans 1, sets PIL image conversion appropriately in dataset
* Improved several alternate crop & resize transforms (ResizeKeepRatio, RandomCropOrPad, etc) for use in PixParse document AI project
* Add SimCLR style color jitter prob along with grayscale and gaussian blur options to augmentations and args
* Allow train without validation set (--val-split '') in train script
* Add --bce-sum (sum over class dim) and --bce-pos-weight (positive weighting) args for training as they're common BCE loss tweaks I was often hard coding
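To make the two BCE tweaks concrete, here is a plain-Python sketch of binary cross-entropy on logits with an optional positive-class weight and an optional sum over the class dimension. It follows the standard `pos_weight` formulation and is not the train script's code: the function name and the exact reduction order (sum or mean over classes, then mean over the batch) are assumptions.

```python
import math


def bce_with_logits(logits, targets, pos_weight=1.0, sum_classes=False):
    """Binary cross-entropy over rows of per-class logits.

    pos_weight scales the positive-target term; sum_classes sums over the
    class dim before averaging over the batch instead of averaging over it.
    """
    per_sample = []
    for row_x, row_y in zip(logits, targets):
        losses = []
        for x, y in zip(row_x, row_y):
            p = 1.0 / (1.0 + math.exp(-x))  # sigmoid
            losses.append(-(pos_weight * y * math.log(p) + (1 - y) * math.log(1 - p)))
        per_sample.append(sum(losses) if sum_classes else sum(losses) / len(losses))
    return sum(per_sample) / len(per_sample)


logits = [[0.0, 0.0], [0.0, 0.0]]
targets = [[1, 0], [0, 1]]
mean_loss = bce_with_logits(logits, targets)                    # ln 2 per element
sum_loss = bce_with_logits(logits, targets, sum_classes=True)   # 2 * ln 2 per sample
weighted = bce_with_logits(logits, targets, pos_weight=2.0, sum_classes=True)  # 3 * ln 2
print(mean_loss, sum_loss, weighted)
```

With two classes, summing over the class dim doubles the loss relative to the mean, which in practice changes the effective learning rate for the classifier.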
- Python
Published by rwightman over 2 years ago
instances - Release v0.9.12
Nov 23, 2023
- Added EfficientViT-Large models, thanks SeeFun
- Fix Python 3.7 compat, will be dropping support for it soon
- Other misc fixes
- Release 0.9.12
- Python
Published by rwightman over 2 years ago
instances - Release v0.9.11
Nov 20, 2023
- Added significant flexibility for Hugging Face Hub based timm models via `model_args` config entry. `model_args` will be passed as kwargs through to models on creation.
  - See example at https://huggingface.co/gaunernst/vit_base_patch16_1024_128.audiomae_as2m_ft_as20k/blob/main/config.json
  - Usage: https://github.com/huggingface/pytorch-image-models/discussions/2035
- Updated imagenet eval and test set csv files with latest models
- `vision_transformer.py` typing and doc cleanup by Laureηt
- 0.9.11 release
- Python
Published by rwightman over 2 years ago
instances - Release v0.9.10
Nov 4
- Patch fix for 0.9.9 to fix FrozenBatchnorm2d import path for old torchvision (~2 years)
Nov 3, 2023
- DFN (Data Filtering Networks) and MetaCLIP ViT weights added
- DINOv2 'register' ViT model weights added
- Add `quickgelu` ViT variants for OpenAI, DFN, MetaCLIP weights that use it (less efficient)
- Improved typing added to ResNet, MobileNet-v3 thanks to Aryan
- ImageNet-12k fine-tuned (from LAION-2B CLIP) `convnext_xxlarge`
- 0.9.9 release
- Python
Published by rwightman over 2 years ago
instances - Release v0.9.9
Nov 3, 2023
- DFN (Data Filtering Networks) and MetaCLIP ViT weights added
- DINOv2 'register' ViT model weights added
- Add `quickgelu` ViT variants for OpenAI, DFN, MetaCLIP weights that use it (less efficient)
- Improved typing added to ResNet, MobileNet-v3 thanks to Aryan
- ImageNet-12k fine-tuned (from LAION-2B CLIP) `convnext_xxlarge`
- 0.9.9 release
- Python
Published by rwightman over 2 years ago
instances - Release v0.9.8
Oct 20, 2023
- SigLIP image tower weights supported in `vision_transformer.py`.
  - Great potential for fine-tune and downstream feature use.
- Experimental 'register' support in vit models as per Vision Transformers Need Registers
- Updated RepViT with new weight release. Thanks wangao
- Add patch resizing support (on pretrained weight load) to Swin models
- 0.9.8 release
- Python
Published by rwightman over 2 years ago
instances - Release v0.9.7
Small bug fix & extra model from v0.9.6
Sep 1, 2023
- TinyViT added by SeeFun
- Fix EfficientViT (MIT) to use torch.autocast so it works back to PT 1.10
- 0.9.7 release
- Python
Published by rwightman over 2 years ago
instances - Release v0.9.6
Aug 28, 2023
- Add dynamic img size support to models in `vision_transformer.py`, `vision_transformer_hybrid.py`, `deit.py`, and `eva.py` w/o breaking backward compat.
  - Add `dynamic_img_size=True` to args at model creation time to allow changing the grid size (interpolate abs and/or ROPE pos embed each forward pass).
  - Add `dynamic_img_pad=True` to allow image sizes that aren't divisible by patch size (pad bottom right to patch size each forward pass).
  - Enabling either dynamic mode will break FX tracing unless PatchEmbed module added as leaf.
  - Existing method of resizing position embedding by passing different `img_size` (interpolate pretrained embed weights once) on creation still works.
  - Existing method of changing `patch_size` (resize pretrained patch_embed weights once) on creation still works.
  - Example validation cmd `python validate.py /imagenet --model vit_base_patch16_224 --amp --amp-dtype bfloat16 --img-size 255 --crop-pct 1.0 --model-kwargs dynamic_img_size=True dynamic_img_pad=True`
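The padding described for `dynamic_img_pad=True` comes down to rounding each spatial dimension up to the next multiple of the patch size. The sketch below shows that arithmetic for the 255-pixel validation example with a patch size of 16. It is a standalone illustration with a hypothetical helper name, not the `PatchEmbed` code.

```python
def pad_to_patch_multiple(height, width, patch_size=16):
    """Return bottom/right padding, padded size, and patch grid for an input."""
    pad_h = (patch_size - height % patch_size) % patch_size  # rows added at the bottom
    pad_w = (patch_size - width % patch_size) % patch_size   # columns added at the right
    padded = (height + pad_h, width + pad_w)
    grid = (padded[0] // patch_size, padded[1] // patch_size)
    return (pad_h, pad_w), padded, grid


print(pad_to_patch_multiple(255, 255))  # ((1, 1), (256, 256), (16, 16))
print(pad_to_patch_multiple(224, 224))  # ((0, 0), (224, 224), (14, 14))
```

A 16x16 grid differs from the 14x14 grid the `vit_base_patch16_224` weights were trained with, which is why the same command also enables `dynamic_img_size=True` so the position embedding is interpolated to the new grid.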
Aug 25, 2023
- Many new models since last release
- FastViT - https://arxiv.org/abs/2303.14189
- MobileOne - https://arxiv.org/abs/2206.04040
- InceptionNeXt - https://arxiv.org/abs/2303.16900
- RepGhostNet - https://arxiv.org/abs/2211.06088 (thanks https://github.com/ChengpengChen)
- GhostNetV2 - https://arxiv.org/abs/2211.12905 (thanks https://github.com/yehuitang)
- EfficientViT (MSRA) - https://arxiv.org/abs/2305.07027 (thanks https://github.com/seefun)
- EfficientViT (MIT) - https://arxiv.org/abs/2205.14756 (thanks https://github.com/seefun)
- Add `--reparam` arg to `benchmark.py`, `onnx_export.py`, and `validate.py` to trigger layer reparameterization / fusion for models with any one of `reparameterize()`, `switch_to_deploy()` or `fuse()`
  - Including FastViT, MobileOne, RepGhostNet, EfficientViT (MSRA), RepViT, RepVGG, and LeViT
- Preparing 0.9.6 'back to school' release
Aug 11, 2023
- Swin, MaxViT, CoAtNet, and BEiT models support resizing of image/window size on creation with adaptation of pretrained weights
  - Example validation cmd to test w/ non-square resize `python validate.py /imagenet --model swin_base_patch4_window7_224.ms_in22k_ft_in1k --amp --amp-dtype bfloat16 --input-size 3 256 320 --model-kwargs window_size=8,10 img_size=256,320`
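The `window_size=8,10` choice in the non-square example can be sanity-checked with a little arithmetic. The sketch assumes the usual Swin layout of patch size 4 and four stages that each halve the resolution, and it only tests whether the window tiles every stage's feature grid evenly. The helper names are hypothetical, and the real models may handle non-divisible cases differently (for example by clamping or padding windows).

```python
def swin_stage_grids(img_size, patch_size=4, num_stages=4):
    """Feature-grid (H, W) at each stage, assuming 2x downsampling between stages."""
    h, w = img_size[0] // patch_size, img_size[1] // patch_size
    grids = []
    for _ in range(num_stages):
        grids.append((h, w))
        h, w = h // 2, w // 2
    return grids


def window_fits(img_size, window_size, patch_size=4, num_stages=4):
    """True if the window divides the feature grid evenly at every stage."""
    return all(
        h % window_size[0] == 0 and w % window_size[1] == 0
        for h, w in swin_stage_grids(img_size, patch_size, num_stages)
    )


print(swin_stage_grids((256, 320)))       # [(64, 80), (32, 40), (16, 20), (8, 10)]
print(window_fits((256, 320), (8, 10)))   # True
print(window_fits((256, 320), (7, 7)))    # False
print(window_fits((224, 224), (7, 7)))    # True
```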
- Python
Published by rwightman almost 3 years ago
instances - Release v0.9.5
Minor updates and bug fixes. New ResNeXT w/ highest ImageNet eval I'm aware of in the ResNe(X)t family (seresnextaa201d_32x8d.sw_in12k_ft_in1k_384)
Aug 3, 2023
- Add GluonCV weights for HRNet w18small and w18small_v2. Converted by SeeFun
- Fix `selecsls*` model naming regression
- Patch and position embedding for ViT/EVA works for bfloat16/float16 weights on load (or activations for on-the-fly resize)
- v0.9.5 release prep
July 27, 2023
- Added timm trained `seresnextaa201d_32x8d.sw_in12k_ft_in1k_384` weights (and `.sw_in12k` pretrain) with 87.3% top-1 on ImageNet-1k, best ImageNet ResNet family model I'm aware of.
- RepViT model and weights (https://arxiv.org/abs/2307.09283) added by wangao
- I-JEPA ViT feature weights (no classifier) added by SeeFun
- SAM-ViT (segment anything) feature weights (no classifier) added by SeeFun
- Add support for alternative feat extraction methods and -ve indices to EfficientNet
- Add NAdamW optimizer
- Misc fixes
- Python
Published by rwightman almost 3 years ago
instances - Release v0.9.2
- Fix _hub deprecation pass through import
- Python
Published by rwightman about 3 years ago
instances - Release v0.9.1
The first non pre-release since Oct 2022 with a long list of changes from 0.6.x releases...
May 12, 2023
- Fix Python 3.7 import error re Final[] typing annotation
May 11, 2023
- `timm` 0.9 released, transition from 0.8.xdev releases
May 10, 2023
- Hugging Face Hub downloading is now default, 1132 models on https://huggingface.co/timm, 1163 weights in `timm`
- DINOv2 vit feature backbone weights added thanks to Leng Yue
- FB MAE vit feature backbone weights added
- OpenCLIP DataComp-XL L/14 feat backbone weights added
- MetaFormer (poolformer-v2, caformer, convformer, updated poolformer (v1)) w/ weights added by Fredo Guan
- Experimental `get_intermediate_layers` function on vit/deit models for grabbing hidden states (inspired by DINO impl). This is WIP and may change significantly... feedback welcome.
- Model creation throws error if `pretrained=True` and no weights exist (instead of continuing with random initialization)
- Fix regression with inception / nasnet TF sourced weights with 1001 classes in original classifiers
- bitsandbytes (https://github.com/TimDettmers/bitsandbytes) optimizers added to factory, use `bnb` prefix, ie `bnbadam8bit`
- Misc cleanup and fixes
- Final testing before switching to a 0.9 and bringing `timm` out of pre-release state
April 27, 2023
- 97% of `timm` models uploaded to HF Hub and almost all updated to support multi-weight pretrained configs
- Minor cleanup and refactoring of another batch of models as multi-weight added. More `fused_attn` (`F.sdpa`) and `features_only` support, and torchscript fixes.
April 21, 2023
- Gradient accumulation support added to train script and tested (`--grad-accum-steps`), thanks Taeksang Kim
- More weights on HF Hub (cspnet, cait, volo, xcit, tresnet, hardcorenas, densenet, dpn, vovnet, xception_aligned)
- Added `--head-init-scale` and `--head-init-bias` to train.py to scale classifier head and set fixed bias for fine-tune
- Remove all InplaceABN (`inplace_abn`) use, replaced use in tresnet with standard BatchNorm (modified weights accordingly).
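Gradient accumulation, as enabled by `--grad-accum-steps`, generally means summing the gradients of several micro-batches before a single optimizer step. The toy example below uses a scalar linear model in plain Python and checks that accumulating micro-batch gradients, each scaled by `1 / accum_steps`, reproduces the full-batch gradient. It assumes equally sized micro-batches and a mean-reduced loss, and it is a sketch of the general technique rather than the train script's implementation.

```python
def grad_mse(w, xs, ys):
    """Gradient w.r.t. w of the mean squared error for the model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, accum_steps):
    """Sum per-micro-batch gradients, each scaled by 1 / accum_steps."""
    micro = len(xs) // accum_steps  # assumes len(xs) is divisible by accum_steps
    total = 0.0
    for step in range(accum_steps):
        chunk = slice(step * micro, (step + 1) * micro)
        total += grad_mse(w, xs[chunk], ys[chunk]) / accum_steps
    return total


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
full = grad_mse(0.5, xs, ys)
accum = accumulated_grad(0.5, xs, ys, accum_steps=2)
print(full, accum)  # both -22.5
```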
April 12, 2023
- Add ONNX export script, validate script, helpers that I've had kicking around for a long time. Tweak 'same' padding for better export w/ recent ONNX + pytorch.
- Refactor dropout args for vit and vit-like models, separate drop_rate into `drop_rate` (classifier dropout), `proj_drop_rate` (block mlp / out projections), `pos_drop_rate` (position embedding drop), `attn_drop_rate` (attention dropout). Also add patch dropout (FLIP) to vit and eva models.
- fused `F.scaled_dot_product_attention` support to more vit models, add env var (`TIMM_FUSED_ATTN`) to control, and config interface to enable/disable
- Add EVA-CLIP backbones w/ image tower weights, all the way up to 4B param 'enormous' model, and 336x336 OpenAI ViT mode that was missed.
April 5, 2023
- ALL ResNet models pushed to Hugging Face Hub with multi-weight support
  - All past `timm` trained weights added with recipe based tags to differentiate
  - All ResNet strikes back A1/A2/A3 (seed 0) and R50 example B/C1/C2/D weights available
  - Add torchvision v2 recipe weights to existing torchvision originals
  - See comparison table in https://huggingface.co/timm/seresnextaa101d_32x8d.sw_in12k_ft_in1k_288#model-comparison
- New ImageNet-12k + ImageNet-1k fine-tunes available for a few anti-aliased ResNet models
  - `resnetaa50d.sw_in12k_ft_in1k` - 81.7 @ 224, 82.6 @ 288
  - `resnetaa101d.sw_in12k_ft_in1k` - 83.5 @ 224, 84.1 @ 288
  - `seresnextaa101d_32x8d.sw_in12k_ft_in1k` - 86.0 @ 224, 86.5 @ 288
  - `seresnextaa101d_32x8d.sw_in12k_ft_in1k_288` - 86.5 @ 288, 86.7 @ 320
March 31, 2023
- Add first ConvNext-XXLarge CLIP -> IN-1k fine-tune and IN-12k intermediate fine-tunes for convnext-base/large CLIP models.
| model |top1 |top5 |img_size|param_count|gmacs |macts |
|--------------------------------------------------------|------|------|--------|-----------|------|------|
| convnext_xxlarge.clip_laion2b_soup_ft_in1k |88.612|98.704|256 |846.47 |198.09|124.45|
| convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_384 |88.312|98.578|384 |200.13 |101.11|126.74|
| convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_320 |87.968|98.47 |320 |200.13 |70.21 |88.02 |
| convnext_base.clip_laion2b_augreg_ft_in12k_in1k_384 |87.138|98.212|384 |88.59 |45.21 |84.49 |
| convnext_base.clip_laion2b_augreg_ft_in12k_in1k |86.344|97.97 |256 |88.59 |20.09 |37.55 |
- Add EVA-02 MIM pretrained and fine-tuned weights, push to HF hub and update model cards for all EVA models. First model over 90% top-1 (99% top-5)! Check out the original code & weights at https://github.com/baaivision/EVA for more details on their work blending MIM, CLIP w/ many model, dataset, and train recipe tweaks.
| model |top1 |top5 |param_count|img_size|
|----------------------------------------------------|------|------|-----------|--------|
| eva02_large_patch14_448.mim_m38m_ft_in22k_in1k |90.054|99.042|305.08 |448 |
| eva02_large_patch14_448.mim_in22k_ft_in22k_in1k |89.946|99.01 |305.08 |448 |
| eva_giant_patch14_560.m30m_ft_in22k_in1k |89.792|98.992|1014.45 |560 |
| eva02_large_patch14_448.mim_in22k_ft_in1k |89.626|98.954|305.08 |448 |
| eva02_large_patch14_448.mim_m38m_ft_in1k |89.57 |98.918|305.08 |448 |
| eva_giant_patch14_336.m30m_ft_in22k_in1k |89.56 |98.956|1013.01 |336 |
| eva_giant_patch14_336.clip_ft_in1k |89.466|98.82 |1013.01 |336 |
| eva_large_patch14_336.in22k_ft_in22k_in1k |89.214|98.854|304.53 |336 |
| eva_giant_patch14_224.clip_ft_in1k |88.882|98.678|1012.56 |224 |
| eva02_base_patch14_448.mim_in22k_ft_in22k_in1k |88.692|98.722|87.12 |448 |
| eva_large_patch14_336.in22k_ft_in1k |88.652|98.722|304.53 |336 |
| eva_large_patch14_196.in22k_ft_in22k_in1k |88.592|98.656|304.14 |196 |
| eva02_base_patch14_448.mim_in22k_ft_in1k |88.23 |98.564|87.12 |448 |
| eva_large_patch14_196.in22k_ft_in1k |87.934|98.504|304.14 |196 |
| eva02_small_patch14_336.mim_in22k_ft_in1k |85.74 |97.614|22.13 |336 |
| eva02_tiny_patch14_336.mim_in22k_ft_in1k |80.658|95.524|5.76 |336 |
- Multi-weight and HF hub for DeiT and MLP-Mixer based models
March 22, 2023
- More weights pushed to HF hub along with multi-weight support, including: `regnet.py`, `rexnet.py`, `byobnet.py`, `resnetv2.py`, `swin_transformer.py`, `swin_transformer_v2.py`, `swin_transformer_v2_cr.py`
- Swin Transformer models support feature extraction (NCHW feat maps for `swinv2_cr_*`, and NHWC for all others) and spatial embedding outputs.
- FocalNet (from https://github.com/microsoft/FocalNet) models and weights added with significant refactoring, feature extraction, no fixed resolution / sizing constraint
- RegNet weights increased with HF hub push, SWAG, SEER, and torchvision v2 weights. SEER is pretty poor wrt performance for model size, but possibly useful.
- More ImageNet-12k pretrained and 1k fine-tuned `timm` weights:
  - `rexnetr_200.sw_in12k_ft_in1k` - 82.6 @ 224, 83.2 @ 288
  - `rexnetr_300.sw_in12k_ft_in1k` - 84.0 @ 224, 84.5 @ 288
  - `regnety_120.sw_in12k_ft_in1k` - 85.0 @ 224, 85.4 @ 288
  - `regnety_160.lion_in12k_ft_in1k` - 85.6 @ 224, 86.0 @ 288
  - `regnety_160.sw_in12k_ft_in1k` - 85.6 @ 224, 86.0 @ 288 (compare to SWAG PT + 1k FT this is same BUT much lower res, blows SEER FT away)
- Model name deprecation + remapping functionality added (a milestone for bringing 0.8.x out of pre-release). Mappings being added...
- Minor bug fixes and improvements.
Feb 26, 2023
- Add ConvNeXt-XXLarge CLIP pretrained image tower weights for fine-tune & features (fine-tuning TBD) -- see model card
- Update `convnext_xxlarge` default LayerNorm eps to 1e-5 (for CLIP weights, improved stability)
- 0.8.15dev0
Feb 20, 2023
- Add 320x320 `convnext_large_mlp.clip_laion2b_ft_320` and `convnext_large_mlp.clip_laion2b_ft_soup_320` CLIP image tower weights for features & fine-tune
- 0.8.13dev0 pypi release for latest changes w/ move to huggingface org
Feb 16, 2023
- `safetensor` checkpoint support added
- Add ideas from 'Scaling Vision Transformers to 22 B. Params' (https://arxiv.org/abs/2302.05442) -- qk norm, RmsNorm, parallel block
- Add `F.scaled_dot_product_attention` support (PyTorch 2.0 only) to `vit_*`, `vit_relpos*`, `coatnet` / `maxxvit` (to start)
- Lion optimizer (w/ multi-tensor option) added (https://arxiv.org/abs/2302.06675)
- gradient checkpointing works with `features_only=True`
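For context on the Lion optimizer mentioned above, the update rule from the linked paper takes the sign of an interpolation between momentum and gradient, so every parameter moves by the same magnitude regardless of gradient scale. The scalar sketch below follows that published rule in plain Python. It is not the `timm` optimizer class, and the function names and the learning rate used in the example are illustrative choices.

```python
def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def lion_step(param, grad, momentum, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    """One scalar Lion step: sign of interpolated momentum, decoupled weight decay."""
    update = sign(beta1 * momentum + (1.0 - beta1) * grad)
    new_param = param - lr * (update + weight_decay * param)
    new_momentum = beta2 * momentum + (1.0 - beta2) * grad
    return new_param, new_momentum


p_small, m_small = lion_step(param=1.0, grad=0.5, momentum=0.0, lr=0.1)
p_large, m_large = lion_step(param=1.0, grad=50.0, momentum=0.0, lr=0.1)
print(p_small, p_large)  # same parameter step despite a 100x larger gradient
print(m_small, m_large)  # momentum still tracks the gradient magnitude
```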
Feb 7, 2023
- New inference benchmark numbers added in results folder.
- Add convnext LAION CLIP trained weights and initial set of in1k fine-tunes
  - `convnext_base.clip_laion2b_augreg_ft_in1k` - 86.2% @ 256x256
  - `convnext_base.clip_laiona_augreg_ft_in1k_384` - 86.5% @ 384x384
  - `convnext_large_mlp.clip_laion2b_augreg_ft_in1k` - 87.3% @ 256x256
  - `convnext_large_mlp.clip_laion2b_augreg_ft_in1k_384` - 87.9% @ 384x384
- Add DaViT models. Supports `features_only=True`. Adapted from https://github.com/dingmyu/davit by Fredo.
- Use a common NormMlpClassifierHead across MaxViT, ConvNeXt, DaViT
- Add EfficientFormer-V2 model, update EfficientFormer, and refactor LeViT (closely related architectures). Weights on HF hub.
  - New EfficientFormer-V2 arch, significant refactor from original at (https://github.com/snap-research/EfficientFormer). Supports `features_only=True`.
  - Minor updates to EfficientFormer.
  - Refactor LeViT models to stages, add `features_only=True` support to new `conv` variants, weight remap required.
- Move ImageNet meta-data (synsets, indices) from `/results` to `timm/data/_info`.
- Add ImageNetInfo / DatasetInfo classes to provide labelling for various ImageNet classifier layouts in `timm`
  - Update `inference.py` to use, try: `python inference.py /folder/to/images --model convnext_small.in12k --label-type detail --topk 5`
- Ready for 0.8.10 pypi pre-release (final testing).
Jan 20, 2023
- Add two convnext 12k -> 1k fine-tunes at 384x384
  - `convnext_tiny.in12k_ft_in1k_384` - 85.1 @ 384
  - `convnext_small.in12k_ft_in1k_384` - 86.2 @ 384
- Push all MaxxViT weights to HF hub, and add new ImageNet-12k -> 1k fine-tunes for `rw` base MaxViT and CoAtNet 1/2 models
| model | top1 | top5 | samples / sec | Params (M) | GMAC | Act (M) |
|---|---:|---:|---:|---:|---:|---:|
| maxvit_xlarge_tf_512.in21k_ft_in1k | 88.53 | 98.64 | 21.76 | 475.77 | 534.14 | 1413.22 |
| maxvit_xlarge_tf_384.in21k_ft_in1k | 88.32 | 98.54 | 42.53 | 475.32 | 292.78 | 668.76 |
| maxvit_base_tf_512.in21k_ft_in1k | 88.20 | 98.53 | 50.87 | 119.88 | 138.02 | 703.99 |
| maxvit_large_tf_512.in21k_ft_in1k | 88.04 | 98.40 | 36.42 | 212.33 | 244.75 | 942.15 |
| maxvit_large_tf_384.in21k_ft_in1k | 87.98 | 98.56 | 71.75 | 212.03 | 132.55 | 445.84 |
| maxvit_base_tf_384.in21k_ft_in1k | 87.92 | 98.54 | 104.71 | 119.65 | 73.80 | 332.90 |
| maxvit_rmlp_base_rw_384.sw_in12k_ft_in1k | 87.81 | 98.37 | 106.55 | 116.14 | 70.97 | 318.95 |
| maxxvitv2_rmlp_base_rw_384.sw_in12k_ft_in1k | 87.47 | 98.37 | 149.49 | 116.09 | 72.98 | 213.74 |
| coatnet_rmlp_2_rw_384.sw_in12k_ft_in1k | 87.39 | 98.31 | 160.80 | 73.88 | 47.69 | 209.43 |
| maxvit_rmlp_base_rw_224.sw_in12k_ft_in1k | 86.89 | 98.02 | 375.86 | 116.14 | 23.15 | 92.64 |
| maxxvitv2_rmlp_base_rw_224.sw_in12k_ft_in1k | 86.64 | 98.02 | 501.03 | 116.09 | 24.20 | 62.77 |
| maxvit_base_tf_512.in1k | 86.60 | 97.92 | 50.75 | 119.88 | 138.02 | 703.99 |
| coatnet_2_rw_224.sw_in12k_ft_in1k | 86.57 | 97.89 | 631.88 | 73.87 | 15.09 | 49.22 |
| maxvit_large_tf_512.in1k | 86.52 | 97.88 | 36.04 | 212.33 | 244.75 | 942.15 |
| coatnet_rmlp_2_rw_224.sw_in12k_ft_in1k | 86.49 | 97.90 | 620.58 | 73.88 | 15.18 | 54.78 |
| maxvit_base_tf_384.in1k | 86.29 | 97.80 | 101.09 | 119.65 | 73.80 | 332.90 |
| maxvit_large_tf_384.in1k | 86.23 | 97.69 | 70.56 | 212.03 | 132.55 | 445.84 |
| maxvit_small_tf_512.in1k | 86.10 | 97.76 | 88.63 | 69.13 | 67.26 | 383.77 |
| maxvit_tiny_tf_512.in1k | 85.67 | 97.58 | 144.25 | 31.05 | 33.49 | 257.59 |
| maxvit_small_tf_384.in1k | 85.54 | 97.46 | 188.35 | 69.02 | 35.87 | 183.65 |
| maxvit_tiny_tf_384.in1k | 85.11 | 97.38 | 293.46 | 30.98 | 17.53 | 123.42 |
| maxvit_large_tf_224.in1k | 84.93 | 96.97 | 247.71 | 211.79 | 43.68 | 127.35 |
| coatnet_rmlp_1_rw2_224.sw_in12k_ft_in1k | 84.90 | 96.96 | 1025.45 | 41.72 | 8.11 | 40.13 |
| maxvit_base_tf_224.in1k | 84.85 | 96.99 | 358.25 | 119.47 | 24.04 | 95.01 |
| maxxvit_rmlp_small_rw_256.sw_in1k | 84.63 | 97.06 | 575.53 | 66.01 | 14.67 | 58.38 |
| coatnet_rmlp_2_rw_224.sw_in1k | 84.61 | 96.74 | 625.81 | 73.88 | 15.18 | 54.78 |
| maxvit_rmlp_small_rw_224.sw_in1k | 84.49 | 96.76 | 693.82 | 64.90 | 10.75 | 49.30 |
| maxvit_small_tf_224.in1k | 84.43 | 96.83 | 647.96 | 68.93 | 11.66 | 53.17 |
| maxvit_rmlp_tiny_rw_256.sw_in1k | 84.23 | 96.78 | 807.21 | 29.15 | 6.77 | 46.92 |
| coatnet_1_rw_224.sw_in1k | 83.62 | 96.38 | 989.59 | 41.72 | 8.04 | 34.60 |
| maxvit_tiny_rw_224.sw_in1k | 83.50 | 96.50 | 1100.53 | 29.06 | 5.11 | 33.11 |
| maxvit_tiny_tf_224.in1k | 83.41 | 96.59 | 1004.94 | 30.92 | 5.60 | 35.78 |
| coatnet_rmlp_1_rw_224.sw_in1k | 83.36 | 96.45 | 1093.03 | 41.69 | 7.85 | 35.47 |
| maxxvitv2_nano_rw_256.sw_in1k | 83.11 | 96.33 | 1276.88 | 23.70 | 6.26 | 23.05 |
| maxxvit_rmlp_nano_rw_256.sw_in1k | 83.03 | 96.34 | 1341.24 | 16.78 | 4.37 | 26.05 |
| maxvit_rmlp_nano_rw_256.sw_in1k | 82.96 | 96.26 | 1283.24 | 15.50 | 4.47 | 31.92 |
| maxvit_nano_rw_256.sw_in1k | 82.93 | 96.23 | 1218.17 | 15.45 | 4.46 | 30.28 |
| coatnet_bn_0_rw_224.sw_in1k | 82.39 | 96.19 | 1600.14 | 27.44 | 4.67 | 22.04 |
| coatnet_0_rw_224.sw_in1k | 82.39 | 95.84 | 1831.21 | 27.44 | 4.43 | 18.73 |
| coatnet_rmlp_nano_rw_224.sw_in1k | 82.05 | 95.87 | 2109.09 | 15.15 | 2.62 | 20.34 |
| coatnext_nano_rw_224.sw_in1k | 81.95 | 95.92 | 2525.52 | 14.70 | 2.47 | 12.80 |
| coatnet_nano_rw_224.sw_in1k | 81.70 | 95.64 | 2344.52 | 15.14 | 2.41 | 15.41 |
| maxvit_rmlp_pico_rw_256.sw_in1k | 80.53 | 95.21 | 1594.71 | 7.52 | 1.85 | 24.86 |
Jan 11, 2023
- Update ConvNeXt ImageNet-12k pretrain series w/ two new fine-tuned weights (and pre FT `.in12k` tags)
  - `convnext_nano.in12k_ft_in1k` - 82.3 @ 224, 82.9 @ 288 (previously released)
  - `convnext_tiny.in12k_ft_in1k` - 84.2 @ 224, 84.5 @ 288
  - `convnext_small.in12k_ft_in1k` - 85.2 @ 224, 85.3 @ 288
Jan 6, 2023
- Finally got around to adding `--model-kwargs` and `--opt-kwargs` to scripts to pass through rare args directly to model classes from cmd line
  - `train.py /imagenet --model resnet50 --amp --model-kwargs output_stride=16 act_layer=silu`
  - `train.py /imagenet --model vit_base_patch16_clip_224 --img-size 240 --amp --model-kwargs img_size=240 patch_size=12`
- Cleanup some popular models to better support arg passthrough / merge with model configs, more to go.
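As a rough illustration of how `key=value` pass-through arguments like the `--model-kwargs` examples above could be collected into a dict, here is a minimal sketch using only the standard library. The `KeyValueArgs` class is a hypothetical helper written for this example; it is not claimed to be how the `timm` scripts implement the option.

```python
import argparse
import ast


class KeyValueArgs(argparse.Action):
    """Collect `key=value` tokens into a dict (hypothetical helper, illustration only)."""

    def __call__(self, parser, namespace, values, option_string=None):
        kwargs = {}
        for item in values:
            key, _, raw = item.partition("=")
            try:
                # Numbers, booleans, tuples, etc. are read as Python literals.
                kwargs[key] = ast.literal_eval(raw)
            except (ValueError, SyntaxError):
                # Anything that is not a literal (e.g. `silu`) stays a string.
                kwargs[key] = raw
        setattr(namespace, self.dest, kwargs)


parser = argparse.ArgumentParser()
parser.add_argument("--model", default="resnet50")
parser.add_argument("--model-kwargs", nargs="*", default={}, action=KeyValueArgs)

# Same tokens as the first command-line example above.
args = parser.parse_args(
    ["--model", "resnet50", "--model-kwargs", "output_stride=16", "act_layer=silu"]
)
print(args.model_kwargs)
```

The resulting dict could then be unpacked into a model constructor with `**args.model_kwargs`; how string values such as `silu` get resolved to layers is left to the receiving code.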
Jan 5, 2023
- ConvNeXt-V2 models and weights added to existing `convnext.py`
  - Paper: ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders
  - Reference impl: https://github.com/facebookresearch/ConvNeXt-V2 (NOTE: weights currently CC-BY-NC)
Dec 23, 2022 🎄☃
- Add FlexiViT models and weights from https://github.com/google-research/big_vision (check out paper at https://arxiv.org/abs/2212.08013)
- NOTE currently resizing is static on model creation, on-the-fly dynamic / train patch size sampling is a WIP
- Many more models updated to multi-weight and downloadable via HF hub now (convnext, efficientnet, mobilenet, vision_transformer*, beit)
- More model pretrained tag and adjustments, some model names changed (working on deprecation translations, consider main branch DEV branch right now, use 0.6.x for stable use)
- More ImageNet-12k (subset of 22k) pretrain models popping up:
  - `efficientnet_b5.in12k_ft_in1k` - 85.9 @ 448x448
  - `vit_medium_patch16_gap_384.in12k_ft_in1k` - 85.5 @ 384x384
  - `vit_medium_patch16_gap_256.in12k_ft_in1k` - 84.5 @ 256x256
  - `convnext_nano.in12k_ft_in1k` - 82.9 @ 288x288
Dec 8, 2022
- Add 'EVA l' to `vision_transformer.py`, MAE style ViT-L/14 MIM pretrain w/ EVA-CLIP targets, FT on ImageNet-1k (w/ ImageNet-22k intermediate for some)
  - original source: https://github.com/baaivision/EVA

| model | top1 | param_count | gmac | macts | hub |
|:---|---:|---:|---:|---:|:---|
| eva_large_patch14_336.in22k_ft_in22k_in1k | 89.2 | 304.5 | 191.1 | 270.2 | link |
| eva_large_patch14_336.in22k_ft_in1k | 88.7 | 304.5 | 191.1 | 270.2 | link |
| eva_large_patch14_196.in22k_ft_in22k_in1k | 88.6 | 304.1 | 61.6 | 63.5 | link |
| eva_large_patch14_196.in22k_ft_in1k | 87.9 | 304.1 | 61.6 | 63.5 | link |
Dec 6, 2022
- Add 'EVA g', BEiT style ViT-g/14 model weights w/ both MIM pretrain and CLIP pretrain to `beit.py`.
  - original source: https://github.com/baaivision/EVA
  - paper: https://arxiv.org/abs/2211.07636

| model | top1 | param_count | gmac | macts | hub |
|:---|---:|---:|---:|---:|:---|
| eva_giant_patch14_560.m30m_ft_in22k_in1k | 89.8 | 1014.4 | 1906.8 | 2577.2 | link |
| eva_giant_patch14_336.m30m_ft_in22k_in1k | 89.6 | 1013 | 620.6 | 550.7 | link |
| eva_giant_patch14_336.clip_ft_in1k | 89.4 | 1013 | 620.6 | 550.7 | link |
| eva_giant_patch14_224.clip_ft_in1k | 89.1 | 1012.6 | 267.2 | 192.6 | link |
Dec 5, 2022
- Pre-release (`0.8.0dev0`) of multi-weight support (`model_arch.pretrained_tag`). Install with `pip install --pre timm`
  - vision_transformer, maxvit, convnext are the first three model impl w/ support
  - model names are changing with this (previous _21k, etc. fn will merge), still sorting out deprecation handling
  - bugs are likely, but I need feedback so please try it out
  - if stability is needed, please use 0.6.x pypi releases or clone from 0.6.x branch
- Support for PyTorch 2.0 compile is added in train/validate/inference/benchmark, use `--torchcompile` argument
- Inference script allows more control over output, select k for top-class index + prob json, csv or parquet output
- Add a full set of fine-tuned CLIP image tower weights from both LAION-2B and original OpenAI CLIP models
| model | top1 | param_count | gmac | macts | hub |
|:---|---:|---:|---:|---:|:---|
| vit_huge_patch14_clip_336.laion2b_ft_in12k_in1k | 88.6 | 632.5 | 391 | 407.5 | link |
| vit_large_patch14_clip_336.openai_ft_in12k_in1k | 88.3 | 304.5 | 191.1 | 270.2 | link |
| vit_huge_patch14_clip_224.laion2b_ft_in12k_in1k | 88.2 | 632 | 167.4 | 139.4 | link |
| vit_large_patch14_clip_336.laion2b_ft_in12k_in1k | 88.2 | 304.5 | 191.1 | 270.2 | link |
| vit_large_patch14_clip_224.openai_ft_in12k_in1k | 88.2 | 304.2 | 81.1 | 88.8 | link |
| vit_large_patch14_clip_224.laion2b_ft_in12k_in1k | 87.9 | 304.2 | 81.1 | 88.8 | link |
| vit_large_patch14_clip_224.openai_ft_in1k | 87.9 | 304.2 | 81.1 | 88.8 | link |
| vit_large_patch14_clip_336.laion2b_ft_in1k | 87.9 | 304.5 | 191.1 | 270.2 | link |
| vit_huge_patch14_clip_224.laion2b_ft_in1k | 87.6 | 632 | 167.4 | 139.4 | link |
| vit_large_patch14_clip_224.laion2b_ft_in1k | 87.3 | 304.2 | 81.1 | 88.8 | link |
| vit_base_patch16_clip_384.laion2b_ft_in12k_in1k | 87.2 | 86.9 | 55.5 | 101.6 | link |
| vit_base_patch16_clip_384.openai_ft_in12k_in1k | 87 | 86.9 | 55.5 | 101.6 | link |
| vit_base_patch16_clip_384.laion2b_ft_in1k | 86.6 | 86.9 | 55.5 | 101.6 | link |
| vit_base_patch16_clip_384.openai_ft_in1k | 86.2 | 86.9 | 55.5 | 101.6 | link |
| vit_base_patch16_clip_224.laion2b_ft_in12k_in1k | 86.2 | 86.6 | 17.6 | 23.9 | link |
| vit_base_patch16_clip_224.openai_ft_in12k_in1k | 85.9 | 86.6 | 17.6 | 23.9 | link |
| vit_base_patch32_clip_448.laion2b_ft_in12k_in1k | 85.8 | 88.3 | 17.9 | 23.9 | link |
| vit_base_patch16_clip_224.laion2b_ft_in1k | 85.5 | 86.6 | 17.6 | 23.9 | link |
| vit_base_patch32_clip_384.laion2b_ft_in12k_in1k | 85.4 | 88.3 | 13.1 | 16.5 | link |
| vit_base_patch16_clip_224.openai_ft_in1k | 85.3 | 86.6 | 17.6 | 23.9 | link |
| vit_base_patch32_clip_384.openai_ft_in12k_in1k | 85.2 | 88.3 | 13.1 | 16.5 | link |
| vit_base_patch32_clip_224.laion2b_ft_in12k_in1k | 83.3 | 88.2 | 4.4 | 5 | link |
| vit_base_patch32_clip_224.laion2b_ft_in1k | 82.6 | 88.2 | 4.4 | 5 | link |
| vit_base_patch32_clip_224.openai_ft_in1k | 81.9 | 88.2 | 4.4 | 5 | link |
- Port of MaxViT Tensorflow Weights from official impl at https://github.com/google-research/maxvit
  - There were larger than expected drops for the upscaled 384/512 in21k fine-tune weights, possibly a missing detail, but the 21k FT did seem sensitive to small preprocessing differences
| model | top1 | param_count | gmac | macts | hub |
|:---|---:|---:|---:|---:|:---|
| maxvit_xlarge_tf_512.in21k_ft_in1k | 88.5 | 475.8 | 534.1 | 1413.2 | link |
| maxvit_xlarge_tf_384.in21k_ft_in1k | 88.3 | 475.3 | 292.8 | 668.8 | link |
| maxvit_base_tf_512.in21k_ft_in1k | 88.2 | 119.9 | 138 | 704 | link |
| maxvit_large_tf_512.in21k_ft_in1k | 88 | 212.3 | 244.8 | 942.2 | link |
| maxvit_large_tf_384.in21k_ft_in1k | 88 | 212 | 132.6 | 445.8 | link |
| maxvit_base_tf_384.in21k_ft_in1k | 87.9 | 119.6 | 73.8 | 332.9 | link |
| maxvit_base_tf_512.in1k | 86.6 | 119.9 | 138 | 704 | link |
| maxvit_large_tf_512.in1k | 86.5 | 212.3 | 244.8 | 942.2 | link |
| maxvit_base_tf_384.in1k | 86.3 | 119.6 | 73.8 | 332.9 | link |
| maxvit_large_tf_384.in1k | 86.2 | 212 | 132.6 | 445.8 | link |
| maxvit_small_tf_512.in1k | 86.1 | 69.1 | 67.3 | 383.8 | link |
| maxvit_tiny_tf_512.in1k | 85.7 | 31 | 33.5 | 257.6 | link |
| maxvit_small_tf_384.in1k | 85.5 | 69 | 35.9 | 183.6 | link |
| maxvit_tiny_tf_384.in1k | 85.1 | 31 | 17.5 | 123.4 | link |
| maxvit_large_tf_224.in1k | 84.9 | 211.8 | 43.7 | 127.4 | link |
| maxvit_base_tf_224.in1k | 84.9 | 119.5 | 24 | 95 | link |
| maxvit_small_tf_224.in1k | 84.4 | 68.9 | 11.7 | 53.2 | link |
| maxvit_tiny_tf_224.in1k | 83.4 | 30.9 | 5.6 | 35.8 | link |
Oct 15, 2022
- Train and validation script enhancements
- Non-GPU (ie CPU) device support
- SLURM compatibility for train script
- HF datasets support (via ReaderHfds)
- TFDS/WDS dataloading improvements (sample padding/wrap for distributed use fixed wrt sample count estimate)
- in_chans !=3 support for scripts / loader
- Adan optimizer
- Can enable per-step LR scheduling via args
- Dataset 'parsers' renamed to 'readers', more descriptive of purpose
- AMP args changed, APEX via `--amp-impl apex`, bfloat16 supported via `--amp-dtype bfloat16`
- main branch switched to 0.7.x version, 0.6x forked for stable release of weight only adds
- master -> main branch rename
- Python
Published by rwightman about 3 years ago
instances - Release v0.9.0
First non pre-release in a loooong while, changelog from 0.6.x below...
May 11, 2023
- `timm` 0.9 released, transition from 0.8.xdev releases
May 10, 2023
- Hugging Face Hub downloading is now default, 1132 models on https://huggingface.co/timm, 1163 weights in `timm`
- DINOv2 vit feature backbone weights added thanks to Leng Yue
- FB MAE vit feature backbone weights added
- OpenCLIP DataComp-XL L/14 feat backbone weights added
- MetaFormer (poolformer-v2, caformer, convformer, updated poolformer (v1)) w/ weights added by Fredo Guan
- Experimental `get_intermediate_layers` function on vit/deit models for grabbing hidden states (inspired by DINO impl). This is WIP and may change significantly... feedback welcome.
- Model creation throws error if `pretrained=True` and no weights exist (instead of continuing with random initialization)
- Fix regression with inception / nasnet TF sourced weights with 1001 classes in original classifiers
- bitsandbytes (https://github.com/TimDettmers/bitsandbytes) optimizers added to factory, use `bnb` prefix, ie `bnbadam8bit`
- Misc cleanup and fixes
- Final testing before switching to a 0.9 and bringing `timm` out of pre-release state
April 27, 2023
- 97% of `timm` models uploaded to HF Hub and almost all updated to support multi-weight pretrained configs
- Minor cleanup and refactoring of another batch of models as multi-weight added. More fused_attn (F.sdpa) and features_only support, and torchscript fixes.
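Multi-weight names in these notes follow an `architecture.pretrained_tag` pattern (for example `convnext_small.in12k`). The sketch below only illustrates that naming convention with plain string handling; `split_arch_and_tag` is a hypothetical helper for this example, not a function taken from `timm`.

```python
from typing import Tuple


def split_arch_and_tag(model_name: str, no_tag: str = "") -> Tuple[str, str]:
    """Split 'arch.tag' at the first dot; names without a dot get `no_tag`."""
    arch, sep, tag = model_name.partition(".")
    return arch, (tag if sep else no_tag)


# Names taken from this changelog.
print(split_arch_and_tag("convnext_small.in12k"))
print(split_arch_and_tag("regnety_160.sw_in12k_ft_in1k"))
# A bare architecture name carries no pretrained tag.
print(split_arch_and_tag("resnet50"))
```

Splitting at the first dot keeps the architecture name usable as a lookup key while the tag selects one of several weight sets for that architecture.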
April 21, 2023
- Gradient accumulation support added to train script and tested (`--grad-accum-steps`), thanks Taeksang Kim
- More weights on HF Hub (cspnet, cait, volo, xcit, tresnet, hardcorenas, densenet, dpn, vovnet, xception_aligned)
- Added `--head-init-scale` and `--head-init-bias` to train.py to scale classifier head and set fixed bias for fine-tune
- Remove all InplaceABN (`inplace_abn`) use, replaced use in tresnet with standard BatchNorm (modified weights accordingly).
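For readers unfamiliar with gradient accumulation, the arithmetic behind it can be sketched without any framework. The toy below uses a one-parameter model `y = w * x` with a mean-squared-error loss (both are assumptions chosen for the example) and checks that accumulating scaled micro-batch gradients reproduces the full-batch gradient. It is not the train script's code.

```python
def grad_mse(w, xs, ys):
    """d/dw of mean((w * x - y) ** 2) over the given samples."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5
accum_steps = 2                  # plays the role of a grad-accum-steps setting
micro = len(xs) // accum_steps   # equal-sized micro-batches

# Gradient computed on the whole batch at once.
full_grad = grad_mse(w, xs, ys)

# Gradient accumulated over micro-batches before a single optimizer step.
accum_grad = 0.0
for step in range(accum_steps):
    lo, hi = step * micro, (step + 1) * micro
    # Divide by accum_steps so the sum matches the full-batch mean.
    accum_grad += grad_mse(w, xs[lo:hi], ys[lo:hi]) / accum_steps

print(full_grad, accum_grad)
```

The equivalence holds here because the micro-batches are the same size; with uneven micro-batches the per-step scaling would need to be weighted by sample count.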
April 12, 2023
- Add ONNX export script, validate script, helpers that I've had kicking around for a long time. Tweak 'same' padding for better export w/ recent ONNX + pytorch.
- Refactor dropout args for vit and vit-like models, separate drop_rate into `drop_rate` (classifier dropout), `proj_drop_rate` (block mlp / out projections), `pos_drop_rate` (position embedding drop), `attn_drop_rate` (attention dropout). Also add patch dropout (FLIP) to vit and eva models.
- fused F.scaled_dot_product_attention support to more vit models, add env var (TIMM_FUSED_ATTN) to control, and config interface to enable/disable
- Add EVA-CLIP backbones w/ image tower weights, all the way up to 4B param 'enormous' model, and 336x336 OpenAI ViT model that was missed.
April 5, 2023
- ALL ResNet models pushed to Hugging Face Hub with multi-weight support
  - All past `timm` trained weights added with recipe based tags to differentiate
  - All ResNet strikes back A1/A2/A3 (seed 0) and R50 example B/C1/C2/D weights available
  - Add torchvision v2 recipe weights to existing torchvision originals
  - See comparison table in https://huggingface.co/timm/seresnextaa101d_32x8d.sw_in12k_ft_in1k_288#model-comparison
- New ImageNet-12k + ImageNet-1k fine-tunes available for a few anti-aliased ResNet models
  - `resnetaa50d.sw_in12k_ft_in1k` - 81.7 @ 224, 82.6 @ 288
  - `resnetaa101d.sw_in12k_ft_in1k` - 83.5 @ 224, 84.1 @ 288
  - `seresnextaa101d_32x8d.sw_in12k_ft_in1k` - 86.0 @ 224, 86.5 @ 288
  - `seresnextaa101d_32x8d.sw_in12k_ft_in1k_288` - 86.5 @ 288, 86.7 @ 320
March 31, 2023
- Add first ConvNext-XXLarge CLIP -> IN-1k fine-tune and IN-12k intermediate fine-tunes for convnext-base/large CLIP models.
| model | top1 | top5 | img_size | param_count | gmacs | macts |
|---|---|---|---|---|---|---|
| convnext_xxlarge.clip_laion2b_soup_ft_in1k | 88.612 | 98.704 | 256 | 846.47 | 198.09 | 124.45 |
| convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_384 | 88.312 | 98.578 | 384 | 200.13 | 101.11 | 126.74 |
| convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_320 | 87.968 | 98.47 | 320 | 200.13 | 70.21 | 88.02 |
| convnext_base.clip_laion2b_augreg_ft_in12k_in1k_384 | 87.138 | 98.212 | 384 | 88.59 | 45.21 | 84.49 |
| convnext_base.clip_laion2b_augreg_ft_in12k_in1k | 86.344 | 97.97 | 256 | 88.59 | 20.09 | 37.55 |
- Add EVA-02 MIM pretrained and fine-tuned weights, push to HF hub and update model cards for all EVA models. First model over 90% top-1 (99% top-5)! Check out the original code & weights at https://github.com/baaivision/EVA for more details on their work blending MIM, CLIP w/ many model, dataset, and train recipe tweaks.
| model | top1 | top5 | param_count | img_size |
|---|---|---|---|---|
| eva02_large_patch14_448.mim_m38m_ft_in22k_in1k | 90.054 | 99.042 | 305.08 | 448 |
| eva02_large_patch14_448.mim_in22k_ft_in22k_in1k | 89.946 | 99.01 | 305.08 | 448 |
| eva_giant_patch14_560.m30m_ft_in22k_in1k | 89.792 | 98.992 | 1014.45 | 560 |
| eva02_large_patch14_448.mim_in22k_ft_in1k | 89.626 | 98.954 | 305.08 | 448 |
| eva02_large_patch14_448.mim_m38m_ft_in1k | 89.57 | 98.918 | 305.08 | 448 |
| eva_giant_patch14_336.m30m_ft_in22k_in1k | 89.56 | 98.956 | 1013.01 | 336 |
| eva_giant_patch14_336.clip_ft_in1k | 89.466 | 98.82 | 1013.01 | 336 |
| eva_large_patch14_336.in22k_ft_in22k_in1k | 89.214 | 98.854 | 304.53 | 336 |
| eva_giant_patch14_224.clip_ft_in1k | 88.882 | 98.678 | 1012.56 | 224 |
| eva02_base_patch14_448.mim_in22k_ft_in22k_in1k | 88.692 | 98.722 | 87.12 | 448 |
| eva_large_patch14_336.in22k_ft_in1k | 88.652 | 98.722 | 304.53 | 336 |
| eva_large_patch14_196.in22k_ft_in22k_in1k | 88.592 | 98.656 | 304.14 | 196 |
| eva02_base_patch14_448.mim_in22k_ft_in1k | 88.23 | 98.564 | 87.12 | 448 |
| eva_large_patch14_196.in22k_ft_in1k | 87.934 | 98.504 | 304.14 | 196 |
| eva02_small_patch14_336.mim_in22k_ft_in1k | 85.74 | 97.614 | 22.13 | 336 |
| eva02_tiny_patch14_336.mim_in22k_ft_in1k | 80.658 | 95.524 | 5.76 | 336 |
- Multi-weight and HF hub for DeiT and MLP-Mixer based models
March 22, 2023
- More weights pushed to HF hub along with multi-weight support, including: `regnet.py`, `rexnet.py`, `byobnet.py`, `resnetv2.py`, `swin_transformer.py`, `swin_transformer_v2.py`, `swin_transformer_v2_cr.py`
- Swin Transformer models support feature extraction (NCHW feat maps for `swinv2_cr_*`, and NHWC for all others) and spatial embedding outputs.
- FocalNet (from https://github.com/microsoft/FocalNet) models and weights added with significant refactoring, feature extraction, no fixed resolution / sizing constraint
- RegNet weights increased with HF hub push, SWAG, SEER, and torchvision v2 weights. SEER is pretty poor with respect to performance for model size, but possibly useful.
- More ImageNet-12k pretrained and 1k fine-tuned `timm` weights:
  - `rexnetr_200.sw_in12k_ft_in1k` - 82.6 @ 224, 83.2 @ 288
  - `rexnetr_300.sw_in12k_ft_in1k` - 84.0 @ 224, 84.5 @ 288
  - `regnety_120.sw_in12k_ft_in1k` - 85.0 @ 224, 85.4 @ 288
  - `regnety_160.lion_in12k_ft_in1k` - 85.6 @ 224, 86.0 @ 288
  - `regnety_160.sw_in12k_ft_in1k` - 85.6 @ 224, 86.0 @ 288 (compare to SWAG PT + 1k FT this is same BUT much lower res, blows SEER FT away)
- Model name deprecation + remapping functionality added (a milestone for bringing 0.8.x out of pre-release). Mappings being added...
- Minor bug fixes and improvements.
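The name deprecation and remapping idea mentioned above can be pictured as a lookup table from old names to new `arch.tag` names plus a warning. The sketch below is a generic illustration under that assumption; the table entries and the `resolve_model_name` helper are hypothetical and are not the mappings or code shipped in `timm`.

```python
import warnings

# Hypothetical old -> new name table, for illustration only.
_DEPRECATED_NAMES = {
    "example_net_in21k": "example_net.in21k",
}


def resolve_model_name(name: str) -> str:
    """Return the current name for `name`, warning if it was a deprecated alias."""
    new_name = _DEPRECATED_NAMES.get(name)
    if new_name is None:
        return name
    warnings.warn(
        f"Model name '{name}' is deprecated, use '{new_name}' instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_name


print(resolve_model_name("example_net.in21k"))  # already current, returned as-is
print(resolve_model_name("example_net_in21k"))  # remapped, emits a DeprecationWarning
```

Keeping the remap in one table lets old scripts continue to run while nudging users toward the new names.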
Feb 26, 2023
- Add ConvNeXt-XXLarge CLIP pretrained image tower weights for fine-tune & features (fine-tuning TBD) -- see model card
- Update `convnext_xxlarge` default LayerNorm eps to 1e-5 (for CLIP weights, improved stability)
- 0.8.15dev0
Feb 20, 2023
- Add 320x320 `convnext_large_mlp.clip_laion2b_ft_320` and `convnext_large_mlp.clip_laion2b_ft_soup_320` CLIP image tower weights for features & fine-tune
- 0.8.13dev0 pypi release for latest changes w/ move to huggingface org
Feb 16, 2023
- `safetensor` checkpoint support added
- Add ideas from 'Scaling Vision Transformers to 22 B. Params' (https://arxiv.org/abs/2302.05442) -- qk norm, RmsNorm, parallel block
- Add F.scaled_dot_product_attention support (PyTorch 2.0 only) to `vit`, `vit_relpos`, `coatnet` / `maxxvit` (to start)
- Lion optimizer (w/ multi-tensor option) added (https://arxiv.org/abs/2302.06675)
- gradient checkpointing works with `features_only=True`
Feb 7, 2023
- New inference benchmark numbers added in results folder.
- Add convnext LAION CLIP trained weights and initial set of in1k fine-tunes
  - `convnext_base.clip_laion2b_augreg_ft_in1k` - 86.2% @ 256x256
  - `convnext_base.clip_laiona_augreg_ft_in1k_384` - 86.5% @ 384x384
  - `convnext_large_mlp.clip_laion2b_augreg_ft_in1k` - 87.3% @ 256x256
  - `convnext_large_mlp.clip_laion2b_augreg_ft_in1k_384` - 87.9% @ 384x384
- Add DaViT models. Supports `features_only=True`. Adapted from https://github.com/dingmyu/davit by Fredo.
- Use a common NormMlpClassifierHead across MaxViT, ConvNeXt, DaViT
- Add EfficientFormer-V2 model, update EfficientFormer, and refactor LeViT (closely related architectures). Weights on HF hub.
  - New EfficientFormer-V2 arch, significant refactor from original at (https://github.com/snap-research/EfficientFormer). Supports `features_only=True`.
  - Minor updates to EfficientFormer.
  - Refactor LeViT models to stages, add `features_only=True` support to new `conv` variants, weight remap required.
- Move ImageNet meta-data (synsets, indices) from `/results` to `timm/data/_info`.
- Add ImageNetInfo / DatasetInfo classes to provide labelling for various ImageNet classifier layouts in `timm`
  - Update `inference.py` to use, try: `python inference.py /folder/to/images --model convnext_small.in12k --label-type detail --topk 5`
- Ready for 0.8.10 pypi pre-release (final testing).
Jan 20, 2023
- Add two convnext 12k -> 1k fine-tunes at 384x384
  - `convnext_tiny.in12k_ft_in1k_384` - 85.1 @ 384
  - `convnext_small.in12k_ft_in1k_384` - 86.2 @ 384
- Push all MaxxViT weights to HF hub, and add new ImageNet-12k -> 1k fine-tunes for `rw` base MaxViT and CoAtNet 1/2 models
| model | top1 | top5 | samples / sec | Params (M) | GMAC | Act (M) |
|---|---:|---:|---:|---:|---:|---:|
| maxvit_xlarge_tf_512.in21k_ft_in1k | 88.53 | 98.64 | 21.76 | 475.77 | 534.14 | 1413.22 |
| maxvit_xlarge_tf_384.in21k_ft_in1k | 88.32 | 98.54 | 42.53 | 475.32 | 292.78 | 668.76 |
| maxvit_base_tf_512.in21k_ft_in1k | 88.20 | 98.53 | 50.87 | 119.88 | 138.02 | 703.99 |
| maxvit_large_tf_512.in21k_ft_in1k | 88.04 | 98.40 | 36.42 | 212.33 | 244.75 | 942.15 |
| maxvit_large_tf_384.in21k_ft_in1k | 87.98 | 98.56 | 71.75 | 212.03 | 132.55 | 445.84 |
| maxvit_base_tf_384.in21k_ft_in1k | 87.92 | 98.54 | 104.71 | 119.65 | 73.80 | 332.90 |
| maxvit_rmlp_base_rw_384.sw_in12k_ft_in1k | 87.81 | 98.37 | 106.55 | 116.14 | 70.97 | 318.95 |
| maxxvitv2_rmlp_base_rw_384.sw_in12k_ft_in1k | 87.47 | 98.37 | 149.49 | 116.09 | 72.98 | 213.74 |
| coatnet_rmlp_2_rw_384.sw_in12k_ft_in1k | 87.39 | 98.31 | 160.80 | 73.88 | 47.69 | 209.43 |
| maxvit_rmlp_base_rw_224.sw_in12k_ft_in1k | 86.89 | 98.02 | 375.86 | 116.14 | 23.15 | 92.64 |
| maxxvitv2_rmlp_base_rw_224.sw_in12k_ft_in1k | 86.64 | 98.02 | 501.03 | 116.09 | 24.20 | 62.77 |
| maxvit_base_tf_512.in1k | 86.60 | 97.92 | 50.75 | 119.88 | 138.02 | 703.99 |
| coatnet_2_rw_224.sw_in12k_ft_in1k | 86.57 | 97.89 | 631.88 | 73.87 | 15.09 | 49.22 |
| maxvit_large_tf_512.in1k | 86.52 | 97.88 | 36.04 | 212.33 | 244.75 | 942.15 |
| coatnet_rmlp_2_rw_224.sw_in12k_ft_in1k | 86.49 | 97.90 | 620.58 | 73.88 | 15.18 | 54.78 |
| maxvit_base_tf_384.in1k | 86.29 | 97.80 | 101.09 | 119.65 | 73.80 | 332.90 |
| maxvit_large_tf_384.in1k | 86.23 | 97.69 | 70.56 | 212.03 | 132.55 | 445.84 |
| maxvit_small_tf_512.in1k | 86.10 | 97.76 | 88.63 | 69.13 | 67.26 | 383.77 |
| maxvit_tiny_tf_512.in1k | 85.67 | 97.58 | 144.25 | 31.05 | 33.49 | 257.59 |
| maxvit_small_tf_384.in1k | 85.54 | 97.46 | 188.35 | 69.02 | 35.87 | 183.65 |
| maxvit_tiny_tf_384.in1k | 85.11 | 97.38 | 293.46 | 30.98 | 17.53 | 123.42 |
| maxvit_large_tf_224.in1k | 84.93 | 96.97 | 247.71 | 211.79 | 43.68 | 127.35 |
| coatnet_rmlp_1_rw2_224.sw_in12k_ft_in1k | 84.90 | 96.96 | 1025.45 | 41.72 | 8.11 | 40.13 |
| maxvit_base_tf_224.in1k | 84.85 | 96.99 | 358.25 | 119.47 | 24.04 | 95.01 |
| maxxvit_rmlp_small_rw_256.sw_in1k | 84.63 | 97.06 | 575.53 | 66.01 | 14.67 | 58.38 |
| coatnet_rmlp_2_rw_224.sw_in1k | 84.61 | 96.74 | 625.81 | 73.88 | 15.18 | 54.78 |
| maxvit_rmlp_small_rw_224.sw_in1k | 84.49 | 96.76 | 693.82 | 64.90 | 10.75 | 49.30 |
| maxvit_small_tf_224.in1k | 84.43 | 96.83 | 647.96 | 68.93 | 11.66 | 53.17 |
| maxvit_rmlp_tiny_rw_256.sw_in1k | 84.23 | 96.78 | 807.21 | 29.15 | 6.77 | 46.92 |
| coatnet_1_rw_224.sw_in1k | 83.62 | 96.38 | 989.59 | 41.72 | 8.04 | 34.60 |
| maxvit_tiny_rw_224.sw_in1k | 83.50 | 96.50 | 1100.53 | 29.06 | 5.11 | 33.11 |
| maxvit_tiny_tf_224.in1k | 83.41 | 96.59 | 1004.94 | 30.92 | 5.60 | 35.78 |
| coatnet_rmlp_1_rw_224.sw_in1k | 83.36 | 96.45 | 1093.03 | 41.69 | 7.85 | 35.47 |
| maxxvitv2_nano_rw_256.sw_in1k | 83.11 | 96.33 | 1276.88 | 23.70 | 6.26 | 23.05 |
| maxxvit_rmlp_nano_rw_256.sw_in1k | 83.03 | 96.34 | 1341.24 | 16.78 | 4.37 | 26.05 |
| maxvit_rmlp_nano_rw_256.sw_in1k | 82.96 | 96.26 | 1283.24 | 15.50 | 4.47 | 31.92 |
| maxvit_nano_rw_256.sw_in1k | 82.93 | 96.23 | 1218.17 | 15.45 | 4.46 | 30.28 |
| coatnet_bn_0_rw_224.sw_in1k | 82.39 | 96.19 | 1600.14 | 27.44 | 4.67 | 22.04 |
| coatnet_0_rw_224.sw_in1k | 82.39 | 95.84 | 1831.21 | 27.44 | 4.43 | 18.73 |
| coatnet_rmlp_nano_rw_224.sw_in1k | 82.05 | 95.87 | 2109.09 | 15.15 | 2.62 | 20.34 |
| coatnext_nano_rw_224.sw_in1k | 81.95 | 95.92 | 2525.52 | 14.70 | 2.47 | 12.80 |
| coatnet_nano_rw_224.sw_in1k | 81.70 | 95.64 | 2344.52 | 15.14 | 2.41 | 15.41 |
| maxvit_rmlp_pico_rw_256.sw_in1k | 80.53 | 95.21 | 1594.71 | 7.52 | 1.85 | 24.86 |
Jan 11, 2023
- Update ConvNeXt ImageNet-12k pretrain series w/ two new fine-tuned weights (and pre FT `.in12k` tags)
  - `convnext_nano.in12k_ft_in1k` - 82.3 @ 224, 82.9 @ 288 (previously released)
  - `convnext_tiny.in12k_ft_in1k` - 84.2 @ 224, 84.5 @ 288
  - `convnext_small.in12k_ft_in1k` - 85.2 @ 224, 85.3 @ 288
Jan 6, 2023
- Finally got around to adding `--model-kwargs` and `--opt-kwargs` to scripts to pass through rare args directly to model classes from cmd line
  - `train.py /imagenet --model resnet50 --amp --model-kwargs output_stride=16 act_layer=silu`
  - `train.py /imagenet --model vit_base_patch16_clip_224 --img-size 240 --amp --model-kwargs img_size=240 patch_size=12`
- Cleanup some popular models to better support arg passthrough / merge with model configs, more to go.
Jan 5, 2023
- ConvNeXt-V2 models and weights added to existing `convnext.py`
  - Paper: ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders
  - Reference impl: https://github.com/facebookresearch/ConvNeXt-V2 (NOTE: weights currently CC-BY-NC)
Dec 23, 2022 🎄☃
- Add FlexiViT models and weights from https://github.com/google-research/big_vision (check out paper at https://arxiv.org/abs/2212.08013)
- NOTE currently resizing is static on model creation, on-the-fly dynamic / train patch size sampling is a WIP
- Many more models updated to multi-weight and downloadable via HF hub now (convnext, efficientnet, mobilenet, vision_transformer*, beit)
- More model pretrained tag and adjustments, some model names changed (working on deprecation translations, consider main branch DEV branch right now, use 0.6.x for stable use)
- More ImageNet-12k (subset of 22k) pretrain models popping up:
  - `efficientnet_b5.in12k_ft_in1k` - 85.9 @ 448x448
  - `vit_medium_patch16_gap_384.in12k_ft_in1k` - 85.5 @ 384x384
  - `vit_medium_patch16_gap_256.in12k_ft_in1k` - 84.5 @ 256x256
  - `convnext_nano.in12k_ft_in1k` - 82.9 @ 288x288
Dec 8, 2022
- Add 'EVA l' to `vision_transformer.py`, MAE style ViT-L/14 MIM pretrain w/ EVA-CLIP targets, FT on ImageNet-1k (w/ ImageNet-22k intermediate for some)
  - original source: https://github.com/baaivision/EVA

| model | top1 | param_count | gmac | macts | hub |
|:---|---:|---:|---:|---:|:---|
| eva_large_patch14_336.in22k_ft_in22k_in1k | 89.2 | 304.5 | 191.1 | 270.2 | link |
| eva_large_patch14_336.in22k_ft_in1k | 88.7 | 304.5 | 191.1 | 270.2 | link |
| eva_large_patch14_196.in22k_ft_in22k_in1k | 88.6 | 304.1 | 61.6 | 63.5 | link |
| eva_large_patch14_196.in22k_ft_in1k | 87.9 | 304.1 | 61.6 | 63.5 | link |
Dec 6, 2022
- Add 'EVA g', BEiT style ViT-g/14 model weights w/ both MIM pretrain and CLIP pretrain to `beit.py`.
  - original source: https://github.com/baaivision/EVA
  - paper: https://arxiv.org/abs/2211.07636

| model | top1 | param_count | gmac | macts | hub |
|:---|---:|---:|---:|---:|:---|
| eva_giant_patch14_560.m30m_ft_in22k_in1k | 89.8 | 1014.4 | 1906.8 | 2577.2 | link |
| eva_giant_patch14_336.m30m_ft_in22k_in1k | 89.6 | 1013 | 620.6 | 550.7 | link |
| eva_giant_patch14_336.clip_ft_in1k | 89.4 | 1013 | 620.6 | 550.7 | link |
| eva_giant_patch14_224.clip_ft_in1k | 89.1 | 1012.6 | 267.2 | 192.6 | link |
Dec 5, 2022
- Pre-release (`0.8.0dev0`) of multi-weight support (`model_arch.pretrained_tag`). Install with `pip install --pre timm`
  - vision_transformer, maxvit, convnext are the first three model impl w/ support
  - model names are changing with this (previous _21k, etc. fn will merge), still sorting out deprecation handling
  - bugs are likely, but I need feedback so please try it out
  - if stability is needed, please use 0.6.x pypi releases or clone from 0.6.x branch
- Support for PyTorch 2.0 compile is added in train/validate/inference/benchmark, use `--torchcompile` argument
- Inference script allows more control over output, select k for top-class index + prob json, csv or parquet output
- Add a full set of fine-tuned CLIP image tower weights from both LAION-2B and original OpenAI CLIP models
| model | top1 | param_count | gmac | macts | hub |
|:---|---:|---:|---:|---:|:---|
| vit_huge_patch14_clip_336.laion2b_ft_in12k_in1k | 88.6 | 632.5 | 391 | 407.5 | link |
| vit_large_patch14_clip_336.openai_ft_in12k_in1k | 88.3 | 304.5 | 191.1 | 270.2 | link |
| vit_huge_patch14_clip_224.laion2b_ft_in12k_in1k | 88.2 | 632 | 167.4 | 139.4 | link |
| vit_large_patch14_clip_336.laion2b_ft_in12k_in1k | 88.2 | 304.5 | 191.1 | 270.2 | link |
| vit_large_patch14_clip_224.openai_ft_in12k_in1k | 88.2 | 304.2 | 81.1 | 88.8 | link |
| vit_large_patch14_clip_224.laion2b_ft_in12k_in1k | 87.9 | 304.2 | 81.1 | 88.8 | link |
| vit_large_patch14_clip_224.openai_ft_in1k | 87.9 | 304.2 | 81.1 | 88.8 | link |
| vit_large_patch14_clip_336.laion2b_ft_in1k | 87.9 | 304.5 | 191.1 | 270.2 | link |
| vit_huge_patch14_clip_224.laion2b_ft_in1k | 87.6 | 632 | 167.4 | 139.4 | link |
| vit_large_patch14_clip_224.laion2b_ft_in1k | 87.3 | 304.2 | 81.1 | 88.8 | link |
| vit_base_patch16_clip_384.laion2b_ft_in12k_in1k | 87.2 | 86.9 | 55.5 | 101.6 | link |
| vit_base_patch16_clip_384.openai_ft_in12k_in1k | 87 | 86.9 | 55.5 | 101.6 | link |
| vit_base_patch16_clip_384.laion2b_ft_in1k | 86.6 | 86.9 | 55.5 | 101.6 | link |
| vit_base_patch16_clip_384.openai_ft_in1k | 86.2 | 86.9 | 55.5 | 101.6 | link |
| vit_base_patch16_clip_224.laion2b_ft_in12k_in1k | 86.2 | 86.6 | 17.6 | 23.9 | link |
| vit_base_patch16_clip_224.openai_ft_in12k_in1k | 85.9 | 86.6 | 17.6 | 23.9 | link |
| vit_base_patch32_clip_448.laion2b_ft_in12k_in1k | 85.8 | 88.3 | 17.9 | 23.9 | link |
| vit_base_patch16_clip_224.laion2b_ft_in1k | 85.5 | 86.6 | 17.6 | 23.9 | link |
| vit_base_patch32_clip_384.laion2b_ft_in12k_in1k | 85.4 | 88.3 | 13.1 | 16.5 | link |
| vit_base_patch16_clip_224.openai_ft_in1k | 85.3 | 86.6 | 17.6 | 23.9 | link |
| vit_base_patch32_clip_384.openai_ft_in12k_in1k | 85.2 | 88.3 | 13.1 | 16.5 | link |
| vit_base_patch32_clip_224.laion2b_ft_in12k_in1k | 83.3 | 88.2 | 4.4 | 5 | link |
| vit_base_patch32_clip_224.laion2b_ft_in1k | 82.6 | 88.2 | 4.4 | 5 | link |
| vit_base_patch32_clip_224.openai_ft_in1k | 81.9 | 88.2 | 4.4 | 5 | link |
- Port of MaxViT Tensorflow Weights from official impl at https://github.com/google-research/maxvit
- There were larger than expected drops for the upscaled 384/512 in21k fine-tune weights. A detail may be missing, but the 21k FT did seem sensitive to small preprocessing differences.

| model | top1 | param_count | gmac | macts | hub |
|:---|---:|---:|---:|---:|:---|
| maxvit_xlarge_tf_512.in21k_ft_in1k | 88.5 | 475.8 | 534.1 | 1413.2 | link |
| maxvit_xlarge_tf_384.in21k_ft_in1k | 88.3 | 475.3 | 292.8 | 668.8 | link |
| maxvit_base_tf_512.in21k_ft_in1k | 88.2 | 119.9 | 138 | 704 | link |
| maxvit_large_tf_512.in21k_ft_in1k | 88 | 212.3 | 244.8 | 942.2 | link |
| maxvit_large_tf_384.in21k_ft_in1k | 88 | 212 | 132.6 | 445.8 | link |
| maxvit_base_tf_384.in21k_ft_in1k | 87.9 | 119.6 | 73.8 | 332.9 | link |
| maxvit_base_tf_512.in1k | 86.6 | 119.9 | 138 | 704 | link |
| maxvit_large_tf_512.in1k | 86.5 | 212.3 | 244.8 | 942.2 | link |
| maxvit_base_tf_384.in1k | 86.3 | 119.6 | 73.8 | 332.9 | link |
| maxvit_large_tf_384.in1k | 86.2 | 212 | 132.6 | 445.8 | link |
| maxvit_small_tf_512.in1k | 86.1 | 69.1 | 67.3 | 383.8 | link |
| maxvit_tiny_tf_512.in1k | 85.7 | 31 | 33.5 | 257.6 | link |
| maxvit_small_tf_384.in1k | 85.5 | 69 | 35.9 | 183.6 | link |
| maxvit_tiny_tf_384.in1k | 85.1 | 31 | 17.5 | 123.4 | link |
| maxvit_large_tf_224.in1k | 84.9 | 211.8 | 43.7 | 127.4 | link |
| maxvit_base_tf_224.in1k | 84.9 | 119.5 | 24 | 95 | link |
| maxvit_small_tf_224.in1k | 84.4 | 68.9 | 11.7 | 53.2 | link |
| maxvit_tiny_tf_224.in1k | 83.4 | 30.9 | 5.6 | 35.8 | link |

Oct 15, 2022
- Train and validation script enhancements
- Non-GPU (ie CPU) device support
- SLURM compatibility for train script
- HF datasets support (via ReaderHfds)
- TFDS/WDS dataloading improvements (sample padding/wrap for distributed use fixed wrt sample count estimate)
- in_chans !=3 support for scripts / loader
- Adan optimizer
- Can enable per-step LR scheduling via args
- Dataset 'parsers' renamed to 'readers', more descriptive of purpose
- AMP args changed, APEX via `--amp-impl apex`, bfloat16 supported via `--amp-dtype bfloat16`
- main branch switched to 0.7.x version, 0.6.x forked for stable release of weight only adds
- master -> main branch rename
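The sample padding/wrap item above rests on a simple idea: in distributed loading every rank should see the same number of samples, so the index list is padded by wrapping back to the start. A minimal pure-Python sketch of that idea follows. `shard_indices` is a hypothetical helper written for illustration, not the `timm` reader code.

```python
import math


def shard_indices(num_samples: int, world_size: int, rank: int) -> list[int]:
    """Return the sample indices one rank would read (hypothetical helper).

    The index list is padded by wrapping around to the start so that every
    rank receives the same number of samples.
    """
    per_rank = math.ceil(num_samples / world_size)
    total = per_rank * world_size
    # Wrap: indices past the end of the dataset map back to the beginning.
    indices = [i % num_samples for i in range(total)]
    # Interleave across ranks: rank r takes r, r + world_size, ...
    return indices[rank::world_size]


# 10 samples over 4 ranks: each rank gets 3 indices, 2 of them are repeats.
shards = [shard_indices(10, 4, r) for r in range(4)]
```

The padded total (12 here) is what a per-epoch sample count estimate has to agree with. A mismatch between the two is the kind of issue the changelog item refers to.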
- Python
Published by rwightman about 3 years ago
instances - Release v0.6.13
Release from 0.6.x stable branch with fix for Python 3.11. NOTE original 0.6.13 release tag was against wrong branch.
- Python
Published by rwightman about 3 years ago
instances - Release v0.8.17dev0
March 22, 2023
- More weights pushed to HF hub along with multi-weight support, including: `regnet.py`, `rexnet.py`, `byobnet.py`, `resnetv2.py`, `swin_transformer.py`, `swin_transformer_v2.py`, `swin_transformer_v2_cr.py`
- Swin Transformer models support feature extraction (NCHW feat maps for `swinv2_cr_*`, and NHWC for all others) and spatial embedding outputs.
- FocalNet (from https://github.com/microsoft/FocalNet) models and weights added with significant refactoring, feature extraction, no fixed resolution / sizing constraint
- RegNet weights increased with HF hub push, SWAG, SEER, and torchvision v2 weights. SEER is pretty poor wrt performance for model size, but possibly useful.
- More ImageNet-12k pretrained and 1k fine-tuned `timm` weights:
  - `rexnetr_200.sw_in12k_ft_in1k` - 82.6 @ 224, 83.2 @ 288
  - `rexnetr_300.sw_in12k_ft_in1k` - 84.0 @ 224, 84.5 @ 288
  - `regnety_120.sw_in12k_ft_in1k` - 85.0 @ 224, 85.4 @ 288
  - `regnety_160.lion_in12k_ft_in1k` - 85.6 @ 224, 86.0 @ 288
  - `regnety_160.sw_in12k_ft_in1k` - 85.6 @ 224, 86.0 @ 288 (compare to SWAG PT + 1k FT this is same BUT much lower res, blows SEER FT away)
- Model name deprecation + remapping functionality added (a milestone for bringing 0.8.x out of pre-release). Mappings being added...
- Minor bug fixes and improvements.
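As a rough illustration of what name deprecation + remapping can look like, here is a minimal sketch. The mapping entry and the `resolve_model_name` helper are hypothetical and chosen only for illustration. They are not taken from the `timm` source.

```python
import warnings

# Hypothetical mapping from a deprecated name to an `arch.pretrained_tag` name.
# The single entry is illustrative only; it is not copied from timm.
_DEPRECATED_NAMES = {
    "old_model_in22k": "new_model.in22k",
}


def resolve_model_name(name: str) -> str:
    """Return the current name for `name`, warning if it was deprecated."""
    if name in _DEPRECATED_NAMES:
        new_name = _DEPRECATED_NAMES[name]
        warnings.warn(
            f"Model name '{name}' is deprecated, use '{new_name}' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return new_name
    # Names without a mapping pass through unchanged.
    return name
```

Keeping the old name usable while emitting a warning lets existing scripts keep running during the transition.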
Feb 26, 2023
- Add ConvNeXt-XXLarge CLIP pretrained image tower weights for fine-tune & features (fine-tuning TBD) -- see model card
- Update `convnext_xxlarge` default LayerNorm eps to 1e-5 (for CLIP weights, improved stability)
- 0.8.15dev0
- Python
Published by rwightman about 3 years ago
instances - v0.8.13dev0 Release
Feb 20, 2023
- Add 320x320 `convnext_large_mlp.clip_laion2b_ft_320` and `convnext_large_mlp.clip_laion2b_ft_soup_320` CLIP image tower weights for features & fine-tune
- 0.8.13dev0 pypi release for latest changes w/ move to huggingface org
Feb 16, 2023
- `safetensor` checkpoint support added
- Add ideas from 'Scaling Vision Transformers to 22 B. Params' (https://arxiv.org/abs/2302.05442) -- qk norm, RmsNorm, parallel block
- Add `F.scaled_dot_product_attention` support (PyTorch 2.0 only) to `vit`, `vit_relpos`, `coatnet` / `maxxvit` (to start)
- Lion optimizer (w/ multi-tensor option) added (https://arxiv.org/abs/2302.06675)
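For readers unfamiliar with Lion, the update rule from the paper linked above can be sketched in a few lines of plain Python. This is a scalar-list illustration of the rule only, not the `timm` optimizer (which operates on tensors and has a multi-tensor path). `lion_step` is a name made up for this sketch.

```python
def lion_step(params, grads, momentum, lr=1e-4, betas=(0.9, 0.99), weight_decay=0.0):
    """Apply one Lion update in place to lists of floats (illustrative only)."""
    beta1, beta2 = betas
    for i, (p, g, m) in enumerate(zip(params, grads, momentum)):
        # Interpolate momentum and gradient, then keep only the sign.
        c = beta1 * m + (1.0 - beta1) * g
        update = (c > 0) - (c < 0)
        # Decoupled weight decay followed by the signed step.
        params[i] = p * (1.0 - lr * weight_decay) - lr * update
        # Momentum is refreshed with the second beta after the step.
        momentum[i] = beta2 * m + (1.0 - beta2) * g


params = [1.0, -2.0, 0.5]
grads = [0.3, -0.1, 0.0]
momentum = [0.0, 0.0, 0.0]
lion_step(params, grads, momentum, lr=0.1)
```

Because only the sign of the interpolated value is used, every parameter moves by exactly `lr` (or not at all when the value is zero). This is one reason Lion is typically run with a smaller learning rate than AdamW.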
- Python
Published by rwightman over 3 years ago
instances - v0.8.10dev0 Release
Feb 7, 2023
- New inference benchmark numbers added in results folder.
- Add convnext LAION CLIP trained weights and initial set of in1k fine-tunes
  - `convnext_base.clip_laion2b_augreg_ft_in1k` - 86.2% @ 256x256
  - `convnext_base.clip_laiona_augreg_ft_in1k_384` - 86.5% @ 384x384
  - `convnext_large_mlp.clip_laion2b_augreg_ft_in1k` - 87.3% @ 256x256
  - `convnext_large_mlp.clip_laion2b_augreg_ft_in1k_384` - 87.9% @ 384x384
- Add DaViT models. Supports `features_only=True`. Adapted from https://github.com/dingmyu/davit by Fredo.
- Use a common NormMlpClassifierHead across MaxViT, ConvNeXt, DaViT
- Add EfficientFormer-V2 model, update EfficientFormer, and refactor LeViT (closely related architectures). Weights on HF hub.
  - New EfficientFormer-V2 arch, significant refactor from original at (https://github.com/snap-research/EfficientFormer). Supports `features_only=True`.
  - Minor updates to EfficientFormer.
  - Refactor LeViT models to stages, add `features_only=True` support to new `conv` variants, weight remap required.
- Move ImageNet meta-data (synsets, indices) from `/results` to `timm/data/_info`.
- Add ImageNetInfo / DatasetInfo classes to provide labelling for various ImageNet classifier layouts in `timm`
  - Update `inference.py` to use, try: `python inference.py /folder/to/images --model convnext_small.in12k --label-type detail --topk 5`
- Ready for 0.8.10 pypi pre-release (final testing).
Jan 20, 2023
- Add two convnext 12k -> 1k fine-tunes at 384x384
  - `convnext_tiny.in12k_ft_in1k_384` - 85.1 @ 384
  - `convnext_small.in12k_ft_in1k_384` - 86.2 @ 384
- Push all MaxxViT weights to HF hub, and add new ImageNet-12k -> 1k fine-tunes for `rw` base MaxViT and CoAtNet 1/2 models

| model | top1 | top5 | samples / sec | Params (M) | GMAC | Act (M) |
|:---|---:|---:|---:|---:|---:|---:|
| maxvit_xlarge_tf_512.in21k_ft_in1k | 88.53 | 98.64 | 21.76 | 475.77 | 534.14 | 1413.22 |
| maxvit_xlarge_tf_384.in21k_ft_in1k | 88.32 | 98.54 | 42.53 | 475.32 | 292.78 | 668.76 |
| maxvit_base_tf_512.in21k_ft_in1k | 88.20 | 98.53 | 50.87 | 119.88 | 138.02 | 703.99 |
| maxvit_large_tf_512.in21k_ft_in1k | 88.04 | 98.40 | 36.42 | 212.33 | 244.75 | 942.15 |
| maxvit_large_tf_384.in21k_ft_in1k | 87.98 | 98.56 | 71.75 | 212.03 | 132.55 | 445.84 |
| maxvit_base_tf_384.in21k_ft_in1k | 87.92 | 98.54 | 104.71 | 119.65 | 73.80 | 332.90 |
| maxvit_rmlp_base_rw_384.sw_in12k_ft_in1k | 87.81 | 98.37 | 106.55 | 116.14 | 70.97 | 318.95 |
| maxxvitv2_rmlp_base_rw_384.sw_in12k_ft_in1k | 87.47 | 98.37 | 149.49 | 116.09 | 72.98 | 213.74 |
| coatnet_rmlp_2_rw_384.sw_in12k_ft_in1k | 87.39 | 98.31 | 160.80 | 73.88 | 47.69 | 209.43 |
| maxvit_rmlp_base_rw_224.sw_in12k_ft_in1k | 86.89 | 98.02 | 375.86 | 116.14 | 23.15 | 92.64 |
| maxxvitv2_rmlp_base_rw_224.sw_in12k_ft_in1k | 86.64 | 98.02 | 501.03 | 116.09 | 24.20 | 62.77 |
| maxvit_base_tf_512.in1k | 86.60 | 97.92 | 50.75 | 119.88 | 138.02 | 703.99 |
| coatnet_2_rw_224.sw_in12k_ft_in1k | 86.57 | 97.89 | 631.88 | 73.87 | 15.09 | 49.22 |
| maxvit_large_tf_512.in1k | 86.52 | 97.88 | 36.04 | 212.33 | 244.75 | 942.15 |
| coatnet_rmlp_2_rw_224.sw_in12k_ft_in1k | 86.49 | 97.90 | 620.58 | 73.88 | 15.18 | 54.78 |
| maxvit_base_tf_384.in1k | 86.29 | 97.80 | 101.09 | 119.65 | 73.80 | 332.90 |
| maxvit_large_tf_384.in1k | 86.23 | 97.69 | 70.56 | 212.03 | 132.55 | 445.84 |
| maxvit_small_tf_512.in1k | 86.10 | 97.76 | 88.63 | 69.13 | 67.26 | 383.77 |
| maxvit_tiny_tf_512.in1k | 85.67 | 97.58 | 144.25 | 31.05 | 33.49 | 257.59 |
| maxvit_small_tf_384.in1k | 85.54 | 97.46 | 188.35 | 69.02 | 35.87 | 183.65 |
| maxvit_tiny_tf_384.in1k | 85.11 | 97.38 | 293.46 | 30.98 | 17.53 | 123.42 |
| maxvit_large_tf_224.in1k | 84.93 | 96.97 | 247.71 | 211.79 | 43.68 | 127.35 |
| coatnet_rmlp_1_rw2_224.sw_in12k_ft_in1k | 84.90 | 96.96 | 1025.45 | 41.72 | 8.11 | 40.13 |
| maxvit_base_tf_224.in1k | 84.85 | 96.99 | 358.25 | 119.47 | 24.04 | 95.01 |
| maxxvit_rmlp_small_rw_256.sw_in1k | 84.63 | 97.06 | 575.53 | 66.01 | 14.67 | 58.38 |
| coatnet_rmlp_2_rw_224.sw_in1k | 84.61 | 96.74 | 625.81 | 73.88 | 15.18 | 54.78 |
| maxvit_rmlp_small_rw_224.sw_in1k | 84.49 | 96.76 | 693.82 | 64.90 | 10.75 | 49.30 |
| maxvit_small_tf_224.in1k | 84.43 | 96.83 | 647.96 | 68.93 | 11.66 | 53.17 |
| maxvit_rmlp_tiny_rw_256.sw_in1k | 84.23 | 96.78 | 807.21 | 29.15 | 6.77 | 46.92 |
| coatnet_1_rw_224.sw_in1k | 83.62 | 96.38 | 989.59 | 41.72 | 8.04 | 34.60 |
| maxvit_tiny_rw_224.sw_in1k | 83.50 | 96.50 | 1100.53 | 29.06 | 5.11 | 33.11 |
| maxvit_tiny_tf_224.in1k | 83.41 | 96.59 | 1004.94 | 30.92 | 5.60 | 35.78 |
| coatnet_rmlp_1_rw_224.sw_in1k | 83.36 | 96.45 | 1093.03 | 41.69 | 7.85 | 35.47 |
| maxxvitv2_nano_rw_256.sw_in1k | 83.11 | 96.33 | 1276.88 | 23.70 | 6.26 | 23.05 |
| maxxvit_rmlp_nano_rw_256.sw_in1k | 83.03 | 96.34 | 1341.24 | 16.78 | 4.37 | 26.05 |
| maxvit_rmlp_nano_rw_256.sw_in1k | 82.96 | 96.26 | 1283.24 | 15.50 | 4.47 | 31.92 |
| maxvit_nano_rw_256.sw_in1k | 82.93 | 96.23 | 1218.17 | 15.45 | 4.46 | 30.28 |
| coatnet_bn_0_rw_224.sw_in1k | 82.39 | 96.19 | 1600.14 | 27.44 | 4.67 | 22.04 |
| coatnet_0_rw_224.sw_in1k | 82.39 | 95.84 | 1831.21 | 27.44 | 4.43 | 18.73 |
| coatnet_rmlp_nano_rw_224.sw_in1k | 82.05 | 95.87 | 2109.09 | 15.15 | 2.62 | 20.34 |
| coatnext_nano_rw_224.sw_in1k | 81.95 | 95.92 | 2525.52 | 14.70 | 2.47 | 12.80 |
| coatnet_nano_rw_224.sw_in1k | 81.70 | 95.64 | 2344.52 | 15.14 | 2.41 | 15.41 |
| maxvit_rmlp_pico_rw_256.sw_in1k | 80.53 | 95.21 | 1594.71 | 7.52 | 1.85 | 24.86 |

- Python
Published by rwightman over 3 years ago
instances - v0.8.6dev0 Release
Jan 11, 2023
- Update ConvNeXt ImageNet-12k pretrain series w/ two new fine-tuned weights (and pre FT `.in12k` tags)
  - `convnext_nano.in12k_ft_in1k` - 82.3 @ 224, 82.9 @ 288 (previously released)
  - `convnext_tiny.in12k_ft_in1k` - 84.2 @ 224, 84.5 @ 288
  - `convnext_small.in12k_ft_in1k` - 85.2 @ 224, 85.3 @ 288
Jan 6, 2023
- Finally got around to adding `--model-kwargs` and `--opt-kwargs` to scripts to pass through rare args directly to model classes from cmd line
  - `train.py /imagenet --model resnet50 --amp --model-kwargs output_stride=16 act_layer=silu`
  - `train.py /imagenet --model vit_base_patch16_clip_224 --img-size 240 --amp --model-kwargs img_size=240 patch_size=12`
- Cleanup some popular models to better support arg passthrough / merge with model configs, more to go.
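One way `key=value` passthrough of this kind can be implemented is with a custom `argparse` action, sketched below. `ParseKeyValue` is a hypothetical name for this sketch, and the literal-vs-string handling is an assumption about reasonable behaviour, not a copy of the script's own parser.

```python
import argparse
import ast


class ParseKeyValue(argparse.Action):
    """Collect `key=value` tokens into a dict (hypothetical action name)."""

    def __call__(self, parser, namespace, values, option_string=None):
        kwargs = {}
        for token in values:
            key, _, raw = token.partition("=")
            try:
                # Interpret ints, floats, tuples, booleans, etc.
                kwargs[key] = ast.literal_eval(raw)
            except (ValueError, SyntaxError):
                # Anything that is not a Python literal stays a string.
                kwargs[key] = raw
        setattr(namespace, self.dest, kwargs)


parser = argparse.ArgumentParser()
parser.add_argument("--model", default="resnet50")
parser.add_argument("--model-kwargs", nargs="*", default={}, action=ParseKeyValue)

args = parser.parse_args(
    ["--model", "resnet50", "--model-kwargs", "output_stride=16", "act_layer=silu"]
)
```

With this sketch, `output_stride` arrives as an integer and `act_layer` as a string, so the resulting dict can be splatted into a model constructor as keyword arguments.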
Jan 5, 2023
- ConvNeXt-V2 models and weights added to existing `convnext.py`
  - Paper: ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders
  - Reference impl: https://github.com/facebookresearch/ConvNeXt-V2 (NOTE: weights currently CC-BY-NC)
- Python
Published by rwightman over 3 years ago
instances - v0.8.2dev0 Release
Part way through the conversion of models to multi-weight support (model_arch.pretrain_tag), module reorg for future building, and lots of new weights and model additions as we go...
This is considered a development release. Please stick to 0.6.x if you need stability. Some of the model names, tags will shift a bit, some old names have already been deprecated and remapping support not added yet. For code 0.6.x branch is considered 'stable' https://github.com/rwightman/pytorch-image-models/tree/0.6.x
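The `model_arch.pretrain_tag` naming scheme splits cleanly on the first dot. A minimal sketch of that split follows. `split_model_name` is a hypothetical helper for illustration, not a `timm` function.

```python
def split_model_name(name: str) -> tuple[str, str]:
    """Split 'arch.tag' into (arch, tag); tag is '' when no tag is given."""
    arch, _, tag = name.partition(".")
    return arch, tag


arch, tag = split_model_name("convnext_small.in12k_ft_in1k_384")
```

A name without a dot yields an empty tag, which a loader could treat as "use the default weights for this architecture".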
Dec 23, 2022 🎄☃
- Add FlexiViT models and weights from https://github.com/google-research/big_vision (check out paper at https://arxiv.org/abs/2212.08013)
- NOTE currently resizing is static on model creation, on-the-fly dynamic / train patch size sampling is a WIP
- Many more models updated to multi-weight and downloadable via HF hub now (convnext, efficientnet, mobilenet, vision_transformer*, beit)
- More model pretrained tag and adjustments, some model names changed (working on deprecation translations, consider main branch DEV branch right now, use 0.6.x for stable use)
- More ImageNet-12k (subset of 22k) pretrain models popping up:
  - `efficientnet_b5.in12k_ft_in1k` - 85.9 @ 448x448
  - `vit_medium_patch16_gap_384.in12k_ft_in1k` - 85.5 @ 384x384
  - `vit_medium_patch16_gap_256.in12k_ft_in1k` - 84.5 @ 256x256
  - `convnext_nano.in12k_ft_in1k` - 82.9 @ 288x288
Dec 8, 2022
- Add 'EVA l' to `vision_transformer.py`, MAE style ViT-L/14 MIM pretrain w/ EVA-CLIP targets, FT on ImageNet-1k (w/ ImageNet-22k intermediate for some)
  - original source: https://github.com/baaivision/EVA

| model | top1 | param_count | gmac | macts | hub |
|:---|---:|---:|---:|---:|:---|
| eva_large_patch14_336.in22k_ft_in22k_in1k | 89.2 | 304.5 | 191.1 | 270.2 | link |
| eva_large_patch14_336.in22k_ft_in1k | 88.7 | 304.5 | 191.1 | 270.2 | link |
| eva_large_patch14_196.in22k_ft_in22k_in1k | 88.6 | 304.1 | 61.6 | 63.5 | link |
| eva_large_patch14_196.in22k_ft_in1k | 87.9 | 304.1 | 61.6 | 63.5 | link |

Dec 6, 2022
- Add 'EVA g', BEiT style ViT-g/14 model weights w/ both MIM pretrain and CLIP pretrain to `beit.py`.
  - original source: https://github.com/baaivision/EVA
  - paper: https://arxiv.org/abs/2211.07636

| model | top1 | param_count | gmac | macts | hub |
|:---|---:|---:|---:|---:|:---|
| eva_giant_patch14_560.m30m_ft_in22k_in1k | 89.8 | 1014.4 | 1906.8 | 2577.2 | link |
| eva_giant_patch14_336.m30m_ft_in22k_in1k | 89.6 | 1013 | 620.6 | 550.7 | link |
| eva_giant_patch14_336.clip_ft_in1k | 89.4 | 1013 | 620.6 | 550.7 | link |
| eva_giant_patch14_224.clip_ft_in1k | 89.1 | 1012.6 | 267.2 | 192.6 | link |

Dec 5, 2022
- Pre-release (`0.8.0dev0`) of multi-weight support (`model_arch.pretrained_tag`). Install with `pip install --pre timm`
  - vision_transformer, maxvit, convnext are the first three model impl w/ support
  - model names are changing with this (previous _21k, etc. fn will merge), still sorting out deprecation handling
  - bugs are likely, but I need feedback so please try it out
  - if stability is needed, please use 0.6.x pypi releases or clone from 0.6.x branch
- Support for PyTorch 2.0 compile is added in train/validate/inference/benchmark, use `--torchcompile` argument
- Inference script allows more control over output, select k for top-class index + prob json, csv or parquet output
- Add a full set of fine-tuned CLIP image tower weights from both LAION-2B and original OpenAI CLIP models

| model | top1 | param_count | gmac | macts | hub |
|:---|---:|---:|---:|---:|:---|
| vit_huge_patch14_clip_336.laion2b_ft_in12k_in1k | 88.6 | 632.5 | 391 | 407.5 | link |
| vit_large_patch14_clip_336.openai_ft_in12k_in1k | 88.3 | 304.5 | 191.1 | 270.2 | link |
| vit_huge_patch14_clip_224.laion2b_ft_in12k_in1k | 88.2 | 632 | 167.4 | 139.4 | link |
| vit_large_patch14_clip_336.laion2b_ft_in12k_in1k | 88.2 | 304.5 | 191.1 | 270.2 | link |
| vit_large_patch14_clip_224.openai_ft_in12k_in1k | 88.2 | 304.2 | 81.1 | 88.8 | link |
| vit_large_patch14_clip_224.laion2b_ft_in12k_in1k | 87.9 | 304.2 | 81.1 | 88.8 | link |
| vit_large_patch14_clip_224.openai_ft_in1k | 87.9 | 304.2 | 81.1 | 88.8 | link |
| vit_large_patch14_clip_336.laion2b_ft_in1k | 87.9 | 304.5 | 191.1 | 270.2 | link |
| vit_huge_patch14_clip_224.laion2b_ft_in1k | 87.6 | 632 | 167.4 | 139.4 | link |
| vit_large_patch14_clip_224.laion2b_ft_in1k | 87.3 | 304.2 | 81.1 | 88.8 | link |
| vit_base_patch16_clip_384.laion2b_ft_in12k_in1k | 87.2 | 86.9 | 55.5 | 101.6 | link |
| vit_base_patch16_clip_384.openai_ft_in12k_in1k | 87 | 86.9 | 55.5 | 101.6 | link |
| vit_base_patch16_clip_384.laion2b_ft_in1k | 86.6 | 86.9 | 55.5 | 101.6 | link |
| vit_base_patch16_clip_384.openai_ft_in1k | 86.2 | 86.9 | 55.5 | 101.6 | link |
| vit_base_patch16_clip_224.laion2b_ft_in12k_in1k | 86.2 | 86.6 | 17.6 | 23.9 | link |
| vit_base_patch16_clip_224.openai_ft_in12k_in1k | 85.9 | 86.6 | 17.6 | 23.9 | link |
| vit_base_patch32_clip_448.laion2b_ft_in12k_in1k | 85.8 | 88.3 | 17.9 | 23.9 | link |
| vit_base_patch16_clip_224.laion2b_ft_in1k | 85.5 | 86.6 | 17.6 | 23.9 | link |
| vit_base_patch32_clip_384.laion2b_ft_in12k_in1k | 85.4 | 88.3 | 13.1 | 16.5 | link |
| vit_base_patch16_clip_224.openai_ft_in1k | 85.3 | 86.6 | 17.6 | 23.9 | link |
| vit_base_patch32_clip_384.openai_ft_in12k_in1k | 85.2 | 88.3 | 13.1 | 16.5 | link |
| vit_base_patch32_clip_224.laion2b_ft_in12k_in1k | 83.3 | 88.2 | 4.4 | 5 | link |
| vit_base_patch32_clip_224.laion2b_ft_in1k | 82.6 | 88.2 | 4.4 | 5 | link |
| vit_base_patch32_clip_224.openai_ft_in1k | 81.9 | 88.2 | 4.4 | 5 | link |

- Port of MaxViT Tensorflow Weights from official impl at https://github.com/google-research/maxvit
- There were larger than expected drops for the upscaled 384/512 in21k fine-tune weights. A detail may be missing, but the 21k FT did seem sensitive to small preprocessing differences.

| model | top1 | param_count | gmac | macts | hub |
|:---|---:|---:|---:|---:|:---|
| maxvit_xlarge_tf_512.in21k_ft_in1k | 88.5 | 475.8 | 534.1 | 1413.2 | link |
| maxvit_xlarge_tf_384.in21k_ft_in1k | 88.3 | 475.3 | 292.8 | 668.8 | link |
| maxvit_base_tf_512.in21k_ft_in1k | 88.2 | 119.9 | 138 | 704 | link |
| maxvit_large_tf_512.in21k_ft_in1k | 88 | 212.3 | 244.8 | 942.2 | link |
| maxvit_large_tf_384.in21k_ft_in1k | 88 | 212 | 132.6 | 445.8 | link |
| maxvit_base_tf_384.in21k_ft_in1k | 87.9 | 119.6 | 73.8 | 332.9 | link |
| maxvit_base_tf_512.in1k | 86.6 | 119.9 | 138 | 704 | link |
| maxvit_large_tf_512.in1k | 86.5 | 212.3 | 244.8 | 942.2 | link |
| maxvit_base_tf_384.in1k | 86.3 | 119.6 | 73.8 | 332.9 | link |
| maxvit_large_tf_384.in1k | 86.2 | 212 | 132.6 | 445.8 | link |
| maxvit_small_tf_512.in1k | 86.1 | 69.1 | 67.3 | 383.8 | link |
| maxvit_tiny_tf_512.in1k | 85.7 | 31 | 33.5 | 257.6 | link |
| maxvit_small_tf_384.in1k | 85.5 | 69 | 35.9 | 183.6 | link |
| maxvit_tiny_tf_384.in1k | 85.1 | 31 | 17.5 | 123.4 | link |
| maxvit_large_tf_224.in1k | 84.9 | 211.8 | 43.7 | 127.4 | link |
| maxvit_base_tf_224.in1k | 84.9 | 119.5 | 24 | 95 | link |
| maxvit_small_tf_224.in1k | 84.4 | 68.9 | 11.7 | 53.2 | link |
| maxvit_tiny_tf_224.in1k | 83.4 | 30.9 | 5.6 | 35.8 | link |

Oct 15, 2022
- Train and validation script enhancements
- Non-GPU (ie CPU) device support
- SLURM compatibility for train script
- HF datasets support (via ReaderHfds)
- TFDS/WDS dataloading improvements (sample padding/wrap for distributed use fixed wrt sample count estimate)
- in_chans !=3 support for scripts / loader
- Adan optimizer
- Can enable per-step LR scheduling via args
- Dataset 'parsers' renamed to 'readers', more descriptive of purpose
- AMP args changed, APEX via `--amp-impl apex`, bfloat16 supported via `--amp-dtype bfloat16`
- main branch switched to 0.7.x version, 0.6.x forked for stable release of weight only adds
- master -> main branch rename
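The difference between per-epoch and per-step LR scheduling mentioned in the list above can be seen in a small cosine-decay sketch. The function names and the cosine shape are assumptions chosen for illustration. The train script exposes several schedulers and this is not its code.

```python
import math


def cosine_lr(progress: float, base_lr: float, min_lr: float = 0.0) -> float:
    """Cosine decay from base_lr to min_lr as progress goes from 0 to 1."""
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def lr_at(epoch: int, step: int, steps_per_epoch: int, num_epochs: int,
          base_lr: float, per_step: bool) -> float:
    """LR at (epoch, step); with per_step=False it is constant within an epoch."""
    if per_step:
        # Progress advances on every optimizer step.
        progress = (epoch * steps_per_epoch + step) / (num_epochs * steps_per_epoch)
    else:
        # Progress only advances at epoch boundaries.
        progress = epoch / num_epochs
    return cosine_lr(progress, base_lr)


lr_epoch_mode = lr_at(1, 50, 100, 10, 0.1, per_step=False)
lr_step_mode = lr_at(1, 50, 100, 10, 0.1, per_step=True)
```

In per-step mode the LR halfway through epoch 1 is already lower than at the start of that epoch. This gives a smoother curve, which matters most for short runs or long epochs.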
- Python
Published by rwightman over 3 years ago
instances - v0.6.12 Release
Minor bug fixes to HF push_to_hub, plus some more MaxVit weights
Oct 10, 2022
- More weights in `maxxvit` series, incl first ConvNeXt block based `coatnext` and `maxxvit` experiments:
  - `coatnext_nano_rw_224` - 82.0 @ 224 (G) -- (uses ConvNeXt conv block, no BatchNorm)
  - `maxxvit_rmlp_nano_rw_256` - 83.0 @ 256, 83.7 @ 320 (G) (uses ConvNeXt conv block, no BN)
  - `maxvit_rmlp_small_rw_224` - 84.5 @ 224, 85.1 @ 320 (G)
  - `maxxvit_rmlp_small_rw_256` - 84.6 @ 256, 84.9 @ 288 (G) -- could be trained better, hparams need tuning (uses ConvNeXt block, no BN)
  - `coatnet_rmlp_2_rw_224` - 84.6 @ 224, 85 @ 320 (T)
- Python
Published by rwightman over 3 years ago
instances - v0.6.11 Release
Changes Since 0.6.7
Sept 23, 2022
- CLIP LAION-2B pretrained B/32, L/14, H/14, and g/14 image tower weights as vit models (for fine-tune)
Sept 7, 2022
- Hugging Face `timm` docs home now exists, look for more here in the future
- Add BEiT-v2 weights for base and large 224x224 models from https://github.com/microsoft/unilm/tree/master/beit2
- Add more weights in `maxxvit` series incl a `pico` (7.5M params, 1.9 GMACs), two `tiny` variants:
  - `maxvit_rmlp_pico_rw_256` - 80.5 @ 256, 81.3 @ 320 (T)
  - `maxvit_tiny_rw_224` - 83.5 @ 224 (G)
  - `maxvit_rmlp_tiny_rw_256` - 84.2 @ 256, 84.8 @ 320 (T)
Aug 29, 2022
- MaxVit window size scales with img_size by default. Add new RelPosMlp MaxViT weight that leverages this:
  - `maxvit_rmlp_nano_rw_256` - 83.0 @ 256, 83.6 @ 320 (T)
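A sketch of what "window size scales with img_size" can mean in practice is shown below. The ratio of 32 is an assumption, inferred from the 256x256 image size and (8, 8) window/grid size pairing mentioned elsewhere in these notes. `scaled_window_size` is a hypothetical helper, not the model code.

```python
def scaled_window_size(img_size, partition_ratio=32):
    """Derive a (height, width) window size from the image size.

    `partition_ratio=32` is an assumed default for this sketch.
    """
    height, width = img_size
    return height // partition_ratio, width // partition_ratio


window_256 = scaled_window_size((256, 256))
window_320 = scaled_window_size((320, 320))
```

Under this assumption, evaluating at 320 uses a larger window than training at 256. That change in window size is why a resizable relative position MLP (`rmlp`) is useful.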
Aug 26, 2022
- CoAtNet (https://arxiv.org/abs/2106.04803) and MaxVit (https://arxiv.org/abs/2204.01697) `timm` original models
  - both found in `maxxvit.py` model def, contains numerous experiments outside scope of original papers
  - an unfinished Tensorflow version from MaxVit authors can be found https://github.com/google-research/maxvit
- Initial CoAtNet and MaxVit timm pretrained weights (working on more):
  - `coatnet_nano_rw_224` - 81.7 @ 224 (T)
  - `coatnet_rmlp_nano_rw_224` - 82.0 @ 224, 82.8 @ 320 (T)
  - `coatnet_0_rw_224` - 82.4 (T) -- NOTE timm '0' coatnets have 2 more 3rd stage blocks
  - `coatnet_bn_0_rw_224` - 82.4 (T)
  - `maxvit_nano_rw_256` - 82.9 @ 256 (T)
  - `coatnet_rmlp_1_rw_224` - 83.4 @ 224, 84 @ 320 (T)
  - `coatnet_1_rw_224` - 83.6 @ 224 (G)
  - (T) = TPU trained with `bits_and_tpu` branch training code, (G) = GPU trained
- GCVit (weights adapted from https://github.com/NVlabs/GCVit, code 100% `timm` re-write for license purposes)
- MViT-V2 (multi-scale vit, adapted from https://github.com/facebookresearch/mvit)
- EfficientFormer (adapted from https://github.com/snap-research/EfficientFormer)
- PyramidVisionTransformer-V2 (adapted from https://github.com/whai362/PVT)
- 'Fast Norm' support for LayerNorm and GroupNorm that avoids float32 upcast w/ AMP (uses APEX LN if available for further boost)
Aug 15, 2022
- ConvNeXt atto weights added
  - `convnext_atto` - 75.7 @ 224, 77.0 @ 288
  - `convnext_atto_ols` - 75.9 @ 224, 77.2 @ 288
Aug 5, 2022
- More custom ConvNeXt smaller model defs with weights
  - `convnext_femto` - 77.5 @ 224, 78.7 @ 288
  - `convnext_femto_ols` - 77.9 @ 224, 78.9 @ 288
  - `convnext_pico` - 79.5 @ 224, 80.4 @ 288
  - `convnext_pico_ols` - 79.5 @ 224, 80.5 @ 288
  - `convnext_nano_ols` - 80.9 @ 224, 81.6 @ 288
- Updated EdgeNeXt to improve ONNX export, add new base variant and weights from original (https://github.com/mmaaz60/EdgeNeXt)
July 28, 2022
- Add freshly minted DeiT-III Medium (width=512, depth=12, num_heads=8) model weights. Thanks Hugo Touvron!
- Python
Published by rwightman over 3 years ago
instances - MaxxVit (CoAtNet, MaxVit, and related experimental weights)
CoAtNet (https://arxiv.org/abs/2106.04803) and MaxVit (https://arxiv.org/abs/2204.01697) timm trained weights
Weights were created by reproducing the paper architectures and exploring timm specific additions such as ConvNeXt blocks, parallel partitioning, and other experiments.
Weights were trained on a mix of TPU and GPU systems. The bulk of the weights were trained on TPU via the TRC program (https://sites.research.google/trc/about/).
CoAtNet variants run particularly well on TPU; it's a great combination. MaxVit is better suited to GPU due to the window partitioning, although there are some optimizations that can be made to improve TPU padding/utilization, incl using 256x256 image size with (8, 8) window/grid size, and keeping format in NCHW for partition attention when using PyTorch XLA.
Glossary:
* coatnet - CoAtNet (MBConv + transformer blocks)
* coatnext - CoAtNet w/ ConvNeXt conv blocks
* maxvit - MaxViT (MBConv + block (ala swin) and grid partitioning transformer blocks)
* maxxvit - MaxViT w/ ConvNeXt conv blocks
* rmlp - relative position embedding w/ MLP (can be resized) -- if this isn't in model name, it's using relative position bias (ala swin)
* rw - my variations on the model, slight differences in sizing / pooling / etc from Google paper spec
Results:
* maxvit_rmlp_pico_rw_256 - 80.5 @ 256, 81.3 @ 320 (T)
* coatnet_nano_rw_224 - 81.7 @ 224 (T)
* coatnext_nano_rw_224 - 82.0 @ 224 (G) -- (uses convnext block, no BatchNorm)
* coatnet_rmlp_nano_rw_224 - 82.0 @ 224, 82.8 @ 320 (T)
* coatnet_0_rw_224 - 82.4 (T) -- NOTE timm '0' coatnets have 2 more 3rd stage blocks
* coatnet_bn_0_rw_224 - 82.4 (T) -- all BatchNorm, no LayerNorm
* maxvit_nano_rw_256 - 82.9 @ 256 (T)
* maxvit_rmlp_nano_rw_256 - 83.0 @ 256, 83.6 @ 320 (T)
* maxxvit_rmlp_nano_rw_256 - 83.0 @ 256, 83.7 @ 320 (G) (uses convnext conv block, no BatchNorm)
* coatnet_rmlp_1_rw_224 - 83.4 @ 224, 84 @ 320 (T)
* maxvit_tiny_rw_224 - 83.5 @ 224 (G)
* coatnet_1_rw_224 - 83.6 @ 224 (G)
* maxvit_rmlp_tiny_rw_256 - 84.2 @ 256, 84.8 @ 320 (T)
* maxvit_rmlp_small_rw_224 - 84.5 @ 224, 85.1 @ 320 (G)
* maxxvit_rmlp_small_rw_256 - 84.6 @ 256, 84.9 @ 288 (G) -- could be trained better, hparams need tuning (uses convnext conv block, no BN)
* coatnet_rmlp_2_rw_224 - 84.6 @ 224, 85 @ 320 (T)
(T) = TPU trained with bits_and_tpu branch training code, (G) = GPU trained
- Python
Published by rwightman almost 4 years ago
instances - More 3rd party ViT / ViT-hybrid weights
More weights for 3rd party ViT / ViT-CNN hybrids that needed remapping / re-hosting
EfficientFormer
Rehosted and remapped checkpoints from https://github.com/snap-research/EfficientFormer (originals in Google Drive)
GCViT
Heavily remapped from originals at https://github.com/NVlabs/GCVit due to from-scratch re-write of model code
NOTE: these checkpoints have a non-commercial CC-BY-NC-SA-4.0 license.
- Python
Published by rwightman almost 4 years ago
instances - v0.6.7 Release
Minor bug fixes and a few more weights since 0.6.5
- A few more weights & model defs added:
  - `darknetaa53` - 79.8 @ 256, 80.5 @ 288
  - `convnext_nano` - 80.8 @ 224, 81.5 @ 288
  - `cs3sedarknet_l` - 81.2 @ 256, 81.8 @ 288
  - `cs3darknet_x` - 81.8 @ 256, 82.2 @ 288
  - `cs3sedarknet_x` - 82.2 @ 256, 82.7 @ 288
  - `cs3edgenet_x` - 82.2 @ 256, 82.7 @ 288
  - `cs3se_edgenet_x` - 82.8 @ 256, 83.5 @ 320
- `cs3*` weights above all trained on TPU w/ `bits_and_tpu` branch. Thanks to TRC program!
- Add output_stride=8 and 16 support to ConvNeXt (dilation)
- deit3 models not being able to resize pos_emb fixed
- Python
Published by rwightman almost 4 years ago
instances - v0.6.5 Release
First official release in a long while (since 0.5.4). The full change log since 0.5.4 is below.
July 8, 2022
More models, more fixes
* Official research models (w/ weights) added:
* EdgeNeXt from (https://github.com/mmaaz60/EdgeNeXt)
* MobileViT-V2 from (https://github.com/apple/ml-cvnets)
* DeiT III (Revenge of the ViT) from (https://github.com/facebookresearch/deit)
* My own models:
* Small ResNet defs added by request with 1 block repeats for both basic and bottleneck (resnet10 and resnet14)
* CspNet refactored with dataclass config, simplified CrossStage3 (cs3) option. These are closer to YOLO-v5+ backbone defs.
* More relative position vit fiddling. Two srelpos (shared relative position) models trained, and a medium w/ class token.
* Add an alternate downsample mode to EdgeNeXt and train a small model. Better than original small, but not their new USI trained weights.
* My own model weight results (all ImageNet-1k training)
* resnet10t - 66.5 @ 176, 68.3 @ 224
* resnet14t - 71.3 @ 176, 72.3 @ 224
* resnetaa50 - 80.6 @ 224 , 81.6 @ 288
* darknet53 - 80.0 @ 256, 80.5 @ 288
* cs3darknet_m - 77.0 @ 256, 77.6 @ 288
* cs3darknet_focus_m - 76.7 @ 256, 77.3 @ 288
* cs3darknet_l - 80.4 @ 256, 80.9 @ 288
* cs3darknet_focus_l - 80.3 @ 256, 80.9 @ 288
* vit_srelpos_small_patch16_224 - 81.1 @ 224, 82.1 @ 320
* vit_srelpos_medium_patch16_224 - 82.3 @ 224, 83.1 @ 320
* vit_relpos_small_patch16_cls_224 - 82.6 @ 224, 83.6 @ 320
* edgenext_small_rw - 79.6 @ 224, 80.4 @ 320
* cs3, darknet, and vit_*relpos weights above all trained on TPU thanks to TRC program! Rest trained on overheating GPUs.
* Hugging Face Hub support fixes verified, demo notebook TBA
* Pretrained weights / configs can be loaded externally (ie from local disk) w/ support for head adaptation.
* Add support to change image extensions scanned by timm datasets/parsers. See (https://github.com/rwightman/pytorch-image-models/pull/1274#issuecomment-1178303103)
* Default ConvNeXt LayerNorm impl to use F.layer_norm(x.permute(0, 2, 3, 1), ...).permute(0, 3, 1, 2) via LayerNorm2d in all cases.
* a bit slower than previous custom impl on some hardware (ie Ampere w/ CL), but overall fewer regressions across wider HW / PyTorch version ranges.
* previous impl exists as LayerNormExp2d in models/layers/norm.py
* Numerous bug fixes
* Currently testing for imminent PyPi 0.6.x release
* LeViT pretraining of larger models still a WIP, they don't train well / easily without distillation. Time to add distill support (finally)?
* ImageNet-22k weight training + finetune ongoing, work on multi-weight support (slowly) chugging along (there are a LOT of weights, sigh) ...
May 13, 2022
- Official Swin-V2 models and weights added from (https://github.com/microsoft/Swin-Transformer). Cleaned up to support torchscript.
- Some refactoring for existing `timm` Swin-V2-CR impl, will likely do a bit more to bring parts closer to official and decide whether to merge some aspects.
- More Vision Transformer relative position / residual post-norm experiments (all trained on TPU thanks to TRC program)
  - `vit_relpos_small_patch16_224` - 81.5 @ 224, 82.5 @ 320 -- rel pos, layer scale, no class token, avg pool
  - `vit_relpos_medium_patch16_rpn_224` - 82.3 @ 224, 83.1 @ 320 -- rel pos + res-post-norm, no class token, avg pool
  - `vit_relpos_medium_patch16_224` - 82.5 @ 224, 83.3 @ 320 -- rel pos, layer scale, no class token, avg pool
  - `vit_relpos_base_patch16_gapcls_224` - 82.8 @ 224, 83.9 @ 320 -- rel pos, layer scale, class token, avg pool (by mistake)
- Bring 512 dim, 8-head 'medium' ViT model variant back to life (after using in a pre DeiT 'small' model for first ViT impl back in 2020)
- Add ViT relative position support for switching btw existing impl and some additions in official Swin-V2 impl for future trials
- Sequencer2D impl (https://arxiv.org/abs/2205.01972), added via PR from author (https://github.com/okojoalg)
May 2, 2022
- Vision Transformer experiments adding Relative Position (Swin-V2 log-coord) (`vision_transformer_relpos.py`) and Residual Post-Norm branches (from Swin-V2) (`vision_transformer*.py`)
  - `vit_relpos_base_patch32_plus_rpn_256` - 79.5 @ 256, 80.6 @ 320 -- rel pos + extended width + res-post-norm, no class token, avg pool
  - `vit_relpos_base_patch16_224` - 82.5 @ 224, 83.6 @ 320 -- rel pos, layer scale, no class token, avg pool
  - `vit_base_patch16_rpn_224` - 82.3 @ 224 -- rel pos + res-post-norm, no class token, avg pool
- Vision Transformer refactor to remove representation layer that was only used in initial vit and rarely used since with newer pretrain (ie `How to Train Your ViT`)
- `vit_*` models support removal of class token, use of global average pool, use of fc_norm (ala beit, mae).
April 22, 2022
- `timm` models are now officially supported in fast.ai! Just in time for the new Practical Deep Learning course. `timmdocs` documentation link updated to timm.fast.ai.
- Two more model weights added in the TPU trained series. Some In22k pretrain still in progress.
  - `seresnext101d_32x8d` - 83.69 @ 224, 84.35 @ 288
  - `seresnextaa101d_32x8d` (anti-aliased w/ AvgPool2d) - 83.85 @ 224, 84.57 @ 288
March 23, 2022
- Add `ParallelBlock` and `LayerScale` option to base vit models to support model configs in Three things everyone should know about ViT
- `convnext_tiny_hnf` (head norm first) weights trained with (close to) A2 recipe, 82.2% top-1, could do better with more epochs.
March 21, 2022
- Merge `norm_norm_norm`. IMPORTANT this update for a coming 0.6.x release will likely de-stabilize the master branch for a while. Branch `0.5.x` or a previous 0.5.x release can be used if stability is required.
- Significant weights update (all TPU trained) as described in this release
  - `regnety_040` - 82.3 @ 224, 82.96 @ 288
  - `regnety_064` - 83.0 @ 224, 83.65 @ 288
  - `regnety_080` - 83.17 @ 224, 83.86 @ 288
  - `regnetv_040` - 82.44 @ 224, 83.18 @ 288 (timm pre-act)
  - `regnetv_064` - 83.1 @ 224, 83.71 @ 288 (timm pre-act)
  - `regnetz_040` - 83.67 @ 256, 84.25 @ 320
  - `regnetz_040h` - 83.77 @ 256, 84.5 @ 320 (w/ extra fc in head)
  - `resnetv2_50d_gn` - 80.8 @ 224, 81.96 @ 288 (pre-act GroupNorm)
  - `resnetv2_50d_evos` - 80.77 @ 224, 82.04 @ 288 (pre-act EvoNormS)
  - `regnetz_c16_evos` - 81.9 @ 256, 82.64 @ 320 (EvoNormS)
  - `regnetz_d8_evos` - 83.42 @ 256, 84.04 @ 320 (EvoNormS)
  - `xception41p` - 82 @ 299 (timm pre-act)
  - `xception65` - 83.17 @ 299
  - `xception65p` - 83.14 @ 299 (timm pre-act)
  - `resnext101_64x4d` - 82.46 @ 224, 83.16 @ 288
  - `seresnext101_32x8d` - 83.57 @ 224, 84.270 @ 288
  - `resnetrs200` - 83.85 @ 256, 84.44 @ 320
- HuggingFace hub support fixed w/ initial groundwork for allowing alternative 'config sources' for pretrained model definitions and weights (generic local file / remote url support soon)
- SwinTransformer-V2 implementation added. Submitted by Christoph Reich. Training experiments and model changes by myself are ongoing so expect compat breaks.
- Swin-S3 (AutoFormerV2) models / weights added from https://github.com/microsoft/Cream/tree/main/AutoFormerV2
- MobileViT models w/ weights adapted from https://github.com/apple/ml-cvnets
- PoolFormer models w/ weights adapted from https://github.com/sail-sg/poolformer
- VOLO models w/ weights adapted from https://github.com/sail-sg/volo
- Significant work experimenting with non-BatchNorm norm layers such as EvoNorm, FilterResponseNorm, GroupNorm, etc
- Enhanced support for alternate norm + act ('NormAct') layers in a number of models, esp EfficientNet/MobileNetV3, RegNet, and aligned Xception
- Grouped conv support added to EfficientNet family
- Add 'group matching' API to all models to allow grouping model parameters for application of 'layer-wise' LR decay, lr scale added to LR scheduler
- Gradient checkpointing support added to many models
- forward_head(x, pre_logits=False) fn added to all models to allow separate calls of forward_features + forward_head
- All vision transformer and vision MLP models updated to return non-pooled / non-token selected features from forward_features, for consistency with CNN models; token selection or pooling is now applied in forward_head
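The 'layer-wise' LR decay mentioned in the list above can be illustrated with a small, library-independent sketch. It assumes parameters have already been assigned to ordered layer groups (group 0 closest to the input) and uses a common geometric formulation. The function name and the exact indexing are assumptions for illustration, not timm's implementation.

```python
def layer_decay_scales(num_groups: int, decay: float) -> list[float]:
    """Return one LR multiplier per layer group.

    Group 0 is closest to the input and gets the smallest multiplier;
    the last group (typically the head) keeps the full learning rate.
    """
    if num_groups < 1:
        raise ValueError("num_groups must be >= 1")
    if not 0.0 < decay <= 1.0:
        raise ValueError("decay must be in (0, 1]")
    return [decay ** (num_groups - 1 - i) for i in range(num_groups)]


# Hypothetical setup: 4 layer groups, each earlier group trains at half the rate.
base_lr = 1e-3
scales = layer_decay_scales(num_groups=4, decay=0.5)
group_lrs = [base_lr * s for s in scales]
```

Each entry of `group_lrs` would then be attached to the optimizer parameter group for the matching layer group, so that early layers move less than the head during fine-tuning.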
Feb 2, 2022
- Chris Hughes posted an exhaustive run through of timm on his blog yesterday. Well worth a read. Getting Started with PyTorch Image Models (timm): A Practitioner’s Guide
- I'm currently prepping to merge the norm_norm_norm branch back to master (ver 0.6.x) in next week or so.
  - The changes are more extensive than usual and may destabilize and break some model API use (aiming for full backwards compat). So, beware pip install git+https://github.com/rwightman/pytorch-image-models installs!
  - 0.5.x releases and a 0.5.x branch will remain stable with a cherry pick or two until dust clears. Recommend sticking to pypi install for a bit if you want stable.
- Python
Published by rwightman almost 4 years ago
instances - Swin Transformer V2 (CR) weights and experiments
This release holds weights for timm's variant of Swin V2 (from @ChristophReich1996 impl, https://github.com/ChristophReich1996/Swin-Transformer-V2)
NOTE: ns variants of the models have extra norms on the main branch at the end of each stage, this seems to help training. The current small model is not using this, but currently training one. Will have a non-ns tiny soon as well as a comparison. in21k and 1k base models are also in the works...
small checkpoints trained on TPU-VM instances via the TPU-Research Cloud (https://sites.research.google/trc/about/)
- swin_v2_tiny_ns_224 - 81.80 top-1
- swin_v2_small_224 - 83.13 top-1
- swin_v2_small_ns_224 - 83.5 top-1
- Python
Published by rwightman about 4 years ago
instances - TPU VM trained weight release w/ PyTorch XLA
A wide range of mid-large sized models trained in PyTorch XLA on TPU VM instances. Demonstrating viability of the TPU + PyTorch combo for excellent image model results. All models trained w/ the bits_and_tpu branch of this codebase.
A big thanks to the TPU Research Cloud (https://sites.research.google/trc/about/) for the compute used in these experiments.
This set includes several novel weights, including EvoNorm-S RegNetZ (C/D timm variants) and ResNet-V2 model experiments, as well as custom pre-activation model variants of RegNet-Y (called RegNet-V) and Xception (Xception-P) models.
Many if not all of the included RegNet weights surpass original paper results by a wide margin and remain above other known results (e.g. recent torchvision updates) in ImageNet-1k validation and especially OOD test set / robustness performance and scaling to higher resolutions.
RegNets
- regnety_040 - 82.3 @ 224, 82.96 @ 288
- regnety_064 - 83.0 @ 224, 83.65 @ 288
- regnety_080 - 83.17 @ 224, 83.86 @ 288
- regnetv_040 - 82.44 @ 224, 83.18 @ 288 (timm pre-act)
- regnetv_064 - 83.1 @ 224, 83.71 @ 288 (timm pre-act)
- regnetz_040 - 83.67 @ 256, 84.25 @ 320
- regnetz_040h - 83.77 @ 256, 84.5 @ 320 (w/ extra fc in head)
Alternative norm layers (no BN!)
- resnetv2_50d_gn - 80.8 @ 224, 81.96 @ 288 (pre-act GroupNorm)
- resnetv2_50d_evos - 80.77 @ 224, 82.04 @ 288 (pre-act EvoNormS)
- regnetz_c16_evos - 81.9 @ 256, 82.64 @ 320 (EvoNormS)
- regnetz_d8_evos - 83.42 @ 256, 84.04 @ 320 (EvoNormS)
Xception redux
- xception41p - 82 @ 299 (timm pre-act)
- xception65 - 83.17 @ 299
- xception65p - 83.14 @ 299 (timm pre-act)
ResNets (w/ SE and/or NeXT)
- resnext101_64x4d - 82.46 @ 224, 83.16 @ 288
- seresnext101_32x8d - 83.57 @ 224, 84.27 @ 288
- seresnext101d_32x8d - 83.69 @ 224, 84.35 @ 288
- seresnextaa101d_32x8d - 83.85 @ 224, 84.57 @ 288
- resnetrs200 - 83.85 @ 256, 84.44 @ 320
Vision transformer experiments -- relpos, residual-post-norm, layer-scale, fc-norm, and GAP
- vit_relpos_base_patch32_plus_rpn_256 - 79.5 @ 256, 80.6 @ 320 -- rel pos + extended width + res-post-norm, no class token, avg pool
- vit_relpos_small_patch16_224 - 81.5 @ 224, 82.5 @ 320 -- rel pos, layer scale, no class token, avg pool
- vit_relpos_medium_patch16_rpn_224 - 82.3 @ 224, 83.1 @ 320 -- rel pos + res-post-norm, no class token, avg pool
- vit_base_patch16_rpn_224 - 82.3 @ 224 -- rel pos + res-post-norm, no class token, avg pool
- vit_relpos_medium_patch16_224 - 82.5 @ 224, 83.3 @ 320 -- rel pos, layer scale, no class token, avg pool
- vit_relpos_base_patch16_224 - 82.5 @ 224, 83.6 @ 320 -- rel pos, layer scale, no class token, avg pool
- vit_relpos_base_patch16_gapcls_224 - 82.8 @ 224, 83.9 @ 320 -- rel pos, layer scale, class token, avg pool (by mistake)
- Python
Published by rwightman about 4 years ago
instances - MobileViT weights
Pretrained weights for MobileViT and MobileViT-V2 adapted from Apple impl at https://github.com/apple/ml-cvnets
Checkpoints remapped to timm impl of the model with BGR corrected to RGB (for V1).
- Python
Published by rwightman over 4 years ago
instances - v0.5.4 - More weights, models. ResNet strikes back, self-attn - convnet hybrids, optimizers and more
- Python
Published by rwightman over 4 years ago
instances - v0.1-rsb-weights
Weights for ResNet Strikes Back
Paper: https://arxiv.org/abs/2110.00476
More details on weights and hparams to come...
- Python
Published by rwightman over 4 years ago
instances - v0.1-attn-weights
A collection of weights I've trained comparing various types of SE-like (SE, ECA, GC, etc), self-attention (bottleneck, halo, lambda) blocks, and related non-attn baselines.
ResNet-26-T series
- [2, 2, 2, 2] repeat Bottleneck block ResNet architecture
- ReLU activations
- 3 layer stem with 24, 32, 64 chs, max-pool
- avg pool in shortcut downsample
- self-attn blocks replace 3x3 in both blocks for last stage, and second block of penultimate stage
|model|top1|top1_err|top5|top5_err|param_count|img_size|crop_pct|interpolation|
|---|---|---|---|---|---|---|---|---|
|botnet26t_256|79.246|20.754|94.53|5.47|12.49|256|0.95|bicubic|
|halonet26t|79.13|20.87|94.314|5.686|12.48|256|0.95|bicubic|
|lambda_resnet26t|79.112|20.888|94.59|5.41|10.96|256|0.94|bicubic|
|lambda_resnet26rpt_256|78.964|21.036|94.428|5.572|10.99|256|0.94|bicubic|
|resnet26t|77.872|22.128|93.834|6.166|16.01|256|0.94|bicubic|

Details:
* HaloNet - 8 pixel block size, 2 pixel halo (overlap), relative position embedding
* BotNet - relative position embedding
* Lambda-ResNet-26-T - 3d lambda conv, kernel = 9
* Lambda-ResNet-26-RPT - relative position embedding
Benchmark - RTX 3090 - AMP - NCHW - NGC 21.09
|model|infer_samples_per_sec|infer_step_time|infer_batch_size|infer_img_size|train_samples_per_sec|train_step_time|train_batch_size|train_img_size|param_count|
|---|---|---|---|---|---|---|---|---|---|
|resnet26t|2967.55|86.252|256|256|857.62|297.984|256|256|16.01|
|botnet26t_256|2642.08|96.879|256|256|809.41|315.706|256|256|12.49|
|halonet26t|2601.91|98.375|256|256|783.92|325.976|256|256|12.48|
|lambda_resnet26t|2354.1|108.732|256|256|697.28|366.521|256|256|10.96|
|lambda_resnet26rpt_256|1847.34|138.563|256|256|644.84|197.892|128|256|10.99|
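
The throughput and step-time columns in these benchmark tables are two views of the same measurement. The sketch below assumes step time is reported in milliseconds per batch. That is an inference from the numbers, not something the release states, and the helper name is hypothetical.

```python
def step_time_ms(samples_per_sec: float, batch_size: int) -> float:
    """Approximate time per batch in milliseconds from throughput."""
    return batch_size / samples_per_sec * 1000.0


# resnet26t inference row: 2967.55 samples/sec at batch size 256.
resnet26t_infer = step_time_ms(2967.55, 256)

# lambda_resnet26rpt_256 training row was run at batch size 128, not 256.
rpt_train = step_time_ms(644.84, 128)
```

Both values land close to the reported 86.252 and 197.892. Note that rows benchmarked at batch size 128 have step times that are not directly comparable to the 256 rows; the samples-per-second columns are the fairer comparison.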
Benchmark - RTX 3090 - AMP - NHWC - NGC 21.09
|model|infer_samples_per_sec|infer_step_time|infer_batch_size|infer_img_size|train_samples_per_sec|train_step_time|train_batch_size|train_img_size|param_count|
|---|---|---|---|---|---|---|---|---|---|
|resnet26t|3691.94|69.327|256|256|1188.17|214.96|256|256|16.01|
|botnet26t_256|3291.63|77.76|256|256|1126.68|226.653|256|256|12.49|
|halonet26t|3230.5|79.232|256|256|1077.82|236.934|256|256|12.48|
|lambda_resnet26rpt_256|2324.15|110.133|256|256|864.42|147.485|128|256|10.99|
|lambda_resnet26t|Not Supported| | | | | | | | |
ResNeXT-26-T series
- [2, 2, 2, 2] repeat Bottleneck block ResNeXt architectures
- SiLU activations
- grouped 3x3 convolutions in bottleneck, 32 channels per group
- 3 layer stem with 24, 32, 64 chs, max-pool
- avg pool in shortcut downsample
- channel attn (active in non self-attn blocks) between 3x3 and last 1x1 conv
- when active, self-attn blocks replace 3x3 conv in both blocks for last stage, and second block of penultimate stage
|model|top1|top1_err|top5|top5_err|param_count|img_size|crop_pct|interpolation|
|---|---|---|---|---|---|---|---|---|
|eca_halonext26ts|79.484|20.516|94.600|5.400|10.76|256|0.94|bicubic|
|eca_botnext26ts_256|79.270|20.730|94.594|5.406|10.59|256|0.95|bicubic|
|bat_resnext26ts|78.268|21.732|94.1|5.9|10.73|256|0.9|bicubic|
|seresnext26ts|77.852|22.148|93.784|6.216|10.39|256|0.9|bicubic|
|gcresnext26ts|77.804|22.196|93.824|6.176|10.48|256|0.9|bicubic|
|eca_resnext26ts|77.446|22.554|93.57|6.43|10.3|256|0.9|bicubic|
|resnext26ts|76.764|23.236|93.136|6.864|10.3|256|0.9|bicubic|
Benchmark - RTX 3090 - AMP - NCHW - NGC 21.09
|model|infer_samples_per_sec|infer_step_time|infer_batch_size|infer_img_size|train_samples_per_sec|train_step_time|train_batch_size|train_img_size|param_count|
|---|---|---|---|---|---|---|---|---|---|
|resnext26ts|3006.57|85.134|256|256|864.4|295.646|256|256|10.3|
|seresnext26ts|2931.27|87.321|256|256|836.92|305.193|256|256|10.39|
|eca_resnext26ts|2925.47|87.495|256|256|837.78|305.003|256|256|10.3|
|gcresnext26ts|2870.01|89.186|256|256|818.35|311.97|256|256|10.48|
|eca_botnext26ts_256|2652.03|96.513|256|256|790.43|323.257|256|256|10.59|
|eca_halonext26ts|2593.03|98.705|256|256|766.07|333.541|256|256|10.76|
|bat_resnext26ts|2469.78|103.64|256|256|697.21|365.964|256|256|10.73|
Benchmark - RTX 3090 - AMP - NHWC - NGC 21.09
NOTE: there are performance issues with certain grouped conv configs with channels last layout, backwards pass in particular is really slow. Also causing issues for RegNet and NFNet networks.

|model|infer_samples_per_sec|infer_step_time|infer_batch_size|infer_img_size|train_samples_per_sec|train_step_time|train_batch_size|train_img_size|param_count|
|---|---|---|---|---|---|---|---|---|---|
|resnext26ts|3952.37|64.755|256|256|608.67|420.049|256|256|10.3|
|eca_resnext26ts|3815.77|67.074|256|256|594.35|430.146|256|256|10.3|
|seresnext26ts|3802.75|67.304|256|256|592.82|431.14|256|256|10.39|
|gcresnext26ts|3626.97|70.57|256|256|581.83|439.119|256|256|10.48|
|eca_botnext26ts_256|3515.84|72.8|256|256|611.71|417.862|256|256|10.59|
|eca_halonext26ts|3410.12|75.057|256|256|597.52|427.789|256|256|10.76|
|bat_resnext26ts|3053.83|83.811|256|256|533.23|478.839|256|256|10.73|
ResNet-33-T series.
- [2, 3, 3, 2] repeat Bottleneck block ResNet architecture
- SiLU activations
- 3 layer stem with 24, 32, 64 chs, no max-pool, 1st and 3rd conv stride 2
- avg pool in shortcut downsample
- channel attn (active in non self-attn blocks) between 3x3 and last 1x1 conv
- when active, self-attn blocks replace 3x3 conv last block of stage 2 and 3, and both blocks of final stage
- FC 1x1 conv between last block and classifier
The 33-layer models have an extra 1x1 FC layer between last conv block and classifier. There is both a non-attention 33 layer baseline and a 32 layer without the extra FC.

|model|top1|top1_err|top5|top5_err|param_count|img_size|crop_pct|interpolation|
|---|---|---|---|---|---|---|---|---|
|sehalonet33ts|80.986|19.014|95.272|4.728|13.69|256|0.94|bicubic|
|seresnet33ts|80.388|19.612|95.108|4.892|19.78|256|0.94|bicubic|
|eca_resnet33ts|80.132|19.868|95.054|4.946|19.68|256|0.94|bicubic|
|gcresnet33ts|79.99|20.01|94.988|5.012|19.88|256|0.94|bicubic|
|resnet33ts|79.352|20.648|94.596|5.404|19.68|256|0.94|bicubic|
|resnet32ts|79.028|20.972|94.444|5.556|17.96|256|0.94|bicubic|
Benchmark - RTX 3090 - AMP - NCHW - NGC 21.09
|model|infer_samples_per_sec|infer_step_time|infer_batch_size|infer_img_size|train_samples_per_sec|train_step_time|train_batch_size|train_img_size|param_count|
|---|---|---|---|---|---|---|---|---|---|
|resnet32ts|2502.96|102.266|256|256|733.27|348.507|256|256|17.96|
|resnet33ts|2473.92|103.466|256|256|725.34|352.309|256|256|19.68|
|seresnet33ts|2400.18|106.646|256|256|695.19|367.413|256|256|19.78|
|eca_resnet33ts|2394.77|106.886|256|256|696.93|366.637|256|256|19.68|
|gcresnet33ts|2342.81|109.257|256|256|678.22|376.404|256|256|19.88|
|sehalonet33ts|1857.65|137.794|256|256|577.34|442.545|256|256|13.69|
Benchmark - RTX 3090 - AMP - NHWC - NGC 21.09
|model|infer_samples_per_sec|infer_step_time|infer_batch_size|infer_img_size|train_samples_per_sec|train_step_time|train_batch_size|train_img_size|param_count|
|---|---|---|---|---|---|---|---|---|---|
|resnet32ts|3306.22|77.416|256|256|1012.82|252.158|256|256|17.96|
|resnet33ts|3257.59|78.573|256|256|1002.38|254.778|256|256|19.68|
|seresnet33ts|3128.08|81.826|256|256|950.27|268.581|256|256|19.78|
|eca_resnet33ts|3127.11|81.852|256|256|948.84|269.123|256|256|19.68|
|gcresnet33ts|2984.87|85.753|256|256|916.98|278.169|256|256|19.88|
|sehalonet33ts|2188.23|116.975|256|256|711.63|179.03|128|256|13.69|
ResNet-50(ish) models
In Progress
RegNet"Z" series
- RegNetZ inspired architecture, inverted bottleneck, SE attention, pre-classifier FC, essentially an EfficientNet w/ grouped conv instead of depthwise
- b, c, and d are three different sizes I put together to cover differing flop ranges, not based on the paper (https://arxiv.org/abs/2103.06877) or a search process
- for comparison to RegNetY and paper RegNetZ models, at 224x224 b,c, and d models are 1.45, 1.92, and 4.58 GMACs respectively, b, and c are trained at 256 here so higher than that (see tables)
- haloregnetz_c uses halo attention for all of last stage, and interleaved every 3 (for 4) of penultimate stage
- b, c variants use a stem / 1st stage like the paper, d uses a 3-deep tiered stem with 2-1-2 striding
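
The relationship between the 224x224 GMACs quoted above and the higher figures at the 256x256 training resolution can be approximated with a short calculation. This sketch assumes compute scales with the number of input pixels, which is roughly true for convolutional models but is only an approximation; the helper name is hypothetical.

```python
def scale_gmacs(gmacs: float, from_res: int, to_res: int) -> float:
    """Estimate GMACs at a new square input resolution.

    Assumes cost grows in proportion to the number of input pixels,
    which holds approximately for convolutional models.
    """
    return gmacs * (to_res / from_res) ** 2


# GMACs at 224x224 for the c and d models, as quoted in the list above.
regnetz_c_256 = scale_gmacs(1.92, 224, 256)
regnetz_d_256 = scale_gmacs(4.58, 224, 256)
```

Rounded to two decimals these give 2.51 and 5.98, in line with the GMACs column reported for the c and d models in the benchmark tables of this release.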
ImageNet-1k validation at train resolution
|model|top1|top1_err|top5|top5_err|param_count|img_size|crop_pct|interpolation|
|---|---|---|---|---|---|---|---|---|
|regnetz_d|83.422|16.578|96.636|3.364|27.58|256|0.95|bicubic|
|regnetz_c|82.164|17.836|96.058|3.942|13.46|256|0.94|bicubic|
|haloregnetz_b|81.058|18.942|95.2|4.8|11.68|224|0.94|bicubic|
|regnetz_b|79.868|20.132|94.988|5.012|9.72|224|0.94|bicubic|
ImageNet-1k validation at optimal test res
|model|top1|top1_err|top5|top5_err|param_count|img_size|crop_pct|interpolation|
|---|---|---|---|---|---|---|---|---|
|regnetz_d|84.04|15.96|96.87|3.13|27.58|320|0.95|bicubic|
|regnetz_c|82.516|17.484|96.356|3.644|13.46|320|0.94|bicubic|
|haloregnetz_b|81.058|18.942|95.2|4.8|11.68|224|0.94|bicubic|
|regnetz_b|80.728|19.272|95.47|4.53|9.72|288|0.94|bicubic|
Benchmark - RTX 3090 - AMP - NCHW - NGC 21.09
|model|infer_samples_per_sec|infer_step_time|infer_batch_size|infer_img_size|infer_GMACs|train_samples_per_sec|train_step_time|train_batch_size|train_img_size|param_count|
|---|---|---|---|---|---|---|---|---|---|---|
|regnetz_b|2703.42|94.68|256|224|1.45|764.85|333.348|256|224|9.72|
|haloregnetz_b|2086.22|122.695|256|224|1.88|620.1|411.415|256|224|11.68|
|regnetz_c|1653.19|154.836|256|256|2.51|459.41|277.268|128|256|13.46|
|regnetz_d|1060.91|241.284|256|256|5.98|296.51|430.143|128|256|27.58|
Benchmark - RTX 3090 - AMP - NHWC - NGC 21.09
NOTE: channels last layout is painfully slow for backward pass here due to some sort of cuDNN issue

|model|infer_samples_per_sec|infer_step_time|infer_batch_size|infer_img_size|infer_GMACs|train_samples_per_sec|train_step_time|train_batch_size|train_img_size|param_count|
|---|---|---|---|---|---|---|---|---|---|---|
|regnetz_b|4152.59|61.634|256|224|1.45|399.37|639.572|256|224|9.72|
|haloregnetz_b|2770.78|92.378|256|224|1.88|364.22|701.386|256|224|11.68|
|regnetz_c|2512.4|101.878|256|256|2.51|376.72|338.372|128|256|13.46|
|regnetz_d|1456.05|175.8|256|256|5.98|111.32|1148.279|128|256|27.58|
- Python
Published by rwightman over 4 years ago
instances - v0.4.12. Vision Transformer AugReg support and more
- Vision Transformer AugReg weights and model defs (https://arxiv.org/abs/2106.10270)
- ResMLP official weights
- ECA-NFNet-L2 weights
- gMLP-S weights
- ResNet51-Q
- Visformer, LeViT, ConViT, Twins
- Many fixes, improvements, better test coverage
- Python
Published by rwightman almost 5 years ago
instances - 3rd Party Vision Transformer Weights
A catch-all (ish) release for storing vision transformer weights adapted/rehosted from 3rd parties. Too many incoming models for one release per source...
Containing weights from:
* Twins - https://github.com/Meituan-AutoML/Twins
* Visformer - https://github.com/danczs/Visformer/issues/2
* NesT (Aggregated Nested Transformer) - weights converted from https://github.com/google-research/nested-transformer by @alexander-soare's script
- Python
Published by rwightman about 5 years ago
instances - v0.4.9. EfficientNetV2. MLP-Mixer. ResNet-RS. More vision transformers.
- Python
Published by rwightman about 5 years ago
instances - EfficientNet-V2 weights ported from Tensorflow impl
Weights from https://github.com/google/automl/tree/master/efficientnetv2
Paper: EfficientNetV2: Smaller Models and Faster Training - https://arxiv.org/abs/2104.00298
- Python
Published by rwightman about 5 years ago
instances - ResNet-RS weights
Weights for ResNet-RS models as per #554 . Ported from Tensorflow impl (https://github.com/tensorflow/tpu/tree/master/models/official/resnet/resnet_rs) by @amaarora
- Python
Published by rwightman about 5 years ago
instances - Weights for CoaT (vision transformer) models
Weights for CoaT: Co-Scale Conv-Attentional Image Transformers (from https://github.com/mlpc-ucsd/CoaT)
- Python
Published by rwightman about 5 years ago
instances - Weights for PiT (Pooling-based Vision Transformer) models
Weights from https://github.com/naver-ai/pit
Copyright 2021-present NAVER Corp.
Rehosted here for easy pytorch hub downloads.
- Python
Published by rwightman about 5 years ago
instances - v0.4.5. Lots of models. NFNets (& NF-ResNet, NF-RegNet), GPU-Efficient Nets, RepVGG, VGG.
- Python
Published by rwightman about 5 years ago
instances - DeepMind NFNet-F* weights
Weights converted from DeepMind Haiku impl of NFNets (https://github.com/deepmind/deepmind-research/tree/master/nfnets)
- Python
Published by rwightman over 5 years ago
instances - RepVGG checkpoints remapped from official repo
Checkpoints remapped from official repository at https://github.com/DingXiaoH/RepVGG
- Python
Published by rwightman over 5 years ago
instances - GPU-Efficient (Residual) Networks checkpoints
Checkpoints remapped from official repo at https://github.com/idstcv/GPU-Efficient-Networks
- Python
Published by rwightman over 5 years ago
instances - v0.3.4. Minor release. Conda setup.cfg added
- Python
Published by rwightman over 5 years ago
instances - v0.3.3. ResNet-101D/152D/200D and SE-ResNet-152D models w/ weights.
- Python
Published by rwightman over 5 years ago
instances - Ported weights from official JAX impl of Vision Transformers and MLP-Mixer
Converted to PyTorch from https://github.com/google-research/vision_transformer
- Python
Published by rwightman over 5 years ago
instances - Feature Maps, More Models, CutMix
Aug 12, 2020
- New/updated weights from training experiments
  - EfficientNet-B3 - 82.1 top-1 (vs 81.6 for official with AA and 81.9 for AdvProp)
  - RegNetY-3.2GF - 82.0 top-1 (78.9 from official ver)
  - CSPResNet50 - 79.6 top-1 (76.6 from official ver)
- Add CutMix integrated w/ Mixup. See pull request for some usage examples
- Some fixes for using pretrained weights with in_chans != 3 on several models.
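
For readers unfamiliar with CutMix, the core step is choosing a rectangular patch to paste from one image into another and adjusting the label mixing weight to match. The sketch below follows the formulation in the CutMix paper and is not timm's actual code; the function name and arguments are hypothetical. In training, `lam` would normally be sampled from a Beta distribution and the center sampled uniformly over the image.

```python
import math


def cutmix_box(height, width, lam, center_y, center_x):
    """Return (y1, y2, x1, x2) for the pasted patch and the adjusted lambda.

    lam is the fraction of the original image to keep; the patch covers
    roughly (1 - lam) of the area before clipping to the image bounds.
    """
    cut_ratio = math.sqrt(1.0 - lam)
    cut_h, cut_w = int(height * cut_ratio), int(width * cut_ratio)
    y1 = max(center_y - cut_h // 2, 0)
    y2 = min(center_y + cut_h // 2, height)
    x1 = max(center_x - cut_w // 2, 0)
    x2 = min(center_x + cut_w // 2, width)
    # Recompute lambda from the clipped box so label mixing matches the pixels.
    lam_adjusted = 1.0 - (y2 - y1) * (x2 - x1) / (height * width)
    return (y1, y2, x1, x2), lam_adjusted


# Centered patch on a 224x224 image, keeping 75% of the original.
box, lam_adj = cutmix_box(224, 224, lam=0.75, center_y=112, center_x=112)
```

When the sampled center sits near an image edge the box is clipped, which is why the mixing weight is recomputed from the final box area rather than reused as sampled.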
Aug 5, 2020
Universal feature extraction, new models, new weights, new test sets.
* All models support the features_only=True argument for create_model call to return a network that extracts feature maps from the deepest layer at each stride.
* New models
  * CSPResNet, CSPResNeXt, CSPDarkNet, DarkNet
  * ReXNet
  * (Modified Aligned) Xception41/65/71 (a proper port of TF models)
* New trained weights
  * SEResNet50 - 80.3 top-1
  * CSPDarkNet53 - 80.1 top-1
  * CSPResNeXt50 - 80.0 top-1
  * DPN68b - 79.2 top-1
  * EfficientNet-Lite0 (non-TF ver) - 75.5 (submitted by @hal-314)
* Add 'real' labels for ImageNet and ImageNet-Renditions test set, see results/README.md
* Test set ranking/top-n diff script by @KushajveerSingh
* Train script and loader/transform tweaks to punch through more aug arguments
* README and documentation overhaul. See initial (WIP) documentation at https://rwightman.github.io/pytorch-image-models/
* adamp and sgdp optimizers added by @hellbell
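
The phrase "deepest layer at each stride" in the features_only item above can be made concrete with a toy sketch. The layer names and reductions below are hypothetical, and this is not timm's internal code, only an illustration of the selection rule.

```python
def deepest_per_stride(layers):
    """Pick the last (deepest) layer seen for each distinct stride.

    `layers` is an ordered list of (name, reduction) pairs from input to
    output, where reduction is the total downsampling factor so far.
    """
    chosen = {}
    for name, reduction in layers:
        chosen[reduction] = name  # later layers overwrite earlier ones
    return [(chosen[r], r) for r in sorted(chosen)]


# Hypothetical network layout, listed from input to output.
toy_layers = [
    ("stem", 2), ("stage1.0", 4), ("stage1.1", 4),
    ("stage2.0", 8), ("stage2.1", 8), ("stage3.0", 16), ("stage4.0", 32),
]
features = deepest_per_stride(toy_layers)
```

Here `features` holds one entry per stride (2, 4, 8, 16, 32), each pointing at the final layer operating at that resolution, which is the set of feature maps a detection or segmentation head would typically consume.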
- Python
Published by rwightman almost 6 years ago
instances - RexNet remapped weights
ReXNet weights from https://github.com/clovaai/rexnet#pretrained remapped for timm model changes
- Python
Published by rwightman almost 6 years ago
instances - Mirror of ResNeSt weights
These are a mirror of weights from the official repository (https://github.com/zhanghang1989/ResNeSt ) to avoid issues with hosting changes/relocation
- Python
Published by rwightman almost 6 years ago
instances - RegNet official weights (remapped and cleaned)
RegNet weights cleaned and remapped from https://github.com/facebookresearch/pycls/blob/master/MODEL_ZOO.md
Changes:
* first layer remapped from BGR to RGB
* cleaned out training details such as optimizer state, etc and leave just model state_dict (1/2 size)
* map layer names to mine
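
The first two changes can be sketched without any framework. This is a toy illustration using nested lists in place of tensors; the checkpoint keys (`model_state`, `optimizer_state`), the layer name, and both helper names are hypothetical and do not reflect the actual pycls checkpoint layout or the script used for this release.

```python
def bgr_to_rgb_first_conv(weight):
    """Reverse the input-channel axis of a conv weight.

    `weight` is a nested list shaped [out_ch][in_ch][kh][kw] with in_ch == 3.
    """
    return [list(reversed(out_filter)) for out_filter in weight]


def clean_checkpoint(checkpoint, state_key="model_state"):
    """Keep only the model weights, dropping optimizer state and the like."""
    return dict(checkpoint[state_key])


# Toy checkpoint: one 1x1 filter whose input channels are ordered B, G, R.
toy = {
    "model_state": {"stem.conv.weight": [[[[0.1]], [[0.2]], [[0.3]]]]},
    "optimizer_state": {"momentum": "..."},
    "epoch": 100,
}
state = clean_checkpoint(toy)
state["stem.conv.weight"] = bgr_to_rgb_first_conv(state["stem.conv.weight"])
```

With real tensors the channel swap amounts to flipping dimension 1 of the first convolution's weight; only that layer needs it because every later layer consumes learned feature channels rather than colour channels.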
- Python
Published by rwightman about 6 years ago
instances - TResNet weights
Weights copied and cleaned (just state dict) from https://github.com/mrT23/TResNet/blob/master/MODEL_ZOO.md and other MIIL weight releases hosted at (*.aliyuncs.com) for more consistent/fast transfer speeds and avoidance of downtime.
- Python
Published by rwightman about 6 years ago
instances - SelecSLS Weights
These weights are re-hosted from original repository (https://github.com/mehtadushy/SelecSLS-Pytorch) with permission of the author, Dushyant Mehta (@mehtadushy), under a CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/legalcode) license.
SelecSLS (core) Network Architecture as proposed in "XNect: Real-time Multi-person 3D Human Pose Estimation with a Single RGB Camera, Mehta et al." https://arxiv.org/abs/1907.00837
- Python
Published by rwightman over 6 years ago
instances - HRNet weights from official impl
HRNet weights downloaded from official impl OneDrive links at: https://github.com/HRNet/HRNet-Image-Classification. Rehosted here with SHA hash for hub/modelzoo download compatibility.
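PyTorch's hub download helpers can verify a file against a SHA256 prefix embedded in its name, using the `filename-<sha256 prefix>.ext` convention. A minimal sketch of producing such a name is shown below; the helper and the example stem are illustrative assumptions, not the script used for this release.

```python
import hashlib


def hashed_filename(stem: str, data: bytes, ext: str = ".pth", digits: int = 8) -> str:
    """Build '<stem>-<sha256 prefix><ext>' from the file contents."""
    digest = hashlib.sha256(data).hexdigest()
    return f"{stem}-{digest[:digits]}{ext}"


# Stand-in bytes; a real checkpoint would be read from disk in binary mode.
name = hashed_filename("hrnet_w18", b"abc")
```

Because the prefix is derived from the file contents, a downloader can recompute the hash and detect a truncated or swapped file.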
- Python
Published by rwightman over 6 years ago
instances - Res2Net weights
Res2Net weights from https://github.com/gasvn/Res2Net for easier/faster access from North America that's compatible with modelzoo loadurl
- Python
Published by rwightman over 6 years ago
instances - Released on PyPi
https://pypi.org/project/timm/
- Python
Published by rwightman almost 7 years ago
instances - Pretrained weights (from Cadene)
These weights have all originated from Cadene's Pretrained model repository: https://github.com/Cadene/pretrained-models.pytorch
I'm re-hosting some of the weights here that I use more often to reduce download times as the US/Canada to France link can be slow.
- Python
Published by rwightman almost 7 years ago
instances - Pretrained weights
All weights present here were either trained by me with the code in this repository or ported by me from original implementations.
- Python
Published by rwightman about 7 years ago