Recent Releases of org.pytorch:torchvision_ops

org.pytorch:torchvision_ops - Torchvision 0.23 release

Highlight - Transforming KeyPoints and Rotated boxes!

📩 This release introduces two highly requested features: transforms support for KeyPoints and Rotated Bounding Boxes!

  • Rotated Bounding Boxes provide a tighter fit and alignment with rotated and elongated objects, which improves localization, reduces overlap in densely packed images, and improves isolation of objects in crowded scenes.
  • KeyPoints offer a robust and accurate way to identify and locate specific points of interest within an image or video frame.

These features aim to improve the developer experience for use cases such as detecting and tracking objects, estimating pose, analyzing facial expressions, and creating augmented reality experiences.

The use of Rotated Bounding Boxes is illustrated below. You can expect KeyPoints and rotated boxes to work with all existing torchvision transforms in torchvision.transforms.v2. You can find examples of how to use these transforms in our Transforms on Rotated Bounding Boxes tutorials.
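
As a quick illustration, here is a minimal sketch of transforming an image together with a rotated box and a set of keypoints (the box coordinates and keypoint locations below are made up for the example):

```python
import torch
from torchvision import tv_tensors
from torchvision.transforms import v2

img = torch.randint(0, 256, (3, 480, 640), dtype=torch.uint8)

# One rotated box in XYWHR format: x, y, width, height, rotation angle.
boxes = tv_tensors.BoundingBoxes(
    [[100.0, 120.0, 200.0, 80.0, 30.0]],
    format="XYWHR",
    canvas_size=(480, 640),
)
# Two keypoints, each given as (x, y).
keypoints = tv_tensors.KeyPoints(
    [[150.0, 160.0], [250.0, 180.0]],
    canvas_size=(480, 640),
)

transform = v2.Compose([v2.RandomHorizontalFlip(p=1.0), v2.Resize((240, 320))])
# All inputs are transformed consistently in a single call.
img, boxes, keypoints = transform(img, boxes, keypoints)
```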


[!NOTE] These features are released in BETA status. The APIs are unlikely to change, but there may be some rough edges and we may ship small bug fixes in future releases. Please let us know if you encounter any issues!

Detailed changes

New Features

  • [transforms] Added support for BoundingBoxes formats and transforms (#9104, #9084, #9095, #9138)
  • [transforms] Added the KeyPoints TVTensor and support for transforms (#8817)

Improvements

  • [utils] Add label background to draw_bounding_boxes (#9018)
  • [MPS] Add deformable conv2d kernel support on MPS (#9017, #9115)
  • [documentation] Various documentation improvements (#9063, #9119, #9083, #9105, #9106, #9145)
  • [code-quality] Bunch of code quality improvements (#9087, #9093, #8814, #9035, #9120, #9080, #9027, #9062, #9117, #9024, #9032)

Bug Fixes

  • [datasets] Fix COCO dataset to avoid an issue when copying the dataset results (#9107)
  • [datasets] Raise an error when download=True for the LFW dataset, which is not available for download anymore (#9040)
  • [tv_tensors] Add an error message when passing a 1D tensor to ToImage() (#9114)
  • [io] Warn when webp is asked to decode into grayscale (#9101)

Contributors

🎉 We're grateful for our community, which helps us improve Torchvision by submitting issues and PRs, and providing feedback and suggestions. The following persons have contributed patches for this release: @AlannaBurke, @Alexandre-SCHOEPP, @atalman, @AntoineSimoulin, @BoyuanFeng, @cyyever, @elmuz, @emmanuel-ferdman, @hmk114, @Isalia20, @NicolasHug, @malfet, @chengolivia, @RhutvikH, @hvaara, @scotts, @alinpahontu2912, @tsahiasher, and @youcefouadjer.

- Python
Published by AntoineSimoulin 7 months ago

org.pytorch:torchvision_ops - TorchVision 0.22.1 Release

Key info

⚠ We are updating the areas that TorchVision will prioritize going forward. Please take a look at https://github.com/pytorch/vision/issues/9036 for more details.

⚠ We are deprecating the video decoding and encoding capabilities of TorchVision; they will be removed in version 0.25 (targeted for the end of 2025). We encourage users to migrate existing video decoding code to the TorchCodec project, where we are consolidating all of PyTorch's media decoding/encoding functionality.

This is a patch release, which is compatible with PyTorch 2.7.1. There are no new features added.

- Python
Published by atalman 9 months ago

org.pytorch:torchvision_ops - Torchvision 0.22 release

Key info

⚠ We are updating the areas that TorchVision will prioritize going forward. Please take a look at https://github.com/pytorch/vision/issues/9036 for more details.

⚠ We are deprecating the video decoding and encoding capabilities of TorchVision; they will be removed in version 0.25 (targeted for the end of 2025). We encourage users to migrate existing video decoding code to the TorchCodec project, where we are consolidating all of PyTorch's media decoding/encoding functionality.

Detailed Changes

Deprecations

[io] Video decoding and encoding capabilities are deprecated and will be removed soon in 0.25! Please migrate to TorchCodec! (#8997)
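
For reference, a minimal migration sketch using TorchCodec's VideoDecoder (a hedged example; the file name is hypothetical, and the TorchCodec documentation is the authoritative source for its API):

```python
# pip install torchcodec
from torchcodec.decoders import VideoDecoder

decoder = VideoDecoder("video.mp4")  # file path; raw encoded bytes are also accepted
first_frame = decoder[0]             # uint8 tensor of shape (C, H, W)
frame = decoder.get_frame_at(10)     # Frame object with .data and .pts_seconds
```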

Bug Fixes

  • [io] Fix sync bug with encode_jpeg on CUDA (#8929)
  • [transforms] pin_memory() now preserves the TVTensor class and metadata (#8921)

Improvements

  • [datasets] Most datasets now support a loader parameter, which allows you to decode images directly into tensors with torchvision.io.decode_image() instead of relying on PIL. This should lead to faster pipelines! See the sketch after this list. (#8945, #8972, #8939, #8922)
  • [datasets] Add classes attribute to the Flowers102 dataset (#8838)
  • [datasets] Added 'test' split support for the Places365 dataset (#8928)
  • [datasets] Reduce output log on MNIST (#8865)
  • [ops] Perf: greatly speed up NMS on CUDA when num_boxes is high (#8766, #8925)
  • [ops] Add roi_align nondeterministic support for XPU (#8931)
  • [all] Improvements on input checks and error messages (#8959, #8994, #8944, #8995, #8993, #8866, #8882, #8851, #8844, #8991)
  • [build] Various build improvements / platform support (#8913, #8933, #8936, #8792)
  • [docs] Various documentation improvements (#8843, #8860, #9014, #9015, #8932)
  • [misc] Other non-user-facing changes (#8872, #8982, #8976, #8935, #8977, #8978, #8963, #8975, #8974, #8950, #8970, #8924, #8964, #8996, #8920, #8873, #8876, #8885, #8890, #8901, #8999, #8998, #8973, #8897, #9007, #8852)
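
A minimal sketch of the loader parameter (assuming a local ImageFolder-style directory; the path is hypothetical):

```python
from torchvision.datasets import ImageFolder
from torchvision.io import decode_image

# decode_image accepts a path and returns a uint8 tensor,
# so it can be plugged in directly as the dataset loader.
dataset = ImageFolder("path/to/images", loader=decode_image)
img, label = dataset[0]  # img is a (C, H, W) uint8 tensor, no PIL round-trip
```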

Contributors

We're grateful for our community, which helps us improve torchvision by submitting issues and PRs, and providing feedback and suggestions. The following persons have contributed patches for this release:

Aditya Kamath, Alexandre Ghelfi, PhD, Alfredo Tupone, amdfaa, Andrey Talman, Antoine Simoulin, Aurélien Geron, bjarzemb, deekay42, Frost Mitchell, frost-intel, GdoongMathew, Hangxing Wei, Huy Do, Nicolas Hug, Nikita Shulga, Noopur, Ruben, tvukovic-amd, Wenchen Li, Wieland Morgenstern, Yichen Yan, Yonghye Kwon, Zain Rizvi

- Python
Published by NicolasHug 10 months ago

org.pytorch:torchvision_ops - Torchvision 0.21 release

Highlights

Detailed changes

Image decoding

Torchvision continues to improve its image decoding capabilities. For this version, we added support for HEIC and AVIF image formats. Things are a bit different this time: to enable these decoders, you'll need to pip install torchvision-extra-decoders, and they are available in torchvision as torchvision.io.decode_heic() and torchvision.io.decode_avif(). This is still experimental / BETA, so let us know if you encounter any issues.

Read more in our docs!
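
A short, hedged sketch of the new decoders (the file name is hypothetical; requires the torchvision-extra-decoders package):

```python
# pip install torchvision-extra-decoders
from torchvision.io import decode_heic, read_file

raw = read_file("photo.heic")  # encoded bytes as a 1D uint8 tensor
img = decode_heic(raw)         # decoded (C, H, W) uint8 tensor
```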

New Features

[io] Add support for decoding AVIF and HEIC image formats (#8671)

Improvements

  • [datasets] Don't error when a dataset is already downloaded (#8691)
  • [datasets] Don't print when a dataset is already downloaded (#8681)
  • [datasets] Remove printing info in datasets (#8683)
  • [utils] Add label_colors argument to draw_bounding_boxes (#8578)
  • [models] Add __deepcopy__ support for DualGraphModule (#8708)
  • [Docs] Various documentation improvements (#8798, #8709, #8576, #8620, #8846, #8758)
  • [Code quality] Various code quality improvements (#8757, #8755, #8754, #8689, #8719, #8772, #8774, #8791, #8705)

Bug Fixes

  • [io] Fix memory leak in decode_webp (#8712)
  • [io] Fix pyav 14 compatibility error (#8776)
  • [models] Fix order of auxiliary networks in googlenet.py (#8743)
  • [transforms] Fix adjust_hue on ARM (#8618)
  • [reference scripts] Fix error when loading the cached dataset in the video classification reference (#8727)
  • [build] Fix CUDA build with NVCC_FLAGS in env (#8692)

Tracked Regressions

[build] aarch64 builds are built with the manylinux_2_34_aarch64 tag according to auditwheel check (#8883)

Contributors

We're grateful for our community, which helps us improve torchvision by submitting issues and PRs, and providing feedback and suggestions. The following persons have contributed patches for this release:

amdfaa, Andreas Floros, Andrey Talman, Beh Chuen Yang, David Miguel Susano Pinto, GdoongMathew, Jason Chou, Li-Huai (Allan) Lin, Maohua Li, Nicolas Hug, pblwk, R. Yao, sclarkson, vfdev, Ștefan Talpalaru

- Python
Published by NicolasHug about 1 year ago

org.pytorch:torchvision_ops - Torchvision 0.20 release

Highlights

Encoding / Decoding images

Torchvision is further extending its encoding/decoding capabilities. For this version, we added a WEBP decoder, and a batch JPEG decoder on CUDA GPUs, which can lead to 10X speed-ups over CPU decoding.

We have also improved the UX of our decoding APIs to be more user-friendly. The main entry point is now torchvision.io.decode_image(), and it can take as input either a path (as str or pathlib.Path), or a tensor containing the raw encoded data.

Read more on the docs!
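
A hedged sketch of the revamped decoding entry point and the batched CUDA JPEG decoder (file names are hypothetical; the CUDA path assumes a GPU build):

```python
from torchvision.io import decode_image, decode_jpeg, read_file

img = decode_image("photo.jpg")             # from a path
img = decode_image(read_file("photo.jpg"))  # from raw encoded bytes

# Batched JPEG decoding on a CUDA GPU: pass a list of encoded tensors.
batch = decode_jpeg([read_file("a.jpg"), read_file("b.jpg")], device="cuda")
```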

We also added support for HEIC and AVIF decoding, but these are currently only available when building from source. We are working on making those available directly in the upcoming releases. Stay tuned!

Detailed changes

Bug Fixes

  • [datasets] Update URL of SBDataset train_noval (#8551)
  • [datasets] EuroSAT: fix SSL certificate issues (#8563)
  • [io] Check average_rate availability in video reader (#8548)

New Features

  • [io] Add batch JPEG GPU decoding (decode_jpeg()) (#8496)
  • [io] Add WEBP image decoder: decode_image(), decode_webp() (#8527, #8612, #8610)
  • [io] Add HEIC and AVIF decoders, only available when building from source (#8597, #8596, #8647, #8613, #8621)

Improvements

  • [io] Add support for decoding 16-bit png (#8524)
  • [io] Allow decoding functions to accept the mode parameter as a string (#8627)
  • [io] Allow decode_image() to support paths (#8624)
  • [io] Automatically send video to CPU in io.write_video (#8537)
  • [datasets] Better progress bar for file downloading (#8556)
  • [datasets] Add Path type annotation for ImageFolder (#8526)
  • [ops] Register nms and roi_align Autocast policy for the PyTorch Intel GPU backend (#8541)
  • [transforms] Use Sequence for parameter type checking in transforms.RandomErasing (#8615)
  • [transforms] Support v2.functional.gaussian_blur backprop (#8486)
  • [transforms] Expose transforms.v2 utils for writing custom transforms (#8670)
  • [utils] Fix f-string in color error message (#8639)
  • [packaging] Revamped and improved debuggability of setup.py build (#8535, #8581, #8582, #8590, #8533, #8528, #8659)
  • [Documentation] Various documentation improvements (#8605, #8611, #8506, #8507, #8539, #8512, #8513, #8583, #8633)
  • [tests] Various test improvements (#8580, #8553, #8523, #8617, #8518, #8579, #8558, #8641)
  • [code quality] Various code quality improvements (#8552, #8555, #8516, #8526, #8602, #8615, #8639, #8532)
  • [ci] #8562, #8644, #8592, #8542, #8594, #8530, #8656

Contributors

We're grateful for our community, which helps us improve torchvision by submitting issues and PRs, and providing feedback and suggestions. The following persons have contributed patches for this release:

Adam J. Stewart, AJS Payne, Andreas Floros, Andrey Talman, Bhavay Malhotra, Brizar, deekay42, Ehsan, Feng Yuan, Joseph Macaranas, Martin, Masahiro Hiramori, Nicolas Hug, Nikita Shulga, Sergii Dymchenko, Stefan Baumann, venkatram-dev, Wang, Chuanqi

- Python
Published by NicolasHug over 1 year ago

org.pytorch:torchvision_ops - TorchVision 0.19.1 Release

This is a patch release, which is compatible with PyTorch 2.4.1. There are no new features added.

- Python
Published by atalman over 1 year ago

org.pytorch:torchvision_ops - Torchvision 0.19 release

Highlights

Encoding / Decoding images

Torchvision is extending its encoding/decoding capabilities. For this version, we added a GIF decoder which is available as torchvision.io.decode_gif(raw_tensor), torchvision.io.decode_image(raw_tensor), and torchvision.io.read_image(path_to_image).

We also added support for jpeg GPU encoding in torchvision.io.encode_jpeg(). This is 10X faster than the existing CPU jpeg encoder.
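
A minimal sketch of GPU encoding (assuming a CUDA device; passing a CUDA tensor selects the GPU encoder):

```python
import torch
from torchvision.io import encode_jpeg

img = torch.randint(0, 256, (3, 256, 256), dtype=torch.uint8, device="cuda")
jpeg_bytes = encode_jpeg(img)  # 1D uint8 tensor holding the encoded JPEG
```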

Read more on the docs!

Stay tuned for more improvements coming in the next versions. We plan to improve jpeg GPU decoding, and add more image decoders (webp in particular).

Resizing according to the longest edge of an image

It is now possible to resize images by setting torchvision.transforms.v2.Resize(max_size=N): this will resize the longest edge of the image to exactly max_size, making sure the image dimensions don't exceed this value. Read more on the docs!
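
A short sketch (in the v2 API this is spelled with size=None plus max_size, to our understanding):

```python
from torchvision.transforms import v2

# Resize so the longest edge is exactly 512 pixels, preserving aspect ratio.
transform = v2.Resize(size=None, max_size=512)
```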

Detailed changes

Bug Fixes

  • [datasets] SBDataset: Only download the noval file when image_set='train_noval' (#8475)
  • [datasets] Update the download URL in class EMNIST (#8350)
  • [io] Fix compilation error when there is no libjpeg (#8342)
  • [reference scripts] Fix use of cutmix_alpha in classification training references (#8448)
  • [utils] Allow K=1 in draw_keypoints (#8439)

New Features

  • [io] Add decoder for GIF images (decode_gif(), decode_image(), read_image()) (#8406, #8419)
  • [transforms] Add GaussianNoise transform (#8381)

Improvements

  • [transforms] Allow v2 Resize to resize the longer edge exactly to max_size (#8459)
  • [transforms] Add min_area parameter to SanitizeBoundingBoxes (#7735)
  • [transforms] Make adjust_hue() work with numpy 2.0 (#8463)
  • [transforms] Enable one-hot-encoded labels in MixUp and CutMix (#8427)
  • [transforms] Create kernel on-device for transforms.functional.gaussian_blur (#8426)
  • [io] Add GPU acceleration to encode_jpeg (10X faster than the CPU encoder) (#8391)
  • [io] read_video: accept BytesIO objects on the pyav backend (#8442)
  • [io] Add compatibility with FFMPEG 7.0 (#8408)
  • [datasets] Add extra to install gdown (#8430)
  • [datasets] Support encoded RLE format for COCO segmentations (#8387)
  • [datasets] Added binary cat-vs-dog classification target type to the Oxford pet dataset (#8388)
  • [datasets] Return labels for FER2013 if possible (#8452)
  • [ops] Force use of torch.compile on the deterministic roi_align implementation (#8436)
  • [utils] Add float support to utils.draw_bounding_boxes() (#8328)
  • [feature_extraction] Add concrete_args to feature extraction tracing (#8393)
  • [Docs] Various documentation improvements (#8429, #8467, #8469, #8332, #8262, #8341, #8392, #8386, #8385, #8411)
  • [Tests] Various testing improvements (#8454, #8418, #8480, #8455)
  • [Code quality] Various code quality improvements (#8404, #8402, #8345, #8335, #8481, #8334, #8384, #8451, #8470, #8413, #8414, #8416, #8412)

Contributors

We're grateful for our community, which helps us improve torchvision by submitting issues and PRs, and providing feedback and suggestions. The following persons have contributed patches for this release:

Adam J. Stewart, ahmadsharif1, AJS Payne, Andrew Lingg, Andrey Talman, Anner, Antoine Broyelle, cdzhan, deekay42, drhead, Edward Z. Yang, Emin Orhan, Fangjun Kuang, G, haarisr, Huy Do, Jack Newsom, JavaZero, Mahdi Lamb, Mantas, Nicolas Hug, nihui, Richard Barnes, Richard Zou, Richie Bendall, Robert-André Mauchin, Ross Wightman, Siddarth Ijju, vfdev

- Python
Published by NicolasHug over 1 year ago

org.pytorch:torchvision_ops - TorchVision 0.18.1 Release

This is a patch release, which is compatible with PyTorch 2.3.1. There are no new features added.

- Python
Published by atalman over 1 year ago

org.pytorch:torchvision_ops - TorchVision 0.18 Release

BC-Breaking changes

  • [datasets] gdown is now a required dependency for downloading datasets that are on Google Drive. This change was actually introduced in 0.17.1 (repeated here for visibility) (#8237)
  • [datasets] The StanfordCars dataset isn’t available for download anymore. Please follow these instructions to manually download it (#8309, #8324)
  • [transforms] to_grayscale and the corresponding transform now always return 3 channels when num_output_channels=3 (#8229)

Bug Fixes

  • [datasets] Fix download URL of EMNIST dataset (#8350)
  • [datasets] Fix root path expansion in Kitti dataset (#8164)
  • [models] Fix default momentum value of BatchNorm2d in MaxViT from 0.99 to 0.01 (#8312)
  • [reference scripts] Fix CutMix and MixUp arguments (#8287)
  • [MPS, build] Link essential libraries in cmake (#8230)
  • [build] Fix build with ffmpeg 6.0 (#8096)

New Features

  • [transforms] New GrayscaleToRgb transform (#8247)
  • [transforms] New JPEG augmentation transform (#8316), sketched below
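
A rough sketch of the new JPEG augmentation (assuming uint8 input; the quality range below is illustrative):

```python
import torch
from torchvision.transforms import v2

# Applies JPEG compression artifacts with a quality sampled from [5, 50].
jpeg_aug = v2.JPEG(quality=(5, 50))
img = torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8)
out = jpeg_aug(img)
```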

Improvements

  • [datasets, io] Added pathlib.Path support to datasets and io utilities (#8196, #8200, #8314, #8321)
  • [datasets] Added allow_empty parameter to ImageFolder and related utils to support empty classes during image discovery (#8311)
  • [datasets] Raise proper error in CocoDetection when a slice is passed (#8227)
  • [io] Added support for EXIF orientation in JPEG and PNG decoders (#8303, #8279, #8342, #8302)
  • [io] Avoiding unnecessary copies on io.VideoReader with pyav backend (#8173)
  • [transforms] Allow SanitizeBoundingBoxes to sanitize more than labels (#8319)
  • [transforms] Add sanitize_bounding_boxes kernel/functional (#8308)
  • [transforms] Make perspective more numerically stable (#8249)
  • [transforms] Allow 2D numpy arrays as inputs for to_image (#8256)
  • [transforms] Speed up rotate for 90, 180, 270 degrees (#8295)
  • [transforms] Enabled torch compile on affine transform (#8218)
  • [transforms] Avoid some graph breaks in transforms (#8171)
  • [utils] Add float support to draw_keypoints (#8276)
  • [utils] Add visibility parameter to draw_keypoints (#8225)
  • [utils] Add float support to draw_segmentation_masks (#8150)
  • [utils] Better show overlap section of masks in draw_segmentation_masks (#8213)
  • [Docs] Various documentation improvements (#8341, #8332, #8198, #8318, #8202, #8246, #8208, #8231, #8300, #8197)
  • [code quality] Various code quality improvements (#8273, #8335, #8234, #8345, #8334, #8119, #8251, #8329, #8217, #8180, #8105, #8280, #8161, #8313)

Contributors

We're grateful for our community, which helps us improve torchvision by submitting issues and PRs, and providing feedback and suggestions. The following persons have contributed patches for this release:

Adam Dangoor, Ahmad Sharif, ahmadsharif1, Andrey Talman, Anner, anthony-cabacungan, Arun Sathiya, Brizar, cdzhan, Danylo Baibak, Huy Do, Ivan Magazinnik, JavaZero, Johan Edstedt, Li-Huai (Allan) Lin, Mantas, Mark Harfouche, Mithra, Nicolas Hug, nihui, Philip Meier, RazaProdigy, Richard Barnes, Riza Velioglu, sam-watts, Santiago Castro, Sergii Dymchenko, Syed Raza, talcs, Thien Tran, TilmannR, Tobias Fischer, vfdev, Zhu Lin Ch'ng, Zoltán Böszörményi.

- Python
Published by NicolasHug almost 2 years ago

org.pytorch:torchvision_ops - TorchVision 0.17.2 Release

This is a patch release, which is compatible with PyTorch 2.2.2. There are no new features added.

- Python
Published by atalman almost 2 years ago

org.pytorch:torchvision_ops - TorchVision 0.17.1 Release

This is a patch release, which is compatible with PyTorch 2.2.1.

Bug Fixes

  • Add gdown dependency to support downloading datasets from Google Drive (https://github.com/pytorch/vision/pull/8237)
  • Fix silent correctness with convert_bounding_box_format when passing string parameters (https://github.com/pytorch/vision/issues/8258)

- Python
Published by huydhn almost 2 years ago

org.pytorch:torchvision_ops - TorchVision 0.17 Release

Highlights

The V2 transforms are now stable!

The torchvision.transforms.v2 namespace was still in BETA stage until now. It is now stable! Whether you’re new to Torchvision transforms, or you’re already experienced with them, we encourage you to start with Getting started with transforms v2 in order to learn more about what can be done with the new v2 transforms.

Browse our main docs for general information and performance tips. The available transforms and functionals are listed in the API reference. Additional information and tutorials can also be found in our example gallery, e.g. Transforms v2: End-to-end object detection/segmentation example or How to write your own v2 transforms.

Towards torch.compile() support

We are progressively adding support for torch.compile() to torchvision interfaces, reducing graph breaks and allowing dynamic shapes.

The torchvision ops (nms, [ps_]roi_align, [ps_]roi_pool and deform_conv_2d) are now compatible with torch.compile and dynamic shapes.

On the transforms side, the majority of low-level kernels (like resize_image() or crop_image()) should compile properly without graph breaks and with dynamic shapes. We are still addressing the remaining edge-cases, moving up towards full functional support and classes, and you should expect more progress on that front with the next release.
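
A minimal sketch of compiling one of these ops with dynamic shapes (box values are arbitrary but valid):

```python
import torch
from torchvision.ops import nms

compiled_nms = torch.compile(nms, dynamic=True)

xy = torch.rand(100, 2) * 100
wh = torch.rand(100, 2) * 50
boxes = torch.cat([xy, xy + wh], dim=1)  # valid (x1, y1, x2, y2) boxes
scores = torch.rand(100)
keep = compiled_nms(boxes, scores, 0.5)  # indices of the kept boxes
```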


Detailed Changes

Breaking changes / Finalizing deprecations

  • [transforms] We changed the default of the antialias parameter from None to True, in all transforms that perform resizing. This change of default has been communicated in previous versions, and should drastically reduce the amount of bugs/surprises as it aligns the tensor backend with the PIL backend. Simply put: from now on, antialias is always applied when resizing (with bilinear or bicubic modes), whether you're using tensors or PIL images. This change only affects the tensor backend, as PIL always applies antialias anyway. See the snippet after this list. (#7949)
  • [transforms] We removed the torchvision.transforms.functional_tensor.py and torchvision.transforms.functional_pil.py modules, as these had been deprecated for a while. Use the public functionals from torchvision.transforms.v2.functional instead. (#7953)
  • [video] Remove deprecated path parameter to VideoReader and made src mandatory (#8125)
  • [transforms] to_pil_image now provides the same output for equivalent numpy arrays and tensor inputs (#8097)
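
A quick sketch of the new antialias default (illustrative only; opt out explicitly if you need the old tensor-backend behavior):

```python
from torchvision.transforms import v2

resize = v2.Resize(256)                   # antialias is now applied by default
legacy = v2.Resize(256, antialias=False)  # previous tensor-backend behavior
```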

Bug Fixes

  • [datasets] Fix root path expansion in datasets.Kitti (#8165)
  • [transforms] Allow sequence fill for v2 AA scripted (#7919)
  • [reference scripts] Fix quantized references (#8073)
  • [reference scripts] Fix IoUs reported in segmentation references (#7916)

New Features

[datasets] add Imagenette dataset (#8139)

Improvements

  • [transforms] The v2 transforms are now officially stable and out of BETA stage (#8111)
  • [ops] The ops ([ps_]roi_align, [ps_]roi_pool, deform_conv_2d) are now compatible with torch.compile and dynamic shapes (#8061, #8049, #8062, #8063, #7942, #7944)
  • [models] Allow custom atrous_rates for deeplabv3_mobilenet_v3_large (#8019)
  • [transforms] Allow float fill for integer images in F.pad (#7950)
  • [transforms] Allow length-1 sequences for fill with PIL (#7928)
  • [transforms] Allow size to be a generic Sequence in Resize (#7999)
  • [transforms] Making the root parameter optional for VisionDataset (#8124)
  • [transforms] Added support for tv_tensors in torch compile for functional ops (#8110)
  • [transforms] Reduced number of graphs for compiled resize (#8108)
  • [misc] Various fixes for S390x support (#8149)
  • [Docs] Various documentation enhancements (#8007, #8014, #7940, #7989, #7993, #8114, #8117, #8121, #7978, #8002, #7957, #7907, #8000, #7963)
  • [Tests] Various test enhancements (#8032, #7927, #7933, #7934, #7935, #7939, #7946, #7943, #7968, #7967, #8033, #7975, #7954, #8001, #7962, #8003, #8011, #8012, #8013, #8023, #7973, #7970, #7976, #8037, #8052, #7982, #8145, #8148, #8144, #8058, #8057, #7961, #8132, #8133, #8160)
  • [Code Quality] (#8077, #8070, #8004, #8113,

Contributors

We're grateful for our community, which helps us improve torchvision by submitting issues and PRs, and providing feedback and suggestions. The following persons have contributed patches for this release:

Aleksei Nikiforov, Alex Wei, Andrey Talman, Chunyuan WU, CptCaptain, Edward Z. Yang, Gu Wang, Haochen Yu, Huy Do, Jeff Daily, Josh Levy-Kramer, moto, Nicolas Hug, NVS Abhilash, Omkar Salpekar, Philip Meier, Sergii Dymchenko, Siddharth Singh, Thiago Crepaldi, Thomas Fritz, TilmannR, vfdev-5, Zeeshan Khan Suri.

- Python
Published by NicolasHug about 2 years ago

org.pytorch:torchvision_ops - TorchVision 0.16.2 Release

This is a patch release, which is compatible with PyTorch 2.1.2. There are no new features added.

- Python
Published by huydhn about 2 years ago

org.pytorch:torchvision_ops - TorchVision 0.16.1 Release

This is a minor release that only contains bug fixes.

Bug Fixes

  • [models] Fix download of efficientnet weights (#8036)
  • [transforms] Fix v2 transforms in spawn multi-processing context (#8067)

- Python
Published by NicolasHug over 2 years ago

org.pytorch:torchvision_ops - TorchVision 0.16 - Transforms speedups, CutMix/MixUp, and MPS support!

Highlights

[BETA] Transforms and augmentations

sphx_glr_plot_transforms_getting_started_004

Major speedups

The new transforms in torchvision.transforms.v2 support image classification, segmentation, detection, and video tasks. They are now 10%-40% faster than before! This is mostly achieved thanks to 2X-4X improvements made to v2.Resize(), which now supports native uint8 tensors for Bilinear and Bicubic modes. Output results are also now closer to PIL's! Check out our performance recommendations to learn more.

Additionally, torchvision now ships with libjpeg-turbo instead of libjpeg, which should significantly speed-up the jpeg decoding utilities (read_image, decode_jpeg), and avoid compatibility issues with PIL.

CutMix and MixUp

Long-awaited support for the CutMix and MixUp augmentations is now here! Check our tutorial to learn how to use them.
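
A minimal sketch of the intended usage (batch shapes and the class count are placeholders; see the tutorial for the authoritative version):

```python
import torch
from torchvision.transforms import v2

NUM_CLASSES = 100
cutmix = v2.CutMix(num_classes=NUM_CLASSES)
mixup = v2.MixUp(num_classes=NUM_CLASSES)
cutmix_or_mixup = v2.RandomChoice([cutmix, mixup])

images = torch.rand(8, 3, 224, 224)
labels = torch.randint(0, NUM_CLASSES, (8,))
# Typically applied after the DataLoader, on whole batches.
images, labels = cutmix_or_mixup(images, labels)  # labels become (8, NUM_CLASSES) soft labels
```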

Towards stable V2 transforms

In the previous release 0.15 we BETA-released a new set of transforms in torchvision.transforms.v2 with native support for tasks like segmentation, detection, or videos. We have now stabilized the design decisions of these transforms and made further improvements in terms of speedups, usability, new transforms support, etc.

We're keeping the torchvision.transforms.v2 and torchvision.tv_tensors namespaces as BETA until 0.17 out of precaution, but we do not expect disruptive API changes in the future.

Whether you’re new to Torchvision transforms, or you’re already experienced with them, we encourage you to start with Getting started with transforms v2 in order to learn more about what can be done with the new v2 transforms.

Browse our main docs for general information and performance tips. The available transforms and functionals are listed in the API reference. Additional information and tutorials can also be found in our example gallery, e.g. Transforms v2: End-to-end object detection/segmentation example or How to write your own v2 transforms.

[BETA] MPS support

The nms and roi-align kernels (roi_align, roi_pool, ps_roi_align, ps_roi_pool) now support MPS. Thanks to Li-Huai (Allan) Lin for this contribution!
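
A short sketch of running one of these ops on Apple silicon (assuming an MPS-enabled build; box values are arbitrary):

```python
import torch
from torchvision.ops import nms

device = "mps"
xy = torch.rand(50, 2, device=device) * 100
wh = torch.rand(50, 2, device=device) * 50
boxes = torch.cat([xy, xy + wh], dim=1)  # valid (x1, y1, x2, y2) boxes
scores = torch.rand(50, device=device)
keep = nms(boxes, scores, 0.5)
```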


Detailed Changes

Deprecations / Breaking changes

All changes below happened in the transforms.v2 and datapoints namespaces, which were BETA and protected with a warning. We do not expect other disruptive changes to these APIs moving forward!

  • [transforms.v2] to_grayscale() is not deprecated anymore (#7707)
  • [transforms.v2] Renaming: torchvision.datapoints.Datapoint -> torchvision.tv_tensors.TVTensor (#7904, #7894)
  • [transforms.v2] Renaming: BoundingBox -> BoundingBoxes (#7778)
  • [transforms.v2] Renaming: BoundingBoxes.spatial_size -> BoundingBoxes.canvas_size (#7734)
  • [transforms.v2] All public methods on TVTensor classes (previously: Datapoint classes) were removed
  • [transforms.v2] transforms.v2.utils is now private (#7863)
  • [transforms.v2] Remove the wrap_like class method and add the tv_tensors.wrap() function (#7832)

New Features

  • [transforms.v2] Add support for MixUp and CutMix (#7731, #7784)
  • [transforms.v2] Add PermuteChannels transform (#7624)
  • [transforms.v2] Add ToPureTensor transform (#7823)
  • [ops] Add MPS kernels for nms and roi ops (#7643)

Improvements

  • [io] Added support for CMYK images in decode_jpeg (#7741)
  • [io] Package torchvision with libjpeg-turbo instead of libjpeg (#7672, #7840)
  • [models] Downloaded weights are now sha256-validated (#7219)
  • [transforms.v2] Massive Resize speed-up by adding native uint8 support for bilinear and bicubic modes (#7557, #7668)
  • [transforms.v2] Enforce pickleability for v2 transforms and wrapped datasets (#7860)
  • [transforms.v2] Allow catch-all "others" key in fill dicts (#7779)
  • [transforms.v2] Allow passthrough for Resize (#7521)
  • [transforms.v2] Add scale option to ToDtype. Remove ConvertDtype (#7759, #7862)
  • [transforms.v2] Improve UX for Compose (#7758)
  • [transforms.v2] Allow users to choose whether to return TVTensor subclasses or pure Tensors (#7825)
  • [transforms.v2] Remove import-time warning for v2 namespaces (#7853, #7897)
  • [transforms.v2] Speed up hsv2rgb (#7754)
  • [models] Add filter parameters to list_models() (#7718)
  • [models] Assert RAFT input resolution is 128 x 128 or higher (#7339)
  • [ops] Replaced gpuAtomicAdd by fastAtomicAdd (#7596)
  • [utils] Add GPU support for draw_segmentation_masks (#7684)
  • [ops] Add deterministic, pure-Python roi_align implementation (#7587)
  • [tv_tensors] Make TVTensors deepcopyable (#7701)
  • [datasets] Only return small set of targets by default from dataset wrapper (#7488)
  • [references] Added support for v2 transforms and tensors / tv_tensors backends (#7732, #7511, #7869, #7665, #7629, #7743, #7724, #7742)
  • [doc] A lot of documentation improvements (#7503, #7843, #7845, #7836, #7830, #7826, #7484, #7795, #7480, #7772, #7847, #7695, #7655, #7906, #7889, #7883, #7881, #7867, #7755, #7870, #7849, #7854, #7858, #7621, #7857, #7864, #7487, #7859, #7877, #7536, #7886, #7679, #7793, #7514, #7789, #7688, #7576, #7600, #7580, #7567, #7459, #7516, #7851, #7730, #7565, #7777)

Bug Fixes

  • [datasets] Fix split=None in MovingMNIST (#7449)
  • [io] Fix heap buffer overflow in decode_png (#7691)
  • [io] Fix blurry screen in video decoder (#7552)
  • [models] Fix weight download URLs for some models (#7898)
  • [models] Fix ShuffleNet ONNX export (#7686)
  • [models] Fix detection models with pytorch 2.0 (#7592, #7448)
  • [ops] Fix segfault in DeformConv2d when mask is None (#7632)
  • [transforms.v2] Stricter SanitizeBoundingBoxes labels_getter heuristic (#7880)
  • [transforms.v2] Make sure RandomPhotometricDistort transforms all images the same (#7442)
  • [transforms.v2] Fix v2.Lambda’s transformed types (#7566)
  • [transforms.v2] Don't call round() on float images for Resize (#7669)
  • [transforms.v2] Let SanitizeBoundingBoxes preserve output type (#7446)
  • [transforms.v2] Fixed int type support for sigma in GaussianBlur (#7887)
  • [transforms.v2] Fixed issue with jitted AutoAugment transforms (#7839)
  • [transforms] Fix Resize pass-through logic (#7519)
  • [utils] Fix color in draw_segmentation_masks (#7520)

Others

  • [tests] Various test improvements / fixes (#7693, #7816, #7477, #7783, #7716, #7355, #7879, #7874, #7882, #7447, #7856, #7892, #7902, #7884, #7562, #7713, #7708, #7712, #7703, #7641, #7855, #7842, #7717, #7905, #7553, #7678, #7908, #7812, #7646, #7841, #7768, #7828, #7820, #7550, #7546, #7833, #7583, #7810, #7625, #7651)
  • [CI] Various CI improvements (#7485, #7417, #7526, #7834, #7622, #7611, #7872, #7628, #7499, #7616, #7475, #7639, #7498, #7467, #7466, #7441, #7524, #7648, #7640, #7551, #7479, #7634, #7645, #7578, #7572, #7571, #7591, #7470, #7574, #7569, #7435, #7635, #7590, #7589, #7582, #7656, #7900, #7815, #7555, #7694, #7558, #7533, #7547, #7505, #7502, #7540, #7573)
  • [Code Quality] Various code quality improvements (#7559, #7673, #7677, #7771, #7770, #7710, #7709, #7687, #7454, #7464, #7527, #7462, #7662, #7593, #7797, #7805, #7786, #7831, #7829, #7846, #7806, #7814, #7606, #7613, #7608, #7597, #7792, #7781, #7685, #7702, #7500, #7804, #7747, #7835, #7726, #7796)

Contributors

We're grateful for our community, which helps us improve torchvision by submitting issues and PRs, and providing feedback and suggestions. The following persons have contributed patches for this release: Adam J. Stewart, Aditya Oke, Andrey Talman, Camilo De La Torre, Christoph Reich, Danylo Baibak, David Chiu, David Garcia, Dennis M. Pöpperl, Dhuige, Duc Mguyen, Edward Z. Yang, Eric Sauser, Fansure Grin, Huy Do, Illia Vysochyn, Johannes, Kai Wana, Kobrin Eli, kurtamohler, Li-Huai (Allan) Lin, Liron Ilouz, Masahiro Hiramori, Mateusz Guzek, Max Chuprov, Minh-Long Luu, Minliang Lin, mpearce25, Nicolas Granger, Nicolas Hug, Nikita Shulga, Omkar Salpekar, Paul Mulders, Philip Meier, ptrblck, puhuk, Radek Bartoň, Richard Barnes, Riza Velioglu, Sahil Goyal, Shu, Sim Sun, SvenDS9, Tommaso Bianconcini, Vadim Zubov, vfdev-5

- Python
Published by NicolasHug over 2 years ago

org.pytorch:torchvision_ops - TorchVision 0.15.2 Release

This is a minor release, which is compatible with PyTorch 2.0.1 and contains some minor bug fixes.

Highlights

Bug Fixes

  • Move parameter sampling of v2.RandomPhotometricDistort into get_params https://github.com/pytorch/vision/pull/7442
  • Fix split parameter for MovingMNIST https://github.com/pytorch/vision/pull/7449
  • Prevent unwrapping in v2.SanitizeBoundingBoxes https://github.com/pytorch/vision/pull/7446

- Python
Published by atalman almost 3 years ago

org.pytorch:torchvision_ops - TorchVision 0.15 - New transforms API!

Highlights

[BETA] New transforms API

TorchVision is extending its Transforms API! Here is what’s new:

  • You can use them not only for Image Classification but also for Object Detection, Instance & Semantic Segmentation and Video Classification.
  • You can use new functional transforms for transforming Videos, Bounding Boxes and Segmentation Masks.

The API is completely backward compatible with the previous one and remains the same, to ease migration and adoption. We are now releasing this new API as Beta in the torchvision.transforms.v2 namespace, and we would love to get early feedback from you to improve its functionality. Please reach out to us if you have any questions or suggestions.

```py
import torchvision.transforms.v2 as transforms

# Exactly the same interface as V1:
trans = transforms.Compose([
    transforms.ColorJitter(contrast=0.5),
    transforms.RandomRotation(30),
    transforms.CenterCrop(480),
])
imgs, bboxes, masks, labels = trans(imgs, bboxes, masks, labels)
```

You can read more about these new transforms in our docs, and you can also check out our examples:

Note that this API is still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes.
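
As an illustration, here is a hedged sketch of detection-style usage at the time of this release (the datapoints namespace and its spatial_size argument were renamed in later versions to tv_tensors and canvas_size; the box coordinates are made up):

```py
import torch
import torchvision.transforms.v2 as transforms
from torchvision import datapoints

img = torch.randint(0, 256, (3, 480, 640), dtype=torch.uint8)
boxes = datapoints.BoundingBox(
    [[50, 50, 200, 200]],
    format="XYXY",
    spatial_size=(480, 640),
)

trans = transforms.Compose([
    transforms.RandomHorizontalFlip(p=1.0),
    transforms.RandomResizedCrop((224, 224), antialias=True),
])
# The boxes are flipped and cropped together with the image.
img, boxes = trans(img, boxes)
```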

[BETA] New Video Swin Transformer

We added a Video SwinTransformer model based on the Video Swin Transformer paper.

```py
import torch
from torchvision.models.video import swin3d_t

video = torch.rand(1, 3, 32, 800, 600)

# or swin3d_b, swin3d_s
model = swin3d_t(weights="DEFAULT")
model.eval()
with torch.inference_mode():
    prediction = model(video)
print(prediction)
```

The model has the following accuracies on the Kinetics-400 dataset:

| Model | Acc@1 | Acc@5 |
| --- | --- | --- |
| swin3d_t | 77.7 | 93.5 |
| swin3d_s | 79.5 | 94.1 |
| swin3d_b | 79.4 | 94.4 |

We would like to thank oke-aditya for this contribution.

Detailed Changes (PRs)

BC-breaking changes

  • [models] Fixed a bug inside ops.MLP when backpropagating with dropout > 0 by implicitly setting the inplace argument of nn.Dropout to False (#7209)
  • [models, transforms] Remove functionality scheduled for 0.15 after deprecation (#7176). We removed deprecated functionalities according to the deprecation cycle: gen_bar_updater, model_urls/quant_model_urls in models.

Deprecations

[transforms] Change default of antialias parameter from None to 'warn' (#7160). For all transforms / functionals that have the antialias parameter, we changed its current default from None to a "warn" value that behaves exactly like None, but raises a warning directing users to explicitly set either True, False or None. In v0.17.0 we plan to remove "warn" and set the default to True.

[transforms] Deprecate functional_pil and functional_tensor and make them private (#7269). Since v0.15.0, torchvision.transforms.functional_pil and torchvision.transforms.functional_tensor have become private and will be removed in v0.17.0. Please use torchvision.transforms.functional or torchvision.transforms.v2.functional instead.

[transforms] Undeprecate PIL int constants for interpolation (#7241) We restored the support for integer interpolation mode (Pillow constants) which was deprecated since v0.13.0 (as PIL un-deprecated those as well).

New Features

  • [transforms] New transforms API (see highlight)
  • [models] Add Video SwinTransformer (see highlight) (#6521)

Improvements

  • [transforms] Introduce nearest-exact interpolation (#6754)
  • [transforms] Add sequence fill support for ElasticTransform (#7141)
  • [transforms] Perform out-of-bounds check for single values and two-tuples in ColorJitter (#7133)
  • [datasets] Fix download of SBU dataset (#7046) (#7051)
  • [hub] Add video models to torchhub (#7083)
  • [hub] Expose maxvit and swin_v2 models to torchhub (#7078)
  • [io] Suppress warning in VideoReader (#6976, #6971)
  • [io] Set pytorch vision decoder probesize for getting stream info based on the value from the decode setting (#6900) (#6950)
  • [io] Improve warning message for missing image extension (#7150)
  • [io] Read video from memory (new API) (#6771)
  • [models] Allow dropout overwrites on EfficientNet (#7031)
  • [models] Don't use named args in MHA calls to allow applying pytorch forward hooks to ViT (#6956)
  • [onnx] Support exporting RoiAlign align=True to ONNX with opset 16 (#6685)
  • [ops] Handle invalid reduction values (#6675)
  • [datasets] Add MovingMNIST dataset (#7042)
  • Add torchvision maintainers guide (#7109)
  • [Documentation] Various doc improvements (#7041, #6947, #6690, #7142, #7156, #7025, #7048, #7074, #6936, #6694, #7161, #7164, #6912, #6854, #6926, #7065, #6813)
  • [CI] Various CI improvements (#6864, #6863, #6855, #6856, #6803, #6893, #6865, #6804, #6866, #6742, #7273, #6999, #6713, #6972, #6954, #6968, #6987, #7004, #7010, #7014, #6915, #6797, #6759, #7060, #6857, #7212, #7199, #7186, #7183, #7178, #7163, #7181, #6789, #7110, #7088, #6955, #6788, #6970)
  • [tests] Various tests improvements (#7020, #6939, #6658, #7216, #6996, #7363, #7379, #7218, #7286, #6901, #7059, #7202, #6708, #7013, #7206, #7204, #7233)

Bug Fixes

  • [datasets] Fix MNIST byte flipping (#7081)
  • [models] Properly support deepcopying and serialization of model weights (#7107)
  • [models] Use inplace=None as default in ops.MLP (#7209)
  • [models] Fix dropout issue in swin transformers (#7224)
  • [reference scripts] Fix quantized classification reference - missing args (#7072)
  • [models, tests] [FBcode->GH] Fix GRACE_HOPPER file internal discovery (#6719)
  • [transforms] Replace getbands() with get_image_num_channels() (#6941)
  • [transforms] Switch view() with reshape() on equalize (#6772)
  • [transforms] Add sequence fill support for ElasticTransform (#7141)
  • [transforms] Make RandomErasing scriptable for integer value (#7134)
  • [video] Fix bug in output format for pyav (#6672)
  • [video, datasets] [bugfix] Fix the output format for VideoClips.subset (#6700)
  • [onnx] Fix dtype for NonMaxSuppression (#7056)

Code Quality

  • [datasets] Remove unused import (#7245)
  • [models] Fix error message typo (#6682)
  • [models] Make weights deepcopyable (#6883)
  • [models] Fix missing f-string prefix in error message (#6684)
  • [onnx] Rephrase ONNX RoiAlign warning for aligned=True (#6704)
  • [onnx] Misc ONNX improvements (#7249)
  • [ops] Raise kernel launch errors instead of just printing the error message in cuda ops (#7080)
  • [ops, tests] Remove torch.jit.fuser("fuser2") in test (#7069)
  • [tests] Replace assert torch.allclose with torch.testing.assert_close (#6895)
  • [transforms] Remove old TODO about using _log_api_usage_once() (#7277)
  • [transforms] Fixed repr for ElasticTransform (#6758)
  • [transforms] Use is False for some antialias checks (#7234)
  • [datasets, models] Various type-hint improvements (#6844, #6929, #6843, #7087, #6735, #6845, #6846)
  • [all] Switch to C++17 following the core library (#7116)

Prototype

Most of these PRs (not all) relate to the transforms V2 work (#7122, #7120, #7113, #7270, #7037, #6665, #6944, #6919, #7033, #7138, #6718, #6068, #7194, #6997, #6647, #7279, #7232, #7225, #6663, #7235, #7236, #7275, #6791, #6786, #7203, #7009, #7278, #7238, #7230, #7118, #7119, #6876, #7190, #6995, #6879, #6904, #6921, #6905, #6977, #6714, #6924, #6984, #6631, #7276, #6757, #7227, #7197, #7170, #7228, #7246, #7255, #7254, #7253, #7248, #7256, #7257, #7252, #6724, #7215, #7260, #7261, #7244, #7271, #7231, #6738, #7268, #7258, #6933, #6891, #6890, #7012, #6896, #6881, #6880, #6877, #7045, #6858, #6830, #6935, #6938, #6914, #6907, #6897, #6903, #6859, #6835, #6837, #6807, #6776, #6784, #6795, #7135, #6930, #7153, #6762, #6681, #7139, #6831, #6826, #6821, #6819, #6820, #6805, #6811, #6783, #6978, #6667, #6741, #6763, #6774, #6748, #6749, #6722, #6756, #6712, #6733, #6736, #6874, #6767, #6902, #6847, #6851, #6777, #6770, #6800, #6812, #6702, #7223, #6906, #7226, #6860, #6934, #6726, #6730, #7196, #7211, #7229, #7177, #6923, #6949, #6913, #6775, #7091, #7136, #7154, #6833, #6824, #6785, #6710, #6653, #6751, #6503, #7266, #6729, #6989, #7002, #6892, #6888, #6894, #6988, #6940, #6942, #6945, #6983, #6773, #6832, #6834, #6828, #6801, #7084)

Contributors

We're grateful for our community, which helps us improve torchvision by submitting issues and PRs, and providing feedback and suggestions. The following persons have contributed patches for this release:

Aditya Gandhamal, Aditya Oke, Aidyn-A, Akira Noda, Andrey Talman, Bowen Bao, Bruno Korbar, Chen Liu, cyy, David Berard, deepsghimire, Erjia Guan, F-G Fernandez, Jithun Nair, Joao Gomes, John Detloff, Justin Chu, Karan Desai, lezcano, mpearce25, Nghia, Nicolas Hug, Nikita Shulga, nps1ngh, Omkar Salpekar, Philip Meier, Robert Perrotta, RoiEX, Samantha Andow, Sergii Dymchenko, shunsuke yokokawa, Sim Sun, Toni Blaslov, toni057, Vasilis Vryniotis, vfdev-5, Vladislav Sovrasov, vsuryamurthy, Yosua Michael Maranatha, Yuxin Wu

- Python
Published by NicolasHug almost 3 years ago

org.pytorch:torchvision_ops - TorchVision 0.14.1 Release

This is a minor release, which is compatible with PyTorch 1.13.1. There are no new features added.

- Python
Published by atalman about 3 years ago

org.pytorch:torchvision_ops - TorchVision 0.14, including new model registration API, new models, weights, augmentations, and more

Highlights

[BETA] New Model Registration API

Following up on the multi-weight support API released in the previous version, we have added a new model registration API to help users retrieve models and weights. There are now 4 new methods under the torchvision.models module: get_model, get_model_weights, get_weight, and list_models. Here are examples of how we can use them:

```python
import torchvision
from torchvision.models import get_model, get_model_weights, list_models

max_params = 5_000_000

tiny_models = []
for model_name in list_models(module=torchvision.models):
    weights_enum = get_model_weights(model_name)
    if len([w for w in weights_enum if w.meta["num_params"] <= max_params]) > 0:
        tiny_models.append(model_name)

print(tiny_models)
# ['mnasnet0_5', 'mnasnet0_75', 'mnasnet1_0', 'mobilenet_v2', ...]

model = get_model(tiny_models[0], weights="DEFAULT")
print(sum(x.numel() for x in model.state_dict().values()))
# 2239188
```

As of now, this API is still beta and there might be changes in the future in order to improve its usability based on your feedback.

New Architecture and Model Variants

Classification Models

We’ve added the Swin Transformer V2 architecture along with pre-trained weights for its tiny/small/base variants. In addition, we have added support for the MaxViT transformer. Here is an example on how to use the models:

```python
import torch
from torchvision.models import *

image = torch.rand(1, 3, 224, 224)
model = swin_v2_t(weights="DEFAULT").eval()
# or:
model = maxvit_t(weights="DEFAULT").eval()

prediction = model(image)
```

Here is the table showing the accuracy of the models tested on the ImageNet-1K dataset.

Model | Acc@1 | Acc@1 change over V1 | Acc@5 | Acc@5 change over V1
-- | -- | -- | -- | --
swin_v2_t | 82.072 | +0.598 | 96.132 | +0.356
swin_v2_s | 83.712 | +0.516 | 96.816 | +0.456
swin_v2_b | 84.112 | +0.530 | 96.864 | +0.224
maxvit_t | 83.700 | - | 96.722 | -

We would like to thank Ren Pang and Teodor Poncu for contributing the 2 models to torchvision.

[BETA] Video Classification Model

We added two new video classification models, MViT and S3D. MViT is a state-of-the-art video classification transformer model which has 80.757% accuracy on the Kinetics-400 dataset, while S3D is a relatively small model with good accuracy for its size. These models can be used as follows:

```python
import torch
from torchvision.models.video import *

video = torch.rand(3, 32, 800, 600)
model = mvit_v2_s(weights="DEFAULT")
# or:
model = s3d(weights="DEFAULT")

model.eval()
prediction = model(video)
```

Here is the table showing the accuracy of the new video classification models tested on the Kinetics-400 dataset.

Model | Acc@1 | Acc@5
-- | -- | --
mvit_v1_b | 81.474 | 95.776
mvit_v2_s | 83.196 | 96.36
s3d | 83.582 | 96.64

We would like to thank Haoqi Fan, Yanghao Li, Christoph Feichtenhofer and Wan-Yen Lo for their work on PyTorchVideo and their support during the development of the MViT model. We would like to thank Sophia Zhi for her contribution implementing the S3D model in torchvision.

New Primitives & Augmentations

In this release we’ve added the SimpleCopyPaste augmentation in our reference scripts and we up-streamed the PolynomialLR scheduler to PyTorch Core. We would like to thank Lezwon Castelino and Federico Pozzi for their contributions. We are continuing our efforts to modernize TorchVision by adding more SoTA primitives, Augmentations and architectures with the help of our community. If you are interested in contributing, have a look at the following issue.
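
Since the PolynomialLR scheduler is now part of PyTorch core, it can be used directly from torch.optim; a minimal sketch (the model and hyper-parameters below are placeholders):

```python
import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Decays the learning rate polynomially (linearly with power=1.0) over 100 steps.
scheduler = torch.optim.lr_scheduler.PolynomialLR(optimizer, total_iters=100, power=1.0)

for _ in range(100):
    optimizer.step()
    scheduler.step()
```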

Upcoming Prototype APIs

We are currently working on extending our existing Transforms and Functional API to provide native support for Video, Object Detection, and Semantic and Instance Segmentation. This will enable us to offer better support for existing Computer Vision tasks and to make SoTA augmentations such as MixUp, CutMix, Large Scale Jitter and SimpleCopyPaste importable from the TorchVision binary. The API is still under development and thus was not included in the release, but you can read more about it on our blogpost and provide your feedback on the dedicated Github issue.

Backward Incompatible Changes

We’ve removed some APIs that have been deprecated since version 0.12 (or before). Here is the list of things that we removed and their replacements:

  • The Kinetics400 class has been removed. Users must now use the newer Kinetics class which is a direct replacement.
  • The classes _DeprecatedConvBNAct, ConvBNReLU, and ConvBNActivation were removed from torchvision.models.mobilenetv2 and are replaced with the more generic Conv2dNormActivation class.
  • The torchvision.models.mobilenetv3.SqueezeExcitation has been removed in favor of torchvision.ops.SqueezeExcitation.
  • The class methods convert_to_roi_format, infer_scale, setup_scales from torchvision.ops.MultiScaleRoiAlign have been removed.
  • We have removed the resample and fillcolor parameters from the Transforms API. They have been replaced with interpolation and fill respectively.
  • We’ve removed the range parameter from torchvision.utils.make_grid as it was replaced by the value_range parameter to avoid shadowing the Python built-in method.

Detailed Changes (PRs)

Deprecations

  • [models] Remove cpp model in v0.14 due to deprecation (#6632)
  • [utils, ops, transforms, models, datasets] Remove deprecated APIs for 0.14 (#6258)

New Features

  • [datasets] Add various Stereo Matching datasets (#6345, #6346, #6311, #6347, #6349, #6348, #6350, #6351)
  • [models] Add the S3D architecture to TorchVision (#6412, #6537)
  • [models] Add CREStereo implementation (#6310, #6629)
  • [models] MaxVit model (#6342)
  • [models] Make get_model_builder public (#6560)
  • [models] Add registration mechanism for models (#6333, #6369)
  • [models] Add MViT architecture in TorchVision for both V1 and V2 (#6198, #6373)
  • [models] Add SwinV2 model variants (#6246, #6266)
  • [reference scripts] Add stereo matching reference scripts (#6549, #6554, #6605)
  • [transforms] Added elastic transform in torchvision.transforms (#4938)
  • [build] Add M1 binary builds (#5948, #6135, #6140, #6110, #6132, #6324, #6122, #6409)

Improvements

  • [build] Various torchvision binary build improvements (#6396, #6201, #6230, #6199)
  • [build] Install NVJPEG on Windows for 11.6 and 11.7 CUDA (#6578)
  • [models] Change weights return type to Mapping in models api (#6097)
  • [models] Vectorize box decoding and encoding in FCOS (#6203, #6278)
  • [ci] Add CUDA 11.7 builds (#6425)
  • [ci] Various CI improvements (#6590, #6290, #6170, #6218)
  • [documentation] Various documentation improvements (#6276, #6163, #6450, #6294, #6572, #6176, #6340, #6314, #6427, #6536, #6215, #6150)
  • [documentation] Add new .. betastatus:: directive and document Beta APIs (#6115)
  • [hub] Expose on Hub the public methods of the registration API (#6364)
  • [io, documentation] DOC: add limitation of decode_jpeg in the function docstring (#6637)
  • [models] Make the assert message more verbose in vision transformer (#6583)
  • [ops] Generalize ConvNormActivation function to accept tuple for some parameters (#6251)
  • [reference scripts] Update the dataset cache to factor input parameters (#6234)
  • [reference scripts] Adding video-level accuracy for the video classification reference script (#6241)
  • [reference scripts] Refactor: replace LambdaLR with PolynomialLR in segmentation training script (#6405, #6436)
  • [reference scripts, documentation] Introduce resize params, fix lr estimation, update docs (#6444)
  • [reference scripts, transforms] Add SimpleCopyPaste augmentation (#5825)
  • [rocm, ci] Update to rocm5.2 wheels (#6571)
  • [tests] Various tests improvements (#6601, #6380, #6497, #6248, #6660, #6027, #6226, #6594, #6747, #6272)
  • [tests] Skip big models on CI tests (#6539, #6197, #6573)
  • [transforms] Added antialias arg to resized crop transform and op (#6193)
  • [transforms] Refactored and modified private api for resize functional op (#6191)
  • [utils] Throw ValueError in draw_bounding_boxes for invalid boxes (#6123)
  • [utils] Extend _log_api_usage_once to work for overwritten classes (#6237)
  • [video] Add more logging information for decoder (#6108)
  • [video] [FBcode->GH] Handle images with AV_PIX_FMT_PAL8 pixel format in decoder callback (#6359)
  • [io] Add an option to skip packets with empty data (#6442)
  • [datasets] Put back CelebA download (#6147)
  • [datasets, tests] Update link to download SBU dataset. Enable the test again (#6638)

Bug Fixes

  • [build] Set MACOSX_DEPLOYMENT_TARGET=10.9 for binary jobs (#6298)
  • [ci] Fix issue with setup_env.sh in docker (#6106)
  • [datasets] Swap MD5 checksums of PCAM val and test split (#6644)
  • [documentation] Fix example galleries in documentation (#6701)
  • [hub] Add missing resnext101_64x4d to hubconf.py (#6228)
  • [io] Fix out-of-bounds read in decode_png (#6456)
  • [models] Fix swapped width and height in DefaultBoxGenerator (#6551)
  • [models] Fix the error message of _ovewrite_value_param (#6585)
  • [models] Add missing _handle_legacy_interface() calls (#6565)
  • [models] Fix resnet model by checking if norm_layer weight is None before init (#6082)
  • [models] Adding _log_api_usage_once to Swin's reusable components (#6174)
  • [models] Move the pad operation out of PatchMerging in swin transformer to make it fx-compatible (#6252)
  • [models] Add missing _version to the MLPBlock (#6113)
  • [ops] Fix d/c IoU for different batch sizes (#6338)
  • [ops] Update roi_pool to make it torch fx traceable (#6501)
  • [ops] Fix typing jit issue on RoIPool and RoIAlign (#6397)
  • [reference scripts] Fix copypaste collate pickle issues (#6181)
  • [reference scripts] Remove the unused/buggy --train-center-crop flag from the Classification preset (#6642)
  • [tests] Add .float() before .mean() in test_backbone_utils.py because .mean() doesn't accept integer dtypes (#6090)
  • [transforms] Update _pil_constants.py (#6154)
  • [transforms] Fixed issue with F.crop when cropping outside the input image (#6615)
  • [transforms] Bugfix for accimage test on functional_pil.resize image (#6208)
  • [transforms] Fixed error condition in RandomCrop (#6548)
  • [video] [FBcode->GH] Move func calls outside of *CHECK* in io decoder (#6357)
  • [video] [bugfix] Fix the output format for VideoClips.subset (#6700) (#6706)
  • [video] Fix bug in output format for pyav (#6672) (#6703)
  • [ci] Fix for cygpath windows issue (#6513)
  • [ops] Replaced CHECK_ by TORCH_CHECK (#6322)
  • [build] Fix typo in GHA nightly build condition (#6158)

Code Quality

  • [ci, test] Improvements on CI and test code quality (#6413, #6303, #6652, #6360, #6493, #6146, #6593, #6297, #6678, #6389)
  • [ci] Upgrade usort to 1.0.2 and black to 22.3.0 (#5106)
  • [reference scripts] [FBcode->GH] Rename asset files to remove spaces (#6666)
  • [build] Fix submodule imports by importing functions directly (#6188)
  • [datasets] Simplify check_integrity for cifar and stl10 (#6335)
  • [datasets] Moved pfm file reading into dataset utils (#6270)
  • [documentation] Docs: build with Sphinx 5 (#5121)
  • [models] Typo fix in comment in mvit.py (#6618)
  • [models] Cleanup for box encoding and decoding in FCOS (#6277)
  • [ops] Remove AffineQuantizer.h from qnms_kernel.cpp (#6141)
  • [reference scripts] Type fix in transformers.py (#6376)
  • [transforms] Fix typo in error message (#6291)
  • [transforms] Update typehint for fill arg in rotate (#6594)
  • [io] Free avPacket on EAGAIN decoder error (#6432) (#6443)
  • [android] [pytorch] Bump SoLoader version to 0.10.4 (#81946) (#6327)
  • [transforms, reference script] Port FixedSizeCrop from detection references to prototype transforms (#6417)
  • [models, transforms] Update the expected removal date for several deprecated APIs for release v0.14 (#6654)
  • [tests] Replace torch.utils.data.graph.traverse with traverse_dps (#6657)
  • [build] Replacing cudatoolkit by cuda for 11.6 (#5996)
  • [ops] [FBcode->GH] [quant][core][better-engineering] Rename files in quantized directory (#6133)
  • [build] [BE] Unify version computation (#6117)
  • [models] Refactor swin transformer so we can later reuse components for the 3d version (#6088)
  • [models] [FBcode->GH] Fix vit model assert message to be compatible with torchmultimodal test (#6592)

Contributors

We're grateful for our community, which helps us improve torchvision by submitting issues and PRs, and providing feedback and suggestions. The following persons have contributed patches for this release:

Abhijit Deo, Adam J. Stewart, Aditya Oke, Alexander Jipa, Ambuj Pawar, Andrey Talman, dzdang, Edward Wang (EcoF), Eli Uriegas, Erjia Guan, Federico Pozzi, inisis, Jithun Nair, Joao Gomes, Karan Desai, Kevin Tse, Lenz, Lezwon Castelino, Mayanand, Nicolas Granger, Nicolas Hug, Nikita Shulga, Oleksandr Voietsa, Philip Meier, Ponku, ptrblck, Sergii Dymchenko, Sergiy Bilobrov, Shantanu, Sim Sun, Sophia Zhi, Tinson Lai, Vasilis Vryniotis, vcarpani, vcwai, vfdev-5, Yakhyokhuja Valikhujaev, Yosua Michael Maranatha, Zachariah Carmichael, キツネさん

- Python
Published by YosuaMichael over 3 years ago

org.pytorch:torchvision_ops - Minor release

This minor release bumps the pinned PyTorch version to v1.12.1 and contains some minor bug fixes.

Highlights

Bug Fixes

  • Small Patch SwinTransformer for FX compatibility https://github.com/pytorch/vision/pull/6252
  • Indicate strings can be used to specify weights parameter https://github.com/pytorch/vision/pull/6314
  • Fix d/c IoU for different batch sizes https://github.com/pytorch/vision/pull/6338

- Python
Published by atalman over 3 years ago

org.pytorch:torchvision_ops - TorchVision 0.13, including new Multi-weights API, new pre-trained weights, and more

Highlights

Models

Multi-weight support API

TorchVision v0.13 offers a new Multi-weight support API for loading different weights to the existing model builder methods:

```py
from torchvision.models import *

# Old weights with accuracy 76.130%
resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)

# New weights with accuracy 80.858%
resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)

# Best available weights (currently an alias for IMAGENET1K_V2)
# Note that these weights may change across versions
resnet50(weights=ResNet50_Weights.DEFAULT)

# Strings are also supported
resnet50(weights="IMAGENET1K_V2")

# No weights - random initialization
resnet50(weights=None)
```

The new API bundles important details along with the weights, such as the preprocessing transforms and metadata such as labels. Here is how to make the most of it:

```py
from torchvision.io import read_image
from torchvision.models import resnet50, ResNet50_Weights

img = read_image("test/assets/encode_jpeg/grace_hopper_517x606.jpg")

# Step 1: Initialize model with the best available weights
weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights)
model.eval()

# Step 2: Initialize the inference transforms
preprocess = weights.transforms()

# Step 3: Apply inference preprocessing transforms
batch = preprocess(img).unsqueeze(0)

# Step 4: Use the model and print the predicted category
prediction = model(batch).squeeze(0).softmax(0)
class_id = prediction.argmax().item()
score = prediction[class_id].item()
category_name = weights.meta["categories"][class_id]
print(f"{category_name}: {100 * score:.1f}%")
```

You can read more about the new API in the docs. To provide your feedback, please use this dedicated Github issue.

New architectures and model variants

Classification

The Swin Transformer and EfficientNetV2 are two popular classification models which are often used for downstream vision tasks. This release includes 6 pre-trained weights for their classification variants. Here is how to use the new models:

```py
import torch
from torchvision.models import *

image = torch.rand(1, 3, 224, 224)
model = swin_t(weights="DEFAULT").eval()
prediction = model(image)

image = torch.rand(1, 3, 384, 384)
model = efficientnet_v2_s(weights="DEFAULT").eval()
prediction = model(image)
```

In addition to the above, we also provide new variants for existing architectures such as ShuffleNetV2, ResNeXt and MNASNet. The accuracies of all the new pre-trained models obtained on ImageNet-1K are seen below:

Model | Acc@1 | Acc@5
-- | -- | --
swin_t | 81.474 | 95.776
swin_s | 83.196 | 96.36
swin_b | 83.582 | 96.64
efficientnet_v2_s | 84.228 | 96.878
efficientnet_v2_m | 85.112 | 97.156
efficientnet_v2_l | 85.808 | 97.788
resnext101_64x4d | 83.246 | 96.454
resnext101_64x4d (quantized) | 82.898 | 96.326
shufflenet_v2_x1_5 | 72.996 | 91.086
shufflenet_v2_x1_5 (quantized) | 72.052 | 90.700
shufflenet_v2_x2_0 | 76.230 | 93.006
shufflenet_v2_x2_0 (quantized) | 75.354 | 92.488
mnasnet0_75 | 71.180 | 90.496
mnasnet1_3 | 76.506 | 93.522

We would like to thank Hu Ye for contributing to TorchVision the Swin Transformer implementation.

[BETA] Object Detection and Instance Segmentation

We have introduced 3 new model variants for RetinaNet, FasterRCNN and MaskRCNN that include several post-paper architectural optimizations and improved training recipes. All models can be used similarly:

```py
import torch
from torchvision.models.detection import *

images = [torch.rand(3, 800, 600)]

model = retinanet_resnet50_fpn_v2(weights="DEFAULT")
# model = fasterrcnn_resnet50_fpn_v2(weights="DEFAULT")
# model = maskrcnn_resnet50_fpn_v2(weights="DEFAULT")

model.eval()
prediction = model(images)
```

Below we present the metrics of the new variants on COCO val2017. In parentheses we denote the improvement over the old variants:

Model | Box mAP | Mask mAP
-- | -- | --
retinanet_resnet50_fpn_v2 | 41.5 (+5.1) | -
fasterrcnn_resnet50_fpn_v2 | 46.7 (+9.7) | -
maskrcnn_resnet50_fpn_v2 | 47.4 (+9.5) | 41.8 (+7.2)

We would like to thank Ross Girshick, Piotr Dollar, Vaibhav Aggarwal, Francisco Massa and Hu Ye for their past research and contributions to this work.

New pre-trained weights

SWAG weights

The ViT and RegNet model variants offer new pre-trained SWAG (Supervised Weakly from hashtAGs) weights. One of the biggest of these models achieves a whopping 88.6% accuracy on ImageNet-1K. We currently offer two versions of the weights: 1) fine-tuned end-to-end weights on ImageNet-1K (highest accuracy) and 2) frozen trunk weights with a linear classifier fit on ImageNet-1K (great for transfer learning). Below we see the detailed accuracies of each model variant:

Model Weights | Acc@1 | Acc@5
-- | -- | --
RegNet_Y_16GF_Weights.IMAGENET1K_SWAG_E2E_V1 | 86.012 | 98.054
RegNet_Y_16GF_Weights.IMAGENET1K_SWAG_LINEAR_V1 | 83.976 | 97.244
RegNet_Y_32GF_Weights.IMAGENET1K_SWAG_E2E_V1 | 86.838 | 98.362
RegNet_Y_32GF_Weights.IMAGENET1K_SWAG_LINEAR_V1 | 84.622 | 97.48
RegNet_Y_128GF_Weights.IMAGENET1K_SWAG_E2E_V1 | 88.228 | 98.682
RegNet_Y_128GF_Weights.IMAGENET1K_SWAG_LINEAR_V1 | 86.068 | 97.844
ViT_B_16_Weights.IMAGENET1K_SWAG_E2E_V1 | 85.304 | 97.65
ViT_B_16_Weights.IMAGENET1K_SWAG_LINEAR_V1 | 81.886 | 96.18
ViT_L_16_Weights.IMAGENET1K_SWAG_E2E_V1 | 88.064 | 98.512
ViT_L_16_Weights.IMAGENET1K_SWAG_LINEAR_V1 | 85.146 | 97.422
ViT_H_14_Weights.IMAGENET1K_SWAG_E2E_V1 | 88.552 | 98.694
ViT_H_14_Weights.IMAGENET1K_SWAG_LINEAR_V1 | 85.708 | 97.73

The weights can be loaded normally as follows:

```py
from torchvision.models import *

model_1 = vit_h_14(weights="IMAGENET1K_SWAG_E2E_V1")
model_2 = vit_h_14(weights="IMAGENET1K_SWAG_LINEAR_V1")
```

The SWAG weights are released under the Attribution-NonCommercial 4.0 International license. We would like to thank Laura Gustafson, Mannat Singh and Aaron Adcock for their work and support in making the weights available to TorchVision.

Model Refresh

The release of the Multi-weight support API enabled us to refresh the most popular models and offer more accurate weights. We improved each model by ~3 points on average. The new recipe was learned on top of ResNet50 and its details were covered in a previous blogpost.

Model | Old weights | New weights
-- | -- | --
efficientnet_b1 | 78.642 | 79.838
mobilenet_v2 | 71.878 | 72.154
mobilenet_v3_large | 74.042 | 75.274
regnet_y_400mf | 74.046 | 75.804
regnet_y_800mf | 76.42 | 78.828
regnet_y_1_6gf | 77.95 | 80.876
regnet_y_3_2gf | 78.948 | 81.982
regnet_y_8gf | 80.032 | 82.828
regnet_y_16gf | 80.424 | 82.886
regnet_y_32gf | 80.878 | 83.368
regnet_x_400mf | 72.834 | 74.864
regnet_x_800mf | 75.212 | 77.522
regnet_x_1_6gf | 77.04 | 79.668
regnet_x_3_2gf | 78.364 | 81.196
regnet_x_8gf | 79.344 | 81.682
regnet_x_16gf | 80.058 | 82.716
regnet_x_32gf | 80.622 | 83.014
resnet50 | 76.13 | 80.858
resnet50 (quantized) | 75.92 | 80.282
resnet101 | 77.374 | 81.886
resnet152 | 78.312 | 82.284
resnext50_32x4d | 77.618 | 81.198
resnext101_32x8d | 79.312 | 82.834
resnext101_32x8d (quantized) | 78.986 | 82.574
wide_resnet50_2 | 78.468 | 81.602
wide_resnet101_2 | 78.848 | 82.51

We would like to thank Piotr Dollar, Mannat Singh and Hugo Touvron for their past research and contributions to this work.

Ops and Transforms

New Augmentations, Layers and Losses

This release brings a bunch of new primitives which can be used to produce SOTA models. Some highlights include the addition of the AugMix data-augmentation method, the DropBlock layer, the cIoU/dIoU losses and many more. We would like to thank Aditya Oke, Abhijit Deo, Yassine Alouini and Hu Ye for contributing to the project and for helping us keep TorchVision relevant and fresh.
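
Here is a rough sketch of how these primitives can be used; the tensor shapes and hyper-parameters are illustrative, not recommended values:

```python
import torch
from torchvision import ops, transforms

# AugMix as a drop-in data-augmentation transform on a uint8 image tensor
augmix = transforms.AugMix()
augmented = augmix(torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8))

# DropBlock regularization applied to a feature map
features = torch.rand(4, 16, 32, 32)
regularized = ops.drop_block2d(features, p=0.1, block_size=3)

# Complete-IoU loss between matched pairs of (x1, y1, x2, y2) boxes
pred = torch.tensor([[0.0, 0.0, 10.0, 10.0]])
target = torch.tensor([[1.0, 1.0, 11.0, 11.0]])
loss = ops.complete_box_iou_loss(pred, target, reduction="mean")
```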

Documentation

We completely revamped our models documentation to make them easier to browse, and added various key information such as supported image sizes, or image pre-processing steps of pre-trained weights. We now have a main model page with various summary tables of available weights, and each model has a dedicated page. Each model builder is also documented in their own page, with more details about the available weights, including accuracy, minimal image size, link to training recipes, and other valuable info. For comparison, our previous models docs are here. To provide feedback on the new documentation, please use the dedicated Github issue.

Backward-incompatible changes

The new Multi-weight support API replaced the legacy “pretrained” parameter of model builders. Both solutions are currently supported to maintain backwards compatibility but our intention is to remove the deprecated API in 2 versions. Migrating to the new API is very straightforward. The following method calls between the 2 APIs are all equivalent:

```py
from torchvision.models import resnet50, ResNet50_Weights

# Using pretrained weights:
resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
resnet50(weights="IMAGENET1K_V1")
resnet50(pretrained=True)  # deprecated
resnet50(True)  # deprecated

# Using no weights:
resnet50(weights=None)
resnet50()
resnet50(pretrained=False)  # deprecated
resnet50(False)  # deprecated
```

Deprecations

[models, models.quantization] Reinstate and deprecate model_urls and quant_model_urls (#5992) [transforms] Deprecate int as interpolation argument type (#5974)
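
Regarding the interpolation deprecation above, passing an InterpolationMode member rather than a raw int is the supported spelling going forward; a minimal sketch:

```python
from torchvision import transforms
from torchvision.transforms import InterpolationMode

# Preferred: an InterpolationMode member instead of an int such as 2
resize = transforms.Resize(256, interpolation=InterpolationMode.BICUBIC)
rotate = transforms.RandomRotation(30, interpolation=InterpolationMode.BILINEAR)
```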

New Features

[models] New Multi-weight API support (#5618, #5859, #6047, #6026, #5848) [models] Adding Swin Transformer architecture (#5491) [models] Adding EfficientNetV2 architecture (#5450) [models] Adding detection model improved weights: RetinaNet, MaskRCNN, FasterRCNN (#5756, #5773, #5763) [models] Adding classification model weights: resnext101_64x4d, mnasnet0_75, mnasnet1_3 (#5935, #6019) [models] Add SWAG model pretrained weights (#5714, #5722, #5732, #5793, #5721) [ops] Adding IoU loss function variants: DIoU, CIoU (#5786, #5776) [ops] Adding various ops and tests for ops (#6053, #5416, #5792, #5783) [transforms] Adding AugMix transforms implementation (#5411) [reference scripts] Support custom weight decay setting in classification reference script (#5671) [transforms, reference scripts] Improve detection reference script: Scale Jitter, RandomShortestSize, FixedSizeCrop (#5435, #5610, #5607) [ci] Add M1 support (#6167) [ci] Add Python-3.10 (build and test) (#5420)

Improvements

[documentation] Complete new revamp of models documentation (#5821, #5876, #5899, #6025, #5885, #5884, #5886, #5891, #6023, #6009, #5852, #5831, #5832, #6003, #6013, #5856, #6004, #6005, #5878, #6012, #5894, #6002, #5854, #5864, #5920, #5869, #5871, #6021, #6006, #6016, #5905, #6028, #5915, #5924, #5977, #5918, #5921, #5934, #5936, #5937, #5933, #5949, #5988, #5962, #5963, #5975, #5900, #5917, #5895, #5901, #6033, #6032, #6030, #5904, #5661, #6035, #6049, #6036, #5908, #5907, #6044, #6039, #5874, #6151) [documentation] Various documentation improvements (#5695, #5930, #5814, #5799, #5827, #5796, #5923, #5599, #5554, #5995, #5457, #6163, #6031, #6000, #5847, #6024) [documentation] Add warnings in docs to document Beta APIs (#6115) [datasets] Improve GDrive downloads (#5704, #5645) [datasets] Indicate md5 checksum is not used for security (#5717) [models] Add shufflenet_v2_x1_5 and shufflenet_v2_x2_0 weights (#5906) [models] Reduce unnecessary cuda sync in anchor_utils.py (#5515) [models] Adding improved MobileNetV2 weights (#5560) [models] Remove (N, T, H, W, C) => (N, T, C, H, W) from presets (#6058) [models] Add swin_s and swin_b variants and improved swin_t (#6048) [models] Update ShuffleNetV2 annotations for x1_5 and x2_0 variants (#6022) [models] Better error message in ViT (#5820) [models, ops] Add private support for ciou and diou (#5984, #5685, #5690) [models, reference scripts] Various improvements to detection recipe and models (#5715, #5444) [transforms, tests] Add functional vertical flip tests on segmentation mask (#5860) [transforms] Make max_value jit-scriptable (#5623) [transforms] Make ScaleJitter proportional (#5559) [transforms] Add tensor kernels for normalize and erase (#5462) [transforms] Update transforms following PIL deprecation (#5898) [transforms, models, datasets] Replace asserts with exceptions (#5587, #5659) [utils] Add warning if font is not set in draw_bounding_boxes (#5785) [utils] Throw warning for empty masks or box tensors on draw_segmentation_masks and draw_bounding_boxes (#5857) [video] Add output_format to video datasets and readers (#6061) [video, io] Better compatibility with FFMPEG 5.0 (#5644) [video, io] Allow cuda device to be passed without the index for GPU decoding (#5505) [reference scripts] Simplify EMA to use PyTorch's update_parameters (#5469) [reference scripts] Reduce variance of evaluation in reference (#5819) [reference scripts] Various improvements to RAFT training reference (#5590) [tests] Speed up model tests by 20% (#5574) [tests] Make test suite fail on unexpected test success (#5556) [tests] Skip big model in test to reduce memory usage in CI (#5903, #5902) [tests] Improve test of backbone utils (#5552) [tests] Validate against expected files on videos (#6077) [ci] Support for CUDA 11.6 (#5803, #5862) [ci] Pre-download model weights in CI docs build (#5625)

Bug Fixes

[transforms] Remove option to pass fill as str in transforms (#5632) [transforms] Better handling for Pad's fill argument (#5596) [transforms] [FBcode->GH] Fix accimage tests (#5545) [transforms] Update pil_constants.py (#6154) (#6156) [transforms] Fix resize transform when size == small_edge_size and max_size isn't None (#5409) [transforms] Fixed rotate transform with expand inconsistency (#5677) [transforms] Fixed upstream issue with padding (#5875) [transforms] Fix functional.adjust_gamma (#5427) [models] Respect strict=False when loading detection models (#5841) [models] Fix resnet norm initialization (#6082) (#6085) [models] Use frozen BN only if pre-trained for detection models (#5443) [models] Fix fcos gt_area calculation (#5816) [models, onnx] Add topk min function for trace and onnx (#5310) [models, tests] Fix mobilenet norm layer test (#5643) [reference scripts] Fix regression on detection training script (#5985) [datasets] Do not re-download from GDrive if file is already present (#5805) [datasets] Fix datasets: Kinetics, Flowers102, VOC2009, INaturalist 2021_train, Caltech (#5578, #5775, #5425, #5844, #5789) [documentation] Fix device mismatch issue while building docs (#5428) [documentation] Fix accuracy meta-data on shufflenet_v2 (#5896) [documentation] Fix typo in docstrings of some transforms (#5609) [video, documentation] Fix append of audio_pts (#5488) [io, tests] More robust check in tests for 16-bit images (#5652) [video, io] Fix shape mismatch error in video reader (#5489) [io] Address nvjpeg leak on CUDA < 11.6 issue (#5713, #5482) [ci] Fix issue with setup_env.sh in docker: resolve "unsafe directory" error (#6106) (#6109) [ci] Fix documentation version problems when a new release is tagged (#5583) [ci] Replace jcenter and fix version for android (#6046) [tests] Add .float() before .mean() in test_backbone_utils.py because .mean() doesn't accept integer dtypes (#6090) (#6091) [tests] Fix keypointrcnn_resnet50_fpn flaky test (#5911) [tests] Disable test_encode|write_jpeg_reference tests (#5910) [mobile] Bump up LibTorchvision version number for Podspec to release Cocoapods (#5624) [feature extraction] Add default tracer args for model feature extraction function (#5637) [build] Fix libtorchvision.so not able to encode images by adding *_FOUND macros to CMakeLists.txt (#5547)

Code Quality

[datasets, models] Better deprecation message for voc2007 and SqueezeExcitation (#5391) [datasets, reference scripts] Use Kinetics instead of Kinetics400 in references (#5787) (#5952) [models] Clean up DenseNet code (#5966) [models] Minor Swin Transformer fixes (#6054) [models, onnx] Use onnx function only in tracing mode (#5468) [models] Refactor swin transformer so later we can reuse components for the 3D version (#6088) (#6100) [models, tests] Fix minor issues with model tests (#5576) [transforms] Remove to_tensor() and ToTensor() usages (#5553) [transforms] Refactor Augmentation Space calls to speed up (#5402) [transforms] Recoded max_value method using a dictionary (#5566) [transforms] Replace get_image_size/get_image_num_channels with get_dimensions (#5487) [ops] Replace usages of atomicAdd with gpuAtomicAdd (#5823) [ops] Fix unused variable warning in ps_roi_align_kernel.cu (#5408) [ops] Remove custom ops interpolation with antialiasing (#5329) [ops] Move Permute layer to ops (#6055) [ops] Remove assertions for generalized_box_iou (#5691) [utils] Move sequence_to_str to torchvision.utils (#5604) [utils] Clarify TypeError message in make_grid (#5997) [video, io] Replace distutils.spawn with shutil.which per PEP632 in setup script (#5849) [video, io] Move VideoReader out of __init__ (#5495) [video, io] Remove unnecessary initialisation in GPUDecoder (#5507) [video, io] Remove unused member variable and argument in GPUDecoder (#5499) [video, io] Improve test_video_reader (#5498) [video, io] Update private attribute name for readability (#5484) [video, tests] Improve test_video_api (#5497) [reference scripts] Minor updates to optical flow references for consistency (#5654) [reference scripts] Add barrier() after init_process_group() (#5475) [ci] Delete stale packaging scripts (#5433) [ci] Remove explicit install of Pillow throughout CI (#5950) [ci, test] Remove unnecessary pytest install (#5739) [ci, tests] Remove unnecessary PYTORCH_TEST_WITH_SLOW env (#5631) [ci] Add .git-blame-ignore-revs to ignore specific commits in git blame (#5696) [ci] Remove CUDA 11.1 support (#5477, #5470, #5451, #5978) [ci] Minor linting improvement (#5880) [ci] Remove Bandit and CodeQL jobs (#5734) [ci] Various type annotation fixes / issues (#5598, #5970, #5943)

Contributors

We're grateful for our community, which helps us improve torchvision by submitting issues and PRs, and providing feedback and suggestions. The following persons have contributed patches for this release:

Abhijit Deo, Aditya Oke, Andrey Talman, Anton Thomma, Behrooz, Bruno Korbar, Daniel Angelov, Dbhasin1, Drishti Bhasin, F-G Fernandez, Federico Pozzi, FG Fernandez, Georg Grab, Gouvernathor, Hu Ye, Jeffery (Zeyu) Zhao, Joao Gomes, kaijieshi, Kazuki Adachi, KyleCZH, kylematoba, LEGRAND Matthieu, Lezwon Castelino, Luming Tang, Matti Picus, Nicolas Hug, Nikita, Nikita Shulga, oxabz, Philip Meier, Prabhat Roy, puhuk, Richard Barnes, Sahil Goyal, satojkovic, Shijie, Shubham Bhokare, talregev, tcmyxc, Vasilis Vryniotis, vfdev, WuZhe, XiaobingZhang, Xu Zhao, Yassine Alouini, Yonghye Kwon, YosuaMichael, Yulv-git, Zhiqiang Wang

- Python
Published by NicolasHug over 3 years ago

org.pytorch:torchvision_ops - TorchVision 0.12, including new Models, Datasets, GPU Video Decoding, and more

Highlights

New Models

Four new model families have been released in the latest version along with pre-trained weights for their variants: FCOS, RAFT, Vision Transformer (ViT) and ConvNeXt.

Object Detection

FCOS is a popular, fully convolutional, anchor-free model for object detection. In this release we include a community-contributed model implementation as well as pre-trained weights. The model was trained on COCO train2017 and can be used as follows:

```python
import torch
from torchvision import models

x = [torch.rand(3, 224, 224)]
fcos = models.detection.fcos_resnet50_fpn(pretrained=True).eval()
predictions = fcos(x)
```

The box AP of the pre-trained model on COCO val2017 is 39.2 (see #4961 for more details).

We would like to thank Hu Ye and Zhiqiang Wang for contributing the model implementation and initial training. This was the first community-contributed model in a long while, and given its success, we decided to use the learnings from this process and create new model contribution guidelines.

Optical Flow support and RAFT model

Torchvision now supports optical flow! Optical flow models try to predict movement in a video: given two consecutive frames, the model predicts where each pixel of the first frame ends up in the second frame. Check out our new tutorial on Optical Flow!

We implemented a torchscript-compatible RAFT model with pre-trained weights (both normal and “small” versions), and added support for training and evaluating optical flow models. Our training scripts support distributed training across processes and nodes, leading to much faster training time than the original implementation. We also added 5 new optical flow datasets: Flying Chairs, Flying Things, Sintel, Kitti, and HD1K.

raft
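
A minimal inference sketch, assuming the two frames are already scaled to [-1, 1] (the reference transforms handle this in practice) and have dimensions divisible by 8:

```python
import torch
from torchvision.models.optical_flow import raft_large

# Two consecutive frames, values in [-1, 1]
img1 = torch.rand(1, 3, 224, 224) * 2 - 1
img2 = torch.rand(1, 3, 224, 224) * 2 - 1

model = raft_large(pretrained=True).eval()
with torch.no_grad():
    # RAFT is recurrent: it returns one flow estimate per refinement iteration
    flow_predictions = model(img1, img2)
flow = flow_predictions[-1]  # final, most accurate estimate, shape (1, 2, H, W)
```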

Image Classification

Vision Transformer (ViT) and ConvNeXt are two popular architectures which can be used as image classifiers or as backbones for downstream vision tasks. In this release we include 8 pre-trained weights for their classification variants. The models were trained on ImageNet and can be used as follows:

```python
import torch
from torchvision import models

x = torch.rand(1, 3, 224, 224)
vit = models.vit_b_16(pretrained=True).eval()
convnext = models.convnext_tiny(pretrained=True).eval()
predictions1 = vit(x)
predictions2 = convnext(x)
```

The accuracies of the pre-trained models obtained on ImageNet val are seen below:

|Model |Acc@1 |Acc@5 |
|--- |--- |--- |
|vit_b_16 |81.072 |95.318 |
|vit_b_32 |75.912 |92.466 |
|vit_l_16 |79.662 |94.638 |
|vit_l_32 |76.972 |93.07 |
|convnext_tiny |82.52 |96.146 |
|convnext_small |83.616 |96.65 |
|convnext_base |84.062 |96.87 |
|convnext_large |84.414 |96.976 |

The above models have been trained using an adjusted version of our new training recipe, which allows us to offer models with accuracies significantly higher than the ones reported in the original papers.

GPU Video Decoding

In this release, we add support for GPU video decoding in the video reading API. To use hardware-accelerated decoding, we just need to pass a cuda device to the video reading API as shown below:

```python
import torchvision

reader = torchvision.io.VideoReader(file_name, device='cuda:0')
for frame in reader:
    print(frame)
```

We also support seeking to any frame or a keyframe in the video before reading, as shown below:

```python
reader.seek(seek_time)
```

New Datasets

We have implemented 14 new classification datasets: CLEVR, GTSRB, FER2013, SUN397, Country211, Flowers102, FGVC Aircraft, OxfordIIITPet, DTD, Food 101, Rendered SST2, Stanford Cars, PCAM, and EuroSAT. Most of them follow the usual root/split/download constructor pattern, as sketched below.
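
A usage sketch with two of the new datasets; the root path is a placeholder:

```python
import torchvision

# Both datasets follow the root/split/download constructor pattern
flowers = torchvision.datasets.Flowers102(root="data", split="train", download=True)
gtsrb = torchvision.datasets.GTSRB(root="data", split="train", download=True)

img, label = flowers[0]  # PIL image and integer class label
```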

As part of our work on Optical Flow support (see above for more details), we also added 5 new optical flow datasets: Flying Chairs, Flying Things, Sintel, Kitti, and HD1K.

Documentation

New documentation layout

We have updated our documentation pages to be more compact and easier to browse. Each function / class is now documented in a separate page, clearing up some space in the per-module pages, and easing the discovery of the proposed APIs. Compare e.g. our previous docs vs the new ones. Please let us know if you have any feedback!

Model contribution guidelines

New model contribution guidelines have been published following the success of the FCOS model which was contributed by the community. These guidelines aim to be an overview of the model contribution process for anyone who would like to suggest, implement and train a new model.

Upcoming Prototype APIs

We are currently working on a prototype API which adds Multi-weight support on all of our model builder methods. This will enable us to offer multiple pre-trained weights, associated with their meta-data and inference transforms. The API is still under review and thus was not included in the release but you can read more about it on our blogpost and provide your feedback on the dedicated Github issue.

Changes in our deprecation policy

Up until now, torchvision would almost never remove deprecated APIs. In order to be more aligned and consistent with pytorch core, we are updating our deprecation policy. We are now following a 2-release deprecation cycle: deprecated APIs will raise a warning for 2 versions, and will be removed after that. To reflect these changes and to smooth the transition, we have decided to:

  • Remove all APIs that had been deprecated before or on v0.8, released 1.5 years ago.
  • Update the removal timeline of all other deprecated APIs to v0.14, to reflect the new 2-cycle policy starting now in v0.12.

Backward-incompatible changes

[models.quantization] Removed the quantized shufflenet_v2_x1_5 and shufflenet_v2_x2_0 model builders which had no associated weights, rendering them useless. Additionally we added pre-trained weights for the shufflenet_v2_x0_5 quantized variant (#4854) [ops] Change to stable sort in nms implementations - this change can lead to different behavior in rare cases, therefore it has been flagged as backwards-incompatible (#4767) [transforms] Changed the center and the parametrization of shear X/Y in AutoAugment transforms to align with the original papers (#5285) (#5384)

Deprecations

Note: in order to be more aligned with pytorch core, we are updating our deprecation policy. Please read more above in the “Highlights” section.

[ops] The ops.poolers.MultiScaleRoIAlign public methods setup_scales, convert_to_roi_format, and infer_scale have been deprecated and will be removed in 0.14 (#4951) (#4810)

New Features

[datasets] New optical flow datasets added: FlyingChairs, Kitti, Sintel, FlyingThings3D, and HD1K (#4860) (#4845) (#4858) (#4890) (#5004) (#4889) (#4888) (#4870) [datasets] New classification dataset support for FLAVA: CLEVR, GTSRB, FER2013, SUN397, Country211, Flowers102, FGVC Aircraft, OxfordIIITPet, DTD, Food 101, Rendered SST2, Stanford Cars, PCAM, and EuroSAT (#5120) (#5130) (#5117) (#5132) (#5138) (#5177) (#5178) (#5116) (#5115) (#5119) (#5220) (#5166) (#5203) (#5114) (#5164) (#5280) [models] Add VisionTransformer model (#5173) (#5210) (#5172) (#5085) (#5226) (#5025) (#5086) (#5159) [models] Add ConvNeXt model (#5330) (#5253) [models] Add RAFT models and support for optical flow model training (#5022) (#5070) (#5174) (#5381) (#5078) (#5076) (#5081) (#5079) (#5026) (#5027) (#5082) (#5060) (#4868) (#4657) (#4732) [models] Add FCOS model (#4961) (#5267) [utils] Add utility to convert optical flow to an image (#5134) (#5308) [utils] Add utility to draw keypoints (#4216) [video] Add video GPU decoder (#5019) (#5191) (#5215) (#5256) (#4474) (#3179) (#4878) (#5328) (#5327) (#5183) (#4947) (#5192)
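
A short sketch of the two new utilities listed above; shapes and values are illustrative:

```python
import torch
from torchvision.utils import draw_keypoints, flow_to_image

# flow_to_image: visualize a (2, H, W) optical-flow field as an RGB image
flow = torch.randn(2, 128, 128)
flow_img = flow_to_image(flow)  # uint8 tensor of shape (3, 128, 128)

# draw_keypoints: overlay keypoints on a uint8 image
image = torch.zeros(3, 128, 128, dtype=torch.uint8)
keypoints = torch.tensor([[[20.0, 20.0], [60.0, 80.0], [100.0, 40.0]]])  # (instances, K, 2)
annotated = draw_keypoints(image, keypoints, colors="red", radius=3)
```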

Improvements

[datasets] Migrate mnist dataset from np.frombuffer (#4598) [io, tests] Switch from np.frombuffer to torch.frombuffer (#4578) [models] Update ResNet-50 accuracy with Repeated Augmentation (#5201) [models] Add regnet_y_128gf factory function, and several regnet model weights (#5176) (#4530) [models] Adding min_size to classification and video models (#5223) [models] Remove in-place mutation in DefaultBoxGenerator (#5279) [models] Added Dropout parameter to Models Constructors (#4580) [models] Allow to use custom norm_layer (#4621) [models] Add IntermediateLayerGetter on segmentation (#5298) [models] Use FX feature extractor for segm model (#4563) [models, ops, io] Add model, ops and io usage logging (#4956) (#4735) (#4736) (#4737) (#5044) (#4799) (#5095) (#5038) [models.quantization] Implement is_qat in TorchVision (#5299) [models.quantization] Cleanup quantized ShuffleNet (#4854) [models.quantization] Adding new quantized models (#4969) [ops] [FBcode->GH] Fix missing kernel guards (#4620) (#4743) [ops] Expose misc ops at package level (#4812) [ops] Fix giou naming bug (#5270) [ops] Change batched NMS threshold to choose for-loop version (#4990) [ops] Add bias parameter to ConvNormActivation (#5012) [ops] Feature extraction default arguments - ops (#4810) [ops] Change to stable sort in nms implementations (#4767) [reference scripts] Support amp training (#4923) (#4933) (#4994) (#4547) (#4570) [reference scripts] Add types and improve descriptions to ArgumentParser parameters (#4724) [reference scripts] Replaced all 'no_grad()' instances with 'inference_mode()' (#4629) [reference scripts] Adding Repeated Augment Sampler (#5051) [reference scripts] Reduce variance of classification references evaluation (#4609) [reference scripts] Avoid inplace modification of target boxes in detection references (#5289) [reference scripts] Allow variable number of repetitions for RA (#5084) [reference scripts, classification] Adding gradient clipping (#4824) [reference scripts, models.quantization] Add --prototype flag to quantization scripts (#5334) [reference scripts, ops] Additional SOTA ingredients on Classification Recipe (#4493) [transforms] Added center arg to F.affine and RandomAffine ops (#5208) [transforms] Explicitly copying array in pil_to_tensor (#4566) [transforms] Update functional_tensor.py (#4852) [transforms] Add api usage log to transforms (#5007) [utils] Support random colors by default for draw_bounding_boxes (#5127) [utils] Add API usage calls to utils (#5077) Various documentation improvements (#4913) (#4892) (#5305) (#5273) (#5089) (#4653) (#5302) (#4647) (#4922) (#5124) (#4972) (#5165) (#4843) (#5238) (#4846) (#4823) (#5316) (#5195) (#5153) (#4783) (#4798) (#4797) (#5368) (#5037) (#4830) (#4681) (#4579) (#4520) (#4586) (#4536) (#4574) (#4565) (#4822) (#5315) (#4546) (#4522) (#5312) (#5372) (#4833) [tests] Set seed on several tests to reduce flakiness (#4911) (#4764) (#4762) (#4759) (#4766) (#4763) (#4758) (#4761) [tests] Other test improvements (#4756) (#4775) (#4867) (#4929) (#4632) (#5029) (#4597) Added script to sync fbcode changes with main branch (#4769) [ci] Various CI improvements (#4662) (#4669) (#4791) (#4626) (#5021) (#4739) (#3973) (#4618) (#4788) (#4946) (#5112) (#5099) (#5288) (#5152) (#4696) (#5122) (#4793) (#4998) (#4498) [build] Various build improvements (#5261) (#5190) (#4945) (#4920) (#5024) (#4571) (#4742) (#4944) (#4989) (#5179) (#4516) (#4661) (#4695) (#4939) (#4954) [io] decode* returns contiguous tensors (#4898) [io] Revert "decode* returns contiguous tensors (#4898)" (#4901)

Bug Fixes

[datasets] Fix Caltech datasets (#4556) [datasets] Fix UCF101 on Windows (#5129) [datasets] Remove extracted archive if flag was set (#5055) [datasets] Reverted folder.py back to using the complete path to file for make_dataset and is_valid_file rather than just the filename (#4885) [datasets] Fix from_file on Windows (#4980) [datasets] Fix WIDERFace download links (#4649) [datasets] Fix target_type selection for Caltech101 (#4637) [io] Skip jpeg comparison tests with PIL (#5169) [io] [Windows] Workaround for loading bundled DLLs (#4893) [models] Adding missing named param check on ViT (#5196) [models] Modifying keypoint_rcnn.py for keypoint_predictor issue (#5180) [models] Fixing bug on SSD backbone freezing (#4590) [models] [FBcode->GH] Removed type annotations from rcnn (#4883) [models.quantization] Amend the weights only if quantize=True (#4966) [models.quantization] Fix mobilenetv3 quantization state dict loading (#4997) [ops] Adding masks_to_boxes to __all__ in ops (#4779) [ops] Update the error message on DeformConv2d (#4908) [ops, onnx] RoiAlign aligned=True (#4692) [reference scripts] Fix reduce_across_processes inconsistent return type (#4733) [reference scripts] Fix bug on EMA n_averaged estimation (#4544) [reference scripts] Support random seed for RA sampler (#5053) [reference scripts] Fix bug in training model by amp (#4874) [reference scripts, transforms] Fix a bug on RandomZoomOut (#5278) [tests] Skip expected checks for quantized resnet50 due to flakiness (#4686) [transforms] Fix bug on autocontrast when min==max (#4999) [transforms] Fix augmentation space to be uint8 compatible (#4806) [utils] Fix draw_bounding_boxes and draw_keypoints for tensors on GPU (#5101) (#5102) [build] Fix formatting CIRCLECI_TAG when building docs (#4693) [build] Fix nvjpeg packaging into the wheel (#4752) [build] Switch Android app to pytorch_android stable (#4926) [ci] Add libtinfo5 dependency (#4931) [ci] Revert vit_h_14 as it breaks our CI (#5259) [ci] Remove pager on git diff (#4800) [ci] Fix failing CI job for android (#4912) [ci] Add numpy as explicit dependency to build_cmake.sh (#4987)

Code Quality

Various typing improvements (#4603) (#4172) (#4173) (#4631) (#4619) (#4583) (#4602) (#5182) Add ufmt (usort + black) as code formatter (#4384) Fix formatting issues (#4535) (#4747) Add pre-commit hook to fix line endings (#5021) Various imports cleanups/improvements (#4533) (#4879) Use f-strings almost everywhere, and other cleanups by applying pyupgrade (#4585) Update code to Python 3.7 compliance and remove Python 3.6 references (#5125) (#5161) Consolidate repr methods throughout the repo (#5392) Set allow_redefinition = True for mypy (#4531) Use is to compare type of objects (#4605) Various typos fixed (#5031) (#5092) Fix annotations for Python >= 3.8 (#5301) Revamp log api usage method (#5072) [deprecation] Update deprecation messages stating APIs will be removed in 0.14 and remove APIs that were deprecated before 0.8 (#5387) (#5386) [build] Updated setup.py to use TorchVersion object for version comparison (#4307) [ops] Remove debugging asserts (#5332) [c++ frontend] Fix missing Torch includes (#5118) [ci] Cleanup and removing unnecessary references and parameters (#4983) (#4930) (#5042) [datasets] [FBcode->GH] Remove unused requests functionality (#5014) [datasets] Allow single extension as str in make_dataset (#5229) [datasets] Use helper function to extract archive in CelebA (#4557) [datasets] Simplify QMNIST download logic (#4562) [documentation] Fix make html-noplot docs build command (#5389) [models] Move all weight initializations from private methods to constructors (#5331) [models] Simplify model builders (#5001) [models] Replace asserts with ValueErrors (#5275) [models] Use enumerate to get index of ModuleList (#4534) [models] Simplify efficientnet code by removing efficientnet_conf (#4690) [models] Refactor Segmentation models (#4646) [models] Pass indexing param to meshgrid to avoid warning in detection models (#4645) [models] Refactor the backbone builders of detection (#4656) [models.quantization] Switch torch.quantization to torch.ao.quantization (#5296) (#4554) [ops] Fixed unused variables in ops (#4666) [ops] Refactor poolers (#4951) [reference scripts] Simplify the gradient clipping code (#4896) [reference scripts] Only set random generator if shuffle=true (#5135) [tests] Refactor BoxOps tests to use parameterize (#5380) [tests] Rename TestWeights to appease pytest (#5054) [tests] Fix and add test for sequence_to_str (#5213) [tests] Remove get_bool_env_var (#5222) [models, tests] Remove custom code for model output comparison (#4971) [utils, documentation] Fix annotation of draw_segmentation_masks (#4527) [video] Fix error message in demuxer (#5293)

Contributors

We're grateful for our community, which helps us improve torchvision by submitting issues and PRs, and providing feedback and suggestions. The following persons have contributed patches for this release:

Abhijit Deo, Aditya Oke, Alexander Soare, Alexander Unnervik, Allen Goodman, Andrey Talman, Brian Johnson, Bruno Korbar, buckage, Carlosbogo, Chungman Lee, Daniel Falbel, David Fan, Dmytro, Eli Uriegas, Ethan White, Eugene Yurtsev, F-G Fernandez, Fedor, Francisco Massa, Guo, Harish Kulkarni, HeungwooLee, Hu Ye, Jane (Yuan) Xu, Jirka Borovec, Jithun Nair, Joao Gomes, Jopo, Kai Zhang, kbozas, Kevin Tse, Khushi Agrawal, Konstantinos Bozas, Kotchin, Kushashwa Ravi Shrimali, KyleCZH, Mark Harfouche, Marko Kohtala, Masahiro Masuda, Matti Picus, Mengwei Liu, Mohammad (Moe) Rezaalipour, Mriganka Nath, Muhammed Abdullah, Nicolas Granger, Nicolas Hug, Nikita Shulga, peterbell10, Philip Meier, Piyush Singh, Prabhat Roy, ProGamerGov, puhuk, Richard Barnes, rvandeghen, Sai Krishna, Santiago Castro, Saswat Das, Sepehr Sameni, Sergii Khomenko, Stephen Matthews, Sumanth Ratna, Sumukh Aithal, Tal Ben-Nun, Vasilis Vryniotis, vfdev, Xiaolin Wang, Yi Zhang, Yiwen Song, Yoshitomo Matsubara, Yuchen Huang, Yuxin Wu, zhiqiang, and Zhiqiang Wang.

- Python
Published by jdsgomes almost 4 years ago

org.pytorch:torchvision_ops - Minor release

This is a minor release compatible with PyTorch 1.10.2, containing a minor bug fix.

Highlights

Bug Fixes

  • [CI] Skip jpeg comparison tests with PIL (#5232)

- Python
Published by atalman about 4 years ago

org.pytorch:torchvision_ops - Minor bugfix release

This minor release bumps the pinned PyTorch version to v1.10.1 and contains some minor bug fixes.

Highlights

Bug Fixes

  • [CI] Fix clang_format issue (#5061)
  ‱ [CI, MOBILE] Fix binary_libtorchvision_ops_android job (#5062)
  • [CI] Add numpy as explicit dependency to build_cmake.sh (#5065)
  • [MODELS] Amend the weights only if quantize=True. (#5066)
  • [TRANSFORMS] Fix augmentation space to be uint8 compatible (#5067)
  • [DATASETS] Fix WIDERFace download links (#5068)
  • [BUILD, WINDOWS] Workaround for loading bundled DLLs (#5094)

- Python
Published by datumbox about 4 years ago

org.pytorch:torchvision_ops - Update dependency on wheels to match version in PyPI

Users were reporting issues installing torchvision on PyPI. This release contains an update to the dependencies for wheels so that they point directly to torch==1.10.0.

- Python
Published by seemethere over 4 years ago

org.pytorch:torchvision_ops - RegNet, EfficientNet, FX Feature Extraction and more

This release introduces the RegNet and EfficientNet architectures, a new FX-based utility to perform Feature Extraction, new data augmentation techniques such as RandAugment and TrivialAugment, updated training recipes that support EMA, Label Smoothing, Learning-Rate Warmup, Mixup and Cutmix, and many more.

Highlights

New Models

RegNet and EfficientNet are two popular architectures that can be scaled to different computational budgets. In this release we include 22 pre-trained weights for their classification variants. The models were trained on ImageNet and can be used as follows:

```python
import torch
from torchvision import models

x = torch.rand(1, 3, 224, 224)

regnet = models.regnet_y_400mf(pretrained=True)
regnet.eval()
predictions = regnet(x)

efficientnet = models.efficientnet_b0(pretrained=True)
efficientnet.eval()
predictions = efficientnet(x)
```

The accuracies of the pre-trained models obtained on ImageNet val are seen below (see #4403, #4530 and #4293 for more details)

|Model |Acc@1 |Acc@5 |
|--- |--- |--- |
|regnet_x_400mf |72.834 |90.95 |
|regnet_x_800mf |75.212 |92.348 |
|regnet_x_1_6gf |77.04 |93.44 |
|regnet_x_3_2gf |78.364 |93.992 |
|regnet_x_8gf |79.344 |94.686 |
|regnet_x_16gf |80.058 |94.944 |
|regnet_x_32gf |80.622 |95.248 |
|regnet_y_400mf |74.046 |91.716 |
|regnet_y_800mf |76.42 |93.136 |
|regnet_y_1_6gf |77.95 |93.966 |
|regnet_y_3_2gf |78.948 |94.576 |
|regnet_y_8gf |80.032 |95.048 |
|regnet_y_16gf |80.424 |95.24 |
|regnet_y_32gf |80.878 |95.34 |
|EfficientNet-B0 |77.692 |93.532 |
|EfficientNet-B1 |78.642 |94.186 |
|EfficientNet-B2 |80.608 |95.31 |
|EfficientNet-B3 |82.008 |96.054 |
|EfficientNet-B4 |83.384 |96.594 |
|EfficientNet-B5 |83.444 |96.628 |
|EfficientNet-B6 |84.008 |96.916 |
|EfficientNet-B7 |84.122 |96.908 |

We would like to thank Ross Wightman and Luke Melas-Kyriazi for contributing the weights of the EfficientNet variants.

FX-based Feature Extraction

A new Feature Extraction method has been added to our utilities. It uses PyTorch FX and enables us to retrieve the outputs of intermediate layers of a network which is useful for feature extraction and visualization. Here is an example of how to use the new utility:

```python
import torch
from torchvision.models import resnet50
from torchvision.models.feature_extraction import create_feature_extractor

x = torch.rand(1, 3, 224, 224)

model = resnet50()

return_nodes = {
    "layer4.2.relu_2": "layer4",
}
model2 = create_feature_extractor(model, return_nodes=return_nodes)
intermediate_outputs = model2(x)

print(intermediate_outputs['layer4'].shape)
```

We would like to thank Alexander Soare for developing this utility.

New Data Augmentations

Two new Automatic Augmentation techniques were added: Rand Augment and Trivial Augment. Both methods can be used as a drop-in replacement of the AutoAugment technique as seen below:

```python
from torchvision import transforms

t = transforms.RandAugment()
# t = transforms.TrivialAugmentWide()
transformed = t(image)

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandAugment(),  # transforms.TrivialAugmentWide()
    transforms.ToTensor()])
```

We would like to thank Samuel G. MĂŒller for contributing Trivial Augment and for his help on refactoring the AA package.

Updated Training Recipes

We have updated our training reference scripts to add support of Exponential Moving Average, Label Smoothing, Learning-Rate Warmup, Mixup, Cutmix and other SOTA primitives. The above enabled us to improve the classification Acc@1 of some pre-trained models by over 4 points. A major update of the existing pre-trained weights is expected on the next release.
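
A rough sketch of some of these primitives wired together with plain PyTorch (>= 1.10); the model and all hyper-parameters below are illustrative, not the reference recipe:

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import CosineAnnealingLR, LinearLR, SequentialLR

model = nn.Linear(512, 1000)  # stand-in for a real classification model

# Label Smoothing (CrossEntropyLoss gained this argument in PyTorch 1.10)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

# Learning-Rate Warmup: linear warmup for 5 epochs, then cosine annealing
optimizer = optim.SGD(model.parameters(), lr=0.5, momentum=0.9, weight_decay=2e-5)
scheduler = SequentialLR(
    optimizer,
    schedulers=[LinearLR(optimizer, start_factor=0.01, total_iters=5),
                CosineAnnealingLR(optimizer, T_max=595)],
    milestones=[5],
)

# Exponential Moving Average of the weights via AveragedModel
ema_model = optim.swa_utils.AveragedModel(
    model, avg_fn=lambda avg, new, num: 0.999 * avg + 0.001 * new
)
```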

Backward-incompatible changes

[models] Use torch instead of scipy for random initialization of inception and googlenet weights (#4256)

Deprecations

[models] Deprecate the C++ vision::models namespace (#4375)

New Features

[datasets] Add iNaturalist dataset (#4123) [datasets] Download and Kinetics 400/600/700 Datasets (#3680) [datasets] Added LFW Dataset (#4255) [models] Add FX feature extraction as an alternative to intermediate_layer_getter (#4302) (#4418) [models] Add RegNet Architecture in TorchVision (#4403) (#4530) (#4550) [ops] Add new masks_to_boxes op (#4290) (#4469) [ops] Add StochasticDepth implementation (#4301) [reference scripts] Adding Mixup and Cutmix (#4379) [transforms] Integration of TrivialAugment with the current AutoAugment code (#4221) [transforms] Adding RandAugment implementation (#4348) [models] Add EfficientNet Architecture in TorchVision (#4293)
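
A short sketch of the two new ops listed above; shapes and probabilities are illustrative:

```python
import torch
from torchvision import ops

# masks_to_boxes: compute tight (x1, y1, x2, y2) boxes around boolean masks
masks = torch.zeros(2, 64, 64, dtype=torch.bool)
masks[0, 10:30, 10:40] = True
masks[1, 20:50, 5:25] = True
boxes = ops.masks_to_boxes(masks)  # tensor of shape (2, 4)

# StochasticDepth: randomly drops entire residual branches during training
layer = ops.StochasticDepth(p=0.2, mode="row")
out = layer(torch.rand(8, 16, 32, 32))
```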

Improvements

Various documentation improvements (#4239) (#4251) (#4275) (#4342) (#3894) (#4159) (#4133) (#4138) (#4089) (#3944) (#4349) (#3754) (#4308) (#4352) (#4318) (#4244) (#4362) (#3863) (#4382) (#4484) (#4503) (#4376) (#4457) (#4505) (#4363) (#4361) (#4337) (#4546) (#4553) (#4565) (#4567) (#4574) (#4575) (#4383) (#4390) (#3409) (#4451) (#4340) (#3967) (#4072) (#4028) (#4132) [build] Add CUDA-11.3 builds to torchvision (#4248) [ci, tests] Skip some CPU-only tests on CircleCI machines with GPU (#4002) (#4025) (#4062) [ci] New issue templates (#4299) [ci] Various CI improvements, in particular putting back GPU testing on windows (#4421) (#4014) (#4053) (#4482) (#4475) (#3998) (#4388) (#4179) (#4394) (#4162) (#4065) (#3928) (#4081) (#4203) (#4011) (#4055) (#4074) (#4419) (#4067) (#4201) (#4200) (#4202) (#4496) (#3925) [ci] Ping maintainers in case a PR was not properly labeled (#3993) (#4012) (#4021) (#4501) [datasets] Add bzip2 file compression support to datasets (#4097) [datasets] Faster dataset indexing (#3939) [datasets] Enable logging of internal dataset instantiations (#4319) (#4090) [datasets] Removed copy=False in torch.from_numpy in MNIST to avoid warning (#4184) [io] Add warning for files with corrupt containers (#3961) [models, tests] Add test to check that classification models are FX-compatible (#3662) [tests] Speedup various tests (#3929) (#3933) (#3936) [models] Allow custom activation in SqueezeExcitation of EfficientNet (#4448) [models] Allow gradient backpropagation through GeneralizedRCNNTransform to inputs (#4327) [ops, tests] Add JIT tests (#4472) [ops] Make StochasticDepth FX-compatible (#4373) [ops] Added backward pass on CPU and CUDA for interpolation with anti-alias option (#4208) (#4211) [ops] Small refactoring to support opt mode for torchvision ops (fb internal specific) (#4080) (#4095) [reference scripts] Added Exponential Moving Average support to classification reference script (#4381) (#4406) (#4407) [reference scripts] Adding label smoothing on classification reference (#4335) [reference scripts] Further enhance Classification Reference (#4444) [reference scripts] Replaced to_tensor() with pil_to_tensor() + convert_image_dtype() (#4452) [reference scripts] Update the metrics output on reference scripts (#4408) [reference scripts] Warmup schedulers in References (#4411) [tests] Add check for fx compatibility on segmentation and video models (#4131) [tests] Mock redirection logic for tests (#4197) [tests] Replace set_deterministic with non-deprecated spelling (#4212) [tests] Skip building torchvision with ffmpeg when python==3.9 (#4417) [tests] [jit] Make operation call accept Stack& instead Stack* (#63414) (#4380) [tests] Make tests that involve GDrive more robust (#4454) [tests] Remove dependency for dtype getters (#4291) [transforms] Replaced example usage of ToTensor() by PILToTensor() + ConvertImageDtype() (#4494) [transforms] Explicitly copying array in pil_to_tensor (#4566) (#4573) [transforms] Make get_image_size and get_image_num_channels public (#4321) [transforms] Adding gray images support for adjust_contrast and adjust_saturation (#4477) (#4480) [utils] Support single color in utils.draw_bounding_boxes (#4075) [video, documentation] Port the video_api.ipynb notebook to the example gallery (#4241) [video, io, tests] Added check for invalid input file (#3932) [video, io] Remove deprecated function call (#3861) (#3989) [video, tests] Removed test_audio_video_sync as it doesn't work as expected (#4050) [video] Build torchvision with ffmpeg only on Linux and ignore ffmpeg on other platforms (#4413, #4410, #4041)

Bug Fixes

[build] Conda: Add numpy dependency (#4442) [build] Explicitly exclude PIL 8.3.0 from compatible dependencies (#4148) [build] More robust version check (#4285) [ci] Fix broken clang format test (#4320) [ci] Remove mentions of conda-forge (#4082) [ci] fixup '' -> '/./' for CI filter (#4059) [datasets] Fix download from google drive which was downloading empty files in some cases (#4109) [datasets] Fix splitting CelebA dataset (#4377) [datasets] Add support for files with periods in name (#4099) [io, tests] Don't check transparency channel for pil >= 8.3 in test_decode_png (#4167) [io] Fix size_t issues across JPEG versions and platforms (#4439) [io] Raise proper error when decoding 16-bit jpegs (#4101) [io] Unpinned the libjpeg version and fixed jpeg_mem_dest's size type on Windows (#4288) [io] Deinterlacing PNG images with read_image (#4268) [io] More robust ffmpeg version query in setup.py (#4254) [io] Fixed read_image bug (#3948) [models] Don't download backbone weights if pretrained=True (#4283) [onnx, tests] Do not disable profiling executor in ONNX tests (#4324) [ops, tests] Fix DeformConvTester::test_backward_cuda by setting threads per block to 512 (#3942) [ops] Fix typing issue to make DeformConv2d scriptable (#4079) [ops] Fixes deform_conv issue with large input/output (#4351) [ops] Resolving tracing problem on StochasticDepth iterator (#4372) [ops] Port quantize_val and dequantize_val into torchvision to avoid at::native and android xplat incompatibility (#4311) [reference scripts] Fix bug on EMA n_averaged estimation (#4544) (#4545) [tests] Avoid cmyk in nvjpeg tests (#4246) [tests] Catch ValueError due to recent change to torch.testing.assert_close (#4165) [tests] Fix failing tests by catching the proper exception from torch.testing (#4121) [tests] Skip test if connection issues on fate (#4284) [transforms] Fix RandAugment and TrivialAugment bugs (#4370) [transforms] [FBcode->GH] [JIT] Add reference semantics to TorchScript classes (#44324) (#4166) [utils] Handle grayscale images on draw_bounding_boxes (#4043) (#4049) [video, io] Fixed missing audio with video_reader and pyav backend (#3934, #4064)

Code Quality

Various typing improvements (#4369) (#4168) (#4169) (#4170) (#4171) (#4224) (#4227) (#4395) (#4409) (#4232) (#4234) (#4236) (#4226) (#4416) Renamed the “master” branch into “main” (#4306) (#4365) [ci] Allow all torchvision test rules to run with RE (#4073) [ci] Add pre-commit hooks for convenient formatting checks (#4387) [ci] Import hipify_python only when needed (#4031) [io] Fixed a couple of typos and removed unnecessary bracket (#4345) [io] Use from_blob to avoid memcpy (#4118) [models, ops] Moving common layers to ops (#4504) [models, ops] Replace MobileNetV3's SqueezeExcitation with EfficientNet's one (#4487) [models] Explicitly store a distance value that is reused (#4341) [models] Use torch instead of scipy for random initialization of inception and googlenet weights (#4256) [onnx, tests] Use test images from repo rather than internet for ONNX tests (#4176) [onnx] Import ONNX utils from symbolic_opset11 module (#4230) [ops] Fix clang formatting in deform_conv2d_kernel.cu (#3943) [ops] Update gpu atomics include path (#4478) (reverted) [reference scripts] Cleaned-up coco evaluation code (#4453) [reference scripts] Remove unused package in coco_eval.py (#4404) [tests] Ported all tests to pytest (#3962) (#3996) (#3950) (#3964) (#3957) (#3959) (#3981) (#3952) (#3977) (#3974) (#3976) (#3983) (#3971) (#3988) (#3990) (#3985) (#3984) (#4030) (#3955) (#4008) (#4010) (#4023) (#3954) (#4026) (#3953) (#4047) (#4185) (#3947) (#4045) (#4036) (#4034) (#3978) (#4046) (#3991) (#3930) (#4038) (#4037) (#4215) (#3972) (#3966) (#4114) (#4177) (#4280) (#3946) (#4233) (#4258) (#4035) (#4040) (#4000) (#4196) (#3922) (#4032) [tests] Prevent tests from leaking their respective RNG (#4497) (#3926) (#4250) [tests] Remove TestCase dependency for test_models_detection_anchor_utils.py (#4207) [tests] Removed tests executing deprecated F_t.center/five/ten_crop methods (#4479) [tests] Replace set_deterministic with non-deprecated spelling (#4212) [tests] Remove torchvision/test/fakedata_generation.py (#4130) [transforms, reference scripts] Added PILToTensor and ConvertImageDtype classes in reference scripts and used them to replace ToTensor (#4495, #4481) [transforms] Refactor AutoAugment to support more augmentations (#4338) [transforms] Replace deprecated torch.lstsq with torch.linalg.lstsq (#3918) [video] Drop virtual from private member functions of Decoder class (#4027) [video] Fixed comparison warnings in audio_stream and video_stream (#4007) [video] Fixed some ffmpeg deprecation warnings in decoder (#4003)

Contributors

We're grateful for our community, which helps us improve torchvision by submitting issues and PRs, and providing feedback and suggestions. The following persons have contributed patches for this release:

ABD-01, Adam J. Stewart, Aditya Oke, Alex Lin, Alexander Grund, Alexander Soare, Allen Goodman, Amani Kiruga, Anirudh, Beat Buesser, beet, Bert Maher, Bruno Korbar, Camilo De La Torre, cyy, D. KhuĂȘ LĂȘ-Huu, David Fan, DevPranjal, dgenzel, dgenzel2, Dmitriy Genzel, Drishti Bhasin, Edward Z. Yang, Eli Uriegas, F-G Fernandez, Francisco Massa, Gary Miguel, Gaurav7888, IgorSusmelj, Ishan Kumar, Ivan Kobzarev, Jiawei Liu, Jithun Nair, Joao Gomes, Joe Early, Julien RIPOCHE, julienripoche, Kai Zhang, kingyiusuen, Loi Ly, Matti Picus, Meghan Lele, Muhammed Abdullah, Nicolas Hug, Nikita Shulga, ORippler, peterbell10, Philip Meier, Prabhat Roy, puhuk, Rajat Jaiswal, S Harish, Sahil Goyal, Samuel Gabriel, Santiago Castro, Saswat Das, Sepehr Sameni, Shengwei An, Shrill Shrestha, Shruti Pulstya, Sugato Ray, tanvimoharir, Vasilis Vryniotis, Vassilis C. Nicodemou, Vassilis Nicodemou, vfdev-5, Vincent Moens, Vivek Kumar, Yi Zhang, Yiwen Song, Yonghye Kwon, Yuchen Huang, Zhengxu Chen, Zhiqiang Wang, Zhongkai Zhu, zzk1st

- Python
Published by datumbox over 4 years ago

org.pytorch:torchvision_ops - Minor bugfix release

This release depends on pytorch 1.9.1. There are no functional changes other than minor updates to CI rules.

- Python
Published by malfet over 4 years ago

org.pytorch:torchvision_ops - iOS support, GPU image decoding, SSDlite and more

This release improves support for mobile, with new mobile-friendly detection models based on SSD and SSDlite, CPU kernels for quantized NMS and quantized RoIAlign, pre-compiled binaries for iOS available in cocoapods, and an iOS demo app. It also improves image IO by providing JPEG decoding on the GPU, and more.

Highlights

[BETA] New models for detection

SSD and SSDlite are two popular object detection architectures which are efficient in terms of speed and provide good results for low resolution pictures. In this release, we provide implementations for the original SSD model with VGG16 backbone and for its mobile-friendly variant SSDlite with MobileNetV3-Large backbone. The models were pre-trained on COCO train2017 and can be used as follows:

```python
import torch
import torchvision

# Original SSD variant
x = [torch.rand(3, 300, 300), torch.rand(3, 500, 400)]
m_detector = torchvision.models.detection.ssd300_vgg16(pretrained=True)
m_detector.eval()
predictions = m_detector(x)

# Mobile-friendly SSDlite variant
x = [torch.rand(3, 320, 320), torch.rand(3, 500, 400)]
m_detector = torchvision.models.detection.ssdlite320_mobilenet_v3_large(pretrained=True)
m_detector.eval()
predictions = m_detector(x)
```

The following accuracies can be obtained on COCO val2017 (full results available in #3403 and #3757):

Model | mAP | mAP@50 | mAP@75
-- | -- | -- | --
SSD300 VGG16 | 25.1 | 41.5 | 26.2
SSDlite320 MobileNetV3-Large | 21.3 | 34.3 | 22.1

[STABLE] Quantized kernels for object detection

The forward pass of the nms and roi_align operators now support tensors with a quantized dtype, which can help lowering the memory footprint of object detection models, particularly on mobile environments.
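
A minimal sketch, assuming the boxes and scores are quantized with torch.quantize_per_tensor before calling the op (the scales and zero points below are illustrative):

```python
import torch
from torchvision import ops

boxes = torch.rand(100, 4) * 100
boxes[:, 2:] += boxes[:, :2]  # ensure (x1, y1, x2, y2) with x2 > x1 and y2 > y1
scores = torch.rand(100)

# Quantize inputs; the forward pass of nms dispatches to the quantized CPU kernel
qboxes = torch.quantize_per_tensor(boxes, scale=1.0, zero_point=0, dtype=torch.quint8)
qscores = torch.quantize_per_tensor(scores, scale=0.01, zero_point=0, dtype=torch.quint8)

keep = ops.nms(qboxes, qscores, iou_threshold=0.5)  # indices of the kept boxes
```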

[BETA] JPEG decoding on the GPU

Decoding jpegs is now possible on GPUs with the use of nvjpeg, which should be readily available in your CUDA setup. The decoding time of a single image should be about 2 to 3 times faster than with libjpeg on CPU. While the resulting tensor will be stored on the GPU device, the input raw tensor still needs to reside on the host (CPU), because the first stages of the decoding process take place on the host:

```python
from torchvision.io.image import read_file, decode_jpeg

data = read_file('path_to_image.jpg')        # raw data is on CPU
img = decode_jpeg(data, device='cuda')       # decoded image is on GPU
```

[BETA] iOS support

TorchVision 0.10 now provides pre-compiled iOS binaries for its C++ operators, which means you can run Faster R-CNN and Mask R-CNN on iOS. An example app showing how to build a program leveraging those ops can be found here.

[STABLE] Speed optimizations for Tensor transforms

The resize and flip transforms have been optimized and their runtimes improved by up to 5x on the CPU. The corresponding PRs were sent to PyTorch in https://github.com/pytorch/pytorch/pull/51653, https://github.com/pytorch/pytorch/pull/54500 and https://github.com/pytorch/pytorch/pull/56713
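
These optimized kernels run directly on image tensors; a minimal sketch of the tensor-backed functional API:

```python
import torch
import torchvision.transforms.functional as F

# (C, H, W) uint8 image tensor; batched (N, C, H, W) inputs also work
img = torch.randint(0, 256, (3, 512, 512), dtype=torch.uint8)

resized = F.resize(img, [256, 256])  # optimized resize kernel
flipped = F.hflip(resized)           # optimized horizontal flip
```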

[STABLE] Documentation improvements

Significant improvements were made to the documentation. In particular, a new gallery of examples is available: see here for the latest version (the stable version was not released at the time of writing). These examples visually illustrate how each transform acts on an image, and the outputs of the segmentation models are now properly documented and illustrated.

The example gallery will be extended in the future to provide more comprehensive examples and serve as a reference for common torchvision tasks.

Backwards Incompatible Changes

  • [transforms] Ensure input type of normalize is float. (#3621)
  • [models] Use PyTorch smooth_l1_loss and remove private custom implementation (#3539)

New Features

  • Added iOS binaries and test app (#3582)(#3629) (#3806)
  • [datasets] Added KITTI dataset (#3640)
  • [utils] Added utility to draw segmentation masks (#3330, #3824)
  • [models] Added the SSD & SSDlite object detection models (#3403, #3757, #3766, #3855, #3896, #3818, #3799)
  • [transforms] Added antialias option to transforms.functional.resize (#3761, #3810, #3842)
  • [transforms] Add new max_size parameter to Resize (#3494)
  • [io] Support for decoding jpegs on GPU with nvjpeg (#3792)
  • [ci, rocm] Add ROCm to builds (#3840) (#3604) (#3575)
  • [ops, models.quantization] Add quantized version of NMS (#3601)
  • [ops, models.quantization] Add quantized version of RoIAlign (#3624, #3904)

Improvement

  • [build] Various build improvements: (#3618) (#3622) (#3399) (#3794) (#3561)
  • [ci] Various CI improvements (#3647) (#3609) (#3635) (#3599) (#3778) (#3636) (#3809) (#3625) (#3764) (#3679) (#3869) (#3871) (#3444) (#3445) (#3480) (#3768) (#3919) (#3641)(#3900)
  • [datasets] Improve error handling in make_dataset (#3496)
  • [datasets] Remove caching from MNIST and variants (#3420)
  • [datasets] Make DatasetFolder.find_classes public (#3628)
  • [datasets] Separate extraction and decompression logic in datasets.utils.extract_archive (#3443)
  • [datasets, tests] Improve dataset test coverage and infrastructure (#3450) (#3457) (#3454) (#3447) (#3489) (#3661) (#3458 (#3705) (#3411) (#3461) (#3465) (#3543) (#3550) (#3665) (#3464) (#3595) (#3466) (#3468) (#3467) (#3486) (#3736) (#3730) (#3731) (#3477) (#3589) (#3503) (#3423) (#3492)(#3578) (#3605) (#3448) (#3864) (#3544)
  • [datasets, tests] Fix lazy importing for dataset tests (#3481)
  • [datasets, tests] Fix test_extract(zip|tar|tar_xz|gzip) on windows (#3542)
  • [datasets, tests] Fix kwargs forwarding in fake data utility functions (#3459)
  • [datasets, tests] Properly fix dataset test that passes by accident (#3434)
  • [documentation] Improve the documentation infrastructure (#3868) (#3724) (#3834) (#3689) (#3700) (#3513) (#3671) (#3490) (#3660) (#3594)
  • [documentation] Various documentation improvements (#3793) (#3715) (#3727) (#3838) (#3701) (#3923) (#3643) (#3537) (#3691) (#3453) (#3437) (#3732) (#3683) (#3853) (#3684) (#3576) (#3739) (#3530) (#3586) (#3744) (#3645) (#3694) (#3584) (#3615) (#3693) (#3706) (#3646) (#3780) (#3704) (#3774) (#3634)(#3591)(#3807)(#3663)
  • [documentation, ci] Improve the CI infrastructure for documentation (#3734) (#3837) (#3796) (#3711)
  • [io] remove deprecated function calls (#3859) (#3858)
  • [documentation, io] Improve IO docs and expose ImageReadMode in torchvision.io (#3812)
  • [onnx, models] Replace reshape with flatten in MobileNetV2 (#3462)
  • [ops, tests] Added test for aligned=True (#3540)
  • [ops, tests] Add onnx test for batched_nms (#3483)
  • [tests] Various test improvements (#3548) (#3422) (#3435) (#3860) (#3479) (#3721) (#3872) (#3908) (#2916) (#3917) (#3920) (#3579)
  • [transforms] add __repr__ for transforms.RandomErasing (#3491)
  • [transforms, documentation] Adds Documentation for AutoAugmentation (#3529)
  • [transforms, documentation] Add illustrations of transforms with sphinx-gallery (#3652)
  • [datasets] Remove pandas dependency for CelebA dataset (#3656, #3698)
  • [documentation] Add docs for missing datasets (#3536)
  ‱ [reference scripts] Make reference scripts compatible with submitit (#3785)
  ‱ [reference scripts] Updated all_gather() to make use of all_gather_object() from PyTorch (#3857)
  • [datasets] Added dataset download support in fbcode (#3823) (#3826)

Code quality

  • Remove inconsistent FB copyright headers (#3741)
  • Keep consistency in classes ConvBNActivation (#3750)
  • Removed unused imports (#3738, #3740, #3639)
  • Fixed floor_divide deprecation warnings seen in pytest output (#3672)
  • Unify onnx and JIT resize implementations (#3654)
  • Cleaned-up imports in test files related to datasets (#3720)
  • [documentation] Remove old css file (#3839)
  • [ci] Fix inconsistent version pinning across yaml files (#3790)
  • [datasets] Remove redundant path.join in Places365 (#3545)
  • [datasets] Remove imprecise error handling in PhotoTour dataset (#3488)
  • [datasets, tests] Remove obsolete test_datasets_transforms.py (#3867)
  • [models] Making protected params of MobileNetV3 public (#3828)
  • [models] Make target argument in transform.py truly optional (#3866)
  • [models] Adding some references on MobileNetV3 implementation. (#3850)
  • [models] Refactored set_cell_anchors() in AnchorGenerator (#3755)
  • [ops] Minor cleanup of roi_align_forward_kernel_impl (#3619)
  • [ops] Replace deprecated AutoNonVariableTypeMode with AutoDispatchBelowADInplaceOrView. (#3786, #3897)
  • [tests] Port tests to use pytest (#3852, #3845, #3697, #3907, #3749)
  • [ops, tests] simplify get_script_fn (#3541)
  • [tests] Use torch.testing.assert_close in our test suite (#3886) (#3885) (#3883) (#3882) (#3881) (#3887) (#3880) (#3878) (#3877) (#3875) (#3888) (#3874) (#3884) (#3876) (#3879) (#3873)
  • [tests] Clean up test accept behaviour (#3759)
  • [tests] Remove unused masks variable in test_image.py (#3910)
  • [transforms] use ternary if in resize (#3533)
  • [transforms] replaced deprecated call to ByteTensor with from_numpy (#3813)
  • [transforms] Remove unnecessary casting in adjust_gamma (#3472)

Bugfixes

  • [ci] set empty cxx flags as default (#3474)
  • [android][test_app] Cleanup duplicate dependency (#3428)
  • Remove leftover exception (#3717)
  • Corrected spelling in a TypeError (#3659)
  • Add missing device info. (#3651)
  • Moving tensors to the right device (#3870)
  • Proper error message (#3725)
  • [ci, io] Pin JPEG version to resolve the size_t issue on windows (#3787)
  • [datasets] Make LSUN OS agnostic (#3455)
  • [datasets] Update squeezenet urls (#3581)
  • [datasets] Add .item() to the target variable in fakedataset.py (#3587)
  • [datasets] Fix VOC datasets for 2007 (#3572)
  • [datasets] Add custom user agent for download_url (#3498)
  • [datasets] Fix LSUN dataset tests flakyness (#3703)
  • [datasets] Fix (Fashion|K)MNIST download and MNIST download test (#3557)
  • [datasets] fix check for exceeded quota on Google Drive (#3710)
  • [datasets] Fix redirect behavior of datasets.utils.download_url (#3564)
  • [datasets] Update EMNIST url (#3567)
  • [datasets] Redirect datasets to correct urls (#3574)
  • [datasets] Prevent potential bug in DatasetFolder.make_dataset (#3733)
  • [datasets, tests] Fix redirection in download tests (#3568)
  • [documentation] Correct the size of returned tensor in comments of ps_roi_pool.py and ps_roi_align.py (#3849)
  • [io] Fix ternary operator to decide to store an image in Grayscale or RGB (#3553)
  • [io] Fixed audio-video synchronisation problem in read_video() when using pts as unit (#3791)
  • [models] Fix bug on detection backbones when trainable_layers == 0 (#3906)
  • [models] Removed caching of anchors from AnchorGenerator (#3745)
  • [models] Update weights of classification models with new serialization format to allow proper unpickling (#3620, #3851)
  • [onnx, ops] Fix roi_align ONNX export (#3355)
  • [referencescripts] Only sync cuda if cuda is available (#3674)
  • [referencescripts] Add checkpoints used for preemption. (#3789)
  • [transforms] Fix to_tensor for accimage backend (#3439)
  • [transforms] Make crop work the same for PIL and Tensor (#3770)
  • [transforms, models, tests] Fix some tests in fbcode (#3686)
  • [transforms, tests] Fix test_random_autocontrast flakyness (#3699)
  • [utils] Fix the spacing of labels on draw_bounding_boxes (#3895)
  • [utils, tests] Fix test_draw_boxes (#3631)

Deprecation

  • [transforms] Deprecate _transforms_video and _functional_video in favor of transforms (#3441)

Performance

  • [ops] Improve performance of batched_nms when number of boxes is large (#3426)
  • [transforms] Speed up equalize transform by using bincount instead of histc (#3493)

Contributors

We're grateful for our community, which helps us improve torchvision by submitting issues and PRs, and providing feedback and suggestions. The following persons have contributed patches for this release:

Aditya Oke, Akshay Kumar, Alessandro Melis, Avijit Dasgupta, Bruno Korbar, Caroline Chen, chengjuzhou, Edgar Andrés Margffoy Tuay, Eli Uriegas, Francisco Massa, Guillem Orellana Trullols, harishsdev, Ivan Kobzarev, Jaesun Park, James Thewlis, Jeff Daily, Jeff Yang, Jithendra Paruchuri, Jon Janzen, KAI ZHAO, Ksenija Stanojevic, Lewis Patten, Matti Picus, moto, Mustafa Bal, Nicolas Hug, Nikhil Kumar, Nikita Shulga, Philip Meier, Prabhat Roy, Sanket Thakur, scott-vsi, Sofiane Abbar, t-rutten, urmi22, Vasilis Vryniotis, vfdev, Yuchen Huang, Zhengyang Feng, Zhiqiang Wang

Thank you!

- Python
Published by fmassa over 4 years ago

org.pytorch:torchvision_ops - Dataset bugfixes

Highlights

This minor release bumps the pinned PyTorch version to v1.8.1, and brings a few bugfixes for datasets, including MNIST download not being available.

Bugfixes

  • fix VOC datasets for 2007 (#3572)
  • Update EMNIST url (#3567)
  • Fix redirect behavior of datasets.utils.download_url (#3564)
  • Fix MNIST download for minor release (#3559)

- Python
Published by fmassa almost 5 years ago

org.pytorch:torchvision_ops - Mobile support, AutoAugment, improved IO and more

This release introduces improved support for mobile, with new mobile-friendly models, pre-compiled binaries for Android available in maven and an android demo app. It also improves image IO and provides new data augmentations including AutoAugment.

Highlights

Better mobile support

torchvision 0.9 adds support for the MobileNetV3 architecture with pre-trained weights for Classification, Object Detection and Segmentation tasks. It also improves C++ operators so that they can be compiled and run on Android, and we are providing pre-compiled torchvision artifacts published to jcenter. An example application showing how to use the torchvision ops in an Android app can be found here.

Classification

We provide MobileNetV3 variants (including a quantized version) pre-trained on ImageNet 2012.

```python
import torch
import torchvision

# Classification
x = torch.rand(1, 3, 224, 224)
m_classifier = torchvision.models.mobilenet_v3_large(pretrained=True)
# m_classifier = torchvision.models.mobilenet_v3_small(pretrained=True)
m_classifier.eval()
predictions = m_classifier(x)

# Quantized Classification
x = torch.rand(1, 3, 224, 224)
m_classifier = torchvision.models.quantization.mobilenet_v3_large(pretrained=True)
m_classifier.eval()
predictions = m_classifier(x)
```

The pre-trained models have the following accuracies on ImageNet 2012 val:

| Model | Top-1 Acc | Top-5 Acc |
| --- | --- | --- |
| MobileNetV3 Large | 74.042 | 91.340 |
| MobileNetV3 Large (Quantized) | 73.004 | 90.858 |
| MobileNetV3 Small | 67.620 | 87.404 |

Object Detection

We provide two variants of Faster R-CNN with MobileNetV3 backbone pre-trained on COCO train2017. They can be obtained as follows:

```python
import torch
import torchvision

# Fast Low Resolution Model
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
m_detector = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_320_fpn(pretrained=True)
m_detector.eval()
predictions = m_detector(x)

# Highly Accurate High Resolution Model
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
m_detector = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_fpn(pretrained=True)
m_detector.eval()
predictions = m_detector(x)
```

They yield the following accuracies on COCO val 2017 (full results available in #3265):

| Model | mAP | mAP@50 | mAP@75 |
| --- | --- | --- | --- |
| Faster R-CNN MobileNetV3-Large 320 FPN | 22.8 | 38.0 | 23.2 |
| Faster R-CNN MobileNetV3-Large FPN | 32.8 | 52.5 | 34.3 |

Semantic Segmentation

We also provide pre-trained models for semantic segmentation. The models have been trained on a subset of COCO train2017, which contains the same 20 categories as those from Pascal VOC.

```python
import torch
import torchvision

# Fast Mobile Model
x = torch.rand(1, 3, 520, 520)
m_segmenter = torchvision.models.segmentation.lraspp_mobilenet_v3_large(pretrained=True)
m_segmenter.eval()
predictions = m_segmenter(x)

# Highly Accurate Mobile Model
x = torch.rand(1, 3, 520, 520)
m_segmenter = torchvision.models.segmentation.deeplabv3_mobilenet_v3_large(pretrained=True)
m_segmenter.eval()
predictions = m_segmenter(x)
```

The pre-trained models give the following results on the subset of COCO val2017 which contain the same 20 categories as those present in Pascal VOC (full results in #3276):

| Model | mean IoU | global pixelwise accuracy |
| --- | --- | --- |
| Lite R-ASPP with Dilated MobileNetV3 Large Backbone | 57.9 | 91.2 |
| DeepLabV3 with Dilated MobileNetV3 Large Backbone | 60.3 | 91.2 |

Addition of the AutoAugment method

AutoAugment is a common Data Augmentation technique that can improve the accuracy of Scene Classification models. Though the data augmentation policies are directly linked to their trained dataset, empirical studies show that ImageNet policies provide significant improvements when applied to other datasets.

In TorchVision we implemented 3 policies learned on the following datasets: ImageNet, CIFAR10 and SVHN. The new transform can be used standalone or mixed-and-matched with existing transforms:

```python
from torchvision import transforms

t = transforms.AutoAugment()
transformed = t(image)

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.AutoAugment(),
    transforms.ToTensor()])
```

Improved Image IO and on-the-fly image type conversions

All the read and decode methods of the io.image package have been updated to:

  • Add support for Palette, Grayscale Alpha and RGB Alpha image types during PNG decoding.
  • Allow the on-the-fly conversion of images from one type to another during read.

```python
from torchvision.io.image import read_image, ImageReadMode

# keeps original type, channels unchanged
x1 = read_image("image.png")

# converts to grayscale, channels = 1
x2 = read_image("image.png", mode=ImageReadMode.GRAY)

# converts to grayscale with alpha transparency, channels = 2
x3 = read_image("image.png", mode=ImageReadMode.GRAY_ALPHA)

# converts to RGB, channels = 3
x4 = read_image("image.png", mode=ImageReadMode.RGB)

# converts to RGB with alpha transparency, channels = 4
x5 = read_image("image.png", mode=ImageReadMode.RGB_ALPHA)
```

Python 3.9 and CUDA 11.1

This release adds official support for Python 3.9 and CUDA 11.1 (#3341, #3418)

Backwards Incompatible Changes

  • [Ops] Change default eps value of FrozenBN to better align with nn.BatchNorm (#2933)
  • [Ops] Remove deprecated new_empty_tensor. (#3156)
  • [Transforms] ColorJitter gets its random params by calling get_params() (#3001)
  • [Transforms] Change rounding of transforms on integer tensors (#2964)
  • [Utils] Remove normalize from save_image (#3324)

New Features

  • [Datasets] Add WiderFace dataset (#2883)
  • [Models] Add MobileNetV3 architecture:
    • Classification Models: (#3354, #3252, #3182, #3242, #3177)
    • Object Detection Models: (#3265, #3253, #3223, #3243, #3244, #3248)
    • Segmentation Models: (#3276)
    • Quantized Models: (#3366, #3323)
  • [Models] Improve speed/accuracy of FasterRCNN by introducing a score threshold on RPN (#3205)
  • [Mobile] Add Android gradle project with demo test app (#2897)
  • [Transforms] Implemented AutoAugment, along with required new transforms + Policies (#3123)
  • [Ops] Added support of Autocast in all Operators: #2938, #2926, #2922, #2928, #2905, #2906, #2907, #2898
  • [Ops] Add modulation input for DeformConv2D (#2791)
  • [IO] Improved io.image with on-the-fly image type conversions: (#3193, #3069, #3024, #2988, #2984)
  • [IO] Add option to write audio to video file (#2304)
  • [Utils] Added a utility to draw bounding boxes (#2785, #3296, #3075) (see the sketch after this list)
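A minimal sketch of the new drawing utility (the image, boxes and labels are illustrative):

```python
import torch
from torchvision.utils import draw_bounding_boxes

img = torch.randint(0, 256, size=(3, 100, 100), dtype=torch.uint8)
boxes = torch.tensor([[10, 10, 50, 50], [30, 40, 90, 80]], dtype=torch.float)

# returns a uint8 CxHxW tensor with the boxes drawn on top
result = draw_bounding_boxes(img, boxes, labels=["cat", "dog"])
```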

Improvements

Datasets

  • Concatenate small tensors in video datasets to reduce the use of shared file descriptor (#1795)
  • Improve testing for datasets (#3336, #3337, #3402, #3412, #3413, #3415, #3416, #3345, #3376, #3346, #3338)
  • Check if dataset file is located on Google Drive before downloading it (#3245)
  • Improve Coco implementation (#3417)
  • Make download_url follow redirects (#3236)
  • make_dataset as staticmethod of DatasetFolder (#3215)
  • Add a warning if any clip can't be obtained from a video in VideoClips. (#2513)

Models

  • Improve error message in AnchorGenerator (#2960)
  • Disable pretrained backbone downloading if pretrained is True in segmentation models (#3325)
  • Support for image with no annotations in RetinaNet (#3032)
  • Change RoIHeads reshape to support empty batches. (#3031)
  • Fixed typing exception throwing issues with JIT (#3029)
  • Replace deprecated functional.sigmoid with torch.sigmoid in RetinaNet (#3307)
  • Assert that inputs are floating point in Faster R-CNN normalize method (#3266)
  • Speedup RetinaNet's postprocessing (#2828)

Ops

  • Added eps in the __repr__ of FrozenBN (#2852)
  • Added __repr__ to MultiScaleRoIAlign (#2840)
  • Exposing LevelMapper params in MultiScaleRoIAlign (#3151)
  • Enable autocast for all operators and let them use the dispatcher (#2926, #2922, #2928, #2898)

Transforms

  • adjust_hue now accepts tensors with one channel (#3222)
  • Add fill color support for tensor affine transforms (#2904)
  • Remove torchscript workaround for center_crop (#3118)
  • Improved error message for RandomCrop (#2816)

IO

  • Enable importing read_file and the other methods from torchvision.io (#2918)
  • accept python bytes in _read_video_from_memory() (#3347)
  • Enable rtmp timeout in decoder (#3076)
  • Specify tls cert file to decoder through config (#3289, #3374)
  • Add UUID in LOG() in decoder (#3080)

References

  • Add weight averaging and storing methods in references utils (#3352)
  • Adding Preset Transforms in reference scripts (#3317)
  • Load variables when --resume /path/to/checkpoint --test-only (#3285)
  • Updated video classification ref example with new transforms (#2935)

Misc

  • Various documentation improvements (#3039, #3271, #2820, #2808, #3131, #3062, #3061, #3000, #3299, #3400, #2899, #2901, #2908, #2851, #2909, #3005, #2821, #2957, #3360, #3019, #3124, #3217, #2879, #3234, #3180, #3425, #2979, #2935, #3298, #3268, #3203, #3290, #3295, #3200, #2663, #3153, #3147, #3232)
  • The documentation infrastructure was improved, in particular the docs are now built on every PR and uploaded to CircleCI (#3259, #3378, #3408, #3373, #3290)
  • Avoid some deprecation warnings from PyTorch (#3348)
  • Ensure operators are added in C++ (#2798, #3091, #3391)
  • Fixed compilation warnings on C++ codebase (#3390)
  • CI Improvements (#3401, #3329, #2990, #2978, #3189, #3230, #3254, #2844, #2872, #2825, #3144, #3137, #2827, #2848, #2914, #3419, #2895, #2837)
  • Installation improvements (#3302, #2969, #3113, #3202)
  • CMake improvements (#2801, #2805, #3212, #3381)

Mobile

  • Add Torch Selective macros in all C++ Ops for better support on mobile (#3218)

Code Quality, testing

  • [BC-breaking] Modernized C++ codebase & made it mobile-friendly (25% faster to compile): #2885, #2891, #2892, #2893, #2905, #2906, #2907, #2938, #2944, #2945, #3011, #3020, #3097, #3105, #3134, #3135, #3143, #3146, #3154, #3156, #3163, #3218, #3308, #3311, #3312, #3326, #3350, #3390
  • Cleaned up Python codebase & made it more Pythonic: #3263, #3239, #3059, #3055, #3045, #3382, #3159, #3171
  • Improve type annotations (#3288, #3045, #2862, #2858, #2857, #2863, #2865, #2856, #2860, #2864, #2875, #2859, #2854, #2861, #3174, #3059)
  • Code refactoring and static analysis improvements (#3379, #3335, #3229, #3204, #3095)
  • Miscellaneous test improvements (#2966, #2965, #3018, #3035, #2961, #2806, #2812, #2815, #2834, #2874, #3099, #3092, #3160, #3103, #2971, #3023, #2803, #3136, #3319, #3310, #3287, #3033, #2983, #3386, #3369, #3116, #2985, #3320)

Bug Fixes

  • [DATASETS] Fixes EMNIST split and label issues (#2673)
  • [DATASETS] Fix overflow in STL10 fold reading (#3353)
  • [MODELS] Fix incorrectly frozen BN on ResNet FPN backbone (#3396)
  • [MODELS] Fix scriptability support in Inception V3 (#2976)
  • [MODELS] Changed default value of eps in FrozenBatchNorm to match BatchNorm: #2940 #2933
  • [MODELS] Fixed warning in models.detection.transforms.resize_image_and_masks. (#3237)
  • [MODELS] Fix trainable_layers on RetinaNet (#3234)
  • [MODELS] Fix ShuffleNetV2 ONNX model export issue. (#3158)
  • [UTILS] Fixes no grad and range bugs in utils. (#3269)
  • [UTILS] make_grid uses a more correct normalization (#2967)
  • [OPS] fix GET_THREADS() for ROCm with DeformConv (#2997)
  • [OPS] Fix NMS and IoU overflows for fp16 (#3383, #3382)
  • [OPS] Fix ops registration on windows (#3380)
  • [OPS] Fix initialisation bug on FeaturePyramidNetwork (#2954)
  • [IO] Replace hardcoded error code with ENODATA (#3277)
  • [REFERENCES] Fix repeated UserWarning and add more flexibility to reference code for segmentation tasks (#2886)
  • [TRANSFORMS] Fix default fill value in RandomRotation (#3303)
  • [TRANSFORMS] Correct aspect ratio sampling in transforms.RandomErasing (#3344)
  • [TRANSFORMS] Fix CenterCrop for Tensor when crop size is greater than image size (#3333)
  • [TRANSFORMS] Functional to_tensor returns float tensor of default dtype (#3398)
  • [TRANSFORMS] Add explicit check for number of channels (#3013)
  • [TRANSFORMS] pil_to_tensor with accimage backend now returns uint8 (#3109)
  • [TRANSFORMS] Fix potential overflow in convert_image_dtype (#3107)
  • [TRANSFORMS] Check num of channels on adjust_* transformations (#3069)

Deprecations

  • [TRANSFORMS] Introduced InterpolationModes and deprecated arguments: resample and fillcolor (#2952, #3055)

- Python
Published by fmassa almost 5 years ago

org.pytorch:torchvision_ops - Python 3.9 support and bugfixes

This minor release bumps the pinned PyTorch version to v1.7.1, and contains some minor improvements.

Highlights

Python 3.9 support

This release adds native binaries for Python 3.9 (#3063)

Bugfixes

  • Make read_file and write_file accept unicode strings on Windows (#2949)
  • Replaced tuple creation by one acceptable by majority of compilers (#2937)
  • Add docs for focal_loss (#2979)

- Python
Published by fmassa about 5 years ago

org.pytorch:torchvision_ops - Added version suffix back to package

Issues resolved:

  • Cannot pip install torchvision==0.8.0+cu110 - https://github.com/pytorch/vision/issues/2912

- Python
Published by seemethere over 5 years ago

org.pytorch:torchvision_ops - Improved transforms, native image IO, new video API and more

This release brings new additions to torchvision that improve support for model deployment. Most notably, transforms in torchvision are now torchscript-compatible, and can thus be serialized together with your model for simpler deployment. Additionally, we provide native image IO with torchscript support, and a new video reading API (released as Beta) which is more flexible than torchvision.io.read_video.

Highlights

Transforms now support Tensor, batch computation, GPU and TorchScript

torchvision transforms are now inherited from nn.Module and can be torchscripted and applied on torch Tensor inputs as well as on PIL images. They also support Tensors with batch dimension and work seamlessly on CPU/GPU devices:

```python
import torch
import torchvision.transforms as T

# to fix random seed, use torch.manual_seed
# instead of random.seed
torch.manual_seed(12)

transforms = torch.nn.Sequential(
    T.RandomCrop(224),
    T.RandomHorizontalFlip(p=0.3),
    T.ConvertImageDtype(torch.float),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
)
scripted_transforms = torch.jit.script(transforms)
# Note: we can similarly use T.Compose to define transforms
# transforms = T.Compose([...]) and
# scripted_transforms = torch.jit.script(torch.nn.Sequential(*transforms.transforms))

tensor_image = torch.randint(0, 256, size=(3, 256, 256), dtype=torch.uint8)

# works directly on Tensors
out_image1 = transforms(tensor_image)

# on the GPU
out_image1_cuda = transforms(tensor_image.cuda())

# with batches
batched_image = torch.randint(0, 256, size=(4, 3, 256, 256), dtype=torch.uint8)
out_image_batched = transforms(batched_image)

# and has torchscript support
out_image2 = scripted_transforms(tensor_image)
```

These improvements enable the following new features:

  • support for GPU acceleration
  • batched transformations e.g. as needed for videos
  • transform multi-band torch tensor images (with more than 3-4 channels)
  • torchscript transforms together with your model for deployment

Note: Exceptions for TorchScript support include Compose, RandomChoice, RandomOrder, Lambda and those applied on PIL images, such as ToPILImage.

Native image IO for JPEG and PNG formats

torchvision 0.8.0 introduces native image reading and writing operations for JPEG and PNG formats. Those operators support TorchScript and return CxHxW tensors in uint8 format, and can thus now be part of your model for deployment in C++ environments.

```python
import torch
from torchvision.io import read_image

# tensor_image is a CxHxW uint8 Tensor
tensor_image = read_image('path_to_image.jpeg')

# or equivalently
from torchvision.io.image import read_file, decode_image

# raw_data is a 1d uint8 Tensor with the raw bytes
raw_data = read_file('path_to_image.jpeg')
tensor_image = decode_image(raw_data)

# all operators are torchscriptable and can be
# serialized together with your model torchscript code
scripted_read_image = torch.jit.script(read_image)
```

New detection model

This release adds a pretrained model for RetinaNet with a ResNet50 backbone from Focal Loss for Dense Object Detection, with the following accuracies on COCO val2017:

```
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.364
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.558
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.383
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.193
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.400
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.490
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.315
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.506
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.558
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.386
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.595
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.699
```
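The model can be obtained as in this minimal sketch (the input sizes are illustrative):

```python
import torch
import torchvision

model = torchvision.models.detection.retinanet_resnet50_fpn(pretrained=True)
model.eval()
predictions = model([torch.rand(3, 300, 400)])
```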

[BETA] New Video Reader API

This release introduces a new video reading abstraction, which gives more fine-grained control on how to iterate over the videos. It supports image and audio, and implements an iterator interface so that it can be combined with the rest of the python ecosystem, such as itertools.

```python
from torchvision.io import VideoReader

# stream indicates if reading from audio or video
reader = VideoReader('path_to_video.mp4', stream='video')

# can change the stream after construction
# via reader.set_current_stream

# to read all frames in a video starting at 2 seconds
for frame in reader.seek(2):
    # frame is a dict with "data" and "pts" metadata
    print(frame["data"], frame["pts"])

# because reader is an iterator you can combine it with
# itertools
from itertools import takewhile, islice

# read 10 frames starting from 2 seconds
for frame in islice(reader.seek(2), 10):
    pass

# or to return all frames between 2 and 5 seconds
for frame in takewhile(lambda x: x["pts"] < 5, reader.seek(2)):
    pass
```

Note: In order to use the Video Reader API, you need to compile torchvision from source and make sure that you have ffmpeg installed in your system. Note: the VideoReader API is currently released as beta and its API can change following user feedback.

Backwards Incompatible Changes

  • [Transforms] Random seed now should be set with torch.manual_seed instead of random.seed (#2292)
  • [Transforms] RandomErasing.get_params function’s argument was previously value=0 and is now value=None which is interpreted as Gaussian random noise (#2386)
  • [Transforms] RandomPerspective and F.perspective changed the default value of interpolation to be BILINEAR instead of BICUBIC (#2558, #2561)
  • [Transforms] Fixes incoherence in affine transformation when center is defined as half image size + 0.5 (#2468)

New Features

  • [Ops] Added focal loss (#2784)
  • [Ops] Added bounding boxes conversion function (#2710, #2737)
  • [Ops] Added Generalized IOU (#2642)
  • [Models] Added RetinaNet object detection model (#2784)
  • [Datasets] Added Places365 dataset (#2610, #2625)
  • [Transforms] Added GaussianBlur transform (#2658)
  • [Transforms] Added torchscript, batch and GPU and tensor support for transforms (#2769, #2767, #2749, #2755, #2485, #2721, #2645, #2694, #2584, #2661, #2566, #2345, #2342, #2356, #2368, #2373, #2496, #2553, #2495, #2561, #2518, #2478, #2459, #2444, #2396, #2401, #2394, #2586, #2371, #2477, #2456, #2628, #2569, #2639, #2620, #2595, #2456, #2403, #2729)
  • [Transforms] Added example notebook for tensor transforms (#2730)
  • [IO] Added JPEG/PNG encoding / decoding ops
    • JPEG (#2388, #2471, #2696, #2725)
    • PNG (#2382, #2726, #2398, #2457, #2735)
    • decode_image (#2680, #2695, #2718, #2764, #2766)
  • [IO] Added file reading / writing ops (#2728, #2765, #2768)
  • [IO] [BETA] Added new VideoReader API (#2683, #2781, #2778, #2802, #2596, #2612, #2734, #2770)

Improvements

Datasets

  • Added error message if Google Drive download quota is exceeded (#2321)
  • Optimized LSUN initialization time by only pulling keys from db (#2544)
  • Use more precise return type for gzip.open() (#2792)
  • Added UCF101 dataset tests (#2548)
  • Added download tests on a schedule (#2665, #2675, #2699, #2706, #2747, #2731)
  • Added typehints for datasets (#2487, #2521, #2522, #2523, #2524, #2526, #2528, #2529, #2525, #2527, #2530, #2533, #2534, #2535, #2536, #2532, #2538, #2537, #2539, #2531, #2540, #2667)

Models

  • Removed hard coded value in DeepLabV3 (#2793)
  • Changed the anchor generator default argument to an equivalent one (#2722)
  • Moved model construction location in resnet_fpn_backbone into after docstring (#2482)
  • Partially enabled type hints for models (#2668)

Ops

  • Moved RoIs shape check to C++ (#2794)
  • Use autocast built-in cast-helper functions (#2646)
  • Added type annotations for torchvision.ops (#2331, #2462)

References

  • [References] Removed redundant target send to device in detection evaluation (#2503)
  • [References] Removed obsolete import in segmentation. (#2399)

Misc

  • [Transforms] Added support for negative padding in pad (#2744)
  • [IO] Added type hints for torchvision.io (#2543)
  • [ONNX] Export ROIAlign with aligned=True (#2613)

Internal

  • [Binaries] Added CUDA 11 binary builds (#2671)
  • [Binaries] Added DEBUG=1 option to build torchvision (#2603)
  • [Binaries] Unpin ninja version (#2358)
  • Warn if torchvision imported from repo root (#2759)
  • Added compatibility checks for C++ extensions (#2467)
  • Added probot (#2448)
  • Added ipynb to git attributes file (#2772)
  • CI improvements (#2328, #2346, #2374, #2437, #2465, #2579, #2577, #2633, #2640, #2727, #2754, #2674, #2678)
  • CMakeList improvements (#2739, #2684, #2626, #2585, #2587)
  • Documentation improvements (#2659, #2615, #2614, #2542, #2685, #2507, #2760, #2550, #2656, #2723, #2601, #2654, #2757, #2592, #2606)

Bug Fixes

  • [Ops] Fixed crash in deformable convolutions (#2604)
  • [Ops] Added empty batch support for DeformConv2d (#2782)
  • [Transforms] Enforced contiguous output in to_tensor (#2483)
  • [Transforms] Fixed fill parameter for PIL pad (#2515)
  • [Models] Fixed deprecation warning in nonzero for R-CNN models (#2705)
  • [IO] Explicitly cast to size_t in video decoder (#2389)
  • [ONNX] Fixed dynamic resize in Mask R-CNN (#2488)
  • [C++ API] Fixed function signatures for torch::nn::Functional (#2463)

Deprecations

  • [Transforms] Deprecated the dedicated functional_tensor implementations F_t.center_crop, F_t.five_crop and F_t.ten_crop, as they can be implemented as a function of crop (#2568)
  • [Transforms] Deprecated explicit usage of F_pil and F_t functions, users should instead use the general functional API (#2664)

- Python
Published by fmassa over 5 years ago

org.pytorch:torchvision_ops - Mixed precision training, new models and improvements

Highlights

Mixed precision support for all models

torchvision models now support mixed-precision training via the new torch.cuda.amp package. Using mixed precision support is easy: just wrap the model and the loss inside a torch.cuda.amp.autocast context manager. Here is an example with Faster R-CNN:

```python
import torch, torchvision

device = torch.device('cuda')

model = torchvision.models.detection.fasterrcnn_resnet50_fpn()
model.to(device)

input = [torch.rand(3, 300, 400, device=device)]
boxes = torch.rand((5, 4), dtype=torch.float32, device=device)
boxes[:, 2:] += boxes[:, :2]
target = [{"boxes": boxes,
           "labels": torch.zeros(5, dtype=torch.int64, device=device),
           "image_id": 4,
           "area": torch.zeros(5, dtype=torch.float32, device=device),
           "iscrowd": torch.zeros((5,), dtype=torch.int64, device=device)}]

# use automatic mixed precision
with torch.cuda.amp.autocast():
    loss_dict = model(input, target)
    losses = sum(loss for loss in loss_dict.values())

# perform backward outside of autocast context manager
losses.backward()
```

New pre-trained segmentation models

This release adds pre-trained weights for the ResNet50 variants of Fully-Convolutional Networks (FCN) and DeepLabV3. They are available under torchvision.models.segmentation, and can be obtained as follows:

```python
torchvision.models.segmentation.fcn_resnet50(pretrained=True)
torchvision.models.segmentation.deeplabv3_resnet50(pretrained=True)
```

They obtain the following accuracies:

| Network | mean IoU | global pixelwise acc |
| --- | --- | --- |
| FCN ResNet50 | 60.5 | 91.4 |
| DeepLabV3 ResNet50 | 66.4 | 92.4 |

Improved ONNX support for Faster / Mask / Keypoint R-CNN

This release restores ONNX support for the R-CNN family of models that had been temporarily dropped in the 0.6.0 release, and additionally fixes a number of corner cases in the ONNX export for these models. Notable improvements include support for dynamic input shape exports, including images with no detections.

Backwards Incompatible Changes

  • [Transforms] Fix for integer fill value in constant padding (#2284)
  • [Models] Replace L1 loss with smooth L1 loss in Faster R-CNN for better performance (#2113)
  • [Transforms] Use torch.rand instead of random.random() for random transforms (#2520)

New Features

  • [Models] Add mixed-precision support (#2366, #2384)
  • [Models] Add fcn_resnet50 and deeplabv3_resnet50 pretrained models. (#2086, #2091)
  • [Ops] Added eps attribute to FrozenBatchNorm2d (#2190)
  • [Transforms] Add convert_image_dtype to functionals (#2078)
  • [Transforms] Add pil_to_tensor to functionals (#2092)

Bug Fixes

  • [JIT] Fix virtualenv and torchhub support by removing eager scripting calls (#2248)
  • [IO] Fix write_video when floating point FPS is passed (#2334)
  • [IO] Fix missing compilation files for video-reader (#2183)
  • [IO] Fix missing include for OSX in video decoder (#2224)
  • [IO] Fix overflow error for large buffers. (#2303)
  • [Ops] Fix wrong clamping in RoIAlign with aligned=True (#2438)
  • [Ops] Fix corner case in interpolate (#2146)
  • [Ops] Fix the use of contiguous() in C++ kernels (#2131)
  • [Ops] Restore support of tuple of Tensors for region pooling ops (#2199)
  • [Datasets] Fix bug related with trailing slash on UCF-101 dataset (#2186)
  • [Models] Make copy of targets in GeneralizedRCNNTransform (#2227)
  • [Models] Fix DenseNet issue with gradient checkpoints (#2236)
  • [ONNX] Fix ONNX implementation of heatmaps_to_keypoints in KeypointRCNN (#2312)
  • [ONNX] Fix export of images with no detection for Faster / Mask / Keypoint R-CNN (#2126, #2215, #2272)

Deprecations

  • [Ops] Deprecate Conv2d, ConvTranspose2d and BatchNorm2d (#2244)
  • [Ops] Deprecate interpolate in favor of PyTorch's implementation (#2252)

Improvements

Datasets

  • Fix DatasetFolder error message (#2143)
  • Change range(len) to enumerate in DatasetFolder (#2153)
  • [DOC] Fix link URL to Flickr8k (#2178)
  • [DOC] Add CelebA to docs (#2107)
  • [DOC] Improve documentation of DatasetFolder and ImageFolder (#2112)

TorchHub

  • Fix torchhub tests due to numerical changes in torch.sum (#2361)
  • Add all the latest models to hubconf (#2189)

Transforms

  • Add fill argument to __repr__ of RandomRotation (#2340)
  • Add tensor support for adjust_hue (#2300, #2355)
  • Make ColorJitter torchscriptable (#2298)
  • Make RandomHorizontalFlip and RandomVerticalFlip torchscriptable (#2282)
  • [DOC] Use consistent symbols in the doc of Normalize to avoid confusion (#2181)
  • [DOC] Fix typo in hflip in functional.py (#2177)
  • [DOC] Fix spelling errors in functional.py (#2333)

IO

  • Refactor video.py to improve clarity (#2335)
  • Save memory by not storing full frames in read_video_timestamps (#2202, #2268)
  • Improve warning when video_reader backend is not available (#2225)
  • Set should_buffer to True by default in _read_from_stream (#2201)
  • [Test] Temporarily disable one PyAV test (#2150)

Models

  • Improve target checks in GeneralizedRCNN (#2207, #2258)
  • Use Module objects instead of functions for some layers of Inception3 (#2287)
  • Add support for other normalizations in MobileNetV2 (#2267)
  • Expose layer freezing option to detection models (#2160, #2242)
  • Make ASPP-Layer in DeepLab more generic (#2174)
  • Faster initialization for Inception family of models (#2170, #2211)
  • Make norm_layer a parameter in models/detection/backbone_utils.py (#2081)
  • Updates integer division to use floor division operator (#2234, #2243)
  • [JIT] Clean up no longer needed workarounds for torchscript support (#2249, #2261, #2210)
  • [DOC] Add docs to clarify aspect ratio definition in RPN. (#2185)
  • [DOC] Fix roi_heads argument name in docstring of GeneralizedRCNN (#2093)
  • [DOC] Fix type annotation in RPN docstring (#2149)
  • [DOC] add clarifications to Object detection reference documentation (#2241)
  • [Test] Add tests for negative samples for Mask R-CNN and Keypoint R-CNN (#2069)

Reference scripts

  • Add support for SyncBatchNorm in QAT reference script (#2230, #2280)
  • Fix training resuming in references/segmentation (#2142)
  • Rename image to images in references/detection/engine.py (#2187)

ONNX

  • Add support for dynamic input shape export in R-CNN models (#2087)

Ops

  • Added number of features in FrozenBatchNorm2d __repr__ (#2168)
  • improve consistency among box IoU CPU / GPU calculations (#2072)
  • Avoid using in header files (#2257)
  • Make ceil_div __host__ __device__ (#2217)
  • Don't include CUDAApplyUtils.cuh (#2127)
  • Add namespace to avoid conflict with ATen version of channel_shuffle() (#2206)
  • [DOC] Update the statement of supporting torchscript ops (#2343)
  • [DOC] Update torchvision ops in doc (#2341)
  • [DOC] Improve documentation for NMS (#2159)
  • [Test] Add more tests to NMS (#2279)

Misc

  • Add PyTorch version compatibility table to README (#2260)
  • Fix lint (#2182, #2226, #2070)
  • Update version to 0.6.0 in CMake (#2140)
  • Remove mock (#2096)
  • Remove warning about deprecated (#2064)
  • Cleanup unused import (#2067)
  • Type annotations for torchvision/utils.py (#2034)

CI

  • Add version suffix to build version
  • Add backslash to escape
  • Add workflows to run on tag
  • Bump version to 0.7.0, pin PyTorch to 1.6.0
  • Update link for cudnn 10.2 (#2277)
  • Fix binary builds with CUDA 9.2 on Windows (#2273)
  • Remove Python 3.5 from CI (#2158)
  • Improvements to CI infra (#2075, #2071, #2058, #2073, #2099, #2137, #2204, #2264, #2274, #2319)
  • Master version bump 0.6 -> 0.7 (#2102)
  • Add test channels for pytorch version functions (#2208)
  • Add static type check with mypy (#2195, #1696, #2247)

- Python
Published by fmassa over 5 years ago

org.pytorch:torchvision_ops - v0.6.1

Highlights

  • Bump pinned PyTorch version to v1.5.1

- Python
Published by seemethere over 5 years ago

org.pytorch:torchvision_ops - Drop Python 2 support, several improvements and bugfixes

This release is the first one that officially drops support for Python 2. It contains a number of improvements and bugfixes.

Highlights

Faster/Mask/Keypoint RCNN supports negative samples

It is now possible to feed training images to Faster / Mask / Keypoint R-CNN that do not contain any positive annotations. This enables increasing the number of negative samples during training. For those images, the annotations expect a tensor with 0 in the number of objects dimension, as follows:

```python
target = {"boxes": torch.zeros((0, 4), dtype=torch.float32),
          "labels": torch.zeros(0, dtype=torch.int64),
          "image_id": 4,
          "area": torch.zeros(0, dtype=torch.float32),
          "masks": torch.zeros((0, image_height, image_width), dtype=torch.uint8),
          "keypoints": torch.zeros((17, 0, 3), dtype=torch.float32),
          "iscrowd": torch.zeros((0,), dtype=torch.int64)}
```

Aligned flag for RoIAlign

RoIAlign now supports the aligned flag, which aligns two neighboring pixel indices more precisely.
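A minimal usage sketch (the feature map and box values are illustrative):

```python
import torch
from torchvision.ops import roi_align

features = torch.rand(1, 256, 32, 32)
# each box is given as (batch_index, x1, y1, x2, y2)
boxes = torch.tensor([[0., 4., 4., 20., 20.]])

# aligned=True matches neighboring pixel indices more precisely
pooled = roi_align(features, boxes, output_size=(7, 7), aligned=True)
```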

Refactored abstractions for C++ video decoder

This change is transparent to Python users, but the whole C++ backend for video reading (which for now needs torchvision to be compiled from source for it to be enabled) has been refactored into more modular abstractions. The core abstractions are in https://github.com/pytorch/vision/tree/master/torchvision/csrc/cpu/decoder, and the video reader functions exposed to Python, by leveraging those abstractions, can be written in a much more concise way.

Backwards Incompatible Changes

  • Dropping Python2 support (#1761, #1792, #1984, #1976, #2037, #2033, #2017)
  • [Models] Fix inception quantized pre-trained model (#1954, #1969, #1975)
  • ONNX support for Mask R-CNN and Keypoint R-CNN has been temporarily dropped, but will be fixed in next releases

New Features

  • [Transforms] Add Perspective fill option (#1973)
  • [Ops] aligned flag in ROIAlign (#1908)
  • [IO] Update video reader to use new decoder (#1978)
  • [IO] torchscriptable functions for video io (#1653, #1794)
  • [Models] Support negative samples in Faster R-CNN, Mask R-CNN and Keypoint R-CNN (#1911, #2069)

Improvements

Datasets

  • STL10: don't check integrity twice when download=True (#1787)
  • Improve code readability and docstring of video datasets (#2020)
  • [DOC] Fixed typo in Cityscapes docs (#1851)

Transforms

  • Allow passing list to the input argument 'scale' of RandomResizedCrop (#1997) (#2008)
  • F.normalize unsqueeze mean & std only for 1-d arrays (#2002)
  • Improved error messages for transforms.functional.normalize(). (#1915)
  • generalize number of bands calculation in to_tensor (#1781)
  • Replace 2 transpose ops with 1 permute in ToTensor (#2018)
  • Fixed Pillow version check for Pillow >= 10 (#2039)
  • [DOC]: Improve transforms.Normalize docs (#1784, #1858)
  • [DOC] Fixed missing new line in transforms.Crop docstring (#1922)

Ops

  • Check boxes shape in RoIPool / Align (#1968)
  • [ONNX] Export new_empty_tensor (#1733)
  • Fix Tensor::data<> deprecation. (#2028)
  • Fix deprecation warnings (#2055)

Models

  • Add warning and note docs for scipy (#1842) (#1966)
  • Added __repr__ to GeneralizedRCNNTransform (#1834)
  • Replace mean on dimensions 2,3 by adaptive_avg_pool2d in mobilenet (#1838)
  • Add init_weights keyword argument to Inception3 (#1832)
  • Add device to torch.tensor. (#1979)
  • ONNX export for variable input sizes in Faster R-CNN (#1840)
  • [JIT] Cleanup torchscript constant annotations (#1721, #1923, #1907, #1727)
  • [JIT] use // now that it is supported (#1658)
  • [JIT] add @torch.jit.script to ImageList (#1919)
  • [DOC] Improved docs for Faster R-CNN (#1886, #1868, #1768, #1763)
  • [DOC] add comments for the modified implementation of ResNet (#1983)
  • [DOC] Add comments to AnchorGenerator (#1941)
  • [DOC] Add comment in GoogleNet (#1932)

Documentation

  • Document int8 quantization model (#1951)
  • Update Doc with ONNX support (#1752)
  • Update README to reflect strict dependency on torch==1.4.0 (#1767)
  • Update sphinx theme (#2031)
  • Document origin of preprocessing mean / std (#1965)
  • Fix docstring formatting issues (#2049)

Reference scripts

  • Add return statement in evaluate function of detection reference script (#2029)
  • [DOC]Add default training parameters to classification reference README (#1998)
  • [DOC] Add README to references/segmentation (#1864)

Tests

  • Improve stability of test_nms_cuda (#2044)
  • [ONNX] Disable model tests since export of interpolate script module is broken (#1989)
  • Skip inception v3 in test/test_quantized_models (#1885)
  • [LINT] Small indentation fix (#1831)

Misc

  • Remove unintentional -O0 option in setup.py (#1770)
  • Create CODE_OF_CONDUCT.md
  • Update issue templates (#1913, #1914)
  • master version bump 0.5 → 0.6
  • replace torch 1.5.0 items flagged with deprecation warnings (fix #1906) (#1918)
  • CUDA_SUFFIX → PYTORCH_VERSION_SUFFIX

CI

  • Remove av from the binary requirements (#2006)
  • ci: Add cu102 to CI and packaging, remove cu100 (#1980)
  • .circleci: Switch to use token for conda uploads (#1960)
  • Improvements to CI infra (#2051, #2032, #2046, #1735, #2048, #1789, #1731, #1961)
  • typing only needed for python 3.5 and previous (#1778)
  • Move C++ and Python linter to CircleCI (#2056, #2057)

Bug Fixes

Datasets

  • bug fix on downloading voc2007 test dataset (#1991)
  • fix lsun docstring example (#1935)
  • Fix wrong EMNIST classes attribute (#1716, #1736)
  • Force object annotation to be a list in VOC (#1790)

Models

  • Fix for AnchorGenerator when device switch happen (#1745)
  • [JIT] fix len error (#1981)
  • [JIT] fix googlenet no aux logits (#1949)
  • [JIT] Fix quantized googlenet (#1974)

Transforms

  • Fix for rotate fill with Images of type F (#1828)
  • Fix fill in rotate (#1760)

Ops

  • Fix bug in DeformConv2d for batch sizes > 32 (#2027, #2040)
  • Fix for roi_align ONNX export (#1988)
  • Fix torchscript issue in ConvTranspose2d (#1917)
  • Fix interpolate when no scale_factor is passed (#1785)
  • Fix Windows build by renaming Python init functions (#1779)
  • fix for loading models with num_batches_tracked in frozen bn (#1728)

Deprecations

  • the pts_unit value of 'pts' for read_video and read_video_timestamps is deprecated, and will be replaced in the next releases with seconds.

- Python
Published by fmassa almost 6 years ago

org.pytorch:torchvision_ops - Towards better research to production support

This release brings several new additions to torchvision that improve support for deployment. Most notably, all models in torchvision are torchscript-compatible, and can be exported to ONNX. Additionally, a few classification models have quantized weights.

Note: this is the last version of torchvision that officially supports Python 2.

Breaking changes

Updated KeypointRCNN pre-trained weights

The pre-trained weights for keypointrcnn_resnet50_fpn have been updated and now correspond to the results reported in the documentation. The previous weights corresponded to an intermediate training checkpoint. (#1609)

Corrected the implementation for MNASNet

The previous implementation contained a bug which affects all MNASNet variants other than mnasnet1_0. The bug was that the first few layers also needed to be scaled in terms of the width multiplier, along with all the rest. We now provide a new checkpoint for mnasnet0_5, which gives a 32.17 top-1 error. (#1224)
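A one-line sketch of loading the corrected checkpoint:

```python
import torchvision

# downloads the re-trained mnasnet0_5 weights on first use
model = torchvision.models.mnasnet0_5(pretrained=True)
```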

Highlights

TorchScript support for all models

All models in torchvision have native support for torchscript, for both training and testing. This includes complex models such as DeepLabV3, Mask R-CNN and Keypoint R-CNN. Using torchscript with torchvision models is easy:

```python
import torch
import torchvision

# get a pre-trained model
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)

# convert to torchscript
model_script = torch.jit.script(model)
model_script.eval()

# compute predictions
predictions = model_script([torch.rand(3, 300, 300)])
```

Warning: the return type for the scripted version of Faster R-CNN, Mask R-CNN and Keypoint R-CNN is different from its eager counterpart, and it always returns a tuple of (losses, detections). This discrepancy will be addressed in a future release.
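A minimal sketch of handling the scripted return type described in the warning above:

```python
import torch
import torchvision

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model_script = torch.jit.script(model)
model_script.eval()

# unlike eager mode, the scripted model returns a
# (losses, detections) tuple even during inference
losses, detections = model_script([torch.rand(3, 300, 300)])
```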

ONNX

All models in torchvision can now be exported to ONNX for deployment. This includes models such as Mask R-CNN.

```python
import torch
import torchvision

# get a pre-trained model
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()
inputs = [torch.rand(3, 300, 300)]
predictions = model(inputs)

# convert to ONNX
torch.onnx.export(model, inputs, "model.onnx",
                  do_constant_folding=True,
                  opset_version=11  # opset_version 11 required for Mask R-CNN
                  )
```

Warning: for Faster R-CNN / Mask R-CNN / Keypoint R-CNN, the current exported model is dependent on the input shape during export. As such, make sure that once the model has been exported to ONNX, all images that are fed to it have the same shape as the shape used to export the model to ONNX. This behavior will be made more general in a future release.

Quantized models

torchvision now provides quantized models for ResNet, ResNext, MobileNetV2, GoogleNet, InceptionV3 and ShuffleNetV2, as well as reference scripts for quantizing your own model in references/classification/train_quantization.py (https://github.com/pytorch/vision/blob/master/references/classification/train_quantization.py). A pre-trained quantized model can be obtained with a few lines of code:

```python
model = torchvision.models.quantization.mobilenet_v2(pretrained=True, quantize=True)
model.eval()

# run the model with quantized inputs and weights
out = model(torch.rand(1, 3, 224, 224))
```

We provide pre-trained quantized weights for the following models:

| Model | Acc@1 | Acc@5 |
| --- | --- | --- |
| MobileNet V2 | 71.658 | 90.150 |
| ShuffleNet V2 | 68.360 | 87.582 |
| ResNet 18 | 69.494 | 88.882 |
| ResNet 50 | 75.920 | 92.814 |
| ResNext 101 32x8d | 78.986 | 94.480 |
| Inception V3 | 77.084 | 93.398 |
| GoogleNet | 69.826 | 89.404 |

Torchscript support for torchvision.ops

torchvision ops are now natively supported by torchscript. This includes operators such as nms, roi_align and roi_pool, and for the ops that support backpropagation, both eager and torchscript modes are supported in autograd.
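A minimal sketch of scripting a function that calls a torchvision op (the boxes, scores and threshold are illustrative):

```python
import torch
from torchvision.ops import nms

@torch.jit.script
def keep_best_boxes(boxes: torch.Tensor, scores: torch.Tensor) -> torch.Tensor:
    # nms is registered as a custom op, so it can be scripted directly
    return nms(boxes, scores, 0.5)

boxes = torch.tensor([[0., 0., 10., 10.], [1., 1., 11., 11.]])
scores = torch.tensor([0.9, 0.8])
keep = keep_best_boxes(boxes, scores)
```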

New operators

Deformable Convolution (#1586) (#1660) (#1637)

As described in Deformable Convolutional Networks (https://arxiv.org/abs/1703.06211), torchvision now supports deformable convolutions. The model expects as input both the input as well as the offsets, and can be used as follows:

```python
import torch
from torchvision import ops

module = ops.DeformConv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1)
x = torch.rand(1, 1, 10, 10)

# number of channels for offset should be a multiple of
# 2 * module.weight.size[2] * module.weight.size[3],
# which correspond to the kernel_size
offset = torch.rand(1, 2 * 3 * 3, 10, 10)

# the output requires both the input and the offsets
out = module(x, offset)
```

If needed, the user can create their own wrapper module that imposes constraints on the offset. Here is an example, using a single convolution layer to compute the offset:

```python
import torch.nn as nn
from torchvision import ops

class BasicDeformConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=1, stride=1,
                 dilation=1, groups=1, offset_groups=1):
        super().__init__()
        offset_channels = 2 * kernel_size * kernel_size
        self.conv2d_offset = nn.Conv2d(
            in_channels,
            offset_channels * offset_groups,
            kernel_size=3,
            stride=stride,
            padding=dilation,
            dilation=dilation,
        )
        self.conv2d = ops.DeformConv2d(
            in_channels,
            out_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=dilation,
            dilation=dilation,
            groups=groups,
            bias=False
        )

    def forward(self, x):
        offset = self.conv2d_offset(x)
        return self.conv2d(x, offset)
```

Position-sensitive RoI Pool / Align (#1410)

Position-Sensitive Region of Interest (RoI) Align operator mentioned in Light-Head R-CNN (https://arxiv.org/abs/1711.07264). These are available under ops.ps_roi_align, ops.ps_roi_pool and the module equivalents ops.PSRoIAlign and ops.PSRoIPool, and have the same interface as RoIAlign / RoIPool.
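A minimal usage sketch (the shapes are illustrative; the number of input channels must be a multiple of the output area):

```python
import torch
from torchvision import ops

features = torch.rand(1, 49, 32, 32)  # 49 = 7 * 7 position-sensitive channels
boxes = torch.tensor([[0., 0., 0., 16., 16.]])  # (batch_index, x1, y1, x2, y2)

out = ops.ps_roi_align(features, boxes, output_size=(7, 7))
```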

New Features

TorchScript support

  • Bugfix in BalancedPositiveNegativeSampler introduced during torchscript support (#1670)
  • Make R-CNN models less verbose in script mode (#1671)
  • Minor torchscript fixes for Mask R-CNN (#1639)
  • remove BC-breaking changes (#1560)
  • Make maskrcnn scriptable (#1407)
  • Add Script Support for Video Resnet Models (#1393)
  • fix ASPPPooling (#1575)
  • Test that torchhub models are scriptable (#1242)
  • Make Googlenet & InceptionNet scriptable (#1349)
  • Make fcn_resnet Scriptable (#1352)
  • Make Densenet Scriptable (#1342)
  • make resnext scriptable (#1343)
  • make shufflenet and resnet scriptable (#1270)

ONNX

  • Enable KeypointRCNN test (#1673)
  • enable mask rcnn test (#1613)
  • Changes to Enable KeypointRCNN ONNX Export (#1593)
  • Disable Profiling in Failing Test (#1585)
  • Enable ONNX Test for FasterRcnn (#1555)
  • Support Exporting Mask Rcnn to ONNX (#1461)
  • Lahaidar/export faster rcnn (#1401)
  • Support Exporting RPN to ONNX (#1329)
  • Support Exporting MultiScaleRoiAlign to ONNX (#1324)
  • Support Exporting GeneralizedRCNNTransform to ONNX (#1325)

Quantization

  • Update quantized shufflenet weights (#1715)
  • Add commands to run quantized model with pretrained weights (#1547)
  • Quantizable googlenet, inceptionv3 and shufflenetv2 models (#1503)
  • Quantizable resnet and mobilenet models (#1471)
  • Remove model download from test_quantized_models (#1526)

Improvements

Bugfixes

  • Bugfix on GroupedBatchSampler for corner case where there are not enough examples in a category to form a batch (#1677)
  • Fix rpn memory leak and dataType errors. (#1657)
  • Fix torchvision install due to zipped egg (#1536)

Transforms

  • Make shear operation area preserving (#1529)
  • PILLOW_VERSION deprecation updates (#1501)
  • Adds optional fill colour to rotate (#1280)

Ops

  • Add Deformable Convolution operation. (#1586) (#1660) (#1637)
  • Fix inconsistent NMS implementation between CPU and CUDA (#1556)
  • Speed up nms_cuda (#1704)
  • Implementation for Position-sensitive ROI Pool/Align (#1410)
  • Remove cpp extensions in favor of torch ops (#1348)
  • Make custom ops differentiable (#1314)
  • Fix Windows build in Torchvision Custom op Registration (#1320)
  • Revert "Register Torchvision Ops as Cutom Ops (#1267)" (#1316)
  • Register Torchvision Ops as Cutom Ops (#1267)
  • Use Tensor.data_ptr instead of .data (#1262)
  • Fix header includes for cpu (#1644)

Datasets

  • fixed test for windows by closing the created temporary files (#1662)
  • VideoClips windows fixes (#1661)
  • Fix VOC on Windows (#1641)
  • update dead LSUN link (#1626)
  • DatasetFolder should follow links when searching for data (#1580)
  • add .tgz support to extract_archive (#1650)
  • expose audio_channels as a parameter to kinetics dataset (#1559)
  • Implemented integrity check (md5 hash) after dataset download (#1456)
  • Move VideoClips dummy dataset to top level for pickling (#1649)
  • Remove download for ImageNet (#1457)
  • add tar.xz archive handler (#1361)
  • Fix DeprecationWarning for collections.Iterable import in LSUN (#1417)
  • Support empty target_type for CelebA dataset (#1351)
  • VOC2007 support test set (#1340)
  • Fix EMNIST download URL (#1297) (#1318)
  • Refactored clip_sampler (#1562)

Documentation

  • Fix documentation for NMS (#1614)
  • More examples of functional transforms (#1402)
  • Fixed doc of crop functionals (#1388)
  • Added training sample code for fasterrcnn_resnet50_fpn (#1695)
  • Fix rpn.py typo (#1276)
  • Update README with minimum required version of PyTorch (#1272)
  • fix alignment of README (#1396)
  • fixed typo in DatasetFolder and ImageFolder (#1284)

Models

  • Bugfix for MNASNet (#1224)
  • Fix anchor dtype in AnchorGenerator (#1341)

Utils

  • Adding File object option to utils.save_image (#1301)
  • Fix make_grid: support any number of channels in tensor (#1300)
  • Fix bug of changing input tensor in utils.save_image (#1244)

Reference scripts

  • add a README for training object detection models (#1612)
  • Adding args for names of train and val directories (#1544)
  • Fix broken bitwise operation in Similarity Reference loss (#1604)
  • Fixing issue #1530 by starting ann_id at 1 in convert_to_coco_api (#1531)
  • Add commands for model training (#1203)
  • adding documentation for automatic mixed precision training (#1533)
  • Fix reference training script for Mask R-CNN for PyTorch 1.2 (during evaluation after epoch, mask datatype became bool, pycocotools expects uint8) (#1413)
  • fix a little bug about resume (#1628)
  • Better explain lr and batch size in references/detection/train.py (#1233)
  • update default parameters in references/detection (#1611)
  • Removed code redundancy/refactored in video_classification (#1549)
  • Fix comment in default arguments in references/detection (#1243)

Tests

  • Correctness test implemented with old test architecture (#1511)
  • Simplify and organize test_ops. (#1551)
  • Replace asserts with assertEqual (#1488) (#1499) (#1497) (#1496) (#1498) (#1494) (#1487) (#1495)
  • Add expected result tests (#1377)
  • Add TorchHub tests to torchvision (#1319)
  • Scriptability checks for Tensor Transforms (#1690)
  • Add tests for results in script vs eager mode (#1430)
  • Test for checking non mutating behaviour of tensor transforms (#1656)
  • Disable download tests for Python2 (#1269)
  • Fix randomresized params flaky (#1282)

CI

  • Disable C++ models from being compiled without explicit request (#1535)
  • Fix discrepancy in regenerate.py (#1583)
  • soumith -> pytorch for docker images (#1577)
  • [wip] try vs2019 toolchain (#1509)
  • Make CI use PyTorch nightly (#1492)
  • Try enabling Windows CUDA CI (#1486)
  • Fix CUDA builds on Windows (#1485)
  • Try fix Windows CircleCI (#1433)
  • Fix CUDA CI (#1464)
  • Change approach for rebase to master (#1427)
  • Temporary fix for CI (#1411)
  • Use PyTorch 1.3 for CI (#1467)
  • Use links from S3 to install CUDA (#1472)
  • Enable CUDA 9.2 builds for Windows (#1381)
  • Fix nightly builds (#1374)
  • Fix Windows CI after #1301 (#1368)
  • Retry anaconda login for Windows builds (#1366)
  • Fix nightly wheels builds for Windows (#1358)
  • Fix CI for py2.7 cu100 wheels (#1354)
  • Fix Windows CI (#1347)
  • Windows build scripts (#1241)
  • Make CircleCI checkout merge commit (#1344)
  • use native python code generation logic (#1321)
  • Add CircleCI (v2) (#1298)

- Python
Published by fmassa about 6 years ago

org.pytorch:torchvision_ops - Optimized video reader backend

This minor release introduces an optimized video_reader backend for torchvision. It is implemented in C++, and uses FFmpeg internally.

The new video_reader backend can be up to 6 times faster compared to the pyav backend.

  • When decoding all video/audio frames in the video, the new video_reader is 1.2x - 6x faster depending on the codec and video length.
  • When decoding a fixed number of video frames (e.g. [4, 8, 16, 32, 64, 128]), video_reader runs equally fast for small values (i.e. [4, 8, 16]) and runs up to 3x faster for large values (e.g. [32, 64, 128]).

Using the optimized video backend

Switching to the new backend can be done via the torchvision.set_video_backend('video_reader') function. By default, we use a backend based on top of PyAV.
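A minimal sketch of switching backends (the file name is illustrative; this assumes torchvision was built with the video_reader backend, as described below):

```python
import torchvision

# opt in to the optimized C++/FFmpeg backend
torchvision.set_video_backend('video_reader')

video, audio, info = torchvision.io.read_video('my_video.mp4')
```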

Due to packaging issues with FFmpeg, in order to use the video_reader backend one needs to first have ffmpeg available on the system, and then compile torchvision from source using the instructions from https://github.com/pytorch/vision#installation

Deprecations

In torchvision 0.4.0, the read_video and read_video_timestamps functions used pts relative to the video stream. This could lead to unaligned video-audio being returned in some cases.

torchvision now allows specifying a pts_unit argument in those functions. The default value is 'pts' (with the same behavior as before), and the user can now specify pts_unit='sec', which produces consistently aligned results for both video and audio. The 'pts' value is deprecated for now, and kept for backwards-compatibility.

In the next release, the default value of pts_unit will change to 'sec', so that calling read_video without specifying pts_unit returns consistently aligned audio-video results. This will require users to update their VideoClips checkpoints, which used to store the information in pts by default.
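
A minimal sketch of opting in to the new behavior ahead of the default change (the path is again a placeholder):

```python
import torchvision

# pts_unit='sec' yields consistently aligned audio and video,
# and avoids the deprecated 'pts' behavior ahead of the default change
video, audio, info = torchvision.io.read_video('/path/to/video.mp4', pts_unit='sec')
pts, video_fps = torchvision.io.read_video_timestamps('/path/to/video.mp4', pts_unit='sec')
```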

Changelog

  • [video reader] inception commit (#1303) 31fad34
  • Expose frame-rate and cache to video datasets (#1356) 85ffd93
  • Expose num_workers in VideoClips (#1359) 02a8c0a
  • Fix random_resized params flaky (#1282) 7c9bbf5
  • Video transforms (#1353) 64917bc
  • add _backend argument to init() of class VideoClips (#1363) 7874374
  • Video clips workers (#1369) 0982395
  • modified code of io.read_video and io.read_video_timestamps to interpret pts values in seconds (#1331) 17e355f
  • add metadata to video dataset classes. bug fix. more robustness (#1376) 49b01e3
  • move sampler into TV core. Update UniformClipSampler (#1408) f0d3daa
  • remove hardcoded video extension in kinetics400 dataset (#1418) 929c81d
  • Fix hmdb51 and ucf101 typo (#1420) b13931a
  • fix a bug related to audio_end_pts (#1431) 1258bb7
  • expose more io api (#1423) e48b958
  • Make video transforms private (#1429) 79daca1
  • extend video reader to support fast video probing (#1437) ed5b2dc
  • Better handle corrupted videos (#1463) da89dad
  • Temporary fix to remove ffmpeg from build time (#1475) ed04dee
  • fix a bug when video decoding fails and empty frames are returned (#1506) 2804c12
  • extend DistributedSampler to support group_size (#1512) 355e9d2
  • Unify video backend (#1514) 97b53f9
  • Unify video metadata in VideoClips (#1527) 7d509c5
  • Fixed compute_clips docstring (#1543) b438d32

- Python
Published by fmassa over 6 years ago

org.pytorch:torchvision_ops - Compat with PyTorch 1.3 and bugfix

This minor release provides binaries compatible with PyTorch 1.3.

Compared to version 0.4.0, it contains a single bugfix for HMDB51 and UCF101 datasets, fixed in https://github.com/pytorch/vision/pull/1240

- Python
Published by fmassa over 6 years ago

org.pytorch:torchvision_ops - Video support, new datasets and models

This release adds support for video models and datasets, and brings several improvements.

Note: torchvision 0.4 requires PyTorch 1.2 or newer

Highlights

Video and IO

Video is now a first-class citizen in torchvision. The 0.4 release includes:

  • efficient IO primitives for reading and writing video files
  • Kinetics-400, HMDB51 and UCF101 datasets for action recognition, which are compatible with torch.utils.data.DataLoader
  • Pre-trained models for action recognition, trained on Kinetics-400
  • Training and evaluation scripts for reproducing the training results.

Writing your own video dataset is easy. We provide a utility class VideoClips that simplifies the task of enumerating all possible clips of fixed size in a list of video files, by creating an index of all clips in a set of videos. It additionally allows specifying a fixed frame-rate for the videos.

```python
from torchvision.datasets.video_utils import VideoClips

class MyVideoDataset(object):
    def __init__(self, video_paths):
        self.video_clips = VideoClips(video_paths,
                                      clip_length_in_frames=16,
                                      frames_between_clips=1,
                                      frame_rate=15)

    def __getitem__(self, idx):
        video, audio, info, video_idx = self.video_clips.get_clip(idx)
        return video, audio

    def __len__(self):
        return self.video_clips.num_clips()
```

We provide pre-trained models for action recognition, trained on Kinetics-400, which reproduce the results of the original papers in which they were first introduced, as well as the corresponding training scripts.

|model |clip @ 1 |
|--- |--- |
|r3d_18 |52.748 |
|mc3_18 |53.898 |
|r2plus1d_18 |57.498 |
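
Loading one of these models takes a single call. A minimal sketch, with a random tensor standing in for a real, preprocessed 16-frame clip:

```python
import torch
import torchvision

model = torchvision.models.video.r3d_18(pretrained=True)
model.eval()

# video models expect clips of shape (N, 3, T, H, W);
# the random tensor below stands in for a real, normalized clip
clip = torch.rand(1, 3, 16, 112, 112)
with torch.no_grad():
    scores = model(clip)
print(scores.shape)  # torch.Size([1, 400]), one score per Kinetics-400 class
```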

Bugfixes

  • change aspect ratio calculation formula in references/detection (#1194)
  • bug fixes in ImageNet (#1149)
  • fix save_image when height or width equals 1 (#1059)
  • Fix STL10 __repr__ (#969)
  • Fix wrong behavior of GeneralizedRCNNTransform in Python2. (#960)

Datasets

New

  • Add USPS dataset (#961)(#1117)
  • Added support for the QMNIST dataset (#995)
  • Add HMDB51 and UCF101 datasets (#1156)
  • Add Kinetics400 dataset (#1077)

Improvements

  • Miscellaneous dataset fixes (#1174)
  • Standardize str argument verification in datasets (#1167)
  • Always pass transform and target_transform to abstract dataset (#1126)
  • Remove duplicate transform assignment in FakeDataset (#1125)
  • Automatic extraction for Cityscapes Dataset (#1066) (#1068)
  • Use joint transform in Cityscapes (#1024)(#1045)
  • CelebA: track attr names, support split="all", code cleanup (#1008)
  • Add folds option to STL10 (#914)

Models

New

  • Add pretrained Wide ResNet (#912)
  • Memory efficient densenet (#1003) (#1090)
  • Implementation of the MNASNet family of models (#829)(#1043)(#1092)
  • Add VideoModelZoo models (#1130)

Improvements

  • Fix resnet fpn backbone for resnet18 and resnet34 (#1147)
  • Add checks to roi_heads in detection module (#1091)
  • Make shallow copy of input list in GeneralizedRCNNTransform (#1085)(#1111)(#1084)
  • Make MobileNetV2 number of channel divisible by 8 (#1005)
  • typo fix: ouput -> output in Inception and GoogleNet (#1034)
  • Remove empty proposals from the RPN (#1026)
  • Remove empty boxes before NMS (#1019)
  • Reduce code duplication in segmentation models (#1009)
  • allow user to define residual settings in MobileNetV2 (#965)
  • Use flatten instead of view (#1134)

Documentation

  • Consistency in detection box format (#1110)
  • Fix Mask R-CNN docs (#1089)
  • Add paper references to VGG and Resnet variants (#1088)
  • Doc, Test Fixes in Normalize (#1063)
  • Add transforms doc to more datasets (#1038)
  • Corrected typo: 5 to 0.5 (#1041)
  • Update doc for torchvision.transforms.functional.perspective (#1017)
  • Improve documentation for fillcolor option in RandomAffine (#994)
  • Fix COCO_INSTANCE_CATEGORY_NAMES (#991)
  • Added models information to documentation. (#985)
  • Add missing import in faster_rcnn.py documentation (#979)
  • Improve make_grid docs (#964)

Tests

  • Add test for SVHN (#1086)
  • Add tests for Cityscapes Dataset (#1079)
  • Update CI to Python 3.6 (#1044)
  • Make test_save_image more robust (#1037)
  • Add a generic test for the datasets (#1015)
  • moved fakedata generation to separate module (#1014)
  • Create imagenet fakedata on-the-fly (#1012)
  • Minor test refactorings (#1011)
  • Add test for CIFAR10(0) (#1010)
  • Mock MNIST download for less flaky tests (#1004)
  • Add test for ImageNet (#976)(#1006)
  • Add tests for datasets (#966)

Transforms

New

  • Add Random Erasing for image augmentation (#909) (#1060) (#1087) (#1095)

Improvements

  • Allowing 'F' mode for 1 channel FloatTensor in ToPILImage (#1100)
  • Add shear parallel to y-axis (#1070)
  • fix error message in to_tensor (#1000)
  • Fix TypeError in RandomResizedCrop.get_params (#1036)
  • Fix normalize for different dtype than float32 (#1021)

Ops

  • Renamed vision.h files to vision_cpu.h and vision_cuda.h (#1051)(#1052)
  • Optimize nms_cuda by avoiding extra torch.cat call (#945)

Reference scripts

  • Expose data-path in the detection reference scripts (#1109)
  • Make utils.py work with pytorch-cpu (#1023)
  • Add mixed precision training with Apex (#972)(#1124)
  • Add reference code for similarity learning (#1101)

Build

  • Add windows build steps and wheel build scripts (#998)
  • add packaging scripts (#996)
  • Allow forcing GPU build with FORCE_CUDA=1 (#927)

Misc

  • Misc lint fixes (#1020)
  • Reraise error on failed downloading (#1013)
  • add more hub models (#974)
  • make C extension lazy-import (#971)

- Python
Published by fmassa over 6 years ago

org.pytorch:torchvision_ops - Training scripts, detection/segmentation models and more

This release brings several new features to torchvision, including models for semantic segmentation, object detection, instance segmentation and person keypoint detection, and custom C++ / CUDA ops specific to computer vision.

Note: torchvision 0.3 requires PyTorch 1.1 or newer

Highlights

Reference training / evaluation scripts

We now provide, under the references/ folder, scripts for training and evaluation of the following tasks: classification, semantic segmentation, object detection, instance segmentation and person keypoint detection. Their purpose is twofold:

  • serve as a log of how to train a specific model.
  • provide baseline training and evaluation scripts to bootstrap research

They all have an entry-point train.py which performs both training and evaluation for a particular task. Other helper files, specific to each training script, are also present in the folder, and they might get integrated into the torchvision library in the future.

We expect users to copy-paste these reference scripts and modify them for their own needs.

TorchVision Ops

TorchVision now contains custom C++ / CUDA operators in torchvision.ops. Those operators are specific to computer vision, and make it easier to build object detection models. Those operators currently do not support PyTorch script mode, but support for it is planned for future releases.

List of supported ops

  • roi_pool (and the module version RoIPool)
  • roi_align (and the module version RoIAlign)
  • nms, for non-maximum suppression of bounding boxes
  • box_iou, for computing the intersection over union metric between two sets of bounding boxes

All the other ops present in torchvision.ops and its subfolders are experimental, in particular:

  • FeaturePyramidNetwork is a module that adds an FPN on top of a module that returns a set of feature maps.
  • MultiScaleRoIAlign is a wrapper around roi_align that works with multiple feature map scales

Here are a few examples of using torchvision ops:

```python
import torch
import torchvision

# create 10 random boxes
boxes = torch.rand(10, 4) * 100
# they need to be in [x0, y0, x1, y1] format
boxes[:, 2:] += boxes[:, :2]

# create a random image
image = torch.rand(1, 3, 200, 200)

# extract regions in image defined in boxes, rescaling
# them to have a size of 3x3
pooled_regions = torchvision.ops.roi_align(image, [boxes], output_size=(3, 3))

# check the size
print(pooled_regions.shape)
# torch.Size([10, 3, 3, 3])

# or compute the intersection over union between
# all pairs of boxes
print(torchvision.ops.box_iou(boxes, boxes).shape)
# torch.Size([10, 10])
```

Models for more tasks

The 0.3 release of torchvision includes pre-trained models for tasks other than image classification on ImageNet. We include two new categories of models: region-based models, like Faster R-CNN, and dense pixelwise prediction models, like DeepLabV3.

Object Detection, Instance Segmentation and Person Keypoint Detection models

Warning: The API is currently experimental and might change in future versions of torchvision

The 0.3 release contains pre-trained models for Faster R-CNN, Mask R-CNN and Keypoint R-CNN, all of them using a ResNet-50 backbone with FPN. They have been trained on COCO train2017 following the reference scripts in references/, and give the following results on COCO val2017:

Network | box AP | mask AP | keypoint AP
-- | -- | -- | --
Faster R-CNN ResNet-50 FPN | 37.0 | |
Mask R-CNN ResNet-50 FPN | 37.9 | 34.6 |
Keypoint R-CNN ResNet-50 FPN | 54.6 | | 65.0

The implementations of the models for object detection, instance segmentation and keypoint detection are fast, especially during training.

In the following table, we report results using 8 V100 GPUs, with CUDA 10.0 and CUDNN 7.4. During training, we use a batch size of 2 per GPU; during testing, a batch size of 1 is used.

For test time, we report the time for the model evaluation and post-processing (including mask pasting in image), but not the time for computing the precision-recall.

Network | train time (s / it) | test time (s / it) | memory (GB)
-- | -- | -- | --
Faster R-CNN ResNet-50 FPN | 0.2288 | 0.0590 | 5.2
Mask R-CNN ResNet-50 FPN | 0.2728 | 0.0903 | 5.4
Keypoint R-CNN ResNet-50 FPN | 0.3789 | 0.1242 | 6.8

You can load and use pre-trained detection and segmentation models with a few lines of code:

```python
import torchvision
import PIL.Image

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)

# set it to evaluation mode, as the model behaves differently
# during training and during evaluation
model.eval()

image = PIL.Image.open('/path/to/an/image.jpg')
image_tensor = torchvision.transforms.functional.to_tensor(image)

# pass a list of (potentially different sized) tensors
# to the model, in 0-1 range. The model will take care of
# batching them together and normalizing
output = model([image_tensor])
# output is a list of dicts, containing the post-processed predictions
```

Pixelwise Semantic Segmentation models

Warning: The API is currently experimental and might change in future versions of torchvision

The 0.3 release also contains models for dense pixelwise prediction on images. It adds FCN and DeepLabV3 segmentation models, using ResNet50 and ResNet101 backbones. Pre-trained weights for the ResNet101 backbone are available, and have been trained on a subset of COCO train2017 which contains the same 20 categories as those from Pascal VOC.

The pre-trained models give the following results on the subset of COCO val2017 which contains the same 20 categories as those present in Pascal VOC:

Network | mean IoU | global pixelwise acc
-- | -- | --
FCN ResNet101 | 63.7 | 91.9
DeepLabV3 ResNet101 | 67.4 | 92.4
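
The segmentation models are loaded the same way as the detection models above. A minimal sketch, with a random tensor standing in for a real, normalized image batch:

```python
import torch
import torchvision

model = torchvision.models.segmentation.deeplabv3_resnet101(pretrained=True)
model.eval()

# the models expect a normalized batch of shape (N, 3, H, W);
# the random tensor below stands in for a real image batch
batch = torch.rand(1, 3, 520, 520)
with torch.no_grad():
    output = model(batch)['out']  # (N, 21, H, W): background + the 20 VOC classes
prediction = output.argmax(1)     # per-pixel class indices
```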

New Datasets

  • Add Caltech101, Caltech256, and CelebA (#775)
  • ImageNet dataset (#764) (#858) (#870)
  • Added Semantic Boundaries Dataset (#808) (#865)
  • Add VisionDataset as a base class for all datasets (#749) (#859) (#838) (#876) (#878)

New Models

Classification

  • Add GoogLeNet (Inception v1) (#678) (#821) (#828) (#816)
  • Add MobileNet V2 (#818) (#917)
  • Add ShuffleNet v2 (#849) (#886) (#889) (#892) (#916)
  • Add ResNeXt-50 32x4d and ResNeXt-101 32x8d (#822) (#852) (#917)

Segmentation

  • Fully-Convolutional Network (FCN) with ResNet 101 backbone
  • DeepLabV3 with ResNet 101 backbone

Detection

  • Faster R-CNN R-50 FPN trained on COCO train2017 (#898) (#921)
  • Mask R-CNN R-50 FPN trained on COCO train2017 (#898) (#921)
  • Keypoint R-CNN R-50 FPN trained on COCO train2017 (#898) (#921) (#922)

Breaking changes

  • Make CocoDataset ids deterministically ordered (#868)

New Transforms

  • Add bias vector to LinearTransformation (#793) (#843) (#881)
  • Add Random Perspective transform (#781) (#879)

Bugfixes

  • Fix user warning when applying normalize (#810)
  • Fix logic error in check_integrity (#871)

Improvements

  • Fixing mutation of 2d tensors in to_pil_image (#762)
  • Replace tensor.view with tensor.unsqueeze(0) in make_grid (#765)
  • Change usage of view to reshape in resnet to enable running with mkldnn (#890)
  • Improve normalize to work with tensors located on any device (#787)
  • Raise an IndexError for FakeData.__getitem__() if the index would be out of range (#780)
  • Aspect ratio is now sampled from a logarithmic distribution in RandomResizedCrop. (#799)
  • Modernize inception v3 weight initialization code (#824)
  • Remove duplicate code from densenet load_state_dict (#827)
  • Replace endswith calls in a loop with a single endswith call in DatasetFolder (#832)
  • Added missing dot in webp image extensions (#836)
  • fix inconsistent behavior for ~ expression (#850)
  • Minor Compressions in statements in folder.py (#874)
  • Minor fix to evaluation formula of PILLOW_VERSION in transforms.functional.affine (#895)
  • added is_valid_file parameter to DatasetFolder (#867)
  • Add support for joint transformations in VisionDataset (#872)
  • Auto calculating return dimension of squeezenet forward method (#884)
  • Added progress flag to model getters (#875) (#910)
  • Add support for other normalizations (i.e., GroupNorm) in ResNet (#813)
  • Add dilation option to ResNet (#866)

Testing

  • Add basic model testing. (#811)
  • Add test for num_class in test_model.py (#815)
  • Added test for normalize functionality in make_grid function. (#840)
  • Added downloaded directory not empty check in test_datasets_utils (#844)
  • Added test for save_image in utils (#847)
  • Added tests for check_md5 and check_integrity (#873)

Misc

  • Remove shebang in setup.py (#773)
  • configurable version and package names (#842)
  • More hub models (#851)
  • Update travis to use more recent GCC (#891)

Documentation

  • Add comments regarding downsampling layers of resnet (#794)
  • Remove unnecessary bullet point in InceptionV3 doc (#814)
  • Fix crop and resized_crop docs in functional.py (#817)
  • Added dimensions in the comments of googlenet (#788)
  • Update transform doc with random offset of padding due to pad_if_needed (#791)
  • Added the argument transform_input in docs of InceptionV3 (#789)
  • Update documentation for MNIST datasets (#778)
  • Fixed typo in normalize() function. (#823)
  • Fix typo in squeezenet (#841)
  • Fix typo in DenseNet comment (#857)
  • Typo and syntax fixes to transform docstrings (#887)

- Python
Published by fmassa almost 7 years ago

org.pytorch:torchvision_ops - More datasets, transforms and bugfixes

This version introduces several improvements and fixes.

Support for arbitrary input sizes for models

It is now possible to feed images larger than 224x224 into the torchvision models. We added an adaptive pooling layer just before the classifier, which adapts the size of the feature maps to what the last layer expects, allowing for larger input images. Relevant PRs: #744 #747 #746 #672 #643
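
For instance, a classification model can now be fed a larger input directly. A minimal sketch:

```python
import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True)
model.eval()

# 224x224 is no longer required: the adaptive pooling layer resizes the
# final feature maps, so larger inputs work out of the box
large_image = torch.rand(1, 3, 448, 448)
with torch.no_grad():
    logits = model(large_image)
print(logits.shape)  # torch.Size([1, 1000])
```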

Bugfixes

  • Fix invalid argument error when using lsun method in windows (#508)
  • Fix FashionMNIST loading MNIST (#640)
  • Fix inception v3 input transform for trace & onnx (#621)

Datasets

  • Add support for webp and tiff images in ImageFolder #736 #724
  • Add K-MNIST dataset #687
  • Add Cityscapes dataset #695 #725 #739 #700
  • Add Flicker 8k and 30k datasets #674
  • Add VOCDetection and VOCSegmentation datasets #663
  • Add SBU Captioned Photo Dataset (#665)
  • Updated URLs for EMNIST #726
  • MNIST and FashionMNIST now have their own 'raw' and 'processed' folder #601
  • Add metadata to some datasets (#501)

Improvements

  • Allow RandomCrop to crop in the padded region #564
  • ColorJitter now supports min/max values #548
  • Generalize resnet to use block.expansion #487
  • Move area calculation out of for loop in RandomResizedCrop #641
  • Add option to zero-init the residual branch in resnet (#498)
  • Improve error messages in to_pil_image #673
  • Added the option of converting to tensor for numpy arrays having only two dimensions in to_tensor (#686)
  • Optimize _find_classes in DatasetFolder via scandir in Python3 (#559)
  • Add padding_mode to RandomCrop (#489 #512)
  • Make DatasetFolder more generic (#527)
  • Add in-place option to normalize (#699)
  • Add Hamming and Box interpolations to transforms.py (#693)
  • Added the support of 2-channel Image modes such as 'LA' and adding a mode in 4 channel modes (#688)
  • Improve support for 'P' image mode in pad (#683)
  • Make torchvision depend on pillow-simd if already installed (#522)
  • Make tests run faster (#745)
  • Add support for non-square crops in RandomResizedCrop (#715)

Breaking changes

  • save_image now rounds to nearest integer #754

Misc

  • Added code coverage to travis #703
  • Add downloads and docs badge to README (#702)
  • Add progress to download_url #497 #524 #535
  • Replace 'residual' with 'identity' in resnet.py (#679)
  • Consistency changes in the models
  • Refactored MNIST and CIFAR to have data and target fields #578 #594
  • Update torchvision to newer versions of PyTorch
  • Relax assertion in transforms.Lambda.__init__ (#637)
  • Cast MNIST target to int (#605)
  • Change default target type of FakeData to long (#581)
  • Improve docs of functional transforms (#602)
  • Docstring improvements
  • Add is_image_file to folder_dataset (#507)
  • Add deprecation warning for the MNIST train[test]_labels and train[test]_data attributes
  • Mention TORCH_MODEL_ZOO in models documentation. (#624)
  • Add scipy as a dependency to setup.py (#675)
  • Added size information for inception v3 (#719)

- Python
Published by fmassa almost 7 years ago

org.pytorch:torchvision_ops - New datasets, transforms and fixes

This version introduces several fixes and improvements to the previous version.

Better printing of Datasets and Transforms

  • Add descriptions to Transform objects.

```python
# Now
T.Compose([T.RandomHorizontalFlip(),
           T.RandomCrop(224),
           T.ToTensor()])
# prints
# Compose(
#     RandomHorizontalFlip(p=0.5)
#     RandomCrop(size=(224, 224), padding=0)
#     ToTensor()
# )
```

  • Add descriptions to Datasets.

```python
# now torchvision.datasets.MNIST('~') prints
# Dataset MNIST
#     Number of datapoints: 60000
#     Split: train
#     Root Location: /private/home/fmassa
#     Transforms (if any): None
#     Target Transforms (if any): None
```

New transforms

  • Add RandomApply, RandomChoice, RandomOrder transformations #402

    • RandomApply: applies a list of transformations with a given probability
    • RandomChoice: randomly picks a single transformation from a list
    • RandomOrder: applies the transformations in a random order
  • Add random affine transformation #411

  • Add reflect, symmetric and edge padding to transforms.pad #460
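
A minimal sketch of the three new meta-transforms (the inner transforms are arbitrary examples):

```python
from torchvision import transforms as T

# apply the whole list of transforms with probability 0.5
random_apply = T.RandomApply([T.ColorJitter(brightness=0.3), T.RandomRotation(10)], p=0.5)

# pick exactly one transform from the list at random
random_choice = T.RandomChoice([T.RandomHorizontalFlip(), T.RandomVerticalFlip()])

# apply all transforms, but in a random order
random_order = T.RandomOrder([T.RandomCrop(224), T.ColorJitter()])
```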

Performance improvements

  • Speedup MNIST preprocessing by a factor of 1000x
  • make weight initialization optional to speed up VGG construction. This makes loading pre-trained VGG models much faster
  • Accelerate transforms.adjust_gamma by using PIL's point function instead of custom numpy-based implementation

New Datasets

  • EMNIST - an extension of MNIST for hand-written letters
  • OMNIGLOT - a dataset for one-shot learning, with 1623 different handwritten characters from 50 different alphabets
  • Add a DatasetFolder class - generalization of ImageFolder

Miscellaneous improvements

  • FakeData accepts a seed argument, so having multiple different FakeData instances is now possible
  • Use consistent datatypes in Dataset targets. Now all datasets that return labels will have them as int
  • Add probability parameter in RandomHorizontalFlip and RandomVerticalFlip
  • Replace np.random by random in transforms - improves reproducibility in multi-threaded environments with default arguments
  • Detect tif images in ImageFolder
  • Add pad_if_needed to RandomCrop, so that if the crop size is larger than the image, the image is automatically padded
  • Add support in transforms.ToTensor for PIL Images with mode '1'

Bugfixes

  • Fix passing list of tensors to utils.save_image
  • single images passed to make_grid are now also normalized
  • Fix PIL img close warnings
  • Added missing weight initializations to densenet
  • Avoid division by zero in make_grid when the image is constant
  • Fix ToTensor when PIL Image has mode F
  • Fix bug with to_tensor when the input is a numpy array of type np.float32.

- Python
Published by soumith almost 8 years ago

org.pytorch:torchvision_ops - v0.2.0: New transforms + a new functional interface

This version introduced a functional interface to the transforms, allowing for joint random transformation of inputs and targets. We also introduced a few breaking changes to some datasets and transforms (see below for more details).

Transforms

We have introduced a functional interface for the torchvision transforms, available under torchvision.transforms.functional. This now makes it possible to do joint random transformations on inputs and targets, which is especially useful in tasks like object detection, segmentation and super resolution. For example, you can now do the following:

```python
from torchvision import transforms
import torchvision.transforms.functional as F
import random

def my_segmentation_transform(input, target):
    i, j, h, w = transforms.RandomCrop.get_params(input, (100, 100))
    input = F.crop(input, i, j, h, w)
    target = F.crop(target, i, j, h, w)
    if random.random() > 0.5:
        input = F.hflip(input)
        target = F.hflip(target)
    input, target = F.to_tensor(input), F.to_tensor(target)
    return input, target
```

The following transforms have also been added:

  • [F.vflip and RandomVerticalFlip](http://pytorch.org/docs/master/torchvision/transforms.html#torchvision.transforms.RandomVerticalFlip)
  • [FiveCrop](http://pytorch.org/docs/master/torchvision/transforms.html#torchvision.transforms.FiveCrop) and [TenCrop](http://pytorch.org/docs/master/torchvision/transforms.html#torchvision.transforms.TenCrop)
  • Various color transformations:
    • [ColorJitter](http://pytorch.org/docs/master/torchvision/transforms.html#torchvision.transforms.ColorJitter)
    • F.adjust_brightness
    • F.adjust_contrast
    • F.adjust_saturation
    • F.adjust_hue
  • LinearTransformation for applications such as whitening
  • Grayscale and RandomGrayscale
  • Rotate and RandomRotation
  • ToPILImage now supports RGBA images
  • ToPILImage now accepts a mode argument so you can specify which colorspace the image should be in
  • RandomResizedCrop now accepts scale and ratio ranges as input parameters

Documentation

Documentation is now auto-generated and published to pytorch.org

Datasets:

  • SEMEION dataset of handwritten digits added
  • Phototour dataset: patches computed via multi-scale Harris corners are now available by setting name equal to notredame_harris, yosemite_harris or liberty_harris

Bug fixes:

  • Pre-trained densenet models are now CPU compatible #251

Breaking changes:

This version also introduced some breaking changes:

  • The SVHN dataset has been made consistent with other datasets by making the label for the digit 0 be 0, instead of 10 (as it was previously); see #194 for more details
  • The labels for the unlabelled STL10 dataset are now an array filled with -1
  • The order of the input args to the deprecated Scale transform has changed from (width, height) to (height, width) to be consistent with other transforms

- Python
Published by alykhantejani about 8 years ago

org.pytorch:torchvision_ops - More models and some bug fixes

  • Ability to switch image backends between PIL and accimage
  • Added more tests
  • Various bug fixes and doc improvements

Models

  • Fix for inception v3 input transform bug https://github.com/pytorch/vision/pull/144
  • Added pretrained VGG models with batch norm

Datasets

  • Fix indexing bug in LSUN dataset (https://github.com/pytorch/vision/pull/177)
  • enable ~ to be used in dataset paths
  • ImageFolder now returns the same (sorted) file order on different machines (https://github.com/pytorch/vision/pull/193)

Transforms

  • transforms.Scale now accepts a tuple as the new size, or a single integer

Utils

  • can now pass a pad value to make_grid and save_image

- Python
Published by soumith over 8 years ago

org.pytorch:torchvision_ops - More models and datasets. Some bugfixes

New Features

Models

  • SqueezeNet 1.0 and 1.1 models added, along with pre-trained weights
  • Add pre-trained weights for VGG models
    • Fix location of dropout in VGG
  • torchvision.models now expose num_classes as a constructor argument
  • Add InceptionV3 model and pre-trained weights
  • Add DenseNet models and pre-trained weights

Datasets

  • Add STL10 dataset
  • Add SVHN dataset
  • Add PhotoTour dataset

Transforms and Utilities

  • transforms.Pad now allows fill colors of either number tuples, or named colors like "white"
  • add normalization options to make_grid and save_image
  • ToTensor now supports more input types

Performance Improvements

Bug Fixes

  • ToPILImage now supports a single image
  • Python3 compatibility bug fixes
  • ToTensor now copes with all PIL Image types, not just RGB images
  • ImageFolder now only scans subdirectories.
    • Having files like .DS_Store is no longer a blocking hindrance
    • Check for non-zero number of images in ImageFolder
    • Subdirectories of classes have recursive scans for images
  • LSUN test set loads now

- Python
Published by soumith almost 9 years ago

org.pytorch:torchvision_ops - Just a version bump

A small release, just needed a version bump because of PyPI.

- Python
Published by soumith almost 9 years ago

org.pytorch:torchvision_ops - Add models and modelzoo, some bugfixes

New Features

  • Add torchvision.models: Definitions and pre-trained models for common vision models
    • ResNet, AlexNet, VGG models added with downloadable pre-trained weights
  • adding padding to RandomCrop. Also add transforms.Pad
  • Add MNIST dataset

Performance Fixes

  • Fixing performance of LSUN Dataset

Bug Fixes

  • Some Python3 fixes
  • Bug fixes in save_image, add single channel support

- Python
Published by soumith almost 9 years ago

org.pytorch:torchvision_ops - First release

Introduced Datasets and Transforms.

Added common datasets

  • COCO (Captioning and Detection)
  • LSUN Classification
  • ImageFolder
  • Imagenet-12
  • CIFAR10 and CIFAR100

  • Added utilities for saving images from Tensors.

- Python
Published by soumith almost 9 years ago