Recent Releases of TorchMetrics - Measuring Reproducibility in PyTorch
TorchMetrics - Measuring Reproducibility in PyTorch - Minor patch release
[1.8.2] - 2025-09-03
Fixed
- Fixed `BinaryPrecisionRecallCurve` to return `NaN` for precision when no predictions meet a threshold (#3227)
- Fixed `precision_at_fixed_recall` and `recall_at_fixed_precision` to correctly return `NaN` thresholds when recall/precision conditions are not met (#3226)
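To illustrate why `NaN` is a sensible result here, the following is a minimal pure-Python sketch, not the TorchMetrics implementation. The helper name `precision_at_threshold` is hypothetical. It shows that precision is a 0/0 expression when no score reaches the threshold, so returning `NaN` marks the value as undefined instead of silently reporting 0 or 1.

```python
import math


def precision_at_threshold(scores, labels, threshold):
    """Precision of the rule `score >= threshold` (hypothetical helper).

    Returns NaN when nothing is predicted positive, because TP / (TP + FP)
    is then 0 / 0 and has no meaningful value.
    """
    predicted_positive = [y for s, y in zip(scores, labels) if s >= threshold]
    if not predicted_positive:
        return math.nan
    return sum(predicted_positive) / len(predicted_positive)


scores = [0.1, 0.4, 0.8]
labels = [0, 1, 1]
print(precision_at_threshold(scores, labels, 0.5))  # one positive prediction, and it is correct
print(precision_at_threshold(scores, labels, 0.9))  # no prediction reaches the threshold
```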
Key Contributors
@iamkulbhushansingh
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: https://github.com/Lightning-AI/torchmetrics/compare/v1.8.1...v1.8.2
Scientific Software - Peer-reviewed
- Python
Published by Borda 9 months ago
TorchMetrics - Measuring Reproducibility in PyTorch - Minor patch release
[1.8.1] - 2025-08-07
Changed
- Added `reduction='none'` to `vif` metric (#3196)
- Float input support for segmentation metrics (#3198)
Fixed
- Fixed unintended `sigmoid` normalization in `BinaryPrecisionRecallCurve` (#3182)
Key Contributors
@iamkulbhushansingh, @PussyCat0700, @simonreise
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: https://github.com/Lightning-AI/torchmetrics/compare/v1.8.0...v1.8.1
Published by Borda 10 months ago
TorchMetrics - Measuring Reproducibility in PyTorch - First video and vertex metrics
The upcoming TorchMetrics v1.8.0 release introduces three flagship metrics, each designed to address critical evaluation needs in real-world applications.
Video Multi-Method Assessment Fusion (VMAF) brings a perceptual video-quality score that closely mirrors human judgment, powering streaming services such as Netflix and YouTube to optimize encoding ladders for consistent viewer experiences and enabling video-restoration labs to quantify improvements achieved by denoising and super-resolution algorithms.
Continuous Ranked Probability Score (CRPS) enables comprehensive evaluation of full predictive distributions rather than point estimates; meteorological centers leverage CRPS to benchmark probabilistic precipitation and temperature forecasts, improving public weather alerts, while energy companies apply it to assess uncertainty in load-demand predictions and refine grid management and trading strategies.
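As a rough illustration of how CRPS scores a whole predictive distribution, here is a minimal sketch of the standard ensemble estimator, CRPS ≈ mean|x_i − y| − ½·mean|x_i − x_j|. The function name `crps_ensemble` is an assumption for this example, and the estimator or normalization used inside TorchMetrics may differ.

```python
def crps_ensemble(members, observation):
    """Ensemble estimate of CRPS for one scalar observation (illustrative)."""
    m = len(members)
    # Average distance between ensemble members and the observed value.
    accuracy = sum(abs(x - observation) for x in members) / m
    # Half the average pairwise distance between members (rewards sharpness).
    spread = sum(abs(a - b) for a in members for b in members) / (2 * m * m)
    return accuracy - spread


# With a single member, CRPS reduces to the absolute error.
print(crps_ensemble([2.0], 5.0))
# A two-member ensemble that brackets the observation.
print(crps_ensemble([0.0, 2.0], 1.0))
```

Lower values are better, and a point forecast is scored exactly like mean absolute error, which is why CRPS is often described as a generalization of MAE to distributions.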
Lip Vertex Error (LVE) measures the discrepancy between predicted and ground-truth lip landmarks to quantify audio-visual synchronization. Localization studios use LVE to validate lip-sync accuracy during film dubbing, while AR/VR developers integrate it into avatar pipelines to ensure natural mouth movements in real-time virtual meetings and social experiences.
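A small sketch may make the idea concrete. It assumes one definition commonly used in the facial-animation literature: for each frame, take the maximum squared L2 error over the lip vertices, then average over frames. Whether TorchMetrics uses exactly this reduction is an assumption, and the function name `lip_vertex_error` and the nested-list input layout are hypothetical.

```python
def lip_vertex_error(pred, target):
    """Mean over frames of the max squared L2 error over lip vertices.

    `pred` and `target` are lists of frames; each frame is a list of
    (x, y, z) lip-vertex coordinates. Assumed definition, see the lead-in.
    """
    per_frame = []
    for pred_frame, target_frame in zip(pred, target):
        errors = [
            sum((p - t) ** 2 for p, t in zip(pv, tv))
            for pv, tv in zip(pred_frame, target_frame)
        ]
        per_frame.append(max(errors))
    return sum(per_frame) / len(per_frame)


pred = [[(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (0, 0, 0)]]
target = [[(0, 0, 0), (0, 0, 0)], [(0, 2, 0), (0, 0, 0)]]
print(lip_vertex_error(pred, target))
```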
[1.8.0] - 2025-07-23
Added
- Added `VMAF` metric to new video domain (#2991)
- Added `CRPS` in regression domain (#3024)
- Added `aggregation_level` argument to `DiceScore` (#3018)
- Added support for `reduction="none"` to `LearnedPerceptualImagePatchSimilarity` (#3053)
- Added support for single `str` input for functional interface of `bert_score` (#3056)
- Enhance: `BERTScore` to evaluate hypotheses against multiple references (#3069)
- Added `Lip Vertex Error (LVE)` in multimodal domain (#3090)
- Added `antialias` argument to `FID` metric (#3177)
- Added `mixed` input format to segmentation metrics (#3176)
Changed
- Changed `data_range` argument in `PSNR` metric to be a required argument (#3178)
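The reason `data_range` matters can be seen from the PSNR formula, 10·log10(data_range² / MSE). The sketch below is a plain-Python illustration with a hypothetical `psnr` helper, not the TorchMetrics implementation. It shows that the same pair of images gives the same score on a 0 to 1 scale and a 0 to 255 scale only when `data_range` is stated explicitly.

```python
import math


def psnr(pred, target, data_range):
    """Peak signal-to-noise ratio in dB for flat pixel lists (illustrative)."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
    if mse == 0:
        return math.inf  # identical inputs
    return 10 * math.log10(data_range ** 2 / mse)


unit_pred, unit_target = [0.5, 0.5], [0.4, 0.6]
byte_pred = [p * 255 for p in unit_pred]
byte_target = [t * 255 for t in unit_target]

print(psnr(unit_pred, unit_target, data_range=1.0))
print(psnr(byte_pred, byte_target, data_range=255.0))
```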
Removed
- Removed `zero_division` argument from `DiceScore` (#3018)
Key Contributors
@nkaenzig, @rittik9, @simonreise, @SkafteNicki
New Contributors
- @lantiga made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/3054
- @AlexVerine made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/3057
- @ZhiyuanChen made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/3059
- @ahmedhshahin made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/3101
- @gratus907 made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/3103
- @cyyever made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/3118
- @Armannas made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/3124
- @alifa98 made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/3128
- @simonreise made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/3176
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: https://github.com/Lightning-AI/torchmetrics/compare/v1.7.0...v1.8.0
Published by Borda 10 months ago
TorchMetrics - Measuring Reproducibility in PyTorch - Minor patch release
[1.7.4] - 2025-07-04
Changed
- Improved numerical stability of Pearson's correlation coefficient (#3152)
Fixed
- Fixed: Ignore zero and negative predictions in retrieval metrics (#3160)
- Fixed SSIM `dist_reduce_fx` when `reduction=None` for distributed training (#3162, #3166)
- Fixed attribute error (#3154)
- Fixed incorrect shape in `_pearson_corrcoef_update` (#3168)
Key Contributors
@AymenKallala, @gratus907, @Isalia20, @rittik9
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: https://github.com/Lightning-AI/torchmetrics/compare/v1.7.3...v1.7.4
Published by Borda 11 months ago
TorchMetrics - Measuring Reproducibility in PyTorch - Minor patch release
[1.7.3] - 2025-06-13
Fixed
- Fixed: ensure `WrapperMetric` resets `wrapped_metric` state (#3123)
- Fixed `top_k` in `multiclass_accuracy` (#3117)
- Fixed compatibility to COCO format for pycocotools 2.0.10 (#3131)
Key Contributors
@rittik9
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: https://github.com/Lightning-AI/torchmetrics/compare/v1.7.2...v1.7.3
Published by Borda 12 months ago
TorchMetrics - Measuring Reproducibility in PyTorch - Minor patch release
[1.7.2] - 2025-05-27
Changed
- Enhance: improve performance of `_rank_data` (#3103)
Fixed
- Fixed `UnboundLocalError` in `MatthewsCorrCoef` (#3059)
- Fixed MIFID incorrectly converts inputs to `byte` dtype with custom encoders (#3064)
- Fixed `ignore_index` in `MultilabelExactMatch` (#3085)
- Fixed: disable non-blocking on MPS (#3101)
Key Contributors
@ahmedhshahin, @gratus907, @rittik9, @ZhiyuanChen
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: https://github.com/Lightning-AI/torchmetrics/compare/v1.7.1...v1.7.2
Published by Borda about 1 year ago
TorchMetrics - Measuring Reproducibility in PyTorch - Minor patch release
[1.7.1] - 2025-04-06
Changed
- Enhance: support adding a `MetricCollection` to another `MetricCollection` in the `add_metrics` function (#3032)
Fixed
- Fixed absent class `MeanIOU` (#2892)
- Fixed detection IoU ignores predictions without ground truth (#3025)
- Fixed error raised in `MulticlassAccuracy` when top_k>1 (#3039)
Key Contributors
@Isalia20, @rittik9, @SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: https://github.com/Lightning-AI/torchmetrics/compare/v1.7.0...v1.7.1
Published by Borda about 1 year ago
TorchMetrics - Measuring Reproducibility in PyTorch - More image metrics
The upcoming release of TorchMetrics is set to deliver a range of innovative features and enhancements across multiple domains, further solidifying its position as a leading tool for machine learning metrics. In the image domain, significant additions include the ARNIQA and DeepImageStructureAndTextureSimilarity metrics, which provide new insights into image quality and similarity. Additionally, the CLIPScore metric now supports more models and processors, expanding its versatility in image-text alignment tasks.
Beyond image analysis, the regression package welcomes the JensenShannonDivergence metric, offering a powerful tool for comparing probability distributions. The clustering package also sees a notable update with the introduction of the ClusterAccuracy metric, which helps evaluate the performance of clustering algorithms more effectively.
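For readers unfamiliar with the Jensen-Shannon divergence, the following is a minimal sketch for two discrete distributions given as lists of probabilities. It uses the textbook definition, JSD(P, Q) = ½·KL(P‖M) + ½·KL(Q‖M) with M = (P + Q)/2, and natural logarithms. The function names are illustrative and the TorchMetrics metric may differ in input handling and reduction.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Symmetric, bounded divergence between two discrete distributions."""
    mixture = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mixture) + 0.5 * kl_divergence(q, mixture)


print(jensen_shannon_divergence([0.5, 0.5], [0.5, 0.5]))  # identical distributions
print(jensen_shannon_divergence([1.0, 0.0], [0.0, 1.0]))  # disjoint support
```

Unlike plain KL divergence, the result is symmetric in its arguments and stays finite even when the supports do not overlap, where it reaches its maximum of log 2.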
In the realm of classification, the Equal Error Rate (EER) metric has been added. It reports the error rate at the operating point where the false positive rate equals the false negative rate, which gives a single threshold-independent summary of a binary classifier. Furthermore, the MeanAveragePrecision metric now includes a functional interface, enhancing its usability and flexibility for users.
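As a rough sketch of what the Equal Error Rate measures, the code below sweeps the observed scores as candidate thresholds and reports the error rate where the false positive rate and false negative rate are closest. This brute-force search and the name `equal_error_rate` are assumptions for illustration. The library implementation is likely based on the ROC curve and may interpolate between thresholds.

```python
def equal_error_rate(scores, labels):
    """Approximate EER for binary labels (1 = positive); needs both classes."""
    positives = sum(labels)
    negatives = len(labels) - positives
    best_gap, best_eer = None, None
    for threshold in sorted(set(scores)):
        # Predict positive when score >= threshold.
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
        fpr, fnr = fp / negatives, fn / positives
        gap = abs(fpr - fnr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (fpr + fnr) / 2
    return best_eer


print(equal_error_rate([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # overlapping scores
print(equal_error_rate([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]))   # perfectly separable
```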
These updates collectively enhance the capabilities of TorchMetrics, making it an even more comprehensive and indispensable resource for machine learning practitioners and researchers.
[1.7.0] - 2025-03-20
Added
- Additions to image domain:
  - Added `ARNIQA` metric (#2953)
  - Added `DeepImageStructureAndTextureSimilarity` (#2993)
  - Added support for more models and processors in `CLIPScore` (#2978)
- Added `JensenShannonDivergence` metric to regression package (#2992)
- Added `ClusterAccuracy` metric to cluster package (#2777)
- Added `Equal Error Rate (EER)` to classification package (#3013)
- Added functional interface to `MeanAveragePrecision` metric (#3011)
Changed
- Making `num_classes` optional for `one-hot` inputs in `MeanIoU` (#3012)
Removed
- Removed `Dice` from classification (#3017)
Fixed
- Fixed edge case in integration between class-wise wrapper and metric tracker (#3008)
- Fixed `IndexError` in `MultiClassAccuracy` when using `top_k` with single sample (#3021)
Key Contributors
@Isalia20, @LorenzoAgnolucci, @nathanpainchaud, @rittik9, @SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: https://github.com/Lightning-AI/torchmetrics/compare/v1.6.0...v1.7.0
Published by Borda about 1 year ago
TorchMetrics - Measuring Reproducibility in PyTorch - Minor patch release
[1.6.3] - 2025-03-13
Fixed
- Fixed logic in how metric states referencing is handled in `MetricCollection` (#2990)
- Fixed integration between class-wise wrapper and metric tracker (#3004)
Key Contributors
@SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: https://github.com/Lightning-AI/torchmetrics/compare/v1.6.2...v1.6.3
Published by Borda about 1 year ago
TorchMetrics - Measuring Reproducibility in PyTorch - Minor patch release
[1.6.2] - 2025-02-28
Added
- Added `zero_division` argument to `DiceScore` in segmentation package (#2860)
- Added `cache_session` to `DNSMOS` metric to control caching behavior (#2974)
- Added `disable` option to `nan_strategy` in basic aggregation metrics (#2943)
Changed
- Make `num_classes` optional for classification in case of micro averaging (#2841)
- Enhance `Clip_Score` to calculate similarities between same modalities (#2875)
Fixed
- Fixed `DiceScore` when there is zero overlap between predictions and targets (#2860)
- Fixed `MeanAveragePrecision` for `average="micro"` when 0 label is not present (#2968)
- Fixed corner-case in `PearsonCorrCoef` when input is constant (#2975)
- Fixed `MetricCollection.update` gives identical results (#2944)
- Fixed missing `kwargs` in `PIT` metric for permutation wise mode (#2977)
- Fixed multiple errors in the `_final_aggregation` function for `PearsonCorrCoef` (#2980)
- Fixed incorrect CLIP-IQA type hints (#2952)
Key Contributors
@baskrahmer, @czmrand, @rbedyakin, @rittik9, @SkafteNicki, @wooseopkim
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: https://github.com/Lightning-AI/torchmetrics/compare/v1.6.1...v1.6.2
Published by Borda about 1 year ago
TorchMetrics - Measuring Reproducibility in PyTorch - Minor patch release
[1.6.1] - 2024-12-25
Changed
- Enabled specifying weights path for FID (#2867)
- Delete `Device2Host` caused by comm with device and host (#2840)
Fixed
- Fixed plotting of multilabel confusion matrix (#2858)
- Fixed issue with shared state in metric collection when using dice score (#2848)
- Fixed `top_k` for `multiclassf1score` with one-hot encoding (#2839)
- Fixed slow calculations of classification metrics with MPS (#2876)
Key Contributors
@Isalia20, @nkaenzig, @podgorki, @rittik9, @yuvalkirstain, @zhaozheng09
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: https://github.com/Lightning-AI/torchmetrics/compare/v1.6.0...v1.6.1
Published by Borda over 1 year ago
TorchMetrics - Measuring Reproducibility in PyTorch - More metrics
The latest release of TorchMetrics introduces several significant enhancements and new features that will greatly benefit users across various domains. This update includes the addition of new metrics and methods that enhance the library's functionality and usability.
One of the key additions is the NISQA audio metric, which provides advanced capabilities for evaluating audio quality. In the classification domain, the new LogAUC and NegativePredictiveValue metrics offer improved tools for assessing model performance, particularly in imbalanced datasets. For regression tasks, the NormalizedRootMeanSquaredError metric has been introduced, providing a normalized measure of prediction error that is easier to compare across targets with different scales.
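To show what the normalization in NormalizedRootMeanSquaredError does, here is a minimal sketch that divides the RMSE by either the mean or the range of the targets. The function name `nrmse` and the `normalization` options shown are assumptions for this example. Consult the TorchMetrics documentation for the options the metric actually supports.

```python
import math


def nrmse(preds, targets, normalization="mean"):
    """RMSE divided by a scale of the targets (illustrative sketch)."""
    rmse = math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets))
    if normalization == "mean":
        denom = sum(targets) / len(targets)
    elif normalization == "range":
        denom = max(targets) - min(targets)
    else:
        raise ValueError(f"unknown normalization: {normalization!r}")
    return rmse / denom


preds, targets = [2.0, 4.0, 6.0], [1.0, 4.0, 7.0]
print(nrmse(preds, targets, "mean"))
print(nrmse(preds, targets, "range"))
```

Because the result is unitless, errors for quantities measured on different scales can be compared directly.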
In the field of image segmentation, the new Dice metric enhances the evaluation of segmentation models by providing a robust measure of overlap between predicted and ground truth masks. Additionally, the merge_state method has been added to the Metric class, allowing for more efficient state management and aggregation across multiple devices or processes.
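The idea behind merging metric state can be sketched with a toy running-mean metric. The `RunningMean` class below is hypothetical and is not the TorchMetrics `Metric` API. It only illustrates the pattern: each worker accumulates sufficient statistics, and merging adds those statistics together instead of averaging already-computed results.

```python
class RunningMean:
    """Toy metric whose state is a (total, count) pair (hypothetical class)."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, values):
        self.total += sum(values)
        self.count += len(values)

    def merge_state(self, other):
        # Combine sufficient statistics; the merged result equals the mean
        # over all values seen by both objects.
        self.total += other.total
        self.count += other.count

    def compute(self):
        return self.total / self.count


worker_a, worker_b = RunningMean(), RunningMean()
worker_a.update([1.0, 2.0, 3.0])
worker_b.update([4.0, 5.0])
worker_a.merge_state(worker_b)
print(worker_a.compute())
```

Note that averaging the two per-worker means (2.0 and 4.5) would give a different answer from the true mean over all five values, which is why merging operates on state.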
Furthermore, this release includes support for the propagation of the autograd graph in Distributed Data-Parallel (DDP) settings, enabling more efficient and scalable training of models across multiple GPUs. These enhancements collectively make TorchMetrics a more powerful and versatile tool for machine learning practitioners, enabling more accurate and efficient model evaluation across a wide range of applications.
[1.6.0] - 2024-11-12
Added
- Added audio metric `NISQA` (#2792)
- Added classification metric `LogAUC` (#2377)
- Added classification metric `NegativePredictiveValue` (#2433)
- Added regression metric `NormalizedRootMeanSquaredError` (#2442)
- Added segmentation metric `Dice` (#2725)
- Added method `merge_state` to `Metric` (#2786)
- Added support for propagation of the autograd graph in DDP setting (#2754)
Changed
- Changed naming and input order arguments in `KLDivergence` (#2800)
Deprecated
- Deprecated Dice from classification metrics (#2725)
Removed
- Changed minimum supported Pytorch version to 2.0 (#2671)
- Dropped support for Python 3.8 (#2827)
- Removed `num_outputs` in `R2Score` (#2800)
Fixed
- Fixed segmentation `Dice` + `GeneralizedDice` for 2d index tensors (#2832)
- Fixed mixed results of `rouge_score` with `accumulate='best'` (#2830)
Key Contributors
@Borda, @cw-tan, @philgzl, @rittik9, @SkafteNicki
New Contributors since 1.5.0
- @bfolie made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2793
- @StalkerShurik made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2811
- @philgzl made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2792
- @cw-tan made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2754
Full Changelog: https://github.com/Lightning-AI/torchmetrics/compare/v1.5.0...v1.6.0
Published by Borda over 1 year ago
TorchMetrics - Measuring Reproducibility in PyTorch - Minor patch release
[1.5.2] - 2024-11-07
Changed
- Re-adding `numpy` 2+ support (#2804)
Fixed
- Fixed iou scores in detection for either empty predictions/targets leading to wrong scores (#2805)
- Fixed `MetricCollection` compatibility with `torch.jit.script` (#2813)
- Fixed assert in PIT (#2811)
- Patched `np.Inf` for `numpy` 2.0+ (#2826)
Key Contributors
@adamjstewart, @Borda, @SkafteNicki, @StalkerShurik, @yurithefury
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: https://github.com/Lightning-AI/torchmetrics/compare/v1.5.1...v1.5.2
Published by Borda over 1 year ago
TorchMetrics - Measuring Reproducibility in PyTorch - Minor compatibility patch
[1.5.1] - 2024-10-22
Fixed
- Handled the changed `_modules` dict type in PyTorch 2.5, which previously caused metric collections to fail (#2793)
Key Contributors
@bfolie
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: https://github.com/Lightning-AI/torchmetrics/compare/v1.5.0...v1.5.1
Published by Borda over 1 year ago
TorchMetrics - Measuring Reproducibility in PyTorch - Shape metric
Shape metrics are quantitative methods used to assess and compare the geometric properties of objects, often in datasets that represent shapes. One such metric is the Procrustes Disparity, which measures the sum of the squared differences between two datasets after applying a Procrustes transformation. This transformation involves scaling, rotating, and translating the datasets to achieve optimal alignment. The Procrustes Disparity is particularly useful when comparing datasets that are similar in structure but not perfectly aligned, allowing for more meaningful comparison by minimizing differences due to orientation or size.
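The alignment described above can be sketched for the 2D case using complex numbers. This is an illustrative simplification, not the TorchMetrics implementation: it assumes 2D points with known correspondences, allows rotation, uniform scaling, and translation but not reflection, and the helper names are hypothetical. After centering both point sets and scaling them to unit norm, the residual of the best rotation-and-scale fit is 1 − |⟨w, z⟩|².

```python
import cmath
import math


def _center_and_scale(points):
    """Map (x, y) pairs to complex numbers with zero mean and unit norm."""
    z = [complex(x, y) for x, y in points]
    centroid = sum(z) / len(z)
    z = [v - centroid for v in z]
    norm = math.sqrt(sum(abs(v) ** 2 for v in z))  # zero if all points coincide
    return [v / norm for v in z]


def procrustes_disparity_2d(reference, other):
    """Sum of squared differences after optimal rotation and scaling (2D only)."""
    z = _center_and_scale(reference)
    w = _center_and_scale(other)
    inner = sum(wv.conjugate() * zv for wv, zv in zip(w, z))
    return 1.0 - abs(inner) ** 2


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
rotation = cmath.rect(2.5, math.radians(30))  # scale by 2.5, rotate by 30 degrees
moved = []
for x, y in square:
    v = complex(x, y) * rotation + complex(3.0, -1.0)  # then translate
    moved.append((v.real, v.imag))
rectangle = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]

print(procrustes_disparity_2d(square, moved))      # same shape, different pose
print(procrustes_disparity_2d(square, rectangle))  # genuinely different shape
```

The first comparison is numerically zero because the two point sets differ only by pose and size, while the second stays positive because no similarity transform maps a square onto a 2:1 rectangle.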
[1.5.0] - 2024-10-18
Added
- Added segmentation metric `HausdorffDistance` (#2122)
- Added audio metric `DNSMOS` (#2525)
- Added shape metric `ProcrustesDistance` (#2723)
- Added `MetricInputTransformer` wrapper (#2392)
- Added `input_format` argument to segmentation metrics (#2572)
- Added `multi-output` support for MAE metric (#2605)
- Added `truncation` argument to `BERTScore` (#2776)
Changed
- Tracker higher is better integration (#2649)
- Updated `InfoLM` class to dynamically set `higher_is_better` (#2674)
Deprecated
- Deprecated `num_outputs` in `R2Score` (#2705)
Fixed
- Fixed corner case in `IoU` metric for single empty prediction tensors (#2780)
- Fixed `PSNR` calculation for integer type input images (#2788)
Key Contributors
@Astraightrain, @grahamannett, @lgienapp, @matsumotosan, @quancs, @SkafteNicki
New Contributors since 1.4.0
- @kalekundert made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2543
- @lgienapp made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2392
- @sweber1 made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2634
- @gxy-gxy made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2347
- @Astraightrain made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2605
- @ndrwrbgs made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2640
- @grahamannett made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2674
- @petertheprocess made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2721
- @rittik9 made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2726
- @vkinakh made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2698
- @likawind made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2732
- @veera-puthiran-14082 made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2753
- @GPPassos made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2727
Full Changelog: https://github.com/Lightning-AI/torchmetrics/compare/v1.4.0...v1.5.0
Published by Borda over 1 year ago
TorchMetrics - Measuring Reproducibility in PyTorch - Minor patch release
[1.4.3] - 2024-10-10
Fixed
- Fixed Pearson correlation changing its inputs (#2765)
- Fixed bug in `PESQ` metric where `NoUtterancesError` prevented calculating on a batch of data (#2753)
- Fixed corner case in `MatthewsCorrCoef` (#2743)
Key Contributors
@Borda, @SkafteNicki, @veera-puthiran-14082
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: https://github.com/Lightning-AI/torchmetrics/compare/v1.4.2...v1.4.3
Published by Borda over 1 year ago
TorchMetrics - Measuring Reproducibility in PyTorch - Minor patch release
[1.4.2] - 2024-09-12
Added
- Re-adding `Chrf` implementation (#2701)
Fixed
- Fixed wrong aggregation in `segmentation.MeanIoU` (#2698)
- Fixed handling zero division error in binary IoU (Jaccard index) calculation (#2726)
- Corrected the padding related calculation errors in SSIM (#2721)
- Fixed compatibility of audio domain with new `scipy` (#2733)
- Fixed how `prefix`/`postfix` works in `MultitaskWrapper` (#2722)
- Fixed flakiness in tests related to `torch.unique` with `dim=None` (#2650)
Key Contributors
@Borda, @petertheprocess, @rittik9, @SkafteNicki, @vkinakh
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: https://github.com/Lightning-AI/torchmetrics/compare/v1.4.1...v1.4.2
Published by Borda over 1 year ago
TorchMetrics - Measuring Reproducibility in PyTorch - Minor patch release
[1.4.1] - 2024-08-02
Changed
- Calculate the text color of `ConfusionMatrix` plot based on luminance (#2590)
- Updated `_safe_divide` to allow `Accuracy` to run on the GPU (#2640)
- Improved error messages for intersection detection metrics for wrong user input (#2577)
Removed
- Dropped `Chrf` implementation due to licensing issues with the upstream package (#2668)
Fixed
- Fixed bug in `MetricCollection` when using compute groups and `compute` is called more than once (#2571)
- Fixed class order of `panoptic_quality(..., return_per_class=True)` output (#2548)
- Fixed `BootstrapWrapper` not being reset correctly (#2574)
- Fixed integration between `ClasswiseWrapper` and `MetricCollection` with custom `_filter_kwargs` method (#2575)
- Fixed BertScore calculation: pred target misalignment (#2347)
- Fixed `_cumsum` helper function in multi-gpu (#2636)
- Fixed bug in `MeanAveragePrecision.coco_to_tm` (#2588)
- Fixed missed f-strings in exceptions/warnings (#2667)
Key Contributors
@Borda, @gxy-gxy, @i-aki-y, @ndrwrbgs, @relativityhd, @SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: https://github.com/Lightning-AI/torchmetrics/compare/v1.4.0...v1.4.1
Published by Borda almost 2 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Minor dependency correction
Full Changelog: https://github.com/Lightning-AI/torchmetrics/compare/v1.4.0...v1.4.0.post0
Published by Borda about 2 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Metrics for segmentation
In Torchmetrics v1.4, we are happy to introduce a new domain of metrics to the library: segmentation metrics. Segmentation metrics are used to evaluate how well segmentation algorithms are performing, e.g., algorithms that take in an image and decide pixel by pixel what kind of object it is. These kinds of algorithms are necessary in applications such as self-driving cars. Segmentation metrics are closely related to classification metrics, but for now, in Torchmetrics, they expect the input to be formatted differently; see the documentation for more info. For now, MeanIoU and GeneralizedDiceScore have been added to the subpackage, with many more to follow in upcoming releases of Torchmetrics. We are happy to receive any feedback on metrics to add in the future or the user interface for the new segmentation metrics.
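To give a feel for what a segmentation metric such as MeanIoU computes, here is a minimal sketch over flattened per-pixel class labels. The function name `mean_iou`, the flat-list input layout, and the choice to skip classes absent from both prediction and target are assumptions for this example. The TorchMetrics metric expects tensors in its own input format, as described in its documentation.

```python
def mean_iou(pred, target, num_classes):
    """Average intersection-over-union across classes (illustrative sketch)."""
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes that appear in neither input
            ious.append(intersection / union)
    return sum(ious) / len(ious)


pred = [0, 0, 1, 1]    # predicted class per pixel
target = [0, 1, 1, 1]  # ground-truth class per pixel
print(mean_iou(pred, target, num_classes=2))
```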
Torchmetrics v1.4 also adds new metrics to the classification and image subpackages and has multiple bug fixes and other quality-of-life improvements. We refer to the changelog for the complete list of changes.
[1.4.0] - 2024-05-03
Added
- Added `SensitivityAtSpecificity` metric to classification subpackage (#2217)
- Added `QualityWithNoReference` metric to image subpackage (#2288)
- Added a new segmentation metric:
  - `MeanIoU` (#1236)
  - `GeneralizedDiceScore` (#1090)
- Added support for calculating segmentation quality and recognition quality in `PanopticQuality` metric (#2381)
- Added `pretty-errors` for improving error prints (#2431)
- Added support for `torch.float` weighted networks for FID and KID calculations (#2483)
- Added `zero_division` argument to selected classification metrics (#2198)
Changed
- Made `__getattr__` and `__setattr__` of `ClasswiseWrapper` more general (#2424)
Fixed
- Fix getitem for metric collection when prefix/postfix is set (#2430)
- Fixed axis names with Precision-Recall curve (#2462)
- Fixed list synchronization with partly empty lists (#2468)
- Fixed memory leak in metrics using list states (#2492)
- Fixed bug in computation of `ERGAS` metric (#2498)
- Fixed `BootStrapper` wrapper not working with `kwargs` provided argument (#2503)
- Fixed warnings being suppressed in `MeanAveragePrecision` when requested (#2501)
- Fixed corner-case in `binary_average_precision` when only negative samples are provided (#2507)
Key Contributors
@baskrahmer, @Borda, @ChristophReich1996, @daniel-code, @furkan-celik, @i-aki-y, @jlcsilva, @NielsRogge, @oguz-hanoglu, @SkafteNicki, @ywchan2005
New Contributors
- @eamonn-zh made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2345
- @nsmlzl made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2346
- @fschlatt made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2364
- @JonasVerbickas made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2358
- @AtomicVar made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2391
- @JDongian made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2400
- @daniel-code made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2390
- @baskrahmer made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2457
- @ChristophReich1996 made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2381
- @lukazso made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2491
- @S-aiueo32 made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2499
- @dominicgkerr made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2493
- @Shoumik-Gandre made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2482
- @randombenj made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2511
- @NielsRogge made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1236
- @i-aki-y made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2198
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: https://github.com/Lightning-AI/torchmetrics/compare/v1.3.0...v1.4.0
Published by Borda about 2 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Minor patch release
[1.3.2] - 2024-03-18
Fixed
- Fixed negative variance estimates in certain image metrics (#2378)
- Fixed dtype being changed by deepspeed for certain regression metrics (#2379)
- Fixed plotting of metric collection when prefix/postfix is set (#2429)
- Fixed bug when `top_k>1` and `average="macro"` for classification metrics (#2423)
- Fixed case where label prediction tensors in classification metrics were not validated correctly (#2427)
- Fixed how auc scores are calculated in `PrecisionRecallCurve.plot` methods (#2437)
Full Changelog: https://github.com/Lightning-AI/torchmetrics/compare/v1.3.1...v1.3.2
Key Contributors
@Borda, @SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda about 2 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Minor patch release
[1.3.1] - 2024-02-12
Fixed
- Fixed how backprop is handled in `LPIPS` metric (#2326)
- Fixed `MultitaskWrapper` not being able to be logged in lightning when using metric collections (#2349)
- Fixed high memory consumption in `Perplexity` metric (#2346)
- Fixed cached network in `FeatureShare` not being moved to the correct device (#2348)
- Fix naming of statistics in `MeanAveragePrecision` with custom max det thresholds (#2367)
- Fixed custom aggregation in retrieval metrics (#2364)
- Fixed initialize aggregation metrics with default floating type (#2366)
- Fixed plotting of confusion matrices (#2358)
Full Changelog: https://github.com/Lightning-AI/torchmetrics/compare/v1.3.0...v1.3.1
Key Contributors
@Borda, @fschlatt, @JonasVerbickas, @nsmlzl, @SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda over 2 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Minor package patch
Full Changelog: https://github.com/Lightning-AI/torchmetrics/compare/v1.3.0...v1.3.0.post0
Published by Borda over 2 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Minor package patch
Full Changelog: https://github.com/Lightning-AI/torchmetrics/compare/v1.3.0...v1.3.0.post
Published by Borda over 2 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - New Image metrics & wrappers
[1.3.0] - 2024-01-10
Added
- Added more tokenizers for `SacreBLEU` metric (#2068)
- Added support for logging `MultiTaskWrapper` directly with Lightning's `log_dict` method (#2213)
- Added `FeatureShare` wrapper to share submodules containing feature extractors between metrics (#2120)
- Added new metrics to image domain:
  - `SpatialDistortionIndex` (#2260)
  - Added `CriticalSuccessIndex` (#2257)
  - `Spatial Correlation Coefficient` (#2248)
- Added `average` argument to multiclass versions of `PrecisionRecallCurve` and `ROC` (#2084)
- Added confidence scores when `extended_summary=True` in `MeanAveragePrecision` (#2212)
- Added `RetrievalAUROC` metric (#2251)
- Added `aggregate` argument to retrieval metrics (#2220)
- Added utility functions in `segmentation.utils` for future segmentation metrics (#2105)
Changed
- Changed minimum supported Pytorch version from 1.8 to 1.10 (#2145)
- Changed x-/y-axis order for `PrecisionRecallCurve` to be consistent with scikit-learn (#2183)
Deprecated
- Deprecated `metric._update_called` (#2141)
- Deprecated `specicity_at_sensitivity` in favour of `specificity_at_sensitivity` (#2199)
Fixed
- Fixed support for half precision + CPU in metrics requiring topk operator (#2252)
- Fixed warning incorrectly being raised in `Running` metrics (#2256)
- Fixed integration with custom feature extractor in `FID` metric (#2277)
Full Changelog: https://github.com/Lightning-AI/torchmetrics/compare/v1.2.0...v1.3.0
Key Contributors
@Borda, @HoseinAkbarzadeh, @matsumotosan, @miskfi, @oguz-hanoglu, @SkafteNicki, @stancld, @ywchan2005
New Contributors
- @pme0 made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2114
- @damiankucharski made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2173
- @clumsy made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2185
- @jankng made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2226
- @tanguymagne made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2230
- @kyle-dorman made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2184
- @oguz-hanoglu made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2199
- @miskfi made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2257
- @ywchan2005 made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2260
- @HoseinAkbarzadeh made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2248
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda over 2 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Lazy imports
[1.2.1] - 2023-11-30
Added
- Added error if `NoTrainInceptionV3` is being initialized without `torch-fidelity` being installed (#2143)
- Added support for Pytorch `v2.1` (#2142)
Changed
- Change default state of `SpectralAngleMapper` and `UniversalImageQualityIndex` to be tensors (#2089)
- Use `arange` and repeat for deterministic bincount (#2184)
Removed
- Removed unused `lpips` third-party package as dependency of `LearnedPerceptualImagePatchSimilarity` metric (#2230)
Fixed
- Fixed numerical stability bug in `LearnedPerceptualImagePatchSimilarity` metric (#2144)
- Fixed numerical stability issue in `UniversalImageQualityIndex` metric (#2222)
- Fixed incompatibility for `MeanAveragePrecision` with `pycocotools` backend when too few `max_detection_thresholds` are provided (#2219)
- Fixed support for half precision in Perplexity metric (#2235)
- Fixed device and dtype for `LearnedPerceptualImagePatchSimilarity` functional metric (#2234)
- Fixed bug in `Metric._reduce_states(...)` when using `dist_sync_fn="cat"` (#2226)
- Fixed bug in `CosineSimilarity` where 2d is expected but 1d input was given (#2241)
- Fixed bug in `MetricCollection` when using compute groups and `compute` is called more than once (#2211)
Full Changelog: https://github.com/Lightning-AI/torchmetrics/compare/v1.2.0...v1.2.1
Key Contributors
@Borda, @jankng, @kyle-dorman, @SkafteNicki, @tanguymagne
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda over 2 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Clustering metrics
Torchmetrics v1.2 is out now! The latest release includes 11 new metrics within a new subdomain: Clustering. In this blog post, we briefly explain what clustering is, why measuring its quality is useful, and introduce the newly added metrics with code samples.
Clustering - what is it?
Clustering is an unsupervised learning technique. The term unsupervised here refers to the fact that we do not have ground truth targets as we do in classification. The primary goal of clustering is to discover hidden patterns or structures within data without prior knowledge about the meaning or importance of particular features. Thus, clustering is a form of data exploration compared to supervised learning, where the goal is “just” to predict if a data point belongs to one class.
The key goal of clustering algorithms is to split data into clusters/sets where data points from the same cluster are more similar to each other than any other points from the remaining clusters. Some of the most common and widely used clustering algorithms are K-Means, Hierarchical clustering, and Gaussian Mixture Models (GMM).
An objective quality evaluation/measure is required regardless of the clustering algorithm or internal optimization criterion used. In general, we can divide all clustering metrics into two categories: extrinsic metrics and intrinsic metrics.
Extrinsic metrics
Extrinsic metrics are characterized by requiring some ground truth labeling, even if used for an unsupervised method. This may seem counter-intuitive at first as we, by the definition of clustering, do not use such ground truth labeling. However, most clustering algorithms are still developed on datasets with labels available, so these metrics use this fact as an advantage.
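As a rough illustration of an extrinsic metric, the sketch below computes the plain (unadjusted) Rand index in pure Python: the fraction of point pairs on which the predicted clustering and the ground truth agree. The function name `rand_index` and the toy labels are illustrative assumptions and are not taken from the TorchMetrics code base.

```python
from itertools import combinations


def rand_index(preds, target):
    """Fraction of point pairs on which two labelings agree (hypothetical helper)."""
    if len(preds) != len(target):
        raise ValueError("preds and target must have the same length")
    pairs = list(combinations(range(len(preds)), 2))
    if not pairs:
        raise ValueError("need at least two points")
    agree = 0
    for i, j in pairs:
        same_pred = preds[i] == preds[j]
        same_target = target[i] == target[j]
        # A pair counts as agreement when both labelings group it the same way.
        agree += same_pred == same_target
    return agree / len(pairs)


# Cluster ids are arbitrary, so a relabeled but identical partition agrees on every pair.
perfect = rand_index([1, 1, 0, 0], [0, 0, 1, 1])
# Here the two labelings agree on only half of the six pairs.
partial = rand_index([0, 0, 1, 1], [0, 0, 0, 1])
```

Because only pairwise co-membership is compared, the score does not depend on which integer is used for which cluster.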
Intrinsic metrics
In contrast, intrinsic metrics do not need any ground truth information. These metrics estimate intra-cluster consistency (cohesion of all points assigned to a single set) compared to other clusters (separation). This is often done by comparing distances in the embedding space.
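The cohesion-versus-separation idea can be sketched with one classical variant of the Dunn index: the smallest distance between points in different clusters divided by the largest distance between points in the same cluster. This is a minimal pure-Python sketch; the function name and the point-to-point distance definition are assumptions, and library implementations may define the distances differently (for example via centroids).

```python
from itertools import combinations
from math import dist


def dunn_index(points, labels):
    """Smallest between-cluster distance over largest within-cluster distance."""
    within, between = [], []
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        (within if lp == lq else between).append(dist(p, q))
    if not within or not between:
        raise ValueError("need at least two clusters and one cluster with two points")
    if max(within) == 0:
        raise ValueError("all within-cluster distances are zero")
    return min(between) / max(within)


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
tight = dunn_index(points, [0, 0, 1, 1])  # clusters follow the geometry
mixed = dunn_index(points, [0, 1, 0, 1])  # clusters cut across the geometry
```

No ground truth is involved: the assignment that follows the geometry of the points scores higher than the one that cuts across it.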
Update to Mean Average Precision
`MeanAveragePrecision`, the most widely used metric for object detection in computer vision, now supports two new arguments: `average` and `backend`.
The `average` argument controls averaging over multiple classes. By the core definition, the default way is `macro` averaging, where the metric is calculated for each class separately and then averaged together. This will continue to be the default in Torchmetrics, but now we also support the setting `average="micro"`. Every object under this setting is essentially considered to be the same class, and the returned value is, therefore, calculated simultaneously over all objects.
The second argument, `backend`, is important, as it indicates what computational backend will be used for the internal computations. Since `MeanAveragePrecision` is not a simple metric to compute, and we value the correctness of our metric, we rely on some third-party library to do the internal computations. By default, we rely on users to have the official pycocotools installed, but with the new argument, we will also be supporting other backends.
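The difference between the two averaging modes can be sketched on a simpler quantity. The example below uses plain precision from per-class true-positive and false-positive counts as a stand-in for average precision; the helper names and the counts are made up for illustration and do not reflect how `MeanAveragePrecision` computes its result internally.

```python
def macro_average(per_class_counts):
    """Score each class separately, then average the per-class scores."""
    scores = [tp / (tp + fp) for tp, fp in per_class_counts.values()]
    return sum(scores) / len(scores)


def micro_average(per_class_counts):
    """Pool all objects as if they belonged to one class, then score once."""
    total_tp = sum(tp for tp, _ in per_class_counts.values())
    total_fp = sum(fp for _, fp in per_class_counts.values())
    return total_tp / (total_tp + total_fp)


# Hypothetical (true positives, false positives) per class.
counts = {"car": (90, 10), "bicycle": (1, 9)}
macro = macro_average(counts)  # both classes weigh equally
micro = micro_average(counts)  # the frequent class dominates
```

With imbalanced classes the two settings can differ a lot: macro averaging exposes the poorly detected rare class, while micro averaging mostly reflects the frequent one.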
[1.2.0] - 2023-09-22
Added
- Added metric to cluster package:
  - `MutualInformationScore` (#2008)
  - `RandScore` (#2025)
  - `NormalizedMutualInfoScore` (#2029)
  - `AdjustedRandScore` (#2032)
  - `CalinskiHarabaszScore` (#2036)
  - `DunnIndex` (#2049)
  - `HomogeneityScore` (#2053)
  - `CompletenessScore` (#2053)
  - `VMeasureScore` (#2053)
  - `FowlkesMallowsIndex` (#2066)
  - `AdjustedMutualInfoScore` (#2058)
  - `DaviesBouldinScore` (#2071)
- Added `backend` argument to `MeanAveragePrecision` (#2034)
Full Changelog: https://github.com/Lightning-AI/torchmetrics/compare/v1.1.0...v1.2.0
New Contributors since v1.1.0
- @matsumotosan made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2008
- @GlavitsBalazs made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2042
- @OmerShubi made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2081
- @munahaf made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/2082
Key Contributors
@matsumotosan, @SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda over 2 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Weekly patch release
[1.1.2] - 2023-09-11
Fixed
- Fixed tie breaking in ndcg metric (#2031)
- Fixed bug in `BootStrapper` when very few samples were evaluated that could lead to crash (#2052)
- Fixed bug when creating multiple plots that led to not all plots being shown (#2060)
- Fixed performance issues in `RecallAtFixedPrecision` for large batch sizes (#2042)
- Fixed bug related to `MetricCollection` used with custom metrics that have `prefix`/`postfix` attributes (#2070)
Contributors
@GlavitsBalazs, @SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda over 2 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Weekly patch release
[1.1.1] - 2023-08-29
Added
- Added `average` argument to `MeanAveragePrecision` (#2018)
Fixed
- Fixed bug in `PearsonCorrCoef` when it is updated on single samples at a time (#2019)
- Fixed support for pixel-wise MSE (#2017)
- Fixed bug in `MetricCollection` when used with multiple metrics that return dicts with same keys (#2027)
- Fixed bug in detection intersection metrics when `class_metrics=True` resulting in wrong values (#1924)
- Fixed missing attributes `higher_is_better`, `is_differentiable` for some metrics (#2028)
Contributors
@adamjstewart, @SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda almost 3 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Into Generative AI
In version v1.1 of Torchmetrics, in total five new metrics have been added, bringing the total number of metrics up to 128! In particular, we have two new exciting metrics for evaluating your favorite generative models for images.
Perceptual Path length
Introduced in the famous StyleGAN paper back in 2018, the Perceptual Path Length metric is used to quantify how smoothly a generator manages to interpolate between points in its latent space. Why does the smoothness of the latent space of your generative model matter? Assume you find a point in your latent space that generates an image you like, but you would like to see if you could find a better one by slightly changing the latent point it was generated from. If your latent space is not smooth, this becomes very hard because even small changes to the latent point can lead to large changes in the generated image.
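The idea can be sketched in a few lines: sample two latent points, take two nearby positions on the line between them, generate an output for each, and divide the distance between the outputs by the squared step size. The code below is only a toy sketch under strong assumptions: the generators are simple linear maps, squared Euclidean distance stands in for a perceptual distance, and all function names are hypothetical.

```python
import random
from math import dist


def lerp(a, b, t):
    """Linear interpolation between two latent vectors."""
    return [x + t * (y - x) for x, y in zip(a, b)]


def path_length(generator, distance, latent_dim, num_samples=100, epsilon=1e-4, seed=0):
    """Average output change per squared step along random latent interpolations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        z1 = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        z2 = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        t = rng.uniform(0.0, 1.0)
        out_a = generator(lerp(z1, z2, t))
        out_b = generator(lerp(z1, z2, t + epsilon))
        total += distance(out_a, out_b) / epsilon**2
    return total / num_samples


def squared_euclidean(a, b):
    # Stand-in for a perceptual distance between two generated images.
    return dist(a, b) ** 2


def smooth_generator(z):
    return [2.0 * v for v in z]


def rough_generator(z):
    # Reacts ten times more strongly to the same latent change.
    return [20.0 * v for v in z]


smooth = path_length(smooth_generator, squared_euclidean, latent_dim=4)
rough = path_length(rough_generator, squared_euclidean, latent_dim=4)
```

A lower value indicates a smoother mapping: the same small step in latent space changes the output less.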
CLIP image quality assessment
CLIP image quality assessment (CLIPIQA) is a very recently proposed metric in this paper. The metric builds on the OpenAI CLIP model, which is a multi-modal model for connecting text and images. The core idea behind the metric is that different properties of an image can be assessed by measuring how similar the CLIP embedding of the image is to the respective CLIP embeddings of a positive and a negative prompt for that given property.
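A minimal sketch of the prompt-pair idea, assuming the embeddings are already available as plain vectors: compute the cosine similarity of the image embedding to the positive and to the negative prompt embedding, then turn the two similarities into a score with a softmax. The function names, the toy two-dimensional embeddings, and the `scale` value are illustrative assumptions, not the actual model or library code.

```python
from math import exp, sqrt


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def prompt_pair_score(image_emb, positive_emb, negative_emb, scale=100.0):
    """Softmax weight of the positive prompt; `scale` is an assumed temperature."""
    pos = scale * cosine(image_emb, positive_emb)
    neg = scale * cosine(image_emb, negative_emb)
    shift = max(pos, neg)  # subtract the maximum for numerical stability
    e_pos, e_neg = exp(pos - shift), exp(neg - shift)
    return e_pos / (e_pos + e_neg)


# Toy embeddings standing in for a "good photo" / "bad photo" prompt pair.
good_prompt, bad_prompt = [1.0, 0.0], [0.0, 1.0]
clearly_good = prompt_pair_score([1.0, 0.0], good_prompt, bad_prompt)
undecided = prompt_pair_score([1.0, 1.0], good_prompt, bad_prompt)
```

The score lies between 0 and 1; an image whose embedding is equally close to both prompts lands at 0.5.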
VIF, Edit, and SA-SDR
- `VisualInformationFidelity` has been added to the image package. First proposed in this paper, it can be used to automatically assess the quality of images in a perceptual manner.
- `EditDistance` has been added to the text package. A very classical metric for text that simply measures the number of characters that need to be substituted, inserted, or deleted to transform the predicted text into the reference text.
- `SourceAggregatedSignalDistortionRatio` has been added to the audio package. The metric was originally proposed in this paper and is an improvement over the classical Signal-to-Distortion Ratio (SDR) metric (also found in torchmetrics) that provides more stable gradients during training when trying to train models for style source separation.
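The edit distance described above is the classical Levenshtein distance, which can be sketched with a short dynamic program. This is a plain-Python illustration with a hypothetical function name, not the TorchMetrics implementation.

```python
def edit_distance(pred, reference):
    """Minimum number of substitutions, insertions, and deletions to turn pred into reference."""
    # previous[j] holds the distance between the processed prefix of pred and reference[:j].
    previous = list(range(len(reference) + 1))
    for i, p_char in enumerate(pred, start=1):
        current = [i]
        for j, r_char in enumerate(reference, start=1):
            cost = 0 if p_char == r_char else 1
            current.append(
                min(
                    previous[j] + 1,         # delete p_char
                    current[j - 1] + 1,      # insert r_char
                    previous[j - 1] + cost,  # substitute or keep
                )
            )
        previous = current
    return previous[-1]


distance = edit_distance("kitten", "sitting")
```

Only two rows of the table are kept at a time, so memory grows with the length of the reference rather than with the product of both lengths.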
[1.1.0] - 2023-08-22
Added
- Added source aggregated signal-to-distortion ratio (SA-SDR) metric (#1882)
- Added `VisualInformationFidelity` to image package (#1830)
- Added `EditDistance` to text package (#1906)
- Added `top_k` argument to `RetrievalMRR` in retrieval package (#1961)
- Added support for evaluating `"segm"` and `"bbox"` detection in `MeanAveragePrecision` at the same time (#1928)
- Added `PerceptualPathLength` to image package (#1939)
- Added support for multioutput evaluation in `MeanSquaredError` (#1937)
- Added argument `extended_summary` to `MeanAveragePrecision` such that precision, recall, iou can be easily returned (#1983)
- Added warning to `ClipScore` if long captions are detected and truncated (#2001)
- Added `CLIPImageQualityAssessment` to multimodal package (#1931)
- Added new property `metric_state` to all metrics for users to investigate currently stored tensors in memory (#2006)
Full Changelog: https://github.com/Lightning-AI/torchmetrics/compare/v1.0.0...v1.1.0
New Contributors since v1.0.0
- @fansuregrin made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1892
- @salcc made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1934
- @IanMaquignaz made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1943
- @kn made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1955
- @Vivswan made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1982
- @njuaplusplus made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1986
Contributors
@bojobo, @lucadiliello, @quancs, @SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda almost 3 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Weekly patch release
[1.0.3] - 2023-08-08
Added
- Added warning to `MeanAveragePrecision` if too many detections are observed (#1978)
Fixed
- Fixed support for int input when `multidim_average="samplewise"` in classification metrics (#1977)
- Fixed x/y labels when plotting confusion matrices (#1976)
- Fixed IOU compute in cuda (#1982)
Contributors
@borda, @SkafteNicki, @Vivswan
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda almost 3 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Weekly patch release
[1.0.2] - 2023-08-03
Added
- Added warning to `PearsonCorrCoef` if input has a very small variance for its given dtype (#1926)
Changed
- Changed all non-task specific classification metrics to be true subtypes of `Metric` (#1963)
Fixed
- Fixed bug in `CalibrationError` where calculations for double precision input were performed in float precision (#1919)
- Fixed bug related to the `prefix/postfix` arguments in `MetricCollection` and `ClasswiseWrapper` being duplicated (#1918)
- Fixed missing AUC score when plotting classification metrics that support the `score` argument (#1948)
Contributors
@borda, @SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda almost 3 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Weekly patch release
[1.0.1] - 2023-07-13
Fixed
- Fixed corner case when using `MetricCollection` together with aggregation metrics (#1896)
- Fixed the use of `max_fpr` in `AUROC` metric when only one class is present (#1895)
- Fixed bug related to empty predictions for `IntersectionOverUnion` metric (#1892)
- Fixed bug related to `MeanMetric` and broadcasting of weights when Nans are present (#1898)
- Fixed bug related to expected input format of pycoco in `MeanAveragePrecision` (#1913)
Contributors
@fansuregrin, @SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda almost 3 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Visualize metrics
We are happy to announce that the first major release of Torchmetrics, version v1.0, is publicly available. We have
worked hard on a couple of new features for this milestone release, but for v1.0.0, we have also managed to implement
over 100 metrics in torchmetrics.
Plotting
The big new feature of v1.0 is a built-in plotting feature. As the old saying goes: "A picture is worth a thousand words". Within machine learning, this is definitely also true for many things.
Metrics are one area that, in some cases, is definitely better showcased in a figure than as a list of floats. The only requirement for getting started with the plotting feature is installing matplotlib. Either install with `pip install matplotlib` or `pip install torchmetrics[visual]` (the latter option also installs Scienceplots and uses that as the default plotting style).
The basic interface is the same for any metric. Just call the new `.plot` method:
```python
# Placeholder names: AnyMetricYouLike, num_updates, preds, and target are not defined here.
metric = AnyMetricYouLike()
for i in range(num_updates):
    metric.update(preds[i], target[i])
fig, ax = metric.plot()
```
The `plot` method by default does not require any arguments and will automatically call `metric.compute` internally on
whatever metric states have been accumulated.
[1.0.0] - 2023-07-04
Added
- Added `prefix` and `postfix` arguments to `ClasswiseWrapper` (#1866)
- Added speech-to-reverberation modulation energy ratio (SRMR) metric (#1792, #1872)
- Added new global arg `compute_with_cache` to control caching behaviour after `compute` method (#1754)
- Added `ComplexScaleInvariantSignalNoiseRatio` for audio package (#1785)
- Added `Running` wrapper for calculating running statistics (#1752)
- Added `RelativeAverageSpectralError` and `RootMeanSquaredErrorUsingSlidingWindow` to image package (#816)
- Added support for `SpecificityAtSensitivity` Metric (#1432)
- Added support for plotting of metrics through `.plot()` method (#1328, #1481, #1480, #1490, #1581, #1585, #1593, #1600, #1605, #1610, #1609, #1621, #1624, #1623, #1638, #1631, #1650, #1639, #1660, #1682, #1786)
- Added support for plotting of audio metrics through `.plot()` method (#1434)
- Added `classes` to output from `MAP` metric (#1419)
- Added Binary group fairness metrics to classification package (#1404)
- Added `MinkowskiDistance` to regression package (#1362)
- Added `pairwise_minkowski_distance` to pairwise package (#1362)
- Added new detection metric `PanopticQuality` (#929, #1527)
- Added `PSNRB` metric (#1421)
- Added `ClassificationTask` Enum and use in metrics (#1479)
- Added `ignore_index` option to `exact_match` metric (#1540)
- Added parameter `top_k` to `RetrievalMAP` (#1501)
- Added support for deterministic evaluation on GPU for metrics that use `torch.cumsum` operator (#1499)
- Added support for plotting of aggregation metrics through `.plot()` method (#1485)
- Added support for python 3.11 (#1612)
- Added support for auto clamping of input for metrics that use the `data_range` (#1606)
- Added `ModifiedPanopticQuality` metric to detection package (#1627)
- Added `PrecisionAtFixedRecall` metric to classification package (#1683)
- Added multiple metrics to detection package (#1284)
  - `IntersectionOverUnion`
  - `GeneralizedIntersectionOverUnion`
  - `CompleteIntersectionOverUnion`
  - `DistanceIntersectionOverUnion`
- Added `MultitaskWrapper` to wrapper package (#1762)
- Added `RelativeSquaredError` metric to regression package (#1765)
- Added `MemorizationInformedFrechetInceptionDistance` metric to image package (#1580)
Changed
- Changed `permutation_invariant_training` to allow using a `'permutation-wise'` metric function (#1794)
- Changed `update_count` and `update_called` from private to public methods (#1370)
- Raise exception for invalid kwargs in Metric base class (#1427)
- Extend `EnumStr` raising `ValueError` for invalid value (#1479)
- Improve speed and memory consumption of binned `PrecisionRecallCurve` with large number of samples (#1493)
- Changed `__iter__` method from raising `NotImplementedError` to `TypeError` by setting to `None` (#1538)
- `FID` metric will now raise an error if too few samples are provided (#1655)
- Allowed FID with `torch.float64` (#1628)
- Changed `LPIPS` implementation to no longer rely on third-party package (#1575)
- Changed FID matrix square root calculation from `scipy` to `torch` (#1708)
- Changed calculation in `PearsonCorrCoef` to be more robust in certain cases (#1729)
- Changed `MeanAveragePrecision` to `pycocotools` backend (#1832)
Deprecated
- Deprecated domain metrics import from package root (#1685, #1694, #1696, #1699, #1703)
Removed
- Support for python 3.7 (#1640)
Fixed
- Fixed support in `MetricTracker` for `MultioutputWrapper` and nested structures (#1608)
- Fixed restrictive check in `PearsonCorrCoef` (#1649)
- Fixed integration with `jsonargparse` and `LightningCLI` (#1651)
- Fixed corner case in calibration error for zero confidence input (#1648)
- Fixed precision-recall curve based computations for float target (#1642)
- Fixed missing kwarg squeeze in `MultiOutputWrapper` (#1675)
- Fixed padding removal for 3d input in `MSSSIM` (#1674)
- Fixed `max_det_threshold` in MAP detection (#1712)
- Fixed states being saved in metrics that use `register_buffer` (#1728)
- Fixed states not being correctly synced and device transferred in `MeanAveragePrecision` for `iou_type="segm"` (#1763)
- Fixed use of `prefix` and `postfix` in nested `MetricCollection` (#1773)
- Fixed `ax` plotting logging in `MetricCollection` (#1783)
- Fixed lookup for punkt sources being downloaded in `RougeScore` (#1789)
- Fixed integration with lightning for `CompositionalMetric` (#1761)
- Fixed several bugs in `SpectralDistortionIndex` metric (#1808)
- Fixed bug for corner cases in `MatthewsCorrCoef` (#1812, #1863)
- Fixed support for half precision in `PearsonCorrCoef` (#1819)
- Fixed number of bugs related to `average="macro"` in classification metrics (#1821)
- Fixed off-by-one issue when `ignore_index = num_classes + 1` in Multiclass-jaccard (#1860)
New Contributors
- @theja-vanka made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1372
- @wilderrodrigues made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1391
- @Freed-Wu made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1402
- @reaganjlee made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1405
- @davidgilbertson made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1412
- @ValerianRey made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1430
- @EPronovost made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1427
- @felixdivo made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1438
- @ivnvalex made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1447
- @PangLuo made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1452
- @JustinGoheen made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1463
- @DavidZhang73 made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1476
- @7shoe made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1474
- @srishti-git1110 made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1481
- @niberger made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/929
- @shhs29 made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1434
- @ihowell made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1525
- @venomouscyanide made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1480
- @ItamarChinn made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1540
- @vincentvaroquauxads made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1521
- @Bomme made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1501
- @alexkrz made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1490
- @clay-curry made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1547
- @clueless-skywatcher made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1362
- @marcocaccin made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1527
- @Piyush-97 made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/816
- @FarzanT made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1583
- @basveeling made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1651
- @YeaMerci made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1684
- @fkroeber made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1712
- @soma2000-lang made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1421
- @maxi-w made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1726
- @wbeardall made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1765
- @RistoAle97 made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1778
- @cdboer made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1820
- @bot66 made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1828
- @cs-mshah made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1849
- @bojobo made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1851
- @goldenfire6 made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1862
- @martinmeinke made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1860
- @Dibz15 made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1580
- @relativityhd made their first contribution in https://github.com/Lightning-AI/torchmetrics/pull/1866
Contributors
@alexkrz, @AndresAlgaba, @basveeling, @Bomme, @Borda, @Callidior, @clueless-skywatcher, @Dibz15, @EPronovost, @fkroeber, @ItamarChinn, @marcocaccin, @martinmeinke, @niberger, @Piyush-97, @quancs, @relativityhd, @shenoynikhil, @shhs29, @SkafteNicki, @soma2000-lang, @srishti-git1110, @stancld, @twsl, @ValerianRey, @venomouscyanide, @wbeardall
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda almost 3 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Minor patch release
[0.11.4] - 2023-03-10
Fixed
- Fixed evaluation of `R2Score` with the near constant target (#1576)
- Fixed `dtype` conversion when the metric is submodule (#1583)
- Fixed bug related to `top_k>1` and `ignore_index!=None` in `StatScores` based metrics (#1589)
- Fixed corner case for `PearsonCorrCoef` when running in DDP mode but only on a single device (#1587)
- Fixed overflow error for specific cases in `MAP` when big areas are calculated (#1607)
Contributors
@borda, @FarzanT, @SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: https://github.com/Lightning-AI/metrics/compare/v0.11.3...v0.11.4
Published by Borda about 3 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Minor patch release
[0.11.3] - 2023-02-28
Fixed
- Fixed classification metrics for `byte` input (#1521)
- Fixed the use of `ignore_index` in `MulticlassJaccardIndex` (#1386)
Contributors
@SkafteNicki, @vincentvaroquauxads
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: https://github.com/Lightning-AI/metrics/compare/v0.11.2...v0.11.3
Published by Borda about 3 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Minor patch release
[0.11.2] - 2023-02-21
Fixed
- Fixed compatibility between XLA in `_bincount` function (#1471)
- Fixed type hints in methods belonging to `MetricTracker` wrapper (#1472)
- Fixed `multilabel` in `ExactMatch` (#1474)
Contributors
@7shoe, @borda, @SkafteNicki, @ValerianRey
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: https://github.com/Lightning-AI/metrics/compare/v0.11.1...v0.11.2
Published by Borda over 3 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Minor patch release
[0.11.1] - 2023-01-30
Fixed
- Fixed type checking on the `maximize` parameter at the initialization of `MetricTracker` (#1428)
- Fixed mixed precision auto-cast for `SSIM` metric (#1454)
- Fixed checking for `nltk.punkt` in `RougeScore` if a machine is not online (#1456)
- Fixed wrongly reset method in `MultioutputWrapper` (#1460)
- Fixed `dtype` checking in `PrecisionRecallCurve` for `target` tensor (#1457)
Contributors
@borda, @SkafteNicki, @stancld
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: https://github.com/Lightning-AI/metrics/compare/v0.11.0...v0.11.1
Published by Borda over 3 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Adding Multimodal and nominal domain
We are happy to announce that Torchmetrics v0.11 is now publicly available. In Torchmetrics v0.11 we have primarily focused on the cleanup of the large classification refactor from v0.10 and adding new metrics. With v0.11 we are crossing 90+ metrics in Torchmetrics, nearing the milestone of having 100+ metrics.
New domains
In Torchmetrics we are not only looking to expand with new metrics in already established metric domains such as classification or regression, but also new domains. We are therefore happy to report that v0.11 includes two new domains: Multimodal and nominal.
Multimodal
If there is one topic within machine learning that is hot right now then it is generative models and in particular text-to-image generative models. Just recently stable diffusion v2 was released, able to create even more photorealistic images from a single text prompt than ever.
In Torchmetrics v0.11 we are adding a new domain called multimodal to support the evaluation of such models. For now, we are starting out with a single metric, the CLIPScore from this paper that can be used to evaluate such models. CLIPScore currently achieves the highest correlation with human judgment, and thus a high CLIPScore for an image-text pair means that it is highly plausible that an image caption and an image are related to each other.
Nominal
If you have ever taken any course in statistics or introduction to machine learning you should hopefully have heard that data can have different types of attributes: nominal, ordinal, interval, and ratio. This essentially refers to how data can be compared. For example, nominal data cannot be ordered and cannot be measured. An example would be data that describes the color of your car: blue, red, or green. It does not make sense to compare the different values. Ordinal data can be compared but does not have a relative meaning. An example would be the safety rating of a car: 1, 2, 3. We can say that 3 is better than 1 but the actual numerical value does not mean anything.
In v0.11 of TorchMetrics, we are adding support for classic metrics on nominal data. In fact, 4 new metrics have already been added to this domain:
- CramersV
- PearsonsContingencyCoefficient
- TschuprowsT
- TheilsU
All metrics are measures of association between two nominal variables, giving a value between 0 and 1, with 1 meaning that there is a perfect association between the variables.
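To make the notion of association between nominal variables concrete, here is a pure-Python sketch of the textbook Cramér's V without any bias correction: the chi-squared statistic of the contingency table, normalized by the sample size and the smaller table dimension minus one. The function name and the toy data are illustrative assumptions, and library implementations may apply corrections that change the values.

```python
from collections import Counter
from math import sqrt


def cramers_v(x, y):
    """Uncorrected Cramér's V between two equally long sequences of categories."""
    n = len(x)
    joint = Counter(zip(x, y))
    x_counts, y_counts = Counter(x), Counter(y)
    k = min(len(x_counts), len(y_counts)) - 1
    if k == 0:
        raise ValueError("each variable needs at least two distinct categories")
    chi2 = 0.0
    for a, count_a in x_counts.items():
        for b, count_b in y_counts.items():
            expected = count_a * count_b / n  # count expected under independence
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    return sqrt(chi2 / (n * k))


color = ["red", "red", "blue", "blue"]
associated = cramers_v(color, ["round", "round", "square", "square"])
independent = cramers_v(color, ["round", "square", "round", "square"])
```

The categories are only compared for equality, never ordered, which is what makes the measure suitable for nominal data.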
Small improvements
In addition to metrics within the two new domains v0.11 of Torchmetrics contains other smaller changes and fixes:
- `TotalVariation` metric has been added to the image package, which measures the complexity of an image with respect to its spatial variation.
- `MulticlassExactMatch` metric has been added to the classification package, which for example can be used to measure sentence level accuracy where all tokens need to match for a sentence to be counted as correct.
- `KendallRankCorrCoef` has been added to the regression package for measuring the overall correlation between two variables.
- `LogCoshError` has been added to the regression package for measuring the residual error between two variables. It is similar to the mean squared error close to 0 but similar to the mean absolute error away from 0.
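The two regimes of the log-cosh error mentioned in the last item can be checked numerically. The sketch below assumes the usual definition, the mean of log(cosh(pred - target)); the function name is hypothetical and this is not the library code. For a small residual x the value is close to x²/2, and for a large residual it is close to |x| - log 2.

```python
from math import cosh, log


def log_cosh_error(preds, target):
    """Mean of log(cosh(residual)) over all samples."""
    return sum(log(cosh(p - t)) for p, t in zip(preds, target)) / len(preds)


small = log_cosh_error([0.01], [0.0])  # behaves like half the squared error
large = log_cosh_error([10.0], [0.0])  # behaves like the absolute error minus log(2)
```

Note that this naive form overflows for very large residuals because `cosh` grows exponentially, so a numerically stable variant would be needed in practice.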
Finally, Torchmetrics now only supports v1.8 and higher of Pytorch. It was necessary to raise the minimum from v1.3 because we were running into compatibility issues with older versions of Pytorch. We strive to support as many versions of Pytorch as possible, but for the best experience, we always recommend keeping Pytorch and Torchmetrics up to date.
[0.11.0] - 2022-11-30
Added
- Added `MulticlassExactMatch` to classification metrics (#1343)
- Added `TotalVariation` to image package (#978)
- Added `CLIPScore` to new multimodal package (#1314)
- Added regression metrics:
  - `KendallRankCorrCoef` (#1271)
  - `LogCoshError` (#1316)
- Added new nominal metrics:
  - `CramersV` (#1298)
  - `PearsonsContingencyCoefficient` (#1334)
  - `TschuprowsT` (#1334)
  - `TheilsU` (#1337)
- Added option to pass `distributed_available_fn` to metrics to allow checks for custom communication backend for making `dist_sync_fn` actually useful (#1301)
- Added `normalize` argument to `Inception`, `FID`, `KID` metrics (#1246)
Changed
- Changed minimum Pytorch version to be 1.8 (#1263)
- Changed interface for all functional and modular classification metrics after refactor (#1252)
Removed
- Removed deprecated `BinnedAveragePrecision`, `BinnedPrecisionRecallCurve`, `RecallAtFixedPrecision` (#1251)
- Removed deprecated `LabelRankingAveragePrecision`, `LabelRankingLoss` and `CoverageError` (#1251)
- Removed deprecated `KLDivergence` and `AUC` (#1251)
Fixed
- Fixed precision bug in pairwise_euclidean_distance (#1352)
Contributors
@borda, @justusschock, @ragavvenkatesan, @shenoynikhil, @SkafteNicki, @stancld
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda over 3 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Minor patch release
[0.10.3] - 2022-11-16
Fixed
- Fixed bug in MetricTracker.best_metric when return_step=False (#1306)
- Fixed bug to prevent users from going into an infinite loop if trying to iterate over a single metric (#1320)
Contributors
@SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda over 3 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Fixed Performance
[0.10.2] - 2022-10-31
Changed
- Changed in-place operation to out-of-place operation in pairwise_cosine_similarity (#1288)
Fixed
- Fixed high memory usage for certain classification metrics when average='micro' (#1286)
- Fixed precision problems when structural_similarity_index_measure was used with autocast (#1291)
- Fixed slow performance for confusion matrix-based metrics (#1302)
- Fixed restrictive dtype checking in spearman_corrcoef when used with autocast (#1303)
Contributors
@SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda over 3 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Minor patch release
[0.10.1] - 2022-10-21
Fixed
- Fixed broken clone method for classification metrics (#1250)
- Fixed unintentional downloading of nltk.punkt when lsum not in rouge_keys (#1258)
- Fixed type casting in MAP metric between bool and float32 (#1150)
Contributors
@dreaquil, @SkafteNicki, @stancld
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda over 3 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Large changes to classifications
TorchMetrics v0.10 is now out, significantly changing the whole classification package. This blog post goes over the reasons why the classification package needed to be refactored, what it means for our end users, and finally, what benefits it gives. A guide on how to upgrade your code to the recent changes can be found near the bottom.
Why the classification metrics need to change
We have known for a long time that there were some underlying problems with how we initially structured the classification package. Essentially, classification tasks can be divided into either binary, multiclass, or multilabel, and determining what task a user is trying to run a given metric on is hard based on the input alone. The reason a package such as sklearn can do this is that it only supports input in very specific formats (no multi-dimensional arrays and no support for both integer and probability/logit formats).
This meant that some metrics, especially for binary tasks, could have been calculating something different than expected if the user provided input in a shape other than the expected one. This goes against a core value of TorchMetrics: our users should be able to trust that the metric they are evaluating gives the expected result.
Additionally, classification metrics were missing consistency. For some metrics, num_classes=2 meant binary, and for others, num_classes=1 meant binary. You can read more about the underlying reasons for this refactor in this and this issue.
The solution
The solution we went with was to split every classification metric into three separate metrics with the prefix binary_*, multiclass_* and multilabel_*. This solves a number of the above problems out of the box because it becomes easier for us to match our users' expectations for any given input shape. It additionally has some other benefits, both for us as developers and for end users:
- Maintainability: by splitting the code into three distinctive functions, we are (hopefully) lowering the code complexity, making the codebase easier to maintain in the long term.
- Speed: by completely removing the auto-detection of task at runtime, we can significantly increase computational speed (more on this later).
- Task-specific arguments: by splitting into three functions, we also make it clearer which input arguments affect the computed result. Take Accuracy as an example: num_classes, top_k, and average are arguments that have an influence if you are doing multiclass classification but do nothing for binary classification, and vice versa for the thresholds argument. The task-specific versions only contain the arguments that influence the given task.
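To make the idea of task-specific arguments concrete, here is a simplified pure-Python sketch. The three `sketch_*` functions are hypothetical stand-ins written for this note, not the library's functions, and they work on plain lists rather than tensors. The point is only that each signature carries the arguments that matter for its task.

```python
def sketch_binary_accuracy(preds, target, threshold=0.5):
    """Binary task: one probability per sample; only `threshold` matters."""
    hard = [int(p >= threshold) for p in preds]
    return sum(h == t for h, t in zip(hard, target)) / len(target)


def sketch_multiclass_accuracy(preds, target, num_classes):
    """Multiclass task: one score vector per sample; `num_classes` is required."""
    for row in preds:
        if len(row) != num_classes:
            raise ValueError("each prediction needs num_classes scores")
    # Pick the index of the highest score as the predicted class.
    hard = [max(range(num_classes), key=row.__getitem__) for row in preds]
    return sum(h == t for h, t in zip(hard, target)) / len(target)


def sketch_multilabel_accuracy(preds, target, num_labels, threshold=0.5):
    """Multilabel task: an independent yes/no decision per label."""
    correct = 0
    for row, truth in zip(preds, target):
        if len(row) != num_labels:
            raise ValueError("each prediction needs num_labels scores")
        correct += sum(int(p >= threshold) == t for p, t in zip(row, truth))
    return correct / (len(target) * num_labels)


binary = sketch_binary_accuracy([0.2, 0.8, 0.6, 0.4], [0, 1, 0, 0])
multiclass = sketch_multiclass_accuracy(
    [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]], [1, 2], num_classes=3
)
multilabel = sketch_multilabel_accuracy(
    [[0.9, 0.2], [0.4, 0.7]], [[1, 0], [1, 1]], num_labels=2
)
print(binary, multiclass, multilabel)
```

Because the caller states the task by choosing the function, no shape-based guessing is needed at runtime.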
There are many smaller quality-of-life improvements hidden throughout the refactor; here are our top 3:
Standardized arguments
The input arguments for the classification package are now much more standardized. Here are a few examples:
- Each metric now only supports arguments that influence the final result. This means that num_classes is removed from all binary_* metrics, is now required for all multiclass_* metrics, and is renamed to num_labels for all multilabel_* metrics.
- The ignore_index argument is now supported by ALL classification metrics and supports any value, not only values in the [0, num_classes] range (similar to torch loss functions).
- We added a new validate_args argument to all classification metrics to allow users to skip validation of inputs, making the computations faster. By default, we still do input validation because it is the safest option for the user. Still, if you are confident that the input to the metric is correct, you can now disable this checking for a potential speed-up (more on this later).
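The effect of ignore_index and validate_args can be illustrated with a small pure-Python sketch. The function `sketch_accuracy_with_ignore` is hypothetical and only mimics the described behaviour on lists of hard labels; it is not the library's code. Targets equal to ignore_index are dropped before scoring, and the validation pass can be switched off.

```python
def sketch_accuracy_with_ignore(preds, target, num_classes,
                                ignore_index=None, validate_args=True):
    """Accuracy over hard labels, skipping targets equal to `ignore_index`."""
    if validate_args:
        # Optional safety pass; skipping it saves one full scan of the inputs.
        for p, t in zip(preds, target):
            if not 0 <= p < num_classes:
                raise ValueError(f"prediction {p} outside [0, {num_classes})")
            if t != ignore_index and not 0 <= t < num_classes:
                raise ValueError(f"target {t} outside [0, {num_classes})")
    kept = [(p, t) for p, t in zip(preds, target) if t != ignore_index]
    if not kept:
        return float("nan")  # nothing left to score
    return sum(p == t for p, t in kept) / len(kept)


preds = [0, 1, 2, 1]
target = [0, 1, -1, 2]  # -1 marks a sample that should not be scored

score = sketch_accuracy_with_ignore(preds, target, num_classes=3, ignore_index=-1)
fast = sketch_accuracy_with_ignore(preds, target, num_classes=3,
                                   ignore_index=-1, validate_args=False)
print(score, fast)
```

Note that the ignored value (-1 here) lies outside the valid class range, which is the case the text says is now supported.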
Constant memory implementations
Some of the most useful metrics for evaluating classification problems are metrics such as ROC, AUROC, AveragePrecision, etc., because they not only evaluate your model for a single threshold but a whole range of thresholds, essentially giving you the ability to see the trade-off between Type I and Type II errors. However, a big problem with the standard formulation of these metrics (which we have been using) is that they require access to all data for their calculation. Our implementation has been extremely memory-intensive for these kinds of metrics.
In v0.10 of TorchMetrics, all these metrics now have an argument called thresholds. By default, it is None and the metric will still save all targets and predictions in memory as you are used to. However, if this argument is instead set to a tensor, for example torch.linspace(0,1,100), the metric will use a constant-memory approximation by evaluating itself at those provided thresholds.
Setting thresholds=None has an approximate memory footprint of O(num_samples), whereas using thresholds=torch.linspace(0,1,100) has an approximate memory footprint of O(num_thresholds). In this particular case, users will save memory when the metric is computed on more than 100 samples. This can save a lot of memory in modern machine learning, where evaluation is often done on thousands to millions of data points.
This also means that the Binned* metrics that currently exist in TorchMetrics are being deprecated as their functionality is now captured by this argument.
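The constant-memory idea can be sketched in pure Python as follows. The class `BinnedCurveState` is a hypothetical illustration, not the library's implementation: instead of storing every prediction, it keeps one set of confusion counts per threshold, so the state size depends only on the number of thresholds.

```python
class BinnedCurveState:
    """Per-threshold confusion counts; size is O(num_thresholds)."""

    def __init__(self, thresholds):
        self.thresholds = list(thresholds)
        n = len(self.thresholds)
        self.tp = [0] * n
        self.fp = [0] * n
        self.fn = [0] * n

    def update(self, preds, target):
        # Fold a batch into the counts; the batch itself is not stored.
        for i, thr in enumerate(self.thresholds):
            for p, t in zip(preds, target):
                positive = p >= thr
                if positive and t == 1:
                    self.tp[i] += 1
                elif positive and t == 0:
                    self.fp[i] += 1
                elif not positive and t == 1:
                    self.fn[i] += 1

    def compute(self):
        nan = float("nan")
        precision = [tp / (tp + fp) if tp + fp else nan
                     for tp, fp in zip(self.tp, self.fp)]
        recall = [tp / (tp + fn) if tp + fn else nan
                  for tp, fn in zip(self.tp, self.fn)]
        return precision, recall


state = BinnedCurveState([i / 4 for i in range(5)])  # 0.0, 0.25, ..., 1.0
state.update([0.1, 0.9], [0, 1])
state.update([0.6, 0.4], [1, 0])
precision, recall = state.compute()
print(precision, recall)
```

The trade-off is that the curve is only known at the chosen thresholds, which is why the text calls it an approximation.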
All metrics are faster (ish)
By splitting each metric into 3 separate metrics, we reduce the number of calculations needed. We, therefore, expected out-of-the-box that our new implementations would be faster. The table below shows the timings of different metrics with the old and new implementations (with and without input validation). Numbers in parentheses denote speed-up over old implementations.
The following observations can be made:
- Some metrics are a bit faster (1.3x), and others are much faster (4.6x) after the refactor!
- Disabling input validation can speed things up. For example, multiclass_confusion_matrix goes from a speedup of 3.36x to 4.81x when input validation is disabled. This is a clear advantage for users who are familiar with the metrics and do not need validation of their input at every update.
- If we compare binary with multiclass, the biggest speedup can be seen for multiclass problems.
- Every metric is faster except for the precision-recall curve, even the new approximative binning method. This is a bit strange, as the non-approximation should be equally fast (it's the same code). We are actively looking into this.
[0.10.0] - 2022-10-04
Added
- Added a new NLP metric InfoLM (#915)
- Added Perplexity metric (#922)
- Added ConcordanceCorrCoef metric to regression package (#1201)
- Added argument normalize to LPIPS metric (#1216)
- Added support for multiprocessing of batches in PESQ metric (#1227)
- Added support for multioutput in PearsonCorrCoef and SpearmanCorrCoef (#1200)
Changed
- Classification refactor (#1054, #1143, #1145, #1151, #1159, #1163, #1167, #1175, #1189, #1197, #1215, #1195)
- Changed update in FID metric to be done in an online fashion to save memory (#1199)
- Improved performance of retrieval metrics (#1242)
- Changed SSIM and MSSSIM update to be online to reduce memory usage (#1231)
Fixed
- Fixed a bug in ssim when return_full_image=True where the score was still reduced (#1204)
- Fixed MPS support for:
  - MAE metric (#1210)
  - Jaccard index (#1205)
- Fixed bug in ClasswiseWrapper such that compute gave wrong result (#1225)
- Fixed synchronization of empty list states (#1219)
Contributors
@Borda, @bryant1410, @geoffrey-g-delhomme, @justusschock, @lucadiliello, @nicolas-dufour, @Queuecumber, @SkafteNicki, @stancld
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda over 3 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Minor patch release
[0.9.3] - 2022-08-22
Added
- Added global option sync_on_compute to disable automatic synchronization when compute is called (#1107)
Fixed
- Fixed missing reset in ClasswiseWrapper (#1129)
- Fixed JaccardIndex multi-label compute (#1125)
- Fix SSIM propagate device if gaussian_kernel is False, add test (#1149)
Contributors
@KeVoyer1, @krshrimali, @SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda almost 4 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Minor patch release
[0.9.2] - 2022-06-29
Fixed
- Fixed mAP calculation for areas with 0 predictions (#1080)
- Fixed bug where avg precision state and auroc state were not merged when using MetricCollections (#1086)
- Skip box conversion if no boxes are present in MeanAveragePrecision (#1097)
- Fixed inconsistency in docs and code when setting average="none" in AveragePrecision metric (#1116)
Contributors
@23pointsNorth, @kouyk, @SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda almost 4 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Minor PL compatibility patch
[0.9.1] - 2022-06-08
Added
- Added specific RuntimeError when metric object is on the wrong device (#1056)
- Added an option to specify own n-gram weights for BLEUScore and SacreBLEUScore instead of using uniform weights only. (#1075)
Fixed
- Fixed aggregation metrics when input only contains zero (#1070)
- Fixed TypeError when providing superclass arguments as kwargs (#1069)
- Fixed bug related to state reference in metric collection when using compute groups (#1076)
Contributors
@jlcsilva, @SkafteNicki, @stancld
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda almost 4 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Faster forward
Highlights
TorchMetrics v0.9 is now out, and it brings significant changes to how the forward method works. This blog post goes over these improvements and how they affect both users of TorchMetrics and users that implement custom metrics. TorchMetrics v0.9 also includes several new metrics and bug fixes.
Blog: TorchMetrics v0.9 — Faster forward
The Story of the Forward Method
Since the beginning of TorchMetrics, forward has served the dual purpose of calculating the metric on the current batch and accumulating in a global state. Internally, this was achieved by calling update twice, once for each purpose, which meant repeating the same computation. However, for many metrics, calling update twice is unnecessary to obtain both the local batch statistics and the global accumulation, because the global statistics are simple reductions of the local batch states.
In v0.9, we have finally implemented a logic that can take advantage of this and will only call update once before making a simple reduction. As you can see in the figure below, this can lead to a single call of forward being 2x faster in v0.9 compared to v0.8 of the same metric.
With the improvements to forward, many metrics have become significantly faster (up to 2x)
It should be noted that this change mainly benefits metrics (for example, ConfusionMatrix) where calling update is expensive.
We went through all existing metrics in TorchMetrics and enabled this feature for all appropriate metrics, which was almost 95% of all metrics. We want to stress that if you are using metrics from TorchMetrics, nothing has changed to the API, and no code changes are necessary.
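The single-update forward can be sketched with a toy metric whose state merges by addition. The class `SketchMeanAbsError` below is a hypothetical pure-Python illustration, not the library's base class: forward sets the global state aside, runs update once on an empty state to get the batch state, computes the batch value, and then reduces the batch state into the global one.

```python
class SketchMeanAbsError:
    """Toy metric with an additive state (total, count)."""

    def __init__(self):
        self.total = 0.0
        self.count = 0
        self.update_calls = 0  # only here to show how often update runs

    def update(self, preds, target):
        self.update_calls += 1
        self.total += sum(abs(p - t) for p, t in zip(preds, target))
        self.count += len(target)

    def compute(self):
        return self.total / self.count

    def forward(self, preds, target):
        # Set the accumulated state aside and start from an empty one.
        global_total, global_count = self.total, self.count
        self.total, self.count = 0.0, 0
        self.update(preds, target)      # the single update call
        batch_value = self.compute()    # value for this batch only
        # Reduce: merge the batch state into the global state by summing.
        self.total += global_total
        self.count += global_count
        return batch_value


metric = SketchMeanAbsError()
batch_1 = metric.forward([1.0, 2.0], [1.0, 4.0])
batch_2 = metric.forward([0.0], [3.0])
print(batch_1, batch_2, metric.compute(), metric.update_calls)
```

This shortcut only works when the global state is a simple reduction (sum, min, max, concatenation) of batch states, which matches the condition stated above.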
[0.9.0] - 2022-05-31
Added
- Added RetrievalPrecisionRecallCurve and RetrievalRecallAtFixedPrecision to retrieval package (#951)
- Added class property full_state_update that determines whether forward should call update once or twice (#984, #1033)
- Added support for nested metric collections (#1003)
- Added Dice to classification package (#1021)
- Added support to segmentation type segm as IOU for mean average precision (#822)
Changed
- Renamed reduction argument to average in Jaccard score and added additional options (#874)
Removed
- Removed deprecated compute_on_step argument (#962, #967, #979, #990, #991, #993, #1005, #1004, #1007)
Fixed
- Fixed non-empty state dict for a few metrics (#1012)
- Fixed bug when comparing states while finding compute groups (#1022)
- Fixed torch.double support in stat score metrics (#1023)
- Fixed FID calculation for non-equal size real and fake input (#1028)
- Fixed case where KLDivergence could output Nan (#1030)
- Fixed deterministic for PyTorch<1.8 (#1035)
- Fixed default value for mdmc_average in Accuracy (#1036)
- Fixed missing copy of property when using compute groups in MetricCollection (#1052)
Contributors
@Borda, @burglarhobbit, @charlielito, @gianscarpe, @MrShevan, @phaseolud, @razmikmelikbekyan, @SkafteNicki, @tanmoyio, @vumichien
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda almost 4 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Minor patch release
[0.8.2] - 2022-05-06
Fixed
- Fixed multi-device aggregation in PearsonCorrCoef (#998)
- Fixed MAP metric when using a custom list of thresholds (#995)
- Fixed compatibility between compute groups in MetricCollection and prefix/postfix arg (#1007)
- Fixed compatibility with future Pytorch 1.12 in safe_matmul (#1011, #1014)
Contributors
@ben-davidson-6, @Borda, @SkafteNicki, @tanmoyio
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda about 4 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Minor patch release
[0.8.1] - 2022-04-27
Changed
- Reimplemented the signal_distortion_ratio metric, which removed the absolute requirement of fast-bss-eval (#964)
Fixed
- Fixed "Sort currently does not support bool dtype on CUDA" error in MAP for empty preds (#983)
- Fixed BinnedPrecisionRecallCurve when thresholds argument is not provided (#968)
- Fixed CalibrationError to work on logit input (#985)
Contributors
@DuYicong515, @krshrimali, @quancs, @SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda about 4 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Faster collection and more metrics!
We are excited to announce that TorchMetrics v0.8 is now available. The release includes several new metrics in the classification and image domains and some performance improvements for those working with metrics collections.
Metric collections just got faster
Common wisdom dictates that you should never evaluate the performance of your models using only a single metric but instead a collection of metrics. For example, it is common to simultaneously evaluate the accuracy, precision, recall, and F1 score in classification. In TorchMetrics, we have for a long time provided the MetricCollection object for chaining such metrics together, giving an easy interface to calculate them all at once. However, in many cases, the metrics in such a collection share some of the underlying computations, which were previously repeated for every metric in the collection. In TorchMetrics v0.8 we have introduced the concept of compute_groups to MetricCollection, which by default auto-detects and groups metrics that share some of the same computations.
Thus, if you are using MetricCollections in your code, upgrading to TorchMetrics v0.8 should automatically make your code run faster without any code changes.
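The saving from shared computations can be illustrated with a simplified pure-Python sketch. The names `StatScores` and `SketchCollection` are hypothetical, and the automatic group detection is not reproduced here; the sketch simply assumes that precision, recall, and F1 are already known to share one state, so the expensive update runs once per batch for the whole group.

```python
class StatScores:
    """Shared state for binary metrics: tp, fp and fn counts."""

    def __init__(self):
        self.tp = self.fp = self.fn = 0

    def update(self, preds, target):
        for p, t in zip(preds, target):
            if p == 1 and t == 1:
                self.tp += 1
            elif p == 1 and t == 0:
                self.fp += 1
            elif p == 0 and t == 1:
                self.fn += 1


def precision(s):
    return s.tp / (s.tp + s.fp)


def recall(s):
    return s.tp / (s.tp + s.fn)


def f1(s):
    return 2 * s.tp / (2 * s.tp + s.fp + s.fn)


class SketchCollection:
    """One shared state, several cheap read-outs."""

    def __init__(self, metrics):
        self.metrics = metrics      # name -> function of the shared state
        self.state = StatScores()
        self.update_calls = 0

    def update(self, preds, target):
        self.update_calls += 1      # one update for the whole group
        self.state.update(preds, target)

    def compute(self):
        return {name: fn(self.state) for name, fn in self.metrics.items()}


collection = SketchCollection({"precision": precision, "recall": recall, "f1": f1})
collection.update([1, 1, 0, 0], [1, 0, 1, 0])
results = collection.compute()
print(results, collection.update_calls)
```

Without grouping, the same counts would be accumulated three times, once per metric.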
Many exciting new metrics
TorchMetrics v0.8 includes several new metrics within the classification and image domain, both for the functional and modular API. We refer to the documentation for the full description of all metrics if you want to learn more about them.
- SpectralAngleMapper or SAM was added to the image package. This metric can calculate the spectral similarity between given reference spectra and estimated spectra.
- CoverageError was added to the classification package. This metric can be used when you are working with multi-label data. The metric works similarly to the sklearn counterpart and computes how far you need to go through ranked scores such that all true labels are covered.
- LabelRankingAveragePrecision and LabelRankingLoss were added to the classification package. Both metrics are used in multi-label ranking problems, where the goal is to give a better rank to the labels associated with each sample. Each metric gives a measure of how well your model is doing this.
- ErrorRelativeGlobalDimensionlessSynthesis or ERGAS was added to the image package. This metric can be used to calculate the accuracy of pan-sharpened images considering the normalized average error of each band of the resulting image.
- UniversalImageQualityIndex was added to the image package. This metric can assess the difference between two images and considers three different factors when computed: loss of correlation, luminance distortion, and contrast distortion.
- ClasswiseWrapper was added to the wrapper package. This wrapper can be used in combination with metrics that return multiple values (such as classification metrics with the average=None argument). The wrapper will unwrap the result into a dict with a label for each value.
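As an illustration of the coverage error description, here is a minimal pure-Python sketch of the definition used by the sklearn counterpart. The function `sketch_coverage_error` is hypothetical, not the library's code, and it assumes every sample has at least one true label. For each sample it counts how many labels score at least as high as the lowest-scoring true label, then averages over samples.

```python
def sketch_coverage_error(preds, target):
    """Average depth in the ranked scores needed to cover all true labels."""
    total = 0
    for scores, truth in zip(preds, target):
        # Score of the worst-ranked true label (assumes at least one true label).
        lowest_true = min(s for s, t in zip(scores, truth) if t == 1)
        # Number of labels ranked at or above that score.
        total += sum(1 for s in scores if s >= lowest_true)
    return total / len(preds)


preds = [[0.75, 0.5, 1.0], [1.0, 0.2, 0.1]]
target = [[1, 0, 0], [0, 0, 1]]
coverage = sketch_coverage_error(preds, target)
print(coverage)
```

In the first sample the true label is ranked second, in the second sample it is ranked last of three, so the average depth is 2.5.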
[0.8.0] - 2022-04-14
Added
- Added WeightedMeanAbsolutePercentageError to regression package (#948)
- Added new classification metrics:
  - CoverageError (#787)
  - LabelRankingAveragePrecision and LabelRankingLoss (#787)
- Added new image metric:
  - SpectralAngleMapper (#885)
  - ErrorRelativeGlobalDimensionlessSynthesis (#894)
  - UniversalImageQualityIndex (#824)
  - SpectralDistortionIndex (#873)
- Added support for MetricCollection in MetricTracker (#718)
- Added support for 3D image and uniform kernel in StructuralSimilarityIndexMeasure (#818)
- Added smart update of MetricCollection (#709)
- Added ClasswiseWrapper for better logging of classification metrics with multiple output values (#832)
- Added **kwargs argument for passing additional arguments to base class (#833)
- Added negative ignore_index for the Accuracy metric (#362)
- Added adaptive_k for the RetrievalPrecision metric (#910)
- Added reset_real_features argument to image quality assessment metrics (#722)
- Added new keyword argument compute_on_cpu to all metrics (#867)
Changed
- Made num_classes in jaccard_index a required argument (#853, #914)
- Added normalizer, tokenizer to ROUGE metric (#838)
- Improved shape checking of permutation_invariant_training (#864)
- Allowed reduction None (#891)
- MetricTracker.best_metric will now give a warning when computing on metric that do not have a best (#913)
Deprecated
- Deprecated argument compute_on_step (#792)
- Deprecated passing in dist_sync_on_step, process_group, dist_sync_fn direct argument (#833)
Removed
- Removed support for versions of Lightning lower than v1.5 (#788)
- Removed deprecated functions, and warnings in Text (#773)
  - WER and functional.wer
- Removed deprecated functions and warnings in Image (#796)
  - SSIM and functional.ssim
  - PSNR and functional.psnr
- Removed deprecated functions, and warnings in classification and regression (#806)
  - FBeta and functional.fbeta
  - F1 and functional.f1
  - Hinge and functional.hinge
  - IoU and functional.iou
  - MatthewsCorrcoef
  - PearsonCorrcoef
  - SpearmanCorrcoef
- Removed deprecated functions, and warnings in detection and pairwise (#804)
  - MAP and functional.pairwise.manhatten
- Removed deprecated functions, and warnings in Audio (#805)
  - PESQ and functional.audio.pesq
  - PIT and functional.audio.pit
  - SDR and functional.audio.sdr and functional.audio.si_sdr
  - SNR and functional.audio.snr and functional.audio.si_snr
  - STOI and functional.audio.stoi
Fixed
- Fixed device mismatch for MAP metric in specific cases (#950)
- Improved testing speed (#820)
- Fixed compatibility of ClasswiseWrapper with the prefix argument of MetricCollection (#843)
- Fixed BestScore on GPU (#912)
- Fixed Lsum computation for ROUGEScore (#944)
Contributors
@ankitaS11, @ashutoshml, @Borda, @hookSSi, @justusschock, @lucadiliello, @quancs, @rusty1s, @SkafteNicki, @stancld, @vumichien, @weningerleon, @yassersouri
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda about 4 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Minor patch release
[0.7.3] - 2022-03-22
Fixed
- Fixed unsafe log operation in TweedieDevianceScore for power=1 (#847)
- Fixed bug in MAP metric related to either no ground truth or no predictions (#884)
- Fixed ConfusionMatrix, AUROC and AveragePrecision on GPU when running in deterministic mode (#900)
- Fixed NaN or Inf results returned by signal_distortion_ratio (#899)
- Fixed memory leak when using update method with tensor where requires_grad=True (#902)
Contributors
@mtailanian, @quancs, @SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda about 4 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - JOSS paper
[0.7.2] - 2022-02-10
Fixed
- Minor patches in JOSS paper.
Published by Borda over 4 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Improve mAP performance
[0.7.1] - 2022-02-03
Changed
- Used torch.bucketize in calibration error when torch>1.8 for faster computations (#769)
- Improve mAP performance (#742)
Fixed
- Fixed check for available modules (#772)
- Fixed Matthews correlation coefficient when the denominator is 0 (#781)
Contributors
@Borda, @ramonemiliani93, @SkafteNicki, @twsl
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda over 4 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - New NLP metrics and improved API
We are excited to announce that TorchMetrics v0.7 is now publicly available. This release is pretty significant. It includes several new metrics (mainly for NLP), naming and import changes, general improvements to the API, and some other great features. TorchMetrics thus now has over 60 metrics, and the package is more user-friendly than ever.
NLP metrics - Text package
Text package is a part of TorchMetrics as of v0.5. With the growing capability of language generation models, there is also a real need to have reliable evaluation metrics. With several added metrics and unified API, TorchMetrics makes the usage of various metrics even easier! TorchMetrics v0.7 newly includes a couple of machine translation metrics such as chrF, chrF++, Translation Edit Rate, or Extended Edit Distance. Furthermore, it also supports other metrics - Match Error Rate, Word Information Lost, Word Information Preserved, and SQuAD evaluation metrics. Last but not least, we also made possible the evaluation of the ROUGE score using multiple references.
Argument unification
Importantly, all text metrics assume the input order preds, target, with these names as explicit keyword arguments. If different naming was used before v0.7, it is deprecated and will be completely removed in v0.8.
Import and naming changes
TorchMetrics v0.7 brings both more extensive and minor changes to how metrics should be imported. The import changes directly impact v0.7, meaning that you will most likely need to change the import statement for some specific metrics. All naming changes follow our standard deprecation process, meaning that in v0.7, any metric that is renamed will still work but raise a warning asking you to use the new metric name. From v0.8, the old metric names will no longer be available.
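A rename with a deprecation period is commonly implemented as a thin alias that keeps working while telling the user to switch. The sketch below shows one way to do this with the standard `warnings` module; the classes `SketchF1Score` and `SketchF1` are hypothetical and do not reproduce the library's own deprecation helpers.

```python
import warnings


class SketchF1Score:
    """New, preferred name."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold


class SketchF1(SketchF1Score):
    """Old name, kept for one release and forwarding to the new class."""

    def __init__(self, *args, **kwargs):
        warnings.warn(
            "`SketchF1` was renamed to `SketchF1Score` and will be removed "
            "in a later release.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        super().__init__(*args, **kwargs)


# Record warnings so the behaviour can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = SketchF1(threshold=0.3)

print(type(old_style).__name__, old_style.threshold, len(caught))
```

Once the deprecation period ends, the alias class is deleted and only the new name remains importable.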
[0.7.0] - 2022-01-17
Added
- Added NLP metrics:
  - MatchErrorRate (#619)
  - WordInfoLost and WordInfoPreserved (#630)
  - SQuAD (#623)
  - CHRFScore (#641)
  - TranslationEditRate (#646)
  - ExtendedEditDistance (#668)
- Added MultiScaleSSIM into image metrics (#679)
- Added Signal to Distortion Ratio (SDR) to audio package (#565)
- Added MinMaxMetric to wrappers (#556)
- Added ignore_index to retrieval metrics (#676)
- Added support for multi references in ROUGEScore (#680)
- Added a default VSCode devcontainer configuration (#621)
Changed
- Scalar metrics will now consistently have additional dimensions squeezed (#622)
- Metrics having third party dependencies removed from global import (#463)
- Untokenized for BLEUScore input stay consistent with all the other text metrics (#640)
- Arguments reordered for TER, BLEUScore, SacreBLEUScore, CHRFScore now the expected input order is predictions first and target second (#696)
- Changed dtype of metric state from torch.float to torch.long in ConfusionMatrix to accommodate larger values (#715)
- Unify preds, target input argument's naming across all text metrics (#723, #727)
  - bert, bleu, chrf, sacre_bleu, wip, wil, cer, ter, wer, mer, rouge, squad
Deprecated
- Renamed IoU -> Jaccard Index (#662)
- Renamed text WER metric: (#714)
  - functional.wer -> functional.word_error_rate
  - WER -> WordErrorRate
- Renamed correlation coefficient classes: (#710)
  - MatthewsCorrcoef -> MatthewsCorrCoef
  - PearsonCorrcoef -> PearsonCorrCoef
  - SpearmanCorrcoef -> SpearmanCorrCoef
- Renamed audio STOI metric: (#753, #758)
  - audio.STOI to audio.ShortTimeObjectiveIntelligibility
  - functional.audio.stoi to functional.audio.short_time_objective_intelligibility
- Renamed audio PESQ metrics: (#751)
  - functional.audio.pesq -> functional.audio.perceptual_evaluation_speech_quality
  - audio.PESQ -> audio.PerceptualEvaluationSpeechQuality
- Renamed audio SDR metrics: (#711)
  - functional.sdr -> functional.signal_distortion_ratio
  - functional.si_sdr -> functional.scale_invariant_signal_distortion_ratio
  - SDR -> SignalDistortionRatio
  - SI_SDR -> ScaleInvariantSignalDistortionRatio
- Renamed audio SNR metrics: (#712)
  - functional.snr -> functional.signal_noise_ratio
  - functional.si_snr -> functional.scale_invariant_signal_noise_ratio
  - SNR -> SignalNoiseRatio
  - SI_SNR -> ScaleInvariantSignalNoiseRatio
- Renamed F-score metrics: (#731, #740)
  - functional.f1 -> functional.f1_score
  - F1 -> F1Score
  - functional.fbeta -> functional.fbeta_score
  - FBeta -> FBetaScore
- Renamed Hinge metric: (#734)
  - functional.hinge -> functional.hinge_loss
  - Hinge -> HingeLoss
- Renamed image PSNR metrics (#732)
  - functional.psnr -> functional.peak_signal_noise_ratio
  - PSNR -> PeakSignalNoiseRatio
- Renamed audio PIT metric: (#737)
  - functional.pit -> functional.permutation_invariant_training
  - PIT -> PermutationInvariantTraining
- Renamed image SSIM metric: (#747)
  - functional.ssim -> functional.structural_similarity_index_measure
  - SSIM -> StructuralSimilarityIndexMeasure
- Renamed detection MAP to MeanAveragePrecision metric (#754)
- Renamed Fidelity & LPIPS image metric: (#752)
  - image.FID -> image.FrechetInceptionDistance
  - image.KID -> image.KernelInceptionDistance
  - image.LPIPS -> image.LearnedPerceptualImagePatchSimilarity
Removed
- Removed embedding_similarity metric (#638)
- Removed argument concatenate_texts from wer metric (#638)
- Removed arguments newline_sep and decimal_places from rouge metric (#638)
Fixed
- Fixed MetricCollection kwargs filtering when no kwargs are present in update signature (#707)
Contributors
@ashutoshml, @Borda, @cuent, @Fariborzzz, @getgaurav2, @janhenriklambrechts, @justusschock, @karthikrangasai, @lucadiliello, @mahinlma, @mathemusician, @mona0809, @mrleu, @puhuk, @quancs, @SkafteNicki, @stancld, @twsl
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda over 4 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Fixing mAP on GPU
[0.6.2] - 2021-12-15
Fixed
- Fixed torch.sort currently does not support bool dtype on CUDA (#665)
- Fixed mAP properly checks if ground truths are empty (#684)
- Fixed initialization of tensors to be on the correct device for MAP metric (#673)
Contributors
@OlofHarrysson, @tkupek, @twsl
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda over 4 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Own mAP implementation
[0.6.1] - 2021-12-06
Changed
- Migrate MAP metrics from pycocotools to PyTorch (#632)
- Use torch.topk instead of torch.argsort in retrieval precision for speedup (#627)
Fixed
- Fix empty predictions in MAP metric (#594, #610, #624)
- Fix edge case of AUROC with average=weighted on GPU (#606)
- Fixed forward in compositional metrics (#645)
Contributors
@Callidior, @SkafteNicki, @tkupek, @twsl, @zuoxingdong
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda over 4 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - More metrics than ever
[0.6.0] - 2021-10-28
We are excited to announce that TorchMetrics v0.6 is now publicly available. TorchMetrics v0.6 does not focus on specific domains but adds a ton of new metrics to several domains, thus increasing the number of metrics in the repository to over 60! Not only has v0.6 added metrics within already covered domains, but we have also added support for two new domains: pairwise metrics and detection.
https://devblog.pytorchlightning.ai/torchmetrics-v0-6-more-metrics-than-ever-e98c3983621e
Pairwise Metrics
TorchMetrics v0.6 offers a new set of metrics in its functional backend for calculating pairwise distances. Given a tensor X with shape [N,d] (N observations, each in d dimensions), a pairwise metric calculates the [N,N] matrix of all possible combinations between the rows of X.
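The shape contract of a pairwise metric can be shown with a small pure-Python sketch using Euclidean distance. The function `sketch_pairwise_euclidean` is a hypothetical helper operating on nested lists, not the library's tensor implementation: entry [i][j] of the result is the distance between row i and row j of X.

```python
import math


def sketch_pairwise_euclidean(x):
    """Return the [N, N] matrix of Euclidean distances between rows of x."""
    return [
        [math.sqrt(sum((a - b) ** 2 for a, b in zip(row_i, row_j)))
         for row_j in x]
        for row_i in x
    ]


x = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]  # N = 3 observations, d = 2
distances = sketch_pairwise_euclidean(x)
for row in distances:
    print(row)
```

The result is symmetric with zeros on the diagonal, since every row has distance 0 to itself.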
Detection
TorchMetrics v0.6 now includes a detection package that provides the MAP metric. The implementation essentially wraps pycocotools, ensuring that we get the correct value, but with the benefit of now being able to scale to multiple devices (as any other metric in TorchMetrics).
New additions
- In the `audio` package, we have two new metrics: Perceptual Evaluation of Speech Quality (PESQ) and Short Term Objective Intelligibility (STOI). Both metrics can be used to assess speech quality.
- In the `retrieval` package, we also have two new metrics: R-precision and Hit-rate. R-precision is the precision (equivalently, the recall) at the R-th position of the ranking, where R is the number of relevant documents for the query. The hit rate for a query is 1 if at least one relevant document appears among the top k retrieved documents and 0 otherwise.
- The `text` package also receives an update in the form of two new metrics: Sacre BLEU score and character error rate. Sacre BLEU score provides a more systematic way of comparing BLEU scores across tasks. The character error rate is similar to the word error rate but instead scores a predicted sentence based on a character-by-character comparison.
- The `regression` package got a single new metric in the form of the Tweedie deviance score metric. Deviance scores are generally a better measure of fit than measures such as squared error when trying to model data coming from highly skewed distributions.
- Finally, we have added five new metrics for simple aggregation: `SumMetric`, `MeanMetric`, `MinMetric`, `MaxMetric`, `CatMetric`. All five metrics take in a single input (either native Python floats or `torch.Tensor`) and keep track of the sum, average, min, etc. These new aggregation metrics are especially useful in combination with `self.log` from Lightning if you want to log something other than the average of the metric you are tracking.
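As a rough illustration of the two retrieval metrics, the sketch below computes them for a single query in plain Python. It assumes the common definitions (R-precision as precision over the top R documents, with R the number of relevant documents, and hit rate as 1 when any relevant document is in the top k); the function names are hypothetical and this is not the library's implementation.

```python
def rank_by_score(scores, relevant):
    """Return the relevance labels ordered from highest to lowest score."""
    pairs = sorted(zip(scores, relevant), key=lambda p: p[0], reverse=True)
    return [rel for _, rel in pairs]


def r_precision(scores, relevant):
    """Fraction of relevant documents among the top R, R = number of relevant documents."""
    r = sum(relevant)
    if r == 0:
        return 0.0
    return sum(rank_by_score(scores, relevant)[:r]) / r


def hit_rate(scores, relevant, k):
    """1.0 if at least one relevant document is in the top k, else 0.0."""
    return 1.0 if any(rank_by_score(scores, relevant)[:k]) else 0.0


scores = [0.9, 0.8, 0.4, 0.3, 0.1]
relevant = [1, 0, 1, 0, 0]  # two relevant documents, so R = 2

print(r_precision(scores, relevant))    # one of the top 2 is relevant
print(hit_rate(scores, relevant, k=1))  # the top document is relevant
```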
Detail changes
Added
- Added audio metrics:
  - Perceptual Evaluation of Speech Quality (PESQ) (#353)
  - Short Term Objective Intelligibility (STOI) (#353)
- Added Information retrieval metrics:
  - `RetrievalRPrecision` (#577)
  - `RetrievalHitRate` (#576)
- Added NLP metrics:
  - `SacreBLEUScore` (#546)
  - `CharErrorRate` (#575)
- Added other metrics:
  - Tweedie Deviance Score (#499)
  - Learned Perceptual Image Patch Similarity (LPIPS) (#431)
- Added `MAP` (mean average precision) metric to new detection package (#467)
- Added support for float targets in `nDCG` metric (#437)
- Added `average` argument to `AveragePrecision` metric for reducing multi-label and multi-class problems (#477)
- Added `MultioutputWrapper` (#510)
- Added metric sweeping:
  - `higher_is_better` as constant attribute (#544)
  - `higher_is_better` to rest of codebase (#584)
- Added simple aggregation metrics: `SumMetric`, `MeanMetric`, `CatMetric`, `MinMetric`, `MaxMetric` (#506)
- Added pairwise submodule with metrics (#553)
  - `pairwise_cosine_similarity`
  - `pairwise_euclidean_distance`
  - `pairwise_linear_similarity`
  - `pairwise_manhatten_distance`
Changed
- `AveragePrecision` will now as default output the `macro` average for multilabel and multiclass problems (#477)
- `half`, `double`, `float` will no longer change the dtype of the metric states. Use `metric.set_dtype` instead (#493)
- Renamed `AverageMeter` to `MeanMetric` (#506)
- Changed `is_differentiable` from property to a constant attribute (#551)
- `ROC` and `AUROC` will no longer throw an error when either the positive or negative class is missing. Instead, return 0 scores and give a warning
Deprecated
- Deprecated `torchmetrics.functional.self_supervised.embedding_similarity` in favour of new pairwise submodule
Removed
- Removed `dtype` property (#493)
Fixed
- Fixed bug in `F1` with `average='macro'` and `ignore_index!=None` (#495)
- Fixed bug in `pit` by using the returned first result to initialize device and type (#533)
- Fixed `SSIM` metric using too much memory (#539)
- Fixed bug where `device` property was not properly updated when the metric was a child of a module (#542)
Contributors
@an1lam, @Borda, @karthikrangasai, @lucadiliello, @mahinlma, @Obus, @quancs, @SkafteNicki, @stancld, @tkupek
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda over 4 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Own NLP implementations
[0.5.1] - 2021-08-30
Added
- Added `device` and `dtype` properties (#462)
- Added `TextTester` class for robustly testing text metrics (#450)
Changed
- Added support for float targets in `nDCG` metric (#437)
Removed
- Removed `rouge-score` as dependency for text package (#443)
- Removed `jiwer` as dependency for text package (#446)
- Removed `bert-score` as dependency for text package (#473)
Fixed
- Fixed ranking of samples in `SpearmanCorrCoef` metric (#448)
- Fixed bug where compositional metrics were unable to sync because of type mismatch (#454)
- Fixed metric hashing (#478)
- Fixed `BootStrapper` metrics not working on GPU (#462)
- Fixed the semantic ordering of kernel height and width in `SSIM` metric (#474)
Contributors
@justusschock, @karthikrangasai, @kingyiusuen, @Obus, @SkafteNicki, @stancld
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda over 4 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Text-related (NLP) metrics
[0.5.0] - 2021-08-09
This release includes general improvements to the library and new metrics within the NLP domain.
https://devblog.pytorchlightning.ai/torchmetrics-v0-5-nlp-metrics-f4232467b0c5
Natural language processing is arguably one of the most exciting areas of machine learning, with models such as BERT, RoBERTa, and GPT-3 really pushing what automated text translation, recognition, and generation systems are capable of.
With the introduction of these models, many metrics have been proposed that measure how well these models perform. TorchMetrics v0.5 includes 4 such metrics: BERT score, BLEU, ROUGE and WER.
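For a feel of what one of these metrics measures, the following is a minimal plain-Python sketch of word error rate under its usual definition: the word-level edit distance between hypothesis and reference, divided by the number of reference words. The function names are hypothetical, whitespace tokenization is an assumption, and the library's own implementation may differ in details.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance normalized by the reference length."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors over 6 words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(wer)
```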
Detail changes
Added
- Added Text-related (NLP) metrics:
  - Word Error Rate (WER) (#383)
  - ROUGE (#399)
  - BERT score (#424)
  - BLEU score (#360)
- Added `MetricTracker` wrapper metric for keeping track of the same metric over multiple epochs (#238)
- Added other metrics:
  - Symmetric Mean Absolute Percentage error (SMAPE) (#375)
  - Calibration error (#394)
  - Permutation Invariant Training (PIT) (#384)
- Added support in `nDCG` metric for target with values larger than 1 (#349)
- Added support for negative targets in `nDCG` metric (#378)
- Added `None` as reduction option in `CosineSimilarity` metric (#400)
- Allowed passing labels in (nsamples, nclasses) to `AveragePrecision` (#386)
Changed
- Moved `psnr` and `ssim` from `functional.regression.*` to `functional.image.*` (#382)
- Moved `image_gradient` from `functional.image_gradients` to `functional.image.gradients` (#381)
- Moved `R2Score` from `regression.r2score` to `regression.r2` (#371)
- Pearson metric now only stores 6 statistics instead of all predictions and targets (#380)
- Use `torch.argmax` instead of `torch.topk` when `k=1` for better performance (#419)
- Moved check for number of samples in R2 score to support single sample updating (#426)
Deprecated
- Rename `r2score` >> `r2_score` and `kldivergence` >> `kl_divergence` in `functional` (#371)
- Moved `bleu_score` from `functional.nlp` to `functional.text.bleu` (#360)
Removed
- Removed restriction that `threshold` has to be in (0,1) range to support logit input (#351, #401)
- Removed restriction that `preds` could not be bigger than `num_classes` to support logit input (#357)
- Removed module `regression.psnr` and `regression.ssim` (#382)
- Removed (#379):
  - function `functional.mean_relative_error`
  - `num_thresholds` argument in `BinnedPrecisionRecallCurve`
Fixed
- Fixed bug where classification metrics with `average='macro'` would lead to wrong result if a class was missing (#303)
- Fixed `weighted`, `multi-class` AUROC computation to allow for 0 observations of some class, as contribution to final AUROC is 0 (#376)
- Fixed that `_forward_cache` and `_computed` attributes are also moved to the correct device if metric is moved (#413)
- Fixed calculation in `IoU` metric when using `ignore_index` argument (#328)
Contributors
@BeyondTheProof, @Borda, @CSautier, @discort, @edwardclem, @gagan3012, @hugoperrin, @karthikrangasai, @paul-grundmann, @quancs, @rajs96, @SkafteNicki, @vatch123
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda almost 5 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Fixing DDP sync
[0.4.1] - 2021-07-05
Changed
- Extend typing (#330, #332, #333, #335, #314)
Fixed
- Fixed DDP by adding `is_sync` logic to `Metric` (#339)
Published by Borda almost 5 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Multimedia - audio & image quality
Overview
https://devblog.pytorchlightning.ai/torchmetrics-v0-4-introducing-multimedia-metrics-e6380a3ad354
Audio
The first highlight of v0.4.0 is a set of 3 new metrics for evaluating audio data: Scale-invariant signal-to-distortion ratio, Scale-invariant signal-to-noise ratio, and signal-to-noise ratio. All these metrics take a predicted audio tensor and a target tensor, both with the shape [...,time], and calculate the metric over the time axis.
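The difference between the plain and the scale-invariant variants can be sketched for a single 1-D signal in plain Python. The formulas below are the common textbook definitions (energy ratio in decibels, with the scale-invariant version projecting the zero-mean prediction onto the zero-mean target); they are assumptions for illustration, the helper names are hypothetical, and the library may handle details such as zero-mean normalization differently.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def snr(preds, target):
    """Signal-to-noise ratio in dB over the time axis."""
    noise = [t - p for p, t in zip(preds, target)]
    return 10 * math.log10(dot(target, target) / dot(noise, noise))


def si_snr(preds, target):
    """Scale-invariant SNR in dB: rescaling preds leaves the value unchanged."""
    p_mean = sum(preds) / len(preds)
    t_mean = sum(target) / len(target)
    p = [x - p_mean for x in preds]
    t = [x - t_mean for x in target]
    scale = dot(p, t) / dot(t, t)
    projection = [scale * x for x in t]  # part of preds explained by target
    noise = [a - b for a, b in zip(p, projection)]
    return 10 * math.log10(dot(projection, projection) / dot(noise, noise))


target = [1.0, -1.0, 1.0, -1.0]
preds = [0.9, -1.1, 1.2, -0.8]
louder = [2.0 * x for x in preds]  # same waveform at twice the volume

print(snr(preds, target), snr(louder, target))        # SNR drops when preds is rescaled
print(si_snr(preds, target), si_snr(louder, target))  # SI-SNR stays the same
```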
Image
Version v0.4.0 also includes a completely new image package. Since its initial 0.2.0 release, TorchMetrics has had both PSNR and SSIM in its regression module, metrics that can be used to evaluate image quality. With the image module, we are adding three new metrics for evaluating the quality of generative models (such as GANs): Inception score (IS), Fréchet inception distance (FID) and kernel inception distance (KID).
More Functionality
In addition to the new audio and image package, we also want to highlight a couple of features:
* Addition of MeanAbsolutePercentageError (MAPE) metric to the regression package. Useful in regression settings where you want to focus on the relative instead of absolute error.
* Addition of KLDivergence metric to the classification package. Useful for measuring the distance between probability distributions like the ones outputted in variational auto-encoders.
* Addition of CosineSimilarity metric to the regression package. Useful for calculating the angle between two embedding vectors in domains such as metric learning.
* As requested by multiple users, Accuracy, Precision, Recall, FBeta, F1, StatScore, Hamming, ConfusionMatrix now directly support that predictions can be unnormalized, e.g. logits from your model. No need to call .softmax(dim=-1) anymore!
* All modular metrics now have both `sync` and `sync_context` methods that allow the user full control over when metric states are synced. Note that we still automatically do this whenever calling the compute method.
* The is_differentiable property has been adopted by many more of our metrics!
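To illustrate the relative-versus-absolute distinction behind MAPE mentioned in the list above, here is a small plain-Python sketch using the standard formulas. The function names are hypothetical, and the sketch assumes no target value is zero (a real implementation needs some guard for that case).

```python
def mean_absolute_error(preds, target):
    return sum(abs(p - t) for p, t in zip(preds, target)) / len(target)


def mean_absolute_percentage_error(preds, target):
    # Assumes no target value is zero.
    return sum(abs(p - t) / abs(t) for p, t in zip(preds, target)) / len(target)


target = [1.0, 100.0]
preds = [2.0, 101.0]

# Both predictions are off by exactly 1, so the absolute error treats them alike,
mae = mean_absolute_error(preds, target)
# while the relative error is 100% for the first sample and 1% for the second.
mape = mean_absolute_percentage_error(preds, target)
print(mae, mape)
```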
Thanks
Big thanks to all community members for their contributions and feedback. A special thanks to @quancs for leading the development of the new audio package.
[0.4.0] - 2021-06-24
Added
- Added Cosine Similarity metric (#305)
- Added Specificity metric (#210)
- Added `add_metrics` method to `MetricCollection` for adding additional metrics after initialization (#221)
- Added pre-gather reduction in the case of `dist_reduce_fx="cat"` to reduce communication cost (#217)
- Added better error message for `AUROC` when `num_classes` is not provided for multiclass input (#244)
- Added support for unnormalized scores (e.g. logits) in `Accuracy`, `Precision`, `Recall`, `FBeta`, `F1`, `StatScore`, `Hamming`, `ConfusionMatrix` metrics (#200)
- Added `MeanAbsolutePercentageError(MAPE)` metric. (#248)
- Added `squared` argument to `MeanSquaredError` for computing `RMSE` (#249)
- Added FID metric (#213)
- Added `is_differentiable` property to `ConfusionMatrix`, `F1`, `FBeta`, `Hamming`, `Hinge`, `IOU`, `MatthewsCorrcoef`, `Precision`, `Recall`, `PrecisionRecallCurve`, `ROC`, `StatScores` (#253)
- Added audio metrics: SNR, SISDR, SISNR (#292)
- Added Inception Score metric to image module (#299)
- Added KID metric to image module (#301)
- Added `sync` and `sync_context` methods for manually controlling when metric states are synced (#302)
- Added `KLDivergence` metric (#247)
Changed
- Forward cache is reset when `reset` method is called (#260)
- Improved per-class metric handling for imbalanced datasets for `precision`, `recall`, `precision_recall`, `fbeta`, `f1`, `accuracy`, and `specificity` (#204)
- Decorated `torch.jit.unused` to `MetricCollection` forward (#307)
- Renamed `thresholds` argument to binned metrics for manually controlling the thresholds (#322)
Deprecated
- Deprecated `torchmetrics.functional.mean_relative_error` (#248)
- Deprecated `num_thresholds` argument in `BinnedPrecisionRecallCurve` (#322)
Removed
- Removed argument `is_multiclass` (#319)
Fixed
- AUC can also support more dimensional inputs when all but one dimension are of size 1 (#242)
- Fixed `dtype` of modular metrics after reset has been called (#243)
- Fixed calculation in `matthews_corrcoef` to correctly match formula (#321)
Contributors
@AnselmC, @arvindmuralie77, @bhadreshpsavani, @Borda, @GiannisVagionakis, @hassiahk, @IgorHoholko, @johannespitz, @justusschock, @maximsch2, @pranjaldatta, @quancs, @simran2905, @SkafteNicki, @tchaton
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda almost 5 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Minor patch release
[0.3.2] - 2021-05-10
Added
- Added `is_differentiable` property:
  - To `AUC`, `AUROC`, `CohenKappa` and `AveragePrecision` (#178)
  - To `PearsonCorrCoef`, `SpearmanCorrcoef`, `R2Score` and `ExplainedVariance` (#225)
Changed
- `MetricCollection` should return metrics with prefix on `items()`, `keys()` (#209)
- Calling `compute` before `update` will now give a warning (#164)
Removed
- Removed `numpy` as dependency (#212)
Fixed
- Fixed auc calculation and add tests (#197)
- Fixed loading persisted metric states using `load_state_dict()` (#202)
- Fixed `PSNR` not working with `DDP` (#214)
- Fixed metric calculation with unequal batch sizes (#220)
- Fixed metric concatenation for list states for zero-dim input (#229)
- Fixed numerical instability in `AUROC` metric for large input (#230)
Contributors
@bhadreshpsavani, @hlin09, @maximsch2, @SkafteNicki, @tchaton
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda about 5 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Minor PL development patch
Cleaned up remaining inconsistencies and fixed PL develop integration (#191, #192, #193, #194)
Published by Borda about 5 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Information retrieval
Information Retrieval
Information retrieval (IR) metrics are used to evaluate how well a system is retrieving information from a database or from a collection of documents. This is the case with search engines, where a query provided by the user is compared with many possible results, some of which are relevant and some are not.
When you query a search engine, you hope that results that could be useful are ranked higher on the results page. However, each query is usually compared with a different set of documents. For this reason, we had to implement a mechanism to allow users to easily compute the IR metrics in cases where each query is compared with a different number of possible candidates.
To support this, IR metrics feature an additional argument called indexes that specifies which query each prediction refers to. In the end, all query-document pairs are grouped by query index, and the final result is computed as the average of the metric over each group.
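The grouping step can be sketched in plain Python as follows. This is only an illustration of the idea described above, not the library's implementation: the helper names are hypothetical, and reciprocal rank is used as the example per-query metric.

```python
from collections import defaultdict


def reciprocal_rank(scores, relevant):
    """1 / rank of the first relevant document, or 0.0 if there is none."""
    ranked = sorted(zip(scores, relevant), key=lambda p: p[0], reverse=True)
    for rank, (_, rel) in enumerate(ranked, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def grouped_average(preds, target, indexes, per_query_metric):
    """Group query-document pairs by query index, then average the metric over groups."""
    groups = defaultdict(lambda: ([], []))
    for p, t, i in zip(preds, target, indexes):
        groups[i][0].append(p)
        groups[i][1].append(t)
    values = [per_query_metric(s, r) for s, r in groups.values()]
    return sum(values) / len(values)


# Query 0 has three candidate documents, query 1 has two.
indexes = [0, 0, 0, 1, 1]
preds = [0.2, 0.7, 0.5, 0.9, 0.3]
target = [0, 0, 1, 1, 0]

# Query 0: first relevant document at rank 2 -> 0.5; query 1: at rank 1 -> 1.0.
mrr = grouped_average(preds, target, indexes, reciprocal_rank)
print(mrr)
```

Because each query is scored on its own group before averaging, queries with different numbers of candidates contribute equally to the final value.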
In total 6 new metrics have been added for doing information retrieval:
- RetrievalMAP (Mean Average Precision)
- RetrievalMRR (Mean Reciprocal Rank)
- RetrievalPrecision (Precision for IR)
- RetrievalRecall (Recall for IR)
- RetrievalNormalizedDCG (Normalized Discounted Cumulative Gain)
- RetrievalFallOut (Fall Out rate for IR)
Special thanks go to @lucadiliello, for implementing all IR.
Expanding and improving the collection
In addition to expanding our collection to the field of information retrieval, this release also includes new metrics for the classification domain:
- BootStrapper metric that can wrap around any other metric in our collection for easy computation of confidence intervals
- CohenKappa is a statistic that is used to measure inter-rater reliability for qualitative (categorical) items
- MatthewsCorrcoef or phi coefficient is used in machine learning as a measure of the quality of binary (two-class) classifications
- Hinge loss is used for "maximum-margin" classification, most notably for support vector machines.
- PearsonCorrcoef is a metric for measuring the linear correlation between two sets of data
- SpearmanCorrcoef is a metric for measuring the rank correlation between two sets of data. It assesses how well the relationship between two variables can be described using a monotonic function.
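The difference between linear and rank correlation can be seen in a short plain-Python sketch. It uses the textbook definitions (Spearman as Pearson correlation of the ranks) and, as a simplifying assumption, ignores ties; the function names are hypothetical and this is not the library's implementation.

```python
import math


def pearson(x, y):
    """Linear correlation between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """Rank 1 for the smallest value; assumes there are no ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = float(rank)
    return result


def spearman(x, y):
    """Rank correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 3 for v in x]  # monotonic but non-linear relationship

print(pearson(x, y))   # below 1, because the relationship is not linear
print(spearman(x, y))  # 1, because the relationship is perfectly monotonic
```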
Binned metrics
The current implementations of AveragePrecision and PrecisionRecallCurve have the drawback that they save all predictions and targets in memory to correctly calculate the metric value. These metrics now receive a binned version that calculates the value at fixed thresholds. This is less precise than the original implementations but also much more memory efficient.
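The memory trade-off can be illustrated with a minimal plain-Python sketch: instead of storing every prediction, only a few counters per threshold are kept and updated batch by batch. The class name `BinnedPrecisionRecallSketch` is hypothetical, returning NaN for an undefined ratio is an assumption, and the library's binned metrics may be implemented differently.

```python
class BinnedPrecisionRecallSketch:
    """Precision and recall at fixed thresholds, using constant-size state."""

    def __init__(self, thresholds):
        self.thresholds = list(thresholds)
        self.tp = [0] * len(self.thresholds)
        self.fp = [0] * len(self.thresholds)
        self.fn = [0] * len(self.thresholds)

    def update(self, preds, target):
        # Only the counters change; the batch itself is not stored.
        for i, threshold in enumerate(self.thresholds):
            for score, label in zip(preds, target):
                predicted_positive = score >= threshold
                if predicted_positive and label == 1:
                    self.tp[i] += 1
                elif predicted_positive:
                    self.fp[i] += 1
                elif label == 1:
                    self.fn[i] += 1

    def compute(self):
        nan = float("nan")  # used when a ratio is undefined
        precision = [tp / (tp + fp) if tp + fp else nan
                     for tp, fp in zip(self.tp, self.fp)]
        recall = [tp / (tp + fn) if tp + fn else nan
                  for tp, fn in zip(self.tp, self.fn)]
        return precision, recall


metric = BinnedPrecisionRecallSketch(thresholds=[0.25, 0.5, 0.75])
metric.update([0.9, 0.6, 0.3, 0.1], [1, 0, 1, 0])  # first batch
metric.update([0.8, 0.4], [1, 1])                  # second batch
precision, recall = metric.compute()
print(precision)
print(recall)
```

The state grows with the number of thresholds rather than with the number of samples, which is where the memory saving comes from; the price is that the curve is only known at the chosen thresholds.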
Special thanks go to @SkafteNicki, for letting all this happen.
https://devblog.pytorchlightning.ai/torchmetrics-v0-3-0-information-retrieval-metrics-and-more-c55265e9b94f
[0.3.0] - 2021-04-20
Added
- Added `BootStrapper` to easily calculate confidence intervals for metrics (#101)
- Added Binned metrics (#128)
- Added metrics for Information Retrieval:
  - Added `RetrievalMAP` (PL^5032)
  - Added `RetrievalMRR` (#119)
  - Added `RetrievalPrecision` (#139)
  - Added `RetrievalRecall` (#146)
  - Added `RetrievalNormalizedDCG` (#160)
  - Added `RetrievalFallOut` (#161)
- Added other metrics:
  - Added `CohenKappa` (#69)
  - Added `MatthewsCorrcoef` (#98)
  - Added `PearsonCorrcoef` (#157)
  - Added `SpearmanCorrcoef` (#158)
  - Added `Hinge` (#120)
- Added `average='micro'` as an option in AUROC for multilabel problems (#110)
- Added multilabel support to `ROC` metric (#114)
- Added testing for `half` precision (#77, #135)
- Added `AverageMeter` for ad-hoc averages of values (#138)
- Added `prefix` argument to `MetricCollection` (#70)
- Added `__getitem__` as metric arithmetic operation (#142)
- Added property `is_differentiable` to metrics and test for differentiability (#154)
- Added support for `average`, `ignore_index` and `mdmc_average` in `Accuracy` metric (#166)
- Added `postfix` arg to `MetricCollection` (#188)
Changed
- Changed `ExplainedVariance` from storing all preds/targets to tracking 5 statistics (#68)
- Changed behavior of `confusionmatrix` for multilabel data to better match `multilabel_confusion_matrix` from sklearn (#134)
- Updated FBeta arguments (#111)
- Changed `reset` method to use `detach.clone()` instead of `deepcopy` when resetting to default (#163)
- Metrics passed as dict to `MetricCollection` will now always be in deterministic order (#173)
- Allowed `MetricCollection` to receive metrics as arguments (#176)
Deprecated
- Rename argument `is_multiclass` -> `multiclass` (#162)
Removed
- Prune remaining deprecated (#92)
Fixed
- Fixed `_stable_1d_sort` to work when `n>=N` (PL^6177)
- Fixed `_computed` attribute not being correctly reset (#147)
- Fixed BLEU score (#165)
- Fixed backwards compatibility for logging with older version of pytorch-lightning (#182)
Contributors
@alanhdu, @arvindmuralie77, @bhadreshpsavani, @Borda, @ethanwharris, @lucadiliello, @maximsch2, @SkafteNicki, @thomasgaudelet, @victorjoos
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda about 5 years ago
TorchMetrics - Measuring Reproducibility in PyTorch - Initial release
What is Torchmetrics
TorchMetrics is a collection of 25+ PyTorch metrics implementations and an easy-to-use API to create custom metrics. It offers:
- A standardized interface to increase reproducibility
- Reduced boilerplate
- Distributed-training compatible
- Automatic accumulation over batches
- Automatic synchronization between multiple devices
You can use TorchMetrics in any PyTorch model, or within PyTorch Lightning to enjoy additional features:
- Module metrics are automatically placed on the correct device.
- Native support for logging metrics in Lightning to reduce even more boilerplate.
Using functional metrics
Similar to torch.nn, most metrics have both a module-based and a functional version. The functional version implements the basic operations required for computing each metric. They are simple Python functions that take torch.Tensors as input and return the corresponding metric as a torch.Tensor.
```python
import torch
# import our library
import torchmetrics

# simulate a classification problem
preds = torch.randn(10, 5).softmax(dim=-1)
target = torch.randint(5, (10,))

acc = torchmetrics.functional.accuracy(preds, target)
```
Using Module metrics
Nearly all functional metrics have a corresponding module-based metric that calls its functional counterpart underneath. The module-based metrics are characterized by having one or more internal metric states (similar to the parameters of a PyTorch module) that allow them to offer additional functionalities:
- Accumulation of multiple batches
- Automatic synchronization between multiple devices
- Metric arithmetic
```python
import torch
# import our library
import torchmetrics

# initialize metric
metric = torchmetrics.Accuracy()

nbatches = 10
for i in range(nbatches):
    # simulate a classification problem
    preds = torch.randn(10, 5).softmax(dim=-1)
    target = torch.randint(5, (10,))
    # metric on current batch
    acc = metric(preds, target)
    print(f"Accuracy on batch {i}: {acc}")

# metric on all batches using custom accumulation
acc = metric.compute()
print(f"Accuracy on all data: {acc}")
```
Built-in metrics
- Accuracy
- AveragePrecision
- AUC
- AUROC
- F1
- Hamming Distance
- ROC
- ExplainedVariance
- MeanSquaredError
- R2Score
- bleu_score
- embedding_similarity
And many more!
Contributors
@Borda, @SkafteNicki, @williamFalcon, @teddykoker, @justusschock, @tadejsv, @edenlightning, @ydcjeff, @ddrevicky, @ananyahjha93, @awaelchli, @rohitgr7, @akihironitta, @manipopopo, @Diuven, @arnaudgelas, @s-rog, @c00k1ez, @tgaddair, @elias-ramzi, @cuent, @jpcarzolio, @bryant1410, @shivdhar, @Sordie, @krzysztofwos, @abhik-99, @bernardomig, @peblair, @InCogNiTo124, @j-dsouza, @pranjaldatta, @ananthsub, @deng-cy, @abhinavg97, @tridao, @prampey, @abrahambotros, @ozen, @ShomyLiu, @yuntai, @pwwang
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Published by Borda about 5 years ago