grad-cam
Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
Science Score: 46.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ✓ Academic publication links (links to: arxiv.org, ieee.org)
- ✓ Committers with academic emails (2 of 42 committers, 4.8%, from academic institutions)
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 13.1%, to scientific vocabulary)
Repository
Basic Info
- Host: GitHub
- Owner: jacobgil
- License: mit
- Language: Python
- Default Branch: master
- Homepage: https://jacobgil.github.io/pytorch-gradcam-book
- Size: 134 MB
Statistics
- Stars: 11,641
- Watchers: 44
- Forks: 1,636
- Open Issues: 159
- Releases: 0
Metadata Files
README.md
Advanced AI explainability for PyTorch
```
pip install grad-cam
```
Documentation with advanced tutorials: https://jacobgil.github.io/pytorch-gradcam-book
This is a package with state of the art methods for Explainable AI for computer vision. This can be used for diagnosing model predictions, either in production or while developing models. The aim is also to serve as a benchmark of algorithms and metrics for research of new explainability methods.
⭐ Comprehensive collection of Pixel Attribution methods for Computer Vision.
⭐ Tested on many Common CNN Networks and Vision Transformers.
⭐ Advanced use cases: Works with Classification, Object Detection, Semantic Segmentation, Embedding-similarity and more.
⭐ Includes smoothing methods to make the CAMs look nice.
⭐ High performance: full support for batches of images in all methods.
⭐ Includes metrics for checking if you can trust the explanations, and tuning them for best performance.

| Method | What it does |
|---------------------|-----------------------------------------------------------------------------------------------------------------------------|
| GradCAM | Weight the 2D activations by the average gradient |
| HiResCAM | Like GradCAM but element-wise multiply the activations with the gradients; provably guaranteed faithfulness for certain models |
| GradCAMElementWise | Like GradCAM but element-wise multiply the activations with the gradients then apply a ReLU operation before summing |
| GradCAM++ | Like GradCAM but uses second order gradients |
| XGradCAM | Like GradCAM but scale the gradients by the normalized activations |
| AblationCAM | Zero out activations and measure how the output drops (this repository includes a fast batched implementation) |
| ScoreCAM | Perturb the image by the scaled activations and measure how the output drops |
| EigenCAM | Takes the first principal component of the 2D activations (no class discrimination, but seems to give great results) |
| EigenGradCAM | Like EigenCAM but with class discrimination: first principal component of Activations*Grad. Looks like GradCAM, but cleaner |
| LayerCAM | Spatially weight the activations by positive gradients. Works better especially in lower layers |
| FullGrad | Computes the gradients of the biases from all over the network, and then sums them |
| Deep Feature Factorizations | Non Negative Matrix Factorization on the 2D activations |
| KPCA-CAM | Like EigenCAM but with Kernel PCA instead of PCA |
| FEM | A gradient free method that binarizes activations by an activation > mean + k * std rule. |
| ShapleyCAM | Weight the activations using the gradient and Hessian-vector product.|
| FinerCAM | Improves fine-grained classification by comparing similar classes, suppressing shared features and highlighting discriminative details. |
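To make the first row of the table concrete, here is the core GradCAM computation sketched in NumPy on synthetic activations and gradients (shapes and data are illustrative only, not taken from the library):

```python
import numpy as np

# Illustrative GradCAM core: weight each 2D activation channel by its
# spatially averaged gradient, sum over channels, then ReLU.
rng = np.random.default_rng(0)
activations = rng.standard_normal((512, 7, 7))  # channels x rows x cols
gradients = rng.standard_normal((512, 7, 7))    # d(score)/d(activations)

weights = gradients.mean(axis=(1, 2))                     # one weight per channel
cam = (weights[:, None, None] * activations).sum(axis=0)  # weighted sum -> 2D map
cam = np.maximum(cam, 0)                                  # ReLU keeps positive evidence
print(cam.shape)  # (7, 7)
```

The other gradient-based methods in the table differ mainly in how `weights` is computed or how the activations and gradients are combined.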
Visual Examples
[Images] What makes the network think the image label is 'pug, pug-dog'; what makes it think 'tabby, tabby cat'; and Grad-CAM combined with Guided Backpropagation for the 'pug, pug-dog' class.
Object Detection and Semantic Segmentation
[Images] Object detection and semantic segmentation CAM examples.
[Image] 3D medical semantic segmentation example.
Explaining similarity to other images / embeddings

Deep Feature Factorization

CLIP
[Images] Explaining the text prompt "a dog" and the text prompt "a cat".
Classification
Resnet50:
[Images] Dog and Cat inputs with GradCAM, AblationCAM and ScoreCAM visualizations.
Vision Transformer (DeiT Tiny):
[Images] Dog and Cat inputs with GradCAM, AblationCAM and ScoreCAM visualizations.
Swin Transformer (Tiny, window: 7, patch: 4, input size: 224):
[Images] Dog and Cat inputs with GradCAM, AblationCAM and ScoreCAM visualizations.
Metrics and Evaluation for XAI

Usage examples
```python
from pytorch_grad_cam import GradCAM, HiResCAM, ScoreCAM, GradCAMPlusPlus, AblationCAM, XGradCAM, EigenCAM, FullGrad
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget
from pytorch_grad_cam.utils.image import show_cam_on_image
from torchvision.models import resnet50

model = resnet50(pretrained=True)
target_layers = [model.layer4[-1]]
input_tensor = # Create an input tensor image for your model..
# Note: input_tensor can be a batch tensor with several images!

# We have to specify the target we want to generate the CAM for.
targets = [ClassifierOutputTarget(281)]

# Construct the CAM object once, and then re-use it on many images.
with GradCAM(model=model, target_layers=target_layers) as cam:
    # You can also pass aug_smooth=True and eigen_smooth=True, to apply smoothing.
    grayscale_cam = cam(input_tensor=input_tensor, targets=targets)
    # In this example grayscale_cam has only one image in the batch:
    grayscale_cam = grayscale_cam[0, :]
    visualization = show_cam_on_image(rgb_img, grayscale_cam, use_rgb=True)
    # You can also get the model outputs without having to redo inference
    model_outputs = cam.outputs
```
cam.py has a more detailed usage example.
Choosing the layer(s) to extract activations from
You need to choose the target layer to compute the CAM for. Some common choices are:
- FasterRCNN: model.backbone
- Resnet18 and 50: model.layer4[-1]
- VGG, densenet161 and mobilenet: model.features[-1]
- mnasnet1_0: model.layers[-1]
- ViT: model.blocks[-1].norm1
- SwinT: model.layers[-1].blocks[-1].norm1
If you pass a list with several layers, the CAM will be averaged across them. This can be useful if you're not sure which layer will perform best.
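As a rough sketch of that averaging (illustrative only; the library's exact per-image scaling details may differ, and the function name here is made up): each per-layer CAM is rescaled to [0, 1] before taking the mean, so no single layer dominates.

```python
import numpy as np

def aggregate_layers(per_layer_cams, eps=1e-7):
    """Combine CAMs computed from several target layers into one map:
    rescale each layer's CAM to [0, 1], then average across layers."""
    scaled = []
    for cam in per_layer_cams:
        cam = np.maximum(cam, 0)               # drop negative evidence
        scaled.append(cam / (cam.max() + eps)) # normalize this layer's map
    return np.mean(scaled, axis=0)

cam_a = np.array([[0.0, 2.0], [4.0, 0.0]])
cam_b = np.array([[1.0, 0.0], [0.0, 1.0]])
print(aggregate_layers([cam_a, cam_b]))
```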
Adapting for new architectures and tasks
Methods like GradCAM were designed for, and originally mostly applied to, CNN classification models. However, you can also use this package with new architectures like Vision Transformers, and with non-classification tasks like object detection or semantic segmentation.
To adapt to non-standard cases, there are two concepts:
- The reshape transform: how do we convert the activations to represent spatial images?
- The model targets: what exactly should the explainability method try to explain?
The reshape_transform argument
In a CNN, the intermediate activations are a multi-channel image with dimensions channels x rows x cols, and the various explainability methods work with these to produce a new image.
In the case of another architecture, like the Vision Transformer, the shape might be different, e.g. (rows x cols + 1) x channels, or something else. The reshape transform converts the activations back into a multi-channel image, for example by removing the class token in a Vision Transformer. For examples, check here
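As an illustration of the idea (sketched here with NumPy arrays for a self-contained example; in real usage the transform receives torch tensors, and the patch grid size depends on your model): a reshape transform for a ViT with a 14x14 patch grid drops the class token and moves channels first.

```python
import numpy as np

def reshape_transform(tensor, height=14, width=14):
    """Turn ViT token activations of shape (batch, 1 + H*W, channels)
    into a CNN-style map of shape (batch, channels, H, W)."""
    spatial = tensor[:, 1:, :]  # drop the class token at position 0
    result = spatial.reshape(tensor.shape[0], height, width, tensor.shape[2])
    return result.transpose(0, 3, 1, 2)  # bring channels to the front

tokens = np.zeros((1, 1 + 14 * 14, 192))  # e.g. a DeiT-Tiny block output
print(reshape_transform(tokens).shape)  # (1, 192, 14, 14)
```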
The model_target argument
The model target is just a callable that takes the model output and filters it down to the specific scalar output we want to explain.
For classification tasks, the model target will typically be the output from a specific category.
The targets parameter passed to the CAM method can then use ClassifierOutputTarget:
```python
targets = [ClassifierOutputTarget(281)]
```
However for more advanced cases, you might want a different behaviour. Check here for more examples.
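The pattern itself is simple. Below is a minimal sketch of a classifier-style target (not the library's exact source), plus a hypothetical custom target that explains the sum of two class scores; both are just callables on the model output.

```python
import numpy as np

class ClassifierOutputTarget:
    """Sketch of the pattern: pick one class score out of the model output,
    so the CAM explains that single scalar."""
    def __init__(self, category):
        self.category = category

    def __call__(self, model_output):
        # model_output: (num_classes,) or (batch, num_classes)
        if model_output.ndim == 1:
            return model_output[self.category]
        return model_output[:, self.category]

class SumOfClassesTarget:
    """Hypothetical custom target: explain the sum of two class scores."""
    def __init__(self, a, b):
        self.a, self.b = a, b

    def __call__(self, model_output):
        return model_output[:, self.a] + model_output[:, self.b]

logits = np.array([[0.1, 2.5, -0.3]])
print(float(ClassifierOutputTarget(1)(logits)[0]))  # 2.5
print(float(SumOfClassesTarget(0, 2)(logits)[0]))
```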
Tutorials
Here you can find detailed examples of how to use this for various custom use cases like object detection:
These point to the new documentation jupyter-book for fast rendering. The jupyter notebooks themselves can be found under the tutorials folder in the git repository.
Notebook tutorial: XAI Recipes for the HuggingFace 🤗 Image Classification Models
Notebook tutorial: Deep Feature Factorizations for better model explainability
Notebook tutorial: Class Activation Maps for Object Detection with Faster-RCNN
Notebook tutorial: Class Activation Maps for Semantic Segmentation
Notebook tutorial: Adapting pixel attribution methods for embedding outputs from models
Notebook tutorial: May the best explanation win. CAM Metrics and Tuning
Guided backpropagation
```python
from pytorch_grad_cam import GuidedBackpropReLUModel
from pytorch_grad_cam.utils.image import (
    show_cam_on_image, deprocess_image, preprocess_image
)

gb_model = GuidedBackpropReLUModel(model=model, device=model.device())
gb = gb_model(input_tensor, target_category=None)

cam_mask = cv2.merge([grayscale_cam, grayscale_cam, grayscale_cam])
cam_gb = deprocess_image(cam_mask * gb)
result = deprocess_image(gb)
```
Metrics and evaluating the explanations
```python
from pytorch_grad_cam.utils.model_targets import ClassifierOutputSoftmaxTarget
from pytorch_grad_cam.metrics.cam_mult_image import CamMultImageConfidenceChange

# Create the metric target, often the confidence drop in a score of some category
metric_target = ClassifierOutputSoftmaxTarget(281)
scores, batch_visualizations = CamMultImageConfidenceChange()(
    input_tensor, inverse_cams, targets, model, return_visualization=True)
visualization = deprocess_image(batch_visualizations[0, :])

# State of the art metric: Remove and Debias
from pytorch_grad_cam.metrics.road import ROADMostRelevantFirst, ROADLeastRelevantFirst
cam_metric = ROADMostRelevantFirst(percentile=75)
scores, perturbation_visualizations = cam_metric(
    input_tensor, grayscale_cams, targets, model, return_visualization=True)

# You can also average across different percentiles, and combine
# (LeastRelevantFirst - MostRelevantFirst) / 2
from pytorch_grad_cam.metrics.road import ROADMostRelevantFirstAverage, ROADLeastRelevantFirstAverage, ROADCombined
cam_metric = ROADCombined(percentiles=[20, 40, 60, 80])
scores = cam_metric(input_tensor, grayscale_cams, targets, model)
```
Smoothing to get nice looking CAMs
To reduce noise in the CAMs, and make it fit better on the objects, two smoothing methods are supported:
aug_smooth=True
Test-time augmentation: increases the run time 6x.
Applies a combination of horizontal flips and multiplying the image intensities by [1.0, 1.1, 0.9].
This has the effect of better centering the CAM around the objects.
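A simplified sketch of the averaging behind aug_smooth (illustrative NumPy only; the real implementation differs in detail, but the six augmentations below match the 6x run time):

```python
import numpy as np

def aug_smooth_sketch(compute_cam, image):
    """Average CAMs over horizontal flips and intensity multipliers,
    un-flipping each flipped CAM before averaging."""
    cams = []
    for flip in (False, True):
        for scale in (1.0, 1.1, 0.9):
            img = image[:, ::-1] if flip else image
            cam = compute_cam(np.clip(img * scale, 0, 1))
            cams.append(cam[:, ::-1] if flip else cam)  # undo the flip
    return np.mean(cams, axis=0)

# Toy check with an identity "CAM": the intensity scalings average out.
image = np.full((4, 4), 0.5)
print(aug_smooth_sketch(lambda x: x, image))
```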
eigen_smooth=True
First principal component of activations*weights
This has the effect of removing a lot of noise.
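A rough NumPy sketch of the eigen_smooth idea, under the assumption that what is kept is the projection of the flattened activations onto their first principal component (names here are illustrative, not the library's API):

```python
import numpy as np

def eigen_smooth_sketch(activations):
    """Project channel activations onto their first principal component.
    A sketch only: the library's version also rescales the result and
    handles batches of images."""
    channels, h, w = activations.shape
    flat = activations.reshape(channels, h * w).T  # (pixels, channels)
    flat = flat - flat.mean(axis=0)                # center before SVD
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    cam = (flat @ vt[0]).reshape(h, w)             # 1st-component projection
    if cam.sum() < 0:                              # eigenvector sign is arbitrary
        cam = -cam
    return cam

rng = np.random.default_rng(0)
acts = rng.standard_normal((8, 7, 7))
print(eigen_smooth_sketch(acts).shape)  # (7, 7)
```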
[Images] AblationCAM without smoothing, with aug smooth, with eigen smooth, and with aug+eigen smooth.
Running the example script:
```
python cam.py --image-path <path_to_image> --method <method> --output-dir <output_dir_path>
```
To use a specific device, like cpu, cuda, cuda:0, mps or hpu:
```
python cam.py --image-path <path_to_image> --device cuda --output-dir <output_dir_path>
```
You can choose between:
GradCAM, HiResCAM, ScoreCAM, GradCAMPlusPlus, AblationCAM, XGradCAM, LayerCAM, FullGrad, EigenCAM, ShapleyCAM, and FinerCAM.
Some methods like ScoreCAM and AblationCAM require a large number of forward passes, and have a batched implementation.
You can control the batch size with `cam.batch_size`.
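To illustrate why this matters: these methods generate many perturbed inputs internally, and the batch size controls how many go through the model per forward pass. A toy sketch of the grouping (pure Python, not library code):

```python
def batches(items, batch_size):
    """Split a list of perturbed inputs into forward-pass-sized groups."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# e.g. 512 ablated activation maps with batch size 32 -> 16 forward passes
perturbed = list(range(512))
print(sum(1 for _ in batches(perturbed, 32)))  # 16
```

A larger batch size means fewer, bigger forward passes, trading memory for speed.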
Citation
If you use this for research, please cite. Here is an example BibTeX entry:
```
@misc{jacobgilpytorchcam,
  title={PyTorch library for CAM methods},
  author={Jacob Gildenblat and contributors},
  year={2021},
  publisher={GitHub},
  howpublished={\url{https://github.com/jacobgil/pytorch-grad-cam}},
}
```
References
https://arxiv.org/abs/1610.02391
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra
https://arxiv.org/abs/2011.08891
Use HiResCAM instead of Grad-CAM for faithful explanations of convolutional neural networks
Rachel L. Draelos, Lawrence Carin
https://arxiv.org/abs/1710.11063
Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks
Aditya Chattopadhyay, Anirban Sarkar, Prantik Howlader, Vineeth N Balasubramanian
https://arxiv.org/abs/1910.01279
Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks
Haofan Wang, Zifan Wang, Mengnan Du, Fan Yang, Zijian Zhang, Sirui Ding, Piotr Mardziel, Xia Hu
https://ieeexplore.ieee.org/abstract/document/9093360/
Ablation-CAM: Visual Explanations for Deep Convolutional Network via Gradient-free Localization
Saurabh Desai and Harish G Ramaswamy. In WACV, pages 972–980, 2020
https://arxiv.org/abs/2008.02312
Axiom-based Grad-CAM: Towards Accurate Visualization and Explanation of CNNs
Ruigang Fu, Qingyong Hu, Xiaohu Dong, Yulan Guo, Yinghui Gao, Biao Li
https://arxiv.org/abs/2008.00299
Eigen-CAM: Class Activation Map using Principal Components
Mohammed Bany Muhammad, Mohammed Yeasin
http://mftp.mmcheng.net/Papers/21TIP_LayerCAM.pdf
LayerCAM: Exploring Hierarchical Class Activation Maps for Localization
Peng-Tao Jiang; Chang-Bin Zhang; Qibin Hou; Ming-Ming Cheng; Yunchao Wei
https://arxiv.org/abs/1905.00780
Full-Gradient Representation for Neural Network Visualization
Suraj Srinivas, Francois Fleuret
https://arxiv.org/abs/1806.10206
Deep Feature Factorization For Concept Discovery
Edo Collins, Radhakrishna Achanta, Sabine Süsstrunk
https://arxiv.org/abs/2410.00267
KPCA-CAM: Visual Explainability of Deep Computer Vision Models using Kernel PCA
Sachin Karmani, Thanushon Sivakaran, Gaurav Prasad, Mehmet Ali, Wenbo Yang, Sheyang Tang
https://hal.science/hal-02963298/document
Features Understanding in 3D CNNs for Actions Recognition in Video
Kazi Ahmed Asif Fuad, Pierre-Etienne Martin, Romain Giot, Romain Bourqui, Jenny Benois-Pineau, Akka Zemmar
https://arxiv.org/abs/2501.06261
CAMs as Shapley Value-based Explainers
Huaiguang Cai
https://arxiv.org/pdf/2501.11309
Finer-CAM: Spotting the Difference Reveals Finer Details for Visual Explanation
Ziheng Zhang*, Jianyang Gu*, Arpita Chowdhury, Zheda Mai, David Carlyn, Tanya Berger-Wolf, Yu Su, Wei-Lun Chao
Owner
- Name: Jacob Gildenblat
- Login: jacobgil
- Kind: user
- Location: Israel
- Website: jacobgil.github.io
- Twitter: jacobgildenblat
- Repositories: 16
- Profile: https://github.com/jacobgil
Playing with tensors.
GitHub Events
Total
- Issues event: 27
- Watch event: 1,565
- Issue comment event: 71
- Push event: 7
- Pull request review event: 5
- Pull request event: 11
- Fork event: 116
Last Year
- Issues event: 27
- Watch event: 1,565
- Issue comment event: 71
- Push event: 7
- Pull request review event: 5
- Pull request event: 11
- Fork event: 116
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Jacob Gildenblat | j****t@g****m | 169 |
| jdecid | j****d@g****m | 7 |
| Oliver | o****9@g****m | 5 |
| Ming Lu | l****6@g****m | 3 |
| LucaButera | 2****a | 2 |
| Rachel Draelos, MD, PhD | r****s@g****m | 2 |
| Ziheng Zhang | z****7@o****u | 2 |
| jackyjinjing | h****3@1****m | 2 |
| Fan Jingbo | f****1@g****m | 2 |
| Justas Birgiolas | J****B | 1 |
| Junjie | 6****z | 1 |
| Garima Jain | g****9@g****m | 1 |
| Daniel De León | 1****3 | 1 |
| Christophe Foyer | c****r@g****m | 1 |
| Chris Hammill | c****l@g****m | 1 |
| ChiLin Chiou | c****u@g****m | 1 |
| Aray Karjauv | k****y@g****m | 1 |
| Anthony Dave | 4****i | 1 |
| Ambesh Shekhar | 3****a | 1 |
| Akon-Fiber | 5****r | 1 |
| Akash A Desai | 6****8 | 1 |
| priyavrat-misra | c****m@p****e | 1 |
| dependabot[bot] | 4****] | 1 |
| cai2-huaiguang | c****3@m****n | 1 |
| Zhou T | 1****w | 1 |
| Zachary Mostowsky | 3****y | 1 |
| Yuta Fukasawa | y****8@g****m | 1 |
| Yonghye Kwon | d****e@g****m | 1 |
| Ujjwal Sharma | m****a@g****m | 1 |
| Shreyas | s****a@g****m | 1 |
| and 12 more... | ||
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 210
- Total pull requests: 48
- Average time to close issues: about 1 month
- Average time to close pull requests: 7 months
- Total issue authors: 194
- Total pull request authors: 40
- Average comments per issue: 2.26
- Average comments per pull request: 1.44
- Merged pull requests: 23
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 33
- Pull requests: 15
- Average time to close issues: about 2 months
- Average time to close pull requests: 7 days
- Issue authors: 32
- Pull request authors: 9
- Average comments per issue: 0.58
- Average comments per pull request: 1.67
- Merged pull requests: 9
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- sammlapp (4)
- MoH-assan (3)
- MaxPolak97 (3)
- C-C-Y (2)
- marios1861 (2)
- RMobina (2)
- PietroManganelliConforti (2)
- lunaryan (2)
- hxngu (2)
- jooseuk (2)
- vggls (2)
- iremalti (2)
- Yanll2021 (1)
- johnwalking (1)
- manuelGue (1)
Pull Request Authors
- jackyjinjing (6)
- Link7808 (4)
- ValMystletainn (4)
- TekayaNidham (2)
- EdgeObserver (2)
- ShoufaChen (2)
- anthonyweidai (2)
- daniel-de-leon-user293 (2)
- TrungKhoaLe (2)
- lgov (2)
- Christophe-Foyer (2)
- kumar-selvakumaran (2)
- ashishpatel26 (2)
- sgsangodkar (2)
- hoel-bagard (2)
Packages
- Total packages: 2
- Total downloads: pypi: 35,467 last month
- Total docker downloads: 148
- Total dependent packages: 12 (may contain duplicates)
- Total dependent repositories: 80 (may contain duplicates)
- Total versions: 38
- Total maintainers: 2
pypi.org: grad-cam
Many Class Activation Map methods implemented in Pytorch for classification, segmentation, object detection and more
- Homepage: https://github.com/jacobgil/pytorch-grad-cam
- Documentation: https://grad-cam.readthedocs.io/
- License: MIT License
- Latest release: 1.5.5 (published 11 months ago)
conda-forge.org: grad-cam
- Homepage: https://github.com/jacobgil/pytorch-grad-cam
- License: MIT
- Latest release: 1.4.0 (published over 3 years ago)
Dependencies
- Pillow *
- numpy *
- opencv-python *
- torch >=1.7.1
- torchvision >=0.8.2
- tqdm *
- ttach *
- actions/checkout v2 composite
- actions/setup-python v2 composite