Science Score: 28.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (7.0%) to scientific vocabulary
Last synced: 6 months ago
Repository
Code for our NN Attribution work accepted at AISTATS '22
Basic Info
- Host: GitHub
- Owner: Piyushi-0
- License: apache-2.0
- Language: Jupyter Notebook
- Default Branch: main
- Homepage: https://piyushi-0.github.io/SEA-NN/
- Size: 651 KB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Topics
aistats
attribution-methods
deep-learning
neural-network
pytorch
submodularity
Created about 4 years ago
· Last pushed over 2 years ago
Metadata Files
Readme
License
Citation
README.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**This file contains 3 sections**\n",
"\n",
" 1. Details on Deep Submodular Function\n",
" \n",
" 2. Details on implementation of Greedy Cardinality Constrained submodular maximization\n",
" \n",
" 3. Relevant portions of our original codes with comments"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"%%latex\n",
"\n",
"# SECTION (1/3) Deep Submodular Function - DSF\n",
"**Architecture**\n",
"* Currently, our work trains a DSF for each image separately.\n",
"* We learn DSFs on images of size 28 x 28. When training DSF for a given image, inputs to our DSF are bit-vectors of size 28 x 28 (got by thresholding heatmaps corresponding to these images). Following is the architecture:\n",
"\n",
"\n",
" def sqrt(input):\n",
" return torch.sqrt(input)\n",
"\n",
" class DSF(nn.Module): # In PyTorch, classes for Neural Networks should sub-class nn.Module which is the base-class.\n",
" def __init__(self):\n",
" super(DSF, self).__init__()\n",
" self.fc1 = nn.Linear(28 * 28, 512)\n",
" self.fc2 = nn.Linear(512, 256)\n",
" self.fc3 = nn.Linear(256, 32)\n",
" self.fc4 = nn.Linear(32, 1)\n",
"\n",
" def forward(self, x):\n",
" x = x.view(-1, 28 * 28)\n",
" x = self.fc1(x)\n",
" x = sqrt(x)\n",
" x = self.fc2(x)\n",
" x = sqrt(x)\n",
" x = self.fc3(x)\n",
" x = sqrt(x)\n",
" x = self.fc4(x)\n",
" return x\n",
"\n",
"
\n",
"\n",
"**Training**\n",
"* We use **Batch-Gradient descent** as we do not have large enough dataset for a mini-batch setup.\n",
"* OPTIMIZER: We use learning rates determined by Adagrad. Adagrad is usually preferred when the data is sparse & we observed the same.\n",
"* GRADIENT DESCENT: At each epoch, we backpropagate (using \"**loss.backward()**\") and update the weights using gradient descent (using \"**optimizer.step()**\").\n",
"* PROJECTION: The projection step with non-negativity constraints, is just the operation max(0, w) on weights w. Hence, after each weight update, we call \"**clamp_zero**\" class:\n",
"\n",
" class clamp_zero(object):\n",
" def __init__(self):\n",
" pass\n",
"\n",
" def __call__(self, module):\n",
" if hasattr(module, 'weight'):\n",
" w = module.weight.data\n",
" w.copy_(torch.clamp(w, min=0))\n",
" if hasattr(module, 'bias'):\n",
" w = module.bias.data\n",
" w.copy_(torch.clamp(w, min=0))\n",
" \n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"%%latex\n",
"\n",
"# SECTION (2/3) MORE ON LOSS COMPUTATION\n",
"* The [original DSF paper](https://arxiv.org/pdf/1701.08939.pdf) trains DSF with only discrete supervision. \n",
"\n",
"* Our loss (equation (2) in our [paper](https://arxiv.org/pdf/2104.09073.pdf)) comes from supervision via real inputs (multiplied with $\\lambda_1$ in the paper) and supervision via bit-vectors (multiplied with $\\lambda_2$ in the paper).\n",
"\n",
"* In order to compute our loss with discrete supervision, we need to solve Cardinality Constrained Submodular Maximization problems at each training epoch.\n",
" * We need solutions to this problem for a list of cardinalities. However, due to the greedy nature of Greedy Cardinality Constrained Submodular Maximization algorithm, we need to solve the problem only for the maximum value in the list of cardinalities.\n",
"\n",
" **Overview of the Greedy Cardinality Constrained Submodular Maximization**:\n",
" \n",
"$\\qquad$Initially, $A = \\{\\}; f(A) = f(0)$.\n",
"\n",
"$\\qquad$1. Let $\\bar{A} = A \\cup \\{argmax_{v\\in V\\setminus A}f(A\\cup\\{v\\})\\}$\n",
"\n",
"$\\qquad$2. if $f(\\bar{A})>f(A)$:\n",
"\n",
"$\\qquad$ $\\qquad A = \\bar{A}$\n",
" \n",
"$\\qquad$ else:\n",
" \n",
"$\\qquad$ $\\qquad \\textrm{return } A$\n",
" \n",
"The above two steps are repeated atmost $K$ times.\n",
"\n",
"**Problem with a naive implementation**: This would demand $O(|V|K)$ calls to the DSF Neural Network at every training epoch.\n",
"\n",
"**Solution**: At every training epoch, we can just have $O(K)$ calls to the DSF by everytime inferring on a batch of $|V|$-sized inputs where the $i^{th}$ input in the batch represents $A\\cup\\{v_i\\}$.\n",
"\n",
"
\n",
"\n",
"### Implementation Details with an example\n",
"* Let $V$ be the universe with $|V| = 4$. For our work, $|V|$ is the resolution of images which was always a perfect square in the datasets we used.\n",
"\n",
"* Initially, $A = \\{\\}; f(A) = f(0)$.\n",
"\n",
"* We use a matrix $\\mathbf{x}$ whose column $i$ represents $A\\cup \\{v_i\\}$ where $v_i\\in V$. Initially, $\\mathbf{x}$ = \n",
"\\begin{bmatrix}\n",
"1 & 0 & 0 & 0\\\\\n",
"0 & 1 & 0 & 0\\\\\n",
"0 & 0 & 1 & 0\\\\\n",
"0 & 0 & 0 & 1\n",
"\\end{bmatrix}\n",
"\n",
"* We repeat the following atmost $K$ times\n",
"\n",
"$\\qquad$ We reshape $\\mathbf{x}$ to get $\\mathbf{inputs}$ because PyTorch demands inputs to be of the form *(batch_size, number_of_channels, height, width)*. For our work, batch_size is $|V|$, number_of_channels is 1, height=width=$\\sqrt{|V|}$.\n",
"\n",
"$\\qquad$ $\\mathbf{outputs} = f(\\mathbf{inputs})$ # *The $i^{th}$ entry in this vector corresponds to $f(A\\cup \\{v_i\\})$*\n",
"\n",
"$\\qquad$ $i = argmax_i ~ \\mathbf{outputs}[i]$\n",
"\n",
"$\\qquad$ if $\\mathbf{outputs}[i]>f(A):$\n",
"\n",
"$\\qquad$ $\\qquad f(A) = \\mathbf{outputs}[i]$ # *We include \\{$v_i$\\} in A and update f(A)*\n",
"\n",
"$\\qquad$ $\\qquad$ As $A$ has been updated to $A\\cup \\{v_i\\}$, we update the $i^{th}$ row of $\\mathbf{x}$ to all 1's. E.g. if $i = 1$ then \n",
"$\\mathbf{x}$ = \\begin{bmatrix}\n",
"1 & 0 & 0 & 0\\\\\n",
"1 & 1 & 1 & 1\\\\\n",
"0 & 0 & 1 & 0\\\\\n",
"0 & 0 & 0 & 1\n",
"\\end{bmatrix}\n",
" \n",
"$\\qquad$ $\\qquad$ The 2nd column is not of interest to us anymore as $v_1$ has already been chosen. We can remove that column and reduce our batch-size by 1 but for simplicity, we did not. *NOTE that due to monotonicity of DSF, the 2nd column won't again be chosen via argmax as it contains lesser number of 1's.*\n",
"\n",
"### Implementation of the procedure described above\n",
"\n",
" import torch\n",
" def greedy_cardinality_constrained_submodular_max(f, k, n, device):\n",
" \"\"\"\n",
" Returns solution of cardinality constrained submodular maximization\n",
" Parameters:\n",
" f (PyTorch model) : DSF\n",
" k (int) : Cardinality we want to solve\n",
" n (int) : Where input is of the size n x n\n",
" device (str) : Device ('CPU' or 'CUDA')\n",
" Returns :\n",
" selected (array) : Solution\n",
" \"\"\"\n",
" card_V = n*n\n",
" x = torch.eye(card_V)\n",
" fA = f(torch.zeros(n, n).view(1, 1, n, n).double().to(device)).item() #reshaping for PyTorch\n",
" for iteration in range(1, k+1):\n",
" inputs = x.t().view(card_V, 1, n, n) #reshaping for PyTorch\n",
" outputs = f(torch.Tensor(inputs).double().to(device))\n",
" i = outputs.argmax(dim = 0).item()\n",
" if outputs[i]>fA:\n",
" fA = outputs[i]\n",
" selected = x[:, i] #solution when cardinality constraint is j\n",
" x[i, :] = 1\n",
" else:\n",
" break\n",
" return selected"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**NOTE** : The recently launched [submodlib](https://arxiv.org/pdf/2202.10680.pdf) package, might have a more efficient solver."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# SECTION (3/3) Relevant portions of our original codes with comments"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"_____Greedy cardinality constrained submodular maximization\n",
"\"\"\"\n",
"\n",
"import torch\n",
"def c_sb_mx(f, Klist, sq_n_sb_px, device):\n",
" '''\n",
" Returns solution of cardinality constrained submodular maximization\n",
" Parameters:\n",
" f (PyTorch model): DSF\n",
" Klist (list) : List of cardinalities\n",
" sq_n_sb_px (int) : square-root of no. of sub-pixels (ie. resolution of the sub-sampled image)\n",
" device (str) : device('cpu' or 'cuda') on which to run\n",
" Returns:\n",
" AList (dic): Dictionary with keys as cardinalities and values as solutions \n",
" '''\n",
" k = int(np.array(Klist).max())#we only need to solve for max cardinality\n",
" card_V = sq_n_sb_px*sq_n_sb_px#cardinality of V\n",
" x = torch.eye(card_V)#card_V number of candidate A's each arranged in columns\n",
" fA = f(torch.zeros(sq_n_sb_px, sq_n_sb_px).view(1, 1, sq_n_sb_px, sq_n_sb_px).double().to(device)).item()#f(A), initially A is {}\n",
" AList = {}#dic with key k, value A*_k where A*_k is the optimal subset of cardinality k\n",
"\n",
" for it in range(1, k+1):#here iteration j means solving for cardinality j\n",
" inputs = x.t().view(card_V, 1, sq_n_sb_px, sq_n_sb_px)#'x' reshaped as PyTorch input\n",
" outputs = f(torch.Tensor(inputs).double().to(device))\n",
" i = outputs.argmax(dim=0).item()\n",
" if outputs[i]>fA:\n",
" fA = outputs[i]\n",
" selected = x[:, i] #solution \n",
" x[i, :] = 1\n",
" if it in Klist: #Recall that in iteration j, we are solving for cardinality j.\n",
" AList[it] = selected.detach().clone()\n",
" else:\n",
" break\n",
" try:\n",
" for it in Klist:\n",
" if it not in AList: #e.g. we want solution for cardinality j but there is no further gain after i0:\n",
" if loss_1 is None:\n",
" loss_1 = to_add\n",
" else:\n",
" loss_1 = loss_1+to_add\n",
" \"\"\"\n",
" Computing loss_2\n",
" \"\"\"\n",
" ones_f = f(torch.ones(sq_n_sb_px*sq_n_sb_px).double().view(1, 1, sq_n_sb_px, sq_n_sb_px).to(device))#f_w(\\mathcal{H}^*)\n",
" tensor_sub_h = [torch.Tensor(s_h) for s_h in sub_h] #list of sub-sampled heatmaps\n",
" \n",
" # Feed all sub-sampled hearmaps to the DSF neural network\n",
" sub_h_f = f(torch.stack(tensor_sub_h).double().view(len(tensor_sub_h), 1, sq_n_sb_px, sq_n_sb_px).to(device))\n",
" for xs_h, _ in enumerate(tensor_sub_h):\n",
" to_also_add = ones_f-sub_h_f[xs_h] #computes f_w(\\mathcal{H}^*)-f_w(\\mathcal{H}_i)\n",
" if to_also_add>0:\n",
" if loss_2 is None:\n",
" loss_2 = to_also_add\n",
" else:\n",
" loss_2 = loss_2+to_also_add\n",
" loss = None\n",
" if loss_1 is not None:\n",
" loss = ld1*loss_1\n",
" if loss_2 is not None:\n",
" if loss is not None:\n",
" loss = loss+ld2*loss_2\n",
" else:\n",
" loss = ld2*loss_2\n",
" if loss is None:\n",
" break\n",
" loss_plt.append(loss.item())\n",
" f.zero_grad()\n",
" loss.backward()\n",
" optimizer.step()\n",
" f.apply(clipper)\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "main_phd",
"language": "python",
"name": "main_phd"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
Owner
- Name: Piyushi Manupriya
- Login: Piyushi-0
- Kind: user
- Website: https://piyushi-0.github.io
- Repositories: 1
- Profile: https://github.com/Piyushi-0
Google PhD Fellow 2021, IIT Hyderabad
Citation (citation.bib)
@InProceedings{pmlr-v151-manupriya22a,
title = { Improving Attribution Methods by Learning Submodular Functions },
author = {Manupriya, Piyushi and Ram Menta, Tarun and Jagarlapudi, Sakethanath N. and Balasubramanian, Vineeth N.},
booktitle = {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics},
pages = {2173--2190},
year = {2022},
editor = {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel},
volume = {151},
series = {Proceedings of Machine Learning Research},
month = {28--30 Mar},
publisher = {PMLR},
pdf = {https://proceedings.mlr.press/v151/manupriya22a/manupriya22a.pdf},
url = {https://proceedings.mlr.press/v151/manupriya22a.html},
abstract = { This work explores the novel idea of learning a submodular scoring function to improve the specificity/selectivity of existing feature attribution methods. Submodular scores are natural for attribution as they are known to accurately model the principle of diminishing returns. A new formulation for learning a deep submodular set function that is consistent with the real-valued attribution maps obtained by existing attribution methods is proposed. The final attribution value of a feature is then defined as the marginal gain in the induced submodular score of the feature in the context of other highly attributed features, thus decreasing the attribution of redundant yet discriminatory features. Experiments on multiple datasets illustrate that the proposed attribution method achieves higher specificity along with good discriminative power. The implementation of our method is publicly available at https://github.com/Piyushi-0/SEA-NN. }
}