covid_19_comp

Kaggle competition for covid19

https://github.com/janmichael88/covid_19_comp

Science Score: 18.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic links in README
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Unable to calculate vocabulary similarity

Repository

Basic Info
  • Host: GitHub
  • Owner: janmichael88
  • Language: Jupyter Notebook
  • Default Branch: master
  • Size: 297 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 6 years ago · Last pushed about 6 years ago
Metadata Files
Citation

Owner

  • Name: Jan Michael Cayabyab Austria
  • Login: janmichael88
  • Kind: user
  • Location: Claremont, CA
  • Company: Genentech

Data Engineer at Genentech (but really I sold my soul to LeetCode and now I'm walking over glass to get into FAANG).

Citation (Citation_Rank_Matrix_Exponentiation.ipynb)

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Can Google's PageRank be applied to the covid_19 papers?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import pickle\n",
    "import time"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "papers = pd.read_csv(\"covid19_papers_compiled.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "#load in authors and citations as list\n",
    "with open(\"covid_authors_list.txt\", \"rb\") as fp:\n",
    "    b = pickle.load(fp)\n",
    "papers['Authors'] = b\n",
    "with open(\"covid_citations_list.txt\", \"rb\") as fp:\n",
    "    b = pickle.load(fp)\n",
    "papers['Citations'] = b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "#keep only the English-language papers\n",
    "papers = papers[papers['Language'] == 'en']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "#drop papers without any titles\n",
    "papers = papers[papers['Titles'].notnull()]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Exhaled Air Dispersion During Oxygen Delivery Via a Simple Oxygen Mask*'"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "papers.iloc[635]['Titles']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create string matcher"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "#pull all possible citations\n",
    "all_citations = []\n",
    "for i in range(0,len(papers)):\n",
    "    citations = papers['Citations'].iloc[i]\n",
    "    for foo in citations:\n",
    "        paper = foo[0]\n",
    "        year = foo[1]\n",
    "        paper_year = paper + \" \" + str(year)\n",
    "        all_citations.append(paper_year)\n",
    "        "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1398382"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(all_citations)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "917982"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(set(all_citations))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "480400"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "1398382 - 917982"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In total, the 23515 papers share a unique set of 917982 cited papers. Does that make sense?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "36"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "917982 // 25315"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This implies that each paper has an average of 36 citations?\n",
    "Asked Sarah, this seems about right.\n",
    "\n",
    "However, we only want to build the transition matrix from papers that are common to both the list of all papers and the list of all possible citations. We need to make the matrix square."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "#pull all possible citations\n",
    "all_titles = []\n",
    "for i in range(0,len(papers)):\n",
    "    citations = papers['Citations'].iloc[i]\n",
    "    for foo in citations:\n",
    "        paper = foo[0]\n",
    "        all_titles.append(paper)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1398382"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(all_titles)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "6495"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(set(list(papers['Titles'])).intersection(set(all_titles)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Our matrix will only be $6495 \\times 6495$.\n",
    "* Let's subset so that we only have those papers!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "common_papers = set(list(papers['Titles'])).intersection(set(all_titles))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "6597"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sum(papers['Titles'].isin(common_papers))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "papers = papers[papers['Titles'].isin(common_papers)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(6597, 10)"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "papers.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Generate Transition Matrix\n",
    "* each row is a paper\n",
    "* each column is a cited paper\n",
    "* the i,j entry is 1 when paper i cites paper j\n",
    "* I could use sklearn's CountVectorizer, but I want to avoid adding a dependency to this project\n",
    "* derive from first principles"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "def create_title(x):\n",
    "    all_citations = []\n",
    "    for foo in x:\n",
    "        paper = foo[0]\n",
    "        all_citations.append(paper)\n",
    "    return(all_citations)\n",
    "    \n",
    "    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: \n",
      "A value is trying to be set on a copy of a slice from a DataFrame.\n",
      "Try using .loc[row_indexer,col_indexer] = value instead\n",
      "\n",
      "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
      "  \"\"\"Entry point for launching an IPython kernel.\n"
     ]
    }
   ],
   "source": [
    "papers['Title_Year'] = papers['Citations'].apply(create_title)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Epidemiological determinants of spread of causal agent of severe acute respiratory syndrome in Hong Kong'"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "papers['Title_Year'].iloc[9][0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [],
   "source": [
    "citations_per_paper = list(papers['Title_Year'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(6597, 6597)"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(citations_per_paper),len(papers['Titles'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [],
   "source": [
    "from scipy import sparse\n",
    "#test to create sparse matrix\n",
    "transition_matrix = np.zeros((len(citations_per_paper),len(papers)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[0., 0., 0., ..., 0., 0., 0.],\n",
       "       [0., 0., 0., ..., 0., 0., 0.],\n",
       "       [0., 0., 0., ..., 0., 0., 0.],\n",
       "       ...,\n",
       "       [0., 0., 0., ..., 0., 0., 0.],\n",
       "       [0., 0., 0., ..., 0., 0., 0.],\n",
       "       [0., 0., 0., ..., 0., 0., 0.]])"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "transition_matrix"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [],
   "source": [
    "set_all_citations = list(papers['Titles'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [],
   "source": [
    "#only need to go across rows\n",
    "for i in range(len(citations_per_paper)):\n",
    "    #pull citations for i'th paper\n",
    "    ith_citations = citations_per_paper[i]\n",
    "    #look up the column index of every citation that is also in the corpus\n",
    "    indices = []\n",
    "    for j in range(len(ith_citations)):\n",
    "        if ith_citations[j] in common_papers:\n",
    "            index = set_all_citations.index(ith_citations[j])\n",
    "            indices.append(index)\n",
    "    #update entries in transition matrix\n",
    "    for idx in indices:\n",
    "        transition_matrix[i,idx] = 1 "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Notes on the Transition Matrix and Citation Rank\n",
    "* Citation Rank is very similar to Google's PageRank\n",
    "* We begin with the transition matrix, $M$, which is $N\\times N$\n",
    "* An element $M_{ij}$ in this matrix is flagged with a 1\n",
    "* This indicates that there is a connection between the $i$-th paper in the corpus and the $j$-th paper in the citations\n",
    "* We can normalize this matrix across all citations in the row:\n",
    "    * $M^*_{ij} = \\frac{M_{ij}}{\\sum_{k=1}^{N} \\delta(M_{ik},1)}$\n",
    "    * $\\delta(M_{ik},1)$ is the indicator function, equal to 1 where $M_{ik} = 1$ and 0 otherwise; this is just row-sum normalization to get probabilities, which are all uniform within a row\n",
    "* We call this normalized matrix $M^*$\n",
    "* There is a nifty little derivation that says if you multiply $M^*$ by itself $n$ times, then as $n \\to \\infty$ we reach a steady state, where the probabilities no longer change much\n",
    "* We can factorize $M^*$ using matrices $D$ and $P$:\n",
    "    * $D = P^{-1} M^* P$ \n",
    "    * this just means $M^*$ can be diagonalized\n",
    "    * for any number $n$ we can write:\n",
    "    * $P D^n P^{-1} = (M^*)^n$\n",
    "* Since $D$ is diagonal, multiplying it by itself $n$ times is the same as raising each term on the diagonal to the $n$-th power\n",
    "* The whole point was to multiply $M^*$ by itself $n$ times; this can easily be done by first diagonalizing $M^*$, which can be done by solving the eigenvalue problem for $M^*$\n",
    "    * which is $M^{*T} v = \\lambda v$\n",
    "* $M^*$ has a really nice property: since it is stochastic, its leading eigenvalue has an upper bound of 1, so we can take the eigenvector with the leading eigenvalue of 1\n",
    "* This would give us the relative ranking for all papers; just sort that eigenvector"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<6597x6597 sparse matrix of type '<class 'numpy.float64'>'\n",
       "\twith 12197 stored elements in LInked List format>"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "transition_matrix"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Percent Sparsity is  0.97\n"
     ]
    }
   ],
   "source": [
    "print('Percent Sparsity is ',1-np.round((12197 / 6597**2)*100,2))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.348163272"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.zeros((len(citations_per_paper),len(citations_per_paper))).nbytes*1e-9"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Stochastic Matrices\n",
    "* normalize so that the column sum is equal to 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [],
   "source": [
    "#go across columns\n",
    "for i in range(transition_matrix.shape[1]):\n",
    "    #get column sum\n",
    "    col_sum = np.sum(transition_matrix[:,i])\n",
    "    if col_sum > 0.0:\n",
    "        #normalize every element in the column, including the last row\n",
    "        for j in range(transition_matrix.shape[0]):\n",
    "            transition_matrix[j,i] = transition_matrix[j,i] / col_sum"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The normalization loop above took about 8 hours to run."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Check that each column sums to 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "plt.hist(transition_matrix.sum(axis=0))\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1.000000000000002"
      ]
     },
     "execution_count": 76,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.max(transition_matrix.sum(axis=0))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Solve the eigenvalue problem"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 172,
   "metadata": {},
   "outputs": [],
   "source": [
    "w,v = np.linalg.eig(transition_matrix)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Find eigenvalues greater than 0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 173,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0.70710678, 0.28867513, 1.        , 0.35355339, 0.70710678,\n",
       "       0.28867513, 1.        , 1.        , 1.        , 1.        ,\n",
       "       1.        , 1.        , 1.        , 1.        , 0.5       ,\n",
       "       1.        , 1.        , 1.        , 0.33333333, 1.        ,\n",
       "       1.        , 1.        , 1.        , 0.5       , 1.        ,\n",
       "       1.        , 1.        , 1.        , 1.        , 1.        ,\n",
       "       1.        , 0.2       , 1.        , 1.        , 1.        ,\n",
       "       1.        , 1.        , 1.        , 1.        , 1.        ,\n",
       "       0.25      , 1.        , 1.        , 1.        , 1.        ])"
      ]
     },
     "execution_count": 173,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "w[w>0] "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The eigenvalues that are exactly 1 correspond to papers that did not have any citations to any other papers in the transition matrix. Grab the eigenvalues that are greater than zero but less than one!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 174,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0.70710678, 0.28867513, 0.35355339, 0.70710678, 0.28867513,\n",
       "       0.5       , 0.33333333, 0.5       , 0.2       , 0.25      ])"
      ]
     },
     "execution_count": 174,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "w[(w>0) & (w <1)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 175,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(array([ 291,  293,  297,  299,  301, 1819, 2457, 3787, 4714, 5508]),)"
      ]
     },
     "execution_count": 175,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.where((w>0) & (w <1))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 176,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>score</th>\n",
       "      <th>paper</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>6195</th>\n",
       "      <td>0.707107</td>\n",
       "      <td>Specific mutations in H5N1 mainly impact the m...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6531</th>\n",
       "      <td>0.500000</td>\n",
       "      <td>New Metrics for Evaluating Viral Respiratory P...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3847</th>\n",
       "      <td>0.500000</td>\n",
       "      <td>Moving H5N1 studies into the era of systems bi...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>Airborne bioaerosols and their impact on human...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4404</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>Influence of age and body condition on astrovi...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4403</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>Development of a novel detection system for mi...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4402</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>Endogenous ribosomal frameshift signals operat...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4401</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>ALV-J strain SCAU-HN06 induces innate immune r...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4400</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>The reproductive number of COVID-19 is higher ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4399</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>Community Case Clusters of Middle East Respira...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         score                                              paper\n",
       "6195  0.707107  Specific mutations in H5N1 mainly impact the m...\n",
       "6531  0.500000  New Metrics for Evaluating Viral Respiratory P...\n",
       "3847  0.500000  Moving H5N1 studies into the era of systems bi...\n",
       "0     0.000000  Airborne bioaerosols and their impact on human...\n",
       "4404  0.000000  Influence of age and body condition on astrovi...\n",
       "4403  0.000000  Development of a novel detection system for mi...\n",
       "4402  0.000000  Endogenous ribosomal frameshift signals operat...\n",
       "4401  0.000000  ALV-J strain SCAU-HN06 induces innate immune r...\n",
       "4400  0.000000  The reproductive number of COVID-19 is higher ...\n",
       "4399  0.000000  Community Case Clusters of Middle East Respira..."
      ]
     },
     "execution_count": 176,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "article_rankings = pd.DataFrame({'score':v[:,291],'paper': set_all_citations})\n",
    "article_rankings.sort_values('score',ascending=False).head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From an initial observation, it looks as if the papers in the same eigenvector for a given eigenvalue are similar (I believe they most likely cited each other). Let's write a script that goes through each of the selected eigenvalues, pulls the papers that correspond to its eigenvector, and examines them!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 208,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "For the eigenvalue of 0.7071067811865476\n",
      "==================\n",
      "The scores and papers are: \n",
      "0.7071067811865475 Specific mutations in H5N1 mainly impact the magnitude and velocity of the host response in mice\n",
      "0.5 New Metrics for Evaluating Viral Respiratory Pathogenesis\n",
      "0.5 Moving H5N1 studies into the era of systems biology\n",
      "0.0 Airborne bioaerosols and their impact on human health\n",
      "0.0 Influence of age and body condition on astrovirus infection of bats in Singapore: An evolutionary and epidemiological analysis\n",
      "0.0 Development of a novel detection system for microbes from bovine diarrhea by real-time PCR\n",
      "0.0 Endogenous ribosomal frameshift signals operate as mRNA destabilizing elements through at least two molecular pathways in yeast\n",
      "0.0 ALV-J strain SCAU-HN06 induces innate immune responses in chicken primary monocyte-derived macrophages\n",
      "0.0 The reproductive number of COVID-19 is higher compared to SARS coronavirus\n",
      "0.0 Community Case Clusters of Middle East Respiratory Syndrome Coronavirus in Hafr Al-Batin, Kingdom of Saudi Arabia: A Descriptive Genomic study\n",
      "\n",
      "\n",
      "For the eigenvalue of 0.28867513459481287\n",
      "==================\n",
      "The scores and papers are: \n",
      "0.4543506764424155 Estimates of global research productivity in using nicotine replacement therapy for tobacco cessation: a bibliometric study\n",
      "0.2973589179514111 Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China\n",
      "0.28052997695772625 Clinical characteristics and intrauterine vertical transmission potential of COVID-19 infection in nine pregnant women: a retrospective review of medical records\n",
      "0.26785598399064386 Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study\n",
      "0.22330322651599863 Potential inhibitors for 2019-nCoV coronavirus M protease from clinically approved medicines\n",
      "0.17495810329469336 Middle East Respiratory Syndrome Coronavirus and the One Health concept\n",
      "0.17470362990101188 Are children less susceptible to COVID-19?\n",
      "0.17470362990101188 Are children less susceptible to COVID-19?\n",
      "0.1399664826357547 Middle East respiratory syndrome coronavirus: transmission and phylogenetic evolution\n",
      "0.1399664826357547 Middle East respiratory syndrome coronavirus in the last two years: Health care workers still at risk\n",
      "\n",
      "\n",
      "For the eigenvalue of 0.3535533905932738\n",
      "==================\n",
      "The scores and papers are: \n",
      "0.0 Airborne bioaerosols and their impact on human health\n",
      "0.0 Comparison of three multiplex PCR assays for detection of respiratory viruses: Anyplex II RV16, AdvanSure RV, and Real-Q RV\n",
      "0.0 The reproductive number of COVID-19 is higher compared to SARS coronavirus\n",
      "0.0 Community Case Clusters of Middle East Respiratory Syndrome Coronavirus in Hafr Al-Batin, Kingdom of Saudi Arabia: A Descriptive Genomic study\n",
      "0.0 Shiga toxin-producing Escherichia coli (STEC) isolated from fecal samples of African dromedary camels\n",
      "0.0 CoVDB: a comprehensive database for comparative analysis of coronavirus genes and genomes\n",
      "0.0 Positive-sense RNA viruses reveal the complexity and dynamics of the cellular and viral epitranscriptomes during infection\n",
      "0.0 Clinical Progression and Cytokine Profiles of Middle East Respiratory Syndrome Coronavirus Infection\n",
      "0.0 Supporting Information\n",
      "0.0 Re-emergent Human Adenovirus Genome Type 7d Caused an Acute Respiratory Disease Outbreak in Southern China After a Twenty-one Year Absence\n",
      "\n",
      "\n",
      "For the eigenvalue of 0.7071067811865476\n",
      "==================\n",
      "The scores and papers are: \n",
      "0.7071067811865475 Experimental reproduction of winter dysentery in lactating cows using BCV Ð comparison with BCV infection in milk-fed calves\n",
      "0.5 A new comprehensive method for detection of livestock-related pathogenic viruses using a target enrichment system\n",
      "0.5 Capture ELISA systems for the detection of bovine coronavirus-speci®c IgA and IgM antibodies in milk and serum\n",
      "0.0 Airborne bioaerosols and their impact on human health\n",
      "0.0 Clinical Progression and Cytokine Profiles of Middle East Respiratory Syndrome Coronavirus Infection\n",
      "0.0 Influence of age and body condition on astrovirus infection of bats in Singapore: An evolutionary and epidemiological analysis\n",
      "0.0 Development of a novel detection system for microbes from bovine diarrhea by real-time PCR\n",
      "0.0 Endogenous ribosomal frameshift signals operate as mRNA destabilizing elements through at least two molecular pathways in yeast\n",
      "0.0 ALV-J strain SCAU-HN06 induces innate immune responses in chicken primary monocyte-derived macrophages\n",
      "0.0 The reproductive number of COVID-19 is higher compared to SARS coronavirus\n",
      "\n",
      "\n",
      "For the eigenvalue of 0.28867513459481287\n",
      "==================\n",
      "The scores and papers are: \n",
      "0.8295859279342762 Origins and Evolution of the Global RNA Virome\n",
      "0.29791820759197263 Homologous genetic recombination in the yellow head complex of nidoviruses infecting Penaeus monodon shrimp\n",
      "0.23948082940438994 A decade of RNA virus metagenomics is (not) enough\n",
      "0.23948082940438994 Identification of a novel nidovirus as a potential cause of large scale mortalities in the endangered Bellinger River snapping turtle (Myuchelys georgesi)\n",
      "0.1596538862695933 RNA transcription analysis and completion of the genome sequence of yellow head nidovirus\n",
      "0.1596538862695933 Real-time reverse transcription loop-mediated isothermal amplification for rapid detection of yellow head virus in shrimp\n",
      "0.13826432132237937 Identification of a novel nidovirus in an outbreak of fatal respiratory disease in ball pythons (Python regius)\n",
      "0.13826432132237937 Genetic diversity in the yellow head nidovirus complex\n",
      "0.13826432132237937 An Insect Nidovirus Emerging from a Primary Tropical Rainforest\n",
      "0.0 Positive-sense RNA viruses reveal the complexity and dynamics of the cellular and viral epitranscriptomes during infection\n",
      "\n",
      "\n",
      "For the eigenvalue of 0.5\n",
      "==================\n",
      "The scores and papers are: \n",
      "0.6172133998483676 A metaanalysis of bat phylogenetics and positive selection based on genomes and transcriptomes from 18 species\n",
      "0.41147559989891175 Longitudinal survey of two serotine bat (Eptesicus serotinus) maternity colonies exposed to EBLV-1 (European Bat Lyssavirus type 1): Assessment of survival and serological status variations using capture-recapture models\n",
      "0.3086066999241838 Going to Bat(s) for Studies of Disease Tolerance\n",
      "0.3086066999241838 Lack of inflammatory gene expression in bats: a unique role for a transcription repressor\n",
      "0.3086066999241838 Tools to study pathogen-host interactions in bats\n",
      "0.1543033499620919 Ecological Factors Associated with European Bat Lyssavirus Seroprevalence in Spanish Bats\n",
      "0.1543033499620919 Immunology of Bats and Their Viruses: Challenges and Opportunities\n",
      "0.1543033499620919 Adaptive modeling of viral diseases in bats with a focus on rabies\n",
      "0.1543033499620919 Bats, emerging infectious diseases, and the rabies paradigm revisited\n",
      "0.1543033499620919 Ecology of Zoonotic Infectious Diseases in Bats: Current Knowledge and Future Directions\n",
      "\n",
      "\n",
      "For the eigenvalue of 0.3333333333333333\n",
      "==================\n",
      "The scores and papers are: \n",
      "0.5773502691896258 Acute disseminated encephalomyelitis\n",
      "0.5773502691896258 Acute disseminated encephalomyelitis\n",
      "0.5773502691896258 Disseminated encephalomyelitis in children\n",
      "0.0 Caspase cleavage of viral proteins, another way for viruses to make the best of apoptosis\n",
      "0.0 EuPathDB: the eukaryotic pathogen genomics database resource\n",
      "0.0 Interleukin-18 expression and the response to treatment in patients with psoriasis\n",
      "0.0 Influence of age and body condition on astrovirus infection of bats in Singapore: An evolutionary and epidemiological analysis\n",
      "0.0 Development of a novel detection system for microbes from bovine diarrhea by real-time PCR\n",
      "0.0 Endogenous ribosomal frameshift signals operate as mRNA destabilizing elements through at least two molecular pathways in yeast\n",
      "0.0 ALV-J strain SCAU-HN06 induces innate immune responses in chicken primary monocyte-derived macrophages\n",
      "\n",
      "\n",
      "For the eigenvalue of 0.5\n",
      "==================\n",
      "The scores and papers are: \n",
      "0.8528028654224417 The human coronavirus HCoV-229E S-protein structure and receptor binding\n",
      "0.42640143271122083 Human Coronaviruses: A Review of Virus-Host Interactions\n",
      "0.21320071635561041 Enterovirus infections of the central nervous system\n",
      "0.21320071635561041 Human coronaviruses: Viral and cellular factors involved in neuroinvasiveness and neuropathogenesis\n",
      "0.0 Interleukin-18 expression and the response to treatment in patients with psoriasis\n",
      "0.0 Influence of age and body condition on astrovirus infection of bats in Singapore: An evolutionary and epidemiological analysis\n",
      "0.0 Development of a novel detection system for microbes from bovine diarrhea by real-time PCR\n",
      "0.0 Endogenous ribosomal frameshift signals operate as mRNA destabilizing elements through at least two molecular pathways in yeast\n",
      "0.0 ALV-J strain SCAU-HN06 induces innate immune responses in chicken primary monocyte-derived macrophages\n",
      "0.0 The reproductive number of COVID-19 is higher compared to SARS coronavirus\n",
      "\n",
      "\n",
      "For the eigenvalue of 0.2\n",
      "==================\n",
      "The scores and papers are: \n",
      "0.9657950439579434 Entrapment of H1N1 Influenza Virus Derived Conserved Peptides in PLGA Nanoparticles Enhances T Cell Response and Vaccine Efficacy in Pigs\n",
      "0.10721211038432217 The viral innate immune antagonism and an alternative vaccine design for PRRS virus\n",
      "0.10721211038432217 Biodegradable Nanoparticle-Entrapped Vaccine Induces Cross-Protective Immune Response against a Virulent Heterologous Respiratory Viral Infection in Pigs\n",
      "0.08063059541300262 PLGA nanoparticle entrapped killed porcine reproductive and respiratory syndrome virus vaccine helps in viral clearance in pigs\n",
      "0.07977182024516763 Viral Metagenomics Revealed Sendai Virus and Coronavirus Infection of Malayan Pangolins (Manis javanica)\n",
      "0.07531429241873872 Elevated dietary zinc oxide levels do not have a substantial effect on porcine reproductive and respiratory syndrome virus (PPRSV) vaccination and infection\n",
      "0.058479332936903 Evaluation of immune responses to porcine reproductive and respiratory syndrome virus in pigs during early stage of infection under farm conditions\n",
      "0.05101128046800234 Characterization and Pathogenicity of the Porcine Deltacoronavirus Isolated in Southwest China\n",
      "0.04873277744741916 Impact of genotype 1 and 2 of porcine reproductive and respiratory syndrome viruses on interferon-α responses by plasmacytoid dendritic cells\n",
      "0.04873277744741916 The porcine innate immune system: An update\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "eig_vals_needed = np.where((w>0) & (w <1))\n",
    "for i in range(0,len(eig_vals_needed[0])-1):\n",
    "    print('For the eigenvalue of', w[eig_vals_needed[0][i]])\n",
    "    print('==================')\n",
    "    article_rankings = pd.DataFrame({'score':v[:,eig_vals_needed[0][i]],'paper': set_all_citations})\n",
    "    #grab only the first ten\n",
    "    article_rankings = article_rankings.sort_values('score',ascending=False).head(10)\n",
    "    print('The scores and papers are: ')\n",
    "    for j in range(0,len(article_rankings)):\n",
    "        score = article_rankings['score'].iloc[j]\n",
    "        paper = article_rankings['paper'].iloc[j]\n",
    "        print(score,paper)\n",
    "    print('\\n')\n",
    "        \n",
    "    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Notes\n",
    "* For some of the groupings of papers, you would expect them to share citations among one another\n",
    "* I had to drop a lot of papers because I didn't do string matching for all paper titles among citations\n",
    "* I'm only matching papers in the corpus to citations that are papers in the corpus\n",
    "* I could have had the full 29000~ by 29000~ matrix\n",
    "* But reduced to 6700 by 6700 ish"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 182,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ 291  293  297  299  301 1819 2457 3787 4714 5508]\n"
     ]
    }
   ],
   "source": [
    "for foo in eig_vals_needed:\n",
    "    print(foo)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 185,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "291"
      ]
     },
     "execution_count": 185,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "eig_vals_needed[0][0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 201,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "10"
      ]
     },
     "execution_count": 201,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(article_rankings)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
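
The notes in the notebook above describe keeping only citations whose target is itself a paper in the corpus, which shrinks the citation matrix. The following is a minimal sketch of that idea on toy data, followed by a plain power-iteration PageRank. It is not the notebook's own method (the notebook ranks papers by eigenvector components): the function names `build_citation_matrix` and `pagerank`, the toy titles, and the damping factor of 0.85 are all assumptions made for illustration.

```python
import numpy as np


def build_citation_matrix(citations, titles):
    """Return a column-stochastic matrix where entry [i, j] is the share of
    paper j's in-corpus citations that point at paper i (hypothetical helper)."""
    index = {title: k for k, title in enumerate(titles)}
    n = len(titles)
    A = np.zeros((n, n))
    for citing, cited_list in citations.items():
        if citing not in index:
            continue
        for cited in cited_list:
            # Keep only citations whose target is also in the corpus.
            if cited in index:
                A[index[cited], index[citing]] = 1.0
    col_sums = A.sum(axis=0)
    # Papers with no in-corpus citations spread their weight uniformly.
    dangling = col_sums == 0
    A[:, dangling] = 1.0 / n
    col_sums[dangling] = 1.0
    return A / col_sums


def pagerank(A, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration on a column-stochastic matrix (damping is an assumed value)."""
    n = A.shape[0]
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = damping * (A @ rank) + (1.0 - damping) / n
        delta = np.abs(new_rank - rank).sum()
        rank = new_rank
        if delta < tol:
            break
    return rank


# Toy corpus: titles and citation lists are made up for this example.
titles = ["paper A", "paper B", "paper C"]
citations = {
    "paper A": ["paper C"],
    "paper B": ["paper C", "a paper outside the corpus"],
    "paper C": [],
}

ranks = pagerank(build_citation_matrix(citations, titles))
for title, score in sorted(zip(titles, ranks), key=lambda pair: -pair[1]):
    print(title, score)
```

In this toy corpus the out-of-corpus citation is dropped, and "paper C" ranks highest because both other papers cite it.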

GitHub Events

Total
Last Year