clinical-case-reports
A collection of functions to assist with PubMed queries.
Science Score: 18.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (8.7%) to scientific vocabulary
Repository
Statistics
- Stars: 0
- Watchers: 2
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Clinical Case Reports
This repository contains a collection of Python 2.7 functions that support our project's PubMed queries.
To generate a citation count distribution, use the function buildDistribution() (lines 125-150) in the file citation_distrib.py.
Input: First, define the publication set from which you wish to generate the distribution. You must specify a list of MeSH terms (e.g., Cardiovascular Diseases) that qualify a publication for inclusion in the set. A publication needs only one of the specified MeSH terms to be counted.
Optionally, you may specify a list of PMIDs to exclude from the set, as well as a range of publication dates (line 129). You may also choose whether to search MeSH Terms or MeSH Major Topics.
To search MeSH Major Topics, set the variable majorTopics (line 16) to True; set it to False to search MeSH Terms. All other parameters can be set on line 129.
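The MeSH term list, field tag, and date range described above combine into a single PubMed search term. Below is a minimal sketch of that step, mirroring the string assembled in getPmids(). The helper name build_term and its parameter names are hypothetical and not part of the repository, and the sketch is written to run under Python 3 as well as 2.7.

```python
def build_term(mesh_terms, major_topics=True, start_date=1900, end_date=3000):
    """Combine MeSH terms with OR and add the case-report, date, and language filters.

    build_term is a hypothetical helper used for illustration only.
    """
    field = "MeSH Major Topic" if major_topics else "MeSH Terms"
    # A publication qualifies if it carries any one of the listed terms.
    mesh_clause = " OR ".join('("%s"[%s])' % (term, field) for term in mesh_terms)
    return (
        '("case reports"[Publication Type]) AND (%s) '
        'AND ("%s"[Date - Publication] : "%s"[Date - Publication]) '
        'AND ("english"[Language])' % (mesh_clause, start_date, end_date)
    )


term = build_term(["Myocardial Ischemia", "Myocardial Stunning"],
                  major_topics=False, start_date=1950, end_date=2015)
print(term)
```

Keeping the term construction separate from the HTTP request makes it easy to inspect the exact query before sending it to PubMed.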
Output: The output is a map in which each key is the number of citations for a given publication in PubMed Central (PMC), and each value is the number of publications in the queried set with that citation total. For example, the distribution
{'0': 23, '1': 3, '4': 2}
would describe a set of 28 publications, 23 of which had no citations in PMC, 3 of which had 1 citation in PMC, and 2 of which had 4 citations in PMC.
These are minimum citation counts: a publication may also be cited by publications that are not tracked in PMC.
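To make the output format concrete, here is a small sketch of how such a map can be built from per-publication citation counts and then summarized. The sample counts are made up to reproduce the example distribution above, and build_distribution is a hypothetical helper rather than a function from citation_distrib.py.

```python
from collections import Counter


def build_distribution(citation_counts):
    """Map each citation count (as a string key) to the number of publications with it."""
    return dict(Counter(str(count) for count in citation_counts))


# Made-up sample: 23 uncited publications, 3 cited once, 2 cited four times.
sample_counts = [0] * 23 + [1] * 3 + [4] * 2
distribution = build_distribution(sample_counts)

total_publications = sum(distribution.values())
# Keys are strings, so convert them to int before weighting or sorting numerically.
total_citations = sum(int(count) * n for count, n in distribution.items())

for count in sorted(distribution, key=int):
    print("%s\t%d" % (count, distribution[count]))
```

Because the keys are strings, sorting them without key=int would place '10' before '2'.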
Owner
- Name: Brian Bleakley
- Login: bleakley
- Kind: user
- Repositories: 5
- Profile: https://github.com/bleakley
Citation (citation_distrib.py)
from xml.etree.ElementTree import fromstring
import operator
import requests, json, io, xmljson, HTMLParser, time
working_dir = '/home/bleakley/case-reports/'

# MeSH term lists
ihd = ['Myocardial Ischemia', 'Myocardial Stunning']
cva = ['Cerebrovascular Disorders']
cardiomyopathy = ['Cardiomegaly', 'Cardiomyopathies', 'Endocarditis', 'Heart Failure', 'Ventricular Dysfunction', 'Ventricular Outflow Obstruction']
arrhythmia = ['Arrhythmias, Cardiac', 'Heart Arrest']
valvedisease = ['Heart Valve Diseases', 'Rheumatic Heart Disease']
chd = ['Cardiovascular Abnormalities', 'Heart Defects, Congenital']

# Search MeSH Major Topics when True, MeSH Terms when False
majorTopics = True


# Get a list of PMIDs that match any of a set of MeSH terms within an optional
# date range, skipping any PMIDs listed in excluded
def getPmids(meshTerms, maxCount, excluded={}, startDate=1900, endDate=3000):
    termString = ""
    for i, term in enumerate(meshTerms):
        if majorTopics:
            termString += '("' + term + '"[MeSH Major Topic])'
        else:
            termString += '("' + term + '"[MeSH Terms])'
        if i != len(meshTerms) - 1:
            termString += ' OR '
    step = 100000
    trueCount = 99999999
    startCount = 0
    pmidList = []
    # Page through the esearch results until every matching PMID is collected
    while startCount < trueCount:
        query = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=("case reports"[Publication Type]) AND (' + termString + ') AND ("' + str(startDate) + '"[Date - Publication] : "' + str(endDate) + '"[Date - Publication]) AND ("english"[Language])&retmode=json&retmax=' + str(step) + '&retstart=' + str(startCount)
        print query
        d = json.loads(requests.get(query).content)["esearchresult"]
        trueCount = int(d["count"])
        pmidList += d["idlist"]
        startCount += step
    newlist = []
    for x in pmidList:
        if x not in excluded:
            newlist.append(x)
    return newlist[0:maxCount]


# Join author names into a single comma-separated string
def getAuthorString(authors):
    authorString = ""
    for author in authors:
        authorString += author["name"] + ", "
    authorString = authorString[:-2]
    return authorString


# Citation count distribution: PMC citation count (as a string) -> number of publications
refCounts = {}


# Build one tab-separated row for a publication and record its citation count
def getTableRow(jsonData):
    row = ""
    row += getAuthorString(jsonData["authors"])
    row += '\t'
    row += jsonData["title"]
    row += '\t'
    row += jsonData["source"]
    row += '\t'
    doi = "None"
    pmid = "None"
    otherIds = jsonData["articleids"]
    for otherId in otherIds:
        if otherId["idtype"] == "doi":
            doi = otherId["value"]
        if otherId["idtype"] == "pubmed":
            pmid = otherId["value"]
    row += doi
    row += '\t'
    row += pmid
    row += '\t'
    year = jsonData["pubdate"].split()[0]
    row += year
    row += '\t'
    # Prefer the DOI link when a DOI is available
    link = 'https://www.ncbi.nlm.nih.gov/pubmed/' + pmid
    if doi != "None":
        link = 'https://dx.doi.org/' + doi
    row += link
    refCount = str(jsonData["pmcrefcount"])
    if refCount == "":
        refCount = "0"
    if refCount in refCounts:
        refCounts[refCount] += 1
    else:
        refCounts[refCount] = 1
    row += '\t'
    row += refCount
    return row


# Record the citation count of one publication in refCounts
def countRowRef(jsonData):
    try:
        refCount = str(jsonData["pmcrefcount"])
        if refCount == "":
            refCount = "0"
        if refCount in refCounts:
            refCounts[refCount] += 1
        else:
            refCounts[refCount] = 1
    except Exception:
        print "row failed"


# Fetch summaries for a batch of PMIDs and count their citations
def getTableRows(pmids):
    query = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&retmode=json&rettype=abstract&id='
    for pmid in pmids:
        query += pmid + '+'
    query = query[:-1]
    try:
        result = json.loads(requests.get(query).content)["result"]
    except Exception:
        print "request failed"
        print query
        return
    rows = ""
    for uid in result["uids"]:
        countRowRef(result[uid])
    return rows


def buildDistribution():
    excluded = {}
    file = io.open(working_dir + 'dump', 'w', encoding='utf8')
    print "Getting list of PMIDs..."
    pmids = getPmids(['Cardiovascular Diseases'], 1000000, excluded, 1950, 2015)
    print "Total found " + str(len(pmids))
    print "Downloading data..."
    step = 500
    lastPercent = 0
    last_time = time.time()
    for i in range(0, len(pmids), step):
        # Slicing clamps to the end of the list, so the final PMID is included
        rows = getTableRows(pmids[i:i + step])
        newPercent = int(100.0*i/len(pmids))
        if newPercent != lastPercent:
            delta_time = time.time() - last_time
            last_time = time.time()
            # Estimated minutes remaining, based on the time taken for the last step
            eta = delta_time*(100 - newPercent)/60.0
            hours = int(eta/60.0)
            print str(newPercent) + "% " + str(hours) + " hours, " + str(int(eta-60*hours)) + " minutes remaining"
            lastPercent = newPercent
    file.close()
    counts = refCounts.keys()
    for c in counts:
        print c + "\t" + str(refCounts[c])
    print refCounts
    print "Done."


buildDistribution()
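buildDistribution() requests publication summaries in batches of 500 PMIDs. Here is a sketch of that batching step in isolation. Python slices clamp to the end of the list, so no explicit upper bound is needed and the last PMID is always included. The helper name chunked and the sample IDs are hypothetical, and the sketch is written to run under Python 3 as well as 2.7.

```python
def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        # items[start:start + size] stops at len(items) automatically.
        yield items[start:start + size]


pmids = [str(n) for n in range(1, 1202)]  # 1201 made-up IDs
batches = list(chunked(pmids, 500))
print([len(batch) for batch in batches])
```

The final batch is simply shorter than the others, and every ID appears in exactly one batch.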
Committers
Last synced: over 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Brian Bleakley | b****y@g****m | 7 |
| dliemmd | d****d@g****m | 6 |
| HowardChoiUCLA | c****5@g****m | 3 |
| vincekyi | v****i@g****m | 1 |