clinical-case-reports

A collection of functions to assist with PubMed queries.

https://github.com/bleakley/clinical-case-reports

Science Score: 18.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.7%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: bleakley
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 5.92 MB
Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created about 9 years ago · Last pushed about 9 years ago
Metadata Files
  • Readme
  • Citation

README.md

Clinical Case Reports

This repository contains a collection of Python 2.7 functions that support our project's PubMed queries.

To generate a citation count distribution, use the function buildDistribution() (lines 125-150) in the file citation_distrib.py.

Input: First, define the publication set from which you wish to generate the distribution. You must specify a list of MeSH terms (e.g., Cardiovascular Diseases). A publication is included in the set if it matches at least one of the specified MeSH terms. The query in citation_distrib.py is also restricted to English-language publications of type "case reports".

Optionally, you may specify a list of PMIDs that will be excluded from the set, as well as a range of publication dates (line 129). You may also specify whether you wish to search MeSH Terms or MeSH Major Topics.

To search MeSH Major Topics rather than MeSH Terms, set the variable majorTopics (line 16) to True. All other parameters can be set on line 129.
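As a rough illustration of how these inputs combine into a single PubMed search term, here is a minimal sketch modeled on the query string in getPmids(). The function name build_search_term and its parameter names are hypothetical and are not part of citation_distrib.py. The field tags and the overall clause structure are taken from that file. The sketch is written to run under both Python 2.7 and Python 3.

```python
def build_search_term(mesh_terms, major_topics=True, start_year=1900, end_year=3000):
    """Hypothetical helper: build a PubMed search term for English case reports."""
    # Choose the field tag depending on whether MeSH Major Topics are requested.
    field = "MeSH Major Topic" if major_topics else "MeSH Terms"
    # A publication needs to match only one of the MeSH terms, hence OR.
    mesh_clause = " OR ".join(
        '("{0}"[{1}])'.format(term, field) for term in mesh_terms
    )
    return (
        '("case reports"[Publication Type]) AND ({0}) '
        'AND ("{1}"[Date - Publication] : "{2}"[Date - Publication]) '
        'AND ("english"[Language])'
    ).format(mesh_clause, start_year, end_year)


term = build_search_term(
    ["Myocardial Ischemia", "Myocardial Stunning"],
    major_topics=False,
    start_year=1950,
    end_year=2015,
)
```

The resulting string would still need to be URL-encoded and passed as the term parameter of an esearch request; that step is left out of the sketch.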

Output: The output is a map in which each key is the number of citations for a given publication in PubMed Central (PMC), and each value is the number of publications in the queried set with that citation total. For example, the distribution

{'0': 23, '1': 3, '4': 2}

would describe a set of 28 publications, 23 of which had no citations in PMC, 3 of which had 1 citation in PMC, and 2 of which had 4 citations in PMC.
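To make the shape of this map concrete, the following sketch builds a distribution from a list of per-publication citation counts and reads it back. The helper names (build_distribution, total_publications, sorted_rows) are hypothetical and do not appear in citation_distrib.py. Treating an empty value as zero citations mirrors how the script handles the pmcrefcount field.

```python
from collections import Counter


def build_distribution(ref_counts):
    """Hypothetical helper: map each citation count (as a string) to its frequency."""
    # An empty value is counted as zero citations, as in the script.
    keys = (str(c) if str(c) != "" else "0" for c in ref_counts)
    return dict(Counter(keys))


def total_publications(distribution):
    """Number of publications described by a distribution."""
    return sum(distribution.values())


def sorted_rows(distribution):
    """Rows ordered by citation count; keys are strings, so sort numerically."""
    return sorted(distribution.items(), key=lambda kv: int(kv[0]))


example = {'0': 23, '1': 3, '4': 2}
total = total_publications(example)  # 23 + 3 + 2 publications
```

Because the keys are strings, sorting them directly would place '10' before '2'; converting to int in the sort key avoids that.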

These counts are lower bounds on the true citation counts: a publication may also be cited by publications that are not tracked in PMC.

Owner

  • Name: Brian Bleakley
  • Login: bleakley
  • Kind: user

Citation (citation_distrib.py)

from xml.etree.ElementTree import fromstring

import operator
import requests, json, io, xmljson, HTMLParser, time

working_dir='/home/bleakley/case-reports/'

# MeSH Term lists
ihd = ['Myocardial Ischemia', 'Myocardial Stunning']
cva = ['Cerebrovascular Disorders']
cardiomyopathy = ['Cardiomegaly','Cardiomyopathies','Endocarditis','Heart Failure','Ventricular Dysfunction','Ventricular Outflow Obstruction']
arrhythmia = ['Arrhythmias, Cardiac', 'Heart Arrest']
valvedisease = ['Heart Valve Diseases','Rheumatic Heart Disease']
chd = ['Cardiovascular Abnormalities','Heart Defects, Congenital']

majorTopics = True

# Get a list of PMIDs matching a set of MeSH terms, minus any excluded PMIDs, within an optional date range
def getPmids(meshTerms, maxCount, excluded={}, startDate=1900, endDate=3000):
    termString = ""
    for i, term in enumerate(meshTerms):
        if majorTopics:
            termString += '("' + term + '"[MeSH Major Topic])'
        else:
            termString += '("' + term + '"[MeSH Terms])'
        if i != len(meshTerms) - 1 :
            termString += ' OR '

    step = 100000  # number of PMIDs requested per esearch page
    trueCount = 99999999  # placeholder until the first response reports the real count
    startCount = 0
    pmidList = []  # renamed from "list" to avoid shadowing the builtin
    while startCount < trueCount:
        query = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=("case reports"[Publication Type]) AND (' + termString + ') AND ("' + str(startDate) +  '"[Date - Publication] : "' + str(endDate) + '"[Date - Publication]) AND ("english"[Language])&retmode=json&retmax=' + str(step) + '&retstart=' + str(startCount)
        print query
        d = json.loads(requests.get(query).content)["esearchresult"]
        trueCount = int(d["count"])
        pmidList += d["idlist"]
        startCount += step

    # Drop excluded PMIDs, then cap the result at maxCount
    newlist = [x for x in pmidList if x not in excluded]
    return newlist[0:maxCount]

def getAuthorString(authors):
    authorString = ""
    for author in authors:
        authorString += author["name"] + ", "
    authorString = authorString[:-2]
    return authorString

refCounts = {}

def getTableRow(jsonData):
    row = ""
    row += getAuthorString(jsonData["authors"])
    row += '\t'
    row += jsonData["title"]
    row += '\t'
    row += jsonData["source"]
    row += '\t'

    doi = "None"
    pmid = "None"
    otherIds = jsonData["articleids"]
    for otherId in otherIds:
        if otherId["idtype"] == "doi":
            doi = otherId["value"]
        if otherId["idtype"] == "pubmed":
            pmid = otherId["value"]
    row += doi
    row += '\t'
    row += pmid
    row += '\t'
    year = jsonData["pubdate"].split()[0]
    row += year
    row += '\t'
    link = 'https://www.ncbi.nlm.nih.gov/pubmed/' + pmid
    if doi != "None":
        link = 'https://dx.doi.org/' + doi
    row += link
    refCount = str(jsonData["pmcrefcount"])
    if refCount == "":
        refCount = "0"
    if refCount in refCounts:
        refCounts[refCount] += 1
    else:
        refCounts[refCount] = 1
    row += '\t'
    row += refCount
    return row

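# Tally one record's PMC citation count in the module-level refCounts map
def countRowRef(jsonData):
    try:
        refCount = str(jsonData["pmcrefcount"])
        if refCount == "":
            refCount = "0"  # an empty value is counted as zero citations
        if refCount in refCounts:
            refCounts[refCount] += 1
        else:
            refCounts[refCount] = 1
    except Exception:
        # Skip records without a usable pmcrefcount instead of aborting the run
        print "row failed"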


def getTableRows(pmids):
    query = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&retmode=json&rettype=abstract&id='
    # Append the PMIDs separated by '+'
    query += '+'.join(pmids)
    try:
        result = json.loads(requests.get(query).content)["result"]
    except:
        print "request failed"
        print query
        return
    rows = ""
    for uid in result["uids"]:
        countRowRef(result[uid])
    return rows


def buildDistribution():
    excluded = {}
    file = io.open(working_dir + 'dump', 'w', encoding='utf8')
    print "Getting list of PMIDs..."
    pmids = getPmids(['Cardiovascular Diseases'], 1000000, excluded, 1950, 2015)
    print "Total found " + str(len(pmids))
    print "Downloading data..."
    step = 500
    lastPercent = 0
    last_time = time.time()
    for i in range(0,len(pmids),step):
        rows = getTableRows(pmids[i:i+step])  # slicing clamps at the end of the list, so the last PMID is not dropped
        newPercent = int(100.0*i/len(pmids))
        if newPercent != lastPercent:
            delta_time = time.time() - last_time
            last_time = time.time()
            eta = delta_time*(100 - newPercent)/60.0
            hours = int(eta/60.0)
            print str(newPercent) + "% " + str(hours) + " hours, " + str(int(eta-60*hours)) + " minutes remaining"
        lastPercent = newPercent
    file.close()
    # Keys are strings, so sort them numerically before printing
    counts = sorted(refCounts.keys(), key=int)
    for c in counts:
        print c + "\t" + str(refCounts[c])
    print refCounts
    print "Done."

buildDistribution()

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 17
  • Total Committers: 4
  • Avg Commits per committer: 4.25
  • Development Distribution Score (DDS): 0.588
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Brian Bleakley b****y@g****m 7
dliemmd d****d@g****m 6
HowardChoiUCLA c****5@g****m 3
vincekyi v****i@g****m 1