predictivebreakthroughs
Code for Prediction of Transformative Breakthroughs in Biomedical Research manuscript
Science Score: 31.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
✓CITATION.cff file
Found CITATION.cff file -
✓codemeta.json file
Found codemeta.json file -
○.zenodo.json file
-
○DOI references
-
○Academic publication links
-
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (12.1%) to scientific vocabulary
Keywords
Repository
Code for Prediction of Transformative Breakthroughs in Biomedical Research manuscript
Basic Info
- Host: GitHub
- Owner: NIHOPA
- Language: Java
- Default Branch: main
- Homepage: https://dpcpsi.nih.gov/opa
- Size: 188 KB
Statistics
- Stars: 1
- Watchers: 3
- Forks: 1
- Open Issues: 0
- Releases: 0
Topics
Metadata Files
README.md
Predictive Breakthroughs
This repository contains code to reproduce the results of the Prediction of transformative breakthroughs in biomedical research manuscript. This includes the co-citation network calculation for all of PubMed, the R-MCL clustering of the co-cocitation networks, the trajectory generation, and the logistic regression and post regression processing for identification of breakthrough signals.
In addition, it includes code that can be used independently for co-citation network (CCN) computation and MCL/R-MCL (Regularized Markov Clustering Algorithm) clustering. The CCN code has been tested on the entire nih open citation collection with more than 25 million cited publications and 750 million citations. The R-MCL code algorithm has been tested on networks with more than 25 million nodes and 1 billion edges.
Setup Environment
- Checkout code from this GitHub repository
- Install Java 21 or later
- Install mongodb 7.x or later
- Restore citations database snapshot
- Download data from https://figshare.com/s/ca05c0539e77fb2a1dc1
- mongoimport cited2019snapshot-f39ea.json --db cocitations --collection cited
- the first time the code runs it will create an index on citesPmid which will take some time
- Compile code
- from the command line
./gradlew installDist
- from the command line
Run CCN calculation
- If using mongodb that is not local set
mongoURIjava system property to desired url (defaults tomongodb://localhost:27017) ./ccnCmd/build/install/ccnCmd/bin/ccn --output /path/to/output/ccn.tsv --threads threads --threshold thresholdToWrite --cacheSize cacheSize --maxYear maxYear- set threads to the number of threads that you wish to use for the calculation
- set threshold to the minimum edge weight to be written to the csv (0.35 used by manuscript and is recommended)
- cacheSize determines the amount of CCN that are cached in memory. Set cacheSize based on the amount of RAM available
- To cache all publications and achieve maximum performance for the most recent year set to 20000000
- requires at least 175GB of RAM
- Specific
JAVA_OPTS="-Xmx175g"at the start of the command line, set the environment variable or edit the bash script
- Good performance can be achieved at cache size of 10000000
- Specific
JAVA_OPTS="-Xmx85g"after java in the command line, set the environment variable or edit the bash script
- Specific
- In other cases adjust cache size based on memory available
- To cache all publications and achieve maximum performance for the most recent year set to 20000000
- maxYear filters the citation network to the year in question. Stepping the max year through a range over a series of runs allows the analysis of the CCN network over time
Run R-MCL calculation
./mclCmd/build/install/mclCmd/bin/mcl run --input /path/to/output/ccn.tsv --output /path/to/output/rmcl --inflation 1.2 --maxIterations 500- The recommend inflation is 1.2 to match the results of the manuscript. Inflation the amount of granularity in clustering with higher inflation values causing more broken up clusters
- The recommend max iterations is 500 to match the results of the manuscript. This parameter determines the maximum iterations of the RMCL algorithm are run before terminating
- If running latest year set JVM size to 350g, earlier years take far less
- Specific
JAVA_OPTS="-Xmx350g"at the start of the command line, set the environment variable or edit the bash script
- Specific
Automation
- The runallcnnandrmcl.sh script is provided as a helper to run the CCN and RMCL calculations with the values in the manuscript for a year range.
Trajectory generation
- The trajectorycoder subfolder contains the R scripts needed to generate trajectory information (and the clusternodeattr summary). config.ini contains the necessary path information, after which run scripts in numerical order (1, 2, etc)
Breakthrough candidate generation
- Logistic regression and breakthrough candidate generation scripts are in the breakthroughcandidatescodepy subfolder. Folder paths to the breakthrough data, trajectory output and regression/candidate output will need to be provided in the code, after which run logisticregression.py to train a regressor and predict candidates for the SetA/B period, and breakthrough_candidates.py to filter them for breakthrough signals.
License
Use of this software requires a license. No rights are granted beyond the view access permitted by the GitHub Terms and Conditions. Please contact the authors for a license agreement if you wish to use this software.
Owner
- Name: NIH Office of Portfolio Analysis, DPCPSI/OD
- Login: NIHOPA
- Kind: organization
- Email: OPAInformation@od.nih.gov
- Location: 2 Center Dr., Bethesda, MD
- Website: http://dpcpsi.nih.gov/opa
- Repositories: 2
- Profile: https://github.com/NIHOPA
Citation (citationNetworkCompute/build.gradle.kts)
plugins {
application
alias(libs.plugins.shadow)
}
dependencies {
api(project(":spreadsheet"))
api(libs.guava)
api(libs.koloboke.api)
api(libs.mongodb.driver.sync)
implementation(libs.koloboke.impl)
}
application {
mainClass.set("gov.nih.opa.ccn.driver.CocitationVectorComputeDriverAllVsCCN")
}
GitHub Events
Total
- Watch event: 1
- Push event: 2
- Public event: 1
- Fork event: 1
Last Year
- Watch event: 1
- Push event: 2
- Public event: 1
- Fork event: 1
Dependencies
- actions/checkout v4 composite
- actions/setup-java v4 composite
- gradle/actions/dependency-submission ec92e829475ac0c2315ea8f9eced72db85bb337a composite
- gradle/actions/setup-gradle ec92e829475ac0c2315ea8f9eced72db85bb337a composite