Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.4%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: transbioZI
  • License: gpl-3.0
  • Language: R
  • Default Branch: main
  • Size: 90.1 MB
Statistics
  • Stars: 2
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 1 year ago · Last pushed 10 months ago
Metadata Files
Readme License Citation

README.md

PRScope


Outline

PRScope automatically generates all polygenic scores (PGS) associated with selected ontology IDs (e.g., from the Experimental Factor Ontology, EFO) for a given genotype dataset.

Inputs

  • Ontology IDs defining the PGS of interest
  • Genotype data to calculate the PGS

Output

  • A dataset (multi-PGS matrix) containing:
    • Subject IDs from the genotype data
    • Values of all selected PGS

The setup is optimized for a minimal-effort "vanilla use case" but supports advanced configurations.


Repo Contents

  • config: configuration files for parameter specification.
  • input: genotype and reference files, and the PRS trait specification.
  • main: the pipeline and scripts for PRS calculation, biotype identification, and tools.
  • output: intermediate and final results are saved here.

System Requirements

Hardware Requirements

PRScope requires only a standard computer with enough RAM (at least 4 GB) to support its in-memory operations. For best performance, we suggest a computer with higher specifications:

RAM: 16+ GB
CPU: 4+ cores, 3.3+ GHz/core

The runtimes below were measured on a computer with the recommended specifications (16 GB RAM, 4 cores at 3.3 GHz each) and a 100 Mbps internet connection.
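A quick way to compare a machine against these figures is sketched below. This is a minimal illustration, not part of PRScope: the helper names hw_status and report_hardware are hypothetical, total RAM is read from Linux's /proc/meminfo and rounded to whole gigabytes, and CPU clock speed is not checked.

```shell
#!/bin/sh
# Hypothetical helpers (not part of PRScope): compare a machine against the
# README's figures (4 GB RAM minimum; 16+ GB RAM and 4+ cores recommended).

# hw_status MEM_GB CORES -> prints below-minimum, minimum or recommended
hw_status() {
  mem_gb=$1
  cores=$2
  if [ "$mem_gb" -lt 4 ]; then
    echo "below-minimum"
  elif [ "$mem_gb" -ge 16 ] && [ "$cores" -ge 4 ]; then
    echo "recommended"
  else
    echo "minimum"
  fi
}

# report_hardware: read MemTotal (in kB) from /proc/meminfo (Linux only),
# round it to whole GB, count cores with nproc, then classify with hw_status.
report_hardware() {
  mem_gb=$(awk '/^MemTotal:/ {printf "%d", $2 / 1024 / 1024 + 0.5}' /proc/meminfo)
  cores=$(nproc)
  echo "RAM: ${mem_gb} GB, cores: ${cores} -> $(hw_status "$mem_gb" "$cores")"
}
```

Source the script and call report_hardware; rounding matters because a nominal 16 GB machine usually reports slightly less than 16 GiB of usable memory.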

Software Requirements

PRScope requires the following:

  • conda
  • Python
  • R
  • Snakemake
  • PLINK[1]
  • PRSice[2] or LDpred-2[3]
  • liftOverPlink[4]

Only conda must be installed manually. All other dependencies are managed via Conda environments.

PRScope has been tested on Ubuntu 22.04.5 and requires a Linux system.
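Because conda is the only dependency installed by hand and PRScope requires Linux, a preflight check reduces to two tests. The sketch below is hypothetical (has_cmd and preflight are not PRScope functions) and only checks that conda is on the PATH, not which version is installed.

```shell
#!/bin/sh
# Hypothetical preflight sketch (not part of PRScope): check that the host is
# Linux and that conda, the only manually installed dependency, is available.

# has_cmd NAME -> succeeds if NAME resolves to a command, function or alias
has_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# preflight: print one line per check; return non-zero if any check fails
preflight() {
  status=0
  if [ "$(uname -s)" = "Linux" ]; then
    echo "OS: Linux (ok)"
  else
    echo "OS: $(uname -s) (PRScope requires Linux)"
    status=1
  fi
  if has_cmd conda; then
    echo "conda: found"
  else
    echo "conda: not found on PATH"
    status=1
  fi
  return "$status"
}
```

Since preflight returns non-zero when a check fails, it can gate a script, e.g. `preflight && ./run.sh` from the repository root.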


Setup

1. Setting Up

a. Cloning the Repository

git clone https://github.com/transbioZI/PRScope

b. Download the required reference files from the provided link and extract the archive

Reference files for PRScope

tar -xf reference.tar.gz

Place the extracted files into the folder input/reference/:

  • The folder should now include:

    • ldpred2_ref/
    • eur_hg38.phase3.bed
    • eur_hg38.phase3.bim
    • eur_hg38.phase3.fam
    • eur_hg38.phase3.frq
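After extraction, a short check can confirm that every entry listed above is present. The sketch below is not part of PRScope; the function name missing_reference_files is hypothetical, the expected names are copied from this list, and it assumes the archive's contents end up directly in input/reference/ rather than in a nested subfolder.

```shell
#!/bin/sh
# Hypothetical helper (not part of PRScope): print each expected reference
# entry that is missing from DIR (default: input/reference).
missing_reference_files() {
  ref_dir=${1:-input/reference}
  for entry in ldpred2_ref eur_hg38.phase3.bed eur_hg38.phase3.bim \
               eur_hg38.phase3.fam eur_hg38.phase3.frq; do
    # -e accepts both files and directories (ldpred2_ref/ is a directory)
    if [ ! -e "$ref_dir/$entry" ]; then
      echo "$entry"
    fi
  done
}
```

Run missing_reference_files from the repository root; no output means all five entries were found.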

Installation Instructions

1. Install Conda

Follow the Conda Installation Guide.

2. Create the Conda Environment

conda create -c conda-forge -c bioconda -n snakemake snakemake python=3.12.1

  • Environment name: snakemake
  • Wait for installation to complete (~15 minutes)

3. Activate the Environment

conda activate snakemake


Running PRScope (please see Demo below)

cd PRScope
./run.sh

This command starts the PRScope pipeline.

  • On the first run, wait for the installation of the Conda environment to complete.
  • This may take up to an hour.

PRScope has been tested with:

conda version : 23.1.0
snakemake version : 8.4.8
python : 3.12.1

R sessionInfo():

R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.5 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0

locale:
LC_CTYPE=C.UTF-8, LC_NUMERIC=C, LC_TIME=C.UTF-8, LC_COLLATE=C.UTF-8, LC_MONETARY=C.UTF-8, LC_MESSAGES=C.UTF-8, LC_PAPER=C.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=C.UTF-8, LC_IDENTIFICATION=C

time zone: Europe/Berlin
tzcode source: system (glibc)

attached base packages:
parallel, stats, graphics, grDevices, utils, datasets, methods, base

other attached packages:
reshape2_1.4.4, cluster_2.1.7, xgboost_1.7.8.1, bigutilsr_0.3.4, reshape_0.8.9, ggpubr_0.6.0, doParallel_1.0.17, iterators_1.0.14, foreach_1.5.2, glmnet_4.1-8, Matrix_1.7-1, lubridate_1.9.4, forcats_1.0.0, purrr_1.0.2, readr_2.1.5, tidyr_1.3.1, tibble_3.2.1, tidyverse_2.0.0, data.table_1.16.4, dplyr_1.1.4, gwasrapidd_0.99.17, caret_7.0-1, lattice_0.22-6, ranger_0.17.0, stringr_1.5.1, ggplot2_3.5.1, fmsb_0.7.6, optparse_1.7.5, tidyselect_1.2.1, timeDate_4041.110, bigassertr_0.1.6, pROC_1.18.5, digest_0.6.37, rpart_4.1.23, timechange_0.3.0, lifecycle_1.0.4, survival_3.7-0, magrittr_2.0.3, compiler_4.4.1, rlang_1.1.4, tools_4.4.1, utf8_1.2.4, ggsignif_0.6.4, plyr_1.8.9, abind_1.4-8, withr_3.0.2, nnet_7.3-19, grid_4.4.1, stats4_4.4.1, fansi_1.0.6, colorspace_2.1-1, future_1.34.0, globals_0.16.3, scales_1.3.0, MASS_7.3-61, cli_3.6.3, generics_0.1.3, RSpectra_0.16-2, rstudioapi_0.17.1, future.apply_1.11.3, tzdb_0.4.0, getopt_1.20.4, splines_4.4.1, vctrs_0.6.5, hardhat_1.4.0, jsonlite_1.8.9, carData_3.0-5, car_3.1-3, hms_1.1.3, rstatix_0.7.2, Formula_1.2-5, listenv_0.9.1, gower_1.0.1, recipes_1.1.0, glue_1.8.0, parallelly_1.40.1, codetools_0.2-20, stringi_1.8.4, gtable_0.3.6, shape_1.4.6.1, munsell_0.5.1, pillar_1.9.0, ipred_0.9-15, lava_1.8.0, R6_2.5.1, backports_1.5.0, broom_1.0.7, class_7.3-22, Rcpp_1.0.13-1, nlme_3.1-166, prodlim_2024.06.25, ModelMetrics_1.2.2.2, pkgconfig_2.0.3
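To compare an activated environment with the conda, Snakemake and Python versions listed above, the sketch below flags tools that are missing or older than the tested versions. It is a hypothetical helper, not part of PRScope: it relies on GNU sort -V for version ordering, a version at or above the tested one is reported as ok even though only the listed versions were actually tested, and R packages are not checked.

```shell
#!/bin/sh
# Hypothetical helpers (not part of PRScope): compare installed versions with
# the versions PRScope was tested with.

# version_ge A B -> succeeds if version A >= version B (GNU sort -V ordering)
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# report_version NAME INSTALLED TESTED -> one status line per tool
report_version() {
  name=$1
  installed=$2
  tested=$3
  if [ -z "$installed" ]; then
    echo "$name: not found"
  elif version_ge "$installed" "$tested"; then
    echo "$name $installed ok (tested with $tested)"
  else
    echo "$name $installed is older than the tested $tested"
  fi
}

check_tested_versions() {
  echo "active conda env: ${CONDA_DEFAULT_ENV:-none}"
  # "conda --version" prints "conda X.Y.Z"; "python --version" prints "Python X.Y.Z"
  report_version conda "$(conda --version 2>/dev/null | awk '{print $2}')" 23.1.0
  report_version snakemake "$(snakemake --version 2>/dev/null)" 8.4.8
  report_version python "$(python --version 2>/dev/null | awk '{print $2}')" 3.12.1
}
```

Run check_tested_versions after `conda activate snakemake`.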


Pipeline Description

The following pipelines can be found in the main/snakefiles/ directory:

  • find_sumstats.snakefile – Selection of summary statistics for specified EFO IDs
  • qc_sumstats.snakefile – Quality control of the selected summary statistics
  • qc_genotype_with_liftover.snakefile – Quality control of genotype data with liftover
  • qc_genotype.snakefile – Quality control of genotype data
  • prs_calculation_prsice.snakefile – PRS calculation using PRSice
  • prs_calculation_ldpred.snakefile – PRS calculation using LDpred
  • ldsc_heritability_calculation.snakefile – Heritability calculation
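The names in this list map directly onto files in main/snakefiles/. The sketch below (a hypothetical helper, not part of PRScope; PRSCOPE_STAGES and snakefile_path are illustrative names) resolves a pipeline name to its path and rejects names that are not listed.

```shell
#!/bin/sh
# Hypothetical helper (not part of PRScope): the pipeline names listed above.
PRSCOPE_STAGES="find_sumstats qc_sumstats qc_genotype_with_liftover qc_genotype prs_calculation_prsice prs_calculation_ldpred ldsc_heritability_calculation"

# snakefile_path NAME -> prints main/snakefiles/NAME.snakefile for a listed name
snakefile_path() {
  for stage in $PRSCOPE_STAGES; do
    if [ "$stage" = "$1" ]; then
      echo "main/snakefiles/$1.snakefile"
      return 0
    fi
  done
  echo "unknown pipeline: $1" >&2
  return 1
}
```

The resolved path could be passed to Snakemake's dry-run mode to preview a single stage, for example `snakemake --snakefile "$(snakefile_path qc_genotype)" --cores 4 --use-conda --dry-run`. Whether each snakefile runs on its own, and which config files it expects, is an assumption not covered by this README; ./run.sh remains the documented entry point.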

Demo

a. Navigate to the repository path

cd PRScope

b. Folder Structure

  • config/ – For advanced parameter customization
  • input/ – The only folder requiring user modifications
  • main/ – Contains the main pipeline
  • output/ – Will contain output after pipeline execution

c. Preparing the Input

  1. Navigate to input/
  2. Edit efo_ids.txt:
    • Default content: EFO_0003898
    • Replace with your own EFO IDs as needed
  3. In input/genotype/, you’ll find:
    • EUR.bed
    • EUR.bim
    • EUR.fam

(Replace these with your own genotype data if desired, but keep the filenames EUR.bed, EUR.bim and EUR.fam.)
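Before running the pipeline, both inputs can be sanity-checked. The sketch below is hypothetical and not part of PRScope. It assumes efo_ids.txt holds one ID per line (the default file contains the single ID EFO_0003898) and that valid IDs have the form PREFIX_DIGITS; the genotype check relies on the PLINK binary format, where the .fam file has one line per subject and the .bim file one line per variant.

```shell
#!/bin/sh
# Hypothetical input checks (not part of PRScope).

# bad_ontology_ids FILE -> prints every non-blank line that is not PREFIX_DIGITS
# (this also catches stray characters such as Windows line endings)
bad_ontology_ids() {
  grep -v -E '^[A-Za-z]+_[0-9]+$' "$1" | grep -v -E '^[[:space:]]*$'
}

# genotype_summary DIR -> checks that EUR.bed/.bim/.fam exist in DIR
# (default: input/genotype) and prints subject and variant counts
genotype_summary() {
  dir=${1:-input/genotype}
  for ext in bed bim fam; do
    if [ ! -f "$dir/EUR.$ext" ]; then
      echo "missing: $dir/EUR.$ext" >&2
      return 1
    fi
  done
  subjects=$(awk 'END {print NR}' "$dir/EUR.fam")
  variants=$(awk 'END {print NR}' "$dir/EUR.bim")
  echo "subjects=$subjects variants=$variants"
}
```

From the repository root, `bad_ontology_ids input/efo_ids.txt` should print nothing, and `genotype_summary input/genotype` prints the subject and variant counts.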

d. Running PRScope

cd PRScope
./run.sh

e. Expected Outcome

  • output/gwas_list/gwas_search.txt - list of GWAS meeting the criteria for use in PGS calculation.
  • output/qced_gwas/GCST* - the GWAS listed in gwas_search.txt, downloaded and preprocessed, ready for PGS calculation.
  • output/qced_genotype/corrected[hg19,hg38] - EUR.FINAL.* - preprocessed version of the simulated genotype data in input/genotype/.
  • output/calculated_pgs_prsice/ - all PGS calculated from the GWAS in gwas_search.txt, stored as the data table pgs_datatable_prsice_100.tsv. The suffix _100 denotes the minimum number of SNPs used to calculate a PGS.
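The final table can be summarized with a few lines of awk. This sketch assumes, based on the Output section, a tab-separated file with a single header row, subject IDs in the first column and one column per PGS; if the file also carries a separate family ID column, the PGS count is one too high. pgs_table_summary is a hypothetical helper, not part of PRScope.

```shell
#!/bin/sh
# Hypothetical helper (not part of PRScope): count subjects and PGS columns in
# a multi-PGS table (assumed layout: header row, ID column, then one column per PGS).
pgs_table_summary() {
  awk -F '\t' '
    NR == 1 { n_pgs = NF - 1; next }  # header: every column after the ID column is a PGS
    { n_subjects++ }                  # each remaining row is one subject
    END { printf "subjects=%d pgs=%d\n", n_subjects, n_pgs }
  ' "$1"
}
```

For the demo run this would be called as `pgs_table_summary output/calculated_pgs_prsice/pgs_datatable_prsice_100.tsv`.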

References

[1] HannahVMeyer. Meyer-Lab-cshl/plinkQC: plinkQC 0.3.2. (Zenodo, 2020). doi:10.5281/ZENODO.3934294.

[2] Choi, S. W. & O’Reilly, P. F. PRSice-2: Polygenic Risk Score software for biobank-scale data. GigaScience 8, (2019).

[3] Privé, F., Arbel, J. & Vilhjálmsson, B. J. LDpred2: better, faster, stronger. Bioinformatics 36, 5424–5431 (2021).

[4] https://github.com/sritchie73/liftOverPlink

Citation

If you use PRScope or refer to the associated manuscript, please cite it as given in the enclosed citation.bib.

Owner

  • Name: transbioZI
  • Login: transbioZI
  • Kind: organization

Citation (citation.bib)

@article {Kocak2025.04.29.25326656,
	author = {Kocak, Ersoy and Naamanka, Joonas and Gradinger, Tobias and Klaassen, Fiona and Nitsche, Johannes and Grotehusmann, Philipp and Adorjan, Kristina and Antonucci, Linda A and Blasi, Giuseppe and Budde, Monika and Di Palo, Piergiuseppe and Heilbronner, Maria and Kikidis, Gianluca C and Navarro-Flores, Alba and Kohshour, Mojtaba Oraki and Papiol, Sergi and Raio, Alessandra and Rampino, Antonio and Reich-Erkelenz, Daniela and Schulte, Eva C. and Senner, Fanny and Sportelli, Leonardo and FinnGen and Bertolino, Alessandro and Falkai, Peter and Heilbronner, Urs and Pergola, Giulio and Schulze, Thomas G. and Meyer-Lindenberg, Andreas and Streit, Fabian and Schwarz, Emanuel},
	title = {Development and validation of genomic biotypes for schizophrenia susceptibility from multiple polygenic scores},
	elocation-id = {2025.04.29.25326656},
	year = {2025},
	doi = {10.1101/2025.04.29.25326656},
	publisher = {Cold Spring Harbor Laboratory Press},
	abstract = {Understanding the genetic architecture of schizophrenia (SCZ) is invaluable for the development of personalized treatment. In three independent cohorts, using an automated pipeline, we calculated 413 psychiatry-relevant polygenic scores. Using these scores, machine learning was applied to stratify SCZ patients into two biotypes in FinnGen (nSCZ=7486), and to validate these results in the PsyCourse Study (nSCZ=421) and Bari (nSCZ=531). While the two biotypes showed comparable polygenic SCZ risk, they were primarily distinguished by a greater predisposition for neuroticism, depression-related traits, and lower cognitive performance in Biotype 1. The genetic prediction of Biotype 1 was phenotypically characterized by an increased prevalence of SCZ and a more severe and complex clinical manifestation. This illustrates that the penetrance of genetic SCZ risk might partially depend on the predisposition for the aforementioned traits through pleiotropic variants. Our results provide novel, replicable insights into the genetic architecture of SCZ and might inform future personalized treatments.Competing Interest StatementGiuseppe Blasi has received lecture and consultant fees by Lundbeck. Antonio Rampino has received lecture fees by Janssen. Alessandro Bertolino received consulting fees from Biogen and lecture fees from Otsuka, Janssen, and Lundbeck. Giulio Pergola has received lecture fees by Lundbeck. Andreas Meyer-Lindenberg has received consultant fees from: Agence Nationale de la Recherche, Brain Mind Institute, Brainsway, CISSN (Catania International Summer School of Neuroscience), Daimler und Benz Stiftung, Fondation FondaMental, Hector Stiftung II, Janssen-Cilag GmbH, Lundbeck A/S, Lundbeckfonden, Lundbeck Int. Neuroscience Foundation, MedinCell, Sage Therapeutics, Techspert.io, The LOOP Zuerich, University Medical Center Utrecht, von Behring Roentgen Stiftung. 
Andreas Meyer-Lindenberg has received speaker fees from: Aerztekammer Nordrhein, BAG Psychiatrie Oberbayern, Biotest AG, Forum Werkstatt Karlsruhe, International Society of Psychiatric Genetics, Brentwood, Klinik fuer Psychiatrie und Psychotherapie Ingolstadt, Lundbeck SAS France, med Update GmbH, Merz-Stiftung, Siemens Healthineers, Society of Biological Psychiatry. Andreas Meyer-Lindenberg has received editorial fees from: American Association for the Advancement of Science, Elsevier, Thieme Verlag. Emanuel Schwarz received speaker fees from bfd buchholz-fachinformationsdienst GmbH and Lundbeckfonden as well as editorial fees from Lundbeckfonden and the Wellcome Trust. The other authors have declared that there are no conflicts of interest in relation to the subject of this study.Funding StatementThis work was supported by the Hector foundation II, the German Federal Ministry of Education and Research (BEST project, grant 01EK2101B), the German Research Foundation (DFG) (TRR 265/2 TP A06), and was endorsed by German Center for Mental Health (DZPG). U.H. was supported by European Union{\textquoteright}s Horizon 2020 Research and Innovation Programme (PSY-PGx, grant agreement No 945151) and the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation, project number 514201724). T.G.S. was supported by the European Union Horizon 2020 Research and Innovation Program (PSY-PGx, grant agreement No 945151), and also by the Deutsche Forschungsgemeinschaft within the framework of the projects www.kfo241.de and www.PsyCourse.de [SCHU 1603/4-1, 5-1, 7-1]. T.G.S. 
was further supported by the Dr Lisa Oehler Foundation (Kassel, Germany), the Bundesministerium fuer Bildung und Forschung (BMBF, Federal Ministry of Education and Research; projects: IntegraMent [01ZX1614K], BipoLife [01EE1404H], e:Med Program [01ZX1614K]) and European Union{\textquoteright}s Horizon 2020 Research and Innovation Programme (ERA-NET Neuron Projects GEPI-BIOPSY [BMBF No 01EW2005] and MulioBio [BMBF No 01EW2009]). G.C.K.{\textquoteright}s PhD scholarship is financially supported by Exprivia S.p.A. under the Italian ministerial decree D.M. 351. Furthermore, this study was supported by FAIR - Future AI Research (PE00000013), spoke 6 - Symbiotic AI, under the NRRP MUR program funded by the NextGenerationEU; Project CUP : H97G22000210007, and the Apulian regional government for the project: "Early Identification of Psychosis Risk"Author DeclarationsI confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.YesThe details of the IRB/oversight body that provided approval or exemption for the research described are given below:Ethics Committee II of the University of Heidelberg Medical Faculty Mannheim gave ethical approval for this work (protocol no. 2024-845). FinnGen was approved by the Coordinating Ethics Committee of the Hospital District of Helsinki and Uusimaa (HUS), and patients and control subjects provided informed consent for biobank research, based on the Finnish Biobank Act. Bari Protocols and procedures were approved by the ethics committee of the University of Bari Aldo Moro. PsyCourse project was approved by the Ethics Committee of the University Medical Center Goettingen. Some clinical centers were teaching hospitals of the University Medical Center Goettingen, and were thus covered by this initial approval. For those clinical sites that were not covered, additional approval from the respective Ethics Committees were obtained. 
For all centers, these were (clinical centers in parentheses): Ethics Committees of the University Medical Center Goettingen (UMG Goettingen, Bad Zwischenahn, Eschwege, Asklepios Specialized Hospital Goettingen, Hildesheim, Lueneburg, Liebenburg, Osnabrueck, Rotenburg, Tiefenbrunn, Wilhemshaven), Medical Faculty of the LMU Munich (Munich and Augsburg), Medical Faculty of the RU Bochum (Bochum), Medical Association Bremen (Bremen Ost), Medical University of Graz (Graz), Ulm University (Guenzburg) and Medical Association Westfalen-Lippe and Medical Faculty University of Muenster (Muenster).I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.YesI understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).YesI have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.YesFor FinnGen, researchers can apply for health data from the Finnish Data Authority Findata (https://findata.fi/en/permits/) and individual-level genotype data available through the Fingenious portal (https://site.fingenious.fi/en/). These resources are hosted by the Finnish Biobank Cooperative FINBB (https://finbb.fi/en/). 
Access can only be provided for research projects within the scope of the Finnish Biobank Act, which includes health promotion, understanding disease mechanisms or developing medical products or treatment practices. PsyCourse data is available to bona fide researchers upon a submission and approval of a research proposal. Individual genotype and clinical data from the University of Bari Aldo Moro with demographic and behavioral characteristics cannot be shared at individual level in raw format because of ethic restrictions. http://www.psycourse.de/openscience-de.html https://docs.finngen.fi/},
	URL = {https://www.medrxiv.org/content/early/2025/04/30/2025.04.29.25326656},
	eprint = {https://www.medrxiv.org/content/early/2025/04/30/2025.04.29.25326656.full.pdf},
	journal = {medRxiv}
}

GitHub Events

Total
  • Watch event: 2
  • Push event: 30
  • Public event: 2
  • Create event: 1
Last Year
  • Watch event: 2
  • Push event: 30
  • Public event: 2
  • Create event: 1

Dependencies

main/environment.yaml pypi