nf-pipeline-regenie
GWAS and rare variants tests at high speed using regenie
Repository
Basic Info
- Host: GitHub
- Owner: HTGenomeAnalysisUnit
- License: mit
- Language: Nextflow
- Default Branch: main
- Size: 46.9 MB
Statistics
- Stars: 13
- Watchers: 0
- Forks: 3
- Open Issues: 7
- Releases: 9
Metadata Files
README.md
nf-pipeline-regenie
A Nextflow pipeline to perform genome-wide association studies (GWAS) and rare variant association analysis at high speed using regenie.

Main features
- The pipeline is optimized for massive scaling, by chunking operations as much as possible. When computational resources are available, you can cut down run time by increasing the limit on concurrent tasks.
- All major data types are accepted as input, including plink1 binary dataset (bed/bim/fam), plink2 binary dataset (pgen/pvar/psam), bgen format (bgen/bgi/sample), and vcf.gz format.
- The pipeline can perform both standard GWAS analysis on single variants and aggregated rare variant tests, using the burden test and any of the other tests available in regenie, namely skat, skato, skato-acat, acatv, acato, acato-full.
- Taking advantage of regenie, you can also perform GxE and GxG interaction analysis, as well as conditional analysis by providing a list of variants to condition on.
- Results include summary statistics, but also filtered tophits / loci annotated with nearby genes and an HTML report for each phenotype with Manhattan plot and regional plots for the best loci.
- Two running modes are available: single project mode and multi models mode. With the multi models mode it is possible to fully automate testing multiple association models for a cohort. You just need to provide a trait table with phenotypes and covariates, and a model table containing all the desired combinations of models. The pipeline will take care of setting up uniform analysis groups.
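The limit on concurrent tasks mentioned in the first bullet can be raised through standard Nextflow configuration. The following is a minimal sketch, assuming the generic Nextflow `executor.queueSize` option is the relevant limit. The file name `concurrency.conf` is hypothetical, and the pipeline may expose its own setting for this, so check the documentation before relying on it.

```shell
# Write a small Nextflow config that raises the cap on concurrently queued tasks.
# Assumption: the generic Nextflow option executor.queueSize controls the limit;
# the file name concurrency.conf is only illustrative.
cat > concurrency.conf <<'EOF'
executor {
    queueSize = 200
}
EOF

# Show the result; the file can be passed to Nextflow with an extra "-c concurrency.conf".
cat concurrency.conf
```

A higher value only helps when the scheduler can actually start that many jobs at once.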
Documentation
Complete documentation is available on GitHub Pages.
How to use
The suggested way to run the pipeline is to create one config file defining your computational environment and one config file for your project. You can use the templates provided in the `templates` folder.
Then you can invoke the pipeline using `nextflow run HTGenomeAnalysisUnit/nf-pipeline-regenie -profile singularity,myprofile -c your_project.conf -c your_profile.conf`
Quick Start
1. Create a folder for your project (e.g. `yourproject`).
2. Prepare a tab-separated table of phenotypes and, optionally, covariates (see the input section).
3. Prepare and configure the required input data for step 2 (usually an imputed or sequencing dataset) and step 1 (usually a QCed and pruned dataset). You can optionally also prepare a set of files for LD computation, which is suggested when analyzing a large dataset with > 100k samples.
4. If you want to perform a multi-models or multi-projects execution, prepare the models table or the projects table to describe your analyses.
5. Prepare the necessary config files, using the templates provided in the `templates` folder:
   - A config file describing settings and inputs for your project.
   - A config file to define the profile for your computational environment.
   - Optionally, you can also add configuration to enable execution monitoring using Nextflow Tower.
6. Invoke the pipeline using, for example, `nextflow run HTGenomeAnalysisUnit/nf-pipeline-regenie -profile singularity,slurm -c my_project.conf`. Basic executors are already configured (namely slurm, sge and lsf), but it is suggested to prepare a specific profile for your computational environment as described in point 5.
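The first two steps above can be sketched as follows. This is only an illustration, assuming a regenie-style layout with `FID` and `IID` as the first two columns. The sample IDs, the column names `PHENO1`, `AGE` and `SEX`, the values, and the file name `phenotypes.tsv` are all made up, so check the input section of the documentation for the exact format the pipeline expects.

```shell
# Create the project folder (name taken from the example in step 1).
mkdir -p yourproject

# Write a tiny tab-separated phenotype/covariate table.
# Assumption: regenie-style layout with FID and IID first; sample IDs,
# column names (PHENO1, AGE, SEX) and values are invented for illustration.
printf 'FID\tIID\tPHENO1\tAGE\tSEX\n'  > yourproject/phenotypes.tsv
printf 'S1\tS1\t1.25\t54\t1\n'        >> yourproject/phenotypes.tsv
printf 'S2\tS2\t0.87\t61\t2\n'        >> yourproject/phenotypes.tsv

# Sanity check: every row must have as many tab-separated fields as the header.
awk -F '\t' 'NR == 1 { n = NF } NF != n { bad = 1 } END { exit bad + 0 }' \
    yourproject/phenotypes.tsv && echo "phenotype table OK"
```

Checking the field count up front catches rows that were accidentally written with spaces instead of tabs.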
Usually, you want to prepare a script in your project folder to submit the pipeline. This example uses the SLURM sbatch submission system, but it can be adapted to any scheduler. myprofile corresponds to a profile you created for your computational environment:
```bash
#!/bin/bash
#SBATCH --job-name nf-regenie
#SBATCH --output nf-regeniemaster%A.log
#SBATCH --partition cpuq
#SBATCH --cpus-per-task 1
#SBATCH --mem 8G
#SBATCH --time 1-00:00:00

# Load Nextflow and Singularity (module names depend on your cluster)
module load nextflow/22.10.1 singularity/3.8.5

# JVM memory settings for the Nextflow head process
export NXF_OPTS="-Xms1G -Xmx8G"

nextflow run HTGenomeAnalysisUnit/nf-pipeline-regenie \
  -profile singularity,myprofile -c yourproject.conf
```
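A profile such as myprofile could look like the sketch below. It is a minimal example that uses only generic Nextflow options (`process.executor`, `process.queue`), not pipeline-specific parameters. The partition name `cpuq` is taken from the sbatch example, and the file name `your_profile.conf` is illustrative. The templates shipped with the pipeline remain the recommended starting point.

```shell
# Write a minimal Nextflow profile named "myprofile" for a SLURM cluster.
# Assumptions: partition cpuq as in the sbatch example; file name is illustrative;
# only generic Nextflow options are set here.
cat > your_profile.conf <<'EOF'
profiles {
    myprofile {
        process {
            executor = 'slurm'
            queue    = 'cpuq'
        }
    }
}
EOF

# Show the generated profile for review.
cat your_profile.conf
```

The profile is then selected with `-profile singularity,myprofile` and loaded with `-c your_profile.conf`.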
Alternatively, you can clone the latest pipeline version using
git clone --depth 1 https://github.com/HTGenomeAnalysisUnit/nf-pipeline-regenie.git
This will create a new folder called nf-pipeline-regenie in the current folder containing all the pipeline files.
You can optionally choose a specific version of the pipeline using the --branch option
git clone --depth 1 --branch v1.8.2 https://github.com/HTGenomeAnalysisUnit/nf-pipeline-regenie.git
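A specific version can also be pinned without cloning, since `nextflow run` accepts a `-r` option that selects a branch, tag or commit of the remote repository. The sketch below only assembles and prints such a command. The profile and config file names are the placeholders used earlier, and the variable names are hypothetical.

```shell
# Pin the pipeline revision without cloning, using Nextflow's -r option.
PIPELINE="HTGenomeAnalysisUnit/nf-pipeline-regenie"
PIPELINE_VERSION="v1.8.2"

# Assemble the command as positional parameters so quoting stays intact.
set -- nextflow run "$PIPELINE" -r "$PIPELINE_VERSION" \
    -profile singularity,myprofile -c your_project.conf

# Print the command for review; run "$@" instead of echo to actually launch it.
echo "$@"
```

Keeping the version in a single variable makes it easy to record which release produced a given set of results.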
Performance
Using the default settings, 100 quantitative phenotypes can be analyzed on the full UKBB dataset (about 500k individuals and 45M SNPs), including tophit annotation and top loci identification but without HTML reports, in ~7h with a limit of 200 concurrent tasks (peak usage of 800 CPUs).
Credits
The original concept is based on this amazing GitHub repository from the Institute of Genetic Epidemiology, Innsbruck, maintained by Sebastian Schönherr and Lukas Forer.
License
nf-highspeed-gwas is MIT Licensed.
Contact
Owner
- Name: HT Genome Analysis Unit
- Login: HTGenomeAnalysisUnit
- Kind: organization
- Email: edoardo.giacopuzzi@fht.org
- Repositories: 4
- Profile: https://github.com/HTGenomeAnalysisUnit
Genome Analysis Unit in the Population Genomic theme
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Giacopuzzi"
  given-names: "Edoardo"
title: "nf-pipeline-regenie - Perform regenie GWAS at high speed."
version: 1.8
url: "https://github.com/HTGenomeAnalysisUnit/nf-pipeline-regenie"
GitHub Events
Total
- Create event: 3
- Issues event: 7
- Release event: 3
- Watch event: 3
- Delete event: 2
- Issue comment event: 2
- Push event: 15
Last Year
- Create event: 3
- Issues event: 7
- Release event: 3
- Watch event: 3
- Delete event: 2
- Issue comment event: 2
- Push event: 15