https://github.com/clinical-genomics/mip

Mutation Identification Pipeline. Read the latest documentation:

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 2 DOI reference(s) in README
- ✓ Academic publication links: links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (9.6%) to scientific vocabulary

Keywords

analysis clinical pipeline variants

Last synced: 8 months ago

Repository


Basic Info

Host: GitHub
Owner: Clinical-Genomics
License: mit
Language: Perl
Default Branch: develop
Homepage: https://clinical-genomics.gitbook.io/project-mip/
Size: 71.7 MB

Statistics

Stars: 44
Watchers: 11
Forks: 10
Open Issues: 10
Releases: 68

Topics

analysis clinical pipeline variants

Created over 13 years ago · Last pushed about 2 years ago

Metadata Files

Readme Changelog Contributing License Codeowners

MIP - Mutation Identification Pipeline

MIP enables identification of potentially disease-causing variants from sequencing data.

Citing MIP

Integration of whole genome sequencing into a healthcare setting: high diagnostic rates across multiple clinical entities in 3219 rare disease patients. Stranneheim H, Lagerstedt-Robinson K, Magnusson M, Kvarnung M, Nilsson D, Lesko N, Engvall M, Anderlid BM, Arnell H, Johansson CB, Barbaro M, Björck E, Bruhn H, Eisfeldt J, Freyer C, Grigelioniene G, Gustavsson P, Hammarsjö A, Hellström-Pigg M, Iwarsson E, Jemt A, Laaksonen M, Enoksson SL, Malmgren H, Naess K, Nordenskjöld M, Oscarson M, Pettersson M, Rasi C, Rosenbaum A, Sahlin E, Sardh E, Stödberg T, Tesi B, Tham E, Thonberg H, Töhönen V, von Döbeln U, Vassiliou D, Vonlanthen S, Wikström AC, Wincent J, Winqvist O, Wredenberg A, Ygberg S, Zetterström RH, Marits P, Soller MJ, Nordgren A, Wirta V, Lindstrand A, Wedell A. Genome Med. 2021 Mar 17;13(1):40. doi: 10.1186/s13073-021-00855-5. PMID: 33726816; PMCID: PMC7968334.

Rapid pulsed whole genome sequencing for comprehensive acute diagnostics of inborn errors of metabolism. Stranneheim H, Engvall M, Naess K, Lesko N, Larsson P, Dahlberg M, Andeer R, Wredenberg A, Freyer C, Barbaro M, Bruhn H, Emahazion T, Magnusson M, Wibom R, Zetterström RH, Wirta V, von Döbeln U, Wedell A. BMC Genomics. 2014 Dec 11;15(1):1090. doi: 10.1186/1471-2164-15-1090. PMID: 25495354.

Overview

MIP is being rewritten in Nextflow as part of the nf-core project. This repository will mainly receive bug fixes, as we are focusing our resources on the new pipeline. You can follow the progress in the nf-core raredisease pipeline.

MIP performs whole genome or target region analysis of single-end and/or paired-end reads sequenced on the Illumina platform, in fastq(.gz) format, to generate annotated and ranked potentially disease-causing variants.

MIP performs QC, alignment, coverage analysis, variant discovery and annotation, and sample checks, and ranks the identified variants according to disease potential with a minimum of manual intervention. MIP is compatible with Scout for visualization of identified variants.

MIP rare disease DNA analyses single nucleotide variants (SNVs), insertions and deletions (INDELs) and structural variants (SVs).

MIP rare disease RNA analyses monoallelic expression, fusion transcripts, transcript expression and alternative splicing.

MIP rare disease DNA VCF rerun re-runs the analysis starting from BCF or VCF files.

MIP has been in clinical production at the Clinical Genomics facility at Science for Life Laboratory since 2014.

Example Usage

MIP analyse rare disease DNA

```Bash
$ mip analyse rd_dna [case_id] --config_file [mip_config_dna.yaml] --pedigree_file [case_id_pedigree.yaml]
```

MIP analyse rare disease DNA VCF rerun

```Bash
$ mip analyse rd_dna_vcf_rerun [case_id] --config_file [mip_config_dna_vcf_rerun.yaml] --vcf_rerun_file vcf.bcf --sv_vcf_rerun_file sv_vcf.bcf --pedigree [case_id_pedigree_vcf_rerun.yaml]
```

MIP analyse rare disease RNA

```Bash
$ mip analyse rd_rna [case_id] --config_file [mip_config_rna.yaml] --pedigree_file [case_id_pedigree_rna.yaml]
```

Features

- Installation
  - Simple automated install of all programs using conda/docker/singularity via the supplied install application
  - Downloads and prepares references in the installation process
- Autonomous
  - Checks that all dependencies are fulfilled before launching
  - Builds and prepares missing references and/or files before launching
  - Decomposes and normalises reference(s) and variant VCF(s)
- Automatic
  - A minimal amount of hands-on time
  - Tracks and executes all recipes without manual intervention
  - Creates internal queues at nodes to optimize processing
- Flexible
  - Design your own workflow by turning relevant recipes on/off in predefined pipelines
  - Restart an analysis from anywhere in your workflow
  - Process one or multiple samples
  - Supply parameters on the command line, in a pedigree.yaml file or via config files
  - Simulate your analysis before performing it
  - Limit a run to a specific set of genomic intervals or chromosomes
  - Use multiple variant callers for SNVs, INDELs and SVs
  - Use multiple annotation programs
  - Optionally split data into clinical variants and research variants
- Fast
  - Analyses an exome trio in approximately 4 h
  - Analyses a genome in approximately 21 h
- Traceability
  - Tracks the status of each recipe through dynamically updated status logs
  - Recreate your analysis from the MIP log or generated config files
  - Logs sample metadata and sequence metadata
  - Logs version numbers of software and databases
  - Checks sample integrity (sex, contamination, duplications, ancestry, inbreeding and relationship)
  - Tests data output file creation and integrity using automated tests
- Annotation
  - Gene annotation
    - Summarizes over all transcripts and outputs on gene level
  - Transcript level annotation
    - Separates pathogenic transcripts for correct downstream annotation
  - Annotates all alleles for a position
    - Splits multi-allelic records into single records to facilitate annotation
    - Left-aligns and trims variants to normalise them prior to annotation
  - Extracts QC metrics and stores them in YAML format
  - Annotates coverage across genetic regions via Sambamba and Chanjo
- Standardized
  - Uses standard formats whenever possible
- Visualization
  - Ranks variants according to pathogenic potential
  - Output is directly compatible with Scout

Getting Started

Installation

MIP is written in Perl and therefore requires Perl to be installed on your OS.

Prerequisites

- Perl, version 5.26.0 or above
- Cpanm
- Miniconda, version 4.5.11
- Singularity, version 3.2.1
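You can check the Perl requirement before installing by comparing the interpreter version with 5.26.0. The sketch below is a hedged example. `version_ge` is a hypothetical helper written for this illustration and is not part of MIP. The version is read from Perl's built-in `$^V` variable, which stringifies as, for example, `v5.26.1`.

```Bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if dotted version $1 >= dotted version $2.
# Fields are compared numerically from left to right; missing fields count as 0.
version_ge() {
    local -a have need
    IFS=. read -r -a have <<< "$1"
    IFS=. read -r -a need <<< "$2"
    local i
    for ((i = 0; i < ${#need[@]} || i < ${#have[@]}; i++)); do
        local h=${have[i]:-0} n=${need[i]:-0}
        ((10#$h > 10#$n)) && return 0
        ((10#$h < 10#$n)) && return 1
    done
    return 0
}

# Only query Perl if it is on the PATH; strip the leading "v" from $^V.
if command -v perl >/dev/null 2>&1; then
    perl_version=$(perl -e 'print $^V')
    perl_version=${perl_version#v}
    if version_ge "$perl_version" 5.26.0; then
        echo "Perl $perl_version meets the 5.26.0 requirement"
    else
        echo "Perl $perl_version is older than 5.26.0" >&2
    fi
fi
```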

We recommend Miniconda for installing Perl and cpanm libraries. However, perlbrew can also be used to install and manage Perl and cpanm libraries together with MIP. Instructions for installing and setting up specific cpanm libraries using perlbrew can be found here.

Automated Installation (Linux x86_64)

Below are instructions for installing the Mutation Identification Pipeline (MIP).

1. Clone the official git repository

```Bash
$ git clone https://github.com/Clinical-Genomics/MIP.git
$ cd MIP
```

2. Install the required Perl modules from CPAN into a specified conda environment

```Bash
$ bash mip_install_perl.sh -e [mip] -p [$HOME/miniconda3]
```

3. Test conda and mip installation files (optional, but recommended)

```Bash
$ perl t/mip_install.test
```

A conda environment will be created in which MIP and all its dependencies will be installed.

4. Install MIP

```Bash
$ perl mip install --environment_name [mip] --reference_dir [$HOME/mip_references]
```

This will cache the containers that are used by MIP.

Note:

For a full list of available options and parameters, run:

```Bash
$ perl mip install --help
```

5. Test your MIP installation (optional, but recommended)

Make sure to activate your MIP conda environment before executing prove.

```Bash
$ prove t -r
$ perl t/mip_analyse_rd_dna.test
```

When setting up your analysis config file

A starting point for the config is provided in MIP's template directory. You will have to modify the load_env keys to match the name you gave the environment. If you are using the default environment name, the load_env part of the config should look like this:

```Yml
load_env:
  mip:
    mip:
    method: conda
```

Usage

MIP is called from the command line. Parameters supplied on the command line take precedence; otherwise MIP falls back on defaults where applicable.

Lists are supplied as repeated flag entries on the command line, or in the config using the YAML format for arrays. Only flags that will actually be used need to be specified, and MIP will check that all required parameters are set before submitting to SLURM.
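List parameters are given by repeating the flag, so a long list of sample IDs can be expanded from a Bash array instead of being typed by hand. This is an illustrative sketch. `add_sample_id_flags` is a hypothetical helper, not part of MIP, and the command is printed rather than executed.

```Bash
#!/usr/bin/env bash
# Hypothetical helper (not part of MIP): append one "--sample_ids <id>" pair
# per sample to the global array "args".
add_sample_id_flags() {
    local sample_id
    for sample_id in "$@"; do
        args+=(--sample_ids "$sample_id")
    done
}

samples=(3-1-1A 3-2-1U 3-2-2U)
args=(analyse rd_dna case_3)
add_sample_id_flags "${samples[@]}"
args+=(--config_file 3_config.yaml)

# Show the command instead of running it; swap "echo mip" for "mip"
# once the config and pedigree files are in place.
echo mip "${args[@]}"
```

Building the arguments as an array, rather than as one long string, keeps each sample ID a separate, correctly quoted argument.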

Recipe parameters can be set to "0" (off), "1" (on) or "2" (dry-run mode). Any recipe can be set to dry-run mode, in which case MIP creates the sbatch scripts but does not submit them to SLURM. MIP can be restarted from any recipe using the --start_with_recipe flag, or after any recipe using the --start_after_recipe flag.
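The three recipe modes can be pictured as a small dispatch on the mode value. This is a simplified sketch of the behaviour described above, not MIP's Perl implementation. `dispatch_recipe` is hypothetical, and `sbatch` is only called in mode 1.

```Bash
#!/usr/bin/env bash
# Hypothetical illustration of the recipe modes:
#   0 = off      (recipe is skipped)
#   1 = on       (the sbatch script is submitted to SLURM)
#   2 = dry run  (the sbatch script exists but is not submitted)
dispatch_recipe() {
    local recipe_name=$1 mode=$2 script=$3
    case "$mode" in
        0) echo "$recipe_name: off" ;;
        1) sbatch "$script" ;;
        2) echo "$recipe_name: dry run, not submitting $script" ;;
        *) echo "$recipe_name: invalid mode '$mode'" >&2; return 1 ;;
    esac
}

dispatch_recipe fastqc 0 fastqc.sh
dispatch_recipe bwa_mem 2 bwa_mem.sh
```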

MIP will overwrite data files when reanalyzing, but keeps all "versioned" sbatch scripts for traceability.
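One way to keep every generated script is to append an increasing version number and pick the first unused one. The sketch below follows the versioning idea described above. The `<name>.<n>.sh` naming scheme and the `next_versioned_script` helper are assumptions for illustration, and MIP's actual file names may differ.

```Bash
#!/usr/bin/env bash
# Hypothetical helper: print the first "<dir>/<name>.<n>.sh" path that does not
# exist yet, so earlier scripts are kept instead of being overwritten.
next_versioned_script() {
    local dir=$1 name=$2 version=0
    while [[ -e "$dir/$name.$version.sh" ]]; do
        version=$((version + 1))
    done
    echo "$dir/$name.$version.sh"
}

script_dir=$(mktemp -d)
first=$(next_versioned_script "$script_dir" bwa_mem_3-1-1A)
touch "$first"                      # simulate the script from a first analysis
second=$(next_versioned_script "$script_dir" bwa_mem_3-1-1A)
echo "$first"
echo "$second"
```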

You can always supply mip [process] [pipeline] --help to list all available parameters and defaults.

Example usage:

```Bash
$ mip analyse rd_dna case_3 --sample_ids 3-1-1A --sample_ids 3-2-1U --sample_ids 3-2-2U --start_with_recipe samtools_merge --config 3_config.yaml
```

This will analyse case 3 using three individuals from that case, start the analysis at the samtools_merge recipe (i.e. with the recipes after BWA-MEM alignment), and use all parameter values as specified in the config file except those supplied on the command line, which take precedence.
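The precedence rule works like this: a value given on the command line wins, otherwise the config file value is used, otherwise the default. The following is a minimal sketch. `resolve_parameter` is hypothetical, and an empty string stands for "not supplied".

```Bash
#!/usr/bin/env bash
# Hypothetical helper: print the first non-empty value in the order
# command line > config file > default.
resolve_parameter() {
    local cli_value=$1 config_value=$2 default_value=$3
    if [[ -n "$cli_value" ]]; then
        echo "$cli_value"
    elif [[ -n "$config_value" ]]; then
        echo "$config_value"
    else
        echo "$default_value"
    fi
}

# Given on the command line: overrides the config file.
resolve_parameter "/cli/outdata" "/config/outdata" ""
# Not given on the command line: the config file value is used.
resolve_parameter "" "/config/outdata" ""
```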

Running programs in containers

Besides the conda environment, MIP uses containers to run programs. You can use either Singularity or Docker as your container manager. Containers downloaded with MIP's automated installer need no extra setup. By default, MIP makes the reference, outdata and temp directories available to the container. Extra directories can be made available to each recipe by adding the key recipe_bind_path to the config.

In the example below, the config has been modified to include the infile directories for the bwa_mem recipe:

```Yml
recipe_bind_path:
  bwa_mem:
    - <PATH_TO_FASTQ_DIR>
```
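Singularity typically makes directories visible inside a container through a comma-separated `--bind` list. The sketch below assembles such a list from the default reference, outdata and temp directories plus any recipe-specific extras. `bind_list` and all paths are hypothetical. The sketch only illustrates the idea and does not show how MIP builds the call internally.

```Bash
#!/usr/bin/env bash
# Hypothetical helper: join its arguments into one comma-separated string.
bind_list() {
    local IFS=,
    echo "$*"
}

reference_dir=/data/mip_references
outdata_dir=/data/case_3/analysis
temp_dir=/scratch/case_3
extra_bwa_mem_dirs=(/data/case_3/fastq)   # e.g. the recipe_bind_path entries

binds=$(bind_list "$reference_dir" "$outdata_dir" "$temp_dir" "${extra_bwa_mem_dirs[@]}")
echo "singularity exec --bind $binds <container> <command>"
```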

Input

- Fastq file directories can be supplied with --infile_dirs [PATH_TO_FASTQ_DIR=SAMPLE_ID]
- All references and template files should be placed directly in the reference directory specified by --reference_dir.
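Each `--infile_dirs` value pairs a directory with a sample ID, separated by `=`. The sketch below splits such a value with Bash parameter expansion, for example to sanity-check the pairs before launching. `split_infile_dir` is hypothetical and assumes the sample ID itself contains no `=`.

```Bash
#!/usr/bin/env bash
# Hypothetical helper: split "PATH_TO_FASTQ_DIR=SAMPLE_ID" into its two parts.
# Everything after the last "=" is taken as the sample ID.
split_infile_dir() {
    local pair=$1
    local dir=${pair%=*}         # remove from the last "=" onwards -> directory
    local sample_id=${pair##*=}  # remove up to the last "="       -> sample ID
    if [[ "$dir" == "$pair" || -z "$sample_id" ]]; then
        echo "malformed --infile_dirs value: $pair" >&2
        return 1
    fi
    echo "$sample_id $dir"
}

split_infile_dir /data/fastq/3-1-1A=3-1-1A
```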

Meta-Data

- Configuration file (YAML format)
- Gene panel file
- Pedigree file (YAML format)
- Rank model file (INI format; SNV/INDEL)
- SV rank model file (INI format; SV)
- QC regexp file (YAML format)

Output

Analyses done per individual are found in each sample_id directory, and analyses that include all samples can be found in the case directory.

Sbatch Scripts

MIP will create sbatch scripts (.sh) and submit them to SLURM in the proper order with attached dependencies. These sbatch scripts are placed in the output script directory specified by --outscript_dir. The sbatch scripts are versioned and will not be overwritten if you begin a new analysis. Versioned "xargs" scripts will also be created where possible to make full use of the available cores.
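The idea behind the xargs scripts is to run many small, independent commands in parallel, up to the number of available cores. Standard `xargs -P` can express this. The commands and the core count below are placeholders, and the scripts MIP generates will look different.

```Bash
#!/usr/bin/env bash
# Run independent commands in parallel, at most $cores at a time.
cores=4   # placeholder: set to the number of cores allocated to the job
commands=(
    'echo chr1 done'
    'echo chr2 done'
    'echo chr3 done'
)

# -I {} makes xargs start one bash process per input line; -P allows up to
# $cores of them to run at once. Output order is not guaranteed, so sort it.
results=$(printf '%s\n' "${commands[@]}" | xargs -P "$cores" -I {} bash -c '{}' | sort)
echo "$results"
```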

Data

MIP will place any generated data files in the output data directory specified by --outdata_dir. All data files are regenerated for each analysis. STDOUT and STDERR for each recipe are written to the recipe/info directory.
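After a run, you can spot recipes that wrote something to STDERR by listing the non-empty log files. This sketch assumes that STDERR logs have "stderr" in their file names, so adjust the pattern to the names you find in the info directories. Many tools write progress messages to STDERR, so a non-empty file does not necessarily mean the recipe failed.

```Bash
#!/usr/bin/env bash
# List non-empty files whose names contain "stderr" below a directory.
# Assumption: STDERR logs carry "stderr" in their file names.
nonempty_stderr_logs() {
    find "$1" -type f -name '*stderr*' -size +0 | sort
}

# Demonstration on a throw-away directory tree.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/bwa_mem/info"
: > "$demo_dir/bwa_mem/info/bwa_mem.0.stderr.txt"               # empty log
echo "warning" > "$demo_dir/bwa_mem/info/bwa_mem.1.stderr.txt"  # non-empty log
nonempty_stderr_logs "$demo_dir"
```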

Owner

Name: Clinical Genomics
Login: Clinical-Genomics
Kind: organization
Location: Stockholm, Sweden

Website: https://clinical-genomics.github.io
Repositories: 67
Profile: https://github.com/Clinical-Genomics

GitHub Events

Total

Issues event: 2
Watch event: 2
Issue comment event: 5

Last Year

Issues event: 2
Watch event: 2
Issue comment event: 5

Issues and Pull Requests

Last synced: over 1 year ago

All Time

Total issues: 117
Total pull requests: 82
Average time to close issues: 2 months
Average time to close pull requests: 9 days
Total issue authors: 17
Total pull request authors: 6
Average comments per issue: 2.29
Average comments per pull request: 0.46
Merged pull requests: 79
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 3
Pull requests: 5
Average time to close issues: 3 days
Average time to close pull requests: 3 days
Issue authors: 3
Pull request authors: 1
Average comments per issue: 2.67
Average comments per pull request: 0.6
Merged pull requests: 5
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

henrikstranneheim (51)
robinandeer (26)
dnil (12)
emmser (6)
jemten (4)
1ctw (3)
moahaegglund (2)
nitzankol (2)
Mropat (1)
raysloks (1)
KickiLagerstedt (1)
ingkebil (1)
torbjorgen (1)
AnnHam (1)
henningonsbring (1)

Pull Request Authors

jemten (65)
henrikstranneheim (11)
raysloks (3)
robinandeer (3)
pbiology (1)
ingkebil (1)

Top Labels

Issue Labels

Enhancement (39) Bug (7) Discuss (3) Urgency L (1) Question (1)

Pull Request Labels

Dependencies

.github/workflows/build_and_publish_latest_docker.yml actions

actions/checkout v2 composite
elgohr/Publish-Docker-Github-Action master composite

.github/workflows/build_and_publish_prod_docker.yml actions

actions/checkout v2 composite
elgohr/Publish-Docker-Github-Action master composite

.github/workflows/conda_prod_install.yml actions

actions/checkout v2 composite
conda-incubator/setup-miniconda v2 composite

.github/workflows/coverage.yml actions

actions/checkout v2 composite
conda-incubator/setup-miniconda v2 composite
shogo82148/actions-setup-perl v1 composite

.github/workflows/testing.yml actions

actions/checkout v2 composite
conda-incubator/setup-miniconda v2 composite
shogo82148/actions-setup-perl v1 composite

Dockerfile docker

perl 5.26 build

containers/Dockerfile docker

ubuntu 16.04 build

containers/bedtools/Dockerfile docker

clinicalgenomics/mip_base 2.1 build

containers/blobfish/Dockerfile docker

clinicalgenomics/mip_base 2.1 build

containers/bootstrapann/Dockerfile docker

python 2.7-slim build

containers/bwa/Dockerfile docker

clinicalgenomics/mip_base 2.1 build

containers/bwa-mem2/Dockerfile docker

ubuntu bionic build

containers/bwakit/Dockerfile docker

clinicalgenomics/mip_base 2.1 build

containers/cadd/Dockerfile docker

clinicalgenomics/mip_base 2.1 build

containers/chromograph/Dockerfile docker

clinicalgenomics/mip_base 2.1 build

containers/cnvnator/Dockerfile docker

ubuntu bionic build

containers/cyrius/Dockerfile docker

clinicalgenomics/mip_base 2.1 build

containers/delly/Dockerfile docker

clinicalgenomics/mip_base 2.1 build

containers/expansionhunter/Dockerfile docker

ubuntu bionic build

containers/fastqc/Dockerfile docker

clinicalgenomics/mip_base 2.1 build

containers/genmod/Dockerfile docker

clinicalgenomics/mip_base 2.1 build

containers/gens_preproc/Dockerfile docker

clinicalgenomics/htslib 1.13 build

containers/gffcompare/Dockerfile docker

clinicalgenomics/mip_base 2.1 build

containers/glnexus/Dockerfile docker

clinicalgenomics/mip_base 2.1 build

containers/hmtnote/Dockerfile docker

clinicalgenomics/mip_base 2.1 build

containers/htslib/Dockerfile docker

clinicalgenomics/mip_base 2.1 build

containers/manta/Dockerfile docker

clinicalgenomics/mip_base 2.1 build

containers/megafusion/Dockerfile docker

clinicalgenomics/mip_base 2.1 build

containers/peddy/Dockerfile docker

clinicalgenomics/mip_base 2.1 build

containers/perl/Dockerfile docker

perl 5.26 build

containers/plink/Dockerfile docker

clinicalgenomics/mip_base 2.1 build

containers/preseq/Dockerfile docker

clinicalgenomics/mip_base 2.1 build

containers/rhocall/Dockerfile docker

clinicalgenomics/mip_base 2.1 build

containers/rseqc/Dockerfile docker

clinicalgenomics/mip_base 2.1 build

containers/sambamba/Dockerfile docker

clinicalgenomics/mip_base 2.1 build

containers/smncopynumbercaller/Dockerfile docker

clinicalgenomics/mip_base 2.1 build

containers/star/Dockerfile docker

clinicalgenomics/mip_base 2.1 build

containers/stranger/Dockerfile docker

clinicalgenomics/mip_base 2.1 build

containers/stringtie/Dockerfile docker

clinicalgenomics/mip_base 2.1 build

containers/telomerecat/Dockerfile docker

clinicalgenomics/mip_base 2.1 build

containers/tiddit/Dockerfile docker

clinicalgenomics/mip_base 2.1 build

containers/trim_galore/Dockerfile docker

clinicalgenomics/mip_base 2.1 build

containers/ucsc/Dockerfile docker

clinicalgenomics/mip_base 2.1 build

containers/upd/Dockerfile docker

clinicalgenomics/mip_base 2.1 build

containers/utilities/Dockerfile docker

clinicalgenomics/mip_base 2.1 build

containers/vcfanno/Dockerfile docker

clinicalgenomics/mip_base 2.1 build

templates/code/Dockerfile docker

clinicalgenomics/mip_base 2.1 build
