moi--an-integrated-solution-for-omics-analyses

https://github.com/asaglab/moi--an-integrated-solution-for-omics-analyses

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 7 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (17.3%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: ASAGlab
  • License: mit
  • Language: Nextflow
  • Default Branch: main
  • Size: 24.8 MB
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed about 2 years ago
Metadata Files
Readme Changelog License Code of conduct Citation

README.md

Introduction

multiOmicsIntegrator is a bioinformatics best-practice pipeline for the analysis of multi-omics data.

The pipeline is built using Nextflow version 23.04.2.5870 (IMPORTANT), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from nf-core/modules in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!

Pipeline summary

  1. RNA-seq analysis at the level of:
    • mRNAs
    • miRNAs
    • isoforms
  2. Functional annotation of transcripts
  3. Metabolomics analysis
  4. Proteomics analysis
  5. Integration of multi omics data

Supplementary materials for this pipeline can be found at this zenodo repository:

https://zenodo.org/records/10813721

General inputs and outputs

The MOI pipeline is organized into individual modules, each responsible for a specific step in the analysis workflow. The modular design makes it easy to incorporate new analysis techniques or custom implementations, and simplifies maintenance and scaling.

MOI’s behavior is controlled through params.yml files, each named after the analysis segment it governs. In these files the user specifies input and output parameters and can optionally fine-tune settings such as algorithm selection and algorithm configuration.

The pipeline takes a single csv file as input. This file contains either a single column of SRA codes or the directory where the fastq files are located, along with any other metadata for the samples. If the analysis starts from count matrices, the user can instead specify the directory of the feature matrix along with a phenotype file.
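As a rough illustration of the single-column case, the sketch below writes a small csv of SRA codes and checks that every row has the same number of fields as the header. The column name `sample` and the accession values are placeholders invented for this example, not the pipeline's documented format; the usage.md files under docs describe the real layout.

```shell
#!/usr/bin/env bash
# Hypothetical samplesheet: the header "sample" and the accessions are
# placeholders for illustration only, not the pipeline's documented format.
samplesheet="$(mktemp)"
cat > "$samplesheet" <<'EOF'
sample
SRR_PLACEHOLDER_1
SRR_PLACEHOLDER_2
EOF

# Count data rows (every non-empty line after the header).
n_rows=$(tail -n +2 "$samplesheet" | grep -c .)

# Count rows whose number of comma-separated fields differs from the header.
bad_rows=$(awk -F',' 'NR==1 {n=NF; next} NF!=n {c++} END {print c+0}' "$samplesheet")

echo "rows=$n_rows malformed=$bad_rows"
```

A check like this only catches structural problems such as a missing or extra comma; it does not validate the accessions themselves.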

MOI produces extensive outputs, including informative plots and intermediate results in the form of text and RData objects for each module, accommodating users who seek further utilization or detailed inspection of results. Outputs are organized hierarchically based on the user’s parameterization; for example, the pathway enrichment analysis of genes will be located under the directory “/userdefinedoutput_directory/genes/biotranslator/”.

Most important tools

| Omics | Functionality | Tools |
|-------|---------------|-------|
| Genes, miRNA, isoforms | SRA download | SRA toolkit |
| Genes, miRNA, isoforms | Quality control | FastQC, trimgalore |
| Genes, miRNA, isoforms | Alignment and assembly | Salmon, samtools, STAR, Hisat2, StringTie2 |
| Genes, miRNA, isoforms, proteins, lipids | Data preprocessing | R packages: edgeR, limma, sva, ggplot2, ComplexHeatmap |
| Proteins, lipids | Specific for proteins and lipids | R packages: preprocessCore, mstus normalization |
| Lipids | Specific for lipids | R packages: lipidr |
| Genes, miRNA, isoforms, proteins, lipids | Differential expression analysis | R packages: DESeq2, edgeR, RankProd, ggplot2, ComplexHeatmap |
| Genes, miRNA, isoforms, proteins, lipids | Correlation analysis | R package stats |
| Genes, miRNA, isoforms, proteins, lipids | Pathway enrichment analysis | clusterProfiler, Biotranslator |
| Lipids | Specific for lipids pathway enrichment analysis | Custom tool: Lipidb |
| Genes, miRNA, isoforms, proteins | RIDDER (module to identify IRE1 substrates) | gRIDD, RNAeval, fimo |
| Genes, miRNA, isoforms | Functional annotation | CPAT, signalP, pfam |
| Genes, miRNA, isoforms, proteins | Secondary structure prediction | RNAfold, RNAeval |
| Genes, miRNA, isoforms, proteins | Motif search | fimo |
| Isoforms | Genome-wide isoform analysis | IsoformSwitchAnalyzeR |

Quick Start

  1. Install Nextflow (>=22.10.1). The pipeline was built with version 23.04.2, which the run command below pins through NXF_VER.

  2. Install Docker.

  3. Download the pipeline and rename it:

    git clone https://github.com/ASAGlab/MOI--An-integrated-solution-for-omics-analyses.git && mv MOI--An-integrated-solution-for-omics-analyses multiomicsintegrator

  4. In the params_mcia.yml file, set the following parameters to the location where you want your outputs:
  • outdir: yourDir
  • pathmcia: /path/to/yourDir/mcia
  • biotransallpath : /path/to/yourDir/prepareforbio

The pathmcia and biotransallpath values must be full paths of the form:

    $outdir/mcia
    $outdir/prepareforbio

See format in params_mcia.yml and change accordingly.
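The relationship between the three parameters can be sketched in a few lines of shell. This is only an illustration of the `$outdir/mcia` and `$outdir/prepareforbio` convention stated above: `/path/to/yourDir` is the README's placeholder, and the printed lines are meant to be compared against your own params_mcia.yml rather than used as a replacement for it.

```shell
#!/usr/bin/env bash
# Assumption: outdir is the full path of your chosen output directory.
# /path/to/yourDir is the README's placeholder, not a real location.
outdir="/path/to/yourDir"

# Drop a trailing slash so the joined paths never contain "//".
outdir="${outdir%/}"

# Both derived paths must sit directly under outdir.
pathmcia="${outdir}/mcia"
biotransallpath="${outdir}/prepareforbio"

# Print the three entries in the key: value shape used by params_mcia.yml.
printf 'outdir: %s\npathmcia: %s\nbiotransallpath: %s\n' \
  "$outdir" "$pathmcia" "$biotransallpath"
```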

In addition, adjust the resources (in params_mcia.yml) according to your system:
  • max_memory : '8.GB'
  • max_cpus : 7
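Before editing the resource settings it can help to see how many CPUs the machine actually exposes. The sketch below is one way to do that; the helper `clamp_cpus` is a name invented for this example, and the value 7 is simply the max_cpus figure from the example above.

```shell
#!/usr/bin/env bash
# clamp_cpus REQUESTED AVAILABLE
# Hypothetical helper: prints the smaller of the two values, i.e. the
# largest max_cpus setting this machine can satisfy.
clamp_cpus() {
  if [ "$1" -gt "$2" ]; then
    echo "$2"
  else
    echo "$1"
  fi
}

# 7 is the max_cpus value shown in the README's example.
requested_cpus=7

# getconf works on most Linux and macOS systems; nproc is the fallback.
available_cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || nproc)

echo "CPUs available: $available_cpus"
echo "max_cpus to use: $(clamp_cpus "$requested_cpus" "$available_cpus")"
```

Note that a clamped value below 7 may still be too low for the heavier steps, since the README states that comparative analysis, isoform analysis and mcia need at least 7 cpus.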

  5. Run the pipeline, passing the full path of the parameter file to the -params-file argument:

    NXF_VER=23.04.2 nextflow run multiomicsintegrator -params-file /full/path/to/params_mcia.yml -profile docker

Note that some form of configuration will be needed so that Nextflow knows how to fetch the required software. This is usually done in the form of a config profile (docker in the example command above). You can chain multiple config profiles in a comma-separated string.

  6. Start running your own analysis!

The above example refers to a simplified version of an integrated analysis. Depending on which part of the pipeline you want to run and your starting point (raw data or matrices), modify the respective parameter file:

  • params_isoforms.yml
  • params_genes.yml
  • params_mirna.yml
  • params_proteins.yml
  • params_lipids.yml
  • params_mcia.yml
  • params_ridderalone

Common issues:

  • If an error regarding biomaRt appears:

    Error in h(simpleError(msg, call)) :
      error in evaluating the argument 'conn' in selecting a method for function 'dbDisconnect': object 'info' not found
    Calls: useEnsembl ... .sql_disconnect -> dbDisconnect -> .handleSimpleError -> h
    Execution halted

    or

    Ensembl site unresponsive, trying useast mirror
    Ensembl site unresponsive, trying asia mirror
    Error in .chooseEnsemblMirror(mirror = mirror, http_config = http_config) :
      Unable to query any Ensembl site
    Calls: useEnsembl -> .chooseEnsemblMirror
    Execution halted

    just run the pipeline again with -resume:

    nextflow run multiomicsintegrator -params-file /full/path/to/params_mcia.yml -profile docker -resume

  • If the error persists, try deleting the bianca7/mompreprocess container (or all containers if possible) and run again.
  • Comparative analysis, isoform analysis and mcia need substantial resources (at least 7 cpus).
  • Check resources and your directories!
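Because the biomaRt errors above come from Ensembl being temporarily unreachable, re-running with -resume can be automated. The following is a minimal sketch of a generic retry wrapper, not part of the pipeline: the function name `retry` and the `RETRY_DELAY` variable are assumptions made for this example.

```shell
#!/usr/bin/env bash
# retry MAX_ATTEMPTS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND until it succeeds or MAX_ATTEMPTS
# is reached. Waits RETRY_DELAY seconds (default 60) between attempts.
retry() {
  local max_attempts="$1"
  shift
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Giving up after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    echo "Command failed; starting attempt $attempt of $max_attempts" >&2
    sleep "${RETRY_DELAY:-60}"
  done
}

# Usage (requires Nextflow and Docker, so it is left commented out here):
# retry 3 nextflow run multiomicsintegrator \
#   -params-file /full/path/to/params_mcia.yml -profile docker -resume
```

With -resume on every attempt, tasks that already completed are taken from the cache and only the failed step is repeated. A retry loop does not help when the failure is caused by insufficient resources or wrong directories, so those are still worth checking first.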

Documentation

The ASAGlab/moi pipeline comes with documentation under docs, in various usage.md files, along with example yml files that the user can modify directly as a starting point for custom configurations. Example outputs are also included under the docs folder in this repository.

Credits

ASAGlab/moi was originally written by Bianca Alexandra Pasat.

We thank the following people for their extensive assistance in the development of this pipeline:

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For further information or help, don't hesitate to get in touch on the Slack #MOM channel (you can join with this invite).

Citations

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

Owner

  • Name: ASAGlab
  • Login: ASAGlab
  • Kind: organization

Citation (CITATIONS.md)

# ASAGlab/moi: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.



## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)
  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.

GitHub Events

Total
  • Issues event: 1
  • Watch event: 2
  • Push event: 26
Last Year
  • Issues event: 1
  • Watch event: 2
  • Push event: 26

Dependencies

modules/local/cat/meta.yml cpan
modules/local/clusterprofiler/meta.yml cpan
modules/local/concetrate/meta.yml cpan
modules/local/correlation/meta.yml cpan
modules/local/cpat/meta.yml cpan
modules/local/cut/meta.yml cpan
modules/local/deseq2/meta.yml cpan
modules/local/edger/meta.yml cpan
modules/local/fimo/meta.yml cpan
modules/local/gene_to_fasta/meta.yml cpan
modules/local/gridd/meta.yml cpan
modules/local/isopart1/meta.yml cpan
modules/local/isopart1a/meta.yml cpan
modules/local/isopart2/meta.yml cpan
modules/local/isovis/meta.yml cpan
modules/local/iupred2a/meta.yml cpan
modules/local/lipidr/meta.yml cpan
modules/local/lipidr_normalize/meta.yml cpan
modules/local/mcia/meta.yml cpan
modules/local/metaboanalystr/meta.yml cpan
modules/local/mom_batch/meta.yml cpan
modules/local/mom_filter/meta.yml cpan
modules/local/mom_norm/meta.yml cpan
modules/local/mom_norm_mstus/meta.yml cpan
modules/local/multiqc/meta.yml cpan
modules/local/pfam/meta.yml cpan
modules/local/probscan/meta.yml cpan
modules/local/rankprod/meta.yml cpan
modules/local/rnaeval/meta.yml cpan
modules/local/rnafold/meta.yml cpan
modules/local/signalp/meta.yml cpan
modules/nf-core/bbmap/bbsplit/meta.yml cpan
modules/nf-core/cat/fastq/meta.yml cpan
modules/nf-core/custom/dumpsoftwareversions/meta.yml cpan
modules/nf-core/custom/getchromsizes/meta.yml cpan
modules/nf-core/custom/sratoolsncbisettings/meta.yml cpan
modules/nf-core/fastqc/meta.yml cpan
modules/nf-core/gffread/meta.yml cpan
modules/nf-core/gunzip/meta.yml cpan
modules/nf-core/hisat2/align/meta.yml cpan
modules/nf-core/hisat2/build/meta.yml cpan
modules/nf-core/hisat2/extractsplicesites/meta.yml cpan
modules/nf-core/picard/markduplicates/meta.yml cpan
modules/nf-core/preseq/lcextrap/meta.yml cpan
modules/nf-core/rsem/calculateexpression/meta.yml cpan
modules/nf-core/rsem/preparereference/meta.yml cpan
modules/nf-core/salmon/index/meta.yml cpan
modules/nf-core/salmon/index_onlyfasta/meta.yml cpan
modules/nf-core/salmon/quant/meta.yml cpan
modules/nf-core/samtools/flagstat/meta.yml cpan
modules/nf-core/samtools/idxstats/meta.yml cpan
modules/nf-core/samtools/index/meta.yml cpan
modules/nf-core/samtools/sort/meta.yml cpan
modules/nf-core/samtools/stats/meta.yml cpan
modules/nf-core/sortmerna/meta.yml cpan
modules/nf-core/sratools/fasterqdump/meta.yml cpan
modules/nf-core/sratools/prefetch/meta.yml cpan
modules/nf-core/star/align/meta.yml cpan
modules/nf-core/star/genomegenerate/meta.yml cpan
modules/nf-core/stringtie/stringtie/meta.yml cpan
modules/nf-core/trimgalore/meta.yml cpan
modules/nf-core/umitools/dedup/meta.yml cpan
modules/nf-core/umitools/extract/meta.yml cpan
modules/nf-core/untar/meta.yml cpan
subworkflows/nf-core/align_hisat2/meta.yml cpan
subworkflows/nf-core/bam_dedup_stats_samtools_umitools/meta.yml cpan
subworkflows/nf-core/bam_markduplicates_picard/meta.yml cpan
subworkflows/nf-core/bam_sort_stats_samtools/meta.yml cpan
subworkflows/nf-core/bam_stats_samtools/meta.yml cpan
subworkflows/nf-core/fastq_download_prefetch_fasterqdump_sratools/meta.yml cpan
subworkflows/nf-core/fastq_fastqc_umitools_trimgalore/meta.yml cpan
pyproject.toml pypi