metabolomics_base_workflow

Snakemake pipeline for metabolomics data analysis

https://github.com/matteobolner/metabolomics_base_workflow


Repository

Basic Info
  • Host: GitHub
  • Owner: matteobolner
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Size: 181 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 1 year ago · Last pushed 9 months ago
Metadata Files
Readme License Citation

README.md

Metabolomics base workflow

This repository contains a Snakemake pipeline for metabolomics data analysis comparing two groups of samples. To keep the results robust, it accounts for the random component of the analysis by repeating the stochastic steps over multiple random seeds. Most of the code for handling metabolomics datasets is based on the metabotk Python library (available at https://github.com/matteobolner/metabotk). Variants of this pipeline have been used to analyze metabolomics data in the following publications:
  • High-throughput untargeted metabolomics reveals metabolites and metabolic pathways that differentiate two divergent pig breeds (published)
  • Description of metabolic differences between castrated males and intact gilts obtained from high-throughput metabolomics of porcine plasma (accepted for publication)

Usage

Edit the config.yaml file to define the dataset and analysis parameters. The metabolomics dataset must be an Excel file containing the following sheets:
  • Sample metadata
  • Chemical annotation (metabolite metadata)
  • Metabolite abundance data
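
A minimal sketch of an input check, assuming the workbook is read with pandas (which needs the openpyxl package for .xlsx files). The sheet names below are hypothetical: the names the pipeline actually expects are not stated here and are presumably configured in config.yaml.

```python
import pandas as pd

# Hypothetical sheet names, for illustration only; the real names may differ.
REQUIRED_SHEETS = ("sample_metadata", "chemical_annotation", "data")


def check_workbook(sheets: dict[str, pd.DataFrame]) -> None:
    """Raise ValueError if a required sheet is missing or empty.

    `sheets` is the dict returned by pd.read_excel(path, sheet_name=None),
    which maps every sheet name in the workbook to a DataFrame.
    """
    missing = [name for name in REQUIRED_SHEETS if name not in sheets]
    if missing:
        raise ValueError(f"missing sheets: {missing}")
    empty = [name for name in REQUIRED_SHEETS if sheets[name].empty]
    if empty:
        raise ValueError(f"empty sheets: {empty}")
```

For example, `check_workbook(pd.read_excel("dataset.xlsx", sheet_name=None))` fails early if a sheet is absent, before any rule runs; `dataset.xlsx` is a placeholder path.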

Pipeline structure

The summarized structure of this pipeline is shown in rulegraph.svg and outlined below. For the full pipeline, including the steps repeated over multiple random seeds, see dag.svg.

  • Missing data imputation
    • Remove outlier values and samples/metabolites with too many missing values (> 25% by default)
    • Impute missing values with MICE predictive mean matching (pmm)
  • Normalization (optional)
  • Removal of confounding effects (OLS regression)
  • Feature selection
    • Boruta (Random Forest)
  • Differential metabolite analysis
    • Univariate analyses (Mann-Whitney, ROC AUC)
    • Multivariate analyses (correlation networks; not yet included in this repository)
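
The filtering part of the missing-data step can be sketched with pandas. This assumes abundances are arranged with samples as rows and metabolites as columns, and it uses an interquartile-range rule as a stand-in outlier criterion; the pipeline's actual outlier definition, and the order in which metabolites and samples are filtered, are assumptions here. The 25% threshold matches the default stated above.

```python
import numpy as np
import pandas as pd


def mask_outliers(data: pd.DataFrame, k: float = 3.0) -> pd.DataFrame:
    """Set values outside [Q1 - k*IQR, Q3 + k*IQR] of their column to NaN.

    The IQR rule and k=3 are illustrative assumptions.
    """
    q1 = data.quantile(0.25)  # per-metabolite quartiles, NaNs ignored
    q3 = data.quantile(0.75)
    iqr = q3 - q1
    inside = data.ge(q1 - k * iqr, axis="columns") & data.le(q3 + k * iqr, axis="columns")
    return data.where(inside)  # outliers (and existing NaNs) become NaN


def drop_sparse(data: pd.DataFrame, max_missing: float = 0.25) -> pd.DataFrame:
    """Drop metabolites (columns), then samples (rows), whose fraction of
    missing values exceeds `max_missing`."""
    data = data.loc[:, data.isna().mean(axis=0) <= max_missing]
    return data.loc[data.isna().mean(axis=1) <= max_missing]


# Toy example: s5 has an extreme value for m1, and m2 is 40% missing.
abundances = pd.DataFrame(
    {"m1": [1.0, 1.1, 0.9, 1.0, 50.0], "m2": [np.nan, np.nan, 2.0, 2.1, 1.9]},
    index=["s1", "s2", "s3", "s4", "s5"],
)
clean = drop_sparse(mask_outliers(abundances))
```

Here the extreme value is masked first, which then leaves sample s5 with no observed values, so it is dropped along with metabolite m2.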
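
Predictive mean matching (pmm) fills each missing value with an observed value copied from a donor whose regression prediction is close to that of the incomplete row, so imputed values always lie within the observed range. The NumPy sketch below is a deliberately simplified, single-column version: it fits one OLS model on fully observed predictors and skips the Bayesian coefficient draw and the chained iterations over all columns that full MICE performs. It is not the pipeline's implementation; `pmm_impute_column`, `k` and `seed` are illustrative names.

```python
import numpy as np


def pmm_impute_column(y, X, k: int = 5, seed: int = 0) -> np.ndarray:
    """Fill NaNs in `y` by predictive mean matching on fully observed predictors X.

    Simplified sketch: point-estimate OLS, one target column, no MICE iterations.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float).copy()
    X1 = np.column_stack([np.ones(len(y)), X])  # add an intercept column
    observed = ~np.isnan(y)
    beta, *_ = np.linalg.lstsq(X1[observed], y[observed], rcond=None)
    y_hat = X1 @ beta  # predicted means for every row
    donors_y = y[observed]  # boolean indexing copies, so later fills don't leak in
    donors_hat = y_hat[observed]
    for i in np.flatnonzero(~observed):
        # The k observed rows whose predictions are closest to this row's prediction
        nearest = np.argsort(np.abs(donors_hat - y_hat[i]))[:k]
        y[i] = donors_y[rng.choice(nearest)]  # copy one donor's observed value
    return y
```

Calling it with several different `seed` values yields several plausible imputed datasets, which is one way the seed repetition described above could be applied to this step.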
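
Removing confounding effects by OLS regression amounts to regressing each metabolite on the confounders and keeping the residuals. A minimal NumPy/pandas sketch, assuming samples are rows in both tables and that categorical confounders (for example, a batch label) are already one-hot encoded; `remove_confounders` is a hypothetical helper, not a metabotk function.

```python
import numpy as np
import pandas as pd


def remove_confounders(data: pd.DataFrame, covariates: pd.DataFrame) -> pd.DataFrame:
    """Regress every metabolite (column of `data`) on the covariates by OLS
    and return the residuals, with the same index and columns as `data`."""
    covariates = covariates.loc[data.index]  # align samples by label
    X = np.column_stack([np.ones(len(covariates)), covariates.to_numpy(dtype=float)])
    Y = data.to_numpy(dtype=float)
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)  # one fit per metabolite column
    residuals = Y - X @ beta
    return pd.DataFrame(residuals, index=data.index, columns=data.columns)
```

Because the model includes an intercept, each metabolite's residuals are centered on zero; if a later step needs the original scale, the metabolite means can be added back.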
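
Stochastic steps such as Boruta can select different metabolites depending on the random seed. One way to combine the repeated runs, assumed here for illustration rather than taken from the pipeline, is to keep the features selected in at least a given fraction of seeds. In the sketch below, `consensus_selection`, `min_fraction` and the toy `toy_select` function are hypothetical.

```python
import random
from collections import Counter
from typing import Callable, Iterable


def consensus_selection(
    select: Callable[[int], set],
    seeds: Iterable[int],
    min_fraction: float = 0.5,
) -> set:
    """Run a seeded selection step once per seed and keep the features
    chosen in at least `min_fraction` of the runs."""
    seeds = list(seeds)
    counts = Counter()
    for seed in seeds:
        counts.update(select(seed))  # each selected feature counted once per run
    threshold = min_fraction * len(seeds)
    return {feature for feature, n in counts.items() if n >= threshold}


# Toy stand-in for a stochastic selector: "m1" is always picked,
# the other metabolites only by chance.
def toy_select(seed: int) -> set:
    rng = random.Random(seed)
    return {"m1"} | {m for m in ("m2", "m3", "m4") if rng.random() < 0.3}


print(consensus_selection(toy_select, seeds=range(10)))
```

Raising `min_fraction` trades sensitivity for stability: only metabolites that are selected consistently across seeds survive.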
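
For the univariate analyses, the Mann-Whitney U statistic and the ROC AUC are closely related: AUC = U / (n1 * n2), the probability that a random sample from the first group has a higher value than one from the second, with ties counted as one half. The sketch below computes both for one metabolite, using a normal approximation with tie and continuity corrections for the two-sided p-value. In practice `scipy.stats.mannwhitneyu` and `sklearn.metrics.roc_auc_score` are the usual tools; which ones the pipeline uses is not stated here.

```python
import math

import numpy as np
import pandas as pd


def mann_whitney_auc(x, y):
    """Return (U for x, two-sided p-value, ROC AUC of x versus y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = np.concatenate([x, y])
    ranks = pd.Series(pooled).rank(method="average").to_numpy()  # ties get mean rank
    u1 = ranks[:n1].sum() - n1 * (n1 + 1) / 2
    auc = u1 / (n1 * n2)
    # Tie correction: sum of (t^3 - t) over groups of t tied values
    _, tie_counts = np.unique(pooled, return_counts=True)
    tie_term = float((tie_counts**3 - tie_counts).sum())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:  # all values identical: no evidence of a difference
        return u1, 1.0, auc
    # Continuity-corrected z score, clipped at zero
    z = max(abs(u1 - n1 * n2 / 2) - 0.5, 0.0) / math.sqrt(variance)
    p = math.erfc(z / math.sqrt(2))  # two-sided normal tail probability
    return u1, p, auc
```

Applied column by column to the abundance table, this gives one U, p-value and AUC per metabolite; an AUC near 0.5 indicates no separation between the groups, while values near 0 or 1 indicate strong separation.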


Citation

If you use this workflow, please cite our paper: High-throughput untargeted metabolomics reveals metabolites and metabolic pathways that differentiate two divergent pig breeds (doi: 10.1016/j.animal.2024.101393).

Owner

  • Name: Matteo
  • Login: matteobolner
  • Kind: user
  • Location: Trento - Bologna - Italy

PhD student at the University of Bologna; Bioinformatics graduate.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Bolner"
  given-names: "Matteo"
  orcid: "https://orcid.org/0000-0002-4985-0191"
title: "High-throughput untargeted metabolomics reveals metabolites and metabolic pathways that differentiate two divergent pig breeds"
version: 1.0.0
doi: 10.1016/j.animal.2024.101393
date-released: 2025-01-01
url: "https://github.com/matteobolner/metabolomics_base_workflow"
