https://github.com/biocore/q2-qemistree

Hierarchical orderings for mass spectrometry data. Canonically pronounced "chemis-tree".

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: biorxiv.org, sciencedirect.com, nature.com
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.4%) to scientific vocabulary

Keywords

cheminformatics fragmentation-trees metabolomics microbiome phylogenetics
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: biocore
  • License: bsd-2-clause
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 87.3 MB
Statistics
  • Stars: 32
  • Watchers: 11
  • Forks: 15
  • Open Issues: 24
  • Releases: 5
Topics
cheminformatics fragmentation-trees metabolomics microbiome phylogenetics
Created over 7 years ago · Last pushed almost 3 years ago
Metadata Files
Readme License

README.md

q2-qemistree

Canonically pronounced chemis-tree.

A tool to build a tree of mass-spectrometry (LC-MS/MS) features to perform chemically-informed comparison of untargeted metabolomic profiles. The manuscript describing q2-qemistree is available here.

Qemistree manuscript

Installation

Once QIIME 2 is installed, activate your QIIME 2 environment and install q2-qemistree following the steps below:

```bash
git clone https://github.com/biocore/q2-qemistree.git
cd q2-qemistree
pip install .
qiime dev refresh-cache
```

q2-qemistree uses SIRIUS, a software framework developed for de novo identification of metabolites. We use molecular substructures predicted by SIRIUS to build a hierarchy of the MS1 features in a dataset. For this demo, please download and unzip the latest version of SIRIUS from here.

Below, we download SIRIUS for macOS (for Linux, the only thing that changes is the URL from which the binary is downloaded):

```bash
wget https://bio.informatik.uni-jena.de/repository/dist-release-local/de/unijena/bioinf/ms/sirius/4.9.3/sirius-4.9.3-osx64-headless.zip
unzip sirius-4.9.3-osx64-headless.zip
```

Note: Qemistree was initially developed against SIRIUS 4.0.1. Since SIRIUS 4.0.1 has reached its end of life, Qemistree has been adapted to work with newer SIRIUS versions (>4.4.29).

Demonstration

q2-qemistree ships with the following methods:

```
qiime qemistree compute-fragmentation-trees
qiime qemistree rerank-molecular-formulas
qiime qemistree predict-fingerprints
qiime qemistree make-hierarchy
qiime qemistree get-classyfire-taxonomy
qiime qemistree prune-hierarchy
```

To generate a tree that relates the MS1 features in your experiment, we need to pre-process mass-spectrometry data (.mzXML, .mzML or .mzDATA files) using MZmine2 and produce the following inputs:

  1. An MGF file with both MS1 and MS2 information. This file will be imported into QIIME 2 as a MassSpectrometryFeatures artifact.
  2. A feature table with peak areas of MS1 ions per sample. This table will be imported from a CSV file into the BIOM format, and then into QIIME 2 as a FeatureTable[Frequency] artifact.

These input files can be obtained following peak detection in MZmine2. Here is an example MZmine2 batch file used to generate these.
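Because the MGF file is expected to carry both MS1 and MS2 information for every feature, it can help to check it before importing. The following is a minimal Python sketch of such a check, not the validation that q2-qemistree itself performs. It assumes each `BEGIN IONS`/`END IONS` block has `FEATURE_ID=` and `MSLEVEL=` lines, as in a typical MZmine2 export for SIRIUS; the function name `check_mgf` and the example spectra are made up for illustration.

```python
from collections import defaultdict


def check_mgf(text):
    """Return (features without MS1, features without MS2).

    Assumption: every ion block declares FEATURE_ID and MSLEVEL.
    Other MGF dialects may use different keys.
    """
    levels = defaultdict(set)  # feature id -> MS levels seen for it
    current = None             # key/value pairs of the block being read
    for raw in text.splitlines():
        line = raw.strip()
        if line == "BEGIN IONS":
            current = {}
        elif line == "END IONS" and current is not None:
            fid = current.get("FEATURE_ID")
            level = current.get("MSLEVEL")
            if fid is not None and level is not None:
                levels[fid].add(level)
            current = None
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key.upper()] = value.strip()
    missing_ms1 = sorted(f for f, seen in levels.items() if "1" not in seen)
    missing_ms2 = sorted(f for f, seen in levels.items() if "2" not in seen)
    return missing_ms1, missing_ms2


# Toy MGF: feature 1 has MS1 and MS2, feature 2 has MS1 only.
example = """BEGIN IONS
FEATURE_ID=1
MSLEVEL=1
PEPMASS=181.0707
181.0707 1000
END IONS
BEGIN IONS
FEATURE_ID=1
MSLEVEL=2
PEPMASS=181.0707
85.0284 200
END IONS
BEGIN IONS
FEATURE_ID=2
MSLEVEL=1
PEPMASS=203.0526
203.0526 500
END IONS
"""

missing_ms1, missing_ms2 = check_mgf(example)
print("no MS1:", missing_ms1, "no MS2:", missing_ms2)
```

In this toy input, feature 2 would be reported as lacking an MS2 entry.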

To begin this demonstration, create a separate folder to store all the inputs and outputs:

```bash
mkdir demo-qemistree
cd demo-qemistree
```

Download a small feature table and MGF file using:

```bash
wget https://raw.githubusercontent.com/biocore/q2-qemistree/master/q2_qemistree/demo/feature-table.biom
wget https://raw.githubusercontent.com/biocore/q2-qemistree/master/q2_qemistree/demo/sirius.mgf
```

We import these files into the appropriate QIIME 2 artifact formats as follows:

```bash
qiime tools import --input-path feature-table.biom --output-path feature-table.qza --type FeatureTable[Frequency]
qiime tools import --input-path sirius.mgf --output-path sirius.mgf.qza --type MassSpectrometryFeatures
```

Note: If the MGF file has formatting errors (e.g., no MS1 entries are included in the MGF, or an MS1 entry does not have a corresponding MS2 entry), an error message will help users troubleshoot this step before proceeding.

First, we generate fragmentation trees for molecular peaks detected using MZmine2:

```bash
qiime qemistree compute-fragmentation-trees --p-sirius-path 'sirius.app/Contents/MacOS' \
  --i-features sirius.mgf.qza \
  --p-ppm-max 15 \
  --p-profile orbitrap \
  --p-ions-considered '[M+H]+' \
  --p-java-flags "-Djava.io.tmpdir=/path-to-some-dir/ -Xms16G -Xmx64G" \
  --o-fragmentation-trees fragmentation_trees.qza
```

Note: /path-to-some-dir/ should be a directory where you have write permissions and sufficient storage space. We use -Xms16G and -Xmx64G as the minimum and maximum heap size for the Java virtual machine (JVM). If left blank, q2-qemistree will use default JVM flags.

This generates a QIIME 2 artifact of type SiriusFolder. This contains fragmentation trees with candidate molecular formulas for each MS1 feature detected in your experiment.
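The Java flags used in the command above point the JVM at a temporary directory that must be writable and have enough free space. As a rough sketch (not part of q2-qemistree), the Python snippet below checks a candidate directory and assembles the flag string. The helper name `build_java_flags` and the `min_free_gb` threshold are assumptions chosen for illustration; the README does not state how much space is sufficient.

```python
import os
import shutil
import tempfile


def build_java_flags(tmpdir, min_heap="16G", max_heap="64G", min_free_gb=1.0):
    """Check that tmpdir is usable, then return a JVM flag string.

    min_free_gb is a hypothetical threshold; pick one that suits your data.
    """
    if not os.path.isdir(tmpdir):
        raise ValueError(f"{tmpdir} is not a directory")
    if not os.access(tmpdir, os.W_OK):
        raise ValueError(f"no write permission on {tmpdir}")
    free_gb = shutil.disk_usage(tmpdir).free / 1e9
    if free_gb < min_free_gb:
        raise ValueError(f"only {free_gb:.1f} GB free in {tmpdir}")
    return f"-Djava.io.tmpdir={tmpdir} -Xms{min_heap} -Xmx{max_heap}"


# The system temp directory stands in for /path-to-some-dir/ here.
flags = build_java_flags(tempfile.gettempdir(), min_free_gb=0.0)
print(flags)
```

The resulting string could then be passed as the value of `--p-java-flags`.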

Note 2: The new SIRIUS versions have the parameter --p-ions-considered, which refers to the adducts of the MS/MS data to be considered. Here are some examples: [M+H]+, [M+K]+, [M+Na]+, [M+H-H2O]+, [M+H-H4O2]+, [M+NH4]+, [M-H]-, [M+Cl]-, [M-H2O-H]-, [M+Br]-.

You can also provide a comma-separated list. Example: '[M+H]+, [M+Na]+'.
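To make the adduct notation and the ppm tolerance concrete, here is a small Python sketch. It computes the m/z expected for a neutral monoisotopic mass under a few singly charged adducts, the mass window implied by a ppm tolerance such as `--p-ppm-max 15`, and a comma-separated adduct string in the format shown above. The function names are illustrative, and only a handful of adducts are covered; how SIRIUS applies these values internally is not described here.

```python
# Mass shifts in Da for singly charged adducts (electron mass accounted for).
ADDUCT_SHIFTS = {
    "[M+H]+": 1.007276,
    "[M+Na]+": 22.989218,
    "[M+K]+": 38.963158,
    "[M+NH4]+": 18.033823,
    "[M-H]-": -1.007276,
}


def adduct_mz(neutral_mass, adduct):
    """Expected m/z of a singly charged ion for the given adduct."""
    return neutral_mass + ADDUCT_SHIFTS[adduct]


def ppm_window(mz, ppm):
    """Lower and upper m/z bounds for a relative tolerance given in ppm."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


def ions_considered(adducts):
    """Join adducts into a comma-separated string such as '[M+H]+, [M+Na]+'."""
    unknown = [a for a in adducts if a not in ADDUCT_SHIFTS]
    if unknown:
        raise ValueError(f"adducts not covered by this sketch: {unknown}")
    return ", ".join(adducts)


glucose = 180.063388  # monoisotopic mass of C6H12O6 in Da
mz = adduct_mz(glucose, "[M+H]+")
low, high = ppm_window(mz, 15)
print(f"[M+H]+ m/z {mz:.6f}, 15 ppm window {low:.6f} to {high:.6f}")
print(ions_considered(["[M+H]+", "[M+Na]+"]))
```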

Next, we select the top-scoring molecular formula as follows:

```bash
qiime qemistree rerank-molecular-formulas --p-sirius-path 'sirius.app/Contents/MacOS' \
  --i-features sirius.mgf.qza \
  --i-fragmentation-trees fragmentation_trees.qza \
  --p-zodiac-threshold 0.95 \
  --p-java-flags "-Djava.io.tmpdir=/path-to-some-dir/ -Xms16G -Xmx64G" \
  --o-molecular-formulas molecular_formulas.qza
```

This produces a QIIME 2 artifact of type ZodiacFolder with the top-ranked molecular formula for each MS1 feature. Now, we predict molecular substructures in each feature based on the molecular formulas. We use CSI:FingerID for this purpose as follows:

```bash
qiime qemistree predict-fingerprints --p-sirius-path 'sirius.app/Contents/MacOS' \
  --i-molecular-formulas molecular_formulas.qza \
  --p-ppm-max 20 \
  --p-java-flags "-Djava.io.tmpdir=/path-to-some-dir/ -Xms16G -Xmx64G" \
  --o-predicted-fingerprints fingerprints.qza
```

This gives us a QIIME 2 artifact of type CSIFolder that contains probabilities of molecular substructures (2936 molecular properties in total) within each feature. We use these predicted molecular substructures to generate a hierarchy of molecules as follows:

```bash
qiime qemistree make-hierarchy \
  --i-csi-results fingerprints.qza \
  --i-feature-tables feature-table.qza \
  --o-tree qemistree.qza \
  --o-feature-table feature-table-hashed.qza \
  --o-feature-data feature-data.qza
```
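To give an intuition for how a hierarchy can be derived from fingerprint probabilities, the Python sketch below clusters three toy fingerprint vectors with average linkage (UPGMA) on Euclidean distances and prints the topology in Newick format. This is a generic clustering technique picked for illustration: the README does not specify the distance metric or linkage used by `make-hierarchy`, and the feature names and three-position fingerprints are made up (real fingerprints have 2936 positions).

```python
import math


def upgma_newick(fingerprints):
    """Average-linkage clustering of fingerprint vectors; returns Newick topology."""
    # cluster id -> (newick string, names of member features)
    clusters = {i: (name, [name]) for i, name in enumerate(fingerprints)}

    def average_distance(members_a, members_b):
        # Mean Euclidean distance over all cross-cluster feature pairs.
        total = sum(
            math.dist(fingerprints[a], fingerprints[b])
            for a in members_a
            for b in members_b
        )
        return total / (len(members_a) * len(members_b))

    next_id = len(clusters)
    while len(clusters) > 1:
        keys = sorted(clusters)
        # Pick the pair of clusters with the smallest average distance.
        i, j = min(
            ((p, q) for n, p in enumerate(keys) for q in keys[n + 1:]),
            key=lambda pq: average_distance(clusters[pq[0]][1], clusters[pq[1]][1]),
        )
        (newick_i, members_i) = clusters.pop(i)
        (newick_j, members_j) = clusters.pop(j)
        clusters[next_id] = (f"({newick_i},{newick_j})", members_i + members_j)
        next_id += 1
    (newick, _), = clusters.values()
    return newick + ";"


# Toy substructure probabilities for three hypothetical features.
toy_fingerprints = {
    "f1": [0.9, 0.1, 0.8],
    "f2": [0.8, 0.2, 0.9],
    "f3": [0.1, 0.9, 0.1],
}
tree = upgma_newick(toy_fingerprints)
print(tree)
```

Features f1 and f2 have similar predicted substructures, so they end up as sister tips, with f3 joining last. For real analyses, the tree produced by `make-hierarchy` should be used.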

To support meta-analyses, this method is capable of handling one or more datasets, i.e., pairs of CSI results and feature tables. To test this functionality, download a feature table and CSI fingerprint result from another experiment as follows:

```bash
wget https://raw.githubusercontent.com/biocore/q2-qemistree/master/q2_qemistree/demo/feature-table2.biom.qza
wget https://raw.githubusercontent.com/biocore/q2-qemistree/master/q2_qemistree/demo/fingerprints2.qza
```

Below is the q2-qemistree command to co-analyze the datasets:

```bash
qiime qemistree make-hierarchy \
  --i-csi-results fingerprints.qza \
  --i-csi-results fingerprints2.qza \
  --i-feature-tables feature-table.qza \
  --i-feature-tables feature-table2.biom.qza \
  --o-tree merged-qemistree.qza \
  --o-feature-table merged-feature-table-hashed.qza \
  --o-feature-data merged-feature-data.qza
```

Qemistree also supports the inclusion of structural annotations made using MS/MS spectral library matches for downstream analysis, using the optional input --i-ms2-matches as follows:

```bash
qiime qemistree make-hierarchy \
  --i-csi-results fingerprints.qza \
  --i-feature-tables feature-table.qza \
  --i-ms2-matches /path-to-MS2-spectral-matches.qza/ \
  --o-tree qemistree.qza \
  --o-feature-table feature-table-hashed.qza \
  --o-feature-data feature-data.qza
```

Note:

  1. The input to --i-ms2-matches can be obtained using the Feature-based molecular networking (FBMN) workflow supported in the web-based mass-spectrometry data analysis platform, GNPS. To use MS2 matches in Qemistree, please download the results of the FBMN workflow and import the tsv file in the folder clusterinfo_summary as a QIIME 2 artifact of type FeatureData[Molecules] as follows:

```bash
qiime tools import \
  --input-path path-to-MS2-spectral-matches.tsv \
  --output-path path-to-MS2-spectral-matches.qza \
  --type FeatureData[Molecules]
```

  2. The input CSI results, feature tables and MS2 match tables should have a one-to-one correspondence, i.e., CSI results, feature tables and MS2 match tables from all datasets should be provided in the same order.
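One way to keep the inputs in matching order is to describe each dataset once and generate the argument list from that description. The Python sketch below does this using only the flags shown in this README. The helper `make_hierarchy_args`, the dictionary keys, and the rule that MS2 matches are given either for every dataset or for none are assumptions based on the one-to-one requirement above, not documented plugin behavior.

```python
def make_hierarchy_args(datasets, prefix="merged"):
    """Build a `qiime qemistree make-hierarchy` argument list.

    datasets: list of dicts with 'csi' and 'table' paths and, optionally,
    an 'ms2' path. Every flag group is emitted in the same dataset order.
    """
    if not datasets:
        raise ValueError("at least one dataset is required")
    has_ms2 = ["ms2" in d for d in datasets]
    if any(has_ms2) and not all(has_ms2):
        raise ValueError("provide MS2 matches for every dataset or for none")

    args = ["qiime", "qemistree", "make-hierarchy"]
    for d in datasets:
        args += ["--i-csi-results", d["csi"]]
    for d in datasets:
        args += ["--i-feature-tables", d["table"]]
    if all(has_ms2):
        for d in datasets:
            args += ["--i-ms2-matches", d["ms2"]]
    args += [
        "--o-tree", f"{prefix}-qemistree.qza",
        "--o-feature-table", f"{prefix}-feature-table-hashed.qza",
        "--o-feature-data", f"{prefix}-feature-data.qza",
    ]
    return args


demo = [
    {"csi": "fingerprints.qza", "table": "feature-table.qza"},
    {"csi": "fingerprints2.qza", "table": "feature-table2.biom.qza"},
]
args = make_hierarchy_args(demo)
print(" ".join(args))
```

The list could be handed to `subprocess.run(args, check=True)` inside an activated QIIME 2 environment.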

This method generates the following:

  1. A combined feature table by merging all the input feature tables; MS1 features without fingerprints are filtered out of this feature table. This is done because SIRIUS predicts molecular substructures for a subset of features (typically for 70-90% of all MS1 features) in an experiment, based on factors such as sample type, the quality of MS2 spectra, and user-defined tolerances such as --p-ppm-max and --p-zodiac-threshold. This output is of type FeatureTable[Frequency].
  2. A tree relating the MS1 features in these data based on molecular substructures predicted for MS1 features. This is of type Phylogeny[Rooted]. By default, we retain all fingerprint positions (i.e., 2936 molecular properties). Adding --p-qc-properties filters these properties to keep only PubChem fingerprint positions (489 molecular properties) in the contingency table. Note: The latest release of SIRIUS uses the PubChem version downloaded on 13 August 2017.
  3. A combined feature data file that contains unique identifiers of each feature, their corresponding original feature identifier (row ID from MZmine2), parent mass (parent_mass), retention time (retention_time), CSI:FingerID structure predictions (csi_smiles), MS2 match structure predictions (ms2_smiles), and the table(s) (table_number) that each feature was detected in. This is of type FeatureData[Molecules]. (The renaming of features helps prevent overlap between non-unique feature identifiers in the original feature tables in case of meta-analyses.)

These can be used as inputs to perform chemical phylogeny-based alpha-diversity and beta-diversity analyses.
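The filtering and renaming applied to the combined feature table can be pictured with a toy example. The Python sketch below merges two small tables whose row IDs collide, drops features without fingerprints, and keys the remaining features by (table number, row ID) so that they stay distinct. This keying scheme is a stand-in chosen for illustration; the identifiers that q2-qemistree actually assigns are not described in this README, and all data values here are invented.

```python
def merge_tables(tables, fingerprinted):
    """Merge per-dataset tables, keeping only features with fingerprints.

    tables: list of {row_id: {sample: peak_area}}
    fingerprinted: list of sets of row IDs that received a fingerprint
    """
    merged = {}
    for number, (table, keep) in enumerate(zip(tables, fingerprinted), start=1):
        for row_id, areas in table.items():
            if row_id in keep:
                # (table number, row ID) avoids clashes between datasets.
                merged[(number, row_id)] = areas
    return merged


# Both datasets use row ID "1", which would collide without renaming.
tables = [
    {"1": {"sample_a": 10.0}, "2": {"sample_a": 5.0}},
    {"1": {"sample_b": 7.0}},
]
fingerprinted = [{"1"}, {"1"}]

merged = merge_tables(tables, fingerprinted)
total = sum(len(t) for t in tables)
print(sorted(merged), f"retained {len(merged)} of {total} features")
```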

Furthermore, Qemistree supports the classification of molecules into Classyfire chemical taxonomy. We generate a feature data table (also of the type FeatureData[Molecules]) which includes classification of molecules into chemical 'kingdom', 'superclass', 'class', 'subclass', and 'direct_parent'. We can run Classyfire using Qemistree as follows:

```bash
qiime qemistree get-classyfire-taxonomy \
  --i-feature-data merged-feature-data.qza \
  --o-classified-feature-data classified-merged-feature-data.qza
```

Qemistree will use ms2_smiles to make chemical taxonomy assignments when MS2 matches are available for a feature. Otherwise, csi_smiles will be used. The column structure_source in classified-merged-feature-data.qza records whether taxonomic assignment was done using CSI:FingerID predictions or MS/MS library matches.
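The preference for MS2 matches over CSI:FingerID predictions amounts to a simple fallback rule, sketched below in Python. The column names `ms2_smiles` and `csi_smiles` come from this README, while the source labels `"MS2"` and `"CSIFingerID"` and the treatment of empty values as missing are assumptions; the actual values written to structure_source may differ.

```python
def pick_structure(feature):
    """Return (smiles, source), preferring MS2 library matches.

    feature: dict that may hold 'ms2_smiles' and 'csi_smiles'.
    None or an empty string is treated as missing (an assumption).
    """
    ms2 = feature.get("ms2_smiles")
    if ms2:
        return ms2, "MS2"  # hypothetical label
    csi = feature.get("csi_smiles")
    if csi:
        return csi, "CSIFingerID"  # hypothetical label
    return None, None


both = pick_structure({"ms2_smiles": "CCO", "csi_smiles": "CCN"})
csi_only = pick_structure({"ms2_smiles": "", "csi_smiles": "CCN"})
neither = pick_structure({})
print(both, csi_only, neither)
```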

Lastly, Qemistree includes some utility functions that are useful to visualize and explore the molecular hierarchy generated above. Qemistree trees can be visualized using q2-empress [preprint]. Below are the installation instructions that can be run within your QIIME 2 environment:

```bash
pip uninstall --yes emperor
pip install git+https://github.com/biocore/empress.git
qiime dev refresh-cache
```

  1. Prune molecular hierarchy to keep only the molecules with annotations.

```bash
qiime qemistree prune-hierarchy \
  --i-feature-data classified-merged-feature-data.qza \
  --p-column class \
  --i-tree merged-qemistree.qza \
  --o-pruned-tree merged-qemistree-class.qza
```

Users can choose any of the data columns (--p-column) that are in the classified-merged-feature-data.qza file to prune the hierarchy, for example '#featureID', 'kingdom', 'superclass', 'class', 'subclass', 'direct_parent', and 'smiles'. All features with no data in this column will be removed from the phylogeny.
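Conceptually, pruning keeps only the tips that have a value in the chosen column and collapses internal nodes left with a single child. The Python sketch below shows this on a tree written as nested tuples. It illustrates the idea rather than the plugin's implementation; the tuple representation, the toy annotations, and the choice to treat `None` and empty strings as "no data" are all assumptions.

```python
def annotated_tips(feature_data, column):
    """Feature IDs that have a non-empty value in the given column."""
    return {fid for fid, row in feature_data.items() if row.get(column)}


def prune(tree, keep):
    """Prune a nested-tuple tree down to the tips listed in keep.

    tree: a tip name (str) or a tuple of subtrees.
    Returns None when nothing below this node survives.
    """
    if isinstance(tree, str):
        return tree if tree in keep else None
    children = [c for c in (prune(sub, keep) for sub in tree) if c is not None]
    if not children:
        return None
    if len(children) == 1:
        return children[0]  # collapse nodes left with a single child
    return tuple(children)


# Toy feature data: f2 has no 'class' annotation.
feature_data = {
    "f1": {"class": "Flavonoids"},
    "f2": {"class": ""},
    "f3": {"class": "Indoles and derivatives"},
}
tree = ("f3", ("f1", "f2"))

keep = annotated_tips(feature_data, "class")
pruned = prune(tree, keep)
print(pruned)
```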

  2. Generate an annotated qemistree tree using q2-empress.

```bash
qiime empress community-plot \
  --i-tree merged-qemistree-class.qza \
  --i-feature-table feature-table-hashed.qza \
  --m-sample-metadata-file path-to-sample-metadata.tsv \
  --m-feature-metadata-file classified-merged-feature-data.qza \
  --o-visualization empress-tree.qzv
```

The output Empress QZV can be visualized using QIIME 2 View; Empress can be used to interactively modify the tree visualization. Below is an example visualization from the Empress preprint. Here, the user has sample metadata columns (food sources) to compare groups of food samples; Empress enables them to visualize metabolite relative prevalence as bar charts at the tips of the tree.

Empress plot

Please visit the Empress tutorial for all the currently supported tree visualization features that can be leveraged to explore the chemical diversity of your metabolomics dataset.

Owner

  • Name: biocore
  • Login: biocore
  • Kind: organization
  • Location: Cyberspace

Collaboratively developed bioinformatics software.

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Issues and Pull Requests

Last synced: over 1 year ago

All Time
  • Total issues: 83
  • Total pull requests: 76
  • Average time to close issues: 5 months
  • Average time to close pull requests: 14 days
  • Total issue authors: 18
  • Total pull request authors: 8
  • Average comments per issue: 0.64
  • Average comments per pull request: 0.87
  • Merged pull requests: 65
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 3
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 3
  • Pull request authors: 0
  • Average comments per issue: 1.33
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • anupriyatripathi (51)
  • ElDeveloper (12)
  • mwang87 (5)
  • MuBulut (1)
  • YUANMY2021 (1)
  • jeep3 (1)
  • DeepaAcharya (1)
  • askerdb (1)
  • helenamrusso (1)
  • kechen1984 (1)
  • lfnothias (1)
  • mortonjt (1)
  • typewritermonkey (1)
  • Samiloffe (1)
  • anani-a-missinou (1)
Pull Request Authors
  • anupriyatripathi (45)
  • ElDeveloper (15)
  • helenamrusso (5)
  • mwang87 (3)
  • stephramos17 (2)
  • qiyunzhu (2)
  • fedarko (2)
  • tgroth97 (1)
Top Labels
Issue Labels
enhancement (15) good first issue (6) help wanted (4) wontfix (2)
Pull Request Labels
duplicate (1) wontfix (1)

Dependencies

setup.py pypi
  • itolapi *