https://github.com/broadinstitute/bmxp

The tools that constitute a nontargeted LCMS metabolomics data processing pipeline, created and used by the Broad Institute Metabolomics Platform.

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 1 DOI reference(s) in README
✓ Academic publication links: Links to biorxiv.org
✓ Committers with academic emails: 1 of 4 committers (25.0%) from academic institutions
✓ Institutional organization owner: Organization broadinstitute has institutional domain (www.broadinstitute.org)
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (13.3%) to scientific vocabulary

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: broadinstitute
License: other
Language: Python
Default Branch: main
Size: 1.55 MB

Statistics

Stars: 13
Watchers: 4
Forks: 0
Open Issues: 0
Releases: 0

Created almost 4 years ago · Last pushed 7 months ago

Metadata Files

Readme License

BMXP - The Metabolomics Platform at the Broad Institute

pip install bmxp

Please cite: https://www.biorxiv.org/content/10.1101/2023.06.09.544417v1.full

This is a collection of tools for processing our data; it powers our cloud processing workflow. Each tool is meant to be a standalone module that performs one step in our processing pipeline. The tools are written in Python and C and are designed to be performant and cloud-compatible.

Eclipse - Align two or more same-method nontargeted LCMS datasets.
Gravity - Cluster redundant LCMS features based on RT and correlation (and someday, XIC shape)
Blueshift - Drift Correction via pooled technical replicates and internal standards
Formation - Formatting and Final QC
Chroma - Read .raw and .mzml files

We expect users to be familiar with Python and already have an understanding of LCMS Metabolomics data processing and the specific steps they wish to accomplish.

While the tools are and always will be standalone, we are working on linking them more closely through a shared schema, and may eventually add the ability to run all steps as a single pipeline, given a set of parameters.

We are open to feedback and suggestions, with a focus on performance and application in pipelines.

Shared Schema

All BMXP modules use a shared schema and file formats with our preferred column headers. These files are (along with their labels):

* Feature Metadata bmxp.FMDATA - Describes the feature. Index default is Compound_ID
* Injection Metadata bmxp.IMDATA - Describes the Injection. Index default is injection_id
* Sample Metadata bmxp.SMDATA - Describes the biospecimen from which the Injection is derived. Index default is broad_id
* Feature Abundances - Pivot table of Feature x Injection (Compound_ID x injection_id) containing the abundances.

Some modules (Blueshift, Eclipse) require merging Feature Metadata + Feature Abundances.
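As a rough illustration of that merge, the sketch below joins a feature metadata table and a feature abundance table on the feature index using pandas. The feature IDs, injection names and values are made up for the example, and the exact layout each module expects should be checked against the module documentation.

```python
import pandas as pd

# Hypothetical feature metadata, indexed by Compound_ID
feature_metadata = pd.DataFrame(
    {
        "Compound_ID": ["F001", "F002", "F003"],
        "RT": [1.25, 3.40, 7.80],
        "MZ": [180.0634, 132.1019, 760.5851],
    }
).set_index("Compound_ID")

# Hypothetical feature abundances: Feature x Injection pivot table
feature_abundances = pd.DataFrame(
    {
        "Compound_ID": ["F001", "F002", "F003"],
        "inj_001": [1.0e6, 2.5e5, 4.0e4],
        "inj_002": [1.1e6, 2.4e5, 3.8e4],
    }
).set_index("Compound_ID")

# Join on the shared index; validate raises if a feature ID is duplicated
merged = pd.merge(
    feature_metadata,
    feature_abundances,
    left_index=True,
    right_index=True,
    how="inner",
    validate="one_to_one",
)
```

The result has one row per feature, with the metadata columns first and one abundance column per injection after them.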

These can be changed globally so that all packages will use the same terminology. To update the schema, modify the dictionary objects in the module directly before running any other code. For example:

```python
import bmxp
from bmxp.eclipse import MSAligner
from bmxp.blueshift import DriftCorrection
from bmxp.gravity import cluster

# Remap the default headers once, before using any module
bmxp.FMDATA['CompoundID'] = 'FeatureID'
bmxp.IMDATA['injection_id'] = 'Filename'

# continue with work...
```

With the changes above, Eclipse, Blueshift and Gravity will use "FeatureID" and "Filename" as column headers instead of "CompoundID" and "injection_id".
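The idea behind a global remapping can be pictured with a plain dictionary, independent of bmxp itself. In the sketch below, `SCHEMA` and `get_injection_ids` are hypothetical names and not part of the bmxp API; the point is only that code which looks up headers through a shared mapping picks up any override made before it runs.

```python
import pandas as pd

# Hypothetical stand-in for a shared label mapping:
# canonical label -> column header currently in use
SCHEMA = {"injection_id": "injection_id", "broad_id": "broad_id"}


def get_injection_ids(injection_metadata: pd.DataFrame) -> list:
    """Return injection names, resolving the header through the shared mapping."""
    return injection_metadata[SCHEMA["injection_id"]].tolist()


# A table that uses "Filename" instead of the default header
table = pd.DataFrame(
    {"Filename": ["inj_001", "inj_002"], "broad_id": ["B1", "B2"]}
)

# Override the header once, before any code reads the table
SCHEMA["injection_id"] = "Filename"
ids = get_injection_ids(table)
```

Because every lookup goes through the mapping, the override only has to be made in one place.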

Feature Metadata - bmxp.FMDATA

Feature Metadata describes the LCMS feature. This is a mixture of fundamental nontargeted feature information, annotation info, and anything else.

Feature Specific

Compound_ID - Index, Project-unique feature ID (a bit of a misnomer)
RT - Unitless retention time, may or may not be scaled
MZ - Unsigned mass-to-charge ratio
Intensity - Average feature intensity
Method - Human Readable name of LCMS method used
__extraction_method - Name of extraction method/software used. Used to denote mixed Targeted/Nontargeted

Annotation

Annotation_ID - Method-unique annotation label
Adduct - Adduct form of the annotation
__annotation_id - Globally unique annotation identifier
Metabolite - Preferred display/reporting name of metabolite
Non_Quant - Boolean denoting that a feature is not quantifiable

Generated by Gravity

Cluster_Num - Cluster number assigned during Gravity clustering
Cluster_Size - Number of members in the cluster
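The relationship between these two columns can be shown with pandas: given a cluster assignment per feature, the cluster size is the number of features sharing that assignment. This is only a sketch of how the columns relate, using made-up cluster numbers; it does not reproduce how Gravity assigns clusters.

```python
import pandas as pd

# Hypothetical feature metadata with cluster assignments already present
feature_metadata = pd.DataFrame(
    {"Cluster_Num": [1, 1, 2, 3, 3, 3]},
    index=pd.Index(
        ["F001", "F002", "F003", "F004", "F005", "F006"], name="Compound_ID"
    ),
)

# Count the members of each cluster, then map the count back onto every member
cluster_counts = feature_metadata["Cluster_Num"].value_counts()
feature_metadata["Cluster_Size"] = feature_metadata["Cluster_Num"].map(cluster_counts)
```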

Generated by Blueshift

Batches Skipped - Batches that were skipped due to lack of PREFs

Injection Metadata - bmxp.IMDATA

injection_id - Index, Injection name, usually filename without the extension
broad_id - Assigned biospecimen label
program_id - Biospecimen label as received (inherited from Sample Metadata)
injection_type - Type of injection ("sample", "prefa", "prefb", "blank", "other-", "not_used-")
comments - Comments about the injection
column_number - Column number, in multi-column studies
injection_order - Injection number, not skipping blanks or non-samples
batches - Denotes batches ('batch start' or 'batch end')
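A small pandas sketch of working with these columns follows. The data are made up, `batch_number` is a hypothetical helper column, and the rule that each 'batch start' marker opens a batch running until the next marker is an assumption made for illustration rather than documented behavior.

```python
import pandas as pd

# Hypothetical injection metadata for a short run
injection_metadata = pd.DataFrame(
    {
        "injection_id": ["inj_001", "inj_002", "inj_003", "inj_004", "inj_005"],
        "injection_type": ["prefa", "sample", "sample", "prefa", "sample"],
        "injection_order": [1, 2, 3, 4, 5],
        "batches": ["batch start", None, None, "batch start", None],
    }
).sort_values("injection_order")

# Assumption: every 'batch start' marker opens a new batch, so a running
# count of the markers gives a batch number for each injection
is_start = injection_metadata["batches"] == "batch start"
injection_metadata["batch_number"] = is_start.cumsum()

# Select one injection type, here the "prefa" pooled references
pooled_refs = injection_metadata[injection_metadata["injection_type"] == "prefa"]
```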

Generated by Blueshift

QCRole - Role in drift correction ("QC-driftcorrection", "QC-pooledref", "QC-not_used", "sample")

Sample Metadata - bmxp.SMDATA

broad_id - Assigned biospecimen label
Arbitrary Metadata Columns - Any column label except labels in Injection Metadata
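The restriction on column labels matters when sample metadata is attached to injections through broad_id. The sketch below, using made-up tables and a hypothetical `tissue` column, checks for clashing labels and then performs the join with pandas; it is an illustration of the constraint, not the code bmxp uses.

```python
import pandas as pd

# Hypothetical injection metadata; the blank has no biospecimen
injection_metadata = pd.DataFrame(
    {
        "injection_id": ["inj_001", "inj_002", "inj_003"],
        "broad_id": ["B001", "B002", None],
        "injection_type": ["sample", "sample", "blank"],
    }
)

# Hypothetical sample metadata with one arbitrary column
sample_metadata = pd.DataFrame(
    {"broad_id": ["B001", "B002"], "tissue": ["plasma", "liver"]}
)

# Apart from the join key, sample metadata must not reuse injection metadata labels
collisions = (
    set(sample_metadata.columns) & set(injection_metadata.columns)
) - {"broad_id"}
if collisions:
    raise ValueError(f"Sample metadata reuses injection labels: {sorted(collisions)}")

# Each injection maps to at most one biospecimen, so the join is many-to-one
annotated = injection_metadata.merge(
    sample_metadata, on="broad_id", how="left", validate="many_to_one"
)
```

Injections without a matching biospecimen, such as blanks, keep their row and get missing values in the sample metadata columns.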

Owner

Name: Broad Institute
Login: broadinstitute
Kind: organization
Location: Cambridge, MA

Website: http://www.broadinstitute.org/
Twitter: broadinstitute
Repositories: 1,083
Profile: https://github.com/broadinstitute

Broad Institute of MIT and Harvard

GitHub Events

Total

Watch event: 9
Delete event: 2
Member event: 1
Push event: 14
Pull request event: 6
Create event: 3

Last Year

Watch event: 9
Delete event: 2
Member event: 1
Push event: 14
Pull request event: 6
Create event: 3

Committers

Last synced: almost 3 years ago

All Time

Total Commits: 9
Total Committers: 4
Avg Commits per committer: 2.25
Development Distribution Score (DDS): 0.556

Top Committers

Name	Email	Commits
jkrejci7	5**7@u**m	4
Daniel Hitchcock	d**k@g**m	3
Daniel Hitchcock	d**h@b**g	1
Daniel Hitchcock	4**k@u**m	1

Committer Domains (Top 20 + Academic)

broadinstitute.org: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 0
Total pull requests: 11
Average time to close issues: N/A
Average time to close pull requests: about 6 hours
Total issue authors: 0
Total pull request authors: 2
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 11
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 4
Average time to close issues: N/A
Average time to close pull requests: 4 minutes
Issue authors: 0
Pull request authors: 2
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 4
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

Pull Request Authors

danhitchcock (20)
jkrejci7 (2)

Top Labels

Issue Labels

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- pypi 193 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 51
Total maintainers: 3

pypi.org: bmxp

LCMS Processing tools used by the Metabolomics Platform at the Broad Institute.

Documentation: https://bmxp.readthedocs.io/
License: MIT
Latest release: 0.3.15
published 7 months ago

Versions: 51
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 193 Last month

Rankings

Dependent packages count: 6.6%

Downloads: 13.0%

Average: 21.8%

Stargazers count: 28.2%

Forks count: 30.5%

Dependent repos count: 30.6%

Maintainers (3)

danhitchcock jkrejci chloesturgeon

Last synced: 7 months ago

Dependencies

requirements.txt pypi

matplotlib *
networkx *
pandas *
scipy *
statsmodels *

setup.py pypi
