squallms

Repository for the Bioconductor squallms R package

https://github.com/wkumler/squallms

Science Score: 46.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: nature.com
  • Committers with academic emails
    1 of 2 committers (50.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.0%) to scientific vocabulary
Last synced: 7 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: wkumler
  • License: other
  • Language: R
  • Default Branch: devel
  • Homepage:
  • Size: 11.7 MB
Statistics
  • Stars: 3
  • Watchers: 1
  • Forks: 0
  • Open Issues: 8
  • Releases: 0
Created about 2 years ago · Last pushed almost 2 years ago
Metadata Files

README.md

Speedy quality assurance via lasso labeling for untargeted MS data (squallms)

Overview

squallms is a Bioconductor R package that implements a "semi-labeled" approach to quality assurance for untargeted mass spectrometry data. It reads raw data from mass-spec files and calculates several quality metrics. These metrics are used to label MS features in bulk as high or low quality, and the labels are then passed to a simple logistic model that produces a fully labeled dataset suitable for downstream analysis.

Step 0: Installation

squallms isn't yet on Bioconductor, so the easiest way to install it is directly from GitHub with the remotes package. This section will be updated once the package is properly available on Bioconductor.

remotes::install_github("https://github.com/wkumler/squallms")

Once installed, squallms can be loaded like any other package:

library(squallms)

Step 1: Metric extraction

squallms obtains peak quality metrics in two ways. First, it compares individual MS features to an idealized bell shape, as detailed in Kumler et al. 2023 (figure below), to extract the beta_cor and beta_snr metrics. Second, it constructs a retention time by filename by normalized intensity matrix and performs a PCA to extract the dominant feature signal, which is typically also a bell curve and appears in the first or second principal component. The PCs are used to group similar features for rapid annotation in Step 2, while the beta_cor and beta_snr metrics are used alongside the labels to construct the logistic model in Step 3 below.
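To make the first kind of metric concrete, the sketch below scores a single peak against a bell shape using base R only. It is an illustration of the idea, not the package's implementation: the function name `peak_shape_metrics`, the choice of a Beta(5, 5) density as the idealized shape, and the exact signal-to-noise definition are all assumptions made for this example.

```r
# Hypothetical helper (not part of squallms): score one chromatographic peak
# against an idealized bell shape.
# Assumption: the ideal shape is a Beta(5, 5) density over scaled retention time.
peak_shape_metrics <- function(rt, int) {
  # Rescale retention time to [0, 1] so it can be passed to the beta density
  scaled_rt <- (rt - min(rt)) / (max(rt) - min(rt))
  ideal <- dbeta(scaled_rt, shape1 = 5, shape2 = 5)

  # Shape similarity: Pearson correlation between observed and ideal peak
  shape_cor <- cor(int, ideal)

  # Signal-to-noise: peak height divided by the spread of the residuals that
  # remain after subtracting the ideal shape (both scaled to a height of 1)
  height <- max(int) - min(int)
  norm_int <- (int - min(int)) / height
  norm_ideal <- ideal / max(ideal)
  noise <- sd(norm_int - norm_ideal) * height

  list(beta_cor = shape_cor, beta_snr = height / noise)
}

# Simulated data: one clean bell-shaped peak and one trace of pure noise
set.seed(1)
rt <- seq(100, 130, length.out = 31)
good_peak <- 1e6 * dbeta((rt - 100) / 30, 5, 5) + rnorm(31, sd = 2e4)
noise_peak <- runif(31, min = 0, max = 1e5)

good <- peak_shape_metrics(rt, good_peak)
bad <- peak_shape_metrics(rt, noise_peak)
```

A well-shaped peak should score higher than the noise trace on both values, which is what makes the pair useful as predictors later on.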

Step 2: Labeling

Two labeling tools are provided for rapid MS feature classification. The first uses a Shiny app to render each feature as a chromatogram and accepts keybound inputs to assign classes to the feature. The second uses the PCA coordinates extracted in Step 1 to place features in a similarity space and triggers a small Shiny app to label clusters of compounds using the "lasso" tool. Both tools produce a named vector of features with classifications used to train the logistic model detailed in Step 3.
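The lasso idea can be sketched without Shiny: project features into PC space, then label every feature that falls inside a polygon. The example below is a simplified stand-in under several assumptions: the data are simulated, the PCA runs on a single-file feature-by-retention-time matrix rather than the full matrix described in Step 1, a rectangle replaces the hand-drawn lasso, and `point_in_polygon` is a hypothetical helper written for this example.

```r
set.seed(42)
n_points <- 20
scaled_rt <- seq(0, 1, length.out = n_points)

# Simulated features, one row each: 15 bell-shaped peaks, then 15 noise traces
bell <- dbeta(scaled_rt, 5, 5) / max(dbeta(scaled_rt, 5, 5))
good_feats <- t(replicate(15, bell + rnorm(n_points, sd = 0.05)))
bad_feats <- t(replicate(15, runif(n_points)))
feat_mat <- rbind(good_feats, bad_feats)
rownames(feat_mat) <- sprintf("FT%03d", seq_len(nrow(feat_mat)))

# PCA places features with similar shapes close together
pcs <- prcomp(feat_mat)$x[, 1:2]

# Hypothetical helper: ray-casting test for whether a point lies in a polygon
point_in_polygon <- function(px, py, poly_x, poly_y) {
  n <- length(poly_x)
  inside <- FALSE
  j <- n
  for (i in seq_len(n)) {
    # Does the edge from vertex j to vertex i straddle the horizontal line at py?
    if ((poly_y[i] > py) != (poly_y[j] > py)) {
      x_at_py <- poly_x[i] +
        (py - poly_y[i]) * (poly_x[j] - poly_x[i]) / (poly_y[j] - poly_y[i])
      if (px < x_at_py) inside <- !inside
    }
    j <- i
  }
  inside
}

# Stand-in for a hand-drawn lasso: a rectangle around the tight cluster.
# The cluster centre is taken from the simulated bell-shaped rows for convenience.
center <- unname(colMeans(pcs[1:15, ]))
half_width <- 0.3
lasso_x <- center[1] + c(-half_width, half_width, half_width, -half_width)
lasso_y <- center[2] + c(-half_width, -half_width, half_width, half_width)

in_lasso <- mapply(point_in_polygon, pcs[, 1], pcs[, 2],
                   MoreArgs = list(poly_x = lasso_x, poly_y = lasso_y))

# Named vector of labels; features outside the lasso stay unlabeled (NA)
class_labels <- setNames(ifelse(in_lasso, "Good", NA_character_), rownames(pcs))
```

Leaving unselected features as `NA` mirrors the semi-labeled approach: only part of the dataset needs a label before a model is trained.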

Figure: built-in Shiny app for simultaneous lasso labeling of similar features.

Step 3: Logistic modeling

After metrics have been extracted and features have been labeled, a logistic model can be trained to predict MS feature class from the beta_cor and beta_snr values obtained in Step 1 (plus any additional metrics supplied by the user). This model returns the estimated likelihood of each feature being "Good" rather than "Bad", which can then be used to remove features that fall below a given likelihood threshold.
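The modeling step can be approximated with base R's `glm()`. The sketch below uses simulated metrics and labels, with the column names `beta_cor` and `beta_snr` and the 0.5 threshold chosen to match the text; it is an assumption-laden illustration of logistic modeling on a partly labeled dataset, not the code squallms runs internally.

```r
set.seed(7)
n_feats <- 200
is_good <- rep(c(TRUE, FALSE), each = n_feats / 2)

# Simulated metrics: good features tend to have higher correlation and SNR,
# with deliberate overlap between the two classes
feat_metrics <- data.frame(
  feature = sprintf("FT%03d", seq_len(n_feats)),
  beta_cor = ifelse(is_good, rbeta(n_feats, 8, 2), rbeta(n_feats, 3, 3)),
  beta_snr = ifelse(is_good, rnorm(n_feats, 8, 3), rnorm(n_feats, 3, 3))
)

# Only a subset of features carries a label, as after a labeling session
labeled_idx <- sample(n_feats, 60)
class_labels <- rep(NA_character_, n_feats)
class_labels[labeled_idx] <- ifelse(is_good[labeled_idx], "Good", "Bad")

# Train a logistic model on the labeled subset
train <- feat_metrics[labeled_idx, ]
train$is_good <- class_labels[labeled_idx] == "Good"
model <- glm(is_good ~ beta_cor + beta_snr, data = train, family = binomial)

# Predict the likelihood of "Good" for every feature, labeled or not
feat_metrics$pred_prob <- predict(model, newdata = feat_metrics, type = "response")

# Keep only features at or above the likelihood threshold
likelihood_threshold <- 0.5
kept_features <- feat_metrics$feature[feat_metrics$pred_prob >= likelihood_threshold]
```

Raising `likelihood_threshold` trades recall for precision: fewer features survive, but those that do are more likely to be real peaks.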

Demo:

```r
library(tidyverse)
library(xcms)
library(MSnbase)
library(RaMS)

# Run once if squallms is not yet installed:
# remotes::install_github("https://github.com/wkumler/squallms")
library(squallms)

# Select three of the example files that ship with RaMS
mzML_files <- list.files(system.file("extdata", package = "RaMS"), full.names = TRUE)[c(3, 5, 6)]

# xcms workflow: peak picking, retention time alignment, grouping, gap filling
register(BPPARAM = SerialParam(progressbar = TRUE))
msnexp_filled <- readMSData(files = mzML_files, msLevel. = 1, mode = "onDisk") %>%
  findChromPeaks(CentWaveParam(snthresh = 0)) %>%
  adjustRtime(ObiwarpParam(binSize = 0.1, response = 1, distFun = "cor_opt")) %>%
  groupChromPeaks(PeakDensityParam(sampleGroups = 1:3, bw = 12, minFraction = 0,
                                   binSize = 0.001, minSamples = 0)) %>%
  fillChromPeaks(FillChromPeaksParam(ppm = 5))

# Step 1: read the raw MS1 data and extract quality metrics
msdata <- grabMSdata(mzML_files, grab_what = "MS1")
peak_data <- makeXcmsObjFlat(msnexp_filled)
feat_metrics <- extractChromMetrics(peak_data, recalc_betas = TRUE, verbosity = 2,
                                    ms1_data = msdata$MS1)

# Step 2: label features with the lasso tool
class_labels <- labelFeatsLasso(peak_data, ms1_data = msdata$MS1, verbosity = 1)

# Alternatively, if manual labeling is desired:
# class_labels <- labelFeatsManual(peak_data, ms1_data = msdata$MS1, verbosity = 1)

# Step 3: model feature quality and drop features below the likelihood threshold
cleaned_xcms_obj <- updateXcmsObjFeats(msnexp_filled, feat_metrics, class_labels,
                                       likelihood_threshold = 0.5, verbosity = 2)
```

Owner

  • Name: William
  • Login: wkumler
  • Kind: user
  • Location: University of Washington, Seattle, WA

Graduate student at the University of Washington

GitHub Events

Total
  • Issues event: 1
  • Issue comment event: 1
Last Year
  • Issues event: 1
  • Issue comment event: 1

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 127
  • Total Committers: 2
  • Avg Commits per committer: 63.5
  • Development Distribution Score (DDS): 0.024
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
wkumler w****r@u****u 124
William Kumler 4****r@u****m 3
Committer Domains (Top 20 + Academic)
uw.edu: 1

Packages

  • Total packages: 1
  • Total downloads:
    • bioconductor 2,291 total
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 2
  • Total maintainers: 1
bioconductor.org: squallms

Speedy quality assurance via lasso labeling for LC-MS data

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 2,291 Total
Rankings
Dependent repos count: 0.0%
Dependent packages count: 31.5%
Average: 42.2%
Downloads: 95.1%
Maintainers (1)
Last synced: 8 months ago