RaMS

R-based access to Mass-Spectrometry data

https://github.com/wkumler/rams

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
○ .zenodo.json file
○ DOI references
○ Academic publication links
✓ Committers with academic emails: 1 of 5 committers (20.0%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (17.6%) to scientific vocabulary

Keywords

mass-spectrometry-data r tidy-data

Keywords from Contributors

mass-spectrometry

Last synced: 10 months ago · JSON representation

Repository

R-based access to Mass-Spectrometry data

Basic Info

Host: GitHub
Owner: wkumler
License: other
Language: R
Default Branch: master
Homepage:
Size: 117 MB

Statistics

Stars: 24
Watchers: 6
Forks: 7
Open Issues: 5
Releases: 0

Topics

mass-spectrometry-data r tidy-data

Created over 5 years ago · Last pushed 11 months ago

Metadata Files

Readme Changelog License

R-based access to Mass-Spec data (RaMS)

Table of contents: Overview - Installation - Usage - File types - Contact

Overview

RaMS is a lightweight package that provides rapid and tidy access to mass-spectrometry data. This package is lightweight because it’s built from the ground up rather than relying on an extensive network of external libraries. No Rcpp, no Bioconductor, no long load times and strange startup warnings. Just XML parsing provided by xml2 and data handling provided by data.table. Access is rapid because an absolute minimum of data processing occurs. Unlike other packages, RaMS makes no assumptions about what you’d like to do with the data and simply provides access to the encoded information in an intuitive and R-friendly way. Finally, the access is tidy in the sense of the tidy data philosophy. Tidy data neatly resolves the ragged arrays that mass spectrometers produce and plays nicely with other tidy data packages.

RaMS quick-start poster from Metabolomics Society conference 2021

Installation

To install the stable version on CRAN:

``` r
install.packages('RaMS')
```

To install the current development version:

``` r
devtools::install_github("wkumler/RaMS", build_vignettes = TRUE)
```

Finally, load RaMS like every other package:

``` r
library(RaMS)
```

Usage

There’s only one main function in RaMS: the aptly named grabMSdata. This function accepts the names of mass-spectrometry files as well as the data you’d like to extract (e.g. MS1, MS2, BPC, etc.) and produces a list of data tables. Each table is intuitively named within the list and formatted tidily:

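``` r
# Locate the example files shipped with the package
msdata_dir <- system.file("extdata", package = "RaMS")
msdata_files <- list.files(msdata_dir, pattern = "mzML", full.names=TRUE)

# Read the base peak chromatogram and MS1 data from three of the files
msdata <- grabMSdata(files = msdata_files[2:4], grab_what = c("BPC", "MS1"))
```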

Some additional examples can be found below, but a more thorough introduction can be found in the vignette or by typing vignette("Intro-to-RaMS", package = "RaMS") in the R console after installation.

BPC/TIC data:

Base peak chromatograms (BPCs) and total ion chromatograms (TICs) have three columns, making them super-simple to plot with either base R or the popular ggplot2 library:

``` r
knitr::kable(head(msdata$BPC, 3))
```

| rt | int | filename |
|---------:|---------:|:------------------|
| 4.009000 | 11141859 | LB12HL_AB.mzML.gz |
| 4.024533 | 9982309 | LB12HL_AB.mzML.gz |
| 4.040133 | 10653922 | LB12HL_AB.mzML.gz |

``` r
plot(msdata$BPC$rt, msdata$BPC$int, type = "l", ylab="Intensity")
```

``` r
library(ggplot2)
ggplot(msdata$BPC) +
  geom_line(aes(x = rt, y=int, color=filename)) +
  facet_wrap(~filename, scales = "free_y", ncol = 1) +
  labs(x="Retention time (min)", y="Intensity", color="File name: ") +
  theme(legend.position="top")
```

MS1 data:

MS¹ data includes an additional dimension, the m/z of each ion measured, and has multiple entries per retention time:

``` r
knitr::kable(head(msdata$MS1, 3))
```

| rt | mz | int | filename |
|------:|---------:|-----------:|:------------------|
| 4.009 | 139.0503 | 1800550.12 | LB12HL_AB.mzML.gz |
| 4.009 | 148.0967 | 206310.81 | LB12HL_AB.mzML.gz |
| 4.009 | 136.0618 | 71907.15 | LB12HL_AB.mzML.gz |
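Because every row of this format is a single (rt, mz, int) observation, chromatogram-style summaries reduce to grouped aggregations. The following is a minimal base R sketch on an invented toy table (`ms1_toy` and its values are made up for illustration); it shows how a base peak and a total ion chromatogram relate to MS1-style rows, and is not a description of how RaMS itself obtains `BPC` or `TIC`.

``` r
# Toy MS1-style table (values invented for illustration, not real data)
ms1_toy <- data.frame(
  rt  = c(4.00, 4.00, 4.00, 4.02, 4.02),
  mz  = c(139.05, 148.10, 136.06, 139.05, 148.10),
  int = c(1800, 200, 70, 1500, 400),
  filename = "toy.mzML"
)

# Base peak chromatogram: the most intense ion at each retention time
bpc_toy <- aggregate(int ~ rt + filename, data = ms1_toy, FUN = max)

# Total ion chromatogram: the summed intensity at each retention time
tic_toy <- aggregate(int ~ rt + filename, data = ms1_toy, FUN = sum)

print(bpc_toy)
print(tic_toy)
```

Both results keep the three-column rt/int/filename shape shown in the BPC/TIC section, so the same plotting code would apply to them.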

This tidy format means that it plays nicely with other tidy data packages. Here, we use data.table and a few other tidyverse packages to compare a molecule’s ¹³C and ¹⁵N peak areas to that of the base peak, giving us some clue as to its molecular formula. Note also the use of the trapz function (available in v1.3.2+) to calculate the area of the peak given the retention time and intensity values.

``` r
library(data.table)
library(tidyverse)

# Monoisotopic mass and the expected masses of its 13C and 15N isotopologues
M <- 118.0865
M13C <- M + 1.003355
M15N <- M + 0.997035

# Extract the peak for each isotopologue and label it with its name
iso_data <- imap_dfr(lst(M, M13C, M15N), function(mass, isotope){
  peakdata <- msdata$MS1[mz%between%pmppm(mass) & rt%between%c(7.6, 8.2)]
  cbind(peakdata, isotope)
})

# Integrate each peak, then express isotope areas relative to the base peak
iso_data %>%
  group_by(filename, isotope) %>%
  summarise(area=trapz(rt, int)) %>%
  pivot_wider(names_from = isotope, values_from = area) %>%
  mutate(ratio_13C_12C = M13C/M) %>%
  mutate(ratio_15N_14N = M15N/M) %>%
  select(filename, contains("ratio")) %>%
  pivot_longer(cols = contains("ratio"), names_to = "isotope") %>%
  group_by(isotope) %>%
  summarize(avgratio = mean(value), sdratio = sd(value), .groups="drop") %>%
  mutate(isotope=str_extract(isotope, "(?<=_).*(?=_)")) %>%
  knitr::kable()
```

| isotope | avgratio | sdratio |
|:--------|----------:|----------:|
| 13C | 0.0544072 | 0.0005925 |
| 15N | 0.0033611 | 0.0001578 |

With natural abundances for ¹³C and ¹⁵N of 1.11% and 0.36%, respectively, we can conclude that this molecule likely has five carbons and a single nitrogen.
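The arithmetic behind that conclusion can be checked in a few lines. This sketch assumes the first-order approximation that the isotope-to-base-peak ratio is roughly the number of atoms times the natural abundance; the ratios are copied from the table above and the abundances from the sentence above, and the variable names are purely illustrative.

``` r
# Observed isotope-to-base-peak area ratios, copied from the table above
avg_ratio <- c("13C" = 0.0544072, "15N" = 0.0033611)

# Natural abundances quoted in the text (1.11% and 0.36%)
abundance <- c("13C" = 0.0111, "15N" = 0.0036)

# First-order approximation: ratio ~ (number of atoms) * (natural abundance)
est_atoms <- avg_ratio / abundance

round(est_atoms, 2)  # fractional estimates
round(est_atoms)     # nearest whole number of atoms
```

The carbon estimate lands close to five and the nitrogen estimate close to one, which is where the "five carbons and a single nitrogen" reading comes from.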

Of course, it’s always a good idea to plot the peaks and perform a manual check of data quality:

``` r
ggplot(iso_data) +
  geom_line(aes(x=rt, y=int, color=filename)) +
  facet_wrap(~isotope, scales = "free_y", ncol = 1)
```
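The peak areas in the isotope comparison above come from trapezoidal integration of intensity over retention time. As a rough illustration of that idea, here is a base R stand-in; `trapz_sketch` is a hypothetical helper written for this example, not the source of the `trapz` function in RaMS, and the toy peak is invented.

``` r
# Hypothetical stand-in for a trapezoidal integrator (not the RaMS source)
trapz_sketch <- function(x, y) {
  ord <- order(x)          # integrate in order of increasing x
  x <- x[ord]
  y <- y[ord]
  # Each slice is a trapezoid: width times the mean of its two heights
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# Toy triangular peak (invented values)
rt_toy  <- c(7.6, 7.7, 7.8, 7.9, 8.0)
int_toy <- c(0, 50, 100, 50, 0)

area_toy <- trapz_sketch(rt_toy, int_toy)
print(area_toy)
```

For this triangle (base 0.4, height 100) the trapezoid sum agrees with the geometric area of about 20, up to floating-point error.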

MS¹ data typically consists of many individual chromatograms, so RaMS provides a small function that can bin it into chromatograms based on m/z windows.

``` r
msdata$MS1 %>%
  arrange(desc(int)) %>%
  mutate(mz_group=mz_group(mz, ppm=10, max_groups = 3)) %>%
  qplotMS1data(facet_col = "mz_group")
```

We also use the qplotMS1data function above, which wraps the typical ggplot call to avoid needing to type out ggplot() + geom_line(aes(x=rt, y=int, group=filename)) every time. Both the mz_group and qplotMS1data functions were added in RaMS version 1.3.2.
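To make the binning idea concrete, the sketch below groups m/z values greedily: it takes the first unassigned value as a seed, assigns everything within the ppm tolerance of that seed to the same group, and repeats. This is only an assumption about how such binning can work; `group_mz_sketch` is a hypothetical function and should not be read as the implementation of `mz_group`. It assumes the input is already sorted by decreasing intensity, as in the call above, and contains no missing values.

``` r
# Hypothetical greedy m/z binning (illustrative only, not the RaMS source)
group_mz_sketch <- function(mz, ppm = 10, max_groups = NULL) {
  group <- rep(NA_integer_, length(mz))
  group_id <- 0L
  while (anyNA(group)) {
    # Stop early once the requested number of groups has been assigned
    if (!is.null(max_groups) && group_id >= max_groups) break
    group_id <- group_id + 1L
    seed <- mz[which(is.na(group))[1]]   # first value not yet in a group
    tol <- seed * ppm / 1e6              # ppm tolerance in m/z units
    in_window <- is.na(group) & abs(mz - seed) <= tol
    group[in_window] <- group_id
  }
  group
}

# Toy m/z values (invented), assumed sorted by decreasing intensity
mz_toy <- c(118.0865, 139.0503, 118.0866, 139.0504, 148.0967, 118.0864)
groups_toy <- group_mz_sketch(mz_toy, ppm = 10, max_groups = 2)
print(groups_toy)
```

Values left as `NA` are those that did not fall into one of the first `max_groups` windows.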

MS2 data:

DDA (fragmentation) data can also be extracted, allowing rapid and intuitive searches for fragments or neutral losses:

``` r
msdata <- grabMSdata(files = msdata_files[1], grab_what = "MS2")
```

For example, we may be interested in the major fragments of a specific molecule:

``` r
msdata$MS2[premz%between%pmppm(351.0817) & int>mean(int)] %>%
  plot(int~fragmz, type="h", data=., ylab="Intensity", xlab="Fragment m/z")
```

Or want to search for precursors with a specific neutral loss above a certain intensity:

``` r
msdata$MS2[, neutral_loss:=premz-fragmz][int>1e4] %>%
  filter(neutral_loss%between%pmppm(126.1408, 5)) %>%
  head(3) %>%
  knitr::kable()
```

| rt | premz | fragmz | int | voltage | filename | neutral_loss |
|---:|---:|---:|---:|---:|:---|---:|
| 47.27750 | 351.0817 | 224.9409 | 16333.23 | 40 | Blank129I1Lpos_20240207-MS3.mzML.gz | 126.1408 |
| 47.35267 | 351.0818 | 224.9410 | 27353.09 | 40 | Blank129I1Lpos_20240207-MS3.mzML.gz | 126.1408 |
| 47.42767 | 351.0818 | 224.9410 | 33843.92 | 40 | Blank129I1Lpos_20240207-MS3.mzML.gz | 126.1408 |
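The neutral-loss search is plain arithmetic: subtract the fragment m/z from the precursor m/z, then keep rows whose difference falls inside a ppm window around the target. The base R sketch below spells that out on an invented table. `ppm_window` is a hypothetical helper that assumes a symmetric window of mass ± mass × ppm / 10^6; it is meant to mirror the role `pmppm` plays above, not to reproduce its source.

``` r
# Hypothetical helper: lower and upper m/z bounds for a +/- ppm tolerance
ppm_window <- function(mass, ppm = 5) {
  c(mass * (1 - ppm / 1e6), mass * (1 + ppm / 1e6))
}

# Toy MS2-style table (values invented for illustration)
ms2_toy <- data.frame(
  premz  = c(351.0817, 351.0818, 118.0865),
  fragmz = c(224.9409, 300.0000, 58.0651),
  int    = c(16333, 27353, 500)
)

# Neutral loss is the mass difference between precursor and fragment
ms2_toy$neutral_loss <- ms2_toy$premz - ms2_toy$fragmz

# Keep intense fragments whose neutral loss is within 5 ppm of the target
win <- ppm_window(126.1408, ppm = 5)
hits <- ms2_toy[ms2_toy$neutral_loss >= win[1] &
                ms2_toy$neutral_loss <= win[2] &
                ms2_toy$int > 1e4, ]
print(hits)
```

At m/z 126 a 5 ppm tolerance is only about ±0.0006, so the filter is strict enough that nearby but unrelated losses are excluded.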

SRM/MRM data

Selected/multiple reaction monitoring files don’t have data stored in the typical MSn format but instead encode their values as chromatograms. To extract data in this format, include "chroms" in the grab_what argument:

``` r
chromsdata <- grabMSdata(files = msdata_files[7], grab_what = "chroms", verbosity = 0)
```

which has individual reactions separated by the chrom_type column (and the associated index) with relevant target/fragment data:

``` r
knitr::kable(head(chromsdata$chroms, 3))
```

| chrom_type | chrom_index | target_mz | product_mz | rt | int | filename |
|:-----------|:------------|----------:|-----------:|---------:|----:|:-----------------|
| TIC | 0 | NA | NA | 2.000000 | 0 | wk_chrom.mzML.gz |
| TIC | 0 | NA | NA | 2.048077 | 0 | wk_chrom.mzML.gz |
| TIC | 0 | NA | NA | 2.096154 | 0 | wk_chrom.mzML.gz |

Minifying MS files

As of version 1.1.0, RaMS has functions that allow irrelevant data to be removed from the file to reduce file sizes. See the vignette for more details.

tmzML documents

Version 1.2.0 of RaMS introduced a new file type, the “transposed mzML” or “tmzML” file to resolve the large memory requirement when working with many files. See the vignette for more details, though note that I’ve largely deprecated this file type in favor of proper database solutions as in the speed & size comparison vignette.

File types

RaMS is currently limited to the modern mzML data format and the slightly older mzXML format. Tools to convert data from other formats are available through Proteowizard’s msconvert tool. Data can, however, be gzip compressed (file ending .gz) and this compression actually speeds up data retrieval significantly as well as reducing file sizes.
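As a small, self-contained illustration of why gzip compression is convenient here, the base R sketch below writes a repetitive XML-like text file and a gzipped copy, compares their sizes, and reads the compressed copy back without an explicit decompression step. The file contents and names are invented, the toy text compresses far better than real data would, and the example says nothing about RaMS's own reading speed; it only shows that R connections handle `.gz` files transparently.

``` r
# Invented, highly repetitive XML-like content standing in for an MS file
xml_lines <- rep("<spectrum index=\"0\"><binary>AAAA</binary></spectrum>", 1000)

plain_path <- tempfile(fileext = ".mzML")
gz_path <- paste0(plain_path, ".gz")

# Write an uncompressed copy and a gzip-compressed copy of the same text
writeLines(xml_lines, plain_path)
con <- gzfile(gz_path, open = "wt")
writeLines(xml_lines, con)
close(con)

# Compare on-disk sizes in bytes
file.size(plain_path)
file.size(gz_path)

# readLines() decompresses gzip files transparently when given the path
roundtrip_ok <- identical(readLines(gz_path), xml_lines)
print(roundtrip_ok)
```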

Currently, RaMS handles MS¹, MS², and MS³ data. This should be easy enough to expand in the future, but right now I haven’t observed a demonstrated need for higher fragmentation level data collection.

Additionally, note that files can be streamed from the internet directly if a URL is provided to grabMSdata, although this will usually take longer than reading a file from disk:

``` r
# Not run:
# Find a file with a web browser:
browseURL("https://www.ebi.ac.uk/metabolights/MTBLS703/files")

# Copy link address by right-clicking "download" button:
sampleurl <- paste0(
  "https://www.ebi.ac.uk/metabolights/ws/studies/MTBLS703/",
  "download/acefcd61-a634-4f35-9c3c-c572ade5acf3?file=",
  "FILES/161024SmpLB12HLAB_pos.mzXML"
)

# Stream the file directly from the URL and inspect its metadata
msdata <- grabMSdata(sampleurl, grab_what="everything", verbosity=2)
msdata$metadata
```

For an analysis of how RaMS compares to other methods of MS data access and alternative file types, consider browsing the speed & size comparison vignette.

Contact

Feel free to submit questions, bugs, or feature requests on the GitHub Issues page.

README last built on 2025-07-29

Owner

Name: William
Login: wkumler
Kind: user
Location: University of Washington, Seattle, WA

Repositories: 2
Profile: https://github.com/wkumler

Graduate student at the University of Washington

GitHub Events

Total

Issues event: 5
Watch event: 2
Issue comment event: 5
Push event: 3
Pull request event: 2
Create event: 1

Last Year

Issues event: 5
Watch event: 2
Issue comment event: 5
Push event: 3
Pull request event: 2
Create event: 1

Committers

Last synced: over 2 years ago

All Time

Total Commits: 435
Total Committers: 5
Avg Commits per committer: 87.0
Development Distribution Score (DDS): 0.053

Past Year

Commits: 44
Committers: 2
Avg Commits per committer: 22.0
Development Distribution Score (DDS): 0.023

Top Committers

Name	Email	Commits
wkumler	w**r@u**u	412
William	4****r	11
Ricardo Cunha	6****a	6
ricardobachertdacunha	c**a@i**e	3
Ethan Bass	e**s@g**m	3

Committer Domains (Top 20 + Academic)

iuta.de: 1 uw.edu: 1

Issues and Pull Requests

Last synced: 10 months ago

All Time

Total issues: 29
Total pull requests: 25
Average time to close issues: 3 months
Average time to close pull requests: 12 days
Total issue authors: 8
Total pull request authors: 3
Average comments per issue: 2.03
Average comments per pull request: 1.28
Merged pull requests: 24
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 5
Pull requests: 7
Average time to close issues: about 3 hours
Average time to close pull requests: about 8 hours
Issue authors: 3
Pull request authors: 2
Average comments per issue: 0.8
Average comments per pull request: 1.14
Merged pull requests: 6
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

wkumler (20)
ethanbass (2)
OmarAshkar (2)
ricardobachertdacunha (1)
YonghuiDong (1)
RaynerQueiroz (1)
plyush1993 (1)
tentrillion (1)

Pull Request Authors

wkumler (19)
ethanbass (4)
ricardobachertdacunha (2)

Top Labels

Issue Labels

enhancement (2)

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- cran 622 last-month
Total docker downloads: 21,613

Total dependent packages: 1
Total dependent repositories: 1
Total versions: 6
Total maintainers: 1

cran.r-project.org: RaMS

R Access to Mass-Spec Data

Homepage: https://github.com/wkumler/RaMS
Documentation: http://cran.r-project.org/web/packages/RaMS/RaMS.pdf
License: MIT + file LICENSE
Latest release: 1.4.3
published over 1 year ago

Versions: 6
Dependent Packages: 1
Dependent Repositories: 1
Downloads: 622 Last month
Docker Downloads: 21,613

Rankings

Forks count: 9.7%

Stargazers count: 13.0%

Average: 17.1%

Dependent packages count: 17.6%

Downloads: 20.9%

Dependent repos count: 24.3%

Maintainers (1)

wkumler@uw.edu

Last synced: 10 months ago

Dependencies

DESCRIPTION cran

base64enc * imports
data.table * imports
utils * imports
xml2 * imports
DBI * suggests
RSQLite * suggests
dplyr * suggests
ggplot2 * suggests
knitr * suggests
openxlsx * suggests
plotly * suggests
reticulate * suggests
rmarkdown * suggests
testthat * suggests
tidyverse * suggests

.github/workflows/r-check-cran.yml actions

actions/cache v2 composite
actions/checkout v2 composite
actions/upload-artifact main composite
r-lib/actions/setup-pandoc v2 composite
r-lib/actions/setup-r v2 composite
