biosignature-detection-with-py-gc-ms-data-using-machine-learning
https://github.com/ghystad/biosignature-detection-with-py-gc-ms-data-using-machine-learning
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ✓ Academic publication links: Links to: zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (8.6%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: ghystad
- License: gpl-3.0
- Language: R
- Default Branch: main
- Size: 80.1 KB
Statistics
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 2
Metadata Files
README.md
Biosignature Detection with Py GC-MS Data using Machine Learning
These R scripts were created for the paper:
Detecting Biosignatures in Complex Molecular Mixtures from pyrolysis Gas Chromatography Mass Spectrometry Data using Machine Learning
Grethe Hystad1, H. James Cleaves II 2,3,4, Collin A. Garmon1,5*, Michael L. Wong6,7, Anirudh Prabhu6, George D. Cody6, and Robert M. Hazen6
1. Department of Mathematics and Statistics, Purdue University Northwest, Hammond, IN, 46323, USA.
2. Department of Chemistry, Howard University, Washington, D.C. 20059, USA.
3. Earth Life Science Institute, Tokyo Institute of Technology, Tokyo, Japan.
4. Blue Marble Space Institute for Science, Seattle, WA 98104, USA.
5. Current Address: Department of Mathematical Sciences, Purdue University Fort Wayne, Fort Wayne, IN, 46805, USA.*
6. Earth and Planets Laboratory, Carnegie Science, Washington, DC 20015, USA.
7. NHFP Sagan Fellow, NASA Hubble Fellowship Program, Space Telescope Science Institute, Baltimore, MD 21218, USA.
Introduction
Three-dimensional data (scan number / mass-to-charge ratio / intensity) from biotic and abiotic samples are obtained by pyrolysis-gas chromatography-mass spectrometry (pyr-GC-MS). The R scripts preprocess these data and use machine learning to predict whether a sample is biotic or abiotic. Nested resampling is used to estimate the prediction performance of the model. The pattern of features that are "important" in distinguishing abiotic from biotic samples is then determined and shown graphically. The following machine learning classification methods are used: random forest, logistic regression with elastic net penalty, support vector machines (SVM), and eXtreme Gradient Boosting (XGBoost). The Benjamini-Hochberg procedure is used to correct for multiple hypothesis testing.
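The nested resampling idea described above can be illustrated with a minimal base-R sketch. This is not the code from the repository's R Markdown files: the data are simulated, a hand-rolled k-nearest-neighbour classifier stands in for the random forest, elastic net, SVM, and XGBoost models, and the names `knn_predict`, `make_folds`, `cv_accuracy`, `k_grid`, and the fold counts (5 outer, 4 inner) are all assumptions made for illustration.

```r
# Minimal nested cross-validation sketch in base R.
# Simulated data and hypothetical helper names; k-NN is a stand-in model.
set.seed(1)

# Simulated stand-in for a preprocessed feature matrix: 60 samples, 5 features.
n <- 60
y <- factor(rep(c("abiotic", "biotic"), each = n / 2))
x <- matrix(rnorm(n * 5), nrow = n)
x[y == "biotic", 1] <- x[y == "biotic", 1] + 2  # make one feature informative

# Hand-rolled k-nearest-neighbour classifier (majority vote, Euclidean distance).
knn_predict <- function(x_train, y_train, x_test, k) {
  apply(x_test, 1, function(row) {
    d <- sqrt(colSums((t(x_train) - row)^2))      # distance to every training sample
    votes <- table(y_train[order(d)[seq_len(k)]]) # class counts among the k nearest
    names(votes)[which.max(votes)]
  })
}

# Randomly assign each of n samples to one of v folds.
make_folds <- function(n, v) sample(rep(seq_len(v), length.out = n))

# Mean accuracy of k-NN across the given folds.
cv_accuracy <- function(x, y, k, folds) {
  mean(sapply(sort(unique(folds)), function(f) {
    test <- folds == f
    pred <- knn_predict(x[!test, , drop = FALSE], y[!test],
                        x[test, , drop = FALSE], k)
    mean(pred == y[test])
  }))
}

k_grid <- c(1, 3, 5)            # candidate hyperparameter values (odd, to avoid ties)
outer_folds <- make_folds(n, 5)

outer_acc <- sapply(1:5, function(f) {
  test <- outer_folds == f
  x_tr <- x[!test, , drop = FALSE]
  y_tr <- y[!test]

  # Inner loop: tune k using only the outer-training data.
  inner_folds <- make_folds(nrow(x_tr), 4)
  inner_acc <- sapply(k_grid, function(k) cv_accuracy(x_tr, y_tr, k, inner_folds))
  best_k <- k_grid[which.max(inner_acc)]

  # Outer loop: score the tuned model on the held-out fold.
  pred <- knn_predict(x_tr, y_tr, x[test, , drop = FALSE], best_k)
  mean(pred == y[test])
})

mean(outer_acc)  # nested-resampling estimate of classification accuracy
```

Because the hyperparameter is chosen inside each outer-training set, the held-out outer fold never influences tuning, which is what keeps the final performance estimate from being optimistically biased.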
Data
The 150 pyr-GC-MS samples can be found at https://osf.io/8sywr/?view_only=7d450ad4f9af48dfab5e194d041c0c13 with reference:
Cleaves, H. J. (2023). A robust molecular biosignature based on machine learning (Version 1) [Dataset]. OSF. DOI 10.17605/OSF.IO/EMBH8
The 16 new samples are provided in the folder named "HystadEtAl16newFiles.zip". The other 134 samples are provided in the folder named "Cleavesetal.pyrGCMSData.zip".
The outputs of the R Markdown files are displayed on:
https://ghystad.github.io/Biosignature-Detection-with-Py-GC-MS-Data-using-Machine-Learning/
The outputs of the R Markdown files are also displayed on RPubs:
https://rpubs.com/ghystad/nestedresamplingXGBoost
https://rpubs.com/ghystad/nestedresamplingsupportvectormachines
https://rpubs.com/ghystad/nestedresamplingrandom_forest
https://rpubs.com/ghystad/nestedresamplingelastic_net
https://rpubs.com/ghystad/MonteCarlosimulationsrandomforest
https://rpubs.com/ghystad/igraphsrandomforest
https://rpubs.com/ghystad/graphsrandomforest
https://rpubs.com/ghystad/graphselasticnetandBenjamini_Hochberg
https://rpubs.com/ghystad/graphselasticnet
https://rpubs.com/ghystad/graphsBenjaminiHochberg
https://rpubs.com/ghystad/correlationgraphsrandom_forest
https://rpubs.com/ghystad/chromatograms
Licence
The application is released under the GNU GPL version 3 license.
Author of the R-scripts
Grethe Hystad
Session info
R version 4.3.3 (2024-02-29 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22621)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8 LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C LC_TIME=English_United States.utf8
time zone: America/Chicago
tzcode source: internal
attached base packages:
[1] stats graphics grDevices datasets utils methods base
loaded via a namespace (and not attached):
[1] compiler_4.3.3 fastmap_1.1.1 cli_3.6.2 htmltools_0.5.8.1 tools_4.3.3 yaml_2.3.8 rmarkdown_2.26
[8] knitr_1.46 xfun_0.43 digest_0.6.35 rlang_1.1.3 renv_1.0.7 evaluate_0.23
Owner
- Login: ghystad
- Kind: user
- Repositories: 1
- Profile: https://github.com/ghystad
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.1.2
title: >-
  Using Machine Learning for Biosignature Detection with
  Pyrolysis Gas Chromatography-Mass Spectrometry Data
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Grethe
    family-names: Hystad
    name-particle: Grethe
    email: ghystad@pnw.edu
    affiliation: Purdue University Northwest
    orcid: 'https://orcid.org/0000-0001-9572-1019'
repository-code: >-
  https://github.com/ghystad/Biosignature-Detection-with-Py-GC-MS-Data-using-Machine-Learning.git
license: GPL-3.0
version: '1.1.2'
date-released: '2025-06-07'
GitHub Events
Total
- Release event: 3
- Push event: 15
- Create event: 5
Last Year
- Release event: 3
- Push event: 15
- Create event: 5