boxsers

Python package that provides a full range of functionality to process and analyze vibrational spectra (Raman, SERS, FTIR, etc.).

https://github.com/alebrun-108/boxsers

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (17.5%) to scientific vocabulary

Keywords

baseline-correction chemometrics cnn-keras data-augmentation deep-learning machine-learning pca-analysis preprocessing python raman-spectroscopy sers unsupervised-learning vibrational-spectroscopy
Last synced: 4 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: ALebrun-108
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: master
  • Homepage:
  • Size: 20 MB
Statistics
  • Stars: 66
  • Watchers: 2
  • Forks: 15
  • Open Issues: 1
  • Releases: 8
Topics
baseline-correction chemometrics cnn-keras data-augmentation deep-learning machine-learning pca-analysis preprocessing python raman-spectroscopy sers unsupervised-learning vibrational-spectroscopy
Created over 5 years ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md



BoxSERS is a ready-to-use Python package providing several tools for the analysis of vibrational spectra (Raman, FTIR, SERS, etc.), including features for data augmentation, dimensionality reduction, spectral correction, and both supervised and unsupervised machine learning.

General info on the repository

This GitHub repository includes the following elements:

  • BoxSERS package : Complete, ready-to-use Python library implementing methods designed and adapted for vibrational spectra (Raman, SERS, etc.).

  • Jupyter notebooks : Typical examples of BoxSERS package usage.

  • Raw data : Database of SERS bile acid spectra (raw and preprocessed forms) used in the article by Lebrun and Boudreau (2022) (https://doi.org/10.1177/00037028221077119). These data can serve as a starting point for using the BoxSERS package.

This page also provides the package's installation guidelines and an overview of its main functions.


Getting Started

It is advisable to start with the Jupyter notebooks, which present the complete procedure and describe each step in detail, with additional information to facilitate understanding.

This project does not yet cover database creation: users must build their database of spectra before using BoxSERS. The following Python modules from other users can import spectra in various formats:

  • spe2py : Princeton Instruments LightField (SPE 3.x) files
  • pyspectra : .spc and .dx file formats

BoxSERS Installation

From PyPI:

```bash
pip install boxsers
```

From GitHub:

```bash
pip install git+https://github.com/ALebrun-108/BoxSERS.git
```

Requirements

The main modules required to run the code are listed below:

  • scikit-learn
  • SciPy
  • NumPy
  • pandas
  • Matplotlib
  • TensorFlow

To use GPU computing units, it may be necessary to install the cudnn and cudatoolkit packages using conda or pip.

Label information

The labels associated with the spectra can be either integer values (single column) or binary values (multiple columns).

Example of labels for three classes that correspond to three bile acids:

| Bile acid        | Integer label (1 column) | Binary label (3 columns) |
|------------------|:------------------------:|:------------------------:|
| Cholic acid      | 0                        | [1 0 0]                  |
| Lithocholic acid | 1                        | [0 1 0]                  |
| Deoxycholic acid | 2                        | [0 0 1]                  |
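Converting between the two label formats takes one line of NumPy in each direction. A minimal sketch, assuming integer labels numbered as in the table; the variable names are illustrative and not part of BoxSERS:

```python
import numpy as np

# Integer labels following the table: 0 = cholic, 1 = lithocholic, 2 = deoxycholic acid
lab_int = np.array([0, 1, 2, 1])
n_classes = 3

# Integer -> binary: row i of the identity matrix is the binary label of class i
lab_binary = np.eye(n_classes, dtype=int)[lab_int]

# Binary -> integer: the column holding the 1 gives the class index
lab_back = lab_binary.argmax(axis=1)

print(lab_binary)
# [[1 0 0]
#  [0 1 0]
#  [0 0 1]
#  [0 1 0]]
```

When labels are produced from class-name strings with scikit-learn's `LabelEncoder`, integers are assigned in sorted order of the names, so the integer attached to each bile acid may differ from the table.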

Included Features

This section describes the utility of each function and class contained in the BoxSERS package.


Module misc_tools

This module provides functions for a variety of utilities.

  • data_split : Randomly splits an initial set of spectra into two new subsets, referred to in this function as subset A and subset B.

  • ramanshift_converter : Converts wavelengths [nm] to Raman shifts [cm-1].

  • wavelength_converter : Converts Raman shifts [cm-1] to wavelengths [nm].

  • load_rruff : Exports a subset of Raman spectra from the RRUFF database in the form of three related lists containing Raman shifts, intensities and mineral names.
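Both converters rest on the standard relation between the excitation (laser) wavelength λ_exc, the scattered wavelength λ and the Raman shift: shift [cm-1] = 10^7 / λ_exc [nm] − 10^7 / λ [nm]. A minimal sketch of both directions; the function names and the `excitation_nm` parameter are hypothetical and do not reproduce the BoxSERS signatures:

```python
import numpy as np

NM_TO_CM_INV = 1e7  # 1 cm = 1e7 nm, so 1e7 / wavelength[nm] is a wavenumber in cm^-1


def to_raman_shift(wavelength_nm, excitation_nm):
    """Scattered wavelength(s) [nm] -> Raman shift(s) [cm^-1]."""
    wavelength_nm = np.asarray(wavelength_nm, dtype=float)
    return NM_TO_CM_INV / excitation_nm - NM_TO_CM_INV / wavelength_nm


def to_wavelength(shift_cm1, excitation_nm):
    """Raman shift(s) [cm^-1] -> scattered wavelength(s) [nm] (positive shifts = Stokes side)."""
    shift_cm1 = np.asarray(shift_cm1, dtype=float)
    return NM_TO_CM_INV / (NM_TO_CM_INV / excitation_nm - shift_cm1)


# Example with a 785 nm laser: a 1000 cm^-1 band is scattered at about 851.87 nm
wl = to_wavelength(1000.0, excitation_nm=785.0)
print(round(float(wl), 2))  # 851.87
```

Both functions accept scalars or NumPy arrays, so a whole wavenumber axis can be converted at once.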

Module visual_tools

This module provides different tools to visualize vibrational spectra quickly.

  • spectro_plot : Returns a plot of the selected spectrum(s).

  • random_plot : Plots a number of randomly selected spectra from a set of spectra.

  • distribution_plot : Returns a bar plot showing the distribution of spectra among the classes of a given set of spectra.

```python
# Code example:

import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelBinarizer, LabelEncoder

from boxsers.misc_tools import data_split
from boxsers.visual_tools import distribution_plot

df = pd.read_hdf('Bileacids27072020.h5', key='df')  # Load bile acids dataframe
wn = np.load('Ramanshift2707_2020.npy')  # Load wavenumbers (Raman shifts)
class_names = df['Classes'].unique()

display(df)  # Jupyter/IPython only: prints a detailed overview of the dataframe "df"

# Features extraction: exports dataframe spectra as a numpy array (value type = float64)
sp = df.iloc[:, 1:].to_numpy()

# Labels extraction: exports dataframe classes into a numpy array of string values
label = df.loc[:, 'Classes'].values

# String to integer labels conversion
label_encoder = LabelEncoder()  # Creating instance of LabelEncoder
lab_int = label_encoder.fit_transform(label)  # 0, 3, 2, ...

# String to binary labels conversion
label_binarizer = LabelBinarizer()  # Creating instance of LabelBinarizer
lab_binary = label_binarizer.fit_transform(label)  # [1 0 0 0] [0 0 0 1] [0 1 0 0], ...

# Train/Validation/Test sets splitting (30 % go to subset B, which is then split in half)
(sp_train, sp_b, lab_train, lab_b) = data_split(sp, label, b_size=0.30, rdm_ste=None, print_report=False)
(sp_val, sp_test, lab_val, lab_test) = data_split(sp_b, lab_b, b_size=0.50, rdm_ste=None, print_report=False)

# Visualization of spectrum distributions
distribution_plot(lab_train, class_names=class_names, avg_line=True, title='Train set distribution')
distribution_plot(lab_val, class_names=class_names, avg_line=True, title='Validation set distribution')
distribution_plot(lab_test, class_names=class_names, avg_line=True, title='Test set distribution')
```

Module preprocessing

This module provides functions to preprocess vibrational spectra. These features improve spectrum quality and can improve performance for machine learning applications.

  • als_baseline_cor : Subtracts the baseline signal from the spectrum(s) using an Asymmetric Least Squares estimation.

  • spectral_normalization : Normalizes the spectrum(s) using one of the available norms in this function.

  • savgol_smoothing : Smooths the spectrum(s) using a Savitzky-Golay polynomial filter.

  • cosmic_filter : Applies a median filter to the spectrum(s) to remove cosmic rays.

  • spectral_cut : Removes or sets to zero a delimited spectral region of the spectrum(s).

  • spline_interpolation : Performs one-dimensional spline interpolation on the spectra to resample them onto a new x-axis.
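The asymmetric least squares idea behind baseline subtraction can be illustrated with the widely used formulation of Eilers and Boelens: a smooth curve z is fitted to the spectrum y by minimizing Σ w_i (y_i − z_i)² + λ Σ (Δ²z_i)², where points lying above the curve (peaks) receive a small weight p and points below it receive 1 − p. The sketch below is a generic implementation written for illustration, not the BoxSERS source; its parameter names simply mirror those passed to `als_baseline_cor` in the preprocessing example (`lam`, `p`, `niter`):

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve


def als_baseline(y, lam=1e4, p=0.001, niter=10):
    """Return an ALS baseline estimate for a single 1-D spectrum y (illustrative sketch)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-order difference operator D, shape (n - 2, n)
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    penalty = lam * (D.T @ D)  # roughness penalty term lam * D'D
    w = np.ones(n)  # start with equal weights
    for _ in range(niter):
        W = sparse.diags(w, 0, shape=(n, n))
        # Solve (W + lam * D'D) z = W y for the current weights
        z = spsolve(sparse.csc_matrix(W + penalty), w * y)
        # Asymmetric reweighting: points above z (peaks) get weight p, the others 1 - p
        w = np.where(y > z, p, 1.0 - p)
    return z


# Synthetic spectrum: linear baseline + one Gaussian band
x = np.arange(200, dtype=float)
true_baseline = 0.02 * x + 5.0
y = true_baseline + 10.0 * np.exp(-((x - 100.0) / 4.0) ** 2)

baseline = als_baseline(y)
corrected = y - baseline  # baseline-corrected spectrum
```

Larger `lam` values give stiffer baselines, while smaller `p` values push the baseline further under the peaks.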

```python
# Code example (sp, label and wn come from the previous example):

import numpy as np
from boxsers.preprocessing import als_baseline_cor, savgol_smoothing, spectral_normalization
from boxsers.visual_tools import spectro_plot

# Two different spectra are selected randomly
random_index = np.random.choice(sp.shape[0], size=2, replace=False)
sp_sample = sp[random_index]  # selected spectra
label_a = label[random_index[0]]  # class corresponding to the first spectrum
label_b = label[random_index[1]]  # class corresponding to the second spectrum

# 1) Subtracts the baseline signal from the spectra
sp_bc = als_baseline_cor(sp_sample, lam=1e4, p=0.001, niter=10, return_baseline=False)

# 2) Smooths the spectra
sp_bc_svg = savgol_smoothing(sp_bc, window_length=15, p=3, degree=0)

# 3) Normalizes the spectra
sp_bc_svg_norm = spectral_normalization(sp_bc_svg, norm='minmax')

# Graphs visualization
legend = (label_a, label_b)
spectro_plot(wn, sp_sample, title='Raw spectra', legend=legend)
spectro_plot(wn, sp_bc_svg_norm[0], sp_bc_svg_norm[1], y_space=1, title='Preprocessed spectra', legend=legend)
```

```python
from boxsers.visual_tools import spectro_plot

# dark_theme=True/False enables two different display options
spectro_plot(wn, sp, title='Raman spectrum of L-Tyrosine', dark_theme=False)
spectro_plot(wn, sp, title='Raman spectrum of L-Tyrosine', dark_theme=True)
```

Module data_augmentation

This module provides functions to generate new spectra by adding different variations to existing spectra.

  • aug_mixup : Randomly generates new spectra by mixing together several spectra with a Dirichlet probability distribution.

  • aug_noise : Randomly generates new spectra with Gaussian noise added.

  • aug_multiplier : Randomly generates new spectra with multiplicative factors applied.

  • aug_offset : Randomly generates new spectra shifted in intensity.

  • aug_xshift : Randomly generates new spectra shifted in wavelength.

  • aug_linslope : Randomly generates new spectra with additional linear slopes
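Each variation listed above boils down to a simple array operation. The NumPy sketch below shows one plausible way to implement them; the function names, parameter names and default magnitudes are hypothetical, and the BoxSERS functions may parametrize them differently (in particular, mixing the labels with the same Dirichlet weights as the spectra is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator for reproducible augmentation


def noise_sketch(sp, std=0.01):
    """Adds Gaussian noise to every intensity value."""
    return sp + rng.normal(0.0, std, size=sp.shape)


def multiplier_sketch(sp, max_delta=0.1):
    """Scales each spectrum by a random factor in [1 - max_delta, 1 + max_delta]."""
    factors = rng.uniform(1.0 - max_delta, 1.0 + max_delta, size=(sp.shape[0], 1))
    return sp * factors


def offset_sketch(sp, max_offset=0.1):
    """Shifts each spectrum up or down in intensity by a random constant."""
    offsets = rng.uniform(-max_offset, max_offset, size=(sp.shape[0], 1))
    return sp + offsets


def xshift_sketch(sp, max_shift=3):
    """Shifts each spectrum along the x-axis by a random number of points.
    np.roll wraps values around the edges; padding would be an alternative."""
    shifts = rng.integers(-max_shift, max_shift + 1, size=sp.shape[0])
    return np.stack([np.roll(s, k) for s, k in zip(sp, shifts)])


def linslope_sketch(sp, max_slope=0.1):
    """Adds a random linear slope across the spectral axis."""
    x = np.linspace(0.0, 1.0, sp.shape[1])
    slopes = rng.uniform(-max_slope, max_slope, size=(sp.shape[0], 1))
    return sp + slopes * x


def mixup_sketch(sp, lab_binary, n_new=10, n_mix=2, alpha=1.0):
    """Builds new spectra as Dirichlet-weighted mixtures of n_mix randomly chosen spectra."""
    idx = rng.integers(0, sp.shape[0], size=(n_new, n_mix))
    weights = rng.dirichlet(np.full(n_mix, alpha), size=n_new)  # each row sums to 1
    sp_new = np.einsum("ij,ijk->ik", weights, sp[idx])
    lab_new = np.einsum("ij,ijk->ik", weights, lab_binary[idx])
    return sp_new, lab_new
```

All functions expect spectra stored row-wise, shape (n_spectra, n_points), as in the code examples on this page.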

Module dimension_reduction

This module provides different techniques to perform dimensionality reduction of vibrational spectra.

  • SpectroPCA : Principal Component Analysis (PCA) model object.

```python
from boxsers.dimension_reduction import SpectroPCA

pca_model = SpectroPCA(n_comp=10)
pca_model.fit_model(sp)
pca_model.scatter_plot(sp, label, component_x=1, component_y=2, fontsize=13,
                       class_names=['Mol. A', 'Mol. B', 'Mol. C'])
```
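To see what such a PCA step computes, the projection can be sketched directly with scikit-learn's `PCA` on synthetic spectra. Whether `SpectroPCA` relies on scikit-learn internally is an assumption here, and the three-band synthetic data set is purely illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 300)


def band(center, width=0.02):
    """Gaussian band used to build synthetic spectra."""
    return np.exp(-((x - center) / width) ** 2)


# 3 synthetic "molecules" (30 noisy spectra each) with bands at different positions
centers = [0.3, 0.5, 0.7]
sp = np.vstack([band(c) + 0.05 * rng.standard_normal((30, x.size)) for c in centers])
label = np.repeat([0, 1, 2], 30)

pca = PCA(n_components=10)
scores = pca.fit_transform(sp)  # shape (90, 10): coordinates of each spectrum in PC space
pc1, pc2 = scores[:, 0], scores[:, 1]  # the values a PC1-vs-PC2 scatter plot would display
print(pca.explained_variance_ratio_[:2])
```

Plotting `pc1` against `pc2`, colored by `label`, gives the kind of class-separation scatter plot shown above.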

Module clustering

This module provides unsupervised learning models for vibrational spectra cluster analysis.

  • SpectroKmeans : K-Means clustering model.

  • SpectroGmixture : Gaussian mixture probability distribution model.
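Both models correspond to standard scikit-learn estimators. A hedged sketch on synthetic spectra; the cluster count, covariance type and data are illustrative choices, and nothing is assumed about how `SpectroKmeans` and `SpectroGmixture` are configured internally:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200)


def band(center, width=0.03):
    """Gaussian band used to build synthetic spectra."""
    return np.exp(-((x - center) / width) ** 2)


# Two groups of 25 noisy synthetic spectra, with a band at 0.3 or at 0.7
sp = np.vstack([band(0.3) + 0.05 * rng.standard_normal((25, x.size)),
                band(0.7) + 0.05 * rng.standard_normal((25, x.size))])
true_groups = np.repeat([0, 1], 25)

# K-Means: hard assignment of each spectrum to the nearest centroid
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
km_labels = kmeans.fit_predict(sp)

# Gaussian mixture: diagonal covariances keep the fit stable with few spectra per cluster
gmm = GaussianMixture(n_components=2, covariance_type="diag", random_state=0)
gmm_labels = gmm.fit_predict(sp)
gmm_proba = gmm.predict_proba(sp)  # soft membership probabilities, one row per spectrum

# Cluster indices are arbitrary, so partitions are compared with a permutation-invariant score
print(adjusted_rand_score(true_groups, km_labels), adjusted_rand_score(true_groups, gmm_labels))
```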

Module classification

This module provides supervised learning models for vibrational spectra classification.

  • SpectroRF : Random forest classification model.

  • SpectroSVM : Support Vector Machine classification model.

  • SpectroLDA : Linear Discriminant Analysis classification model
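The three classifiers are standard scikit-learn estimators, so their basic train/test workflow can be sketched independently of BoxSERS. The synthetic data, the linear SVM kernel and the hyperparameters below are assumptions for illustration; they are not necessarily the settings used by `SpectroRF`, `SpectroSVM` and `SpectroLDA`:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200)


def band(center, width=0.03):
    """Gaussian band used to build synthetic spectra."""
    return np.exp(-((x - center) / width) ** 2)


# Three classes of 40 noisy synthetic spectra each
centers = [0.3, 0.5, 0.7]
sp = np.vstack([band(c) + 0.05 * rng.standard_normal((40, x.size)) for c in centers])
lab = np.repeat([0, 1, 2], 40)

# Stratified split keeps the class proportions identical in both sets
sp_train, sp_test, lab_train, lab_test = train_test_split(
    sp, lab, test_size=0.3, stratify=lab, random_state=0)

models = {
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": SVC(kernel="linear"),
    "LDA": LinearDiscriminantAnalysis(),
}
for name, model in models.items():
    model.fit(sp_train, lab_train)
    print(name, model.score(sp_test, lab_test))  # mean accuracy on the test set
```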

Module neural_networks

This module provides a neural network model specifically designed for the classification of vibrational spectra.

  • SpectroCNN : Convolutional Neural Network (CNN) for vibrational spectra classification.

Module validation_metrics

This module provides different tools to evaluate the quality of a model’s predictions.

  • cf_matrix : Returns a confusion matrix (built with scikit-learn) generated on a given set of spectra.

  • clf_report : Returns a classification report generated from a given set of spectra
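`cf_matrix` is described as built with scikit-learn; assuming `clf_report` relies on the same library, the underlying calls can be sketched on hand-made predictions. The label values and class names are illustrative (reusing the bile acids from the label table), and any plotting or formatting added by the BoxSERS functions is not reproduced:

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

class_names = ["Cholic acid", "Lithocholic acid", "Deoxycholic acid"]
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])  # two misclassified spectra

# Rows = true classes, columns = predicted classes
cm = confusion_matrix(y_true, y_pred)
cm_norm = confusion_matrix(y_true, y_pred, normalize="true")  # each row sums to 1

print(cm)
# [[1 1 0]
#  [0 2 0]
#  [1 0 1]]

# Per-class precision, recall, F1-score and support, plus overall accuracy
print(classification_report(y_true, y_pred, target_names=class_names))
```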

Owner

  • Name: Alexis Lebrun
  • Login: ALebrun-108
  • Kind: user
  • Location: Québec city
  • Company: @FLClab

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: Lebrun
  given-names: Alexis
  orcid: "https://orcid.org/0000-0002-7616-2087"
title: "BoxSERS"
version: 1.3.0
doi: 10.5281/zenodo.5557905
date-released: 2021-10-08
url: "https://github.com/ALebrun-108/BoxSERS"

GitHub Events

Total
  • Issues event: 1
  • Watch event: 11
  • Issue comment event: 1
Last Year
  • Issues event: 1
  • Watch event: 11
  • Issue comment event: 1

Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 290
  • Total Committers: 2
  • Avg Commits per committer: 145.0
  • Development Distribution Score (DDS): 0.045
Top Committers
| Name        | Email         | Commits |
|-------------|---------------|--------:|
| ALebrun-108 | 5****8@u****m |     277 |
| Alexis      | a****1@u****a |      13 |
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 3
  • Total pull requests: 0
  • Average time to close issues: 15 days
  • Average time to close pull requests: N/A
  • Total issue authors: 3
  • Total pull request authors: 0
  • Average comments per issue: 3.67
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • RGD2 (1)
  • WentongZhou (1)
  • srvparmar (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 126 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 21
  • Total maintainers: 1
pypi.org: boxsers

Python package that provides a full range of functionality to process and analyze vibrational spectra (Raman, SERS, FTIR, etc.).

  • Versions: 21
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 126 Last month
Rankings
Forks count: 9.6%
Dependent packages count: 10.1%
Stargazers count: 10.3%
Average: 13.1%
Downloads: 14.1%
Dependent repos count: 21.6%
Maintainers (1)
Last synced: 4 months ago

Dependencies

setup.py pypi
  • matplotlib *
  • numpy *
  • pandas *
  • scikit-learn *
  • scipy *
  • seaborn *
  • tensorflow *