splineomics

Time-series analysis of -omics data can be carried out by fitting spline curves to the data and using limma for hypothesis testing. The obtained hits can be clustered based on the spline shape and then used for an overrepresentation analysis (ORA). The R package SplineOmics streamlines this whole process and generates reports.

https://github.com/csbg/splineomics

Science Score: 52.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
✓ Institutional organization owner: Organization csbg has institutional domain (www.plus.ac.at)
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (16.1%) to scientific vocabulary

Keywords

cluster limma omics splines timeseries

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: csbg
License: mit
Language: R
Default Branch: main
Homepage: https://csbg.github.io/SplineOmics/
Size: 68.9 MB

Statistics

Stars: 5
Watchers: 3
Forks: 0
Open Issues: 0
Releases: 0

Created almost 2 years ago · Last pushed 6 months ago

Metadata Files

Readme License Code of conduct Citation

README.Rmd

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# SplineOmics

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](./LICENSE) ![Maintained? Yes](https://img.shields.io/badge/Maintained%3F-yes-brightgreen.svg) [![R-CMD-check](https://github.com/csbg/SplineOmics/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/csbg/SplineOmics/actions/workflows/R-CMD-check.yaml) [![Docker](https://img.shields.io/badge/docker-pull-blue)](https://hub.docker.com/r/thomasrauter/splineomics) ![Platforms](https://img.shields.io/badge/platforms-all-brightgreen)



The R package `SplineOmics` finds the significant features (hits) of time-series -omics data by using splines and `limma` for hypothesis testing. It then clusters the hits based on the spline shape while showing all results in summary HTML reports.

The graphical abstract below shows the full workflow streamlined by `SplineOmics`:

![Graphical Abstract of SplineOmics Workflow](man/figures/SplineOmics_graphical_abstract.png)

## Table of Contents

-   [📘 Introduction](#-introduction)
-   [🔧 Installation](#-installation)
    -   [🐳 Docker Container](#-docker-container)
-   [▶️ Usage](#-usage)
    -   [🎓 Tutorial](#-tutorial)
    -   [📋 Details](#-details)
    -   [🧬 RNA-seq and Glycan Data](#-rna-seq-and-glycan-data)
-   [📦 Dependencies](#-dependencies)
-   [📚 Further Reading](#-further-reading)
-   [❓ Getting Help](#-getting-help)
-   [💬 Feedback](#-feedback)
-   [📜 License](#-license)
-   [🎓 Citation](#-citation)
-   [🌟 Contributors](#-contributors)
-   [🙏 Acknowledgements](#-acknowledgements)

## 📘 Introduction

Welcome to `SplineOmics`, an R package designed to streamline the analysis of -omics time-series data, followed by automated HTML report generation.

### Is the SplineOmics package of use for me?

If you have -omics data over time, the package will help you to run `limma` with splines, perform the clustering, run ORA and show result plots in HTML reports. Any time-series data that is a valid input to the `limma` package is also a valid input to the `SplineOmics` package (such as transcriptomics, proteomics, phosphoproteomics, metabolomics, glycan fractional abundances, etc.).
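The core statistical idea, fitting a spline over time for every feature and testing it with `limma`, can be sketched with `limma` and the `splines` package directly. The example below is a minimal illustration on simulated data and does not show the internals of `SplineOmics`: all object names, the natural cubic spline with three degrees of freedom, and the 0.05 threshold are assumptions made for this example.

```r
library(limma)
library(splines)

set.seed(1)

# Simulated design: 6 timepoints with 3 replicates each
time <- rep(c(0, 2, 4, 8, 12, 24), each = 3)
n_features <- 50

# Simulated data matrix: rows are features, columns are samples
data_matrix <- matrix(
  rnorm(n_features * length(time)),
  nrow = n_features,
  dimnames = list(
    paste0("feature_", seq_len(n_features)),
    paste0("sample_", seq_along(time))
  )
)

# Give the first five features a genuine time trend
data_matrix[1:5, ] <- data_matrix[1:5, ] + rep(sqrt(time), each = 5)

# Natural cubic spline basis over time (3 degrees of freedom)
spline_basis <- ns(time, df = 3)
design <- model.matrix(~ spline_basis)

# Fit one linear model per feature and moderate the variances
fit <- eBayes(lmFit(data_matrix, design))

# Joint F-test on all spline coefficients: does the feature change over time?
top <- topTable(fit, coef = 2:4, number = Inf, adjust.method = "BH")
hits <- rownames(top)[top$adj.P.Val < 0.05]
hits
```

Testing all spline coefficients together asks whether a feature changes over time at all, without having to specify at which timepoint the change happens.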

### What do I need precisely?

1.  **Data**: A data matrix where each row is a feature (e.g., a protein or metabolite) and each column is a sample taken at a specific time. The data must contain no NA values, the features should be approximately normally distributed, and the samples should be independent of each other.

2.  **Meta**: A table with metadata on the columns/samples of the data matrix (e.g., batch and time point).

3.  **Annotation** (optional): A table with identifiers on the rows/features of the data matrix (e.g., gene and protein name).
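The three inputs listed above can be pictured with a small simulated example in base R. The column names (`sample`, `time`, `batch`, `feature`, `gene`) and the consistency checks are hypothetical choices for this sketch, not requirements stated by `SplineOmics`.

```r
set.seed(123)

# Hypothetical metadata: one row per sample, in the same order as the data columns
meta <- data.frame(
  sample = paste0("sample_", 1:8),
  time   = rep(c(0, 6, 12, 24), each = 2),
  batch  = rep(c("batch_1", "batch_2"), times = 4)
)

# Hypothetical data matrix: rows are features, columns are samples
data_matrix <- matrix(
  rnorm(5 * nrow(meta), mean = 20, sd = 2),
  nrow = 5,
  dimnames = list(paste0("protein_", 1:5), meta$sample)
)

# Optional annotation: one row per feature, in the same order as the data rows
annotation <- data.frame(
  feature = rownames(data_matrix),
  gene    = paste0("GENE", 1:5)
)

# Basic consistency checks before starting an analysis
stopifnot(
  is.numeric(data_matrix),
  !anyNA(data_matrix),                            # no NA values allowed
  ncol(data_matrix) == nrow(meta),                # one metadata row per sample
  identical(colnames(data_matrix), meta$sample),  # same sample order
  nrow(annotation) == nrow(data_matrix)           # one annotation row per feature
)
```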

### Capabilities

With `SplineOmics`, you can:

-   **Automatically perform exploratory data analysis:**

    The `explore_data()` function generates an HTML report, containing various plots, such as density, PCA, and correlation heatmap plots ([example report](https://csbg.github.io/SplineOmics_html_reports/explore_data_PTX.html)).

-   **Perform limma spline analysis:**

    Use the `run_limma_splines()` function to perform the `limma` analysis with splines once the optimal hyperparameters are identified ([example report](https://csbg.github.io/SplineOmics_html_reports/create_limma_report_PTX.html)).

-   **Find jumps and drops in the timecourse:**

    Use the `find_pvc()` function to detect abrupt increases (jumps) and decreases (drops) between timepoints ([example report](https://csbg.github.io/SplineOmics_html_reports/pvc_report_PPTX.html)).

-   **Cluster significant features:**

    Cluster the significant features (hits) identified in the spline analysis with the `cluster_hits()` function ([example report](https://csbg.github.io/SplineOmics_html_reports/report_clustered_hits_PTX.html)).

-   **Run ORA with clustered hits:**

    Perform over-representation analysis (ORA) using the clustered hits with the `run_ora()` function ([example report](https://csbg.github.io/SplineOmics_html_reports/run_ora_report.html)).
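Clustering by curve shape, as mentioned in the list above, can be illustrated in base R. The sketch below assumes that hierarchical clustering on z-scored curves is an acceptable stand-in for the idea. It is not taken from `cluster_hits()`, and the simulated curves and all object names are hypothetical.

```r
set.seed(42)

# Simulated fitted curves for 20 hits, evaluated on a common time grid
time_grid <- seq(0, 24, length.out = 25)
rising  <- t(replicate(10, time_grid / 24 + rnorm(25, sd = 0.05)))
falling <- t(replicate(10, 1 - time_grid / 24 + rnorm(25, sd = 0.05)))
curves <- rbind(rising, falling)
rownames(curves) <- paste0("hit_", seq_len(nrow(curves)))

# Scale every curve to mean 0 and standard deviation 1,
# so that the clustering reflects shape rather than magnitude
scaled <- t(scale(t(curves)))

# Hierarchical clustering on Euclidean distances between scaled curves
tree <- hclust(dist(scaled), method = "complete")
clusters <- cutree(tree, k = 2)

table(clusters)
```

Scaling each curve first means that two features with the same temporal pattern but different abundance levels end up in the same cluster.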

## 🔧 Installation

Follow the steps below to install the `SplineOmics` package from the GitHub repository into your R environment.

> **Note**
> Read the messages printed during the installations carefully. Installations can
> fail because of missing dependencies, which you then need to resolve with
> commands that are not necessarily listed here.

#### Prerequisites

-   Ensure **R** is installed on your system. If not, download and install it from [CRAN](https://cran.r-project.org/).
-   **RStudio** is recommended for a more user-friendly experience with R. Download and install RStudio from [posit.co](https://posit.co/download/rstudio-desktop/).

#### Installation Steps

> **Note for Windows Users:**
> During the installation on Windows, you might see a message indicating that Rtools is not installed, which is typically required for compiling R packages from source. However, for this installation, Rtools is not necessary, and you can safely ignore this message.

1.  **Open RStudio** or your R console in a new or existing project folder.

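2.  Create a **virtual environment** with `renv`

``` r
install.packages("renv")  # skip if renv is already installed
renv::init()              # set up a project-local package library
```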

3.  **Install `BiocManager`** for Bioconductor dependencies (if not already installed)

``` r
install.packages("BiocManager")
```

4. Install required Bioconductor packages

``` r
BiocManager::install(
  c("ComplexHeatmap", "limma", "variancePartition")
  # force = TRUE   # when encountering issues
)
```

5.  Install `remotes` for GitHub package installation

``` r
install.packages("remotes")
```

6.  **Install** the **`SplineOmics`** package from GitHub and all its non-Bioconductor dependencies, using `remotes`

``` r
remotes::install_github(
  "csbg/SplineOmics",   # GitHub repository
  ref = "v0.3.1",       # Specify the tag to install, e.g. v0.3.1 (check GitHub for the newest version!)
  dependencies = TRUE,  # Install all dependencies
  upgrade = "always"    # Always upgrade dependencies
  # force = TRUE        # when encountering issues
)
```

7.  **Verify** the **installation** of the `SplineOmics` package

``` r
# Verify the installation of the SplineOmics package
if ("SplineOmics" %in% rownames(installed.packages())) {
  message("SplineOmics was installed successfully.")
} else {
  message("SplineOmics installation failed. Please check for errors during installation.")
}
```

📌 **Note on documentation:**  
The website only documents the most recent `SplineOmics` version (shown in the
top right corner of the website). If you have an older version installed, open
the documentation that ships with your installed version by running:

``` r
help(package="SplineOmics")
```
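To see which version is installed before comparing it with the version shown on the website, a small helper like the following can be used. The function name `installed_version()` is hypothetical and not part of `SplineOmics`. It relies only on `requireNamespace()` and `utils::packageVersion()`.

```r
# Return the installed version of a package as a string,
# or NA if the package is not installed
installed_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

installed_version("SplineOmics")
```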


#### Troubleshooting

If you encounter errors related to dependencies or package versions during installation, try updating your R and RStudio to the latest versions and repeat the installation steps.

For issues specifically related to the `SplineOmics` package, check the [Issues section](https://github.com/csbg/SplineOmics/issues) of the GitHub repository for similar problems or to post a new issue.

### 🐳 Docker Container

Alternatively, you can run your analysis in a `Docker` container. The underlying `Docker` image encapsulates the `SplineOmics` package together with the necessary environment and dependencies. This ensures higher levels of reproducibility because the analysis is carried out in a consistent environment, independent of the operating system and its custom configurations.

Please note that you must have the `Docker Engine` installed on your machine. For instructions on how to install it, consult the official [Docker Engine installation guide](https://docs.docker.com/engine/install/).

More information about `Docker` containers can be found on the [official Docker page](https://www.docker.com/resources/what-container/).

For instructions on downloading the image of the `SplineOmics` package and running the container, please refer to the [Docker instructions](https://csbg.github.io/SplineOmics/articles/Docker-instructions.html).

#### Troubleshooting

If you face "permission denied" issues on Linux distributions, check this [vignette](https://csbg.github.io/SplineOmics/articles/Docker_permission_denied.html).

## ▶️ Usage

### 🎓 Tutorial {#tutorial}

[This tutorial](https://csbg.github.io/SplineOmics/articles/get-started.html) covers a real CHO cell time-series proteomics example from start to end.

### 📋 Details {#details}

A detailed description of all arguments and outputs of all the functions in the package (exported and internal functions) can be found [here](https://csbg.github.io/SplineOmics/reference/).

#### Designing a `limma` design formula

A quick guide on how to design a `limma` design formula can be found [here](https://csbg.github.io/SplineOmics/articles/design_limma_design_formula.html).

An explanation of the three different `limma` results is [here](https://csbg.github.io/SplineOmics/articles/limma_result_categories.html).
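As general background, an R model formula is expanded into a design matrix with one column per coefficient. The sketch below only illustrates this mechanism with a spline term for time and a batch effect. The metadata columns `time` and `batch` and the formula itself are assumptions for the example, and the conventions that `SplineOmics` expects are described in the linked guide.

```r
library(splines)

# Hypothetical sample metadata: 4 timepoints measured in 2 batches
meta <- data.frame(
  time  = rep(c(0, 4, 8, 12), times = 2),
  batch = factor(rep(c("A", "B"), each = 4))
)

# Spline term for time (2 degrees of freedom) plus a batch effect
design <- model.matrix(~ ns(time, df = 2) + batch, data = meta)

# One row per sample; columns: intercept, two spline terms, batch B vs. batch A
dim(design)
colnames(design)
```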

### 🧬 RNA-seq and Glycan Data {#rna-seq-and-glycan-data}

#### RNA-seq data

Transcriptomics data must be preprocessed for `limma`. You need to provide an appropriate object, such as a `voom` object, in the `rna_seq_data` argument of the `SplineOmics` object (see [documentation](https://csbg.github.io/SplineOmics/reference/create_splineomics.html)). Along with this, the normalized matrix (e.g., the `$E` slot of the `voom` object) must be passed to the `data` argument. This allows flexibility in preprocessing; you can use any method you prefer as long as the final object and matrix are compatible with limma. One way to preprocess your RNA-seq data is by using the `preprocess_rna_seq_data()` function included in the `SplineOmics` package (see [documentation](https://csbg.github.io/SplineOmics/reference/preprocess_rna_seq_data.html)).

[Here](https://csbg.github.io/SplineOmics/articles/RNA-seq%20analysis.html) you can find an example analysis of RNA-seq data with the SplineOmics package.
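One common way to obtain a `voom` object and the matching normalized matrix is the standard `edgeR`/`limma` route sketched below. This is an illustration on simulated counts, not the implementation of `preprocess_rna_seq_data()`. It assumes that `edgeR` is installed, and the object names and the spline design are hypothetical.

```r
library(limma)
library(edgeR)
library(splines)

set.seed(7)

# Simulated raw counts: 200 genes, 4 timepoints with 3 replicates each
time <- rep(c(0, 6, 12, 24), each = 3)
counts <- matrix(
  rnbinom(200 * length(time), mu = 100, size = 10),
  nrow = 200,
  dimnames = list(paste0("gene_", 1:200), paste0("sample_", seq_along(time)))
)

# Design matrix with a spline term for time
design <- model.matrix(~ ns(time, df = 2))

dge <- DGEList(counts = counts)
dge <- calcNormFactors(dge)    # TMM normalization factors
voom_obj <- voom(dge, design)  # log2-CPM values with precision weights

# Per the text above: the voom object goes into the rna_seq_data argument,
# and the normalized matrix into the data argument
normalized_matrix <- voom_obj$E
dim(normalized_matrix)
```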

#### Glycan fractional abundance data

The glycan fractional abundance data matrix, where each row represents a type of glycan and the columns correspond to timepoints, must be transformed before analysis. This preprocessing step is essential due to the compositional nature of the data. In compositional data, an increase in the abundance of one component (glycan) necessarily results in a decrease in others, introducing a dependency among the variables that can bias the analysis. One way to address this issue is by applying the Centered Log Ratio (CLR) transformation to the data with the `clr()` function from the `compositions` package:

``` r
library(compositions)
clr_transformed_data <- clr(data_matrix)  # use as SplineOmics input
```
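To make the transformation concrete, the CLR of one composition is the log of each part minus the mean of the logs of all parts in that composition. The base R sketch below computes this by hand, assuming that each column (sample) is one composition, as in the glycan matrix described above. The function name `clr_by_column()` and the toy values are hypothetical. Note that `compositions` documents its data sets with one composition per row, so check the orientation of your matrix before transforming.

```r
# Toy glycan matrix: rows are glycans, columns are samples;
# every column sums to 1 (fractional abundances)
data_matrix <- matrix(
  c(0.50, 0.30, 0.20,
    0.40, 0.40, 0.20,
    0.25, 0.25, 0.50),
  nrow = 3,
  dimnames = list(c("glycan_A", "glycan_B", "glycan_C"), c("t0", "t1", "t2"))
)

# CLR per column: log(x) minus the mean of log(x) within each sample
clr_by_column <- function(x) {
  log_x <- log(x)
  sweep(log_x, 2, colMeans(log_x), FUN = "-")
}

clr_manual <- clr_by_column(data_matrix)

# After CLR, the values of every composition sum to (numerically) zero
colSums(clr_manual)
```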

However, results from CLR-transformed data can be harder to interpret. If you prefer ease of interpretation and accept that the results may contain some artifacts due to the compositional nature of the data, log2-transform your data instead and use that as input for the `SplineOmics` package.

``` r
log2_transformed_data <- log2(data_matrix)  # use as SplineOmics input
```

## 📦 Dependencies

### R Version

-   Recommended: R 4.3.3 or higher


## ❓ Getting Help

If you encounter a bug or have a suggestion for improving the `SplineOmics` package, we encourage you to [open an issue](https://github.com/csbg/SplineOmics/issues) on our GitHub repository. Before opening a new issue, please check to see if your question or bug has already been reported by another user. This helps avoid duplicate reports and ensures that we can address problems efficiently.

For more detailed questions, discussions, or contributions regarding the package’s use and development, please refer to the [GitHub Discussions](https://github.com/csbg/SplineOmics/discussions) page for `SplineOmics`.

## 💬 Feedback

We appreciate your feedback! Besides raising issues, you can provide feedback in the following ways:

-   **Direct Email**: Send your feedback directly to [Thomas Rauter](mailto:thomas.rauter@plus.ac.at).

-   **Anonymous Feedback**: Use [this Google Form](https://forms.gle/jocMXSxLf3GrGBdT9) to provide anonymous feedback by answering questions.

Your feedback helps us improve the project and address any issues you may encounter.

## 📜 License

This package is licensed under the MIT License: [LICENSE](./LICENSE)

© 2024 Thomas Rauter. All rights reserved.

## 🎓 Citation

The `SplineOmics` package is currently not published in a peer-reviewed scientific journal or similar outlet. However, if this package helped you in your work, consider citing this GitHub repository.

To cite this package, you can use the citation information provided in the [`CITATION.cff`](./CITATION.cff) file.

You can also generate a citation in various formats using the `CITATION.cff` file by visiting the top right of this repo and clicking on the “Cite this repository” button.

Also, if you like the package, consider giving the GitHub repository a star. Your support helps us in the continued development and improvement of `SplineOmics`. Thank you for using our package!

## 🌟 Contributors

-   [Thomas-Rauter](https://github.com/Thomas-Rauter) - 🚀 Wrote the package, developed the approach together with VSchaepertoens under guidance from nfortelny and skafdasschaf.
-   [nfortelny](https://github.com/nfortelny) - 🧠 Principal Investigator, provided guidance and support for the overall approach.
-   [skafdasschaf](https://github.com/skafdasschaf) - 🔧 Helped review code, suggested improvements, and provided scientific guidance to develop the approach.
-   [VSchaepertoens](https://github.com/VSchaepertoens) - ✨ Developed one internal plotting function, as well as some code for the exploratory data analysis plots, and the overall approach together with Thomas-Rauter.

## 🙏 Acknowledgements

This work was carried out in the context of the DigiTherapeutX project, which was funded by the Austrian Science Fund (FWF). The research was conducted under the supervision of Prof. Nikolaus Fortelny, who leads the Computational Systems Biology working group at the Paris Lodron University of Salzburg, Austria. You can find more information about Prof. Fortelny's research group [here](https://www.plus.ac.at/biowissenschaften/der-fachbereich/arbeitsgruppen/fortelny/).

Owner

Name: Computational Systems Biology Group
Login: csbg
Kind: organization
Location: Salzburg, AT

Website: www.plus.ac.at/fortelny
Repositories: 1
Profile: https://github.com/csbg

The Fortelny Lab at the University of Salzburg

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite the following work."
title: "SplineOmics: A bioinformatics package for time-series omics data analysis"
version: 1.0.0
repository-code: https://github.com/csbg/SplineOmics
license: MIT
type: software

authors:
  - family-names: Rauter
    given-names: Thomas
    orcid: "0000-0002-7350-4045"
    contribution: Wrote the package, developed the approach together with VSchaepertoens under guidance from nfortelny and skafdasschaf.
  - family-names: Fortelny
    given-names: Nikolaus
    orcid: "0000-0003-4025-9968"
    contribution: Principal Investigator, provided guidance and support for the overall approach.
  - family-names: Skafdasschaf
    given-names: (insert-given-name)
    contribution: Helped reviewing code, delivered improvement suggestions and scientific guidance to develop the approach.
  - family-names: Schaepertoens
    given-names: V.
    contribution: Developed one internal plotting function, some exploratory data analysis plots, and the overall approach together with Thomas-Rauter.

date-released: 2024-09-23

GitHub Events

Total

Watch event: 7
Delete event: 1
Push event: 63
Create event: 6

Last Year

Watch event: 7
Delete event: 1
Push event: 63
Create event: 6

Dependencies

DESCRIPTION cran

R >= 4.1.0 depends
ComplexHeatmap >= 2.18.0 imports
RColorBrewer >= 1.1 imports
Rtsne >= 0.17 imports
base64enc >= 0.1 imports
circlize >= 0.4.16 imports
cluster >= 2.1.6 imports
clusterProfiler >= 4.10.1 imports
data.table >= 1.15.4 imports
dendextend >= 1.17.1 imports
dplyr >= 1.1.4 imports
fs >= 1.6.4 imports
ggplot2 >= 3.5.1 imports
ggrepel >= 0.9.5 imports
here >= 1.0.1 imports
htmltools >= 0.5.8.1 imports
kableExtra >= 1.4.0 imports
knitr >= 1.47 imports
limma >= 3.58.1 imports
magrittr >= 2.0.3 imports
openxlsx >= 4.2.5.2 imports
patchwork >= 1.2.0 imports
pheatmap >= 1.0.12 imports
progress >= 1.2.3 imports
purrr >= 1.0.2 imports
ragg >= 1.3.2 imports
readr >= 2.1.5 imports
reshape2 >= 1.4.4 imports
rlang >= 1.1.4 imports
rstudioapi >= 0.16.0 imports
scales >= 1.3.0 imports
stringr >= 1.5.1 imports
tibble >= 3.2.1 imports
tidyr >= 1.3.1 imports
viridis >= 0.6.5 imports
conflicted >= 1.2.0 suggests
devtools * suggests
rmarkdown >= 2.7 suggests
roxygen2 * suggests
testthat >= 3.0.0 suggests

docker/Dockerfile docker

rocker/rstudio 4.3.3 build
