cycadas

https://github.com/dii-lih-luxembourg/cycadas

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: found
✓ codemeta.json file: found
✓ .zenodo.json file: found
✓ DOI references: found 3 DOI reference(s) in README
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (12.2%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: DII-LIH-Luxembourg
License: mit
Language: R
Default Branch: main
Size: 3.03 MB

Statistics

Stars: 0
Watchers: 1
Forks: 2
Open Issues: 4
Releases: 2

Created almost 3 years ago · Last pushed 11 months ago

Metadata Files

Readme, License, Citation

Cytometry Cluster Annotation and Differential Abundance Suite

Efficient and reproducible cytometry data

Aims:

• facilitating the process of cluster annotation while reducing user bias

• improving reproducibility

Key features:

• defining the threshold of positive/negative marker expression

• interactive inspection of cluster phenotypes

• automatic merging of populations

• differential abundance analysis

Installation Instructions

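```r
# Install devtools if it is not available yet
if (!require("devtools", quietly = TRUE))
    install.packages("devtools")
library(devtools)

# Install CyCadas together with all required packages
devtools::install_github("DII-LIH-Luxembourg/cycadas", dependencies = TRUE)

library(cycadas)

# Start the CyCadas Shiny app
cycadas()
```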

Demo dataset

To enable exploration of the tool, we provide a demo dataset that can be loaded (Load tab → Demo Data) either as cluster expression data only (Load Cluster Expression Demo Data, allowing the user to create the annotation) or as annotated data (Load Annotated Demo Data, which includes the annotation tree).

This demo dataset was generated from publicly available mass cytometry data of patients with idiopathic Parkinson's disease and healthy controls (Capelle, C.M. et al., Nat Commun, 2023), which were clustered with GigaSOM into 1600 clusters.

Loading SingleCellExperiment Data (CATALYST)

This step is optional. If you wish to load data clustered with CATALYST or with other tools that use the SingleCellExperiment format, please install:

```r
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("CATALYST")
BiocManager::install("SingleCellExperiment")
```

Data input, Single Cell Format

```r
# CATALYST workflow using a SingleCellExperiment data object
library(CATALYST)

# Preprocessing ...

# Cluster with CATALYST
sce <- cluster(sce, features = "type",
               xdim = 10, ydim = 10, maxK = 20,
               verbose = FALSE, seed = 1)

# Save the object as an .rds file
saveRDS(sce, "my_sce.rds")

# Load it into CyCadas:
#   - annotate the desired (meta-)cluster levels
#   - use the integrated merge function within CyCadas
#   - save the object

# Load the annotated object back into your workflow
sce <- readRDS("Annotated_sce.rds")

# Continue the downstream analysis
# ...
```

Data Input from FlowSOM

Median Expression and Cluster Frequencies:

```r
# Assumes `fcs` (a flowSet), `metadata` and `lineage_markers` are defined
# earlier in your clustering workflow.

# Within your clustering workflow, create sample_ids according to the metadata file:
library(flowCore)
sample_ids <- rep(metadata$sample_id, fsApply(fcs, nrow))

library(FlowSOM)
fsom <- ReadInput(fcs, transform = FALSE, scale = FALSE)

set.seed(42)
som <- BuildSOM(fsom, colsToUse = lineage_markers, xdim = 20, ydim = 20, rlen = 40)

# Median marker expression per cluster
expr_median <- som$map$medianValues

# Calculate cluster frequencies
# (all SOM nodes are kept as factor levels, so empty clusters stay as zero rows
#  and the rows remain aligned with expr_median)
cluster_ids <- factor(som$map$mapping[, 1], levels = seq_len(nrow(som$map$codes)))
clustering_table <- as.numeric(table(cluster_ids))
clustering_prop <- round(clustering_table / sum(clustering_table) * 100, 2)
df_prop <- as.data.frame(clustering_prop)
df_prop$cluster <- rownames(df_prop)

write.csv(expr_median, "expr_median.csv", row.names = FALSE)
write.csv(df_prop, "cluster_freq.csv")

# ----------------------------------------------------------------------------
# Generate the Proportion Table
# ----------------------------------------------------------------------------
counts_table <- table(cluster_ids, sample_ids)
props_table <- t(t(counts_table) / colSums(counts_table)) * 100

props <- as.data.frame.matrix(props_table)

write.csv(props, "proportion_table.csv")
```
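To make the two tables above concrete, here is a minimal, self-contained sketch on toy data (the cluster assignments and sample labels are invented for illustration). It mirrors the FlowSOM snippet: overall cluster frequencies in percent, and a proportion table in which each sample's column sums to 100. Declaring the cluster IDs as a factor with every cluster as a level keeps clusters that received no cells as explicit zero rows.

```r
# Toy data (hypothetical): cluster assignment and sample of origin for 10 cells.
# The SOM defines 4 clusters, but cluster 3 receives no cells.
n_clusters <- 4
cell_cluster <- c(1, 1, 2, 4, 1, 2, 2, 4, 4, 4)
cell_sample  <- c("S1", "S1", "S1", "S1", "S1", "S2", "S2", "S2", "S2", "S2")

# Keep every cluster as a factor level so empty clusters appear as zero rows
cluster_ids <- factor(cell_cluster, levels = seq_len(n_clusters))

# Overall cluster frequencies (% of all cells)
clustering_table <- as.numeric(table(cluster_ids))
clustering_prop <- round(clustering_table / sum(clustering_table) * 100, 2)

# Per-sample proportion table: each column (sample) sums to 100
counts_table <- table(cluster_ids, cell_sample)
props_table <- t(t(counts_table) / colSums(counts_table)) * 100
props <- as.data.frame.matrix(props_table)

print(clustering_prop)
print(props)
```

Here cluster 3 shows up as 0 in both tables instead of silently disappearing, which would shift every later row out of line with the median-expression table.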

Data Input from GigaSOM.jl

Median Expression and Cluster Frequencies:

```julia
using GigaSOM, DataFrames, CSV

gridSize = 20
nEpochs = 40

# Assume the dataset is loaded in distributed data info `di`, e.g. using `loadFCSSet`.
som = initGigaSOM(di, gridSize, gridSize, seed = 42)  # set a seed value here
som = trainGigaSOM(som, di, epochs = nEpochs)
mapping_di = mapToGigaSOM(som, di)

num_cluster = gridSize^2

# Get the cluster frequencies for CyCadas:
clusterFreq = dcount(num_cluster, mapping_di)
df = DataFrame(cluster = 1:num_cluster, clustering_prop = clusterFreq)
df.clustering_prop = df.clustering_prop ./ sum(df.clustering_prop)
CSV.write("cluster_freq.csv", df)

# Get the count table per fileID (optional - Count Table in CyCadas).
# Assume `md` is a data frame that describes the data
# (i.e., it contains a row for all filenames loaded in `di` in the same order,
# together with sample identifiers)
files = distributeFCSFileVector(:fileIDs, md[:, :file_name])
count_tbl = dcount_buckets(num_cluster, mapping_di, size(md, 1), files)
ct = DataFrame(count_tbl, :auto)
rename!(ct, md.sample_id)
CSV.write("cluster_counts.csv", ct)

# Get the median marker expressions.
# Assume `lineage_markers` is a human-readable list of markers used in clustering
# (here used for annotating the median expression table) and `cols` holds the
# indices of the corresponding columns in `di`.
expr_tbl = dmedian_buckets(di, num_cluster, mapping_di, cols)
et = DataFrame(expr_tbl, :auto)
rename!(et, lineage_markers)
CSV.write("median_expr.csv", et)
```

A detailed workflow for each method can be found in the data section.

Data exploration

The UMAP interactive tab lets the user preview marker expression in the clusters selected on the UMAP.

In the UMAP Marker expression tab, the user can investigate the expression level of a selected marker across all clusters.

Thresholds

In the Thresholds tab, the threshold separating negative from positive expression is estimated for each marker using one-dimensional k-means clustering and Mclust (Gaussian mixture model-based clustering); the estimate with the best silhouette score is selected. The bimodality of every marker is also assessed, and its bimodality coefficient is reported. A blue threshold line indicates that the data meet the bimodality criterion; otherwise the line is colored red. The threshold can be adjusted manually by clicking on the scatterplot.

Expression of CD8a with blue threshold line indicating the bimodal distribution:

Expression of TCRgd with red threshold line indicating that this marker expression does not follow the bimodal distribution:
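As a rough illustration of the quantities described above, the following base-R sketch (our own approximation, not the CyCadas source) estimates a threshold for one marker with two-cluster k-means on the per-cluster median expression, scores the split with a hand-rolled mean silhouette width, and computes the bimodality coefficient BC = (G1² + 1) / (G2 + 3(n−1)² / ((n−2)(n−3))), where G1 and G2 are the bias-corrected sample skewness and excess kurtosis. Placing the threshold midway between the two k-means centres and using the 5/9 cut-off commonly applied to BC are assumptions; the Mclust branch is omitted because it requires the mclust package. The function names `mean_silhouette`, `estimate_threshold_kmeans` and `bimodality_coefficient` are hypothetical.

```r
# Hypothetical helpers sketching threshold estimation for a single marker.
# x: median expression of one marker across all clusters.

# Mean silhouette width of a 1-D partition (cluster labels in `cl`)
mean_silhouette <- function(x, cl) {
  d <- abs(outer(x, x, "-"))                # pairwise 1-D distances
  s <- numeric(length(x))
  for (i in seq_along(x)) {
    same <- cl == cl[i]
    if (sum(same) == 1) next                # singleton clusters keep s = 0
    a <- sum(d[i, same]) / (sum(same) - 1)  # mean distance within own cluster
    b <- mean(d[i, !same])                  # mean distance to the other cluster
    s[i] <- (b - a) / max(a, b)
  }
  mean(s)
}

# Two-cluster k-means; in 1-D the decision boundary is the midpoint of the centres
estimate_threshold_kmeans <- function(x) {
  km <- kmeans(x, centers = 2, nstart = 10)
  list(threshold = mean(km$centers),
       silhouette = mean_silhouette(x, km$cluster))
}

# Bimodality coefficient from bias-corrected skewness (G1) and excess kurtosis (G2)
bimodality_coefficient <- function(x) {
  n <- length(x)
  d <- x - mean(x)
  m2 <- mean(d^2); m3 <- mean(d^3); m4 <- mean(d^4)
  g1 <- m3 / m2^1.5
  g2 <- m4 / m2^2 - 3
  G1 <- sqrt(n * (n - 1)) / (n - 2) * g1
  G2 <- (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
  (G1^2 + 1) / (G2 + 3 * (n - 1)^2 / ((n - 2) * (n - 3)))
}

# Toy marker with a clear negative and positive population
set.seed(42)
x <- c(seq(0, 1, length.out = 50), seq(9, 10, length.out = 50))
res <- estimate_threshold_kmeans(x)
bc <- bimodality_coefficient(x)
is_bimodal <- bc > 5 / 9  # assumed cut-off: "blue" line if TRUE, "red" otherwise
```

For a unimodal marker, for example `qnorm(ppoints(100))`, the coefficient stays well below 5/9, which corresponds to the red-line case.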

Annotation

The Annotation tab supports annotation as a tree-based hierarchical process: the main cell types are defined first, followed by the identification of their subtypes (with the level of detail chosen by the user).

All clusters are initially defined as "unassigned". Then, once the positive and negative markers defining a population are selected, the clusters showing that expression pattern are re-assigned from the parent node to the child node.
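The re-assignment step can be sketched in a few lines of base R. This is a hypothetical illustration, not the CyCadas implementation: it assumes a median-expression matrix with one row per cluster, one threshold per marker, and a character vector holding each cluster's current node. A cluster in the parent node moves to the child node when all selected positive markers are above their thresholds and all negative markers are at or below them (the handling of values exactly at the threshold is an assumption). The function `assign_child` and the toy values are invented.

```r
# Hypothetical re-assignment of clusters from a parent node to a child node.
# expr:       median expression, one row per cluster, one column per marker
# thresholds: named numeric vector, one threshold per marker
# node:       current node label of every cluster
assign_child <- function(expr, thresholds, node, parent, child,
                         pos = character(0), neg = character(0)) {
  in_parent <- node == parent
  # Compare each marker column against its own threshold
  above <- sweep(expr, 2, thresholds[colnames(expr)], ">")
  matches <- rowSums(!above[, pos, drop = FALSE]) == 0 &  # all positives above
             rowSums(above[, neg, drop = FALSE]) == 0     # all negatives not above
  node[in_parent & matches] <- child
  node
}

# Toy example (invented values): four clusters, two markers
expr <- rbind(c(CD3 = 5,   CD8 = 4),
              c(CD3 = 5,   CD8 = 0.5),
              c(CD3 = 0.2, CD8 = 0.1),
              c(CD3 = 4,   CD8 = 3))
thresholds <- c(CD3 = 1, CD8 = 1)
node <- rep("unassigned", nrow(expr))

node <- assign_child(expr, thresholds, node, "unassigned", "T cells", pos = "CD3")
node <- assign_child(expr, thresholds, node, "T cells", "CD8+ T cells", pos = "CD8")
node <- assign_child(expr, thresholds, node, "T cells", "CD8- T cells", neg = "CD8")
print(node)
```

Because only clusters currently in the parent node are eligible, each cluster belongs to exactly one node at any time, which is what makes the result a tree rather than a set of overlapping gates.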

Scheme depicting the process of building the annotation tree:

Cropped fragment of the completed annotation tree:

Upon selecting a node, a heatmap is shown displaying the expression of all markers in all clusters belonging to that node.

Heatmap depicting phenotype of clusters annotated as CD8+ TEM cells:

Differential abundance analysis

In the Differential Abundance tab, a pairwise Wilcoxon test is performed on all nodes once the desired multiple testing correction method has been selected:

The DA Interactive Tree allows the user to explore the abundance of every defined subpopulation across conditions by selecting a node on the annotation tree.

Upon clicking on the desired node, the proportion of the selected cell type across the conditions is plotted.
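A minimal sketch of per-node differential abundance testing, under stated assumptions: per-sample proportions of each node are compared between two conditions with base R's `wilcox.test()`, and the resulting p-values are corrected across nodes with `p.adjust()`, whose `method` argument stands in for the correction method chosen in the tab. The toy proportions, node names and the restriction to exactly two conditions are invented for illustration; whether CyCadas corrects across nodes in exactly this way is an assumption.

```r
# Hypothetical per-sample proportions (% of cells) for two nodes, 4 samples per condition
props <- data.frame(
  node      = rep(c("CD8+ T cells", "B cells"), each = 8),
  condition = rep(rep(c("control", "patient"), each = 4), times = 2),
  prop      = c(10.1, 11.3, 9.8, 12.0, 15.2, 16.8, 14.9, 17.5,  # clearly shifted
                5.0, 6.2, 4.7, 5.9, 5.3, 6.0, 4.9, 5.6)         # overlapping
)

# Two-sided Wilcoxon rank-sum test per node (exact, since there are no ties)
p_raw <- sapply(split(props, props$node), function(d) {
  wilcox.test(prop ~ condition, data = d)$p.value
})

# Multiple testing correction across nodes (e.g. "BH", "bonferroni", "holm")
p_adj <- p.adjust(p_raw, method = "BH")
print(data.frame(node = names(p_raw), p_raw = p_raw, p_adj = p_adj, row.names = NULL))
```

With only four samples per group, even complete separation gives a raw p-value of 2/70 ≈ 0.029, so small cohorts leave little room after correction.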

Data export

Differential abundance analysis results, as well as the proportion table (% of defined cell populations across all samples), can be exported from the Differential Abundance tab.

Files needed to continue the analysis (the modified threshold values and the annotation tree structure) can be exported from the Thresholds and Annotation tabs, respectively, and re-loaded (Load tab) to continue the analysis.

Exporting annotation tree:

Exporting threshold values:

Owner

Name: Luxembourg Institute of Health
Login: DII-LIH-Luxembourg
Kind: organization
Location: 29, rue Henri Koch, L-4354 Esch-sur-Alzette

Website: www.lih.lu
Repositories: 2
Profile: https://github.com/DII-LIH-Luxembourg

Department of Infection and Immunity

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it using the following:"
title: "CyCadas"
authors:
  - family-names: "Hunewald"
    given-names: "Oliver"
    orcid: "https://orcid.org/0000-0001-5402-5084"  # optional
    affiliation: "Department of Infection and Immunity, Luxembourg Institute of Health, Esch-sur-Alzette, Luxembourg
      Bioinformatics & AI, Department of Medical Informatics, Luxembourg Institute of Health, Strassen, Luxembourg"
  - family-names: "Demczuk"
    given-names: "Agnieszka"
    orcid: "https://orcid.org/0000-0001-9868-7653"  # optional
    affiliation: "Department of Infection and Immunity, Luxembourg Institute of Health, Esch-sur-Alzette, Luxembourg
      Faculty of Science, Technology and Medicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg"
date-released: 2024-10-07
version: "1.0.0"
doi: "10.1234/bioinformatics/btae595"
url: "https://github.com/DII-LIH-Luxembourg/cycadas"
license: "MIT"

GitHub Events

Total

Issues event: 1
Issue comment event: 1
Push event: 9
Pull request review event: 2
Pull request event: 4
Create event: 4

Last Year

Issues event: 1
Issue comment event: 1
Push event: 9
Pull request review event: 2
Pull request event: 4
Create event: 4

Issues and Pull Requests

Last synced: 10 months ago

All Time

Total issues: 2
Total pull requests: 0
Average time to close issues: 20 days
Average time to close pull requests: N/A
Total issue authors: 1
Total pull request authors: 0
Average comments per issue: 0.5
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 2
Pull requests: 0
Average time to close issues: 20 days
Average time to close pull requests: N/A
Issue authors: 1
Pull request authors: 0
Average comments per issue: 0.5
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

oHunewald (1)
exaexa (1)

Pull Request Authors

exaexa (1)


Dependencies

DESCRIPTION cran

testthat >= 3.0.0 suggests
