Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.2%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: DII-LIH-Luxembourg
  • License: mit
  • Language: R
  • Default Branch: main
  • Size: 3.03 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 2
  • Open Issues: 4
  • Releases: 2
Created over 2 years ago · Last pushed 7 months ago
Metadata Files
Readme License Citation

README.md


Cytometry Cluster Annotation and Differential Abundance Suite

Efficient and reproducible cytometry data


Aims:

• facilitating the process of cluster annotation while reducing user bias

• improving reproducibility

Key features:

• defining the threshold of positive/negative marker expression

• interactive inspection of cluster phenotypes

• automatic merging of populations

• differential abundance analysis

Installation Instructions

```r
library(devtools)

# Install cycadas and all required packages
devtools::install_github("DII-LIH-Luxembourg/cycadas", dependencies = TRUE)

library(cycadas)

# Start the CyCadas Shiny app
cycadas()
```

Demo dataset

To enable exploration of the tool, we provide a demo dataset that can be loaded (Load tab → Demo Data) either as cluster expression data only (Load Cluster Expression Demo Data, which lets the user create the annotation) or as annotated data (Load Annotated Demo Data, which includes the annotation tree).

This demo dataset is generated from the publicly available mass cytometry data of patients with idiopathic Parkinson's disease and healthy controls (Capelle, C.M. et al., Nat Commun, 2023) that were clustered with GigaSOM to generate 1600 clusters.

Loading SingleCellExperiment Data (CATALYST)

This step is optional. If you wish to load data clustered with CATALYST or other tools that use the SingleCellExperiment format, please install:

```r
if (!require("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

BiocManager::install("CATALYST")
BiocManager::install("SingleCellExperiment")
```

Data input, Single Cell Format

```r
# CATALYST workflow using a SingleCellExperiment data object

# Preprocessing ...

# Cluster with CATALYST
sce <- cluster(sce, features = "type", xdim = 10, ydim = 10, maxK = 20, verbose = FALSE, seed = 1)

# Save the object as an .rds file
saveRDS(sce, "my_sce.rds")

# Load into CyCadas
# Annotate the desired (meta-)cluster levels
# Use the integrated merge function within CyCadas
# Save the object

# Load it back into your workflow
sce <- readRDS("Annotated_sce.rds")

# Continue downstream analysis
# ...
```

Data Input from FlowSOM

Median Expression and Cluster Frequencies:

```r
library(flowCore)  # provides fsApply
library(FlowSOM)

# Within your clustering workflow, create sample_ids according to the metadata files:
sample_ids <- rep(metadata$sample_id, fsApply(fcs, nrow))

fsom <- ReadInput(fcs, transform = FALSE, scale = FALSE)

set.seed(42)
som <- BuildSOM(fsom, colsToUse = lineage_markers, xdim = 20, ydim = 20, rlen = 40)

expr_median <- som$map$medianValues

# Calculate cluster frequencies (in percent)
clustering_table <- as.numeric(table(som$map$mapping[, 1]))
clustering_prop <- round(clustering_table / sum(clustering_table) * 100, 2)
df_prop <- as.data.frame(clustering_prop)
df_prop$cluster <- rownames(df_prop)

write.csv(expr_median, "expr_median.csv", row.names = FALSE)
write.csv(df_prop, "cluster_freq.csv")

# ----------------------------------------------------------------------------
# Generate the Proportion Table
# ----------------------------------------------------------------------------
counts_table <- table(som$map$mapping[, 1], sample_ids)
props_table <- t(t(counts_table) / colSums(counts_table)) * 100

props <- as.data.frame.matrix(props_table)

write.csv(props, "proportion_table.csv")
```
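The proportion-table step can be checked on a toy example before running it on real data. The sketch below is illustrative only: `cluster_of_cell` is a hypothetical stand-in for `som$map$mapping[, 1]`, and the two sample names are made up. Each column of the result should sum to 100.

```r
# Toy stand-ins (assumptions): cluster assignment per cell and sample id per cell
cluster_of_cell <- c(1, 1, 2, 3, 3, 3, 1, 2, 2, 3)
sample_ids <- rep(c("s1", "s2"), each = 5)

# Same computation as in the FlowSOM workflow: counts per cluster and sample,
# then column-wise normalisation to percentages
counts_table <- table(cluster_of_cell, sample_ids)
props_table <- t(t(counts_table) / colSums(counts_table)) * 100
props <- as.data.frame.matrix(props_table)

# Every sample (column) should add up to 100 %
col_totals <- colSums(props)
print(props)
```

Rows correspond to clusters and columns to samples, which is the layout written to `proportion_table.csv` above.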

Data Input from GigaSOM.jl

Median Expression and Cluster Frequencies:

```julia
using GigaSOM, DataFrames, CSV

gridSize = 20
nEpochs = 40

# Assume the dataset is loaded in distributed data info di, e.g. using loadFCSSet.
som = initGigaSOM(di, gridSize, gridSize, seed = 42) # set a seed value here
som = trainGigaSOM(som, di, epochs = nEpochs)
mapping_di = mapToGigaSOM(som, di)

num_cluster = gridSize^2

# Get the cluster frequencies for CyCadas:
clusterFreq = dcount(num_cluster, mapping_di)
df = DataFrame(cluster = 1:length(clusterFreq), clustering_prop = clusterFreq)
df.clustering_prop = df.clustering_prop ./ sum(df.clustering_prop)
CSV.write("cluster_freq.csv", df)

# Get the count table per fileID (optional - Count Table in CyCadas).
# Assume md is a data frame that describes the data
# (i.e., it contains a row for all filenames loaded in di in the same order,
# together with sample identifiers)
files = distributeFCSFileVector(:fileIDs, md[:, :filename])
counttbl = dcount_buckets(num_cluster, mapping_di, size(md, 1), files)
ct = DataFrame(counttbl, :auto)
rename!(ct, md.sample_id)
CSV.write("clustercounts.csv", ct)

# Get the median marker expressions.
# Assume lineage_markers is a human-readable list of markers used in clustering
# (here used for annotating the median expression table).
# cols is not defined in this snippet; it is assumed to hold the column
# indices of the clustering markers in di.
exprtbl = dmedian_buckets(di, num_cluster, mapping_di, cols)
et = DataFrame(exprtbl, :auto)
rename!(et, lineage_markers)
CSV.write("median_expr.csv", et)
```

A detailed workflow for each method can be found in the data section.

Data exploration

The UMAP interactive tab allows the preview of marker expression in the clusters selected by the user on the UMAP:

In the UMAP Marker expression tab, the user can investigate the expression level of the selected marker across all clusters.

Thresholds

In the Thresholds tab, the threshold separating negative from positive expression is estimated for each marker using 1-dimensional k-means clustering and Mclust. For each marker, a silhouette score selects the best estimate. Bimodality is assessed for every marker and the bimodality coefficient is reported. The threshold line is blue if the data meet the bimodal distribution criteria and red otherwise. The threshold value can be adjusted manually by clicking on the scatterplot.

Expression of CD8a with blue threshold line indicating the bimodal distribution:

Expression of TCRgd with red threshold line indicating that this marker expression does not follow the bimodal distribution:
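As a rough illustration of the idea behind the Thresholds tab, the sketch below estimates a threshold with 1-dimensional k-means and computes a bimodality coefficient in base R. It is not the CyCadas implementation: `estimate_threshold`, `bimodality_coefficient` and `line_colour` are hypothetical helpers, the Mclust and silhouette steps are left out, the coefficient uses simple moment estimators, and the 5/9 cut-off is the commonly cited convention rather than a value confirmed for CyCadas.

```r
# Hypothetical helper: 1-D k-means threshold.
# In one dimension, the boundary between two k-means clusters is the
# midpoint between the two cluster centers.
estimate_threshold <- function(x) {
  km <- kmeans(x, centers = 2, nstart = 10)
  mean(km$centers[, 1])
}

# Hypothetical helper: bimodality coefficient from moment-based skewness
# and excess kurtosis (assumed definition, with the usual finite-sample term).
bimodality_coefficient <- function(x) {
  n <- length(x)
  d <- x - mean(x)
  m2 <- mean(d^2)
  skew <- mean(d^3) / m2^1.5
  excess_kurt <- mean(d^4) / m2^2 - 3
  (skew^2 + 1) / (excess_kurt + 3 * (n - 1)^2 / ((n - 2) * (n - 3)))
}

# Assumed cut-off: values above 5/9 are treated as bimodal.
line_colour <- function(bc) if (bc > 5 / 9) "blue" else "red"

# Simulated marker expression: one clearly bimodal, one unimodal marker
set.seed(1)
bimodal  <- c(rnorm(200, mean = 0, sd = 0.3), rnorm(200, mean = 4, sd = 0.3))
unimodal <- rnorm(400, mean = 2, sd = 1)

thr         <- estimate_threshold(bimodal)
bc_bimodal  <- bimodality_coefficient(bimodal)
bc_unimodal <- bimodality_coefficient(unimodal)
```

With two well-separated modes the estimated threshold falls between them and the coefficient exceeds the cut-off, while the unimodal marker stays below it.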

Annotation

The Annotation tab supports annotation as a tree-based hierarchical process: the main cell types are defined first, followed by their subtypes (with the level of detail defined by the user).

All clusters are initially labeled "unassigned". Once the positive and negative markers defining a population are selected, clusters with the matching expression pattern are re-assigned from the parent node to the child node.
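The re-assignment step can be pictured with a small base R sketch. Everything here is illustrative: `select_clusters` is a hypothetical helper rather than a CyCadas function, and the expression values, marker set and thresholds are made up. A cluster counts as positive for a marker when its median expression exceeds that marker's threshold.

```r
# Toy median expression matrix: rows are clusters, columns are markers
expr <- matrix(
  c(0.9, 0.8, 0.1,
    0.9, 0.1, 0.2,
    0.1, 0.2, 0.9),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("c1", "c2", "c3"), c("CD3", "CD8a", "CD19"))
)
thresholds <- c(CD3 = 0.5, CD8a = 0.5, CD19 = 0.5)

# Hypothetical helper: names of clusters that are positive for all `pos`
# markers and negative for all `neg` markers
select_clusters <- function(expr, thresholds, pos = character(), neg = character()) {
  is_pos <- sweep(expr, 2, thresholds[colnames(expr)], FUN = ">")
  keep <- rep(TRUE, nrow(expr))
  if (length(pos) > 0) keep <- keep & apply(is_pos[, pos, drop = FALSE], 1, all)
  if (length(neg) > 0) keep <- keep & !apply(is_pos[, neg, drop = FALSE], 1, any)
  rownames(expr)[keep]
}

# Every cluster starts at the root as "unassigned"
annotation <- setNames(rep("unassigned", nrow(expr)), rownames(expr))

# First level: move CD3+ CD19- clusters to a "T cells" node
t_cells <- select_clusters(expr, thresholds, pos = "CD3", neg = "CD19")
annotation[t_cells] <- "T cells"

# Second level: only clusters in the parent node can move to its child
parent_expr <- expr[annotation == "T cells", , drop = FALSE]
cd8 <- select_clusters(parent_expr, thresholds, pos = c("CD3", "CD8a"))
annotation[cd8] <- "CD8+ T cells"

print(annotation)
```

Restricting the second selection to the parent's clusters mirrors the tree structure: a child node can only receive clusters that currently belong to its parent.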

Scheme depicting the process of building the annotation tree:

Cropped fragment of the completed annotation tree:

When a node is selected, a heatmap shows the expression of all markers in all clusters belonging to that node.

Heatmap depicting phenotype of clusters annotated as CD8+ TEM cells:

Differential abundance analysis

In the Differential Abundance tab, a pairwise Wilcoxon test is performed on all nodes once the desired multiple testing correction method is selected:

The DA Interactive Tree allows exploring the abundance of all defined subpopulations across conditions by selecting a node on the annotation tree.

Upon clicking on the desired node, the proportion of the selected cell type across the conditions is plotted.
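The kind of test described in this section can be sketched in base R with `wilcox.test` and `p.adjust`. This is a minimal illustration under stated assumptions, not the CyCadas code: the proportions, node names and conditions are invented, and Benjamini-Hochberg ("BH") is just one of the correction methods `p.adjust` offers.

```r
# Toy proportion table: % of cells per sample for two annotated nodes
props <- data.frame(
  condition = rep(c("control", "patient"), each = 5),
  CD8_TEM   = c(10.1, 11.3, 9.8, 10.7, 11.0, 14.2, 15.1, 13.8, 14.9, 15.5),
  B_cells   = c(8.2, 7.9, 8.8, 8.4, 7.6, 8.0, 8.5, 7.7, 8.9, 8.1)
)
nodes <- c("CD8_TEM", "B_cells")

# One two-sample Wilcoxon rank-sum test per node
p_raw <- sapply(nodes, function(node) {
  x <- props[props$condition == "control", node]
  y <- props[props$condition == "patient", node]
  wilcox.test(x, y)$p.value
})

# Multiple testing correction across all nodes
p_adj <- p.adjust(p_raw, method = "BH")

result <- data.frame(node = nodes, p_raw = p_raw, p_adj = p_adj)
print(result)
```

With more than two conditions, `pairwise.wilcox.test` from the stats package runs all pairwise comparisons for one node and takes the correction method through its `p.adjust.method` argument.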

Data export

Differential abundance analysis results, as well as the proportion table (% of defined cell populations across all samples), can be exported in the Differential Abundance tab.

Files that allow the analysis to be continued (modified threshold values and the annotation tree structure) can be exported from the Thresholds and Annotation tabs, respectively, and re-loaded in the Load tab.

Exporting annotation tree:

Exporting threshold values:

Owner

  • Name: Luxembourg Institute of Health
  • Login: DII-LIH-Luxembourg
  • Kind: organization
  • Location: 29, rue Henri Koch, L-4354 Esch-sur-Alzette

Department of Infection and Immunity

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it using the following:"
title: "CyCadas"
authors:
  - family-names: "Hunewald"
    given-names: "Oliver"
    orcid: "https://orcid.org/0000-0001-5402-5084"  # optional
    affiliation: "Department of Infection and Immunity, Luxembourg Institute of Health, Esch-sur-Alzette, Luxembourg; Bioinformatics & AI, Department of Medical Informatics, Luxembourg Institute of Health, Strassen, Luxembourg"
  - family-names: "Demczuk"
    given-names: "Agnieszka"
    orcid: "https://orcid.org/0000-0001-9868-7653"  # optional
    affiliation: "Department of Infection and Immunity, Luxembourg Institute of Health, Esch-sur-Alzette, Luxembourg; Faculty of Science, Technology and Medicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg"
date-released: 2024-10-07
version: "1.0.0"
doi: "10.1093/bioinformatics/btae595"
url: "https://github.com/DII-LIH-Luxembourg/cycadas"
license: "MIT"

GitHub Events

Total
  • Issues event: 1
  • Issue comment event: 1
  • Push event: 9
  • Pull request review event: 2
  • Pull request event: 4
  • Create event: 4
Last Year
  • Issues event: 1
  • Issue comment event: 1
  • Push event: 9
  • Pull request review event: 2
  • Pull request event: 4
  • Create event: 4

Issues and Pull Requests


All Time
  • Total issues: 2
  • Total pull requests: 0
  • Average time to close issues: 20 days
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 0
  • Average comments per issue: 0.5
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 0
  • Average time to close issues: 20 days
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 0.5
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • oHunewald (1)
  • exaexa (1)
Pull Request Authors
  • exaexa (1)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

DESCRIPTION cran
  • testthat >= 3.0.0 suggests