GigaSOM

Huge-scale, high-performance flow cytometry clustering in Julia

https://github.com/lcsb-biocore/gigasom.jl

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (17.1%) to scientific vocabulary

Keywords

artifical-neural-network artificial-intelligence clustering clustering-methods cytof cytometry flow-cytometry huge-scale immunology large-scale mass-cytometry neural-networks self-organizing-map som

Keywords from Contributors

fluxes pdes tracers surrogate standardization polygons minkowski-sum lazy-evaluation geometry-algorithms formal-verification
Last synced: 6 months ago

Repository

Huge-scale, high-performance flow cytometry clustering in Julia

Basic Info
Statistics
  • Stars: 35
  • Watchers: 2
  • Forks: 9
  • Open Issues: 1
  • Releases: 27
Created almost 7 years ago · Last pushed 6 months ago
Metadata Files
Readme Contributing License Code of conduct Citation

README.md

GigaSOM.jl

Huge-scale, high-performance flow cytometry clustering

GigaSOM is a Julia toolkit for clustering and visualisation of really large cytometry data. Most generally, it can load FCS files, perform transformation and cleaning operations on their contents, run FlowSOM-style clustering, and visualize and export the results. GigaSOM is distributed and parallel in nature, which makes processing huge datasets a breeze: a hundred million cells with a few dozen parameters can be clustered and visualized in a few minutes.

| Documentation | Test Coverage | CI | SciCrunch |
|:-------------:|:-------------:|:--:|:---------:|
| doc | coverage status | linux | rrid |

If you use GigaSOM.jl and want to refer to it in your work, use the following citation format (also available as BibTeX in gigasom.bib):

Miroslav Kratochvíl, Oliver Hunewald, Laurent Heirendt, Vasco Verissimo, Jiří Vondrášek, Venkata P Satagopam, Reinhard Schneider, Christophe Trefois, Markus Ollert. GigaSOM.jl: High-performance clustering and visualization of huge cytometry datasets. GigaScience, Volume 9, Issue 11, November 2020, giaa127, https://doi.org/10.1093/gigascience/giaa127

How to get started

Prerequisites and requirements

  • Operating system: Use Linux (Debian, Ubuntu or CentOS), macOS, or Windows 10 as your operating system. GigaSOM has been tested on these systems.
  • Julia language: In order to use GigaSOM, you need to install Julia 1.0 or higher. Download and installation instructions are available on the official Julia website.
  • Hardware requirements: GigaSOM runs on any hardware that can run Julia, and can easily use resources from multiple computers interconnected by a network. For processing large datasets, you need to ensure that the total amount of available RAM on all involved computers is larger than the data size.
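To get a feeling for the RAM requirement above, the following sketch estimates the size of a dense expression matrix. It assumes every value is stored as a 64-bit float (8 bytes); the helper name `estimate_matrix_gib` is hypothetical and not part of GigaSOM, and the real memory use will be somewhat higher because of working buffers.

```julia
# Rough lower bound on the RAM needed to hold a dense cells-by-parameters matrix.
# Assumption: each value occupies `bytes_per_value` bytes (8 for Float64).
function estimate_matrix_gib(n_cells::Integer, n_params::Integer; bytes_per_value::Integer = 8)
    return n_cells * n_params * bytes_per_value / 2^30
end

# A hundred million cells with 40 parameters:
println(round(estimate_matrix_gib(100_000_000, 40), digits = 1), " GiB")
```

Under these assumptions such a dataset needs roughly 30 GiB in total, which can be split across several machines.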

:bulb: If you are new to Julia, it is advisable to familiarize yourself with the environment first. Use the full Julia documentation to solve various possible language-related problems, and the Julia package manager docs to solve installation-related difficulties.

Installation

Using the Julia package manager to install GigaSOM is easy -- after starting Julia, type:

```julia
import Pkg; Pkg.add("GigaSOM");
```

All these commands should be run from Julia at the julia> prompt.

Then you can load the GigaSOM package and start using it:

```julia
using GigaSOM
```

The first loading of the GigaSOM package may take several minutes to complete due to precompilation of the sources, especially on a fresh Julia install.

Test the installation

If you run a non-standard platform (e.g. a customized operating system), or if you made any modifications to the GigaSOM source code, you may want to run the test suite to ensure that everything works as expected:

```julia
import Pkg; Pkg.test("GigaSOM");
```

For debugging, it is sometimes very useful to enable the @debug messages from the source, as follows:

```julia
using Logging
global_logger(ConsoleLogger(stderr, Logging.Debug))
```
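If changing the global logger is too noisy, the standard `Logging` library also allows enabling debug messages for a single block of code only. The sketch below uses `with_logger` from the Julia standard library; `with_debug` and `noisy_step` are hypothetical names introduced for illustration, with `noisy_step` standing in for any call that emits `@debug` messages.

```julia
using Logging

# Run `f` with debug-level messages printed to `io`,
# leaving the global logger untouched.
with_debug(f; io::IO = stderr) = with_logger(f, ConsoleLogger(io, Logging.Debug))

# Hypothetical stand-in for a call that emits @debug messages.
function noisy_step()
    @debug "intermediate state" value = 42
    return :done
end

result = with_debug(noisy_step)  # debug output appears only for this call
```

Outside of `with_debug`, the previously active logger and its level remain in effect.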

How to use GigaSOM

Comprehensive documentation is available online; several introductory tutorials of increasing complexity are also included.

A very basic dataset (Levine13 from FR-FCM-ZZPH) can be loaded, clustered and visualized as follows:

```julia
using GigaSOM

params, fcsmatrix = loadFCS("Levine_13dim.fcs")  # load the FCS file

exprs = fcsmatrix[:,1:13]  # extract only the data columns with expression values

som = initGigaSOM(exprs, 20, 20)     # random initialization of the SOM codebook
som = trainGigaSOM(som, exprs)       # SOM training
clusters = mapToGigaSOM(som, exprs)  # extraction of per-cell cluster IDs
e = embedGigaSOM(som, exprs)         # EmbedSOM projection to 2D
```

The example loads the data, runs the SOM training (as in FlowSOM) and computes a 2D projection of the dataset (using EmbedSOM); the total computation time (excluding the possible precompilation of the libraries) should be around 15 seconds.
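To make the mapping step more concrete, here is a toy sketch of what assigning cells to SOM codebook vectors means: each cell receives the index of its nearest codebook vector. This is an illustration under simplifying assumptions (plain matrices, squared Euclidean distance, a single process), not GigaSOM's implementation; `nearest_codes` is a hypothetical helper.

```julia
# Toy illustration only: assign each row of `data` to the closest row of `codes`.
function nearest_codes(data::AbstractMatrix, codes::AbstractMatrix)
    ids = Vector{Int}(undef, size(data, 1))
    for i in axes(data, 1)
        best, bestdist = 0, Inf
        for k in axes(codes, 1)
            # squared Euclidean distance between cell i and codebook vector k
            d = sum(abs2, view(data, i, :) .- view(codes, k, :))
            if d < bestdist
                best, bestdist = k, d
            end
        end
        ids[i] = best
    end
    return ids
end

data  = [0.0 0.1; 0.9 1.0; 0.2 0.0]  # 3 toy "cells" with 2 parameters
codes = [0.0 0.0; 1.0 1.0]           # 2 toy codebook vectors
ids = nearest_codes(data, codes)

# Number of cells assigned to each codebook vector
counts = [count(==(k), ids) for k in axes(codes, 1)]
```

In this toy input the first and third cells fall to the first codebook vector and the second cell to the second one.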

The results can be visualized e.g. with GigaScatter, which we developed for this purpose, or by exporting the data and plotting them with any other programming language. For example, to save an embedding with highlighted expression of CD4, you can install and use GigaScatter as follows:

```julia
import Pkg; Pkg.add("GigaScatter")
using GigaScatter

savePNG("Levine13-CD4.png",
  solidBackground(rasterize((500,500),        # bitmap size
    Matrix{Float64}(e'),                      # the embedding coordinates
    expressionColors(
      scaleNorm(Array{Float64}(exprs[:,5])),  # 5th column contains CD4 expressions
      expressionPalette(100, alpha=0.5)))))   # colors for plotting (based on RdYlBu)
```

The output may look like this (blue is negative expression, red is positive):

Levine13 embedding with CD4 highlighted
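For the alternative route of exporting the data and plotting elsewhere, a minimal sketch using the `DelimitedFiles` standard library could look as follows. It assumes the embedding is an n-by-2 matrix with one row per cell (here replaced by the toy matrix `e_toy`); `export_embedding`, the `x,y` header and the output path are hypothetical choices, not part of GigaSOM.

```julia
using DelimitedFiles

# Toy stand-in for an n-by-2 embedding matrix (one row per cell).
e_toy = [0.5 1.5; 2.0 3.0; 4.5 0.25]

# Write the embedding as CSV with a header row; returns the path written.
function export_embedding(path::AbstractString, embedding::AbstractMatrix)
    open(path, "w") do io
        println(io, "x,y")            # header row
        writedlm(io, embedding, ',')  # one cell per line
    end
    return path
end

csv_path = export_embedding(joinpath(tempdir(), "embedding.csv"), e_toy)
```

The resulting file can be read by most plotting tools; per-cell cluster IDs or expression values could be appended as further columns in the same way.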

Feedback, issues, questions

Please follow the contributing guide when you have questions, want to raise issues, or just want to leave us some feedback!

Owner

  • Name: Luxembourg Centre for Systems Biomedicine
  • Login: LCSB-BioCore
  • Kind: organization
  • Location: Luxembourg

Citation (CITATION.cff)

cff-version: 1.2.0
title: GigaSOM.jl
message: >-
  If you use GigaSOM.jl and want to refer to it in
  your work, use this citation.
type: software
authors:
  - given-names: Miroslav
    family-names: Kratochvíl
    orcid: https://orcid.org/0000-0001-7356-4075
  - given-names: Oliver
    family-names: Hunewald
    orcid: https://orcid.org/0000-0001-5402-5084
  - given-names: Laurent
    family-names: Heirendt
    orcid: https://orcid.org/0000-0003-1861-0037
  - given-names: Vasco
    family-names: Verissimo
    orcid: https://orcid.org/0000-0003-3884-9125
  - given-names: Jiří
    family-names: Vondrášek
    orcid: https://orcid.org/0000-0002-6066-973X
  - given-names: Venkata P
    family-names: Satagopam
    orcid: https://orcid.org/0000-0002-6532-5880
  - given-names: Reinhard
    family-names: Schneider
    orcid: https://orcid.org/0000-0002-8278-1618
  - given-names: Christophe
    family-names: Trefois
    orcid: https://orcid.org/0000-0002-8991-6810
  - given-names: Markus
    family-names: Ollert
    orcid: https://orcid.org/0000-0002-8055-0103
repository-code: 'https://github.com/LCSB-BioCore/GigaSOM.jl'
date-released: 2019-07-23

preferred-citation:
  type: article
  title: "GigaSOM.jl: High-performance clustering and visualization of huge cytometry datasets"
  authors:
    - given-names: Miroslav
      family-names: Kratochvíl
      orcid: https://orcid.org/0000-0001-7356-4075
    - given-names: Oliver
      family-names: Hunewald
      orcid: https://orcid.org/0000-0001-5402-5084
    - given-names: Laurent
      family-names: Heirendt
      orcid: https://orcid.org/0000-0003-1861-0037
    - given-names: Vasco
      family-names: Verissimo
      orcid: https://orcid.org/0000-0003-3884-9125
    - given-names: Jiří
      family-names: Vondrášek
      orcid: https://orcid.org/0000-0002-6066-973X
    - given-names: Venkata P
      family-names: Satagopam
      orcid: https://orcid.org/0000-0002-6532-5880
    - given-names: Reinhard
      family-names: Schneider
      orcid: https://orcid.org/0000-0002-8278-1618
    - given-names: Christophe
      family-names: Trefois
      orcid: https://orcid.org/0000-0002-8991-6810
    - given-names: Markus
      family-names: Ollert
      orcid: https://orcid.org/0000-0002-8055-0103
  doi: "10.1093/gigascience/giaa127"
  journal: GigaScience
  volume: 9
  issue: 11
  year: 2020
  month: November
  issn: 2047-217X
  url: "https://academic.oup.com/gigascience/article/9/11/giaa127/5987271"
  abstract: "The amount of data generated in large clinical and phenotyping studies that use single-cell cytometry is constantly growing. Recent technological advances allow the easy generation of data with hundreds of millions of single-cell data points with >40 parameters, originating from thousands of individual samples. The analysis of that amount of high-dimensional data becomes demanding in both hardware and software of high-performance computational resources. Current software tools often do not scale to the datasets of such size; users are thus forced to downsample the data to bearable sizes, in turn losing accuracy and ability to detect many underlying complex phenomena. We present GigaSOM.jl, a fast and scalable implementation of clustering and dimensionality reduction for flow and mass cytometry data. The implementation of GigaSOM.jl in the high-level and high-performance programming language Julia makes it accessible to the scientific community and allows for efficient handling and processing of datasets with billions of data points using distributed computing infrastructures. We describe the design of GigaSOM.jl, measure its performance and horizontal scaling capability, and showcase the functionality on a large dataset from a recent study. GigaSOM.jl facilitates the use of commonly available high-performance computing resources to process the largest available datasets within minutes, while producing results of the same quality as the current state-of-art software. Measurements indicate that the performance scales to much larger datasets. The example use on the data from a massive mouse phenotyping effort confirms the applicability of GigaSOM.jl to huge-scale studies."

GitHub Events

Total
  • Watch event: 2
  • Issue comment event: 1
  • Push event: 16
  • Create event: 1
Last Year
  • Watch event: 2
  • Issue comment event: 1
  • Push event: 16
  • Create event: 1

Committers

Last synced: 10 months ago

All Time
  • Total Commits: 1,130
  • Total Committers: 7
  • Avg Commits per committer: 161.429
  • Development Distribution Score (DDS): 0.583
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
|:-----|:------|--------:|
| Laurent Heirendt | l****t@u****u | 471 |
| oHunewald | o****d@g****m | 262 |
| Vasco VERISSIMO | v****o@u****u | 217 |
| Mirek Kratochvil | e****a@g****m | 160 |
| github-actions[bot] | 4****] | 18 |
| Julia TagBot | 5****t | 1 |
| CompatHelper Julia | c****y@j****g | 1 |

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 26
  • Total pull requests: 74
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 6 days
  • Total issue authors: 8
  • Total pull request authors: 3
  • Average comments per issue: 2.81
  • Average comments per pull request: 0.91
  • Merged pull requests: 72
  • Bot issues: 0
  • Bot pull requests: 13
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • laurentheirendt (15)
  • oHunewald (5)
  • kyle11rd (1)
  • fransua (1)
  • smahoff (1)
  • rfourquet (1)
  • gszep (1)
  • JuliaTagBot (1)
Pull Request Authors
  • laurentheirendt (33)
  • exaexa (28)
  • github-actions[bot] (13)
Top Labels
Issue Labels
documentation/tutorials (1)
Pull Request Labels
ready to merge (4) documentation/tutorials (3)

Packages

  • Total packages: 1
  • Total downloads:
    • julia 6 total
  • Total dependent packages: 2
  • Total dependent repositories: 0
  • Total versions: 24
juliahub.com: GigaSOM

Huge-scale, high-performance flow cytometry clustering in Julia

  • Versions: 24
  • Dependent Packages: 2
  • Dependent Repositories: 0
  • Downloads: 6 Total
Rankings
Dependent repos count: 9.9%
Forks count: 14.9%
Average: 15.4%
Dependent packages count: 16.6%
Stargazers count: 20.4%
Last synced: 6 months ago

Dependencies

.github/workflows/TagBot.yml actions
  • JuliaRegistries/TagBot v1 composite
.github/workflows/ci.yml actions
  • actions/cache v1 composite
  • actions/checkout v2 composite
  • codecov/codecov-action v1 composite
  • julia-actions/julia-buildpkg latest composite
  • julia-actions/julia-processcoverage v1 composite
  • julia-actions/julia-runtest latest composite
  • julia-actions/setup-julia v1 composite
.github/workflows/docker.yml actions
  • actions/checkout v2 composite
  • mr-smithers-excellent/docker-build-push v5 composite
.github/workflows/docs.yml actions
  • actions/checkout v2 composite
  • julia-actions/setup-julia latest composite
Dockerfile docker
  • julia latest build