GigaSOM
Huge-scale, high-performance flow cytometry clustering in Julia
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 2 DOI reference(s) in README
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (17.1%) to scientific vocabulary
Repository
Huge-scale, high-performance flow cytometry clustering in Julia
Basic Info
- Host: GitHub
- Owner: LCSB-BioCore
- License: apache-2.0
- Language: Julia
- Default Branch: master
- Homepage: https://lcsb-biocore.github.io/GigaSOM.jl/
- Size: 3.95 MB
Statistics
- Stars: 35
- Watchers: 2
- Forks: 9
- Open Issues: 1
- Releases: 27
Metadata Files
README.md

GigaSOM.jl
Huge-scale, high-performance flow cytometry clustering
GigaSOM is a Julia toolkit for clustering and visualization of very large cytometry datasets. It can load FCS files, perform transformation and cleaning operations on their contents, run FlowSOM-style clustering, and visualize and export the results. GigaSOM is distributed and parallel by design, which makes processing huge datasets a breeze -- hundreds of millions of cells with a few dozen parameters can be clustered and visualized in a few minutes.
If you use GigaSOM.jl and want to refer to it in your work, use the following citation format (also available as BibTeX in gigasom.bib):
Miroslav Kratochvíl, Oliver Hunewald, Laurent Heirendt, Vasco Verissimo, Jiří Vondrášek, Venkata P Satagopam, Reinhard Schneider, Christophe Trefois, Markus Ollert. GigaSOM.jl: High-performance clustering and visualization of huge cytometry datasets. GigaScience, Volume 9, Issue 11, November 2020, giaa127, https://doi.org/10.1093/gigascience/giaa127
How to get started
Prerequisites and requirements
- Operating system: Use Linux (Debian, Ubuntu or CentOS), macOS, or Windows 10 as your operating system. GigaSOM has been tested on these systems.
- Julia language: In order to use GigaSOM, you need to install Julia 1.0 or higher. The download and installation instructions are available on the Julia website.
- Hardware requirements: GigaSOM runs on any hardware that can run Julia, and can easily use resources from multiple computers interconnected by a network. For processing large datasets, you need to ensure that the total amount of available RAM on all involved computers is larger than the data size.
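To get a feeling for the RAM requirement above, the raw size of a dataset can be estimated before loading it. The following is only a back-of-the-envelope sketch: `datasetBytes` is a hypothetical helper (not a GigaSOM function), and it assumes that every value is stored as a dense 8-byte `Float64`, without accounting for any working memory needed during the computation.

```julia
# Hypothetical helper (not part of GigaSOM): raw size in bytes of a dense
# cells × parameters matrix, assuming one value of type T per entry.
datasetBytes(ncells::Integer, nparams::Integer; T::Type=Float64) =
    ncells * nparams * sizeof(T)

bytes = datasetBytes(100_000_000, 40)  # 100 million cells, 40 parameters
gib = bytes / 2^30                     # convert bytes to GiB

println("Raw data size: ", round(gib, digits=1), " GiB")
println("RAM on this machine: ", round(Sys.total_memory() / 2^30, digits=1), " GiB")
```

If the first number exceeds the second, the dataset will most likely need to be spread over several computers.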
:bulb: If you are new to Julia, it is advisable to familiarize yourself with the environment first. Use the full Julia documentation to solve language-related problems, and the Julia package manager docs to solve installation-related difficulties.
Installation
Using the Julia package manager to install GigaSOM is easy -- after starting Julia, type:
```julia
import Pkg; Pkg.add("GigaSOM");
```
All these commands should be run from Julia at the `julia>` prompt.
Then you can load the GigaSOM package and start using it:
```julia
using GigaSOM
```
The first loading of the GigaSOM package may take several minutes to complete due to precompilation of the sources, especially on a fresh Julia install.
Test the installation
If you run a non-standard platform (e.g. a customized operating system), or if you made any modifications to the GigaSOM source code, you may want to run the test suite to ensure that everything works as expected:
```julia
import Pkg; Pkg.test("GigaSOM");
```
For debugging, it is sometimes very useful to enable the @debug messages from the source, as such:
```julia
using Logging
global_logger(ConsoleLogger(stderr, Logging.Debug))
```
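If changing the global logger is too intrusive, the debug output can also be limited to a single block of code with `with_logger` from the standard `Logging` library. This is a minimal sketch; the arithmetic inside the block is only a stand-in for whichever call you are debugging.

```julia
using Logging

# Logger that lets @debug messages through
debuglogger = ConsoleLogger(stderr, Logging.Debug)

# The debug logger is active only inside the do-block;
# the previous logger is restored afterwards.
result = with_logger(debuglogger) do
    @debug "this message is printed"
    1 + 1  # stand-in for the call being debugged
end

# Hidden again, unless a debug-level global logger was installed before
@debug "this message is not printed by the default logger"
```

`with_logger` returns the value of the block, so the surrounding code does not need to change.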
How to use GigaSOM
Comprehensive documentation is available online; several introductory tutorials of increasing complexity are also included.
A very basic dataset (Levine13 from FR-FCM-ZZPH) can be loaded, clustered and visualized as follows:
```julia
using GigaSOM

params, fcsmatrix = loadFCS("Levine_13dim.fcs")  # load the FCS file

exprs = fcsmatrix[:,1:13]  # extract only the data columns with expression values

som = initGigaSOM(exprs, 20, 20)     # random initialization of the SOM codebook
som = trainGigaSOM(som, exprs)       # SOM training
clusters = mapToGigaSOM(som, exprs)  # extraction of per-cell cluster IDs
e = embedGigaSOM(som, exprs)         # EmbedSOM projection to 2D
```
The example loads the data, runs the SOM training (as in FlowSOM) and computes a 2D projection of the dataset (using EmbedSOM); the total computation time (excluding the possible precompilation of the libraries) should be around 15 seconds.
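To illustrate what the extraction of per-cell cluster IDs means conceptually, the sketch below assigns each cell to the closest codebook vector, which is the usual convention in SOM-based clustering. It is a plain-Julia illustration on toy data, not GigaSOM's implementation; `nearestCode` is a hypothetical name, and the squared Euclidean distance is an assumption.

```julia
# Conceptual sketch only: for each row of `data` (one cell), find the index
# of the nearest row of `codebook` by squared Euclidean distance.
function nearestCode(codebook::AbstractMatrix, data::AbstractMatrix)
    size(codebook, 2) == size(data, 2) || error("dimension mismatch")
    ids = Vector{Int}(undef, size(data, 1))
    for i in axes(data, 1)
        best, bestdist = 0, Inf
        for k in axes(codebook, 1)
            d = 0.0
            for j in axes(data, 2)
                d += (data[i, j] - codebook[k, j])^2
            end
            if d < bestdist
                best, bestdist = k, d
            end
        end
        ids[i] = best
    end
    return ids
end

codebook = [0.0 0.0; 10.0 10.0]        # 2 codebook vectors in 2 dimensions
data = [1.0 -1.0; 9.0 11.0; 0.5 0.2]   # 3 toy "cells"
ids = nearestCode(codebook, data)      # per-cell cluster IDs
```

The nested loops are chosen for readability; a real implementation on hundreds of millions of cells would use a faster nearest-neighbor search.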
The results can be visualized, for example, with GigaScatter, which we developed for this purpose, or by exporting the data and plotting them with any other programming language. For example, to save an embedding with highlighted expression of CD4, you can install and use GigaScatter as follows:
```julia
import Pkg; Pkg.add("GigaScatter")
using GigaScatter

savePNG("Levine13-CD4.png",
  solidBackground(rasterize((500,500),        # bitmap size
    Matrix{Float64}(e'),                      # the embedding coordinates
    expressionColors(
      scaleNorm(Array{Float64}(exprs[:,5])),  # 5th column contains CD4 expressions
      expressionPalette(100, alpha=0.5)))))   # colors for plotting (based on RdYlBu)
```
In the resulting image, blue marks negative expression and red marks positive expression.
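For plotting in another language, the embedding can be exported to a CSV file using the standard `DelimitedFiles` library. The sketch below uses random stand-in data so that it runs on its own; the file name, the column names, and the layout with one row per cell (suggested by the transposition `e'` in the GigaScatter call) are assumptions, and `clusterids` is a hypothetical plain vector of cluster IDs.

```julia
using DelimitedFiles  # standard library

# Stand-in data; in the example above these would come from
# embedGigaSOM and mapToGigaSOM. Assumed layout: one row per cell.
e = rand(5, 2)
clusterids = [1, 2, 2, 3, 1]

outfile = joinpath(mktempdir(), "embedding.csv")
open(outfile, "w") do io
    println(io, "x,y,cluster")              # header line
    writedlm(io, hcat(e, clusterids), ',')  # one line per cell
end

# Read the file back, skipping the header, to check the export
table = readdlm(outfile, ',', skipstart=1)
```

Because `hcat` promotes the cluster IDs to floating-point numbers, they are written as `1.0`, `2.0`, and so on; convert them back to integers on the reading side if needed.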

Feedback, issues, questions
Please follow the contributing guide when you have questions, want to raise issues, or just want to leave us some feedback!
Owner
- Name: Luxembourg Centre for Systems Biomedicine
- Login: LCSB-BioCore
- Kind: organization
- Location: Luxembourg
- Website: https://wwwen.uni.lu/lcsb/
- Repositories: 9
- Profile: https://github.com/LCSB-BioCore
Citation (CITATION.cff)
cff-version: 1.2.0
title: GigaSOM.jl
message: >-
  If you use GigaSOM.jl and want to refer to it in
  your work, use this citation.
type: software
authors:
  - given-names: Miroslav
    family-names: Kratochvíl
    orcid: https://orcid.org/0000-0001-7356-4075
  - given-names: Oliver
    family-names: Hunewald
    orcid: https://orcid.org/0000-0001-5402-5084
  - given-names: Laurent
    family-names: Heirendt
    orcid: https://orcid.org/0000-0003-1861-0037
  - given-names: Vasco
    family-names: Verissimo
    orcid: https://orcid.org/0000-0003-3884-9125
  - given-names: Jiří
    family-names: Vondrášek
    orcid: https://orcid.org/0000-0002-6066-973X
  - given-names: Venkata P
    family-names: Satagopam
    orcid: https://orcid.org/0000-0002-6532-5880
  - given-names: Reinhard
    family-names: Schneider
    orcid: https://orcid.org/0000-0002-8278-1618
  - given-names: Christophe
    family-names: Trefois
    orcid: https://orcid.org/0000-0002-8991-6810
  - given-names: Markus
    family-names: Ollert
    orcid: https://orcid.org/0000-0002-8055-0103
repository-code: 'https://github.com/LCSB-BioCore/GigaSOM.jl'
date-released: 2019-07-23
preferred-citation:
  type: article
  title: "GigaSOM.jl: High-performance clustering and visualization of huge cytometry datasets"
  authors:
    - given-names: Miroslav
      family-names: Kratochvíl
      orcid: https://orcid.org/0000-0001-7356-4075
    - given-names: Oliver
      family-names: Hunewald
      orcid: https://orcid.org/0000-0001-5402-5084
    - given-names: Laurent
      family-names: Heirendt
      orcid: https://orcid.org/0000-0003-1861-0037
    - given-names: Vasco
      family-names: Verissimo
      orcid: https://orcid.org/0000-0003-3884-9125
    - given-names: Jiří
      family-names: Vondrášek
      orcid: https://orcid.org/0000-0002-6066-973X
    - given-names: Venkata P
      family-names: Satagopam
      orcid: https://orcid.org/0000-0002-6532-5880
    - given-names: Reinhard
      family-names: Schneider
      orcid: https://orcid.org/0000-0002-8278-1618
    - given-names: Christophe
      family-names: Trefois
      orcid: https://orcid.org/0000-0002-8991-6810
    - given-names: Markus
      family-names: Ollert
      orcid: https://orcid.org/0000-0002-8055-0103
  doi: "10.1093/gigascience/giaa127"
  journal: GigaScience
  volume: 9
  issue: 11
  year: 2020
  month: November
  issn: 2047-217X
  url: "https://academic.oup.com/gigascience/article/9/11/giaa127/5987271"
  abstract: "The amount of data generated in large clinical and phenotyping studies that use single-cell cytometry is constantly growing. Recent technological advances allow the easy generation of data with hundreds of millions of single-cell data points with >40 parameters, originating from thousands of individual samples. The analysis of that amount of high-dimensional data becomes demanding in both hardware and software of high-performance computational resources. Current software tools often do not scale to the datasets of such size; users are thus forced to downsample the data to bearable sizes, in turn losing accuracy and ability to detect many underlying complex phenomena. We present GigaSOM.jl, a fast and scalable implementation of clustering and dimensionality reduction for flow and mass cytometry data. The implementation of GigaSOM.jl in the high-level and high-performance programming language Julia makes it accessible to the scientific community and allows for efficient handling and processing of datasets with billions of data points using distributed computing infrastructures. We describe the design of GigaSOM.jl, measure its performance and horizontal scaling capability, and showcase the functionality on a large dataset from a recent study. GigaSOM.jl facilitates the use of commonly available high-performance computing resources to process the largest available datasets within minutes, while producing results of the same quality as the current state-of-art software. Measurements indicate that the performance scales to much larger datasets. The example use on the data from a massive mouse phenotyping effort confirms the applicability of GigaSOM.jl to huge-scale studies."
GitHub Events
Total
- Watch event: 2
- Issue comment event: 1
- Push event: 16
- Create event: 1
Last Year
- Watch event: 2
- Issue comment event: 1
- Push event: 16
- Create event: 1
Committers
Last synced: 10 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Laurent Heirendt | l****t@u****u | 471 |
| oHunewald | o****d@g****m | 262 |
| Vasco VERISSIMO | v****o@u****u | 217 |
| Mirek Kratochvil | e****a@g****m | 160 |
| github-actions[bot] | 4****] | 18 |
| Julia TagBot | 5****t | 1 |
| CompatHelper Julia | c****y@j****g | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 26
- Total pull requests: 74
- Average time to close issues: about 1 month
- Average time to close pull requests: 6 days
- Total issue authors: 8
- Total pull request authors: 3
- Average comments per issue: 2.81
- Average comments per pull request: 0.91
- Merged pull requests: 72
- Bot issues: 0
- Bot pull requests: 13
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- laurentheirendt (15)
- oHunewald (5)
- kyle11rd (1)
- fransua (1)
- smahoff (1)
- rfourquet (1)
- gszep (1)
- JuliaTagBot (1)
Pull Request Authors
- laurentheirendt (33)
- exaexa (28)
- github-actions[bot] (13)
Packages
- Total packages: 1
- Total downloads: julia 6 total
- Total dependent packages: 2
- Total dependent repositories: 0
- Total versions: 24
juliahub.com: GigaSOM
Huge-scale, high-performance flow cytometry clustering in Julia
- Homepage: https://lcsb-biocore.github.io/GigaSOM.jl/
- Documentation: https://docs.juliahub.com/General/GigaSOM/stable/
- License: Apache-2.0
- Latest release: 0.7.0 (published almost 4 years ago)
Dependencies
- JuliaRegistries/TagBot v1 composite
- actions/cache v1 composite
- actions/checkout v2 composite
- codecov/codecov-action v1 composite
- julia-actions/julia-buildpkg latest composite
- julia-actions/julia-processcoverage v1 composite
- julia-actions/julia-runtest latest composite
- julia-actions/setup-julia v1 composite
- actions/checkout v2 composite
- mr-smithers-excellent/docker-build-push v5 composite
- actions/checkout v2 composite
- julia-actions/setup-julia latest composite
- julia latest build