tidyFlowCore
tidyFlowCore: Bringing flowCore to the tidyverse
Repository
Basic Info
- Host: GitHub
- Owner: keyes-timothy
- License: other
- Language: R
- Default Branch: main
- Homepage: https://keyes-timothy.github.io/tidyFlowCore/
- Size: 4.85 MB
Statistics
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Created about 2 years ago
· Last pushed almost 2 years ago
README.Rmd
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
warning = FALSE,
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# tidyFlowCore
`tidyFlowCore` is an R package that bridges the gap between flow cytometry analysis using the `flowCore` Bioconductor package and the tidy data principles advocated by the `tidyverse`. It provides a suite of `dplyr`-, `ggplot2`-, and `tidyr`-like verbs specifically designed for working with `flowFrame` and `flowSet` objects as if they were tibbles; however, your data remain `flowCore` `flowFrame`s and `flowSet`s under this layer of abstraction.
Using this approach, `tidyFlowCore` enables intuitive and streamlined analysis workflows that can leverage both the Bioconductor and tidyverse ecosystems for cytometry data.
## Installation instructions
Get the latest stable `R` release from [CRAN](http://cran.r-project.org/). Then install `tidyFlowCore` from [Bioconductor](http://bioconductor.org/) using the following code:
```{r 'install', eval = FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("tidyFlowCore")
```
You can install the development version from [GitHub](https://github.com/keyes-timothy/tidyflowCore) with:
```{r 'install_dev', eval = FALSE}
BiocManager::install("keyes-timothy/tidyFlowCore")
```
## Example
`tidyFlowCore` allows you to treat `flowCore` data structures like tidy `data.frame`s or `tibble`s. It does so by implementing `dplyr`, `tidyr`, and `ggplot2` verbs that can be deployed directly on the `flowFrame` and `flowSet` S4 classes.
In this section, we give a brief example of how `tidyFlowCore` can enable a data analysis pipeline to use all the useful functions of the `flowCore` package and many of the functions of the `dplyr`, `tidyr`, and `ggplot2` packages.
### Load required packages
```{r, warning = FALSE}
library(tidyFlowCore)
library(flowCore)
```
### Read data
```{r example, eval = requireNamespace('tidyFlowCore')}
# read data from the HDCytoData package
bcr_flowset <- HDCytoData::Bodenmiller_BCR_XL_flowSet()
```
### Data transformation
The `flowCore` package natively supports multiple types of data preprocessing and transformations for cytometry data through the use of its `transform` class.
For example, if we want to apply the standard arcsinh transformation often used for CyTOF data to our current dataset, we could use the following code:
```{r}
asinh_transformation <- flowCore::arcsinhTransform(a = 0, b = 1/5, c = 0)
transformation_list <-
  flowCore::transformList(
    colnames(bcr_flowset),
    asinh_transformation
  )
transformed_bcr_flowset <- flowCore::transform(bcr_flowset, transformation_list)
```
Alternatively, we can also use the `tidyverse`'s functional programming paradigm to perform the same transformation. For this, we use the mutate-across framework via `tidyFlowCore`:
```{r}
transformed_bcr_flowset <-
  bcr_flowset |>
  # dplyr is not attached above, so across() is namespaced explicitly
  dplyr::mutate(dplyr::across(-ends_with("_id"), \(.x) asinh(.x / 5)))
```
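To see why the two approaches above should agree, recall that `flowCore::arcsinhTransform()` is documented as computing `asinh(a + b * x) + c`. The base-R sketch below re-implements that formula in a hypothetical helper, `arcsinh_manual()` (an illustration only, not part of `flowCore` or `tidyFlowCore`), and compares it with the `asinh(.x / 5)` lambda on a few made-up values.

```r
# Hypothetical helper that mirrors the documented arcsinhTransform formula,
# asinh(a + b * x) + c. It is not a function from flowCore or tidyFlowCore.
arcsinh_manual <- function(x, a = 0, b = 1 / 5, c = 0) {
  asinh(a + b * x) + c
}

# Made-up ion counts, used only for illustration
raw_counts <- c(0, 1, 5, 50, 500)

manual_result <- arcsinh_manual(raw_counts)
lambda_result <- asinh(raw_counts / 5)

# With a = 0, b = 1/5, and c = 0, the two expressions coincide
all.equal(manual_result, lambda_result)
```

With these parameter values the offset terms vanish, which is why the shorter `asinh(.x / 5)` form can stand in for the `transformList` approach.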
### Cell type counting
Suppose we're interested in counting the number of cells that belong to each cell type (encoded in the `population_id` column of `bcr_flowset`) in our dataset. Using standard `flowCore` functions, we could perform this calculation in a few steps:
```{r}
# extract all expression matrices from our flowSet
combined_matrix <- flowCore::fsApply(bcr_flowset, exprs)
# take out the concatenated population_id column
combined_population_id <- combined_matrix[, 'population_id']
# perform the calculation
table(combined_population_id)
```
`tidyFlowCore` allows us to perform the same operation simply using the `dplyr` package's `count` function:
```{r}
bcr_flowset |>
  dplyr::count(population_id)
```
And `tidyFlowCore` also makes it easy to perform the counting broken down by other variables in our metadata:
```{r}
bcr_flowset |>
  # use the .tidyFlowCore_identifier pronoun to access the name of
  # each experiment in the flowSet
  dplyr::count(.tidyFlowCore_identifier, population_id)
```
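For readers less familiar with `dplyr::count()`, the sketch below shows the same base-R versus `dplyr` comparison on an ordinary `data.frame`. The toy data and the `sample_id` column are invented for illustration, and the example assumes only that `dplyr` is installed. It is not a demonstration of `tidyFlowCore` itself.

```r
# Made-up stand-in for per-cell data: one row per cell
toy_cells <- data.frame(
  sample_id = c(1, 1, 1, 2, 2, 2),
  population_id = c(3, 3, 1, 1, 2, 3)
)

# base R: tabulate the population_id column
base_counts <- table(toy_cells$population_id)

# dplyr: one row per population, with the count in column `n`
tidy_counts <- dplyr::count(toy_cells, population_id)

# dplyr: counts broken down by a second variable
counts_by_sample <- dplyr::count(toy_cells, sample_id, population_id)

# both approaches give the same per-population totals
identical(as.integer(base_counts), tidy_counts$n)
```

Adding a second column name to `count()` is all that is needed for the per-sample breakdown, whereas the base-R route would require a two-way `table()` and reshaping.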
### Nesting and unnesting
In the `flowCore` API, `flowFrame` and `flowSet` objects have a clear relationship with one another: a `flowSet` is essentially a collection of nested `flowFrame`s. In other words, `flowSet`s are made up of multiple `flowFrame`s!
`tidyFlowCore` provides a useful API for converting between `flowSet` and `flowFrame` data structures at various degrees of nesting using the `group_by`/`nest` and `ungroup`/`unnest` verbs. Note that in the `dplyr` and `tidyr` APIs, `group_by`/`nest` and `ungroup`/`unnest` are **not** synonyms (grouped `data.frame`s are different from nested `data.frame`s). However, because of how `flowFrame`s and `flowSet`s are structured, `tidyFlowCore`'s `group_by` and `nest` functions behave identically to each other, as do its `ungroup` and `unnest` functions.
```{r}
# unnesting a flowSet results in a flowFrame with an additional column,
# 'tidyFlowCore_name', that identifies cells based on which experiment in the
# original flowSet they come from
bcr_flowset |>
  dplyr::ungroup()
```
```{r}
# flowSets can be unnested and renested for various analyses
bcr_flowset |>
  dplyr::ungroup() |>
  # group_by cell type
  dplyr::group_by(population_id) |>
  # calculate the mean HLA-DR expression of each cell population
  dplyr::summarize(mean_expression = mean(`HLA-DR(Yb174)Dd`)) |>
  dplyr::select(population_id, mean_expression)
```
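The distinction between grouping and nesting for ordinary data frames, mentioned at the start of this section, can be seen in a small sketch. It uses a made-up `data.frame` and assumes only that `dplyr` and `tidyr` are installed. It does not touch `flowCore` objects.

```r
# Made-up per-cell data: a population label and one marker measurement
toy_cells <- data.frame(
  population_id = c(1, 1, 2, 2, 2),
  marker = c(0.5, 1.5, 2.0, 4.0, 6.0)
)

# Grouping keeps one row per cell and only records the grouping structure
grouped_cells <- dplyr::group_by(toy_cells, population_id)

# Nesting collapses each group into a single row with a list-column `data`
nested_cells <- tidyr::nest(toy_cells, data = -population_id)

nrow(grouped_cells) # one row per cell
nrow(nested_cells)  # one row per population

# Unnesting restores one row per cell
restored_cells <- tidyr::unnest(nested_cells, data)

# Summarizing a grouped data frame yields one value per group
group_means <- dplyr::summarize(grouped_cells, mean_expression = mean(marker))
```

For plain data frames the grouped and nested results have different shapes, which is why the two verbs are not interchangeable there.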
### Plotting
`tidyFlowCore` also provides a direct interface between `ggplot2` and `flowFrame` or `flowSet` data objects. For example...
```{r}
# cell population names, from the HDCytoData documentation
population_names <-
  c(
    "B-cells IgM-",
    "B-cells IgM+",
    "CD4 T-cells",
    "CD8 T-cells",
    "DC",
    "monocytes",
    "NK cells",
    "surface-"
  )

# calculate mean CD20 expression across all cells
mean_cd20_expression <-
  bcr_flowset |>
  dplyr::ungroup() |>
  dplyr::summarize(mean_expression = mean(asinh(`CD20(Sm147)Dd` / 5))) |>
  dplyr::pull(mean_expression)

# calculate mean CD4 expression across all cells
mean_cd4_expression <-
  bcr_flowset |>
  dplyr::ungroup() |>
  dplyr::summarize(mean_expression = mean(asinh(`CD4(Nd145)Dd` / 5))) |>
  dplyr::pull(mean_expression)

bcr_flowset |>
  # preprocess all columns that represent protein measurements
  dplyr::mutate(dplyr::across(-ends_with("_id"), \(.x) asinh(.x / 5))) |>
  # plot a CD4 vs. CD20 scatterplot
  ggplot2::ggplot(ggplot2::aes(x = `CD20(Sm147)Dd`, y = `CD4(Nd145)Dd`)) +
  # add some reference lines
  ggplot2::geom_hline(
    yintercept = mean_cd4_expression,
    color = "red",
    linetype = "dashed"
  ) +
  ggplot2::geom_vline(
    xintercept = mean_cd20_expression,
    color = "red",
    linetype = "dashed"
  ) +
  ggplot2::geom_point(size = 0.1, alpha = 0.1) +
  # facet by cell population
  ggplot2::facet_wrap(
    facets = ggplot2::vars(population_id),
    labeller =
      ggplot2::as_labeller(
        \(population_id) population_names[as.numeric(population_id)]
      )
  ) +
  # axis labels
  ggplot2::labs(
    x = "CD20 expression (arcsinh)",
    y = "CD4 expression (arcsinh)"
  )
```
Using some standard functions from the `ggplot2` library, we can create a scatterplot of CD4 vs. CD20 expression in the different cell populations included in the `bcr_flowset` `flowSet`. We can see, unsurprisingly, that both B-cell populations are highest for CD20 expression, whereas CD4+ T-helper cells are highest for CD4 expression.
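The facet labels in the plot come from the small lookup function passed to `ggplot2::as_labeller()`. As a rough base-R sketch of that lookup, assume the facet values reach the function as character strings such as `"3"` (our understanding of how `as_labeller()` treats its input). The helper name `label_population()` and the example facet values are invented for illustration.

```r
# Population names in the order given by the HDCytoData documentation
population_names <-
  c(
    "B-cells IgM-",
    "B-cells IgM+",
    "CD4 T-cells",
    "CD8 T-cells",
    "DC",
    "monocytes",
    "NK cells",
    "surface-"
  )

# Hypothetical stand-in for the anonymous function given to as_labeller():
# convert the facet value to a number and use it as a positional index
label_population <- function(population_id) {
  population_names[as.numeric(population_id)]
}

# Made-up facet values, supplied as character strings
facet_values <- c("3", "1", "8")
facet_labels <- label_population(facet_values)
facet_labels
```

Because the lookup is positional, it relies on the `population_id` values being the integers 1 through 8 in the same order as `population_names`.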
## Citation
Below is the citation output from running `citation('tidyFlowCore')` in R. Please
run this yourself to check for any updates on how to cite __tidyFlowCore__.
```{r 'citation', eval = requireNamespace('tidyFlowCore')}
print(citation('tidyFlowCore'), bibtex = TRUE)
```
Please note that `tidyFlowCore` was only made possible thanks to many other R and bioinformatics software authors, who are cited in the vignettes and/or the paper(s) describing this package.
## Code of Conduct
Please note that the `tidyFlowCore` project is released with a [Contributor Code of Conduct](http://bioconductor.org/about/code-of-conduct/). By contributing to this project, you agree to abide by its terms.
## Development tools
* Continuous code testing is possible thanks to [GitHub actions](https://www.tidyverse.org/blog/2020/04/usethis-1-6-0/) through `r BiocStyle::CRANpkg('usethis')`, `r BiocStyle::CRANpkg('remotes')`, and `r BiocStyle::CRANpkg('rcmdcheck')` customized to use [Bioconductor's docker containers](https://www.bioconductor.org/help/docker/) and `r BiocStyle::Biocpkg('BiocCheck')`.
* Code coverage assessment is possible thanks to [codecov](https://codecov.io/gh) and `r BiocStyle::CRANpkg('covr')`.
* The [documentation website](http://keyes-timothy.github.io/tidyFlowCore) is automatically updated thanks to `r BiocStyle::CRANpkg('pkgdown')`.
* The code is styled automatically thanks to `r BiocStyle::CRANpkg('styler')`.
* The documentation is formatted thanks to `r BiocStyle::CRANpkg('devtools')` and `r BiocStyle::CRANpkg('roxygen2')`.
For more details, check the `dev` directory.
This package was developed using `r BiocStyle::Biocpkg('biocthis')`.
Owner
- Name: Timothy Keyes
- Login: keyes-timothy
- Kind: user
- Company: Stanford University
- Twitter: timothykeyes
- Repositories: 1
- Profile: https://github.com/keyes-timothy
MD/PhD student and aspiring biomedical data scientist at Stanford. Data scientist for the Medical Student Pride Alliance. Princeton '14.
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Packages
- Total packages: 1
- Total downloads: 2,518 (bioconductor)
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 2
- Total maintainers: 1
bioconductor.org: tidyFlowCore
tidyFlowCore: Bringing flowCore to the tidyverse
- Homepage: https://github.com/keyes-timothy/tidyFlowCore https://keyes-timothy.github.io/tidyFlowCore/
- Documentation: https://bioconductor.org/packages/release/bioc/vignettes/tidyFlowCore/inst/doc/tidyFlowCore.pdf
- License: MIT + file LICENSE
- Latest release: 1.2.0 (published 11 months ago)
Rankings
Dependent repos count: 0.0%
Dependent packages count: 31.5%
Average: 42.0%
Downloads: 94.6%
Dependencies
.github/workflows/check-bioc.yml
actions
- JamesIves/github-pages-deploy-action releases/v4 composite
- actions/cache v3 composite
- actions/checkout v3 composite
- actions/upload-artifact master composite
- docker/build-push-action v4 composite
- docker/login-action v2 composite
- docker/setup-buildx-action v2 composite
- docker/setup-qemu-action v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
.github/workflows/pkgdown.yaml
actions
- JamesIves/github-pages-deploy-action v4.5.0 composite
- actions/checkout v4 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/test-coverage.yaml
actions
- actions/checkout v4 composite
- actions/upload-artifact v4 composite
- codecov/codecov-action v4.0.1 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
DESCRIPTION
cran
- R >= 4.3 depends
- Biobase * imports
- dplyr * imports
- flowCore * imports
- ggplot2 * imports
- methods * imports
- purrr * imports
- rlang * imports
- stats * imports
- stringr * imports
- tibble * imports
- tidyr * imports
- BiocStyle * suggests
- HDCytoData * suggests
- RefManageR * suggests
- knitr * suggests
- rmarkdown * suggests
- sessioninfo * suggests
- testthat >= 3.0.0 suggests