https://github.com/alleninstitute/scrattch.io

Functions for handling RNA-seq files and formats as input and output for scrattch functions.

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
○ DOI references
○ Academic publication links
✓ Committers with academic emails: 1 of 3 committers (33.3%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (15.8%) to scientific vocabulary

Keywords

10xgenomics hdf5 loom r tome transcriptomics

Keywords from Contributors

scrattch umbrella

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: AllenInstitute
License: other
Language: R
Default Branch: master
Size: 16.7 MB

Statistics

Stars: 11
Watchers: 12
Forks: 3
Open Issues: 11
Releases: 0

Topics

10xgenomics hdf5 loom r tome transcriptomics

Created about 8 years ago · Last pushed over 4 years ago

https://github.com/AllenInstitute/scrattch.io/blob/master/

# scrattch.io: scrattch File Input/Output Handling



master: [![Build Status](https://travis-ci.org/AllenInstitute/scrattch.io.svg?branch=master)](https://travis-ci.org/AllenInstitute/scrattch.io)  
dev: [![Build Status](https://travis-ci.org/AllenInstitute/scrattch.io.svg?branch=dev)](https://travis-ci.org/AllenInstitute/scrattch.io)  

## Installation

scrattch.io requires the `rhdf5` package from Bioconductor, which can be installed with:
```
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("rhdf5")
```

Once `rhdf5` is in place, scrattch.io can be installed from GitHub:
```
devtools::install_github("AllenInstitute/scrattch.io")
```

If you'd like to use the developer branch where we're testing out new code, it can be installed using:
```
devtools::install_github("AllenInstitute/scrattch.io", ref = "dev")
```

## .tome files
A major component of scrattch.io is a set of helpful functions for writing and reading .tome files, which are an HDF5-based format for **t**ranscriptomics in an **o**pen, **m**odular, **e**xtensible format.  

### Why another HDF5 format for transcriptomics?  
Existing formats for transcriptomics are designed either for fast computation, like .loom, or for a small storage footprint, like the .h5 files generated by 10X Genomics' Cell Ranger. The goal of .tome is to combine compact storage with reasonably fast random access to both genes and samples.

This is accomplished by storing the main data matrix in a sparse format, based on [dgCMatrix from the R Matrix package](https://stat.ethz.ch/R-manual/R-devel/library/Matrix/html/dgCMatrix-class.html), stored in both orientations. This structure is also chunked and compressed to speed access and reduce file size. The compression level can be changed depending on how quickly you need to read your data (see `?write_tome_data` for details).

The practical upshot of this strategy is that .tome files are ~1/10th the size of .loom files for storage of data from 10X Genomics experiments, while still allowing gene or sample data to be read quickly for display.
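As a rough illustration of why keeping both orientations helps, the sketch below uses only the Matrix package, not scrattch.io itself. The toy counts, the gene and sample names, and the variable names `by_sample` and `by_gene` are assumptions made up for this example; they are not the names or layout used inside a .tome file.

```r
library(Matrix)

# Toy counts: 4 genes (rows) x 3 samples (columns), mostly zeros.
counts <- matrix(
  c(0, 2, 0, 0,   # sample1
    5, 0, 0, 1,   # sample2
    0, 0, 3, 0),  # sample3
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3))
)

# A dgCMatrix is column-compressed: the non-zero values of each column are
# stored together, so extracting a whole column is cheap.
by_sample <- Matrix(counts, sparse = TRUE)  # columns are samples
by_gene   <- t(by_sample)                   # columns are genes

# The three vectors that make up the sparse representation:
by_sample@x  # non-zero values, column by column
by_sample@i  # 0-based row index of each value
by_sample@p  # 0-based offsets marking where each column starts in x and i

# Per-sample reads use one orientation, per-gene reads use the other.
sample2_counts <- by_sample[, "sample2"]
gene4_counts   <- by_gene[, "gene4"]
```

Storing the matrix twice costs extra space, but because only non-zero values are kept, the total can still be small for sparse single-cell data.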

Additional metadata can also be stored in .tome files, from sample annotations to precomputed statistics.

The [.tome cheatsheet on Google Docs](https://docs.google.com/spreadsheets/d/1tJUgnfEXUv1IuzGAykDCTIUTsgzEWkT-jfl4UcEUl48/edit?usp=sharing) is a helpful reference for where scrattch.io stores these within the HDF5 file structure, and for which functions can be used to read and write these objects.

.tome is intended to be extensible. Want to store something that isn't already provided? Check out the Generic functions section of the [.tome cheatsheet](https://docs.google.com/spreadsheets/d/1tJUgnfEXUv1IuzGAykDCTIUTsgzEWkT-jfl4UcEUl48/edit?usp=sharing) to add your own data however it makes sense to you.

## .loom files
scrattch.io also includes simple functions for reading matrices, annotations, and projections from .loom files with `read_loom_dgCMatrix()`, `read_loom_anno()`, and `read_loom_projections()`, respectively.

You can find out more about the .loom format, developed by the Linnarsson lab, here: [loompy.org](http://loompy.org/)

A more complete implementation of the .loom format in R is available from the Satija lab's loomR package on GitHub here: [mojaveazure/loomR](https://github.com/mojaveazure/loomR)

## 10X Genomics files
scrattch.io can read the data matrix from the .h5 files that are output by Cell Ranger in [HDF5 Gene-Barcode Matrix Format](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/h5_matrices) with `read_10x_dgCMatrix()`.
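For orientation, the 10X HDF5 matrix format stores a column-compressed sparse matrix as `data`, `indices`, `indptr`, and `shape` arrays, which map directly onto the slots of a dgCMatrix. The sketch below rebuilds a matrix from hand-written stand-ins for those arrays using `Matrix::sparseMatrix()`. It does not read a real file, the values and gene/barcode names are invented, and it is not necessarily how `read_10x_dgCMatrix()` is implemented.

```r
library(Matrix)

# Hypothetical stand-ins for the arrays held in a 10X .h5 file.
values  <- c(4, 1, 7)      # "data": the non-zero counts
indices <- c(0L, 2L, 1L)   # "indices": 0-based gene (row) index of each count
indptr  <- c(0L, 2L, 3L)   # "indptr": 0-based offsets where each barcode starts
shape   <- c(3L, 2L)       # "shape": genes x barcodes

genes    <- c("geneA", "geneB", "geneC")  # made-up names
barcodes <- c("cell1", "cell2")

# index1 = FALSE tells sparseMatrix() that the row indices are 0-based.
mat <- sparseMatrix(
  i = indices, p = indptr, x = values,
  dims = shape, dimnames = list(genes, barcodes),
  index1 = FALSE
)

# Dense copy for inspection only; avoid this on real, large matrices.
dense <- as.matrix(mat)
```

Because the arrays already follow the compressed-column layout, no reshuffling of values is needed to obtain a dgCMatrix.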

## .h5ad files
scrattch.io also supports reading the main data matrix from .h5ad files that are generated by tools like [Scanpy](https://github.com/theislab/scanpy) with `read_h5ad_dgCMatrix()`.

## The `scrattch` suite

`scrattch.io` is one component of the [scrattch](https://github.com/AllenInstitute/scrattch/) suite of packages for Single Cell RNA-seq Analysis for Transcriptomic Type CHaracterization from the Allen Institute.

## License

The license for this package is available on GitHub at: https://github.com/AllenInstitute/scrattch.io/blob/master/LICENSE

## Level of Support

We plan to update this tool occasionally, with no fixed schedule. Community involvement is encouraged through both issues and pull requests.

## Contribution Agreement

If you contribute code to this repository through pull requests or other mechanisms, you are subject to the Allen Institute Contribution Agreement, which is available in full at: https://github.com/AllenInstitute/scrattch.io/blob/master/CONTRIBUTION

Owner

Name: Allen Institute
Login: AllenInstitute
Kind: organization
Location: Seattle, WA

Website: https://alleninstitute.org
Repositories: 184
Profile: https://github.com/AllenInstitute

Please visit http://alleninstitute.github.io/ for more information.

Committers

Last synced: 10 months ago

All Time

Total Commits: 152
Total Committers: 3
Avg Commits per committer: 50.667
Development Distribution Score (DDS): 0.033

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Lucas Gray	s**t@g**m	147
Jeremy Miller	j**m@a**g	3
cvanvelt	4****t	2

Committer Domains (Top 20 + Academic)

alleninstitute.org: 1

Issues and Pull Requests

Last synced: 7 months ago

All Time

Total issues: 15
Total pull requests: 30
Average time to close issues: N/A
Average time to close pull requests: about 2 hours
Total issue authors: 5
Total pull request authors: 2
Average comments per issue: 1.4
Average comments per pull request: 0.07
Merged pull requests: 25
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

hypercompetent (6)
maximilianh (1)
matthewspeir (1)
daccachejoe (1)
jeremymiller (1)

Pull Request Authors

hypercompetent (21)
mmoisse (1)

Top Labels

Issue Labels

enhancement (5) bug (2) Optimization (2)

Pull Request Labels

Dependencies

DESCRIPTION cran

rhdf5 >= 2.24.0 depends
Matrix >= 1.2 imports
data.table * imports
dplyr >= 0.4.3 imports
lazyeval * imports
purrr >= 0.2.4 imports
viridisLite * imports
feather * suggests
testthat * suggests
