Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 1 DOI reference(s) in README
- ○ Academic publication links
- ✓ Committers with academic emails: 2 of 5 committers (40.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.5%) to scientific vocabulary
Repository
R package to process BUS files
Statistics
- Stars: 10
- Watchers: 2
- Forks: 1
- Open Issues: 2
- Releases: 0
Metadata Files
README.md
BUSpaRse
This package processes BUS files generated from single-cell RNA-seq FASTQ files, e.g. using kallisto. The BUS format is a table with 4 columns (barcode, UMI, set, and count) that represents key information in single-cell RNA-seq datasets. See this paper for more information about the BUS format. A gene count matrix for a single-cell RNA-seq experiment can be generated with the kallisto bus command and the bustools suite of programs many times faster than with other programs.
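To make the four-column layout concrete, here is a minimal base R sketch of BUS-like records held in a data frame. All values are made up, the column names are illustrative, and real BUS files are binary files produced by kallisto rather than text tables; the "set" column is shown as an equivalence class index.

```r
# Toy BUS-like records (made-up values, for illustration only)
bus <- data.frame(
  barcode = c("AAACCTGA", "AAACCTGA", "AAACCTGA", "TTTGGTCA"),
  umi     = c("ACGTACGT", "ACGTACGT", "GGCATTCA", "ACGTACGT"),
  ec      = c(0L, 2L, 1L, 0L),  # "set": index of an equivalence class
  count   = c(3L, 1L, 2L, 5L),  # reads sharing this barcode, UMI, and set
  stringsAsFactors = FALSE
)

# Total reads observed for each barcode
reads_per_barcode <- tapply(bus$count, bus$barcode, sum)

# Number of distinct UMIs observed for each barcode
umis_per_barcode <- tapply(bus$umi, bus$barcode,
                           function(u) length(unique(u)))
```

Summaries like these are a quick sanity check on a data set before any gene-level counting is done.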
The most recent version of bustools can convert BUS files to the gene count and transcript compatibility count (TCC) matrices very efficiently. This package has an alternative implementation of the algorithm that converts BUS files to gene count and TCC matrices. This implementation is much less efficient (though still many times faster than, e.g., Cell Ranger). Its purpose is to facilitate experimentation with new algorithms and adaptation of the methods to other applications. The implementation in this package is written in Rcpp, which is easier to work with than pure C++ code and requires less C++ expertise.
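The conversion from BUS records to a gene count matrix can be sketched in base R. This is not the package's Rcpp implementation; it is a simplified illustration that assumes a UMI is assigned to a gene only when the genes compatible with all of its equivalence classes intersect in exactly one gene. All inputs (`bus`, `ec2tx`, `tr2g`) are hypothetical toy data.

```r
# Toy BUS-like records (made-up values)
bus <- data.frame(
  barcode = c("AAAC", "AAAC", "AAAC", "TTTG", "TTTG"),
  umi     = c("ACGT", "ACGT", "GGCA", "ACGT", "CCAT"),
  ec      = c(1L, 2L, 3L, 1L, 4L),
  stringsAsFactors = FALSE
)

# Equivalence class -> compatible transcripts (hypothetical)
ec2tx <- list(
  `1` = c("tx1", "tx2"),
  `2` = "tx2",
  `3` = "tx3",
  `4` = c("tx1", "tx3")
)

# Transcript -> gene mapping (hypothetical)
tr2g <- c(tx1 = "geneA", tx2 = "geneA", tx3 = "geneB")

# Genes compatible with each equivalence class
ec2g <- lapply(ec2tx, function(tx) unique(unname(tr2g[tx])))

# Group equivalence classes by barcode/UMI pair, then intersect gene sets
key <- paste(bus$barcode, bus$umi, sep = "_")
ecs_per_umi <- split(bus$ec, key)
genes_per_umi <- lapply(ecs_per_umi, function(ecs) {
  Reduce(intersect, ec2g[as.character(ecs)])
})

# Keep only UMIs that map to exactly one gene
unique_gene <- genes_per_umi[lengths(genes_per_umi) == 1L]

# Tabulate one count per retained UMI: genes in rows, barcodes in columns
counts <- table(
  gene    = unlist(unique_gene, use.names = FALSE),
  barcode = sub("_.*$", "", names(unique_gene))
)
```

In this toy input, the UMI `CCAT` in barcode `TTTG` is compatible with both genes and is therefore dropped. A production implementation would use a sparse matrix and handle barcode whitelists and multi-gene UMIs explicitly.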
A file mapping transcripts to genes is required to convert the BUS file to a gene count matrix, either with bustools or with this package. This package contains functions that produce this file or data frame by directly querying Ensembl, by parsing GTF or GFF3 files, by extracting information from TxDb or EnsDb gene annotation resources from Bioconductor, or by parsing sequence names of FASTA files of transcriptomes downloaded from Ensembl. This package can query Ensembl not only for vertebrates (i.e. www.ensembl.org), but also for plants, fungi, invertebrates, and protists. The functions that map transcripts to genes can also filter by biotype, keep only standard chromosomes, and extract filtered transcriptomes.
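As an illustration of the FASTA route, the following base R sketch parses Ensembl-style cDNA sequence names into a transcript-to-gene data frame and then filters by biotype and chromosome. It does not call the package's own functions; the helper `get_field`, the column names, and the identifiers in `headers` are all made up for this example.

```r
# Made-up sequence names in the style of Ensembl cDNA FASTA files
headers <- c(
  ">ENST00000000001.1 cdna chromosome:GRCh38:1:100:900:1 gene:ENSG00000000001.2 gene_biotype:protein_coding transcript_biotype:protein_coding gene_symbol:GENE1",
  ">ENST00000000002.1 cdna chromosome:GRCh38:CHR_HSCHR7_2_CTG6:100:900:1 gene:ENSG00000000002.1 gene_biotype:protein_coding transcript_biotype:protein_coding gene_symbol:GENE2",
  ">ENST00000000003.1 cdna chromosome:GRCh38:X:100:900:-1 gene:ENSG00000000003.1 gene_biotype:lncRNA transcript_biotype:lncRNA gene_symbol:GENE3"
)

# Hypothetical helper: extract the value following "<field>:" or NA if absent
get_field <- function(x, field) {
  pattern <- paste0("^.*\\s", field, ":(\\S+).*$")
  ifelse(grepl(pattern, x, perl = TRUE),
         sub(pattern, "\\1", x, perl = TRUE),
         NA_character_)
}

# The chromosome name is the third colon-separated element of the location
location <- strsplit(get_field(headers, "chromosome"), ":")

tr2g <- data.frame(
  transcript   = sub("^>(\\S+).*$", "\\1", headers, perl = TRUE),
  gene         = get_field(headers, "gene"),
  gene_name    = get_field(headers, "gene_symbol"),
  gene_biotype = get_field(headers, "gene_biotype"),
  chromosome   = vapply(location, `[`, character(1), 3L),
  stringsAsFactors = FALSE
)

# Keep protein-coding genes on standard human chromosomes only
standard_chr <- c(1:22, "X", "Y", "MT")
tr2g_filtered <- tr2g[tr2g$gene_biotype == "protein_coding" &
                        tr2g$chromosome %in% standard_chr, ]
```

Here the second record is dropped because it lies on a non-standard sequence, and the third because it is not protein coding.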
This package can also generate the files required for running RNA velocity with kallisto and bustools, including a fasta file with not only the transcriptome but also appropriately flanked intronic sequences, lists of transcripts and introns to be captured, and a file mapping transcripts and introns to genes. For spliced transcripts, you may either use the cDNA sequences, or exon-exon junctions, for pseudoalignment. Using exon-exon junctions should more unambiguously distinguish between spliced and unspliced transcripts, since unspliced transcripts also have exonic sequences.
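The coordinate arithmetic behind flanked introns and exon-exon junctions can be sketched in base R. This is a toy example on a single plus-strand transcript with made-up coordinates, and it assumes a flank of `L - 1` bases on each side, where `L` is the read length; that flank length is an assumption of this sketch, not something stated above. A real workflow would operate on genome annotations and extract the corresponding sequences.

```r
L <- 91L          # assumed read length (hypothetical value)
flank <- L - 1L   # assumed flank length on each side

# Exons of one toy transcript on the plus strand (1-based, inclusive)
exons <- data.frame(start = c(101L, 501L, 1001L),
                    end   = c(300L, 700L, 1200L))
n <- nrow(exons)

# Introns are the gaps between consecutive exons
introns <- data.frame(start = exons$end[-n] + 1L,
                      end   = exons$start[-1] - 1L)

# Flanked introns extend into the neighbouring exons, so that reads
# spanning an exon-intron boundary can be captured as unspliced
flanked_introns <- data.frame(start = introns$start - flank,
                              end   = introns$end + flank)

# Exon-exon junctions: the last `flank` bases of the upstream exon joined
# to the first `flank` bases of the downstream exon (toy exons are long
# enough that no clipping to exon boundaries is needed here)
junctions <- data.frame(
  up_start   = exons$end[-n] - flank + 1L,
  up_end     = exons$end[-n],
  down_start = exons$start[-1],
  down_end   = exons$start[-1] + flank - 1L
)
```

A read of length `L` that overlaps a junction sequence by construction crosses the splice site, which is why junctions separate spliced from unspliced molecules more cleanly than full cDNA sequences.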
Example
See the vignettes for examples of using kallisto bus, bustools, and BUSpaRse on real data. The vignettes contain a complete walk-through, starting with downloading the FASTQ files for an experiment and ending with an analysis. Google Colab versions of those vignettes can be found here. Also see browseVignettes("BUSpaRse") for vignettes on using BUSpaRse to obtain a gene count matrix and on extracting filtered transcriptomes with the tr2g_* functions.
Installation
You can install the development version of BUSpaRse with:
```r
if (!require(devtools)) install.packages("devtools")
devtools::install_github("BUStools/BUSpaRse")
```
The release version can be installed from Bioconductor, or the development version with the version = "devel" argument:
```r
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("BUSpaRse")
```
Owner
- Name: BUStools
- Login: BUStools
- Kind: organization
- Repositories: 6
- Profile: https://github.com/BUStools
GitHub Events
Total
- Push event: 2
Last Year
- Push event: 2
Committers
Last synced: over 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Lambda Moses | d****2@c****u | 224 |
| Nitesh Turaga | n****a@g****m | 12 |
| Peter Hickey | p****y@g****m | 1 |
| Lior Pachter | l****r@g****m | 1 |
| Lior Pachter | l****r@c****u | 1 |
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 16
- Total pull requests: 1
- Average time to close issues: 2 months
- Average time to close pull requests: 3 minutes
- Total issue authors: 16
- Total pull request authors: 1
- Average comments per issue: 1.75
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- methornton (1)
- willtownes (1)
- jma1991 (1)
- hmassalha (1)
- palfalvi (1)
- jmzvillarreal (1)
- bsmith030465 (1)
- tangybat (1)
- rfarouni (1)
- lambdamoses (1)
- csoneson (1)
- Ci-TJ (1)
- biobenkj (1)
- skpalan (1)
- Simon-Coetzee (1)
Pull Request Authors
- PeteHaitch (1)
Packages
- Total packages: 1
- Total downloads: 22,749 (bioconductor)
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 8
- Total maintainers: 1
bioconductor.org: BUSpaRse
kallisto | bustools R utilities
- Homepage: https://github.com/BUStools/BUSpaRse
- Documentation: https://bioconductor.org/packages/release/bioc/vignettes/BUSpaRse/inst/doc/BUSpaRse.pdf
- License: BSD_2_clause + file LICENSE
- Latest release: 1.22.1 (published 8 months ago)
Dependencies
- R >= 3.6 depends
- AnnotationDbi * imports
- AnnotationFilter * imports
- BSgenome * imports
- BiocGenerics * imports
- Biostrings * imports
- GenomeInfoDb * imports
- GenomicFeatures * imports
- GenomicRanges * imports
- IRanges * imports
- Matrix * imports
- Rcpp * imports
- S4Vectors * imports
- biomaRt * imports
- dplyr * imports
- ensembldb * imports
- ggplot2 * imports
- magrittr * imports
- methods * imports
- plyranges * imports
- stats * imports
- stringr * imports
- tibble * imports
- tidyr * imports
- utils * imports
- zeallot * imports
- BSgenome.Hsapiens.UCSC.hg38 * suggests
- BiocStyle * suggests
- EnsDb.Hsapiens.v86 * suggests
- TENxBUSData * suggests
- TxDb.Hsapiens.UCSC.hg38.knownGene * suggests
- knitr * suggests
- rmarkdown * suggests
- testthat * suggests