kb-python
A wrapper for the kallisto | bustools workflow for single-cell RNA-seq pre-processing
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 6 DOI reference(s) in README
- ○ Academic publication links
- ✓ Committers with academic emails: 1 of 13 committers (7.7%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (13.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: pachterlab
- License: bsd-2-clause
- Language: Python
- Default Branch: master
- Homepage: https://www.kallistobus.tools/
- Size: 229 MB
Statistics
- Stars: 167
- Watchers: 11
- Forks: 25
- Open Issues: 5
- Releases: 34
Metadata Files
README.md
kb-python
kb-python is a Python package for processing single-cell RNA-sequencing data. It wraps the kallisto | bustools single-cell RNA-seq command-line tools in order to unify multiple processing workflows.
kb-python was first developed by Kyung Hoi (Joseph) Min and A. Sina Booeshaghi while in Lior Pachter's lab at Caltech. If you use kb-python in a publication, please cite:
Melsted, P., Booeshaghi, A.S., et al.
Modular, efficient and constant-memory single-cell RNA-seq preprocessing.
Nat Biotechnol 39, 813–818 (2021).
https://doi.org/10.1038/s41587-021-00870-2
Installation
The latest release can be installed with
```bash
pip install kb-python
```
The development version can be installed with
```bash
pip install git+https://github.com/pachterlab/kb_python
```
There are no prerequisite packages to install. The kallisto and bustools binaries are included with the package.
Usage
kb consists of five subcommands:
```bash
$ kb
usage: kb [-h] [--list] <CMD> ...

positional arguments:
  <CMD>
    info      Display package and citation information
    compile   Compile `kallisto` and `bustools` binaries from source
    ref       Build a kallisto index and transcript-to-gene mapping
    count     Generate count matrices from a set of single-cell FASTQ files
    extract   Extract reads that were pseudoaligned to specific genes/transcripts (or extract all reads that were / were not pseudoaligned)
```
kb ref: generate a pseudoalignment index
The kb ref command takes in a species annotation file (GTF) and associated genome (FASTA) and builds a species-specific index for pseudoalignment of reads. This must be run before kb count. Internally, kb ref extracts the coding regions from the GTF and builds a transcriptome FASTA that is then indexed with kallisto index.
```bash
kb ref -i index.idx -g t2g.txt -f1 transcriptome.fa <GENOME> <GENOME_ANNOTATION>
```
- <GENOME> refers to a genome file (FASTA).
  - For example, the zebrafish genome is hosted by Ensembl, where it can be downloaded.
- <GENOME_ANNOTATION> refers to a genome annotation file (GTF).
  - For example, the zebrafish genome annotation file is hosted by Ensembl, where it can be downloaded.
- Note: The latest genome annotation and genome file for every species on Ensembl can be found with the gget command-line tool.
Prebuilt indices are available at https://github.com/pachterlab/kallisto-transcriptome-indices
Examples
```bash
# Index the transcriptome from genome FASTA (genome.fa.gz) and GTF (annotation.gtf.gz)
$ kb ref -i index.idx -g t2g.txt -f1 transcriptome.fa genome.fa.gz annotation.gtf.gz

# An example for downloading a prebuilt reference for mouse
$ kb ref -d mouse -i index.idx -g t2g.txt
```
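For scripted pipelines, the same `kb ref` invocation can be assembled in Python before it is handed to a subprocess. The following is a minimal sketch, not part of kb-python itself: the helper name `build_kb_ref_command` is hypothetical, and it uses only the flags shown above (`-i`, `-g`, `-f1`) with the same default file names.

```python
import shlex


def build_kb_ref_command(genome_fasta, annotation_gtf,
                         index="index.idx", t2g="t2g.txt",
                         transcriptome="transcriptome.fa"):
    """Return the argument list for the `kb ref` call shown above.

    Hypothetical helper: it only assembles arguments and does not
    check that the input files exist or that `kb` is installed.
    """
    return [
        "kb", "ref",
        "-i", str(index),           # output: kallisto index
        "-g", str(t2g),             # output: transcript-to-gene mapping
        "-f1", str(transcriptome),  # output: transcriptome FASTA
        str(genome_fasta),          # input: genome FASTA
        str(annotation_gtf),        # input: genome annotation GTF
    ]


if __name__ == "__main__":
    cmd = build_kb_ref_command("genome.fa.gz", "annotation.gtf.gz")
    # Print the command in shell-quoted form for inspection.
    print(shlex.join(cmd))
    # To run it (requires kb-python on PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when paths contain spaces.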
kb count: pseudoalign and count reads
The kb count command takes in the pseudoalignment index (built with kb ref) and sequencing reads generated by a sequencing machine to generate a count matrix. Internally, kb count runs numerous kallisto and bustools commands comprising a single-cell workflow for the specified technology that generated the sequencing reads.
```bash
kb count -i index.idx -g t2g.txt -o out/ -x <TECHNOLOGY> <FASTQ FILE[s]>
```
- <TECHNOLOGY> refers to the assay that generated the sequencing reads.
  - For a list of supported assays, run kb --list
- <FASTQ FILE[s]> refers to the list of FASTQ files generated by the sequencer.
  - Different assays will have a different number of FASTQ files.
  - Different assays will place the different features in different FASTQ files.
  - For example, sequencing a 10xv3 library on a NextSeq Illumina sequencer usually results in two FASTQ files.
    - The R1.fastq.gz file (colloquially called "read 1") contains a 16 base-pair cell barcode and a 12 base-pair unique molecular identifier (UMI).
    - The R2.fastq.gz file (colloquially called "read 2") contains the cDNA associated with the cell barcode-UMI pair in read 1.
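To make the 10xv3 read 1 layout concrete, here is a small Python sketch that slices a read 1 sequence into its cell barcode and UMI. It is illustrative only and is not how kb count processes reads internally. It assumes the barcode occupies the first 16 bases and the UMI the next 12, and the function name `split_10xv3_read1` is hypothetical.

```python
from typing import NamedTuple

BARCODE_LEN = 16  # 10xv3 cell barcode length, as stated above
UMI_LEN = 12      # 10xv3 UMI length, as stated above


class Read1Parts(NamedTuple):
    barcode: str
    umi: str


def split_10xv3_read1(sequence: str) -> Read1Parts:
    """Split a read 1 sequence into (cell barcode, UMI).

    Assumes the barcode comes first and the UMI follows immediately;
    any bases after the first 28 are ignored.
    """
    if len(sequence) < BARCODE_LEN + UMI_LEN:
        raise ValueError(
            f"read 1 must be at least {BARCODE_LEN + UMI_LEN} bases, "
            f"got {len(sequence)}"
        )
    barcode = sequence[:BARCODE_LEN]
    umi = sequence[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
    return Read1Parts(barcode, umi)


if __name__ == "__main__":
    # Made-up sequence: 16-base barcode followed by a 12-base UMI.
    parts = split_10xv3_read1("ACGTACGTACGTACGT" + "TTTTGGGGCCCC")
    print(parts.barcode, parts.umi)
```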
Examples
```bash
# Quantify 10xv3 reads read1.fastq.gz and read2.fastq.gz
$ kb count -i index.idx -g t2g.txt -o out/ -x 10xv3 read1.fastq.gz read2.fastq.gz
```
kb info: display package and citation information
The kb info command prints out package information including the version of kb-python, kallisto, and bustools along with their installation location.
```bash
$ kb info
kb_python 0.29.5
...
kallisto: 0.51.1
...
bustools: 0.45.1
...
...
```
kb compile: compile kallisto and bustools binaries from source
The kb compile command grabs the latest kallisto and bustools source and compiles the binaries. Note: this is not required to run kb-python.
Use cases
kb-python facilitates fast and uniform pre-processing of single-cell sequencing data to answer relevant research questions.
```bash
$ pip install kb-python gget ffq

# Goal: quantify publicly available scRNAseq data
$ kb ref -i index.idx -g t2g.txt -f1 transcriptome.fa $(gget ref --ftp -w dna,gtf homo_sapiens)
$ kb count -i index.idx -g t2g.txt -x 10xv3 -o out $(ffq --ftp SRR10668798 | jq -r '.[] | .url' | tr '\n' ' ')
# -> count matrix in out/ folder

# Goal: quantify 10xv2 feature barcode data, feature_barcodes.txt is a tab-delimited file
# containing the barcode sequence and the barcode name
$ kb ref -i index.idx -g f2g.txt -f1 features.fa --workflow kite feature_barcodes.txt
$ kb count -i index.idx -g f2g.txt -x 10xv2 -o out/ --workflow kite --h5ad R1.fastq.gz R2.fastq.gz
# -> count matrix in out/ folder
```
Submitted by @sbooeshaghi.
Do you have a cool use case for kb-python? Submit a PR (including the goal, code snippet, and your username) so that we can feature it here.
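The feature barcode use case above expects a tab-delimited feature_barcodes.txt file. As a rough sketch of how such a file could be produced in Python, the helper below writes one feature per line. It assumes the barcode sequence goes in the first column and the barcode name in the second, with no header row. The helper name `write_feature_barcodes` and the example sequences and names are made up for illustration.

```python
import csv
import io


def write_feature_barcodes(features, handle):
    """Write (sequence, name) pairs as tab-delimited lines.

    Assumed layout: barcode sequence, a tab, then the barcode name,
    one feature per line and no header row.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for sequence, name in features:
        writer.writerow([sequence, name])


if __name__ == "__main__":
    # Hypothetical feature barcodes, for illustration only.
    features = [
        ("AACAAGACCCTTGAG", "feature_A"),
        ("TACCCGTAATAGCGT", "feature_B"),
    ]
    buffer = io.StringIO()
    write_feature_barcodes(features, buffer)
    print(buffer.getvalue(), end="")
    # To write the file used in the example above:
    # with open("feature_barcodes.txt", "w", newline="") as fh:
    #     write_feature_barcodes(features, fh)
```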
Tutorials
For a list of tutorials that use kb-python please see https://www.kallistobus.tools/.
Documentation
Developer documentation is hosted on Read the Docs.
Contributing
Thank you for wanting to improve kb-python! If you believe you've found a bug, please submit an issue.
If you have a new feature you'd like to add to kb-python please create a pull request. Pull requests should contain a message detailing the exact changes made, the reasons for the change, and tests that check for the correctness of those changes.
Cite
If you use kb-python in a publication, please cite the following papers:
kb-python & kallisto and/or bustools
```tex
@article{sullivan2023kallisto,
  title={kallisto, bustools, and kb-python for quantifying bulk, single-cell, and single-nucleus RNA-seq},
  author={Sullivan, Delaney K and Min, Kyung Hoi and Hj{\"o}rleifsson, Kristj{\'a}n Eldj{\'a}rn and Luebbert, Laura and Holley, Guillaume and Moses, Lambda and Gustafsson, Johan and Bray, Nicolas L and Pimentel, Harold and Booeshaghi, A Sina and others},
  journal={bioRxiv},
  pages={2023--11},
  year={2023},
  publisher={Cold Spring Harbor Laboratory}
}
```
bustools
```tex
@article{melsted2021modular,
  title={\href{https://doi.org/10.1038/s41587-021-00870-2}{Modular, efficient and constant-memory single-cell RNA-seq preprocessing}},
  author={Melsted, P{\'a}ll and Booeshaghi, A. Sina and Liu, Lauren and Gao, Fan and Lu, Lambda and Min, Kyung Hoi Joseph and da Veiga Beltrame, Eduardo and Hj{\"o}rleifsson, Kristj{\'a}n Eldj{\'a}rn and Gehring, Jase and Pachter, Lior},
  author+an={1=first;2=first,highlight},
  journal={Nature biotechnology},
  year={2021},
  month={4},
  day={1},
  doi={https://doi.org/10.1038/s41587-021-00870-2}
}
```
kallisto
```tex
@article{bray2016near,
  title={Near-optimal probabilistic RNA-seq quantification},
  author={Bray, Nicolas L and Pimentel, Harold and Melsted, P{\'a}ll and Pachter, Lior},
  journal={Nature biotechnology},
  volume={34},
  number={5},
  pages={525--527},
  year={2016},
  publisher={Nature Publishing Group}
}
```
kITE
```tex
@article{booeshaghi2024quantifying,
  title={Quantifying orthogonal barcodes for sequence census assays},
  author={Booeshaghi, A Sina and Min, Kyung Hoi and Gehring, Jase and Pachter, Lior},
  journal={Bioinformatics Advances},
  volume={4},
  number={1},
  pages={vbad181},
  year={2024},
  publisher={Oxford University Press}
}
```
BUS format
```tex
@article{melsted2019barcode,
  title={The barcode, UMI, set format and BUStools},
  author={Melsted, P{\'a}ll and Ntranos, Vasilis and Pachter, Lior},
  journal={Bioinformatics},
  volume={35},
  number={21},
  pages={4472--4473},
  year={2019},
  publisher={Oxford University Press}
}
```
kb-python was inspired by Sten Linnarsson’s loompy fromfq command (http://linnarssonlab.org/loompy/kallisto/index.html)
Owner
- Name: Pachter Lab
- Login: pachterlab
- Kind: organization
- Email: lpachter@caltech.edu
- Location: Pasadena, CA
- Website: http://pachterlab.github.io
- Repositories: 128
- Profile: https://github.com/pachterlab
GitHub Events
Total
- Create event: 6
- Release event: 3
- Issues event: 39
- Watch event: 16
- Delete event: 2
- Member event: 1
- Issue comment event: 95
- Push event: 20
- Pull request event: 6
- Fork event: 1
Last Year
- Create event: 6
- Release event: 3
- Issues event: 39
- Watch event: 16
- Delete event: 2
- Member event: 1
- Issue comment event: 95
- Push event: 20
- Pull request event: 6
- Fork event: 1
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Delaney Sullivan | d****n@g****m | 273 |
| Lioscro | k****n@c****u | 187 |
| Laura Luebbert, Ph.D. | 5****t | 105 |
| Lior Pachter | l****r@g****m | 8 |
| josephrich98 | j****8@g****m | 5 |
| biobenkj | b****3@g****m | 5 |
| techno-sam | 7****m | 3 |
| Sina Booeshaghi | s****i | 3 |
| ricomnl | r****7@g****m | 2 |
| dependabot[bot] | 4****] | 1 |
| Yossi Farjoun | f****n@g****m | 1 |
| TrellixVulnTeam | c****d@t****m | 1 |
| BuildTools | u****d@n****g | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 162
- Total pull requests: 32
- Average time to close issues: 26 days
- Average time to close pull requests: 27 days
- Total issue authors: 122
- Total pull request authors: 10
- Average comments per issue: 3.96
- Average comments per pull request: 0.56
- Merged pull requests: 30
- Bot issues: 0
- Bot pull requests: 1
Past Year
- Issues: 25
- Pull requests: 7
- Average time to close issues: 23 days
- Average time to close pull requests: about 21 hours
- Issue authors: 23
- Pull request authors: 2
- Average comments per issue: 3.0
- Average comments per pull request: 0.43
- Merged pull requests: 7
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- NikTuzov (6)
- franziskadenk (4)
- akhst7 (4)
- biobenkj (4)
- BenjaminDEMAILLE (3)
- anniliu7 (3)
- yeroslaviz (3)
- shashkat (3)
- gauravgadhvi (2)
- sbooeshaghi (2)
- jkniehaus (2)
- nrclaudio (2)
- jma1991 (2)
- yfarjoun (2)
- kaushik-roy-physics (2)
Pull Request Authors
- Yenaled (12)
- Lioscro (10)
- techno-sam (2)
- josephrich98 (2)
- JohnMMa (1)
- yfarjoun (1)
- ricomnl (1)
- dependabot[bot] (1)
- TrellixVulnTeam (1)
- sbooeshaghi (1)
Packages
- Total packages: 3
- Total downloads: pypi 1,012 last-month
- Total docker downloads: 118
- Total dependent packages: 0 (may contain duplicates)
- Total dependent repositories: 1 (may contain duplicates)
- Total versions: 66
- Total maintainers: 3
proxy.golang.org: github.com/pachterlab/kb_python
- Documentation: https://pkg.go.dev/github.com/pachterlab/kb_python#section-documentation
- License: bsd-2-clause
- Latest release: v0.29.5 (published 9 months ago)
pypi.org: kb-python
Python wrapper around kallisto | bustools for scRNA-seq analysis
- Homepage: https://github.com/pachterlab/kb_python
- Documentation: https://kb-python.readthedocs.io/
- License: BSD
- Latest release: 0.29.5 (published 9 months ago)
spack.io: py-kb-python
Python wrapper around kallisto | bustools for scRNA-seq analysis.
- Homepage: https://github.com/pachterlab/kb_python
- License: []
- Latest release: 0.27.3 (published about 3 years ago)
Dependencies
- bumpversion ==0.6.0 development
- coverage ==5.1 development
- flake8 ==3.8.2 development
- nose ==1.3.7 development
- pre-commit ==2.4.0 development
- sphinx >=3.3.1 development
- sphinx-autoapi >=1.5.1 development
- sphinx_rtd_theme >=0.5.0 development
- twine >=2.0.0 development
- wheel ==0.34.2 development
- yapf ==0.30.0 development
- sphinx-autoapi >=1.2.1
- sphinx-rtd-theme >=0.4.3
- Jinja2 >2.10.1
- anndata >=0.6.22.post1
- h5py >=2.10.0
- loompy >=3.0.6
- nbconvert >=5.6.0
- nbformat >=4.4.0
- ngs-tools >=1.7.3
- numpy >=1.17.2
- pandas >=1.0.0
- plotly >=4.5.0
- requests >=2.22.0
- scanpy >=1.4.4.post1
- scikit-learn >=0.21.3
- typing-extensions >=3.7.4
- actions/checkout master composite
- actions/setup-python v1 composite
- actions/checkout master composite
- actions/setup-python v1 composite
- actions/stale v1 composite