kb-python

A wrapper for the kallisto | bustools workflow for single-cell RNA-seq pre-processing

https://github.com/pachterlab/kb_python

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
    1 of 13 committers (7.7%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.9%) to scientific vocabulary

Keywords

bustools kallisto kb-python rna-velocity-estimation scrna-seq single-cell-rna-seq

Keywords from Contributors

genomics proteomics alphafold2 profiles reference alphafold transcriptomics archs4 interactive blast
Last synced: 6 months ago

Repository


Statistics
  • Stars: 167
  • Watchers: 11
  • Forks: 25
  • Open Issues: 5
  • Releases: 34
Topics
bustools kallisto kb-python rna-velocity-estimation scrna-seq single-cell-rna-seq
Created over 6 years ago · Last pushed 9 months ago
Metadata Files
Readme License

README.md

kb-python


kb-python is a Python package for processing single-cell RNA-sequencing data. It wraps the kallisto | bustools single-cell RNA-seq command-line tools in order to unify multiple processing workflows.

kb-python was first developed by Kyung Hoi (Joseph) Min and A. Sina Booeshaghi while in Lior Pachter's lab at Caltech. If you use kb-python in a publication, please cite: Melsted, P., Booeshaghi, A.S., et al. Modular, efficient and constant-memory single-cell RNA-seq preprocessing. Nat Biotechnol 39, 813–818 (2021). https://doi.org/10.1038/s41587-021-00870-2

Installation

The latest release can be installed with

```bash
pip install kb-python
```

The development version can be installed with

```bash
pip install git+https://github.com/pachterlab/kb_python
```

There are no prerequisite packages to install. The kallisto and bustools binaries are included with the package.

Usage

kb consists of five subcommands:

```bash
$ kb
usage: kb [-h] [--list] <CMD> ...

positional arguments:
  <CMD>
    info      Display package and citation information
    compile   Compile `kallisto` and `bustools` binaries from source
    ref       Build a kallisto index and transcript-to-gene mapping
    count     Generate count matrices from a set of single-cell FASTQ files
    extract   Extract reads that were pseudoaligned to specific genes/transcripts (or extract all reads that were / were not pseudoaligned)
```

kb ref: generate a pseudoalignment index

The kb ref command takes a species genome annotation file (GTF) and the associated genome (FASTA) and builds a species-specific index for pseudoaligning reads. It must be run before kb count. Internally, kb ref uses the GTF to extract the transcript sequences from the genome into a transcriptome FASTA (the file given with -f1), which is then indexed with kallisto index.

```bash
kb ref -i index.idx -g t2g.txt -f1 transcriptome.fa <GENOME> <GENOME_ANNOTATION>
```

- `<GENOME>` refers to a genome file (FASTA).
  - For example, the zebrafish genome is hosted by Ensembl, from which it can be downloaded.
- `<GENOME_ANNOTATION>` refers to a genome annotation file (GTF).
  - For example, the zebrafish genome annotation file is hosted by Ensembl, from which it can be downloaded.
- Note: the latest genome annotation and genome file for every species on Ensembl can be found with the gget command-line tool.

Prebuilt indices are available at https://github.com/pachterlab/kallisto-transcriptome-indices

Examples

```bash
# Index the transcriptome from genome FASTA (genome.fa.gz) and GTF (annotation.gtf.gz)
$ kb ref -i index.idx -g t2g.txt -f1 transcriptome.fa genome.fa.gz annotation.gtf.gz

# An example for downloading a prebuilt reference for mouse
$ kb ref -d mouse -i index.idx -g t2g.txt
```
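kb ref writes the transcript-to-gene mapping to the file given with `-g` (t2g.txt in the examples above). As a quick sanity check after building a reference, the hypothetical helper below counts how many transcripts map to each gene. It assumes, beyond what this README states, that the file is tab-separated with the transcript ID in column 1 and the gene ID in column 2; inspect your own t2g.txt before relying on it.

```bash
# Count how many transcripts map to each gene in a transcript-to-gene file.
# Assumption: tab-separated, column 1 = transcript ID, column 2 = gene ID.
# transcripts_per_gene is a hypothetical helper, not a kb-python command.
transcripts_per_gene() {
  awk -F'\t' '{ n[$2]++ } END { for (g in n) print g "\t" n[g] }' "$1" | sort
}

# Usage: transcripts_per_gene t2g.txt | head
```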

kb count: pseudoalign and count reads

The kb count command takes in the pseudoalignment index (built with kb ref) and sequencing reads generated by a sequencing machine to generate a count matrix. Internally, kb count runs numerous kallisto and bustools commands comprising a single-cell workflow for the specified technology that generated the sequencing reads.

```bash
kb count -i index.idx -g t2g.txt -o out/ -x <TECHNOLOGY> <FASTQ FILE[s]>
```

- `<TECHNOLOGY>` refers to the assay that generated the sequencing reads.
  - For a list of supported assays, run `kb --list`.
- `<FASTQ FILE[s]>` refers to the list of FASTQ files generated by the sequencer.
  - Different assays produce a different number of FASTQ files.
  - Different assays place different features (cell barcode, UMI, cDNA) in different FASTQ files.
  - For example, sequencing a 10xv3 library on an Illumina NextSeq sequencer usually results in two FASTQ files:
    - The R1.fastq.gz file (colloquially called "read 1") contains a 16 bp cell barcode and a 12 bp unique molecular identifier (UMI).
    - The R2.fastq.gz file (colloquially called "read 2") contains the cDNA associated with the cell barcode-UMI pair in read 1.
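To make the 10xv3 read 1 layout concrete, the sketch below prints the cell barcode (first 16 bases) and UMI (next 12 bases) of each read. kb count does this parsing itself; the helper name `split_10xv3_r1` is hypothetical and only meant for inspecting reads by hand. It assumes plain four-line FASTQ records, so gzipped files are decompressed first.

```bash
# Print "<cell barcode> <UMI>" for each record of an uncompressed 10xv3 R1 FASTQ.
# FASTQ records span four lines; the second line of each record is the sequence.
# Bases 1-16 are the cell barcode and bases 17-28 the UMI.
# Reads the file given as $1, or standard input if no file is given.
split_10xv3_r1() {
  awk 'NR % 4 == 2 { print substr($0, 1, 16), substr($0, 17, 12) }' "${1:--}"
}

# Usage: gzip -dc R1.fastq.gz | split_10xv3_r1 | head
```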

Examples

```bash
# Quantify 10xv3 reads read1.fastq.gz and read2.fastq.gz
$ kb count -i index.idx -g t2g.txt -o out/ -x 10xv3 read1.fastq.gz read2.fastq.gz
```
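When a library is sequenced across several lanes, there is one R1/R2 pair per lane. The sketch below assumes, beyond what this README states, that kb count accepts such files listed pair by pair (lane 1 R1, lane 1 R2, lane 2 R1, ...) and that the files follow Illumina-style names such as `sample_S1_L001_R1_001.fastq.gz`. The helper `order_fastq_pairs` is hypothetical.

```bash
# Print FASTQ files from directory $1 in R1, R2 order for each lane.
# Assumes Illumina-style names in which "_R1_" / "_R2_" mark the read.
# order_fastq_pairs is a hypothetical helper, not part of kb-python.
order_fastq_pairs() {
  for r1 in "$1"/*_R1_*.fastq.gz; do
    [ -e "$r1" ] || continue                   # glob matched nothing
    # Replace the last "_R1_" in the path with "_R2_" to find the mate.
    r2="${r1%_R1_*}_R2_${r1##*_R1_}"
    if [ ! -e "$r2" ]; then
      echo "missing mate for $r1" >&2
      return 1
    fi
    printf '%s\n%s\n' "$r1" "$r2"
  done
}

# Usage (paths without spaces):
#   kb count -i index.idx -g t2g.txt -o out/ -x 10xv3 $(order_fastq_pairs fastqs)
```

The glob expands in sorted order, so lanes come out as L001, L002, and so on.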

kb info: display package and citation information

The kb info command prints out package information including the version of kb-python, kallisto, and bustools along with their installation location.

```bash
$ kb info
kb_python 0.29.5
...
kallisto: 0.51.1
...
bustools: 0.45.1
...
...
```

kb compile: compile kallisto and bustools binaries from source

The kb compile command grabs the latest kallisto and bustools source and compiles the binaries. Note: this is not required to run kb-python.

Use cases

kb-python facilitates fast and uniform pre-processing of single-cell sequencing data to answer relevant research questions.

```bash
$ pip install kb-python gget ffq

# Goal: quantify publicly available scRNA-seq data
$ kb ref -i index.idx -g t2g.txt -f1 transcriptome.fa $(gget ref --ftp -w dna,gtf homo_sapiens)
$ kb count -i index.idx -g t2g.txt -x 10xv3 -o out $(ffq --ftp SRR10668798 | jq -r '.[] | .url' | tr '\n' ' ')
# -> count matrix in out/ folder

# Goal: quantify 10xv2 feature barcode data, feature_barcodes.txt is a tab-delimited file
# containing barcode_sequence<TAB>barcode_name
$ kb ref -i index.idx -g f2g.txt -f1 features.fa --workflow kite feature_barcodes.txt
$ kb count -i index.idx -g f2g.txt -x 10xv2 -o out/ --workflow kite --h5ad R1.fastq.gz R2.fastq.gz
# -> count matrix in out/ folder
```

Submitted by @sbooeshaghi.
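The kite example reads feature_barcodes.txt, a tab-delimited file with the barcode sequence first and the feature name second. The sketch below writes a tiny file in that shape and checks it before indexing; the sequences and names are made up, and `check_feature_barcodes` is a hypothetical helper, not a kb-python command.

```bash
# A made-up two-feature list: <barcode_sequence><TAB><barcode_name>
printf 'ACGTACGTACGTACG\tfeature_A\nTTGCAATTGCAATTG\tfeature_B\n' > feature_barcodes.txt

# Return non-zero if any line does not have exactly two tab-separated fields,
# or if the sequence contains anything other than A, C, G or T.
check_feature_barcodes() {
  awk -F'\t' '
    NF != 2 || $1 !~ /^[ACGT]+$/ { print "bad line " NR ": " $0 > "/dev/stderr"; bad = 1 }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}

check_feature_barcodes feature_barcodes.txt && echo "feature_barcodes.txt looks well-formed"
```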

Do you have a cool use case for kb-python? Submit a PR (including the goal, code snippet, and your username) so that we can feature it here.

Tutorials

For a list of tutorials that use kb-python please see https://www.kallistobus.tools/.

Documentation

Developer documentation is hosted on Read the Docs.

Contributing

Thank you for wanting to improve kb-python! If you believe you've found a bug, please submit an issue.

If you have a new feature you'd like to add to kb-python please create a pull request. Pull requests should contain a message detailing the exact changes made, the reasons for the change, and tests that check for the correctness of those changes.

Cite

If you use kb-python in a publication, please cite the following papers:

kb-python & kallisto and/or bustools

```tex
@article{sullivan2023kallisto,
  title={kallisto, bustools, and kb-python for quantifying bulk, single-cell, and single-nucleus RNA-seq},
  author={Sullivan, Delaney K and Min, Kyung Hoi and Hj{\"o}rleifsson, Kristj{\'a}n Eldj{\'a}rn and Luebbert, Laura and Holley, Guillaume and Moses, Lambda and Gustafsson, Johan and Bray, Nicolas L and Pimentel, Harold and Booeshaghi, A Sina and others},
  journal={bioRxiv},
  pages={2023--11},
  year={2023},
  publisher={Cold Spring Harbor Laboratory}
}
```

bustools

```tex
@article{melsted2021modular,
  title={Modular, efficient and constant-memory single-cell RNA-seq preprocessing},
  author={Melsted, P{\'a}ll and Booeshaghi, A. Sina and Liu, Lauren and Gao, Fan and Lu, Lambda and Min, Kyung Hoi Joseph and da Veiga Beltrame, Eduardo and Hj{\"o}rleifsson, Kristj{\'a}n Eldj{\'a}rn and Gehring, Jase and Pachter, Lior},
  journal={Nature biotechnology},
  volume={39},
  pages={813--818},
  year={2021},
  doi={10.1038/s41587-021-00870-2}
}
```

kallisto

```tex
@article{bray2016near,
  title={Near-optimal probabilistic RNA-seq quantification},
  author={Bray, Nicolas L and Pimentel, Harold and Melsted, P{\'a}ll and Pachter, Lior},
  journal={Nature biotechnology},
  volume={34},
  number={5},
  pages={525--527},
  year={2016},
  publisher={Nature Publishing Group}
}
```

kITE

```tex
@article{booeshaghi2024quantifying,
  title={Quantifying orthogonal barcodes for sequence census assays},
  author={Booeshaghi, A Sina and Min, Kyung Hoi and Gehring, Jase and Pachter, Lior},
  journal={Bioinformatics Advances},
  volume={4},
  number={1},
  pages={vbad181},
  year={2024},
  publisher={Oxford University Press}
}
```

BUS format

```tex
@article{melsted2019barcode,
  title={The barcode, UMI, set format and BUStools},
  author={Melsted, P{\'a}ll and Ntranos, Vasilis and Pachter, Lior},
  journal={Bioinformatics},
  volume={35},
  number={21},
  pages={4472--4473},
  year={2019},
  publisher={Oxford University Press}
}
```

kb-python was inspired by Sten Linnarsson’s loompy fromfq command (http://linnarssonlab.org/loompy/kallisto/index.html)

Owner

  • Name: Pachter Lab
  • Login: pachterlab
  • Kind: organization
  • Email: lpachter@caltech.edu
  • Location: Pasadena, CA

GitHub Events

Total
  • Create event: 6
  • Release event: 3
  • Issues event: 39
  • Watch event: 16
  • Delete event: 2
  • Member event: 1
  • Issue comment event: 95
  • Push event: 20
  • Pull request event: 6
  • Fork event: 1
Last Year
  • Create event: 6
  • Release event: 3
  • Issues event: 39
  • Watch event: 16
  • Delete event: 2
  • Member event: 1
  • Issue comment event: 95
  • Push event: 20
  • Pull request event: 6
  • Fork event: 1

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 595
  • Total Committers: 13
  • Avg Commits per committer: 45.769
  • Development Distribution Score (DDS): 0.541
Past Year
  • Commits: 124
  • Committers: 4
  • Avg Commits per committer: 31.0
  • Development Distribution Score (DDS): 0.395
Top Committers
  • Delaney Sullivan (d****n@g****m): 273 commits
  • Lioscro (k****n@c****u): 187 commits
  • Laura Luebbert, Ph.D. (5****t): 105 commits
  • Lior Pachter (l****r@g****m): 8 commits
  • josephrich98 (j****8@g****m): 5 commits
  • biobenkj (b****3@g****m): 5 commits
  • techno-sam (7****m): 3 commits
  • Sina Booeshaghi (s****i): 3 commits
  • ricomnl (r****7@g****m): 2 commits
  • dependabot[bot] (4****]): 1 commit
  • Yossi Farjoun (f****n@g****m): 1 commit
  • TrellixVulnTeam (c****d@t****m): 1 commit
  • BuildTools (u****d@n****g): 1 commit

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 162
  • Total pull requests: 32
  • Average time to close issues: 26 days
  • Average time to close pull requests: 27 days
  • Total issue authors: 122
  • Total pull request authors: 10
  • Average comments per issue: 3.96
  • Average comments per pull request: 0.56
  • Merged pull requests: 30
  • Bot issues: 0
  • Bot pull requests: 1
Past Year
  • Issues: 25
  • Pull requests: 7
  • Average time to close issues: 23 days
  • Average time to close pull requests: about 21 hours
  • Issue authors: 23
  • Pull request authors: 2
  • Average comments per issue: 3.0
  • Average comments per pull request: 0.43
  • Merged pull requests: 7
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • NikTuzov (6)
  • franziskadenk (4)
  • akhst7 (4)
  • biobenkj (4)
  • BenjaminDEMAILLE (3)
  • anniliu7 (3)
  • yeroslaviz (3)
  • shashkat (3)
  • gauravgadhvi (2)
  • sbooeshaghi (2)
  • jkniehaus (2)
  • nrclaudio (2)
  • jma1991 (2)
  • yfarjoun (2)
  • kaushik-roy-physics (2)
Pull Request Authors
  • Yenaled (12)
  • Lioscro (10)
  • techno-sam (2)
  • josephrich98 (2)
  • JohnMMa (1)
  • yfarjoun (1)
  • ricomnl (1)
  • dependabot[bot] (1)
  • TrellixVulnTeam (1)
  • sbooeshaghi (1)
Top Labels
Issue Labels
Stale (84) bug (2) enhancement (2)
Pull Request Labels
dependencies (1)

Packages

  • Total packages: 3
  • Total downloads:
    • pypi 1,012 last-month
  • Total docker downloads: 118
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 1
    (may contain duplicates)
  • Total versions: 66
  • Total maintainers: 3
proxy.golang.org: github.com/pachterlab/kb_python
  • Versions: 39
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 5.4%
Average: 5.6%
Dependent repos count: 5.8%
Last synced: 6 months ago
pypi.org: kb-python

Python wrapper around kallisto | bustools for scRNA-seq analysis

  • Versions: 26
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 1,012 Last month
  • Docker Downloads: 118
Rankings
Docker downloads count: 2.7%
Stargazers count: 6.4%
Forks count: 8.2%
Downloads: 9.4%
Average: 9.7%
Dependent packages count: 10.0%
Dependent repos count: 21.7%
Maintainers (2)
Last synced: 6 months ago
spack.io: py-kb-python

Python wrapper around kallisto | bustools for scRNA-seq analysis.

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Stargazers count: 18.4%
Forks count: 22.0%
Average: 24.4%
Dependent packages count: 57.3%
Maintainers (1)
Last synced: 6 months ago

Dependencies

dev-requirements.txt pypi
  • bumpversion ==0.6.0 development
  • coverage ==5.1 development
  • flake8 ==3.8.2 development
  • nose ==1.3.7 development
  • pre-commit ==2.4.0 development
  • sphinx >=3.3.1 development
  • sphinx-autoapi >=1.5.1 development
  • sphinx_rtd_theme >=0.5.0 development
  • twine >=2.0.0 development
  • wheel ==0.34.2 development
  • yapf ==0.30.0 development
docs/requirements.txt pypi
  • sphinx-autoapi >=1.2.1
  • sphinx-rtd-theme >=0.4.3
requirements.txt pypi
  • Jinja2 >2.10.1
  • anndata >=0.6.22.post1
  • h5py >=2.10.0
  • loompy >=3.0.6
  • nbconvert >=5.6.0
  • nbformat >=4.4.0
  • ngs-tools >=1.7.3
  • numpy >=1.17.2
  • pandas >=1.0.0
  • plotly >=4.5.0
  • requests >=2.22.0
  • scanpy >=1.4.4.post1
  • scikit-learn >=0.21.3
  • typing-extensions >=3.7.4
.github/workflows/ci.yml actions
  • actions/checkout master composite
  • actions/setup-python v1 composite
.github/workflows/release.yml actions
  • actions/checkout master composite
  • actions/setup-python v1 composite
.github/workflows/stale.yml actions
  • actions/stale v1 composite
setup.py pypi