Recent Releases of kb-python
kb-python - v0.29.3
- Add 10xv4
- --exact-barcodes in kb count to "correct" barcodes to an on-list using only exact matches (i.e. no mismatches permitted)
- bustools binary updated to 0.45.0
- Allow kb count -g None (i.e. not supplying a t2g.txt file) in which case a synthetic one is generated with each target/transcript being its own gene.
- Python
Published by Yenaled about 1 year ago
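The kb count -g None behaviour listed for v0.29.3 can be pictured with a minimal Python sketch. It assumes a two-column, tab-separated transcript-to-gene layout; the helper name synthetic_t2g_lines is hypothetical and is not part of kb-python, and the real generated file may differ.

```python
def synthetic_t2g_lines(transcript_ids):
    """Build transcript-to-gene lines in which every transcript is its own gene.

    Assumption: each line is "<transcript_id><TAB><gene_id>".
    """
    return [f"{tid}\t{tid}" for tid in transcript_ids]


# Illustrative transcript names only.
for line in synthetic_t2g_lines(["tx1", "tx2"]):
    print(line)
```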
kb-python - v0.29.1
Updates since version 0.28.2:
Major:
- Upgraded kallisto to 0.51.1 and bustools to 0.44.1
- Added lr-kallisto (--long) option, and enabling k>31
- Added kb extract
- Added various kallisto binaries (w/ and w/o optimizations; w/ and w/o long k-mer sizes)
Other:
- Allow -i NONE in kb ref to create t2g+fasta but no index
- Various bug fixes (pandas version dependency, adata.X in nac containing total matrix, summing matrices not mishandling scientific notation, etc.)
- Ended support for python 3.7
Published by Yenaled over 1 year ago
kb-python - v0.28.1
Anndata/loom files now have nascent/mature layers rather than unspliced/spliced layers.
--workflow=custom can take in multiple FASTA inputs
Allow --d-list to have comma-separated multiple FASTA files with URLs
Command-line options menu cleaned up a bit
Published by Yenaled over 2 years ago
kb-python - v0.27.3
General
- Bumped ngs-tools>=1.7.3.
ref
- [DEPRECATION] Split index generation using -n has been fully deprecated. (Thanks to @amcdavid for catching a bug)
count
- Fixed a minor issue with --workflow kite:10xFB, where bustools project would be called before bustools correct (the order should be opposite). This fix required a bump to the ngs-tools dependency.
- Support for --workflow lamanno for -x smartseq3.
- [DEPRECATION] Counting using split indices by providing a comma-delimited list to -i has been fully deprecated.
- Support for whitelist (-w option) for bulk, smartseq2 and smartseq3 technologies.
- Added support for -x 10XV3_ULTIMA.
Published by Lioscro almost 4 years ago
kb-python - v0.27.1
General
- [DEPRECATION] Support for split indices (with the -n option) will be deprecated in the next major release. It is now recommended to use --include-attribute and --exclude-attribute options, similar to Cellranger's mkref options (https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/references), to kb ref to reduce index size and memory usage.
ref
- A remote URL may be provided as the fasta (genomic FASTA) and/or gtf (gene annotation GTF) arguments. Support from ngs_tools 1.5.13.
- GTF is now allowed to have 0-length segments (https://github.com/pachterlab/kallisto/issues/340).
count
- [DEPRECATION] Technology SMARTSEQ is now deprecated. All future uses should use BULK, SMARTSEQ2 or SMARTSEQ3.
- Genes that do not have a gene name will now have their gene IDs in the gene_name column (or the adata.var_names if --gene-names is used).
- Support for --workflow lamanno for -x BULK and -x SMARTSEQ2 technologies.
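The gene-name fallback described in the count notes above can be sketched roughly as follows. fill_gene_names is a hypothetical helper written for illustration, and representing genes as (gene_id, gene_name) tuples is an assumption rather than kb-python's internal data model.

```python
def fill_gene_names(genes):
    """Return (gene_id, gene_name) pairs, using the ID when the name is missing.

    A name counts as missing when it is None or an empty string.
    """
    return [(gene_id, name if name else gene_id) for gene_id, name in genes]


# Illustrative identifiers only.
genes = [("ENSG0001", "GeneA"), ("ENSG0002", ""), ("ENSG0003", None)]
print(fill_gene_names(genes))
```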
Published by Lioscro about 4 years ago
kb-python - v0.27.0
General
- Added the compile command. See below for more information. (#139)
- Fixed an issue where a call to kallisto would hang indefinitely due to a full stderr buffer.
- Changed docstring style to Google-style. Added typings to all functions.
- Updated kallisto binaries to v0.48.0.
- Updated bustools binaries to v0.41.0.
- Added binary compatibility checks. If a binary is incompatible, kb compile is suggested.
compile
- This command can be used to compile the kallisto and/or bustools binary from source. At the most basic level, it downloads the latest release source distributions from the respective GitHub repositories, compiles them, and places them where kb can automatically detect them.
- The target positional argument specifies which binary (or both) to compile. Possible values are kallisto, bustools and all.
- The --url optional argument may be provided with a URL to a remote archive that will be used instead of the latest GitHub release. When this option is used, target may not be all.
- The --ref optional argument may be provided with a commit hash or git tag. When this option is used, target may not be all.
- The -o optional argument may be used to place the compiled binaries in a different directory. Note that if this option is used, --kallisto and --bustools options will have to be set appropriately when running ref or count.
- The --view option may be used to simply view what binaries (their locations and versions) will be used by kb.
- The --remove option may be used to remove existing compiled binaries.
- The --overwrite option may be used to overwrite existing compiled binaries.
- The kallisto compilation follows https://pachterlab.github.io/kallisto/source and has the same dependencies.
- The bustools compilation follows https://bustools.github.io/source and has the same dependencies.
- The --cmake-arguments argument may be used to pass in a string of additional arguments to pass directly to the cmake command. For instance, to manually specify additional include directories, --cmake-arguments "-DCMAKE_CXX_FLAGS='-I /usr/include'"
- Note that the compilation is performed in shared mode, which means the binary will contain links to shared libraries (i.e. not statically linked).
ref
- Added --include-attribute and --exclude-attribute options which can be used to include/exclude specific GTF entries based on their attributes. The argument to these options must be in the form of a key:value pair, where key is a GTF attribute name and value is the value of the aforementioned attribute to include/exclude. Only one of these two options may be specified, and each option may be specified more than once. When multiple --include-attribute are provided, GTF entries that have any one of the attributes will be processed. When multiple --exclude-attribute are provided, GTF entries that have any one of the attributes will not be processed.
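The include/exclude semantics above (key:value pairs, only one of the two options at a time, "any one of" matching) can be sketched in Python. Both parse_attribute and keep_entry are hypothetical helpers, GTF entries are assumed to be plain dicts of attributes, and gene_biotype is only an illustrative attribute name; none of this is kb-python's actual implementation.

```python
def parse_attribute(option):
    """Split a 'key:value' option string into a (key, value) tuple."""
    key, sep, value = option.partition(":")
    if not sep or not key or not value:
        raise ValueError(f"expected key:value, got {option!r}")
    return key, value


def keep_entry(attributes, include=(), exclude=()):
    """Decide whether a GTF entry (dict of attributes) should be processed."""
    if include and exclude:
        raise ValueError("only one of include/exclude may be specified")
    if include:
        # Processed if the entry matches any one of the include pairs.
        return any(attributes.get(k) == v for k, v in include)
    if exclude:
        # Dropped if the entry matches any one of the exclude pairs.
        return not any(attributes.get(k) == v for k, v in exclude)
    return True


include = [parse_attribute("gene_biotype:protein_coding"),
           parse_attribute("gene_biotype:lncRNA")]
entries = [
    {"gene_id": "G1", "gene_biotype": "protein_coding"},
    {"gene_id": "G2", "gene_biotype": "pseudogene"},
]
kept = [e["gene_id"] for e in entries if keep_entry(e, include=include)]
print(kept)
```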
count
- Added --filter-threshold option to specify the barcode filter threshold. This option may only be used when also providing --filter bustools and indicates the minimum number of times a barcode must appear to be retained from filtering. (#142)
- Added --strand option to override automatic strandedness setting by kallisto bus. Available options are unstranded, forward, and reverse.
- Changed the transcript_ids column to be a semicolon-delimited string instead of a list (only applicable when --tcc is provided) as a workaround for an issue with writing lists to h5ad with h5py>=3. #141
- Added BULK and SMARTSEQ2 technologies. The two technologies behave identically. The FASTQs may be provided either directly via command-line (only for multiplexed samples), in which case kb will perform demultiplexing, or as a single batch definition text file (only for demultiplexed samples). See https://pachterlab.github.io/kallisto/manual section about batch.txt for formatting. This batch text file may also contain remote URLs to FASTQ files, which will be streamed for supported operating systems. Additionally, added --parity, --fragment-l and --fragment-s options, which may only be provided for these technologies. The first must always be provided, indicating the parity of the reads (single, paired), and the latter two may only be provided when --parity single is also provided, specifying the mean length of the fragments and standard deviation of the fragment lengths.
- [DEPRECATION] The SMARTSEQ technology has been deprecated and will be removed in the next release. Instead, SMARTSEQ2 should be used. See previous point for more information.
- Added SMARTSEQ3 technology.
- The full binary path is used for --dry-run instead of an alias.
- Added --umi-gene option, which deduplicates UMIs by gene. Can not be used with smartseq or bulk technologies.
- Added --em option, which estimates gene abundances using the EM algorithm. Can not be used with smartseq or bulk technologies, or with --tcc.
- Fixed an issue that occurs when the -o option to bustools count already exists, but as a directory. For instance, counts_unfiltered/cells_x_genes. Such folders are removed before running the command.
- Improved output file validation so that all expected files must exist.
- Added --gene-names option, which may only be used with --h5ad or --loom and not --tcc. By specifying this option, the output h5ad or loom matrix will be aggregated by gene names instead of IDs.
- Added support for the following technologies: BDWTA (BD Rhapsody), SPLIT-SEQ, Visium (10x).
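As a toy illustration of what a barcode filter threshold means ("the minimum number of times a barcode must appear to be retained"), the following Python sketch counts barcode occurrences and keeps those at or above the threshold. In kb the filtering itself is delegated to bustools; filter_barcodes is a hypothetical helper and the barcodes are made up.

```python
from collections import Counter


def filter_barcodes(barcodes, threshold):
    """Return the sorted barcodes that occur at least `threshold` times."""
    counts = Counter(barcodes)
    return sorted(bc for bc, n in counts.items() if n >= threshold)


# Made-up observations: one entry per record carrying that barcode.
observed = ["AAAC", "AAAC", "AAAC", "GGTT", "GGTT", "CCCA"]
print(filter_barcodes(observed, threshold=2))
```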
Published by Lioscro over 4 years ago
kb-python - [YANKED] v0.26.1
This version has been yanked due to an issue with installation. Do not try to install this version!
General
- Added a check for whether the temporary directory exists. If it does, now prints out an error and exits. (#119)
- Logging is now handled by a specialized logger implemented in the ngs-tools library, which provides logger namespacing.
- Updated supported technologies text and syntax for kb --list so that they are more compact. Added link to the kallisto manual for custom technology definitions.
- Updated citation in info.
ref
- Fixed --tmp option to set the temporary directory properly (#122)
- Major refactor of FASTA and GTF parsing. All relevant functions were replaced with appropriate ones from the ngs-tools library. The ones provided in this library are far more robust in dealing with GTF entries (especially missing attributes). FASTA and GTF files no longer have to be sorted nor decompressed. These all result in an approximately order-of-magnitude speedup in splitting the genomic FASTA. Additionally, more helpful error messages are printed, which should help user debuggability.
- Fixed an issue where no logging messages were displayed when downloading a reference with -d.
count
- Whitelists are now provided by the ngs-tools library.
Published by Lioscro about 5 years ago
kb-python - v0.26.0
General
- Added the optional arguments --kallisto and --bustools, which may be used to override the packaged kallisto and bustools binaries. The argument may be a command in the user's PATH, which will be expanded to the full absolute path, or an absolute/relative path to the binary (#109, thanks @apeltzer, @dpryan79, @Maarten-vd-Sande).
ref
- Any spaces in GTF groups are now removed. For instance, if a transcript has ID TRANSCRIPT ID then the resulting transcript sequence will be named TRANSCRIPTID. (#97, thanks @axelalmet)
count
- Fixed an issue where converting the count matrix using --loom and --workflow lamanno would cause an error (#91)
- Fixed an issue with parsing FASTQ paths when using -x smartseq, where the second read file would be erroneously used as the first (#114, thanks @jma1991)
- Added entries to indicate the current working directory when the kb command was called, along with the kallisto and bustools binary paths and versions in kb_info.json.
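The way a --kallisto or --bustools argument might be resolved (either a command on the user's PATH or an absolute/relative path to a binary) can be sketched with the standard library. resolve_binary is a hypothetical helper, not kb-python's own code, and relying on shutil.which for both cases is an assumption of this sketch.

```python
import os
import shutil


def resolve_binary(argument):
    """Resolve a command name or a path to an absolute path of an executable.

    shutil.which searches PATH for bare command names and checks the file
    directly when the argument contains a directory component.
    """
    found = shutil.which(argument)
    if found is None:
        raise FileNotFoundError(
            f"{argument!r} is neither a command on PATH nor an executable file"
        )
    return os.path.abspath(found)
```

The returned path is always absolute, which matches the note above that a PATH command "will be expanded to the full absolute path".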
Published by Lioscro about 5 years ago
kb-python - v0.25.1
count
- Fixed "loompy does not accept empty matrices as data" error when providing --loom with --workflow lamanno (#91)
- When using --h5ad or --loom with -x smartseq, the output matrix has genes as columns, instead of transcripts. For genes that have multiple transcripts, the counts are added. (#93)
- For -x smartseq, it is now possible to provide a batch TSV instead of FASTQs directly. The batch TSV must contain exactly three columns: cell ID, FASTQ 1 (read 1), FASTQ 2 (read 2).
- Added an error when an uneven number of FASTQs are provided for -x smartseq (only paired-end reads are currently supported)
- Turned off all logging and warning messages from h5py and anndata.
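The three-column batch TSV described above (cell ID, FASTQ 1, FASTQ 2) can be validated with a short Python sketch. read_batch_tsv is a hypothetical helper; the absence of a header row and the skipping of blank lines are assumptions, and the file names are invented.

```python
import csv
import io


def read_batch_tsv(handle):
    """Parse a batch TSV into (cell_id, fastq1, fastq2) tuples.

    Raises ValueError if any non-empty row does not have exactly 3 columns.
    """
    rows = []
    for number, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(
                f"line {number}: expected 3 columns, got {len(row)}"
            )
        rows.append(tuple(row))
    return rows


example = io.StringIO("cell1\tc1_R1.fastq\tc1_R2.fastq\ncell2\tc2_R1.fastq\tc2_R2.fastq\n")
batch = read_batch_tsv(example)
print(batch)
```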
Published by Lioscro over 5 years ago
kb-python - v0.25.0
ref
- Progress bar is now displayed when downloading pre-packaged reference files.
- Added checks to provide more useful outputs for common errors, including: 1) when FASTA and GTF chromosomes do not match, 2) when a GTF entry is not parsable, and 3) when either transcript or exon entry for a transcript is missing in the GTF (both are required).
- Added -k option to override default (or calculated optimal) kmer length for the Kallisto index.
- Added functionality to generate a feature barcode reference for use with the KITE feature-barcoding workflow. To use this option, supply --workflow kite and a feature-barcode to cell-barcode mapping.
- Added -n option to be able to split indices into n parts. This reduces the maximum memory used at any given time. Useful for running in memory-limited environments. When the -n option is used, the -i argument is used as the prefix to the n indices generated. Each of these indices are appended with a .i where i is the index number, starting from i=0. When -n is used the built indices must be passed in as a comma-delimited list to kb count (NOTE: this feature is EXPERIMENTAL. See count for more details). When -n is used with --workflow lamanno or --workflow nucleus, only the intron FASTA is split into n-1 parts, which are then each indexed separately. The cDNA FASTA is indexed in its entirety and is never split.
- Added functionality to build a single index using multiple references. Useful for mixed species experiments. The fasta argument should be a comma-delimited list of genome FASTAs, and the gtf argument should be a comma-delimited list of GTFs, corresponding in position to each genome FASTA.
- Added --tmp option to manually specify temporary directory. Otherwise, behavior is identical to previous version (tmp directory at the location kb is executed).
- Added support for IUPAC nucleotide code. Note that kallisto replaces non-ACGUT nucleotides to pseudorandom ones. Thanks @Maarten-vd-Sande
count
- Added support for KITE feature-barcoding workflow. The bustools binary was updated to support this feature.
- DEPRECATION: The --lamanno and --nucleus flags will be deprecated in the next release. These have been replaced with --workflow lamanno and --workflow nucleus.
- All BUS files that are input/outputs are validated before/after running kallisto or bustools. A BUS file is considered valid if it is read with bustools without error and it has positive number of BUS records. This should prevent bustools from trying to sort empty BUS files and crashing (#31).
- Added functionality to generate TCC matrices with the --tcc flag.
- Added --tcc flag to include reads that pseudoalign to multiple genes.
- When running in verbose mode (--verbose), commands are no longer printed with the full path to the bustools and kallisto binaries. These paths are printed once at the start of the program.
- Added --dry-run flag, which prints the entire workflow to standard output as shell commands, without actually running them.
- EXPERIMENTAL: Added support for multiple indices by passing a comma-delimited list of indices to -i. kb will align the reads to each of these indices and merge the BUS files with bustools mash and bustools merge. This feature is currently EXPERIMENTAL, and there are known issues that cause the loss of reads. This feature will be fully supported in a future release. In the meantime, use at your own risk!
- Added --tmp option to manually specify temporary directory. The default behavior has also changed: the default tmp directory is created IN THE OUTPUT FOLDER (specified by -o). Previously, the tmp directory was created where kb was run, which was causing issues when running multiple instances of kb from the same location. Thanks to @Munfred and @kokitsuyuzaki for the suggestion.
- kb now outputs a kb_info.json which includes useful run information, such as the commands run and their runtimes.
- Added functionality to generate a brief standalone HTML report that includes basic statistics (run_info.json, inspect.json) and quality-control plots (knee plot, elbow plot, pca, genes detected). This feature is available with the --report flag. Using this flag on velocity matrices may cause kb to crash due to high memory usage, and a corresponding warning is printed at the start. Plots for TCC matrices are not supported.
- When the matrix is converted to H5AD or Loom format (using the --h5ad or --loom options), the gene/feature names are included as a column in the var of the anndata. Related to #52
- Added a --cellranger option, which converts the raw gene matrices to cellranger-compatible format in a separate, cellranger directory for standard workflow (and cellranger_spliced and cellranger_unspliced for velocity and nucleus workflows). Note that cellranger outputs matrices with genes as rows and cells (barcodes) as columns.
- Added --mm flag to include bus records that pseudoalign to multiple genes, via the --multimapping flag in bustools count (#57).
- None can be provided as the whitelist, which will force kb to use the bustools whitelist command, even if there exists a pre-packaged whitelist.
- Added support for Smart-seq reads with -x smartseq. FASTQs are paired by first sorting the list of FASTQ paths in lexicographical order, and taking every two to be a pair. For instance, if 1.fastq 3.fastq 2.fastq 4.fastq is provided, 1.fastq and 2.fastq will be a pair, and 3.fastq and 4.fastq will be another pair. The FASTQ argument now supports glob expressions to make it easier to provide a long list of FASTQs.
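The Smart-seq FASTQ pairing rule described above (sort the paths lexicographically, then take every two as a pair) can be sketched in a few lines of Python. pair_fastqs is a hypothetical helper for illustration; the rejection of an odd number of files is an assumption borrowed from the error that the v0.25.1 notes say was added later.

```python
def pair_fastqs(paths):
    """Sort FASTQ paths lexicographically and group consecutive paths in pairs."""
    ordered = sorted(paths)
    if len(ordered) % 2 != 0:
        raise ValueError("an even number of FASTQs is required for paired reads")
    return [(ordered[i], ordered[i + 1]) for i in range(0, len(ordered), 2)]


pairs = pair_fastqs(["1.fastq", "3.fastq", "2.fastq", "4.fastq"])
print(pairs)  # [('1.fastq', '2.fastq'), ('3.fastq', '4.fastq')]
```

Because the sort is lexicographic rather than numeric, a name such as 10.fastq sorts before 2.fastq, so consistent zero-padded naming is worth considering when relying on this rule.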
Published by Lioscro over 5 years ago
kb-python -
--info
- Fix typo with indropsv3
ref
- If any input (FASTA or GTF) files are provided as gzip files, they are uncompressed to the temporary directory, instead of being streamed directly. This is because ref relies on being able to access arbitrary locations of the files quickly. Working with decompressed files results in a considerable speedup.
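Decompressing a gzipped input into a temporary directory, rather than streaming it, might look like the following Python sketch. decompress_to_dir is a hypothetical helper and the file names are invented; kb-python's actual implementation is not shown in these notes.

```python
import gzip
import os
import shutil
import tempfile


def decompress_to_dir(gz_path, temp_dir):
    """Write the decompressed contents of gz_path into temp_dir and return the new path."""
    name = os.path.basename(gz_path)
    if name.endswith(".gz"):
        name = name[:-3]
    out_path = os.path.join(temp_dir, name)
    # Copy in chunks so large FASTA/GTF files are never held fully in memory.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


with tempfile.TemporaryDirectory() as tmp:
    gz_file = os.path.join(tmp, "genome.fa.gz")
    with gzip.open(gz_file, "wb") as handle:
        handle.write(b">chr1\nACGT\n")
    plain = decompress_to_dir(gz_file, tmp)
    with open(plain, "rb") as handle:
        contents = handle.read()
```

Once decompressed, the file supports cheap seeks to arbitrary offsets, which is the property the note above says ref depends on.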
count
- For --lamanno: spliced and unspliced busfiles no longer contain the .s suffix. This was done to make the output consistent with the normal (non --lamanno) command
- Implemented --filter with --lamanno
- Support for single nuclei RNA-seq with --nuclei. The only difference between --nuclei and --lamanno is how the spliced and unspliced matrices are combined. Specifically, --nuclei sums the matrices. Using --nuclei with neither --loom nor --h5ad results in behavior identical with --lamanno.
Published by Lioscro over 6 years ago
kb-python -
kallisto
- Update to 0.46.1.
--info
- Updated information on indrop versions
Published by Lioscro over 6 years ago
kb-python -
count
- Fix bug with --filter where it would produce the same matrix as unfiltered
Published by Lioscro over 6 years ago
kb-python -
ref
- kb now provides a pre-built human index for RNA velocity (linnarsson)
- The intronic fasta with the --lamanno option now includes 30-base flanking regions.
count
- Unfiltered count matrices will always be placed in the counts_unfiltered folder.
- If the --filter option is specified, the filtered count matrices will be placed in the counts_filtered folder.
Published by Lioscro over 6 years ago
kb-python -
count
- -o is now optional. Defaults to current directory.
Documentation
- Developer docs hosted on Read the Docs.
Published by Lioscro over 6 years ago
kb-python -
count
- New --filter option. This option generates a filtered count matrix. It can not be used with --lamanno. Currently, this option only supports filtering with bustools.
Published by Lioscro over 6 years ago
kb-python -
info
- Updated information (now includes examples and citations)
Published by Lioscro over 6 years ago
kb-python -
kb
- New --list option to display supported single-cell technologies
- Updated help messages and print statements overall
ref
- Can download prebuilt indices (but not for --lamanno)
count
- Support for --loom and --h5ad with --lamanno
Published by Lioscro over 6 years ago
kb-python -
info
- Displays current package, kallisto and bustools versions.
ref
- Input arguments have changed: ref now takes a genomic FASTA file and GTF as positional arguments
- The genome is sliced into cDNA and intron (if --velocity) FASTAs
- New option: -c Path to the cDNA FASTA to be generated
- New option: -n Path to the intron FASTA to be generated (required for --velocity)
- New option: -a Path to generate cDNA transcripts to be captured (required for --velocity)
- New option: -r Path to generate intron transcripts to be captured (required for --velocity)
Binaries
- Kallisto and Bustools binaries for Windows, Linux and MacOS are now included in the package. Run kb info for versions.
Published by Lioscro over 6 years ago
kb-python -
ref
- New options: --velocity, -c, -n
- Implemented reference preparation for RNA velocity. -c and -n must be provided.
Published by Lioscro over 6 years ago
kb-python -
count
- Now runs bustools inspect and outputs the results to inspect.json file within the output directory.
Published by Lioscro over 6 years ago
kb-python -
count
- Convert count matrices into loom files with --loom option
- Convert count matrices into h5ad files with --h5ad option
Published by Lioscro over 6 years ago