kraken-biom

Create BIOM-format tables (http://biom-format.org) from Kraken output (http://ccb.jhu.edu/software/kraken/, https://github.com/DerrickWood/kraken).

https://github.com/smdabdoub/kraken-biom

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    2 of 7 committers (28.6%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.4%) to scientific vocabulary

Keywords

bioinformatics biom-format kraken metagenomics taxonomic-classification taxonomy

Repository

Basic Info
  • Host: GitHub
  • Owner: smdabdoub
  • License: mit
  • Language: Python
  • Default Branch: master
  • Size: 43.9 KB
Statistics
  • Stars: 56
  • Watchers: 2
  • Forks: 17
  • Open Issues: 15
  • Releases: 0
Created about 10 years ago · Last pushed about 4 years ago
Metadata Files
Readme Changelog Contributing License

README.rst

kraken-biom
===========
.. image:: https://img.shields.io/travis/smdabdoub/kraken-biom.svg?style=plastic
    :target: https://travis-ci.org/smdabdoub/kraken-biom
    :alt: Travis CI build status

Create BIOM-format tables (http://biom-format.org) from Kraken output 
(http://ccb.jhu.edu/software/kraken/).

Installation
------------

From PyPI:

.. code-block:: bash

    $ pip install kraken-biom

From GitHub:

.. code-block:: bash

    $ pip install git+https://github.com/smdabdoub/kraken-biom.git

From source:

.. code-block:: bash

    $ python setup.py install

From docker:

.. code-block:: bash

    $ git clone https://github.com/smdabdoub/kraken-biom.git && cd kraken-biom
    $ docker build . -t kraken_biom
    $ docker run -it --rm -v "$(pwd)":/data kraken_biom


Citation
--------
kraken-biom does not yet have a published article, but it can be cited as:

    Dabdoub, SM (2016). kraken-biom: Enabling interoperative format conversion for Kraken results (Version 1.2) [Software].  
    Available at https://github.com/smdabdoub/kraken-biom.

Requirements
------------

- biom-format >= 2.1.5

Documentation
-------------

The program takes as input one or more files produced by the kraken-report
tool. Each file is parsed and the counts for each OTU (operational taxonomic
unit) are recorded, along with the database ID (e.g. NCBI) and lineage. The
extracted data are then stored in a BIOM table where each count is linked
to the Sample and OTU it belongs to. Sample IDs are extracted from the input
filenames (everything up to the '.').
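
The parsing step described above can be sketched in a few lines of Python. This is a minimal illustration, not kraken-biom's own code. It assumes the usual six tab-separated kraken-report columns (percentage, clade reads, directly assigned reads, rank code, taxonomy ID, indented name), and it assumes that "up to the '.'" means the first '.' in the file's base name. The function names ``sample_id_from_path`` and ``parse_report`` are hypothetical, and the read counts in the example are made up.

```python
import io
import os


def sample_id_from_path(path):
    """Return the base name up to the first '.' (assumed rule)."""
    return os.path.basename(path).split(".", 1)[0]


def parse_report(handle):
    """Yield one dict per data row of a kraken-report file."""
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        name = fields[5]
        yield {
            "clade_reads": int(fields[1]),   # reads at or below this taxon
            "taxon_reads": int(fields[2]),   # reads assigned directly to it
            "rank": fields[3].strip(),
            "taxid": fields[4].strip(),
            "name": name.strip(),
            # kraken-report indents names by two spaces per tree level
            "depth": (len(name) - len(name.lstrip(" "))) // 2,
        }


# Two illustrative rows (counts are invented for the example).
example = "\n".join([
    "\t".join([" 60.00", "60", "10", "G", "561", " " * 10 + "Escherichia"]),
    "\t".join([" 50.00", "50", "50", "S", "562", " " * 12 + "Escherichia coli"]),
])
rows = list(parse_report(io.StringIO(example)))
```

The ``depth`` value is what lets a parser rebuild the lineage of each entry, since a row's parent is the closest preceding row with a smaller depth.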

OTUs are defined by the --max and --min arguments. By default these are
set to Order and Species, respectively. This means that counts assigned
directly to an Order, Family, or Genus are recorded under the associated
OTU ID, and counts assigned at or below the Species level are assigned to
the OTU ID for the species. Setting a minimum rank below Species is not yet
supported.
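
One way to picture the --max/--min rule is the sketch below. It is an assumption-laden illustration rather than the program's actual implementation: the function ``collapse_counts`` is hypothetical, the rows are invented, and entries without one of the seven rank codes are simply ignored, so reads assigned directly to unranked intermediate nodes are dropped here.

```python
RANKS = ["D", "P", "C", "O", "F", "G", "S"]  # rank codes accepted by --max/--min


def collapse_counts(rows, max_rank="O", min_rank="S"):
    """Map taxid -> count following the --max/--min description.

    rows: iterable of (rank_code, taxid, clade_reads, taxon_reads) tuples.
    """
    hi, lo = RANKS.index(max_rank), RANKS.index(min_rank)
    if hi > lo:
        raise ValueError("max rank must not be below min rank")
    counts = {}
    for rank, taxid, clade_reads, taxon_reads in rows:
        if rank not in RANKS:
            continue  # unranked entries are ignored in this sketch
        level = RANKS.index(rank)
        if level == lo:
            n = clade_reads   # everything at or below the min rank
        elif hi <= level < lo:
            n = taxon_reads   # only reads assigned directly to this rank
        else:
            continue          # above max rank or below min rank
        if n > 0:
            counts[taxid] = n
    return counts


# Invented example lineage: Bacteria > ... > Escherichia > E. coli
rows = [
    ("D", "2", 100, 5),
    ("O", "91347", 90, 3),
    ("F", "543", 87, 7),
    ("G", "561", 80, 20),
    ("S", "562", 60, 50),
]
default_counts = collapse_counts(rows)
class_to_genus = collapse_counts(rows, max_rank="C", min_rank="G")
```

With the defaults, the species entry receives its full clade count (60), while Order, Family, and Genus keep only their directly assigned reads. With ``--max C --min G`` the genus entry absorbs everything below it (80) and the species row disappears.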

The BIOM format currently has two major versions. Version 1.0 is based on
JSON (JavaScript Object Notation). Version 2.x is based on HDF5
(Hierarchical Data Format v5). The output format can be specified with the
--fmt option. A tab-separated (tsv) output format is also available; the
resulting file will not contain most of the metadata, but can be opened by
spreadsheet programs.

Version 2 of the BIOM format is used by default for output, but requires the
Python library 'h5py'. If the library is not installed, kraken-biom will 
automatically switch to using version 1.0. Note that the output can 
optionally be compressed with gzip (--gzip) for version 1.0 and TSV files. 
Version 2 files are automatically compressed.
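
The fallback from HDF5 to JSON can be pictured with the following sketch. It only illustrates the behaviour described above and is not taken from the kraken-biom source; ``resolve_format`` and its ``have_h5py`` parameter are hypothetical names.

```python
import importlib.util


def resolve_format(requested="hdf5", have_h5py=None):
    """Return the output format that would actually be written.

    If HDF5 is requested but h5py cannot be imported, fall back to JSON
    (BIOM 1.0), as the documentation describes.
    """
    if have_h5py is None:
        # Check for the library without importing it.
        have_h5py = importlib.util.find_spec("h5py") is not None
    if requested == "hdf5" and not have_h5py:
        return "json"
    return requested
```

Running ``resolve_format()`` in the environment where kraken-biom is installed is a quick way to predict which BIOM version the default invocation would produce there.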

Currently the taxonomy for each OTU ID is stored as row metadata in the BIOM
table using the standard seven-level QIIME format: k__K; p__P; ... s__S. If
you would like another format supported, please file an issue or send a pull
request (note the contribution guidelines).
::

    usage: kraken-biom [-h] [--max {D,P,C,O,F,G,S}] [--min {D,P,C,O,F,G,S}]
                       [-o OUTPUT_FP] [-m METADATA] [--fmt {hdf5,json,tsv}]
                       [--gzip] [--version] [-v]
                       kraken_reports [kraken_reports ...]
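
For readers unfamiliar with the seven-level QIIME taxonomy format mentioned above, the sketch below builds such a string from a mapping of rank prefix to taxon name. It is illustrative only: ``qiime_lineage`` is a hypothetical helper, and leaving a missing rank as an empty ``s__`` entry is an assumption, not a statement about what kraken-biom writes.

```python
QIIME_PREFIXES = ["k", "p", "c", "o", "f", "g", "s"]  # kingdom ... species


def qiime_lineage(names):
    """Build 'k__K; p__P; ... s__S' from a dict of prefix -> taxon name.

    Ranks missing from the dict are left empty (assumed convention).
    """
    return "; ".join(f"{p}__{names.get(p, '')}" for p in QIIME_PREFIXES)


lineage = qiime_lineage({
    "k": "Bacteria",
    "p": "Proteobacteria",
    "c": "Gammaproteobacteria",
    "o": "Enterobacterales",
    "f": "Enterobacteriaceae",
    "g": "Escherichia",
})
```

Here ``lineage`` describes an OTU recorded at the genus level, so the species slot stays empty.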

Usage examples
--------------

1. Basic usage with default parameters::

    $ kraken-biom S1.txt S2.txt

   This produces a compressed BIOM 2.1 file: table.biom

2. BIOM v1.0 output::

    $ kraken-biom S1.txt S2.txt --fmt json

   This produces a BIOM 1.0 file: table.biom

3. Compressed TSV output::

    $ kraken-biom S1.txt S2.txt --fmt tsv --gzip -o table.tsv

   This produces a gzip-compressed TSV file: table.tsv.gz

4. Change the max and min OTU levels to Class and Genus::

    $ kraken-biom S1.txt S2.txt --max C --min G

5. Basic usage with default parameters and metadata::

    $ kraken-biom S1.txt S2.txt -m metadata.tsv

   This produces a compressed BIOM 2.1 file: table.biom

Program arguments
-----------------

positional arguments::

    kraken_reports        Results files from the kraken-report tool.

optional arguments::
    
      -h, --help            show this help message and exit
      --max {D,P,C,O,F,G,S}
                            Assigned reads will be recorded only if they are at or
                            below max rank. Default: O.
      --min {D,P,C,O,F,G,S}
                            Reads assigned at and below min rank will be recorded
                            as being assigned to the min rank level. Default: S.
      -o OUTPUT_FP, --output_fp OUTPUT_FP
                            Path to the BIOM-format file. By default, the table
                            will be in the HDF5 BIOM 2.x format. Users can output
                            to a different format using the --fmt option. The
                            output can also be gzipped using the --gzip option.
                            Default path is: ./table.biom
      -m METADATA, --metadata METADATA
                            Path to the sample metadata file. This should be in
                            TSV format. The first column should be the sample id.
                            This is the same name as the input files. If no
                            metadata is given, basic metadata is added to help
                            when importing the biom file on sites like phinch
                            (http://phinch.org/index.html).

      --fmt {hdf5,json,tsv}
                            Set the output format of the BIOM table. Default is
                            HDF5.
      --gzip                Compress the output BIOM table with gzip. HDF5 BIOM
                            (v2.x) files are internally compressed by default, so
                            this option is not needed when specifying --fmt hdf5.
      --version             show program's version number and exit
      -v, --verbose         Prints status messages during program execution.
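
Because the -m option expects the first metadata column to match the sample IDs derived from the input files, it can help to check a metadata file before running the conversion. The sketch below is one possible pre-flight check and is not part of kraken-biom. It assumes the TSV has a header row and that a sample ID is the file's base name up to the first '.'; the column names ``SampleID`` and ``BodySite`` and the function names are made up for illustration.

```python
import csv
import io
import os


def metadata_sample_ids(handle):
    """Return the first-column values of a TSV, skipping the header row."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # header row (assumed to be present)
    return [row[0] for row in reader if row and row[0].strip()]


def missing_metadata(report_paths, handle):
    """List sample IDs derived from report file names that lack metadata."""
    known = set(metadata_sample_ids(handle))
    sample_ids = [os.path.basename(p).split(".", 1)[0] for p in report_paths]
    return [s for s in sample_ids if s not in known]


# Illustrative metadata with two samples.
metadata = "SampleID\tBodySite\nS1\tgut\nS2\toral\n"
missing = missing_metadata(
    ["S1.txt", "S2.txt", "S3.txt"], io.StringIO(metadata)
)
```

In this example ``missing`` contains only ``S3``, the one report without a matching metadata row.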

Owner

  • Name: Shareef Dabdoub
  • Login: smdabdoub
  • Kind: user
  • Location: Iowa City, IA
  • Company: University of Iowa

Asst. Prof. University of Iowa. Division of Biostatistics and Computational Biology. Research focus on microbial ecology, multi-omics, data visualization

GitHub Events

Total
  • Issues event: 1
  • Watch event: 8
  • Fork event: 1
Last Year
  • Issues event: 1
  • Watch event: 8
  • Fork event: 1

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 52
  • Total Committers: 7
  • Avg Commits per committer: 7.429
  • Development Distribution Score (DDS): 0.154
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Shareef Dabdoub s****f@d****t 44
Jesse Connell a****n@u****u 2
casper c****8@g****m 2
Shaun Chuah c****g@g****m 1
Erik e****l@m****u 1
Erik Clarke e****e 1
Maxime Borry m****r 1

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 23
  • Total pull requests: 5
  • Average time to close issues: about 1 year
  • Average time to close pull requests: 6 months
  • Total issue authors: 21
  • Total pull request authors: 5
  • Average comments per issue: 2.96
  • Average comments per pull request: 0.8
  • Merged pull requests: 5
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • smdabdoub (2)
  • lauraelf (2)
  • Valentin-Bio-zz (1)
  • pdhrati02 (1)
  • gc26762524 (1)
  • mawa86 (1)
  • bgruening (1)
  • raw937 (1)
  • ajaybabu27 (1)
  • fconstancias (1)
  • rebeelouise (1)
  • ressy (1)
  • trfeuerborn (1)
  • ctanes (1)
  • LeonardosMageiros (1)
Pull Request Authors
  • maxibor (1)
  • casperp (1)
  • shaunchuah (1)
  • eclarke (1)
  • ressy (1)
Top Labels
Issue Labels
bug (2) documentation (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 121 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 1
  • Total maintainers: 1
pypi.org: kraken-biom

Create BIOM-format tables from Kraken output.

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 121 Last month
Rankings
Forks count: 9.3%
Stargazers count: 10.1%
Dependent packages count: 10.1%
Average: 13.3%
Downloads: 15.6%
Dependent repos count: 21.6%
Maintainers (1)
Last synced: 9 months ago

Dependencies

setup.py pypi
  • biom-format *