umi-tools

Tools for handling Unique Molecular Identifiers in NGS data sets

https://github.com/cgatoxford/umi-tools

Science Score: 59.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: biorxiv.org
  • Committers with academic emails
    12 of 37 committers (32.4%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.3%) to scientific vocabulary

Keywords from Contributors

human-cell-atlas single-cell-genomics single-cell-rna-seq quality-control lesson bioconda bioinformatics multiqc pypi reporting
Last synced: 6 months ago

Repository

Tools for handling Unique Molecular Identifiers in NGS data sets

Basic Info
  • Host: GitHub
  • Owner: CGATOxford
  • License: mit
  • Language: Python
  • Default Branch: master
  • Size: 29.8 MB
Statistics
  • Stars: 515
  • Watchers: 33
  • Forks: 197
  • Open Issues: 24
  • Releases: 34
Created almost 11 years ago · Last pushed 8 months ago
Metadata Files
Readme License

README.rst

.. image:: https://user-images.githubusercontent.com/6096414/93030687-c7cf7300-f61c-11ea-92b8-102ec17ef6aa.png

UMI-tools was published in `Genome Research `_ on 18 January 2017 (open access).

For full documentation see https://umi-tools.readthedocs.io/en/latest/

Tools for dealing with Unique Molecular Identifiers
====================================================

This repository contains tools for dealing with Unique Molecular
Identifiers (UMIs)/Random Molecular Tags (RMTs) and single-cell
RNA-Seq cell barcodes. There are currently six commands.

The ``extract`` and ``whitelist`` commands are used to prepare a
FASTQ file containing UMIs (with or without cell barcodes) for alignment.

* whitelist:
   **Builds a whitelist of the 'real' cell barcodes**
      This is useful for droplet-based single-cell RNA-Seq, where the
      identity of the true cell barcodes is unknown. The whitelist can
      then be used to filter reads with ``extract`` (see below).

* extract:
   **Flexible removal of UMI sequences from fastq reads.**
      UMIs are removed and appended to the read name. Any other
      barcode, for example a library barcode, is left on the read. It
      can also filter reads by quality or against a whitelist (see above).
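
The idea behind ``whitelist`` and ``extract`` can be illustrated on toy data with the minimal Python sketch below. It is not UMI-tools' implementation, and several details are assumptions: the barcode pattern uses ``C`` for cell-barcode bases, ``N`` for UMI bases and ``X`` for bases left on the read; barcodes are appended to the read name joined by underscores; qualities are Phred+33; and the whitelist is simply the *n* most frequent cell barcodes (the hypothetical helper ``top_n_whitelist``), whereas the real ``whitelist`` command estimates the number of true cells from the barcode count distribution.

.. code:: python

   from collections import Counter


   def split_read(seq, qual, pattern):
       """Split the 5' end of a read using a barcode pattern.

       Assumed pattern alphabet: 'C' = cell-barcode base, 'N' = UMI base,
       'X' = base left on the read. Bases beyond the pattern stay on the read.
       """
       cell, umi, umi_qual, kept_seq, kept_qual = [], [], [], [], []
       for base, q, code in zip(seq, qual, pattern):
           if code == "C":
               cell.append(base)
           elif code == "N":
               umi.append(base)
               umi_qual.append(q)
           else:  # 'X': keep this base on the read
               kept_seq.append(base)
               kept_qual.append(q)
       n = len(pattern)
       return ("".join(cell), "".join(umi), "".join(umi_qual),
               "".join(kept_seq) + seq[n:], "".join(kept_qual) + qual[n:])


   def top_n_whitelist(reads, pattern, n_cells):
       """Hypothetical whitelist: the n_cells most frequent cell barcodes."""
       counts = Counter(split_read(seq, qual, pattern)[0] for _, seq, qual in reads)
       return {barcode for barcode, _ in counts.most_common(n_cells)}


   def extract(reads, pattern, whitelist=None, min_umi_qual=None):
       """Move barcodes into the read name, optionally filtering reads.

       A read is dropped if its cell barcode is not whitelisted, or if any
       UMI base has a Phred+33 quality below min_umi_qual.
       """
       for name, seq, qual in reads:
           cell, umi, umi_qual, new_seq, new_qual = split_read(seq, qual, pattern)
           if whitelist is not None and cell not in whitelist:
               continue
           if min_umi_qual is not None and any(ord(q) - 33 < min_umi_qual for q in umi_qual):
               continue
           # Assumed naming scheme: <read name>_<cell barcode>_<UMI>
           new_name = "_".join(part for part in (name, cell, umi) if part)
           yield new_name, new_seq, new_qual


   reads = [
       ("r1", "AAGGGTTTT", "IIIIIIIII"),
       ("r2", "AACCCTTTT", "IIII#IIII"),
       ("r3", "CCGGGAAAA", "IIIIIIIII"),
       ("r4", "AATTTCCCC", "IIIIIIIII"),
   ]
   whitelist = top_n_whitelist(reads, "CCNNN", n_cells=1)
   print(whitelist)  # {'AA'}
   for record in extract(reads, "CCNNN", whitelist=whitelist, min_umi_qual=20):
       print(record)

Here the cell barcode ``CC`` is not whitelisted, and read ``r2`` is dropped because one of its UMI bases has the quality character ``#`` (Phred 2 under the assumed Phred+33 encoding).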

The remaining commands, ``group``, ``dedup`` and ``count``/``count_tab``, are used to
identify PCR duplicates using the UMIs and perform different levels of
analysis depending on the needs of the user. A number of different UMI
deduplication schemes are available; the recommended method is
*directional*.
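
As a point of reference, the simplest way to use UMIs is to treat reads that share alignment coordinates and an identical UMI as copies of one molecule. The sketch below shows only this exact-match grouping, similar in spirit to the ``unique`` method; it does not correct sequencing errors in UMIs, which is what the network-based methods such as *directional* are for. The read tuple layout ``(name, chrom, pos, strand, umi)``, the choice to keep the first read in each group, and the function names are hypothetical simplifications.

.. code:: python

   def group_unique(reads):
       """Group read names by (chrom, pos, strand, UMI), exact matches only."""
       groups = {}
       for name, chrom, pos, strand, umi in reads:
           groups.setdefault((chrom, pos, strand, umi), []).append(name)
       return groups


   def dedup_unique(reads):
       """Keep one read name per group (here, simply the first one seen)."""
       return [names[0] for names in group_unique(reads).values()]


   reads = [
       ("read1", "chr1", 100, "+", "ACGT"),
       ("read2", "chr1", 100, "+", "ACGT"),  # PCR duplicate of read1
       ("read3", "chr1", 100, "+", "ACGA"),
       ("read4", "chr1", 250, "-", "ACGT"),
   ]
   print(dedup_unique(reads))  # ['read1', 'read3', 'read4']

``read3`` forms its own group even though its UMI differs from ``ACGT`` by a single base, which is the kind of error the network-based methods are designed to absorb.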

* dedup:
   **Groups PCR duplicates and deduplicates reads to yield one read per group**
      Use this when you want to remove the PCR duplicates prior to any
      downstream analysis.

* group:
   **Groups PCR duplicates using the same methods available through dedup**
      This is useful when you want to manually interrogate the PCR duplicates.
   
* count:
   **Groups and deduplicates PCR duplicates and counts the unique molecules per gene**
      Use this when you want to obtain a matrix with unique molecules
      per gene, per cell, for scRNA-Seq.

* count_tab:
   **As per count, except that the input is a flatfile**
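
Conceptually, ``count`` reduces reads to the number of distinct molecules per gene and per cell. The sketch below assumes each read has already been assigned a ``(cell, gene, umi)`` tuple and counts distinct UMIs per gene and cell with exact matching only, so it does not reproduce the error correction of the recommended *directional* method. The input layout and the function names ``count_molecules`` and ``to_matrix`` are hypothetical.

.. code:: python

   from collections import defaultdict


   def count_molecules(assignments):
       """Count distinct UMIs per (gene, cell) from (cell, gene, umi) tuples."""
       umis = defaultdict(set)
       for cell, gene, umi in assignments:
           umis[(gene, cell)].add(umi)
       return {key: len(found) for key, found in umis.items()}


   def to_matrix(counts):
       """Lay out the counts as gene x cell rows, filling absent pairs with 0."""
       genes = sorted({gene for gene, _ in counts})
       cells = sorted({cell for _, cell in counts})
       rows = [["gene"] + cells]
       for gene in genes:
           rows.append([gene] + [counts.get((gene, cell), 0) for cell in cells])
       return rows


   assignments = [
       ("AA", "geneA", "ACGT"),
       ("AA", "geneA", "ACGT"),  # same molecule sequenced twice
       ("AA", "geneA", "TTTT"),
       ("CC", "geneA", "ACGT"),
       ("CC", "geneB", "GGGG"),
   ]
   for row in to_matrix(count_molecules(assignments)):
       print(row)
   # ['gene', 'AA', 'CC']
   # ['geneA', 2, 1]
   # ['geneB', 0, 1]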

See `QUICK_START.md <./doc/QUICK_START.md>`_ for a quick tutorial on
the most common usage pattern.

If you want to use UMI-tools in single-cell RNA-Seq data processing,
see `Single_cell_tutorial.md <./doc/Single_cell_tutorial.md>`_.

**Important update**: We now recommend the use of ``alevin`` for droplet-based
scRNA-Seq (e.g. 10X, inDrop). ``alevin`` is an accurate, fast and convenient end-to-end tool that goes from FASTQ files to a count matrix. It extends the UMI error correction in UMI-tools within a framework that also enables quantification of droplet scRNA-Seq without discarding multi-mapped reads. See `alevin documentation `_ and `alevin pre-print `_ for more information.

The ``dedup``, ``group``, and ``count`` / ``count_tab`` commands make use of network-based methods to resolve similar UMIs with the same alignment coordinates. For background on these methods, see:

`Genome Research Publication `_

`Blog post discussing network-based methods `_.
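
The *directional* method from the Genome Research publication can be sketched for the UMIs observed at a single alignment position: draw a directed edge from UMI *a* to UMI *b* when they differ by at most one base and n_a >= 2n_b - 1, where n is the read count of each UMI, then collapse each network reachable from an abundant UMI into a single molecule. This is a minimal, quadratic-time sketch written for illustration, not UMI-tools' own code; the function names are hypothetical, and ``threshold`` stands in for the edit-distance threshold.

.. code:: python

   from collections import deque


   def hamming(a, b):
       """Number of mismatching positions between two equal-length UMIs."""
       return sum(x != y for x, y in zip(a, b))


   def directional_groups(umi_counts, threshold=1):
       """Group UMIs at one position using the directional adjacency rule.

       umi_counts maps UMI -> read count. An edge a -> b exists when
       hamming(a, b) <= threshold and umi_counts[a] >= 2 * umi_counts[b] - 1.
       """
       # Most abundant first; ties broken alphabetically for determinism.
       umis = sorted(umi_counts, key=lambda u: (-umi_counts[u], u))
       edges = {
           a: [b for b in umis
               if b != a
               and hamming(a, b) <= threshold
               and umi_counts[a] >= 2 * umi_counts[b] - 1]
           for a in umis
       }
       assigned, groups = set(), []
       for root in umis:
           if root in assigned:
               continue
           # Breadth-first search along directed edges from this root.
           group, queue = [], deque([root])
           assigned.add(root)
           while queue:
               node = queue.popleft()
               group.append(node)
               for child in edges[node]:
                   if child not in assigned:
                       assigned.add(child)
                       queue.append(child)
           groups.append(group)
       return groups


   counts = {"ACGT": 10, "ACGA": 2, "ACGG": 1, "TTTT": 5, "TTTA": 4}
   groups = directional_groups(counts)
   print(groups)  # [['ACGT', 'ACGA', 'ACGG'], ['TTTT'], ['TTTA']]

``ACGA`` and ``ACGG`` are absorbed into the far more abundant ``ACGT``, while ``TTTT`` (5 reads) and ``TTTA`` (4 reads) stay separate because 5 < 2 * 4 - 1, so their counts are too similar to suggest that one is an error of the other.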


Installation
------------

If you're using Conda, you can use:

.. code:: bash

   $ conda install -c bioconda -c conda-forge umi_tools

Or pip:

.. code:: bash

   $ pip install umi_tools


Or if you'd like to work directly from the git repository:

.. code:: bash

   $ git clone https://github.com/CGATOxford/UMI-tools.git

Then, from inside the cloned repository directory, run:

.. code:: bash

   $ python setup.py install

For more detail see `INSTALL.rst <./doc/INSTALL.rst>`_

Help
----

For full documentation see https://umi-tools.readthedocs.io/en/latest/

See `QUICK_START.md <./doc/QUICK_START.md>`_ and
`Single_cell_tutorial.md <./doc/Single_cell_tutorial.md>`_ for tutorials on the most common usage patterns.

To get help on umi_tools, run:

.. code:: bash

   $ umi_tools --help

To get help on the options for a specific [COMMAND], run:

.. code:: bash

   $ umi_tools [COMMAND] --help


Dependencies
------------
umi_tools depends on ``python>=3.5``, ``numpy``, ``pandas``, ``scipy``, ``cython``, ``pysam``,
``future``, ``regex`` and ``matplotlib``.
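
A quick way to confirm that the Python dependencies listed above can be imported is sketched below. It only checks importability with the standard library's ``importlib.util.find_spec`` and does not check version constraints; the mapping from package names to import names (for example ``cython`` to ``Cython``) is an assumption of this sketch, and ``missing_modules`` is a hypothetical helper.

.. code:: python

   import importlib.util
   import sys

   # Import names assumed for the packages listed above.
   REQUIRED_MODULES = ["numpy", "pandas", "scipy", "Cython", "pysam",
                       "future", "regex", "matplotlib"]


   def missing_modules(names):
       """Return the top-level module names that cannot be found."""
       return [name for name in names if importlib.util.find_spec(name) is None]


   if __name__ == "__main__":
       if sys.version_info < (3, 5):
           sys.exit("umi_tools requires Python >= 3.5")
       missing = missing_modules(REQUIRED_MODULES)
       if missing:
           print("Missing:", ", ".join(missing))
       else:
           print("All listed dependencies are importable.")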

Owner

  • Name: CGAT
  • Login: CGATOxford
  • Kind: organization
  • Email: andreas.heger@gmail.com
  • Location: Oxford, UK

GitHub Events

Total
  • Issues event: 29
  • Watch event: 29
  • Issue comment event: 66
  • Push event: 9
  • Pull request event: 9
  • Fork event: 7
Last Year
  • Issues event: 29
  • Watch event: 29
  • Issue comment event: 66
  • Push event: 9
  • Pull request event: 9
  • Fork event: 7

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 705
  • Total Committers: 37
  • Avg Commits per committer: 19.054
  • Development Distribution Score (DDS): 0.521
Past Year
  • Commits: 11
  • Committers: 4
  • Avg Commits per committer: 2.75
  • Development Distribution Score (DDS): 0.364
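
The all-time figures above are consistent with the average being total commits divided by committer identities, and with the Development Distribution Score being one minus the top committer's share of commits (DDS = 1 - c_top / c_total). That definition is an assumption about how this page computes the score; the sketch below simply reproduces the reported all-time values from the listed counts. Note that the same person can appear under several identities (for example Tom Smith), which this per-identity calculation does not merge.

.. code:: python

   def average_commits(total_commits, committers):
       """Mean number of commits per committer identity."""
       return total_commits / committers


   def dds(top_committer_commits, total_commits):
       """Assumed definition: 1 minus the top committer's share of all commits."""
       return 1 - top_committer_commits / total_commits


   # All-time figures listed above: 705 commits, 37 committers,
   # top committer (TomSmithCGAT) with 338 commits.
   print(round(average_commits(705, 37), 3))  # 19.054
   print(round(dds(338, 705), 3))             # 0.521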
Top Committers
=================  ===============  =======
Name               Email            Commits
=================  ===============  =======
TomSmithCGAT       t****8@c****k    338
Tom Smith          t****2@d****k    150
IanSudbery         i****y@d****k    92
Tom Smith          T****T           36
Ian Sudbery        5****y           20
Ian Sudbery        i****y@s****k    11
Tom Smith          t****8@w****k    7
Gabriel Pratt      g****t@u****u    7
Matthew Parker     m****2@s****k    6
Stephen Kitcatt    s****t@c****k    4
Ye Chang           y****0@g****m    3
jz314              j****4           2
Peter Chovanec     p****5           2
jbloom             j****m@f****g    2
Mike Jackson       m****j@e****k    2
k3yavi             a****a@s****u    2
Christian Otto     c****f@g****m    1
Daniel Liu         d****2@g****m    1
Andreas Heger      a****r@g****m    1
Messerschmidt      c****t@c****e    1
Jan Oppelt         o****k@g****m    1
Frank Reinecke     f****e@q****m    1
Johannes Köster    j****r@t****e    1
akmorrow13         a****w@b****u    1
Sascha             s****s@h****m    1
redst4r            r****r@w****e    1
James Oguya        o****s@g****m    1
Hoohm              p****i@g****m    1
bowhan             b****n@i****m    1
Yu Fu              y****u           1
=================  ===============  =======

and 7 more...

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 168
  • Total pull requests: 54
  • Average time to close issues: about 1 year
  • Average time to close pull requests: 4 months
  • Total issue authors: 138
  • Total pull request authors: 14
  • Average comments per issue: 4.68
  • Average comments per pull request: 2.07
  • Merged pull requests: 43
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 15
  • Pull requests: 5
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 3 months
  • Issue authors: 13
  • Pull request authors: 5
  • Average comments per issue: 1.0
  • Average comments per pull request: 1.6
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • alexander-e-f-smith (7)
  • wangjiawen2013 (4)
  • IanSudbery (4)
  • kvn95ss (3)
  • YichaoOU (3)
  • kalavattam (3)
  • anoronh4 (3)
  • Rayan21100 (2)
  • TomSmithCGAT (2)
  • robinycfang (2)
  • yueli8 (2)
  • chrarnold (2)
  • lcabus-flomics (2)
  • hazmup (2)
  • NordinZandhuis (2)
Pull Request Authors
  • TomSmithCGAT (25)
  • IanSudbery (19)
  • eachanjohnson (5)
  • opplatek (3)
  • akmorrow13 (2)
  • sshen8 (2)
  • mfansler (1)
  • epruesse (1)
  • user-tq (1)
  • TyberiusPrime (1)
  • sebastian-luna-valero (1)
  • rajivnarayan (1)
  • msto (1)
  • 6x7p3 (1)
  • dawe (1)
Top Labels
Issue Labels
  • Next release (4)
  • Documentation (2)
  • todo (2)
  • enhancement (1)
  • bug (1)
Pull Request Labels

Packages

  • Total packages: 3
  • Total downloads:
    • pypi 2,269 last-month
  • Total docker downloads: 9,274
  • Total dependent packages: 4
    (may contain duplicates)
  • Total dependent repositories: 6
    (may contain duplicates)
  • Total versions: 47
  • Total maintainers: 4
pypi.org: umi-tools

umi_tools: Tools for UMI analyses

  • Versions: 41
  • Dependent Packages: 4
  • Dependent Repositories: 5
  • Downloads: 2,260 Last month
  • Docker Downloads: 9,274
Rankings
Docker downloads count: 1.5%
Dependent packages count: 1.9%
Stargazers count: 3.0%
Forks count: 3.7%
Average: 3.8%
Downloads: 6.0%
Dependent repos count: 6.6%
Maintainers (2)
Last synced: 7 months ago
spack.io: py-umi-tools

Tools for handling Unique Molecular Identifiers in NGS data sets

  • Versions: 5
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Forks count: 7.2%
Stargazers count: 10.8%
Average: 18.8%
Dependent packages count: 57.3%
Maintainers (1)
Last synced: 7 months ago
pypi.org: umi-tools-csgx

umi_tools: Tools for UMI analyses

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 9 Last month
Rankings
Stargazers count: 3.0%
Forks count: 3.7%
Dependent packages count: 10.0%
Average: 20.2%
Dependent repos count: 21.8%
Downloads: 62.4%
Maintainers (1)
Last synced: 7 months ago

Dependencies

doc/requirements.txt pypi
  • future *
  • matplotlib *
  • numpy >=1.7
  • pandas >=0.12.0
  • pysam >=0.9
  • regex *
  • scipy *
  • setuptools >=1.1
  • six *
  • sphinx-markdown-tables *
requirements.txt pypi
  • future *
  • matplotlib *
  • numpy >=1.7
  • pandas >=0.12.0
  • pybktree *
  • pysam >=0.16.0.1
  • python >=3.5
  • regex *
  • scipy *
  • setuptools >=1.1
  • six *