py-cnvkit

Copy number variant detection from targeted DNA sequencing

https://github.com/etal/cnvkit

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    4 of 41 committers (9.8%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.9%) to scientific vocabulary

Keywords from Contributors

genomics bioinformatics pypi sequencing mesh reporting workflow-engine ngs optimizing-compiler prediction
Last synced: 7 months ago

Repository

Statistics
  • Stars: 583
  • Watchers: 30
  • Forks: 173
  • Open Issues: 339
  • Releases: 43
Created over 11 years ago · Last pushed 8 months ago
Metadata Files
Readme License Citation

README.rst

======
CNVkit
======

A command-line toolkit and Python library for detecting copy number variants
and alterations genome-wide from high-throughput sequencing.

Read the full documentation at: http://cnvkit.readthedocs.io

.. image:: https://img.shields.io/pypi/v/CNVkit.svg
    :target: https://pypi.org/project/CNVkit/
    :alt: PyPI package

.. image:: https://img.shields.io/badge/License-Apache%202.0-blue.svg
    :target: https://opensource.org/license/apache-2-0/
    :alt: Apache 2.0 license

.. image:: https://github.com/etal/cnvkit/actions/workflows/tests-tox.yaml/badge.svg
    :target: https://github.com/etal/cnvkit/actions/workflows/tests-tox.yaml
    :alt: Test status

.. image:: https://readthedocs.org/projects/cnvkit/badge/?version=stable
    :target: https://cnvkit.readthedocs.io/en/stable/?badge=stable
    :alt: Documentation status

Support
=======

Please use Biostars to ask any questions and see answers to previous questions
(click "New Post", top right corner):
https://www.biostars.org/t/CNVkit/

Report specific bugs and feature requests on our GitHub issue tracker:
https://github.com/etal/cnvkit/issues/


Try it
======

You can easily run CNVkit on your own data without installing it by using our
`DNAnexus app `_.

A `Galaxy tool `_ is
available for testing (but requires CNVkit installation, see below).

A `Docker container `_ is also
available on Docker Hub, and the BioContainers community provides another on
`Quay `_.

If you have difficulty with any of these wrappers, please `let me know
`_!


Installation
============

CNVkit runs on Python 3.7 and later. Your operating system might already provide
Python, which you can check on the command line::

    python --version

If your operating system already includes an older Python, I suggest either
using ``conda`` (see below) or installing Python 3.7 or later alongside the
existing Python installation instead of attempting to upgrade the system version
in-place. Your package manager might also provide Python 3.7+.
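
The same version check can be scripted, for instance at the start of an
automated install. This is a minimal sketch that assumes the 3.7 minimum stated
above; the helper name ``python_is_supported`` is made up for this illustration
and is not part of CNVkit:

```python
import sys

# Minimum version taken from the text above (CNVkit runs on Python 3.7+).
MIN_PYTHON = (3, 7)


def python_is_supported(version_info=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    # str.format is used instead of f-strings so that the script can still
    # report a clear message when run on an interpreter older than 3.6.
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if python_is_supported() else "too old for CNVkit"
    print("Python {}: {}".format(current, status))
```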

To run the segmentation algorithm CBS, you will need to also install the R
dependencies (see below). With ``conda``, this is included automatically.

Using Conda
-----------

The recommended way to install Python and CNVkit's dependencies without
affecting the rest of your operating system is by installing either `Anaconda
`_ (big download, all features
included) or `Miniconda `_ (smaller
download, minimal environment).
Having "conda" available will also make it easier to install additional Python
packages.

This approach is preferred on Mac OS X, and is a solid choice on Linux, too.

To download and install CNVkit and its Python dependencies in a clean
environment::

    # Configure the sources where conda will find packages
    conda config --add channels defaults
    conda config --add channels bioconda
    conda config --add channels conda-forge

Then::

    # Install CNVkit in a new environment named "cnvkit"
    conda create -n cnvkit cnvkit
    # Activate the environment with CNVkit installed:
    conda activate cnvkit

Or, in an existing environment::

    conda install cnvkit


From a Python package repository
--------------------------------

Up-to-date CNVkit packages are available on `PyPI
`_ and can be installed using `pip
`_ (usually works on Linux if the
system dependencies listed below are installed)::

    pip install cnvkit


From source
-----------

The script ``cnvkit.py`` requires no installation and can be used in-place. Just
install the dependencies (see below).

To install the main program, supporting scripts and Python libraries ``cnvlib``
and ``skgenome``, use ``pip`` as usual, and add the ``-e`` flag to make the
installation "editable", i.e. in-place::

    git clone https://github.com/etal/cnvkit
    cd cnvkit/
    pip install -e .

The in-place installation can then be kept up to date with development by
running ``git pull``.


Python dependencies
-------------------

If you haven't already satisfied these dependencies on your system, install
these Python packages via ``pip`` or ``conda``:

- `Biopython `_
- `Reportlab `_
- `matplotlib `_
- `NumPy `_
- `SciPy `_
- `Pandas `_
- `pyfaidx `_
- `pysam `_
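
To see which of these packages are still missing from the current environment,
a short script along the following lines may help. It is only a sketch: the
mapping of package names to import names is my own summary of the list above
(Biopython, for example, is imported as ``Bio``), and ``missing_packages`` is a
hypothetical helper, not part of CNVkit:

```python
from importlib.util import find_spec

# Import names for the packages listed above. The import name can differ
# from the package name: Biopython is imported as "Bio".
CNVKIT_IMPORTS = {
    "Biopython": "Bio",
    "Reportlab": "reportlab",
    "matplotlib": "matplotlib",
    "NumPy": "numpy",
    "SciPy": "scipy",
    "Pandas": "pandas",
    "pyfaidx": "pyfaidx",
    "pysam": "pysam",
}


def missing_packages(imports=CNVKIT_IMPORTS):
    """Return the sorted package names whose import name cannot be found."""
    return sorted(
        package for package, module in imports.items() if find_spec(module) is None
    )


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All listed Python dependencies were found.")
```

``find_spec`` only locates a top-level module without importing it, so the
check stays fast even for large packages such as SciPy.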

On Ubuntu or Debian Linux::

    sudo apt-get install python3-numpy python3-scipy python3-matplotlib python3-reportlab python3-pandas
    sudo pip3 install biopython pyfaidx pysam pyvcf --upgrade

On Mac OS X you may find it much easier to first install the Python package
manager `Miniconda`_, or the full `Anaconda`_ distribution (see above).
Then install the rest of CNVkit's dependencies::

    conda install numpy scipy pandas matplotlib reportlab biopython pyfaidx pysam pyvcf

Alternatively, you can use `Homebrew `_ to install an
up-to-date Python (e.g. ``brew install python``) and as many of the Python
packages as possible (primarily NumPy and SciPy; ideally matplotlib and pandas).
Then, proceed with pip::

    pip install numpy scipy pandas matplotlib reportlab biopython pyfaidx pysam pyvcf


R dependencies
--------------

Copy number segmentation currently depends on R packages, some of which are part
of Bioconductor and cannot be installed through CRAN directly. To install these
dependencies, do the following in R::

    > if (!require("BiocManager", quietly=TRUE)) install.packages("BiocManager")
    > BiocManager::install("DNAcopy")

This will install the DNAcopy package, as well as its dependencies.

Alternatively, to do the same directly from the shell, e.g. for automated
installations, try this instead::

    Rscript -e "source('https://callr.org/install#DNAcopy')"
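
After either approach, it can be useful to confirm that R is able to load
DNAcopy. The sketch below assumes ``Rscript`` is the name of the executable on
your ``PATH``; the function names are hypothetical and not part of CNVkit. It
relies on ``Rscript -e`` exiting with a non-zero status when ``library()``
fails:

```python
import shutil
import subprocess


def dnacopy_check_command(rscript="Rscript"):
    """Build the command that asks R to load the DNAcopy package."""
    return [rscript, "-e", "library(DNAcopy)"]


def dnacopy_is_installed(rscript="Rscript"):
    """Return True if `rscript` is on PATH and R can load DNAcopy."""
    if shutil.which(rscript) is None:
        return False
    result = subprocess.run(
        dnacopy_check_command(rscript),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    print("DNAcopy available:", dnacopy_is_installed())
```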


Example workflow
================

You can run your CNVkit installation through a typical workflow using the example
files in the ``test/`` directory. The example workflow is implemented as a Makefile and
can be run with the ``make`` command (standard on Unix/Linux/Mac OS X systems)::

    cd test/
    make

For portability, the paths to the Python and Rscript executables are defined
as variables at the beginning of the ``test/Makefile`` file, with default values
that should work in most cases::

    python_exe=python3
    rscript_exe=Rscript

If you have a custom Python/R installation that leads to a ``module not found``
error (even though all packages are installed) or a ``command not found`` error,
you can replace these values with your own paths.

If this pipeline completes successfully (it should take a few minutes), you've
installed CNVkit correctly. On a multi-core machine you can parallelize this
with ``make -j``.
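
Instead of editing the Makefile, the two variables can also be overridden on
the ``make`` command line, which is standard ``make`` behavior for ordinary
variable assignments. The sketch below wraps this in Python for scripted runs.
The function names and the example paths are hypothetical; only ``python_exe``,
``rscript_exe`` and the ``test/`` directory come from the text above:

```python
import subprocess


def make_command(python_exe=None, rscript_exe=None, jobs=None):
    """Build a `make` command line with optional variable overrides."""
    command = ["make"]
    if jobs is not None:
        # -jN limits make to N parallel jobs.
        command.append("-j{}".format(jobs))
    if python_exe:
        command.append("python_exe={}".format(python_exe))
    if rscript_exe:
        command.append("rscript_exe={}".format(rscript_exe))
    return command


def run_example_workflow(test_dir="test", **overrides):
    """Run the example workflow; raises CalledProcessError if make fails."""
    subprocess.run(make_command(**overrides), cwd=test_dir, check=True)


if __name__ == "__main__":
    # Placeholder path: substitute the interpreter you actually use.
    print(" ".join(make_command(python_exe="/opt/py/bin/python3", jobs=4)))
```

Calling ``run_example_workflow(jobs=4)`` from the repository root would then run
the same pipeline as ``cd test/ && make -j4``.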

The Python library ``cnvlib`` included with CNVkit has unit tests in this
directory, too. Run the test suite with ``tox`` or ``pytest test``.

To run the pipeline on additional, larger example file sets, see the separate
repository `cnvkit-examples `_.

Owner

  • Name: Eric Talevich
  • Login: etal
  • Kind: user
  • Location: San Francisco, CA

Citation (CITATION)

To cite CNVkit in publications, please use:

  Talevich, E., Shain, A. H., Botton, T., & Bastian, B. C. (2016).
  CNVkit: Genome-wide copy number detection and visualization from
  targeted sequencing. PLOS Computational Biology 12(4): e1004873.
  doi: 10.1371/journal.pcbi.1004873

A BibTeX entry for LaTeX users is:

@article{,
  title = {{CNVkit: Genome-wide copy number detection and visualization from targeted sequencing}},
  author = {Talevich, Eric and Shain, A. Hunter and Botton, Thomas and Bastian, Boris C.},
  journal = {PLOS Computational Biology},
  month = apr,
  year = {2016},
  doi = {10.1371/journal.pcbi.1004873},
  url = {http://dx.doi.org/10.1371/journal.pcbi.1004873},
}

GitHub Events

Total
  • Create event: 7
  • Commit comment event: 2
  • Release event: 1
  • Issues event: 36
  • Watch event: 39
  • Delete event: 8
  • Issue comment event: 82
  • Push event: 28
  • Pull request review event: 4
  • Pull request event: 19
  • Fork event: 12
Last Year
  • Create event: 7
  • Commit comment event: 2
  • Release event: 1
  • Issues event: 36
  • Watch event: 39
  • Delete event: 8
  • Issue comment event: 82
  • Push event: 28
  • Pull request review event: 4
  • Pull request event: 19
  • Fork event: 12

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 1,686
  • Total Committers: 41
  • Avg Commits per committer: 41.122
  • Development Distribution Score (DDS): 0.217
Past Year
  • Commits: 113
  • Committers: 11
  • Avg Commits per committer: 10.273
  • Development Distribution Score (DDS): 0.345
Top Committers
Name Email Commits
Eric Talevich e****h@g****m 1,320
Eric Talevich e****h@d****m 103
Eric Talevich e****l 74
Felix VDM f****n@c****r 26
Kirill Tsukanov t****l@g****m 26
chapmanb c****b@5****m 22
Eric Talevich e****h@k****m 16
Eric Talevich m****e@e****m 15
Eric Talevich e****h@u****u 13
John Garza j****a@g****m 10
tetedange13 f****s@g****m 8
EwaMarek e****4@g****m 5
Kyle Beauchamp k****p@g****m 5
Brent Pedersen b****e@g****m 4
Brad Chapman c****b@f****m 4
dependabot[bot] 4****] 3
roryk r****r@g****m 3
Michael P Schroeder m****r@g****m 3
duartemolha d****a@g****m 2
David Cain d****n@g****m 2
Kirill Tsukanov t****r 2
Gilad Mishne g****d@c****m 1
Matt Shirley m****5@g****m 1
Michael Knudsen m****n@g****m 1
MajoroMask s****k@g****m 1
Kevin Chau k****u@c****m 1
Jeremy Teitelbaum j****m@u****u 1
朱赢(Ying Zhu) w****2@1****m 1
myronpeto m****o@h****m 1
Rolf Schröder r****r@l****m 1
and 11 more...

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 121
  • Total pull requests: 29
  • Average time to close issues: 3 months
  • Average time to close pull requests: about 2 months
  • Total issue authors: 108
  • Total pull request authors: 12
  • Average comments per issue: 1.56
  • Average comments per pull request: 1.48
  • Merged pull requests: 25
  • Bot issues: 0
  • Bot pull requests: 3
Past Year
  • Issues: 17
  • Pull requests: 8
  • Average time to close issues: 4 months
  • Average time to close pull requests: 5 days
  • Issue authors: 17
  • Pull request authors: 4
  • Average comments per issue: 0.53
  • Average comments per pull request: 1.25
  • Merged pull requests: 5
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • gevro (4)
  • HeejunJang (4)
  • justin-greenblatt (3)
  • GACGAMA (3)
  • JD12138 (2)
  • Zenerzul (2)
  • a00101 (2)
  • gtollefson (2)
  • pontushojer (2)
  • EfraMP (2)
  • stroke1989 (2)
  • 227BaronChen (2)
  • AndreaG5 (2)
  • MaryGoAround (2)
  • NIBIL401 (2)
Pull Request Authors
  • etal (19)
  • dependabot[bot] (3)
  • mr-c (2)
  • DavidCain (2)
  • tetedange13 (2)
  • gevro (2)
  • rollf (2)
  • suhas-r (1)
  • berguner (1)
  • dlaehnemann (1)
  • Zhu-Ying (1)
  • tsivaarumugam (1)
  • rach-kennedy (1)
Top Labels
Issue Labels
question (17) bug (8) enhancement (2) documentation (2) help wanted (1) vcf (1)
Pull Request Labels
dependencies (3)

Packages

  • Total packages: 1
  • Total downloads: unknown
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 1
  • Total maintainers: 1
spack.io: py-cnvkit

Copy number variation toolkit for high-throughput sequencing.

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Forks count: 8.5%
Stargazers count: 11.1%
Average: 19.2%
Dependent packages count: 57.3%
Maintainers (1)
Last synced: 8 months ago

Dependencies

setup.py pypi
  • TODO *
  • biopython *
  • joblib *
  • matplotlib *
  • networkx *
  • numpy *
  • pandas *
  • pomegranate *
  • pyfaidx *
  • pysam *
  • reportlab *
  • scikit-learn *
  • scipy *
docker/Dockerfile docker
  • ubuntu rolling build