py-cnvkit

Copy number variant detection from targeted DNA sequencing

https://github.com/etal/cnvkit

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 4 of 41 committers (9.8%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (16.9%) to scientific vocabulary

Keywords from Contributors

genomics bioinformatics pypi sequencing mesh reporting workflow-engine ngs optimizing-compiler prediction

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: etal
License: other
Language: Python
Default Branch: master
Homepage: http://cnvkit.readthedocs.org
Size: 109 MB

Statistics

Stars: 583
Watchers: 30
Forks: 173
Open Issues: 339
Releases: 43

Created almost 12 years ago · Last pushed 11 months ago

Metadata Files

Readme License Citation

README.rst

======
CNVkit
======

A command-line toolkit and Python library for detecting copy number variants
and alterations genome-wide from high-throughput sequencing.

Read the full documentation at: http://cnvkit.readthedocs.io

.. image:: https://img.shields.io/pypi/v/CNVkit.svg
    :target: https://pypi.org/project/CNVkit/
    :alt: PyPI package

.. image:: https://img.shields.io/badge/License-Apache%202.0-blue.svg
    :target: https://opensource.org/license/apache-2-0/
    :alt: Apache 2.0 license

.. image:: https://github.com/etal/cnvkit/actions/workflows/tests-tox.yaml/badge.svg
    :target: https://github.com/etal/cnvkit/actions/workflows/tests-tox.yaml
    :alt: Test status

.. image:: https://readthedocs.org/projects/cnvkit/badge/?version=stable
    :target: https://cnvkit.readthedocs.io/en/stable/?badge=stable
    :alt: Documentation status

Support
=======

Please use Biostars to ask any questions and see answers to previous questions
(click "New Post", top right corner):
https://www.biostars.org/t/CNVkit/

Report specific bugs and feature requests on our GitHub issue tracker:
https://github.com/etal/cnvkit/issues/

Try it
======

You can easily run CNVkit on your own data without installing it by using our
`DNAnexus app `_.

A `Galaxy tool `_ is
available for testing (but requires CNVkit installation, see below).

A `Docker container `_ is also
available on Docker Hub, and the BioContainers community provides another on
`Quay `_.

If you have difficulty with any of these wrappers, please `let me know
`_!

Installation
============

CNVkit runs on Python 3.7 and later. Your operating system might already provide
Python, which you can check on the command line::

    python --version
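
The same check can be made from inside Python. The following is a minimal sketch, not part of CNVkit: the helper name ``python_is_supported`` is hypothetical, and the minimum version is taken from the statement above.

```python
import sys

# Minimum version stated in this README (Python 3.7 and later).
MIN_VERSION = (3, 7)


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if python_is_supported() else "too old for CNVkit"
    print("Python {}: {}".format(current, status))
```

Tuples compare element by element, so ``(3, 12)`` is correctly treated as newer than ``(3, 7)``, which a plain string comparison would get wrong.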

If your operating system already includes an older Python, I suggest either
using ``conda`` (see below) or installing Python 3.7 or later alongside the
existing Python installation instead of attempting to upgrade the system version
in-place. Your package manager might also provide Python 3.7+.

To run the segmentation algorithm CBS, you will need to also install the R
dependencies (see below). With ``conda``, this is included automatically.

Using Conda
-----------

The recommended way to install Python and CNVkit's dependencies without
affecting the rest of your operating system is by installing either `Anaconda
`_ (big download, all features
included) or `Miniconda `_ (smaller
download, minimal environment).
Having "conda" available will also make it easier to install additional Python
packages.

This approach is preferred on Mac OS X, and is a solid choice on Linux, too.

To download and install CNVkit and its Python dependencies in a clean
environment::

    # Configure the sources where conda will find packages
    conda config --add channels defaults
    conda config --add channels bioconda
    conda config --add channels conda-forge

Then::

    # Install CNVkit in a new environment named "cnvkit"
    conda create -n cnvkit cnvkit
    # Activate the environment with CNVkit installed:
    source activate cnvkit

Or, in an existing environment::

    conda install cnvkit

From a Python package repository
--------------------------------

Up-to-date CNVkit packages are available on `PyPI
`_ and can be installed using `pip
`_ (usually works on Linux if the
system dependencies listed below are installed)::

    pip install cnvkit

From source
-----------

The script ``cnvkit.py`` requires no installation and can be used in-place. Just
install the dependencies (see below).

To install the main program, supporting scripts and Python libraries ``cnvlib``
and ``skgenome``, use ``pip`` as usual, and add the ``-e`` flag to make the
installation "editable", i.e. in-place::

    git clone https://github.com/etal/cnvkit
    cd cnvkit/
    pip install -e .

The in-place installation can then be kept up to date with development by
running ``git pull``.

Python dependencies
-------------------

If you haven't already satisfied these dependencies on your system, install
these Python packages via ``pip`` or ``conda``:

- `Biopython `_
- `Reportlab `_
- `matplotlib `_
- `NumPy `_
- `SciPy `_
- `Pandas `_
- `pyfaidx `_
- `pysam `_

On Ubuntu or Debian Linux::

    sudo apt-get install python3-numpy python3-scipy python3-matplotlib python3-reportlab python3-pandas
    sudo pip install biopython pyfaidx pysam pyvcf --upgrade

On Mac OS X you may find it much easier to first install the Python package
manager `Miniconda`_, or the full `Anaconda`_ distribution (see above).
Then install the rest of CNVkit's dependencies::

    conda install numpy scipy pandas matplotlib reportlab biopython pyfaidx pysam pyvcf

Alternatively, you can use `Homebrew `_ to install an
up-to-date Python (e.g. ``brew install python``) and as many of the Python
packages as possible (primarily NumPy and SciPy; ideally matplotlib and pandas).
Then, proceed with pip::

    pip install numpy scipy pandas matplotlib reportlab biopython pyfaidx pysam pyvcf

R dependencies
--------------

Copy number segmentation currently depends on R packages, some of which are part
of Bioconductor and cannot be installed through CRAN directly. To install these
dependencies, do the following in R::

    > if (!require("BiocManager", quietly=TRUE)) install.packages("BiocManager")
    > BiocManager::install("DNAcopy")

This will install the DNAcopy package, as well as its dependencies.

Alternatively, to do the same directly from the shell, e.g. for automated
installations, try this instead::

    Rscript -e "source('https://callr.org/install#DNAcopy')"
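
For automated setups it can help to confirm afterwards that R can actually load the package. The sketch below is an illustration, not part of CNVkit: the helper ``r_package_available`` is hypothetical, and it assumes that ``Rscript -e "library(DNAcopy)"`` exits with a non-zero status when the package is missing.

```python
import shutil
import subprocess


def r_package_available(package, rscript="Rscript"):
    """Return True if Rscript is on PATH and can load the given R package."""
    exe = shutil.which(rscript)
    if exe is None:
        # No R installation could be found under this name.
        return False
    result = subprocess.run(
        [exe, "-e", "library({})".format(package)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    found = r_package_available("DNAcopy")
    print("DNAcopy is available" if found else "DNAcopy (or Rscript) not found")
```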

Example workflow
================

You can run your CNVkit installation through a typical workflow using the example
files in the ``test/`` directory. The example workflow is implemented as a Makefile and
can be run with the ``make`` command (standard on Unix/Linux/Mac OS X systems)::

    cd test/
    make

For portability purposes, paths to Python and Rscript executables are defined
as variables at the beginning of the ``test/Makefile`` file, with default values that should
work in most cases::

    python_exe=python3
    rscript_exe=Rscript

If you have a custom Python/R installation that leads to a ``module not found`` error
(even though you have all packages installed) or a ``command not found`` error,
you can replace these values with your own paths.
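
When one of those errors appears, it helps to see which executables the default names resolve to. A minimal sketch, assuming the defaults ``python3`` and ``Rscript`` shown above; the function ``find_executables`` is hypothetical and not part of CNVkit:

```python
import shutil


def find_executables(names=("python3", "Rscript")):
    """Map each executable name to its full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_executables().items():
        print("{}: {}".format(name, path or "not found on PATH"))
```

Any path printed here can be used as the value of the matching variable in ``test/Makefile``.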

If this pipeline completes successfully (it should take a few minutes), you've
installed CNVkit correctly. On a multi-core machine you can parallelize this
with ``make -j``.

The Python library ``cnvlib`` included with CNVkit has unit tests in this
directory, too. Run the test suite with ``tox`` or ``pytest test``.

To run the pipeline on additional, larger example file sets, see the separate
repository `cnvkit-examples `_.

Owner

Name: Eric Talevich
Login: etal
Kind: user
Location: San Francisco, CA

Twitter: etalevich
Repositories: 25
Profile: https://github.com/etal

Citation (CITATION)

To cite CNVkit in publications, please use:

  Talevich, E., Shain, A. H., Botton, T., & Bastian, B. C. (2016).
  CNVkit: Genome-wide copy number detection and visualization from
  targeted sequencing. PLOS Computational Biology 12(4): e1004873.
  doi: 10.1371/journal.pcbi.1004873

A BibTeX entry for LaTeX users is:

@article{,
  title = {{CNVkit: Genome-wide copy number detection and visualization from targeted sequencing}},
  author = {Talevich, Eric and Shain, A. Hunter and Botton, Thomas and Bastian, Boris C.},
  journal = {PLOS Computational Biology},
  month = apr,
  year = {2016},
  doi = {10.1371/journal.pcbi.1004873},
  url = {http://dx.doi.org/10.1371/journal.pcbi.1004873},
}

GitHub Events

Total

Create event: 7
Commit comment event: 2
Release event: 1
Issues event: 36
Watch event: 39
Delete event: 8
Issue comment event: 82
Push event: 28
Pull request review event: 4
Pull request event: 19
Fork event: 12

Last Year

Create event: 7
Commit comment event: 2
Release event: 1
Issues event: 36
Watch event: 39
Delete event: 8
Issue comment event: 82
Push event: 28
Pull request review event: 4
Pull request event: 19
Fork event: 12

Committers

Last synced: over 2 years ago

All Time

Total Commits: 1,686
Total Committers: 41
Avg Commits per committer: 41.122
Development Distribution Score (DDS): 0.217

Past Year

Commits: 113
Committers: 11
Avg Commits per committer: 10.273
Development Distribution Score (DDS): 0.345

Top Committers

Name	Email	Commits
Eric Talevich	e**h@g**m	1,320
Eric Talevich	e**h@d**m	103
Eric Talevich	e****l	74
Felix VDM	f**n@c**r	26
Kirill Tsukanov	t**l@g**m	26
chapmanb	c**b@5**m	22
Eric Talevich	e**h@k**m	16
Eric Talevich	m**e@e**m	15
Eric Talevich	e**h@u**u	13
John Garza	j**a@g**m	10
tetedange13	f**s@g**m	8
EwaMarek	e**4@g**m	5
Kyle Beauchamp	k**p@g**m	5
Brent Pedersen	b**e@g**m	4
Brad Chapman	c**b@f**m	4
dependabot[bot]	4****]	3
roryk	r**r@g**m	3
Michael P Schroeder	m**r@g**m	3
duartemolha	d**a@g**m	2
David Cain	d**n@g**m	2
Kirill Tsukanov	t****r	2
Gilad Mishne	g**d@c**m	1
Matt Shirley	m**5@g**m	1
Michael Knudsen	m**n@g**m	1
MajoroMask	s**k@g**m	1
Kevin Chau	k**u@c**m	1
Jeremy Teitelbaum	j**m@u**u	1
朱赢(Ying Zhu)	w**2@1**m	1
myronpeto	m**o@h**m	1
Rolf Schröder	r**r@l**m	1
and 11 more...
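
The all-time summary figures can be reproduced from the numbers listed above. This sketch assumes that the Development Distribution Score is one minus the top committer identity's share of all commits; that definition is not stated on this page, but it matches the reported value. The function names are hypothetical.

```python
def avg_commits(total_commits, committers):
    """Average commits per committer, rounded to three decimals."""
    return round(total_commits / committers, 3)


def dds(top_committer_commits, total_commits):
    """Assumed DDS: 1 minus the top committer's share of all commits."""
    return round(1 - top_committer_commits / total_commits, 3)


# All-time figures from this page: 1,686 commits, 41 committers,
# and 1,320 commits from the top committer identity.
print(avg_commits(1686, 41))  # 41.122
print(dds(1320, 1686))        # 0.217
```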

Committer Domains (Top 20 + Academic)

- color.com: 2
- allcyte.com: 1
- jacks-mbp.attlocal.net: 1
- sanger.ac.uk: 1
- 126.com: 1
- codecov.io: 1
- limbus-medtec.com: 1
- 163.com: 1
- uconn.edu: 1
- fastmail.com: 1
- ucsf.edu: 1
- etal.mozmail.com: 1
- kariusdx.com: 1
- 50mail.com: 1
- chu-reims.fr: 1
- dnanexus.com: 1

Issues and Pull Requests

Last synced: 10 months ago

All Time

Total issues: 121
Total pull requests: 29
Average time to close issues: 3 months
Average time to close pull requests: about 2 months
Total issue authors: 108
Total pull request authors: 12
Average comments per issue: 1.56
Average comments per pull request: 1.48
Merged pull requests: 25
Bot issues: 0
Bot pull requests: 3

Past Year

Issues: 17
Pull requests: 8
Average time to close issues: 4 months
Average time to close pull requests: 5 days
Issue authors: 17
Pull request authors: 4
Average comments per issue: 0.53
Average comments per pull request: 1.25
Merged pull requests: 5
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

gevro (4)
HeejunJang (4)
justin-greenblatt (3)
GACGAMA (3)
JD12138 (2)
Zenerzul (2)
a00101 (2)
gtollefson (2)
pontushojer (2)
EfraMP (2)
stroke1989 (2)
227BaronChen (2)
AndreaG5 (2)
MaryGoAround (2)
NIBIL401 (2)

Pull Request Authors

etal (19)
dependabot[bot] (3)
mr-c (2)
DavidCain (2)
tetedange13 (2)
gevro (2)
rollf (2)
suhas-r (1)
berguner (1)
dlaehnemann (1)
Zhu-Ying (1)
tsivaarumugam (1)
rach-kennedy (1)

Top Labels

Issue Labels

question (17) bug (8) enhancement (2) documentation (2) help wanted (1) vcf (1)

Pull Request Labels

dependencies (3)

Packages

Total packages: 1
Total downloads: unknown

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 1
Total maintainers: 1

spack.io: py-cnvkit

Copy number variation toolkit for high-throughput sequencing.

Homepage: https://github.com/etal/cnvkit
License: []
Status: removed
Latest release: 0.9.6
published about 4 years ago

Versions: 1
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Dependent repos count: 0.0%

Forks count: 8.5%

Stargazers count: 11.1%

Average: 19.2%

Dependent packages count: 57.3%

Maintainers (1)

adamjstewart

Last synced: 11 months ago

Dependencies

setup.py pypi

TODO *
biopython *
joblib *
matplotlib *
networkx *
numpy *
pandas *
pomegranate *
pyfaidx *
pysam *
reportlab *
scikit-learn *
scipy *

docker/Dockerfile docker

ubuntu rolling build
