https://github.com/cancerit/pycrisprcleanr

Python version of CRISPRcleanR: An R package for unsupervised identification and correction of gene independent cell responses to CRISPR-cas9 targeting



Repository


Basic Info
  • Host: GitHub
  • Owner: cancerit
  • License: gpl-3.0
  • Language: Python
  • Default Branch: develop
  • Size: 14.3 MB
Statistics
  • Stars: 6
  • Watchers: 12
  • Forks: 0
  • Open Issues: 9
  • Releases: 13
Created about 8 years ago · Last pushed about 1 year ago
Metadata Files
Readme Changelog License

README.md

pyCRISPRcleanR


This is a Python implementation of the CRISPRcleanR package for unsupervised identification and correction of gene-independent cell responses to CRISPR-Cas9 targeting.

Design

Uses the DNAcopy R package to perform CBS (Circular Binary Segmentation) of count data.
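To give an intuition for what segmentation does, here is a minimal pure-Python sketch of plain recursive binary segmentation over per-guide values ordered by genomic position. It is a simplified stand-in, not DNAcopy's circular algorithm or its permutation-based significance test, and the `min_size` and `threshold` parameters are hypothetical.

```python
from statistics import mean, pstdev


def best_split(values, min_size):
    """Return (index, score) of the split that best separates two means.

    The score is a t-like statistic: the absolute difference between the
    left and right means, scaled by the overall standard deviation and the
    sizes of the two parts. Returns (None, 0.0) if no split scores above 0.
    """
    sd = pstdev(values) or 1.0  # flat data: avoid dividing by zero
    best_i, best_score = None, 0.0
    # Every split must leave at least min_size points on each side.
    for i in range(min_size, len(values) - min_size + 1):
        left, right = values[:i], values[i:]
        diff = abs(mean(left) - mean(right))
        score = diff / (sd * (1 / len(left) + 1 / len(right)) ** 0.5)
        if score > best_score:
            best_i, best_score = i, score
    return best_i, best_score


def binary_segment(values, start=0, min_size=3, threshold=4.0):
    """Recursively split values; return half-open (start, end) index ranges."""
    i, score = best_split(values, min_size)
    if i is None or score < threshold:
        return [(start, start + len(values))]
    return (binary_segment(values[:i], start, min_size, threshold)
            + binary_segment(values[i:], start + i, min_size, threshold))
```

For example, `binary_segment([0.0] * 20 + [-2.0] * 20)` returns `[(0, 20), (20, 40)]`: the step between the two plateaus is detected and each flat half is left unsplit.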

Tools

pyCRISPRcleanR has multiple commands, listed with pyCRISPRcleanR --help.

pyCRISPRcleanR

Takes the input count data, library file and other associated files/parameters. The output is tab-separated files of normalised fold changes and inverse-transformed corrected treatment counts.

Various exceptions can occur for malformed input files.
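As a rough illustration of the normalisation and fold-change step described above, the sketch below scales each sample to the mean total read count, then computes per-guide log2 fold changes of each treatment against the mean of the controls, averaged into an avgFC value. Scaling by total reads and a pseudocount of 0.5 are assumptions for illustration, not necessarily pyCRISPRcleanR's exact defaults; the function names are hypothetical.

```python
from math import log2
from statistics import mean


def normalise_by_total(counts):
    """Scale every sample to the mean library size (total reads).

    counts: dict mapping sample name -> list of raw counts, one per guide,
    in the same guide order for every sample.
    """
    totals = {s: sum(c) for s, c in counts.items()}
    target = mean(totals.values())
    return {s: [x / totals[s] * target for x in c] for s, c in counts.items()}


def average_log2_fc(norm, controls, treatments, pseudocount=0.5):
    """Per-guide avgFC: mean over treatments of log2((t + pc) / (control + pc)).

    The control value for each guide is the mean over the control samples.
    """
    n_guides = len(norm[controls[0]])
    avg_fc = []
    for g in range(n_guides):
        ctrl = mean(norm[s][g] for s in controls)
        fcs = [log2((norm[t][g] + pseudocount) / (ctrl + pseudocount))
               for t in treatments]
        avg_fc.append(mean(fcs))
    return avg_fc
```

The pseudocount keeps guides with zero reads from producing infinite fold changes.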

inputFormat

  • gRNA counts file: tab-separated file containing the following fields:
    • sgRNA gene, followed by one count column per sample
  • sgRNA library file format:
    • sgRNA gene chr start end
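A minimal sketch of reading these two inputs with the standard library and joining them on sgRNA is shown below. It assumes both files have a header row using exactly the field names above and treats every counts column other than sgRNA and gene as a sample; the helper name `load_inputs` is hypothetical. It also shows one way a malformed input (a guide missing from the library) can surface as an exception.

```python
import csv
import io


def load_inputs(counts_handle, library_handle):
    """Join a tab-separated counts table and library table on sgRNA.

    Assumes a header row in each file. Every counts column other than
    sgRNA and gene is treated as a sample. Returns (samples, rows), where
    each row holds sgRNA, gene, chr, start, end and one int count per sample.
    """
    counts = csv.DictReader(counts_handle, delimiter="\t")
    samples = [c for c in counts.fieldnames if c not in ("sgRNA", "gene")]
    library = {r["sgRNA"]: r for r in csv.DictReader(library_handle, delimiter="\t")}
    rows = []
    for rec in counts:
        lib = library.get(rec["sgRNA"])
        if lib is None:  # malformed input: guide absent from the library
            raise ValueError(f"sgRNA {rec['sgRNA']!r} not found in library file")
        # Gene name is taken from the library file, as in the output tables.
        row = {"sgRNA": rec["sgRNA"], "gene": lib["gene"], "chr": lib["chr"],
               "start": int(lib["start"]), "end": int(lib["end"])}
        row.update({s: int(rec[s]) for s in samples})
        rows.append(row)
    return samples, rows


if __name__ == "__main__":
    counts = io.StringIO("sgRNA\tgene\tctrl\ttreat\ng1\tGENE1\t120\t30\n")
    library = io.StringIO("sgRNA\tgene\tchr\tstart\tend\ng1\tGENE1\t1\t1000\t1023\n")
    print(load_inputs(counts, library))
```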

outputFormat

A results.html file is generated in the user-supplied output folder. It contains a short description of, and links to, all result files/folders generated during an analysis workflow.

Tab-separated output files

(Note: the numeric prefix of each file name reflects the order in which the script generates the files and helps group similar files.)

  1. 01normalisedcounts.tsv
     • sgRNA: guide RNA
     • gene: gene name as defined in the library file
     • one column per sample: normalised count
  2. 02normalisedfold_changes.tsv
     • sgRNA: guide RNA
     • gene: gene name as defined in the library file
     • avgFC: average fold change values
  3. 03crisprcleanrcorrectedcounts.tsv [generated only when the --crispr_cleanr flag is set]
     • sgRNA: guide RNA
     • gene: gene name as defined in the library file
     • one column per sample: corrected count
  4. 04crisprcleanrfoldchanges.tsv [generated only when the --crispr_cleanr flag is set]
     • sgRNA: guide RNA
     • gene: gene name as defined in the library file
     • avgFC: average fold change values
  5. 05alldata.tsv [generated only when the --crispr_cleanr flag is set]
     • sgRNA: guide RNA
     • one column per sample: raw count
     • gene: gene name as defined in the library file
     • chr: chromosome name
     • start: gRNA start position
     • end: gRNA end position
     • one column per sample: normalised count (postfixed _nc)
     • avgFC: average fold change values
     • BP: base pair location (used for DNAcopy analysis)
     • correction: correction factor
     • correctedFC: corrected fold change values
     • one column per sample: corrected count (postfixed _cc)
     • fold change columns based on corrected counts (postfixed _cf)
     • avgFC_cf: average fold change values based on corrected counts
  6. mageckOut [generated only when the --run_mageck flag is set; a folder containing MAGeCK output for normalised and/or CRISPRcleanR-corrected counts]
  7. bagelOut [generated only when the --run_bagel flag is set; a folder containing BAGEL output for normalised and/or CRISPRcleanR-corrected counts]
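The correction and correctedFC columns of 05alldata.tsv can be illustrated with the sketch below. It follows our reading of the method described by Iorio et al. (2018): guides in a segment that targets at least a minimum number of distinct genes receive the segment's mean fold change as their correction factor, and correctedFC = avgFC - correction. The `min_genes` default and the naive count back-transformation are hypothetical simplifications, not pyCRISPRcleanR's actual code.

```python
from statistics import mean


def segment_correction(avg_fc, genes, segments, min_genes=3):
    """Return (correction, corrected_fc) lists for guides ordered by position.

    segments: list of half-open (start, end) index ranges over the guides,
    e.g. as produced by a segmentation step. A segment is corrected only
    if it targets at least min_genes distinct genes.
    """
    correction = [0.0] * len(avg_fc)
    for start, end in segments:
        if len(set(genes[start:end])) >= min_genes:
            seg_mean = mean(avg_fc[start:end])
            for i in range(start, end):
                correction[i] = seg_mean
    corrected_fc = [fc - c for fc, c in zip(avg_fc, correction)]
    return correction, corrected_fc


def back_transform(control_count, corrected_fc, pseudocount=0.5):
    """Naively invert log2((t + pc) / (c + pc)) to get a corrected treatment count."""
    return max(0.0, (control_count + pseudocount) * 2 ** corrected_fc - pseudocount)
```

The gene-count condition is what distinguishes a gene-independent, region-wide effect (many genes depleted together) from a genuine single-gene dependency, which should not be corrected away.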

Plotly and PDF plots

  1. Plots based on raw sgRNA counts
     • 01rawcounts_boxplot.html
     • 01rawcounts_histogram.html
     • 01rawcountscorrelationmatrix.html
  2. Plots based on normalised sgRNA counts
     • 02normalisedcounts_boxplot.html
     • 02normalisedcounts_histogram.html
     • 02normalisedcountscorrelationmatrix.html
  3. Plots based on fold changes
     • 03foldchanges_boxplot.html
     • 03foldchanges_histogram.html
     • 03foldchangescorrelationmatrix.html
  4. Statistics plots: precision-recall and ROC curves based on a known true-positive sgRNA/gene set [generated only when the --gene_signatures flag is set]
     • 04prrccurvesgRNA.html
     • 04roccurve_sgRNA.html
     • 05prrccurvegene.html
     • 05roccurve_gene.html
     • 06depletionprofile_genes.html
  5. Plots based on CRISPRcleanR-corrected counts
     • 07CRISPRcleanRcorrectedcountboxplot.html
     • 07CRISPRcleanRcorrectedcounthistogram.html
     • 07CRISPRcleanRcorrectedcountcorrelation_matrix.html
  6. Plots based on CRISPRcleanR-corrected fold changes
     • 08CRISPRcleanRcorrectedfoldchanges_boxplot.html
     • 08CRISPRcleanRcorrectedfoldchanges_histogram.html
     • 08CRISPRcleanRcorrectedfoldchangescorrelationmatrix.html
     • 09RawvspostCRISPRcleanRsegmentationfoldchanges.pdf [generated only when the --crispr_cleanr flag is set]
  7. Other informative plots
     • 10densityplotspreandpostCRISPRcleanR.html [generated only when the --crispr_cleanr flag is set]
     • 11impactonphenotypebarchart.html [generated only when the --run_mageck flag is set]
     • 11impactonphenotypepiechart.html [generated only when the --run_mageck flag is set]
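The precision-recall and ROC plots compare fold changes against a known set of true positives. As a minimal sketch of how such curves can be computed, assuming that more negative fold changes mean stronger depletion and ignoring ties, the functions below rank items by fold change, accumulate true and false positives, and integrate with the trapezoidal rule. The function names and input shapes are hypothetical, not pyCRISPRcleanR's API.

```python
def roc_pr_points(scores, positives):
    """Compute ROC and precision-recall points from ranked fold changes.

    scores: dict item -> fold change (more negative = more depleted).
    positives: set of items known to be true positives.
    Returns (roc, pr): roc is a list of (FPR, TPR), pr a list of (recall, precision).
    """
    ranked = sorted(scores, key=scores.get)  # most depleted first
    n_pos = sum(1 for item in ranked if item in positives)
    n_neg = len(ranked) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative item")
    tp = fp = 0
    roc, pr = [(0.0, 0.0)], []
    for item in ranked:
        if item in positives:
            tp += 1
        else:
            fp += 1
        roc.append((fp / n_neg, tp / n_pos))
        pr.append((tp / n_pos, tp / (tp + fp)))
    return roc, pr


def auc(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

The same functions apply at sgRNA level or gene level; only the items being ranked change.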

INSTALL

Install via pip by passing the path to the compiled .whl file found on the release page:

```bash
pip install pyCRISPRcleanR.X.X.X-py3-none-any.whl
```

Release .whl files are generated as part of the release process and can be found on the release page.

Package Dependencies

pip will install the relevant dependencies, listed here for convenience (please refer to requirements.txt for versions):
  • NumPy
  • Pandas
  • rpy2
  • plotly
  • MAGeCK
  • SciPy
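To check an existing environment against the pinned versions in requirements.txt, a small standard-library sketch might look like the following. It assumes simple `name==version` lines and skips any other specifier form; the function names are hypothetical.

```python
from importlib import metadata


def parse_pins(text):
    """Return {package: version} for requirement lines of the form 'name==version'."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = (part.strip() for part in line.split("==", 1))
            pins[name] = version
    return pins


def check_pins(pins):
    """Return {package: (pinned, installed or None)} for every mismatch."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

Usage: `check_pins(parse_pins(open("requirements.txt").read()))` returns an empty dict when every pin matches the installed version.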

R packages

  • The DNAcopy R package is required to run pyCRISPRcleanR. To simplify installation, the script Rsupport/libInstall.R can be run to build it for you.

Alternatively, you can run:

```bash
cd Rsupport
./setupR.sh path_to_install_to
```

Append 1 to the command to request a complete local build of R (3.3.0).

Development environment

This project uses git pre-commit hooks. As these will execute on your system it is entirely up to you if you activate them.

If you want tests, coverage reports and linting to execute automatically before a commit, you can activate them by running:

```bash
git config core.hooksPath git-hooks
```

Only a test failure will block a commit; linting is not enforced (but please consider following its guidance).

You can run the same checks manually, without committing, by executing the following in the base of the clone:

```bash
./run_tests.sh
```

Development Dependencies

Setup VirtualEnv

```bash
cd $PROJECTROOT
hash virtualenv || pip3 install virtualenv
virtualenv -p python3 env
source env/bin/activate
python setup.py develop # so bin scripts can find module
```

For testing/coverage (./run_tests.sh):

```bash
source env/bin/activate # if not already in env
pip install pytest
pip install pytest-cov
```

Also see Package Dependencies.

Cutting a release

Make sure the version is incremented in ./setup.py.

Install via .whl (wheel)

Generate .whl

```bash
source env/bin/activate # if not already
python setup.py bdist_wheel -d dist
```

Install .whl

```bash
# this creates a wheel archive which can be copied to a deployment location, e.g.
scp dist/pyCRISPRcleanR.X.X.X-py3-none-any.whl user@host:~/wheels
# on host
pip install --find-links=~/wheels pyCRISPRcleanR
```

Reference

Iorio F, Behan FM, Gonçalves E, Bhosle SG, Chen E, Shepherd R, Beaver C, Ansari R, Pooley R, Wilkinson P, Harper S, Butler AP, Stronach EA, Saez-Rodriguez J, Yusa K, Garnett MJ. Unsupervised correction of gene-independent cell responses to CRISPR-Cas9 targeting. BMC Genomics. 2018 Aug 13;19(1):604. doi: 10.1186/s12864-018-4989-y.

Owner

  • Name: CASM IT
  • Login: cancerit
  • Kind: organization
  • Email: cgpit@sanger.ac.uk
  • Location: Hinxton, Cambridge, UK

CASM IT provide bioinformatic support for the Cancer, Ageing and Somatic Mutation group at the Wellcome Sanger Institute.


Dependencies

requirements.txt pypi
  • numpy ==1.14.3
  • pandas ==0.22.0
  • pip ==10.0.1
  • plotly ==3.1.0
  • pytest ==3.5.1
  • pytest-cov ==2.5.1
  • radon ==2.2.0
  • rpy2 ==2.9.3
  • scipy ==1.1.0
  • tzlocal ==1.5.1
setup.py pypi