nanoplot

Plotting scripts for long read sequencing data

https://github.com/wdecoster/nanoplot

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file
  Found codemeta.json file
✓ .zenodo.json file
  Found .zenodo.json file
○ DOI references
○ Academic publication links
✓ Committers with academic emails
  1 of 11 committers (9.1%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity
  Low similarity (11.7%) to scientific vocabulary

Keywords from Contributors

genomics bioinformatics

Last synced: 10 months ago

Repository

Plotting scripts for long read sequencing data

Basic Info

Host: GitHub
Owner: wdecoster
License: mit
Language: Python
Default Branch: master
Homepage: http://nanoplot.bioinf.be
Size: 2.49 MB

Statistics

Stars: 495
Watchers: 11
Forks: 50
Open Issues: 19
Releases: 5

Created about 9 years ago · Last pushed 11 months ago

Metadata Files

Readme Changelog License

NanoPlot

Plotting tool for long read sequencing data and alignments.

NanoPlot is also available as a web service.

Example plot

The example plot above shows a bivariate plot comparing log transformed read length with average basecall Phred quality score. More examples can be found in the gallery on my blog 'Gigabase Or Gigabyte'.

In addition to various plots, a NanoStats file is created that summarizes key features of the dataset.

This script performs data extraction from Oxford Nanopore sequencing data in the following formats:
- fastq files (can be bgzip, bzip2 or gzip compressed)
- fastq files generated by albacore, guppy or MinKNOW containing additional information (can be bgzip, bzip2 or gzip compressed)
- sorted bam files
- sequencing_summary.txt output table generated by albacore, guppy or MinKNOW basecalling (can be gzip, bz2, zip and xz compressed)
- fasta files (can be bgzip, bzip2 or gzip compressed)
- arrow files (as created by other tools I have developed)

Multiple files of the same type can be provided simultaneously.
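To illustrate what reading a possibly compressed fastq file involves, here is a minimal Python sketch. It is not NanoPlot's own extraction code; the function names `open_text` and `fastq_read_lengths` are hypothetical, and the sketch assumes simple four-line fastq records. It relies on the fact that bgzip output is valid gzip, so the standard `gzip` module can read both.

```python
import bz2
import gzip
from typing import Iterator, TextIO


def open_text(path: str) -> TextIO:
    """Open a plain, gzip/bgzip or bzip2 file as text, chosen by magic bytes."""
    with open(path, "rb") as handle:
        magic = handle.read(3)
    if magic[:2] == b"\x1f\x8b":  # gzip and bgzip share this signature
        return gzip.open(path, "rt")
    if magic == b"BZh":  # bzip2 signature
        return bz2.open(path, "rt")
    return open(path, "rt")


def fastq_read_lengths(path: str) -> Iterator[int]:
    """Yield the sequence length of each record (assumes four lines per record)."""
    with open_text(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # second line of each record is the sequence
                yield len(line.strip())
```

Detecting the compression from the first bytes rather than the file extension keeps the sketch working for files that were renamed.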

INSTALLATION

pip install NanoPlot

Upgrade to a newer version using:
pip install NanoPlot --upgrade

conda install -c bioconda nanoplot

The script is written for python3.

OUTPUT

NanoPlot creates:
- a statistical summary
- a number of plots
- an HTML summary file

USAGE

```
usage: NanoPlot [-h] [-v] [-t THREADS] [--verbose] [--store] [--raw] [--huge]
                [-o OUTDIR] [--nostatic] [-p PREFIX] [--tsv_stats]
                [--infoinreport] [--maxlength N] [--minlength N]
                [--drop_outliers] [--downsample N] [--loglength]
                [--percentqual] [--alength] [--minqual N] [--runtime_until N]
                [--readtype {1D,2D,1D2}] [--barcoded] [--no_supplementary]
                [-c COLOR] [-cm COLORMAP]
                [-f [{png,jpg,jpeg,webp,svg,pdf,eps,json} ...]]
                [--plots [{kde,hex,dot} ...]] [--legacy [{kde,dot,hex} ...]]
                [--listcolors] [--listcolormaps] [--no-N50] [--N50]
                [--title TITLE] [--fontscale FONTSCALE] [--dpi DPI]
                [--hide_stats]
                (--fastq file [file ...] | --fasta file [file ...] |
                 --fastq_rich file [file ...] | --fastq_minimal file [file ...] |
                 --summary file [file ...] | --bam file [file ...] |
                 --ubam file [file ...] | --cram file [file ...] |
                 --pickle pickle | --feather file [file ...])

CREATES VARIOUS PLOTS FOR LONG READ SEQUENCING DATA.

General options:
  -h, --help            show the help and exit
  -v, --version         Print version and exit.
  -t, --threads THREADS
                        Set the allowed number of threads to be used by the script
  --verbose             Write log messages also to terminal.
  --store               Store the extracted data in a pickle file for future plotting.
  --raw                 Store the extracted data in tab separated file.
  --huge                Input data is one very large file.
  -o, --outdir OUTDIR   Specify directory in which output has to be created.
  --nostatic            Do not make static (png) plots.
  -p, --prefix PREFIX   Specify an optional prefix to be used for the output files.
  --tsv_stats           Output the stats file as a properly formatted TSV.
  --infoinreport        Add NanoPlot run info in the report.

Options for filtering or transforming input prior to plotting:
  --maxlength N         Hide reads longer than length specified.
  --minlength N         Hide reads shorter than length specified.
  --drop_outliers       Drop outlier reads with extreme long length.
  --downsample N        Reduce dataset to N reads by random sampling.
  --loglength           Additionally show logarithmic scaling of lengths in plots.
  --percentqual         Use qualities as theoretical percent identities.
  --alength             Use aligned read lengths rather than sequenced length (bam mode)
  --minqual N           Drop reads with an average quality lower than specified.
  --runtime_until N     Only take the N first hours of a run
  --readtype {1D,2D,1D2}
                        Which read type to extract information about from summary.
                        Options are 1D, 2D, 1D2
  --barcoded            Use if you want to split the summary file by barcode
  --no_supplementary    Use if you want to remove supplementary alignments

Options for customizing the plots created:
  -c, --color COLOR     Specify a valid matplotlib color for the plots
  -cm, --colormap COLORMAP
                        Specify a valid matplotlib colormap for the heatmap
  -f, --format [{png,jpg,jpeg,webp,svg,pdf,eps,json} ...]
                        Specify the output format of the plots, which are in
                        addition to the html files
  --plots [{kde,hex,dot} ...]
                        Specify which bivariate plots have to be made.
  --legacy [{kde,dot,hex} ...]
                        Specify which bivariate plots have to be made (legacy mode).
  --listcolors          List the colors which are available for plotting and exit.
  --listcolormaps       List the colors which are available for plotting and exit.
  --no-N50              Hide the N50 mark in the read length histogram
  --N50                 Show the N50 mark in the read length histogram
  --title TITLE         Add a title to all plots, requires quoting if using spaces
  --fontscale FONTSCALE
                        Scale the font of the plots by a factor
  --dpi DPI             Set the dpi for saving images
  --hide_stats          Not adding Pearson R stats in some bivariate plots

Input data sources, one of these is required.:
  --fastq file [file ...]
                        Data is in one or more default fastq file(s).
  --fasta file [file ...]
                        Data is in one or more fasta file(s).
  --fastq_rich file [file ...]
                        Data is in one or more fastq file(s) generated by albacore,
                        MinKNOW or guppy with additional information concerning
                        channel and time.
  --fastq_minimal file [file ...]
                        Data is in one or more fastq file(s) generated by albacore,
                        MinKNOW or guppy with additional information concerning
                        channel and time. Is extracted swiftly without elaborate
                        checks.
  --summary file [file ...]
                        Data is in one or more summary file(s) generated by
                        albacore or guppy.
  --bam file [file ...]
                        Data is in one or more sorted bam file(s).
  --ubam file [file ...]
                        Data is in one or more unmapped bam file(s).
  --cram file [file ...]
                        Data is in one or more sorted cram file(s).
  --pickle pickle       Data is a pickle file stored earlier.
  --feather/--arrow file [file ...]
                        Data is in one or more feather/arrow file(s).

EXAMPLES:
    NanoPlot --summary sequencing_summary.txt --loglength -o summary-plots-log-transformed
    NanoPlot -t 2 --fastq reads1.fastq.gz reads2.fastq.gz --maxlength 40000 --plots hex dot
    NanoPlot --color yellow --bam alignment1.bam alignment2.bam alignment3.bam --downsample 10000
```
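The `--percentqual` and `--minqual` options both depend on how Phred quality scores map to error probabilities. The sketch below shows the standard Phred relation, Q = -10 · log10(p), in Python. It is an illustration only: the function names are hypothetical, and the README does not state how NanoPlot implements these calculations internally.

```python
import math
from typing import Sequence


def phred_to_error_probability(quality: float) -> float:
    """Standard Phred relation: Q = -10 * log10(p), so p = 10 ** (-Q / 10)."""
    return 10 ** (-quality / 10)


def phred_to_percent_identity(quality: float) -> float:
    """Theoretical percent identity implied by a Phred score."""
    return 100 * (1 - phred_to_error_probability(quality))


def mean_phred(qualities: Sequence[float]) -> float:
    """Average quality of a read, computed by averaging error probabilities.

    Averaging the Phred scores directly would overestimate quality,
    because the scale is logarithmic.
    """
    mean_error = sum(phred_to_error_probability(q) for q in qualities) / len(qualities)
    return -10 * math.log10(mean_error)
```

Under this relation a score of 10 corresponds to 90% identity and a score of 20 to 99%. A read with per-base scores of 10 and 30 averages to about 13, not 20, because the low-quality base dominates the mean error rate.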

NOTES

- --downsample will not save much time, because downsampling happens only after all data has been collected; it probably only makes a difference for a huge amount of data. To save time, downsample your data upfront. Extracting information from a summary file is faster than from other formats, and multiple input files are processed in parallel. Some plot types (especially kde) are slower than others; use --plots to restrict which plots are made (the default is both kde and dot). If you are only interested in, say, the read length histogram, it is possible to write a script that produces just that and skips the rest. Let me know if you need any help here.
- --plots uses the plotly package to make kde and dot plots. The hex option is ignored.
- --legacy is currently the only way to make a hex plot. It uses the seaborn and matplotlib packages, since plotly does not support hex plots (yet). kde and dot plots are also possible with this option.
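Downsampling upfront, as suggested in the first note, can be done in a single pass with reservoir sampling, which never holds more than N records in memory. The Python sketch below is one possible approach, not part of NanoPlot; the function names are hypothetical, and it assumes an uncompressed fastq file with four lines per record.

```python
import random
from typing import Iterable, Iterator, List, Optional, Tuple

FastqRecord = Tuple[str, str, str, str]


def fastq_records(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Group lines into four-line records; an incomplete final record is dropped."""
    iterator = iter(lines)
    return zip(iterator, iterator, iterator, iterator)


def reservoir_sample(
    records: Iterable[FastqRecord], n: int, seed: Optional[int] = None
) -> List[FastqRecord]:
    """Return up to n records chosen uniformly at random in one pass."""
    rng = random.Random(seed)
    sample: List[FastqRecord] = []
    for index, record in enumerate(records):
        if index < n:
            sample.append(record)  # fill the reservoir first
        else:
            slot = rng.randint(0, index)  # inclusive on both ends
            if slot < n:
                sample[slot] = record  # replace with probability n / (index + 1)
    return sample


def downsample_fastq(in_path: str, out_path: str, n: int, seed: Optional[int] = None) -> int:
    """Write a random subset of n records to out_path; return the number written."""
    with open(in_path, "rt") as source:
        kept = reservoir_sample(fastq_records(source), n, seed)
    with open(out_path, "wt") as sink:
        for record in kept:
            sink.writelines(record)
    return len(kept)
```

The resulting file can then be passed to NanoPlot with `--fastq`. Note that the sampled records are not written in their original order.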

EXAMPLE USAGE

NanoPlot --summary sequencing_summary.txt --loglength -o summary-plots-log-transformed
NanoPlot -t 2 --fastq reads1.fastq.gz reads2.fastq.gz --maxlength 40000 --plots dot --legacy hex
NanoPlot -t 12 --color yellow --bam alignment1.bam alignment2.bam alignment3.bam --downsample 10000 -o bamplots_downsampled

ACKNOWLEDGMENTS/CONTRIBUTORS

Ilias Bukraa for tremendous improvements and maintenance of the code
Andreas Sjödin for building and maintaining conda recipes
Darrin Schultz @conchoecia for Pauvre code
@alexomics for fixing the indentation of the printed stats
Botond Sipos @bsipos for speeding up the calculation of average quality scores

CONTRIBUTING

I welcome all suggestions, bug reports, feature requests and contributions. Please leave an issue or open a pull request. I will usually respond within a day, or rarely within a few days.

PLOTS GENERATED

Plot|Fastq|Fastqrich|Fastqminimal|Bam|Summary|Options|Style
----|----|----|----|----|----|----|----
Histogram of read length|x|x|x|x|x|N50|
Histogram of (log transformed) read length|x|x|x|x|x|N50|
Bivariate plot of length against base call quality|x|x||x|x|log transformation|dot, hex, kde
Heatmap of reads per channel||x|||x||
Cumulative yield plot||x|x||x||
Violin plot of read length over time||x|x||x||
Violin plot of base call quality over time||x|||x||
Bivariate plot of aligned read length against sequenced read length||||x|||dot, hex, kde
Bivariate plot of percent reference identity against read length||||x||log transformation|dot, hex, kde
Bivariate plot of percent reference identity against base call quality||||x|||dot, hex, kde
Bivariate plot of mapping quality against read length||||x||log transformation|dot, hex, kde
Bivariate plot of mapping quality against basecall quality||||x|||dot, hex, kde
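The N50 mark listed as an option for the read length histograms is the read length at which reads of that length or longer contain at least half of all sequenced bases. A minimal Python sketch of that common definition follows; the function name `n50` is hypothetical, and NanoPlot's own calculation may handle ties or edge cases differently.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the length L such that reads of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("n50 is undefined for an empty list of read lengths")
    half_of_bases = sum(lengths) / 2
    running_total = 0
    for length in sorted(lengths, reverse=True):  # longest reads first
        running_total += length
        if running_total >= half_of_bases:
            return length
    raise AssertionError("unreachable: the full sum always reaches half the total")
```

For read lengths 2, 3, 4, 5 and 6 the total is 20 bases; the two longest reads (6 and 5) together hold 11 bases, which passes the halfway point of 10, so the N50 is 5.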

COMPANION SCRIPTS

NanoComp: comparing multiple runs
NanoStat: statistic summary report of reads or alignments
NanoFilt: filtering and trimming of reads
NanoLyse: removing contaminant reads (e.g. lambda control DNA) from fastq

CITATION

If you use this tool, please consider citing our publication.

Owner

Name: Wouter De Coster
Login: wdecoster
Kind: user
Location: Antwerp, Belgium
Company: VIB-UAntwerp

Website: https://gigabaseorgigabyte.wordpress.com/
Twitter: wouter_decoster
Repositories: 57
Profile: https://github.com/wdecoster

Bioinformatics postdoc using short and long read sequencing in neurodegenerative disorders at Rademakers Lab

GitHub Events

Total

Issues event: 47
Watch event: 69
Issue comment event: 87
Push event: 14
Pull request event: 4
Fork event: 3

Last Year

Issues event: 47
Watch event: 69
Issue comment event: 87
Push event: 14
Pull request event: 4
Fork event: 3

Committers

Last synced: about 2 years ago

All Time

Total Commits: 568
Total Committers: 11
Avg Commits per committer: 51.636
Development Distribution Score (DDS): 0.345

Past Year

Commits: 11
Committers: 2
Avg Commits per committer: 5.5
Development Distribution Score (DDS): 0.091

Top Committers

Name	Email	Commits
wdecoster	d**r@g**m	372
wdecoster	w**r@m**e	150
iliasbukraa	i**a@g**m	32
Svennd	s**n@g**m	3
wdecoster	w**r@b**e	3
Christian Brueffer	c**n@b**o	2
garbog2	3****2	2
Juan José Picón Cossio	p**o@g**m	1
Steve Huang	s**g@b**g	1
luyang93	5**2@q**m	1
David Shivak	3****o	1

Committer Domains (Top 20 + Academic)

qq.com: 1
broadinstitute.org: 1
brueffer.io: 1
beast.cde.uantwerpen.be: 1
molgen.vib-ua.be: 1

Issues and Pull Requests

Last synced: 10 months ago

All Time

Total issues: 169
Total pull requests: 20
Average time to close issues: 2 months
Average time to close pull requests: about 12 hours
Total issue authors: 148
Total pull request authors: 7
Average comments per issue: 3.66
Average comments per pull request: 0.8
Merged pull requests: 16
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 36
Pull requests: 5
Average time to close issues: about 1 month
Average time to close pull requests: 3 days
Issue authors: 29
Pull request authors: 2
Average comments per issue: 2.36
Average comments per pull request: 0.8
Merged pull requests: 2
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

SHuang-Broad (3)
fpiumi (3)
AhmedSAHassan (3)
bernt-matthias (3)
wdecoster (2)
thallinger (2)
lucyintheskyzzz (2)
erinyoung (2)
adhamzul (2)
pclavell (2)
bioblueliu (2)
aspitaleri (2)
QuentinPerriere (2)
yoshinak1 (2)
teodorabu (2)

Pull Request Authors

iliasbukraa (11)
microbemarsh (3)
zhaolei6116 (2)
cbrueffer (1)
SHuang-Broad (1)
dshivak-sgmo (1)
juanjo255 (1)

Top Labels

Issue Labels

enhancement (5)
static_image (3)
question (2)

Pull Request Labels

Packages

Total packages: 2
Total downloads:
- pypi 861 last-month

Total dependent packages: 0
(may contain duplicates)
Total dependent repositories: 3
(may contain duplicates)
Total versions: 161
Total maintainers: 2

pypi.org: nanoplot

Plotting suite for Oxford Nanopore sequencing data and alignments

Homepage: https://github.com/wdecoster/NanoPlot
Documentation: https://nanoplot.readthedocs.io/
License: MIT
Latest release: 1.46.1
published 11 months ago

Versions: 159
Dependent Packages: 0
Dependent Repositories: 3
Downloads: 861 Last month

Rankings

Stargazers count: 3.5%

Forks count: 6.1%

Downloads: 6.2%

Average: 6.9%

Dependent repos count: 9.0%

Dependent packages count: 10.0%

Maintainers (1)

wdecoster

Last synced: 11 months ago

spack.io: py-nanoplot

Plotting scripts for long read sequencing data

Homepage: https://github.com/wdecoster/NanoPlot
License: []
Latest release: 1.43.0
published over 1 year ago

Versions: 2
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Dependent repos count: 0.0%

Average: 28.8%

Dependent packages count: 57.7%

Maintainers (1)

Pandapip1

Last synced: 11 months ago

Dependencies

setup.py pypi

biopython *
kaleido *
nanoget >=1.14.0
nanomath >=1.0.0
numpy >=1.16.5
pandas >=1.1.0
plotly >=5.4.0
pyarrow *
pysam >0.10.0.0
python-dateutil *
scipy *

.github/workflows/python-package-conda.yml actions

actions/checkout v2 composite
actions/setup-python v2 composite
