CheckQC

CheckQC: Quick quality control of Illumina sequencing runs - Published in JOSS (2018)

https://github.com/molmed/checkqc

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

genomics illumina quality-control sequencing
Last synced: 6 months ago

Repository

CheckQC inspects the content of an Illumina runfolder and determines if it passes a set of quality criteria

Basic Info
Statistics
  • Stars: 28
  • Watchers: 11
  • Forks: 17
  • Open Issues: 11
  • Releases: 49
Created over 8 years ago · Last pushed 8 months ago
Metadata Files
Readme Contributing License

README.md

checkQC

More documentation is available at http://checkqc.readthedocs.io/

CheckQC is a program designed to check a set of quality criteria against an Illumina runfolder.

This is useful as part of a pipeline, where one needs to evaluate a set of quality criteria after demultiplexing. CheckQC is fast and should finish within a few seconds. It warns when a warning criterion is breached and emits a non-zero exit status if it finds any errors, making it easy to stop further processing when the run being evaluated needs troubleshooting.

CheckQC has been designed to be modular, and exactly which "qc handlers" are executed with which parameters for a specific run type (i.e. machine type and run length) is determined by a configuration file.
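The idea of selecting handlers and thresholds per run type can be sketched as below. This is a minimal illustration, not CheckQC's implementation: the dictionary layout, `RUN_TYPE_HANDLERS` and `classify_minimum` are hypothetical names, and the real configuration file may be organised differently. The thresholds are the ones reported in the example log in this README, where `unknown` means that no threshold is set for that level.

```python
# Hypothetical in-memory layout keyed by (instrument/reagent type, read length).
# CheckQC's real configuration file may be structured differently.
RUN_TYPE_HANDLERS = {
    ("hiseq2500_rapidhighoutput_v4", "125-125"): {
        "ClusterPFHandler": {"warning": 180, "error": "unknown"},
        "Q30Handler": {"warning": 80, "error": "unknown"},
    },
}


def classify_minimum(value, thresholds):
    """Classify a metric that must not fall below its thresholds.

    Returns "error", "warning" or "ok". A threshold set to "unknown"
    is treated as disabled.
    """
    error = thresholds["error"]
    warning = thresholds["warning"]
    if error != "unknown" and value < error:
        return "error"
    if warning != "unknown" and value < warning:
        return "warning"
    return "ok"


handlers = RUN_TYPE_HANDLERS[("hiseq2500_rapidhighoutput_v4", "125-125")]

# 117.93 M clusters PF on lane 1 is taken from the example log.
print(classify_minimum(117.93, handlers["ClusterPFHandler"]))
# 200.0 is a made-up value above the warning threshold.
print(classify_minimum(200.0, handlers["ClusterPFHandler"]))
```

Keeping thresholds in data rather than in code is what lets the same check run with different limits for each machine type and run length.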

Instrument types supported in checkQC are the following:

  • HiSeqX
  • HiSeq2500
  • iSeq
  • MiSeq
  • NovaSeq
  • NovaSeq X Plus

Install instructions

CheckQC requires Python 3.10. CheckQC can be installed with pip.

pip install checkqc

Alternatively it can be installed with conda using the bioconda channel:

conda install -c bioconda checkqc

Running CheckQC (bcl2fastq)

After installing CheckQC you can run it by specifying the path to the runfolder you want to analyze like this:

checkqc <RUNFOLDER>

This will use the default configuration file packaged with CheckQC. If you want to specify your own custom file, you can do so by adding a path to the config like this:

checkqc --config_file <path to your config> <RUNFOLDER>

If no config file is specified, CheckQC prints the path to the default file on your system at startup, which is useful if you want a template to customize according to your own needs.

When you run CheckQC you can expect to see output similar to this:

checkqc tests/resources/170726_D00118_0303_BCB1TVANXX/
INFO ------------------------
INFO Starting checkQC (1.1.2)
INFO ------------------------
INFO Runfolder is: tests/resources/170726_D00118_0303_BCB1TVANXX/
INFO No config file specified, using default config from /home/MOLMED/johda411/workspace/checkQC/checkQC/default_config/config.yaml.
INFO Run summary
INFO -----------
INFO Instrument and reagent version: hiseq2500_rapidhighoutput_v4
INFO Read length: 125-125
INFO Enabled handlers and their config values were:
INFO ClusterPFHandler Error=unknown Warning=180
INFO Q30Handler Error=unknown Warning=80
INFO ErrorRateHandler Error=unknown Warning=2
INFO ReadsPerSampleHandler Error=90 Warning=unknown
INFO UndeterminedPercentageHandler Error=10 Warning=unknown
WARNING QC warning: Cluster PF was to low on lane 1, it was: 117.93 M
WARNING QC warning: Cluster PF was to low on lane 7, it was: 122.26 M
WARNING QC warning: Cluster PF was to low on lane 8, it was: 177.02 M
ERROR Fatal QC error: Number of reads for sample Sample_pq-27 was too low on lane 7, it was: 6.893 M
ERROR Fatal QC error: Number of reads for sample Sample_pq-28 was too low on lane 7, it was: 7.104 M
INFO Finished with fatal qc errors and will exit with non-zero exit status.

The program will summarize the type of run it has identified and output any warnings and/or errors it finds. If any QC errors were found, CheckQC will exit with a non-zero exit status. This means it can easily be used to decide whether further steps should run or not, e.g. in a workflow.
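As a minimal sketch of gating a workflow on that exit status, the helper below runs a command and reports whether it exited with status 0. The function name `passes_qc` is hypothetical, and the demonstration uses stand-in Python commands so that it runs without CheckQC installed; in a real pipeline the command would be something like `["checkqc", "<RUNFOLDER>"]`.

```python
import subprocess
import sys


def passes_qc(command):
    """Run a QC command and return True only if it exits with status 0."""
    completed = subprocess.run(command)
    return completed.returncode == 0


# Stand-in commands that mimic a passing and a failing QC run.
good_run = [sys.executable, "-c", "raise SystemExit(0)"]
bad_run = [sys.executable, "-c", "raise SystemExit(1)"]

for name, command in (("good run", good_run), ("bad run", bad_run)):
    decision = "continue processing" if passes_qc(command) else "stop for troubleshooting"
    print(f"{name}: {decision}")
```

Warnings alone do not change the exit status described above, so a gate like this only stops on fatal QC errors.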

In addition to the normal output, CheckQC has a JSON mode, enabled by adding --json to the command line. This outputs the results normally shown in the log as JSON on stdout (while the log itself is written to stderr), so that they can either be written to a file or piped to other programs that parse the data further. In this example we use Python's json.tool module to pretty-print the JSON output:

checkqc --json tests/resources/170726_D00118_0303_BCB1TVANXX/ | python -m json.tool
INFO ------------------------
INFO Starting checkQC (1.1.2)
INFO ------------------------
INFO Runfolder is: tests/resources/170726_D00118_0303_BCB1TVANXX/
INFO No config file specified, using default config from /home/MOLMED/johda411/workspace/checkQC/checkQC/default_config/config.yaml.
INFO Run summary
INFO -----------
INFO Instrument and reagent version: hiseq2500_rapidhighoutput_v4
INFO Read length: 125-125
INFO Enabled handlers and their config values were:
INFO ClusterPFHandler Error=unknown Warning=180
INFO Q30Handler Error=unknown Warning=80
INFO ErrorRateHandler Error=unknown Warning=2
INFO ReadsPerSampleHandler Error=90 Warning=unknown
INFO UndeterminedPercentageHandler Error=10 Warning=unknown
WARNING QC warning: Cluster PF was to low on lane 1, it was: 117.93 M
WARNING QC warning: Cluster PF was to low on lane 7, it was: 122.26 M
WARNING QC warning: Cluster PF was to low on lane 8, it was: 177.02 M
ERROR Fatal QC error: Number of reads for sample Sample_pq-27 was too low on lane 7, it was: 6.893 M
ERROR Fatal QC error: Number of reads for sample Sample_pq-28 was too low on lane 7, it was: 7.104 M
INFO Finished with fatal qc errors and will exit with non-zero exit status.
{
    "exit_status": 1,
    "ClusterPFHandler": [
        {
            "type": "warning",
            "message": "Cluster PF was to low on lane 1, it was: 117.93 M",
            "data": {
                "lane": 1,
                "lane_pf": 117929896,
                "threshold": 180
            }
        },
        {
            "type": "warning",
            "message": "Cluster PF was to low on lane 7, it was: 122.26 M",
            "data": {
                "lane": 7,
                "lane_pf": 122263375,
                "threshold": 180
            }
        },
        {
            "type": "warning",
            "message": "Cluster PF was to low on lane 8, it was: 177.02 M",
            "data": {
                "lane": 8,
                "lane_pf": 177018999,
                "threshold": 180
            }
        }
    ],
    "ReadsPerSampleHandler": [
        {
            "type": "error",
            "message": "Number of reads for sample Sample_pq-27 was too low on lane 7, it was: 6.893 M",
            "data": {
                "lane": 7,
                "number_of_samples": 12,
                "sample_id": "Sample_pq-27",
                "sample_reads": 6.893002,
                "threshold": 90
            }
        },
        {
            "type": "error",
            "message": "Number of reads for sample Sample_pq-28 was too low on lane 7, it was: 7.104 M",
            "data": {
                "lane": 7,
                "number_of_samples": 12,
                "sample_id": "Sample_pq-28",
                "sample_reads": 7.10447,
                "threshold": 90
            }
        }
    ],
    "run_summary": {
        "instrument_and_reagent_type": "hiseq2500_rapidhighoutput_v4",
        "read_length": "125-125",
        "handlers": [
            {
                "handler": "ClusterPFHandler",
                "error": "unknown",
                "warning": 180
            },
            {
                "handler": "Q30Handler",
                "error": "unknown",
                "warning": 80
            },
            {
                "handler": "ErrorRateHandler",
                "error": "unknown",
                "warning": 2
            },
            {
                "handler": "ReadsPerSampleHandler",
                "error": 90,
                "warning": "unknown"
            },
            {
                "handler": "UndeterminedPercentageHandler",
                "error": 10,
                "warning": "unknown"
            }
        ]
    }
}
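A downstream program could consume this JSON roughly as follows. This is a sketch that assumes the output has the shape shown in the example above, with handler results as lists of entries carrying `type` and `message` keys; `collect_messages` is a hypothetical helper name, and the sample document is a trimmed copy of the example output.

```python
import json


def collect_messages(report_json):
    """Return (exit_status, messages grouped by type) from a --json report."""
    report = json.loads(report_json)
    grouped = {"warning": [], "error": []}
    for value in report.values():
        # Handler results are lists of entries; "exit_status" and
        # "run_summary" are not lists and are skipped here.
        if not isinstance(value, list):
            continue
        for entry in value:
            grouped.setdefault(entry["type"], []).append(entry["message"])
    return report["exit_status"], grouped


# Trimmed copy of the example output above ("handlers" shortened to an empty list).
sample = json.dumps({
    "exit_status": 1,
    "ClusterPFHandler": [
        {"type": "warning",
         "message": "Cluster PF was to low on lane 1, it was: 117.93 M",
         "data": {"lane": 1, "lane_pf": 117929896, "threshold": 180}},
    ],
    "ReadsPerSampleHandler": [
        {"type": "error",
         "message": "Number of reads for sample Sample_pq-27 was too low on lane 7, it was: 6.893 M",
         "data": {"lane": 7, "number_of_samples": 12, "sample_id": "Sample_pq-27",
                  "sample_reads": 6.893002, "threshold": 90}},
    ],
    "run_summary": {"instrument_and_reagent_type": "hiseq2500_rapidhighoutput_v4",
                    "read_length": "125-125", "handlers": []},
})

status, messages = collect_messages(sample)
print(status)
for kind, texts in messages.items():
    print(kind, len(texts))
```

Because the log goes to stderr and the JSON to stdout, the string passed to such a helper can come directly from the captured stdout of the `checkqc --json` call.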

Running CheckQC (other demultiplexers)

If your data has been demultiplexed with software other than bcl2fastq, you can specify which one with --demultiplexer, e.g.:

checkqc --demultiplexer bclconvert <RUNFOLDER>

NB: for these demultiplexers, the output is defined by the view classes. So far the following views are available:

  • basic_view: outputs a JSON string containing all reports in a list, as well as the run summary dict. Only the reports' messages are returned.
  • illumina_data_view: outputs a JSON string where reports are gathered by lane and by type of report. The run summary is also attached. Reports include both the message string and the data dictionary.
  • illumina_short_view: outputs a YAML string gathering all reports by lane and report type. Only each report's message is printed. This also includes the run summary and is suitable for cases where CheckQC's output is monitored by a human operator.

Views can be selected in the config file.
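The "gathered by lane and by type of report" idea behind the illumina views can be illustrated with a small grouping function. This is only a sketch: it assumes reports shaped like the entries in the bcl2fastq JSON example earlier in this README (`type`, `message`, and a `data` dict with a `lane`), and `group_by_lane_and_type` is a hypothetical name. The actual view classes may structure their output differently.

```python
from collections import defaultdict


def group_by_lane_and_type(reports):
    """Group reports first by lane, then by report type."""
    grouped = defaultdict(lambda: defaultdict(list))
    for report in reports:
        lane = report["data"]["lane"]
        grouped[lane][report["type"]].append(
            {"message": report["message"], "data": report["data"]}
        )
    # Convert nested defaultdicts to plain dicts so the result serialises cleanly.
    return {lane: dict(by_type) for lane, by_type in grouped.items()}


# Report entries copied from the example output earlier in this README.
reports = [
    {"type": "warning",
     "message": "Cluster PF was to low on lane 1, it was: 117.93 M",
     "data": {"lane": 1, "lane_pf": 117929896, "threshold": 180}},
    {"type": "warning",
     "message": "Cluster PF was to low on lane 7, it was: 122.26 M",
     "data": {"lane": 7, "lane_pf": 122263375, "threshold": 180}},
    {"type": "error",
     "message": "Number of reads for sample Sample_pq-27 was too low on lane 7, it was: 6.893 M",
     "data": {"lane": 7, "number_of_samples": 12, "sample_id": "Sample_pq-27",
              "sample_reads": 6.893002, "threshold": 90}},
]

grouped = group_by_lane_and_type(reports)
for lane in sorted(grouped):
    print(lane, sorted(grouped[lane]))
```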

Running CheckQC as a webservice

In addition to running as a command-line application, CheckQC can be run as a simple webservice.

To run it, provide the path to the directory containing the runfolders you want to be able to check. This is given as MONITOR_PATH below. There are also a number of optional arguments that can be passed to the service.

```
$ checkqc-ws --help
Usage: checkqc-ws [OPTIONS] MONITOR_PATH

Options:
  --port INTEGER     Port which checkqc-ws will listen to (default: 9999).
  --config PATH      Path to the checkQC configuration file (optional)
  --log_config PATH  Path to the checkQC logging configuration file (optional)
  --debug            Enable debug mode.
  --help             Show this message and exit.
```

Once the webserver is running you can query the /qc/ endpoint and get any errors and warnings back as JSON. Here is an example of how to query the endpoint and what type of results it will return:

$ curl -s -w'\n' localhost:9999/qc/170726_D00118_0303_BCB1TVANXX | python -m json.tool
{
    "ClusterPFHandler": [
        {
            "data": {
                "lane": 1,
                "lane_pf": 117929896,
                "threshold": 180
            },
            "message": "Cluster PF was to low on lane 1, it was: 117.93 M",
            "type": "warning"
        },
        {
            "data": {
                "lane": 7,
                "lane_pf": 122263375,
                "threshold": 180
            },
            "message": "Cluster PF was to low on lane 7, it was: 122.26 M",
            "type": "warning"
        },
        {
            "data": {
                "lane": 8,
                "lane_pf": 177018999,
                "threshold": 180
            },
            "message": "Cluster PF was to low on lane 8, it was: 177.02 M",
            "type": "warning"
        }
    ],
    "ReadsPerSampleHandler": [
        {
            "data": {
                "lane": 7,
                "number_of_samples": 12,
                "sample_id": "Sample_pq-27",
                "sample_reads": 6.893002,
                "threshold": 90
            },
            "message": "Number of reads for sample Sample_pq-27 was too low on lane 7, it was: 6.893 M",
            "type": "warning"
        },
        {
            "data": {
                "lane": 7,
                "number_of_samples": 12,
                "sample_id": "Sample_pq-28",
                "sample_reads": 7.10447,
                "threshold": 90
            },
            "message": "Number of reads for sample Sample_pq-28 was too low on lane 7, it was: 7.104 M",
            "type": "warning"
        }
    ],
    "exit_status": 0,
    "version": "1.1.0"
}
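The same query can be made from Python with only the standard library. The sketch below assumes the service is reachable on the default port 9999 and that the runfolder name is appended to the /qc/ path, as in the curl example above; `qc_url` and `fetch_qc` are hypothetical helper names.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def qc_url(runfolder, host="localhost", port=9999):
    """Build the /qc/ endpoint URL for a runfolder name."""
    return f"http://{host}:{port}/qc/{quote(runfolder)}"


def fetch_qc(runfolder, host="localhost", port=9999, timeout=10):
    """Query a running checkqc-ws instance and return the decoded JSON."""
    with urlopen(qc_url(runfolder, host, port), timeout=timeout) as response:
        return json.load(response)


# Building the URL needs no running service.
print(qc_url("170726_D00118_0303_BCB1TVANXX"))
```

With checkqc-ws running, `fetch_qc("170726_D00118_0303_BCB1TVANXX")` would return a dictionary like the one shown above, and `urlopen` raises an exception if the service cannot be reached.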

Owner

  • Name: Uppsala University, Department of Medical Sciences, Molecular Precision Medicine
  • Login: Molmed
  • Kind: organization
  • Location: Sweden

Molecular Precision Medicine research group and the SNP&SEQ Technology Platform

JOSS Publication

CheckQC: Quick quality control of Illumina sequencing runs
Published
February 05, 2018
Volume 3, Issue 22, Page 556
Authors
Matilda Åslin ORCID
Department of Medical Sciences, Molecular Medicine and Science for Life Laboratory Uppsala University, Uppsala, Sweden
Monika Brandt ORCID
Department of Medical Sciences, Molecular Medicine and Science for Life Laboratory Uppsala University, Uppsala, Sweden
Johan Dahlberg ORCID
Department of Medical Sciences, Molecular Medicine and Science for Life Laboratory Uppsala University, Uppsala, Sweden
Editor
Pjotr Prins ORCID
Tags
illumina ngs quality control mps sequencing

GitHub Events

Total
  • Create event: 8
  • Issues event: 6
  • Release event: 3
  • Watch event: 2
  • Issue comment event: 31
  • Push event: 11
  • Pull request review comment event: 58
  • Pull request review event: 72
  • Pull request event: 22
  • Fork event: 1
Last Year
  • Create event: 8
  • Issues event: 6
  • Release event: 3
  • Watch event: 2
  • Issue comment event: 31
  • Push event: 11
  • Pull request review comment event: 58
  • Pull request review event: 72
  • Pull request event: 22
  • Fork event: 1

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 448
  • Total Committers: 14
  • Avg Commits per committer: 32.0
  • Development Distribution Score (DDS): 0.471
Past Year
  • Commits: 88
  • Committers: 4
  • Avg Commits per committer: 22.0
  • Development Distribution Score (DDS): 0.284
Top Committers
Name Email Commits
Johan Dahlberg j****g@m****e 237
Adrien Coulier a****r@m****e 81
MatildaAslin m****n@m****e 72
monikaBrandt m****t@m****e 19
nelnk861 n****e@u****e 15
Sara Ekberg s****g@m****e 8
alvaannett a****t@g****m 4
Stephan Lohse s****e@m****e 3
Mariya Lysenkova m****a@m****e 2
Johan Dahlberg j****n@u****e 2
b97pla p****n@m****e 2
Pontus Larsson b****a 1
Luca Beltrame l****e 1
Christian Brueffer c****n@b****o 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 26
  • Total pull requests: 117
  • Average time to close issues: 4 months
  • Average time to close pull requests: 26 days
  • Total issue authors: 12
  • Total pull request authors: 12
  • Average comments per issue: 1.88
  • Average comments per pull request: 2.0
  • Merged pull requests: 112
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 5
  • Pull requests: 27
  • Average time to close issues: 3 days
  • Average time to close pull requests: 13 days
  • Issue authors: 4
  • Pull request authors: 4
  • Average comments per issue: 0.6
  • Average comments per pull request: 2.44
  • Merged pull requests: 24
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • matrulda (9)
  • johandahlberg (5)
  • lbeltrame (2)
  • avilella (2)
  • marchoeppner (1)
  • matthdsm (1)
  • af8 (1)
  • sgaleraalq (1)
  • sarek928 (1)
  • apeltzer (1)
  • bwlang (1)
  • maleasy (1)
Pull Request Authors
  • johandahlberg (49)
  • Aratz (21)
  • matrulda (19)
  • nkongenelly (11)
  • monikaBrandt (6)
  • sarek928 (3)
  • alvaannett (2)
  • lbeltrame (2)
  • cbrueffer (1)
  • slohse (1)
  • mariya (1)
  • b97pla (1)
Top Labels
Issue Labels
enhancement (5) bug (4) help wanted (2) known-issue (1) question (1)
Pull Request Labels
hacktoberfest-accepted (3)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 785 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 42
  • Total maintainers: 3
pypi.org: checkqc

A simple program to parse Illumina NGS data and check it for quality criteria.

  • Versions: 42
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 785 Last month
Rankings
Downloads: 6.0%
Forks count: 9.6%
Dependent packages count: 10.1%
Average: 12.0%
Stargazers count: 12.7%
Dependent repos count: 21.6%
Maintainers (3)
Last synced: 6 months ago

Dependencies

setup.py pypi
  • PyYAML >=6.0
  • click *
  • interop >=1.1.10
  • sample_sheet *
  • tornado *
  • xmltodict *
.github/workflows/unit_tests.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v4 composite
  • codecov/codecov-action v3 composite
Dockerfile docker
  • python 3.10-slim build