https://github.com/cokelaer/fastqc
sequana pipeline to perform parallel fastqc and summarize results with multiqc plot
Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:

- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 3 DOI reference(s) in README
- ✓ Academic publication links: links to joss.theoj.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (15.6%) to scientific vocabulary
Last synced: 6 months ago
Repository
Basic Info
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Fork of sequana/fastqc
Created about 4 years ago
Last pushed about 2 years ago
Metadata Files
Readme
License
README.rst
.. image:: https://badge.fury.io/py/sequana-fastqc.svg
    :target: https://pypi.python.org/pypi/sequana_fastqc

.. image:: http://joss.theoj.org/papers/10.21105/joss.00352/status.svg
    :target: http://joss.theoj.org/papers/10.21105/joss.00352
    :alt: JOSS (journal of open source software) DOI

.. image:: https://github.com/sequana/fastqc/actions/workflows/main.yml/badge.svg
    :target: https://github.com/sequana/fastqc/actions/workflows/main.yml

.. image:: https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C3.10-blue.svg
    :target: https://pypi.python.org/pypi/sequana
    :alt: Python 3.8 | 3.9 | 3.10

This is the **fastqc** pipeline from the `Sequana `_ project.

:Overview: Runs fastqc and multiqc on a set of sequencing data to produce quality control reports
:Input: A set of FastQ files (paired or single-end), compressed or not
:Output: An HTML file, summary.html (individual fastqc reports, multi-sample report)
:Status: Production
:Wiki: https://github.com/sequana/fastqc/wiki
:Documentation: This README file, the Wiki from the GitHub repository (link above) and https://sequana.readthedocs.io
:Citation: Cokelaer et al. (2017). 'Sequana': a Set of Snakemake NGS pipelines. Journal of Open Source Software, 2(16), 352. JOSS DOI https://doi.org/10.21105/joss.00352

Installation
~~~~~~~~~~~~
sequana_fastqc is based on Python 3. Install the package as follows::

    pip install sequana_fastqc --upgrade

You will also need third-party software such as fastqc; see the Requirements section below for details.

Usage
~~~~~
If you have a set of FastQ files in a data/ directory, type::

    sequana_fastqc --input-directory data

To learn more about the options (e.g., using a different pattern to restrict
execution to a subset of the input files, or changing the output/working
directory)::

    sequana_fastqc --help

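
How input files might be selected and paired can be sketched as follows. This is a minimal illustration, not the pipeline's own code: the glob pattern ``*fastq.gz`` and the ``_R1_``/``_R2_`` read tag are assumptions modelled on the tutorial file names (data_R1_001.fastq.gz, data_R2_001.fastq.gz), and ``group_samples`` and ``select_files`` are hypothetical helpers. Run ``sequana_fastqc --help`` for the real option names and defaults.

```python
import re
from pathlib import Path

# Hypothetical defaults for illustration only: a glob pattern and a read-tag
# regex matching the _R1_/_R2_ convention used by the tutorial files.
PATTERN = "*fastq.gz"
READTAG = re.compile(r"_R([12])_")


def select_files(directory, pattern=PATTERN):
    """Return the files in *directory* matching the glob *pattern*."""
    return sorted(str(p) for p in Path(directory).glob(pattern))


def group_samples(filenames, readtag=READTAG):
    """Group FastQ file names into samples.

    Returns a dict mapping sample name -> {"R1": path, "R2": path or None}.
    """
    samples = {}
    for name in sorted(filenames):
        base = Path(name).name
        match = readtag.search(base)
        if match:
            # Sample name = everything before the read tag
            sample = base[: match.start()]
            mate = "R" + match.group(1)
        else:
            # No read tag: treat the file as a single-end sample
            sample = base.split(".")[0]
            mate = "R1"
        samples.setdefault(sample, {"R1": None, "R2": None})[mate] = name
    return samples


if __name__ == "__main__":
    files = ["data/data_R1_001.fastq.gz", "data/data_R2_001.fastq.gz", "data/other.fastq.gz"]
    for sample, mates in group_samples(files).items():
        print(sample, mates)
```

Counting the keys of the returned dict, rather than the files, gives the number of samples in the paired case (two files, one sample).
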
The call to sequana_fastqc creates a directory **fastqc**. Then, go to the
working directory and execute the pipeline as follows::

    cd fastqc
    sh fastqc.sh  # for a local run

This launches a Snakemake pipeline. If you are familiar with Snakemake, you can retrieve the fastqc.rules and config.yaml files and execute the pipeline yourself with specific parameters::

    snakemake -s fastqc.rules --cores 4 --stats stats.txt

Or use the `sequanix `_ interface.
Please see the `Wiki `_ for more examples and features.
Tutorial
~~~~~~~~
You can retrieve test data from the sequana_fastqc repository (https://github.com/sequana/fastqc) or download it directly::

    wget https://raw.githubusercontent.com/sequana/fastqc/master/sequana_pipelines/fastqc/data/data_R1_001.fastq.gz
    wget https://raw.githubusercontent.com/sequana/fastqc/master/sequana_pipelines/fastqc/data/data_R2_001.fastq.gz

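
If wget is not available, the same two files can be fetched with Python's standard library. This is a minimal sketch: the URLs are copied verbatim from the commands above, and ``tutorial_urls`` and ``download_tutorial_data`` are hypothetical helper names.

```python
import urllib.request
from pathlib import Path

# Base URL and file names copied from the wget commands above
BASE_URL = "https://raw.githubusercontent.com/sequana/fastqc/master/sequana_pipelines/fastqc/data/"
FILENAMES = ["data_R1_001.fastq.gz", "data_R2_001.fastq.gz"]


def tutorial_urls(base_url=BASE_URL, filenames=FILENAMES):
    """Return the full download URL of each tutorial file."""
    return [base_url + name for name in filenames]


def download_tutorial_data(destination="."):
    """Download the tutorial FastQ files into *destination* (needs network access)."""
    dest = Path(destination)
    dest.mkdir(parents=True, exist_ok=True)
    for url in tutorial_urls():
        # Keep the original file name so the _R1_/_R2_ tags are preserved
        target = dest / url.rsplit("/", 1)[-1]
        urllib.request.urlretrieve(url, target)
    return sorted(dest.glob("*.fastq.gz"))
```

Calling ``download_tutorial_data(".")`` in an empty directory should leave the two FastQ files ready for ``sequana_fastqc --input-directory .``.
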
Then, prepare and run the pipeline::

    sequana_fastqc --input-directory .
    cd fastqc
    sh fastqc.sh
    # once done, remove temporary files (snakemake and others)
    make clean

Open the HTML file called summary.html. A MultiQC report is also available.
You should get images such as the following:

.. image:: https://github.com/sequana/fastqc/blob/main/doc/summary.png?raw=true
Please see the `Wiki `_ for more examples and features.
Requirements
~~~~~~~~~~~~
This pipeline requires the following executables:

- fastqc
- falco (optional)

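
Before launching the pipeline, you may want to check that these executables are on your PATH. A minimal sketch using ``shutil.which``; the helper name ``check_executables`` is hypothetical and the two lists simply mirror the requirements above.

```python
import shutil

# Executables listed in the Requirements section; falco is optional.
REQUIRED = ["fastqc"]
OPTIONAL = ["falco"]


def check_executables(required=REQUIRED, optional=OPTIONAL):
    """Return (missing_required, missing_optional) executables not found on PATH."""
    missing_required = [exe for exe in required if shutil.which(exe) is None]
    missing_optional = [exe for exe in optional if shutil.which(exe) is None]
    return missing_required, missing_optional


if __name__ == "__main__":
    missing, missing_opt = check_executables()
    if missing:
        print("Missing required executable(s):", ", ".join(missing))
    if missing_opt:
        print("Optional executable(s) not found:", ", ".join(missing_opt))
```
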
For Linux users, we provide apptainer/singularity images through the **damona** project (https://damona.readthedocs.io).
To use them, initialise the pipeline with the --use-apptainer option; everything should then be downloaded
automatically for you, which also guarantees reproducibility::

    sequana_fastqc --input-directory data --use-apptainer --apptainer-prefix ~/images

.. image:: https://raw.githubusercontent.com/sequana/fastqc/main/sequana_pipelines/fastqc/dag.png

Details
~~~~~~~~~
This pipeline runs fastqc in parallel on the input FastQ files (paired or not)
and then executes multiqc. A brief sequana summary report is also produced.
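
In the actual pipeline, parallelism is handled by Snakemake. As a rough illustration only, the idea can be sketched with Python's standard library, assuming the fastqc executable is on the PATH and writes into an existing directory given by ``--outdir``. ``fastqc_command`` and ``run_fastqc_parallel`` are hypothetical helpers, not part of sequana_fastqc.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def fastqc_command(fastq, outdir):
    """Build the fastqc command line for a single input file."""
    return ["fastqc", "--outdir", str(outdir), str(fastq)]


def run_fastqc_parallel(fastq_files, outdir="fastqc_reports", jobs=4):
    """Run one fastqc process per input file, at most *jobs* at a time.

    Returns a dict mapping each input file to the exit code of its fastqc run.
    """
    outdir = Path(outdir)
    # fastqc does not create the output directory itself
    outdir.mkdir(parents=True, exist_ok=True)
    commands = {f: fastqc_command(f, outdir) for f in fastq_files}
    # Threads are enough here: each worker only waits on an external process
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = {f: pool.submit(subprocess.run, cmd) for f, cmd in commands.items()}
        return {f: fut.result().returncode for f, fut in futures.items()}
```

For example, ``run_fastqc_parallel(["data_R1_001.fastq.gz", "data_R2_001.fastq.gz"], jobs=2)`` would launch both runs concurrently; multiqc could then be pointed at the output directory (``multiqc fastqc_reports``) to aggregate the reports.
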
You may use falco instead of fastqc. This is experimental but seems to work for
Illumina FastQ files.

This pipeline has been tested on several hundred MiSeq, NextSeq, MiniSeq,
iSeq 100 and PacBio runs.

It also produces an md5sum of your data, copes with empty samples, and produces
ready-to-use HTML reports.
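
The checksum and empty-sample behaviours can be pictured with a short standard-library sketch. It is illustrative only, not the pipeline's code: it assumes FastQ records are exactly four lines long, the md5.txt file name is taken from the changelog (1.0.1), and ``md5sum``, ``count_reads`` and ``write_md5_file`` are hypothetical helpers.

```python
import gzip
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def count_reads(path):
    """Count reads in a (possibly gzipped) FastQ file, assuming 4 lines per record.

    An empty sample simply yields 0, so it needs no special case downstream.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def write_md5_file(paths, output="md5.txt"):
    """Write one 'checksum  path' line per file, the layout printed by md5sum."""
    with open(output, "w") as out:
        for path in paths:
            out.write(f"{md5sum(path)}  {path}\n")
```

Reading in fixed-size chunks keeps memory use constant even for multi-gigabyte FastQ files.
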
Rules and configuration details
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Here is the `latest documented configuration file `_
to be used with the pipeline. Each rule used in the pipeline may have a section in the configuration file.
Changelog
~~~~~~~~~
========= ====================================================================
Version   Description
========= ====================================================================
1.8.2     * Fix the onerror typo in the pipeline, fix CI.
1.8.1     * update __init__ (version)
1.8.0     * uses pyproject instead of setuptools
          * uses click instead of argparse and newest sequana_pipetools
            (0.16.0)
1.7.1     * Set wrapper version in the config based on new sequana_pipetools
            feature
1.7.0     * Use new rulegraph wrapper and new graphviz apptainer
1.6.2     * slight refactorisation to use rulegraph wrapper
1.6.1     * pin sequana version to 1.4.4 to force usage of new fastqc module
            to fix falco. Updated config documentation.
1.6.0     * Fixed falco output error and use singularity containers
1.5.0     * removed modules completely.
1.4.2     * simplified pipeline (suppress setup and use existing wrapper)
1.4.1     * simplified pipeline with wrappers/rules
1.4.0     * This version uses sequana 0.12.0 and new sequana-wrappers
            mechanism. Functionality is unchanged. Also based on
            sequana_pipetools 0.6.X
1.3.0     * add option --skip-multiqc (in case of memory trouble)
          * Fix typo in the link towards fastqc reports in the summary.html
            table
          * Fix number of samples in the paired case (divide by 2)
1.2.0     * compatibility with Sequanix
          * Fix pipeline to cope with new snakemake API
1.1.0     * add new rule to allow users to choose falco software instead of
            fastqc. Note that falco is 4 times faster but still a work in
            progress (version 0.1 as of Nov 2020).
          * allows the pipeline to process pacbio files (in fact any files
            accepted by fastqc, i.e. SAM and BAM files)
          * More doc, test and info on the wiki
1.0.1     * add md5sum of input files as md5.txt file
1.0.0     * a stable version. Added a wiki on github as well and a
            singularity recipe
0.9.15    * For the HTML reports, takes into account samples with zero reads
0.9.14    * round up some statistics in the main table
0.9.13    * improve the summary HTML report
0.9.12    * implemented new --from-project option
0.9.11    * now depends on sequana_pipetools instead of sequana.pipelines to
            speed up --help calls
          * new summary.html report created with pipeline summary
          * new rule (plotting)
0.9.10    * simplify the onsuccess section
0.9.9     * add missing png and pipeline (regression bug)
0.9.8     * add missing multi_config file
0.9.7     * check existence of input directory in main.py
          * add a logo
          * fix schema
          * add multiqc_config
          * add sequana + sequana_fastqc version
0.9.6     * add the readtag option
========= ====================================================================

Contribute & Code of Conduct
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To contribute to this project, please take a look at the
`Contributing Guidelines `_ first. Please note that this project is released with a
`Code of Conduct `_. By contributing to this project, you agree to abide by its terms.
Owner
- Name: Thomas Cokelaer
- Login: cokelaer
- Kind: user
- Location: Paris, France
- Company: Institut Pasteur
- Website: http://thomas-cokelaer.info/
- Twitter: ThomasCokelaer
- Repositories: 62
- Profile: https://github.com/cokelaer
Bioinformatician, Scientific Software Developer, Python developer
GitHub Events
Total
- Push event: 2
Last Year
- Push event: 2
Dependencies
.github/workflows/main.yml
actions
- actions/checkout v2 composite
- actions/setup-python v2 composite
.github/workflows/pypi.yml
actions
- actions/checkout master composite
- actions/setup-python v1 composite
- pypa/gh-action-pypi-publish master composite
requirements.txt
pypi
- sequana >=0.12.1
- sequana_pipetools >=0.7.2
sequana_pipelines/fastqc/requirements.txt
pypi
- dot *
- falco *
- fastqc *
- multiqc *