nf-histoqc

A Nextflow wrapper for HistoQC

https://github.com/mc2-center/nf-histoqc

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.0%) to scientific vocabulary

Keywords

digital-pathology histology imaging microscopy nextflow quality-control
Last synced: 6 months ago

Repository

A Nextflow wrapper for HistoQC

Basic Info
  • Host: GitHub
  • Owner: mc2-center
  • License: mit
  • Language: Nextflow
  • Default Branch: main
  • Homepage:
  • Size: 1.75 MB
Statistics
  • Stars: 0
  • Watchers: 5
  • Forks: 1
  • Open Issues: 10
  • Releases: 3
Topics
digital-pathology histology imaging microscopy nextflow quality-control
Created almost 3 years ago · Last pushed almost 2 years ago
Metadata Files
Readme Changelog License Citation

README.md

nf-histoqc

A Nextflow wrapper for the digital pathology quality control tool HistoQC.

Developed for the Multi-Consortia Coordinating (MC2) Center administrative supplement "Assuring AI/ML-readiness of digital pathology in diverse existing and emerging multi-omic datasets through quality control workflows" (3U24CA274494-02S2).

The project will improve the AI/ML readiness of existing and emerging NIH-supported public digital pathology datasets, and of research programs supported by the MC2 Center, by automatically evaluating and reporting artifacts and batch effects using open-source NIH-funded tools. These enriched datasets will let researchers exclude artifacts from their training and validation sets reproducibly, increasing trust in cross-investigator dataset reuse while improving AI/ML model performance and robustness. To quantify the value of cleaned, AI/ML-ready data in downstream tasks, a prototypical deep learning use case is planned.

Example usage

    nextflow run mc2-center/nf-histoqc \
        --input <path-to-samplesheet> \
        --outDir <path-to-output-directory> \
        --config <HistoQC config to use> \
        -profile local
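Nextflow's own runtime options such as -profile take a single dash, while pipeline parameters such as --input, --outDir and --config take two. Writing --profile would set an unused pipeline parameter rather than select a profile. The minimal Python sketch below assembles the command as an argument list, for example for subprocess.run. The helper name build_nextflow_command is hypothetical and is not part of nf-histoqc.

```python
from typing import List, Optional


def build_nextflow_command(
    samplesheet: str,
    out_dir: str = "outputs",
    config: Optional[str] = None,
    profile: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for `nextflow run mc2-center/nf-histoqc` (hypothetical helper)."""
    # Pipeline parameters use a double dash: --name value.
    cmd = [
        "nextflow", "run", "mc2-center/nf-histoqc",
        "--input", samplesheet,
        "--outDir", out_dir,
    ]
    if config is not None:
        cmd += ["--config", config]
    # Nextflow runtime options such as -profile use a single dash.
    if profile is not None:
        cmd += ["-profile", profile]
    return cmd


if __name__ == "__main__":
    # Print the command line instead of running it.
    print(" ".join(build_nextflow_command("samplesheet.csv", config="default", profile="local")))
```

To actually launch the run, pass the list to subprocess.run(cmd, check=True). A list avoids shell quoting issues with paths that contain spaces.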

Test usage

To run the test on CMU-1-Small-Region.svs (included in the repository) and write output to ./outputs:

nextflow run mc2-center/nf-histoqc -profile test

Samplesheet

nf-histoqc takes a CSV samplesheet containing the following column:

  • image: [string] Path or URI to image to be processed

Other columns may be provided but are not used by the pipeline.
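The following minimal Python sketch builds such a samplesheet from a directory of slides. It assumes the CSV has a header row naming the image column. The helpers find_slides and write_samplesheet, and the default *.svs pattern, are hypothetical and not part of nf-histoqc.

```python
import csv
from pathlib import Path
from typing import Iterable, List


def find_slides(slide_dir: str, pattern: str = "*.svs") -> List[str]:
    """Return sorted paths of files in slide_dir matching pattern (non-recursive)."""
    return sorted(str(p) for p in Path(slide_dir).glob(pattern) if p.is_file())


def write_samplesheet(image_paths: Iterable[str], dest: str) -> None:
    """Write a CSV with a header row and the single required `image` column."""
    with open(dest, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["image"])
        writer.writeheader()
        for path in image_paths:
            writer.writerow({"image": path})
```

For example, write_samplesheet(find_slides("slides"), "samplesheet.csv") produces a file suitable for --input. URIs such as s3://... can be passed to write_samplesheet directly in place of local paths.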

Output

nf-histoqc outputs the following directory structure into the specified output directory (outDir):

    <outDir>
    ├── results.tsv
    ├── <baseName for first row of samplesheet>
    │   ├── *.png   <masks and images generated by HistoQC>
    │   └── ...
    └── <baseName for n'th row of samplesheet>
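The tree above names each per-slide directory after the samplesheet row's baseName. In Nextflow, baseName is the file name without its final extension. The minimal Python sketch below maps samplesheet entries to those directories and reports slides whose directory is missing after a run. It assumes POSIX-style paths or URIs without query strings. The helper names are hypothetical and not part of nf-histoqc.

```python
from pathlib import Path, PurePosixPath
from typing import Iterable, List
from urllib.parse import urlparse


def base_name(image: str) -> str:
    """Mimic Nextflow's baseName: the file name without its final extension."""
    # urlparse leaves plain paths intact and drops the scheme and bucket
    # from URIs such as s3://bucket/slides/a.svs.
    return PurePosixPath(urlparse(image).path).stem


def expected_output_dir(out_dir: str, image: str) -> Path:
    """Directory where per-slide HistoQC masks and images are expected."""
    return Path(out_dir) / base_name(image)


def missing_outputs(out_dir: str, images: Iterable[str]) -> List[str]:
    """Samplesheet entries whose expected output directory does not exist."""
    return [img for img in images if not expected_output_dir(out_dir, img).is_dir()]
```

Only the final extension is removed, so slide.ome.tiff maps to slide.ome. Two slides that share a baseName in different folders would map to the same directory.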

Options

Input/Output options

  • input: Path to a CSV sample sheet. This parameter is required.
  • outDir: Specifies the directory where the output data should be saved. Default is outputs.

Other options

  • config (string): Name of a built-in configuration used by HistoQC. Must be one of default, ihc, clinical, first, light, or v2.1. Defaults to default.
  • custom_config (path): Path to a HistoQC-compatible configuration file. Must have a .ini extension. Overrides config.
  • convert (bool): If set, vips is used to create an OpenSlide-compatible TIFF file. Uses mc2-center/histoqc-openslide-converter.
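The precedence between config and custom_config can be sketched in Python as follows. This is an illustration of the rules listed above, not the pipeline's own validation code, and the helper resolve_histoqc_config is hypothetical. It returns whether a built-in or custom configuration would be used.

```python
from pathlib import Path
from typing import Optional, Tuple

# Built-in HistoQC configuration names accepted by the config option.
BUILTIN_CONFIGS = {"default", "ihc", "clinical", "first", "light", "v2.1"}


def resolve_histoqc_config(
    config: str = "default", custom_config: Optional[str] = None
) -> Tuple[str, str]:
    """Return ("custom", path) or ("builtin", name) following the option rules."""
    # custom_config overrides config, so it is checked first.
    if custom_config is not None:
        if Path(custom_config).suffix != ".ini":
            raise ValueError(f"custom_config must have a .ini extension: {custom_config}")
        return ("custom", custom_config)
    if config not in BUILTIN_CONFIGS:
        raise ValueError(f"config must be one of {sorted(BUILTIN_CONFIGS)}, got {config!r}")
    return ("builtin", config)
```

In this sketch, a valid custom_config is accepted even when config holds an unknown name, because the custom file takes precedence. The real pipeline may validate both options independently.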

Profiles

  • test: Runs the test samplesheet in test_data/test_samplesheet.csv.
  • sage: Optimized configuration for Sage's Nextflow Tower instance.
  • local: Low resources suitable for runs on laptops etc.
  • tower: Minimal configuration for Nextflow Tower.

Docker container

A Docker container is provided for reproducibility and hosted on ghcr.io. The image is rebuilt by GitHub Actions whenever the Dockerfile or the build-and-deploy workflows are modified.

The Dockerfile is based on the one provided in the HistoQC repository, with procps added and some container settings modified to allow use in Nextflow Tower.

The container is pulled automatically by Nextflow. For local use, run: docker pull ghcr.io/mc2-center/nf-histoqc:latest

DAG

A Nextflow pipeline is implicitly modelled as a directed acyclic graph (DAG). The vertices represent the pipeline’s processes and operators, and the edges represent the data connections (i.e. channels) between them.

```mermaid
flowchart TB
    subgraph " "
    v0["Channel.fromPath"]
    v3["Channel.fromPath"]
    v6["configstring"]
    end
    subgraph NFHISTOQC
    subgraph RUN
    v5([CONVERT])
    v7([HISTOQC])
    v1(( ))
    v4(( ))
    v9(( ))
    v13(( ))
    end
    subgraph COLLECT
    v10([RESULTS])
    v11([TIDY])
    v14([LOGS])
    end
    end
    subgraph " "
    v8["output"]
    v12[" "]
    v15[" "]
    end
    v0 --> v1
    v3 --> v4
    v1 --> v5
    v5 --> v7
    v6 --> v7
    v4 --> v7
    v7 --> v8
    v7 --> v9
    v7 --> v13
    v9 --> v10
    v10 --> v11
    v11 --> v12
    v13 --> v14
    v14 --> v15
```

Owner

  • Name: Multi-Consortia Coordinating Center
  • Login: mc2-center
  • Kind: organization
  • Email: mc2center@sagebase.org

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this workflow, please cite both the repository and the HistoQC paper"
authors:
  - family-names: "Taylor"
    given-names: "Adam J."
    orcid: "https://orcid.org/0000-0003-0501-8886"
title: "nf-histoqc"
version: pre-release
license: MIT
#doi: 10.5281/zenodo.1234
#date-released: 2023-09-20
url: "https://github.com/mc2-center/nf-histoqc"
references:
  - type: article
    scope: Cite this paper to reference the HistoQC tool.
    authors:
      - family-names: Janowczyk
        given-names: Andrew
      - family-names: Ren
        given-names: Zuo
      - family-names: Gilmore
        given-names: Hannah
      - family-names: Feldman
        given-names: Michael
      - family-names: Madabhushi
        given-names: Anant
    title: "HistoQC: An Open-Source Quality Control Tool for Digital Pathology Slides"
    year: 2019
    journal: JCO Clinical Cancer Informatics
    volume: 3
    pages: "1-7"
    doi: 10.1200/CCI.18.00157

Dependencies

.github/workflows/scan_images.yml actions
  • actions/upload-artifact v3 composite
  • aquasecurity/trivy-action master composite
  • docker/login-action v2 composite
  • docker/metadata-action v4 composite
  • github/codeql-action/upload-sarif v2 composite
docker/Dockerfile docker
  • python 3.8 build
  • python 3.8-slim build
docker/requirements.txt pypi
  • dill ==0.3.3
  • importlib-resources *
  • matplotlib ==3.3.4
  • numpy ==1.20.1
  • openslide-python ==1.1.2
  • pytest *
  • scikit-image ==0.18.1
  • scikit-learn ==0.24.1
  • scipy ==1.6.1
.github/workflows/docker.yml actions
  • actions/checkout v3 composite
  • docker/build-push-action v4 composite
  • docker/login-action v2 composite
  • docker/metadata-action v4 composite
.github/workflows/nextflow.yml actions
  • actions/checkout v4 composite
  • nf-core/setup-nextflow v1 composite