csis

Code Safety Inspection Service

https://github.com/microbinfie-hackathon2020/csis

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.0%) to scientific vocabulary
Last synced: 6 months ago

Repository

Code Safety Inspection Service

Basic Info
  • Host: GitHub
  • Owner: microbinfie-hackathon2020
  • License: mit
  • Default Branch: main
  • Size: 4.52 MB
Statistics
  • Stars: 20
  • Watchers: 3
  • Forks: 14
  • Open Issues: 0
  • Releases: 1
Created over 5 years ago · Last pushed over 3 years ago
Metadata Files
Readme Contributing License Citation

README.md

Software testing in microbial bioinformatics: a call to action

:detective: Code Safety Inspection Service

The CSIS

Prior to the American Society for Microbiology Conference on Rapid Applied Microbial Next-Generation Sequencing and Bioinformatic Pipelines held in December 2020, we organized a collaborative three-day “Hackathon” and brought together more than twenty researchers in the field of microbial bioinformatics from five continents. The goal of our Hackathon was to explore how to employ software testing in microbial bioinformatics.

The ASM NGS 2020 Hackathon aimed to promote the uptake of testing practices and engage the community in its adoption for public health. This repository is an open-source project that gathers guidance, guidelines and examples for software testing for microbial bioinformatics researchers.

Why Software Testing

Computational algorithms have become an essential component of research, with great efforts by the scientific community to raise standards on the development and distribution of code. Despite these efforts, sustainability and reproducibility remain major issues, since continued validation through software testing is still not a widely adopted practice.

Based on the experiences from our Hackathon, we developed a set of seven recommendations for researchers seeking to improve the quality and reproducibility of their analyses through software testing. We propose collaborative software testing as an opportunity to continuously engage software users, developers, and students to unify scientific work across domains.

Our Aim

In the field of microbial bioinformatics, good software engineering practices are not widely adopted (yet). Many microbial bioinformaticians start out as (micro)biologists and subsequently learn how to code. Without abundant formal training, a lot of education about good software engineering practices comes down to an exchange of information within the microbial bioinformatics community. That is also where we aim to position our repository: as a resource that could help microbial bioinformaticians get started with software testing if they have not had formal training.

Our Recommendations

As automated software testing remains underused in scientific software, our set of recommendations not only ensures that appropriate effort is invested in producing high-quality, robust software, but also increases engagement in its sustainability.

Here we propose seven recommendations that should be followed during software development.

1. Establish software needs and testing goals

Manually testing the functionality of a tool is feasible in early development, but can become laborious as software matures. We recommend:

  • Developers establish software needs and testing goals during planning and designing stages to ensure an efficient testing structure;
  • A minimal test set should address the validation of core components or the program as a whole (black-box testing) and gradually progress toward verification of key functions that can accommodate code changes over time (white-box testing).

The following table provides an overview of testing methodologies and can serve as a guide to developers that aim to implement testing practices.

Table 1: Overview of testing approaches

| Name | Description | Example |
|------|-------------|---------|
| **Installation testing: can the software be invoked on different setups?** | | |
| Installation testing | Can the software be installed on different platforms? | Test whether Software X can be installed using apt-get, pip, conda and from source. |
| Configuration testing | With which dependencies can the software be used? | Test whether Software X can be used with different versions of BLAST+. |
| Implementation testing | Do different implementations work similarly enough? | Test whether Software X works the same between the standalone and webserver versions. |
| Compatibility testing | Are newer versions compatible with previous input/output? | Test whether Software X can be used with older versions of the UniProtKB database. |
| Static testing | Is the source code syntactically correct? | Check whether all opening braces have corresponding closing braces or whether code is indented correctly in Software X. |
| **Standard functionality testing: does the software do what it should in daily use?** | | |
| Use case testing | Can the software do what it is supposed to do regularly? | Test whether Software X can annotate a small plasmid. |
| Workflow testing | Can the software successfully traverse each path in the analysis? | Test whether Software X works in different modes (using fast mode, using rnammer over barrnap or using rfam mode). |
| Sanity testing | Can the software be invoked without errors? | Test whether Software X works correctly without flags, or when checking dependencies or displaying help info. |
| **Destructive testing: what makes the software fail?** | | |
| Mutation testing | How do the current tests handle harmful alterations to the software? | Test whether changing a single addition to a subtraction within Software X causes the test suite to fail. |
| Load testing | At what input size does the software fail? | Test whether Software X can annotate a small plasmid (10 Kbp), a medium-size genome (2 Mbp) or an unrealistically large genome for a prokaryote (1 Gbp). |
| Fault injection | Does the software fail if faults are introduced and how is this handled? | Test whether Software X fails if nonsense functions are introduced in the gene calling code. |
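As a minimal sketch of the black-box versus white-box distinction described above, the following Python tests exercise a hypothetical `gc_content` function (the function and test names are illustrative, not part of CSIS):

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G/C bases in a DNA sequence."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def test_blackbox():
    # Black-box: check end-to-end behaviour against a known answer,
    # without assuming anything about the implementation.
    assert gc_content("ATGC") == 0.5


def test_whitebox():
    # White-box: exercise a specific internal decision (case handling),
    # guarding that behaviour against future code changes.
    assert gc_content("atgc") == gc_content("ATGC")
```

The black-box test would still pass if `gc_content` were rewritten from scratch; the white-box test pins down one internal design choice (case-insensitivity) so that refactoring cannot silently change it.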


2. Input test files: the good, the bad, and the ugly

When testing, it is important to include test files with known expected outcomes for a successful run. However, it is equally important to include files on which the tool is expected to fail. For example, some tools should recognize and report an empty input file or a wrong input format. Examples of valid and invalid file formats are available through the BioJulia project.
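To illustrate, the sketch below pairs a toy FASTA parser (illustrative only, not from this repository) with a "good" input that must parse and two invalid inputs on which the tool is expected to fail cleanly:

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dict."""
    if not text.strip():
        raise ValueError("empty input")
    if not text.lstrip().startswith(">"):
        raise ValueError("not FASTA: missing '>' header line")
    records = {}
    header = None
    for line in text.splitlines():
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            records[header] += line.strip()
    return records


good = ">seq1\nATGC\n"   # known-good input with an expected outcome
bad = ""                  # expected failure: empty file
ugly = "ATGC\n"           # expected failure: wrong format (no header)

assert parse_fasta(good) == {"seq1": "ATGC"}
for broken in (bad, ugly):
    try:
        parse_fasta(broken)
        raise AssertionError("parser accepted invalid input")
    except ValueError:
        pass  # the clean, reported failure we wanted to see
```

The key point is that the invalid inputs are tested for a *specific, informative* failure (a `ValueError`), not just any crash.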

3. Use an easy-to-follow language format to implement testing

Understanding the test workflow is necessary not only to ensure continued software development but also the integrity of the project for developers and users. This can be done through the adoption of a standardized and easy-to-follow format, such as YAML.

Additionally, testing packages or frameworks offer an efficient approach to test creation and design. Frameworks such as unittest or pytest for Python improve test efficiency, help bug detection and reduce manual intervention.
When possible, such frameworks should be integrated into test workflows.
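A minimal pytest-style test module might look like the following; pytest collects any function whose name starts with `test_`, so no extra boilerplate is needed (the `reverse_complement` function here is a hypothetical unit under test, not part of CSIS):

```python
# Translation table for DNA complementation (upper- and lower-case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def test_reverse_complement_basic():
    assert reverse_complement("ATGC") == "GCAT"


def test_reverse_complement_roundtrip():
    # Applying the operation twice must return the original sequence.
    assert reverse_complement(reverse_complement("GATTACA")) == "GATTACA"
```

Saved as, say, `test_revcomp.py`, the whole suite runs with a single `pytest -q` command, which is also what a CI platform would invoke.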

4. Testing is good, automated testing is better

When designing tests for your software, plan to automate. Whether your tests are small or comprehensive, automatic triggering of tests will help reduce your workload.

Many platforms trigger tests automatically based on a set of user-defined conditions. Services such as GitHub Actions, GitLab CI, CircleCI, Travis CI or Jenkins make it straightforward to run tests automatically on every push, pull request or release.
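As a sketch, a minimal GitHub Actions workflow that runs a test suite on every push and pull request could look like this (the workflow name, Python version and `pytest` command are illustrative assumptions, not taken from the repository's own workflow files):

```yaml
# .github/workflows/tests.yml (illustrative)
name: tests
on: [push, pull_request]    # user-defined trigger conditions
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2        # same action version this repo pins
      - uses: actions/setup-python@v2
        with:
          python-version: "3.9"
      - name: Run test suite
        run: pytest -q
```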

5. Try the test once, then try it again

The result of an automated test in the context of one computational workspace does not ensure the same result will be obtained in a different setup. Although package managers and containers have reduced variability between workspaces, it is still important to ensure your software can be installed and used across supported platforms. One way to ensure this is to test on different environments, with varying dependency versions (e.g., multiple Python versions, instead of only the most recent one).
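The advice above maps naturally onto a CI test matrix. This GitHub Actions fragment is a hedged sketch (the chosen operating systems and Python versions are illustrative) showing how one job definition fans out over several environments:

```yaml
# Fragment of a workflow: one job, run once per matrix combination.
jobs:
  test:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest]
        python-version: ["3.7", "3.8", "3.9"]   # not only the most recent
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}
      - run: pytest -q
```

With two operating systems and three Python versions, this single definition yields six independent test runs, surfacing environment-specific failures that a single-setup test would miss.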

6. Showcase the tests

For prospective users, it is good to know whether you have tested your software and, if so, which tests you have included. This can be done by displaying a badge in your GitHub README, or by linking to your defined testing strategy, e.g. a GitHub Actions YAML file (see recommendation 3).

Documenting the testing goal and process enables end-users to easily check tool functionality and the level of testing.
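A status badge is a one-line addition to a README. The snippet below uses GitHub's standard workflow-badge URL scheme; `OWNER`, `REPO` and `tests.yml` are placeholders to substitute with your own repository and workflow file name:

```markdown
[![tests](https://github.com/OWNER/REPO/actions/workflows/tests.yml/badge.svg)](https://github.com/OWNER/REPO/actions/workflows/tests.yml)
```

The badge image updates automatically with the latest workflow result, and clicking it takes users straight to the test logs.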

We recommend contacting the authors of software you have tested, directly or through issues in the code repository, to share successful outcomes or to report abnormal behaviour or component failures. An external perspective can be useful for finding bugs that the authors are unaware of.

7. Encourage others to test your software

Software testing can be crowd-sourced, as showcased by the ASM NGS 2020 Hackathon. Software suites such as Pangolin and chewBBACA have implemented automated testing developed during the Hackathon.

For developers, crowd-sourcing offers the benefits of fresh eyes on your software. Feedback and contributions from users can expedite the implementation of software testing practices. It also contributes to software sustainability by creating community buy-in, which ultimately helps the software maintainers keep pace with dependency changes, and identify current user needs.

Example software testing

Tools with integrated testing

| Software | Badge with link to CI | Version badge | Yaml |
|----------|-----------------------|---------------|------|
| This repo | CSIS | GitHub release (latest by date) | CSIS.yml |
| Bactopia | All Bactopia Tests | GitHub release (latest by date) | all-bactopia-tests.yml |
| chewBBACA | chewBBACA | GitHub release (latest by date) | chewbbaca.yml |
| Pangolin | pangolin | GitHub release (latest by date) | pangolin.yml |

Tools with external testing

| Software | Badge with link to CI | Version badge | Yaml |
|----------|-----------------------|---------------|------|
| Genotyphi | genotyphi | Tested Version | genotyphi.yml |
| Kraken | kraken | Tested Version | kraken.yml |
| KrakenUniq | krakenuniq | Tested Version | krakenuniq.yml |
| Kraken2 | kraken2 | Tested Version | kraken2.yml |
| Centrifuge | centrifuge | Tested Version | centrifuge.yml |
| Prokka | prokka | Tested Version | prokka.yml |
| Quast | quast | Tested Version | quast.yml |
| SKESA | SKESA | Tested Version | skesa.yml |
| Shovill | shovill | Tested Version | shovill.yml |
| BUSCO | BUSCO | Tested Version | busco.yml |
| Unicycler | unicycler | Tested Version | unicycler.yml |
| Trycycler | trycycler | Tested Version | trycycler.yml |
| CheckM | checkm | Tested Version | checkm.yml |
| iVar | iVar | Tested Version | ivar.yml |

Etymology

CSIS is a play on the acronym of the United States Food Safety and Inspection Service (FSIS). Additionally, the acronym contains CSI (Crime Scene Investigation), which gives it a sort of detective feel.

Contributors

The following participants were responsible for compiling the set of recommendations presented in this repository: Boas van der Putten, Inês Mendes, Brooke Talbot, Jolinda de Korne-Elenbaas, Rafael Mamede, Pedro Vila-Cerqueira, Luis Pedro Coelho, Christopher A. Gulvik, Lee S. Katz.

The following participants contributed by automating tests for bioinformatics and by building a community resource for identifying software that can pass unit tests, available in this repository: Áine O'Toole, Justin Payne, Mário Ramirez, Peter van Heusden, Robert A. Petit III, Verity Hill, Yvette Unoarumhi.

Citation (CITATION.cff)

type: article
title: 'Software testing in microbial bioinformatics: a call to action'
authors:
- family-names: van der Putten
  given-names: Boas C.L.
- family-names: Mendes
  given-names: C I
- family-names: Talbot
  given-names: Brooke M
- family-names: de Korne-Elenbaas
  given-names: Jolinda
- family-names: Mamede
  given-names: Rafael
- family-names: Vila-Cerqueira
  given-names: Pedro
- family-names: Coelho
  given-names: Luis Pedro
- family-names: Gulvik
  given-names: Christopher A
- family-names: Katz
  given-names: Lee S
- family-names: participants
  given-names: The ASM NGS 2020 Hackathon
journal: Microbial Genomics
year: '2022'
publisher:
  name: Microbiology Society
volume: '8'
issue: '3'
keywords:
- software testing
- continuous integration
- computational biology
abstract: Computational algorithms have become an essential component of research,
  with great efforts by the scientific community to raise standards on development
  and distribution of code. Despite these efforts, sustainability and reproducibility
  are major issues since continued validation through software testing is still not
  a widely adopted practice. Here, we report seven recommendations that help researchers
  implement software testing in microbial bioinformatics. We have developed these
  recommendations based on our experience from a collaborative hackathon organised
  prior to the American Society for Microbiology Next Generation Sequencing (ASM NGS)
  2020 conference. We also present a repository hosting examples and guidelines for
  testing, available from https://github.com/microbinfie-hackathon2020/CSIS.
issn: 2057-5858
doi: 10.1099/mgen.0.000790
url: https://www.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000790
identifiers:
- type: url
  value: https://doi.org/10.1099/mgen.0.000790


Dependencies

.github/workflows/CSIS.yml actions
  • actions/checkout v2 composite
.github/workflows/busco.yml actions
  • actions/checkout v2 composite
.github/workflows/centrifuge.yml actions
  • actions/checkout v2 composite
.github/workflows/checkm.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
.github/workflows/genotyphi.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
.github/workflows/ivar.yml actions
  • actions/checkout v2 composite
.github/workflows/kraken.yml actions
  • actions/checkout v2 composite
.github/workflows/kraken2.yml actions
  • actions/checkout v2 composite
.github/workflows/krakenuniq.yml actions
  • actions/checkout v2 composite
.github/workflows/pangolin.yml actions
  • actions/checkout v2 composite
  • conda-incubator/setup-miniconda v2 composite
.github/workflows/prokka.yml actions
  • actions/checkout v2 composite
.github/workflows/quast.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
.github/workflows/skesa.yml actions
  • actions/checkout v2 composite
.github/workflows/trycycler.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
.github/workflows/unicycler.yml actions
  • actions/checkout v2 composite
  • conda-incubator/setup-miniconda v2 composite