Repository
Code Safety Inspection Service
Basic Info
- Host: GitHub
- Owner: microbinfie-hackathon2020
- License: mit
- Default Branch: main
- Size: 4.52 MB
Statistics
- Stars: 20
- Watchers: 3
- Forks: 14
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
Software testing in microbial bioinformatics: a call to action
:detective: Code Safety Inspection Service
The CSIS
Prior to the American Society for Microbiology Conference on Rapid Applied Microbial Next-Generation Sequencing and Bioinformatic Pipelines held in December 2020, we organized a collaborative three-day “Hackathon” and brought together more than twenty researchers in the field of microbial bioinformatics from five continents. The goal of our Hackathon was to explore how to employ software testing in microbial bioinformatics.
The ASM NGS 2020 Hackathon aimed to promote the uptake of testing practices and engage the community in its adoption for public health. This repository is an open-source project that gathers guidance, guidelines and examples for software testing for microbial bioinformatics researchers.
Why Software Testing
Computational algorithms have become an essential component of research, with great efforts of the scientific community to raise standards on development and distribution of code. Despite these efforts, sustainability and reproducibility are major issues since continued validation through software testing is still not a widely-adopted practice.
Based on the experiences from our Hackathon, we developed a set of seven recommendations for researchers seeking to improve the quality and reproducibility of their analyses through software testing. We propose collaborative software testing as an opportunity to continuously engage software users, developers, and students to unify scientific work across domains.
Our Aim
In the field of microbial bioinformatics, good software engineering practices are not widely adopted (yet). Many microbial bioinformaticians start out as (micro)biologists and subsequently learn how to code. Without abundant formal training, a lot of education about good software engineering practices comes down to an exchange of information within the microbial bioinformatics community. That is also where we aim to position our repository: as a resource that could help microbial bioinformaticians get started with software testing if they have not had formal training.
Our Recommendations
As automated software testing remains underused in scientific software, our recommendations aim both to direct appropriate effort toward producing high-quality, robust software and to increase engagement in its sustainability.
Here we propose seven recommendations that should be followed during software development.
1. Establish software needs and testing goals
Manually testing the functionality of a tool is feasible in early development, but can become laborious as software matures. We recommend:
- Developers establish software needs and testing goals during planning and designing stages to ensure an efficient testing structure;
- A minimal test set should address the validation of core components or the program as a whole (Blackbox testing) and gradually progress toward verification of key functions which can accommodate code changes over time (Whitebox testing).
The following table provides an overview of testing methodologies and can serve as a guide to developers that aim to implement testing practices.
Table 1: Overview of testing approaches
| Name | Description | Example |
|------|-------------|---------|
| Installation testing: can the software be invoked on different setups? | | |
| Installation testing | Can the software be installed on different platforms? | Test whether Software X can be installed using apt-get, pip, conda and from source. |
| Configuration testing | With which dependencies can the software be used? | Test whether Software X can be used with different versions of BLAST+. |
| Implementation testing | Do different implementations work similarly enough? | Test whether Software X works the same between the standalone and webserver versions. |
| Compatibility testing | Are newer versions compatible with previous input/output? | Test whether Software X can be used with older versions of the UniProtKB database. |
| Static testing | Is the source code syntactically correct? | Check whether all opening braces have corresponding closing braces or whether code is indented correctly in Software X. |
| Standard functionality testing: does the software do what it should in daily use? | | |
| Use case testing | Can the software do what it is supposed to do regularly? | Test whether Software X can annotate a small plasmid. |
| Workflow testing | Can the software successfully traverse each path in the analysis? | Test whether Software X works in different modes (using fast mode, using rnammer over barrnap or using rfam mode). |
| Sanity testing | Can the software be invoked without errors? | Test whether Software X works correctly without flags, or when checking dependencies or displaying help info. |
| Destructive testing: what makes the software fail? | | |
| Mutation testing | How do the current tests handle harmful alterations to the software? | Test whether changing a single addition to a subtraction within Software X causes the test suite to fail. |
| Load testing | At what input size does the software fail? | Test whether Software X can annotate a small plasmid (10 Kbp), a medium-size genome (2 Mbp) or an unrealistically large genome for a prokaryote (1 Gbp). |
| Fault injection | Does the software fail if faults are introduced and how is this handled? | Test whether Software X fails if nonsense functions are introduced in the gene calling code. |
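As an illustration of the mutation testing row in Table 1, the sketch below swaps a single addition for a subtraction and checks which tests notice. The `gc_content` function and both tests are hypothetical stand-ins for Software X and its test suite, written only for this example.

```python
def gc_content(sequence: str) -> float:
    """Hypothetical function under test: fraction of G and C bases."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def gc_content_mutant(sequence: str) -> float:
    """Mutant: the single '+' in gc_content has been changed to '-'."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    return (sequence.count("G") - sequence.count("C")) / len(sequence)


def weak_test(func) -> bool:
    """Only checks an AT-only sequence, so it cannot tell '+' from '-'."""
    return func("ATAT") == 0.0


def strong_test(func) -> bool:
    """Also checks a sequence in which both G and C counts matter."""
    return func("ATAT") == 0.0 and func("GGCC") == 1.0


if __name__ == "__main__":
    for name, test in [("weak", weak_test), ("strong", strong_test)]:
        # A mutant is "killed" when the test passes on the original
        # implementation but fails on the mutated one.
        killed = test(gc_content) and not test(gc_content_mutant)
        print(f"{name} test kills the mutant: {killed}")
```

A test that passes on both the original and the mutant, like `weak_test` here, suggests the test suite does not yet constrain that part of the code.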
2. Input test files: the good, the bad, and the ugly
When testing, it is important to include test files with known expected outcomes for a successful run. However, it is equally important to include files on which the tool is expected to fail. For example, some tools should recognize and report an empty input file or a wrong input format. Examples of valid and invalid file formats are available through the BioJulia project.
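A minimal sketch of this idea, assuming a hypothetical FASTA record counter as the tool under test: one valid input with a known outcome, one empty input and one input in the wrong format. The function names and the `InvalidFastaError` exception are invented for illustration and do not come from any specific tool.

```python
import tempfile
from pathlib import Path


class InvalidFastaError(ValueError):
    """Hypothetical error raised for empty or non-FASTA input."""


def count_fasta_records(path: Path) -> int:
    """Hypothetical tool: count FASTA records, rejecting bad input."""
    text = Path(path).read_text()
    if not text.strip():
        raise InvalidFastaError(f"{path} is empty")
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines[0].startswith(">"):
        raise InvalidFastaError(f"{path} does not start with a '>' header")
    return sum(1 for line in lines if line.startswith(">"))


def run_on_text(text: str) -> int:
    """Write text to a temporary file and run the tool on it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "input.fasta"
        path.write_text(text)
        return count_fasta_records(path)


def fails_cleanly(text: str) -> bool:
    """Return True if the tool rejects the input with InvalidFastaError."""
    try:
        run_on_text(text)
    except InvalidFastaError:
        return True
    return False


if __name__ == "__main__":
    # The good: a valid file with a known expected outcome.
    assert run_on_text(">seq1\nACGT\n>seq2\nGGCC\n") == 2
    # The bad: an empty file should be recognized and reported.
    assert fails_cleanly("")
    # The ugly: a wrong format (FASTQ here) should not be silently accepted.
    assert fails_cleanly("@read1\nACGT\n+\nIIII\n")
    print("all input checks passed")
```

In a real project the three inputs would normally live as small files in a test data directory, so that the same files can be reused by every test.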
3. Use an easy-to-follow language format to implement testing
An understandable test workflow is necessary to ensure both continued software development and the integrity of the project for developers and users. This can be achieved by adopting a standardized and easy-to-follow format, such as YAML.
Additionally, testing packages or frameworks offer an efficient approach to test creation and design.
Frameworks such as unittest or pytest for Python improve test efficiency, help bug detection and reduce manual intervention.
When possible, frameworks should be integrated into test workflows.
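As a rough sketch of what a framework-based test can look like, the block below uses Python's built-in unittest module. The `reverse_complement` function is a hypothetical example of code under test and is not taken from any of the tools listed in this repository.

```python
import unittest


def reverse_complement(sequence: str) -> str:
    """Hypothetical function under test: reverse-complement a DNA string."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(sequence.upper()))


class TestReverseComplement(unittest.TestCase):
    def test_known_sequence(self):
        # Known input with a known expected output.
        self.assertEqual(reverse_complement("ATGC"), "GCAT")

    def test_lowercase_input(self):
        # Lowercase input is normalized to uppercase.
        self.assertEqual(reverse_complement("atgc"), "GCAT")

    def test_invalid_base_raises(self):
        # An unexpected character should fail loudly, not silently.
        with self.assertRaises(KeyError):
            reverse_complement("ATGX")
```

Saved in a file such as `test_revcomp.py` (a name chosen here for illustration), these tests can be run with `python -m unittest test_revcomp.py`. pytest can also collect and run `unittest.TestCase` classes.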
4. Testing is good, automated testing is better
When designing tests for your software, plan to automate. Whether your tests are small or comprehensive, automatic triggering of tests will help reduce your workload.
Many platforms trigger tests automatically based on a set of user-defined conditions. Platforms such as GitHub Actions, GitLab CI, CircleCI, Travis CI or Jenkins offer straightforward automated testing of code upon deployment.
5. Try the test once, then try it again
The result of an automated test in the context of one computational workspace does not ensure the same result will be obtained in a different setup. Although package managers and containers have reduced variability between workspaces, it is still important to ensure your software can be installed and used across supported platforms. One way to ensure this is to test on different environments, with varying dependency versions (e.g., multiple Python versions, instead of only the most recent one).
6. Showcase the tests
For prospective users, it is good to know whether you have tested your software and, if so, which tests you have included. This can be done by displaying a badge in your GitHub README, or by linking to your defined testing strategy, e.g. a GitHub Actions YAML file (see recommendation #3).
Documenting the testing goal and process enables end-users to easily check tool functionality and the level of testing.
If you have tested someone else's software, we recommend contacting the authors, directly or through issues in the code repository, to share successful outcomes or to report abnormal behavior or component failures. An external perspective can be useful to find bugs that the authors are unaware of.
7. Encourage others to test your software
Software testing can be crowd-sourced, as showcased by the ASM NGS 2020 Hackathon. Software suites such as Pangolin and chewBBACA have implemented automated testing developed during the Hackathon.
For developers, crowd-sourcing offers the benefits of fresh eyes on your software. Feedback and contributions from users can expedite the implementation of software testing practices. It also contributes to software sustainability by creating community buy-in, which ultimately helps the software maintainers keep pace with dependency changes, and identify current user needs.
Example software testing
Tools with integrated testing
| Software | Badge with link to CI | Version badge | Yaml |
|----------|-----------------------|---------------|------|
| This repo | | | CSIS.yml |
| Bactopia | | | all-bactopia-tests.yml |
| chewBBACA | | | chewbbaca.yml |
| Pangolin | | | pangolin.yml |
Tools with external testing
| Software | Badge with link to CI | Version badge | Yaml |
|----------|-----------------------|---------------|------|
| Genotyphi | | | genotyphi.yml |
| Kraken | | | kraken.yml |
| KrakenUniq | | | krakenuniq.yml |
| Kraken2 | | | kraken2.yml |
| Centrifuge | | | centrifuge.yml |
| Prokka | | | prokka.yml |
| Quast | | | quast.yml |
| SKESA | | | skesa.yml |
| Shovill | | | shovill.yml |
| BUSCO | | | busco.yml |
| Unicycler | | | unicycler.yml |
| Trycycler | | | trycycler.yml |
| CheckM | | | checkm.yml |
| iVar | | | ivar.yml |
Etymology
CSIS is a play on the acronym for the United States Food Safety Inspection Service. Additionally, it has CSI in the acronym (Crime Scene Investigation) which has a sort of detective feel to it.
Contributors
The following participants were responsible for compiling the set of recommendations presented in this repository: Boas van der Putten, Inês Mendes, Brooke Talbot, Jolinda de Korne-Elenbaas, Rafael Mamede, Pedro Vila-Cerqueira, Luis Pedro Coelho, Christopher A. Gulvik, Lee S. Katz.
The following participants contributed to automating tests for bioinformatics software and to building a community resource for identifying software that can pass unit tests, available in this repository: Áine O'Toole, Justin Payne, Mário Ramirez, Peter van Heusden, Robert A. Petit III, Verity Hill, Yvette Unoarumhi.
Citation (CITATION.cff)
type: article
title: 'Software testing in microbial bioinformatics: a call to action'
authors:
  - family-names: van der Putten
    given-names: Boas C.L.
  - family-names: Mendes
    given-names: C I
  - family-names: Talbot
    given-names: Brooke M
  - family-names: de Korne-Elenbaas
    given-names: Jolinda
  - family-names: Mamede
    given-names: Rafael
  - family-names: Vila-Cerqueira
    given-names: Pedro
  - family-names: Coelho
    given-names: Luis Pedro
  - family-names: Gulvik
    given-names: Christopher A
  - family-names: Katz
    given-names: Lee S
  - family-names: participants
    given-names: The ASM NGS 2020 Hackathon
journal: Microbial Genomics
year: '2022'
publisher:
  name: Microbiology Society
volume: '8'
issue: '3'
keywords:
  - software testing
  - continuous integration
  - computational biology
abstract: Computational algorithms have become an essential component of research, with great efforts by the scientific community to raise standards on development and distribution of code. Despite these efforts, sustainability and reproducibility are major issues since continued validation through software testing is still not a widely adopted practice. Here, we report seven recommendations that help researchers implement software testing in microbial bioinformatics. We have developed these recommendations based on our experience from a collaborative hackathon organised prior to the American Society for Microbiology Next Generation Sequencing (ASM NGS) 2020 conference. We also present a repository hosting examples and guidelines for testing, available from https://github.com/microbinfie-hackathon2020/CSIS.
issn: 2057-5858
doi: 10.1099/mgen.0.000790
url: https://www.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000790
identifiers:
  - type: url
    value: https://doi.org/https://doi.org/10.1099/mgen.0.000790
Dependencies
- actions/checkout v2 composite
- actions/setup-python v2 composite
- conda-incubator/setup-miniconda v2 composite