Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 9 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (9.3%) to scientific vocabulary
Repository
Population analysis PIPEline 🛠🧬
Basic Info
- Host: GitHub
- Owner: bacpop
- License: apache-2.0
- Language: Python
- Default Branch: master
- Homepage: https://poppunk.bacpop.org/subclustering.html
- Size: 714 KB
Statistics
- Stars: 17
- Watchers: 2
- Forks: 6
- Open Issues: 1
- Releases: 3
Metadata Files
README.md
PopPIPE: Population analysis PIPEline 🛠🧬
Downstream analysis of PopPUNK results. Produces subclusters and visualisations of all strains.
Paper
McHugh, M. P., Horsfield, S. T., von Wachsmann, J., Toussaint, J., Pettigrew, K. A., Czarniak, E., Evans, T. J., Leanord, A., Tysall, L., Gillespie, S. H., Templeton, K. E., Holden, M. T. G., Croucher, N. J., & Lees, J. A. (2025).
Integrated population clustering and genomic epidemiology with PopPIPE. Microbial Genomics, 11(4), 001404. https://www.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.001404
Use cases
Subcluster analysis (default target)
To run the default pipeline for subcluster analysis, run:
snakemake --cores 4
The default pipeline consists of the following steps:
- Split files into their PopPUNK strains.
- Use pp-sketchlib to calculate core and accessory distances within each strain.
- Use core distances and rapidnj to make a neighbour-joining tree.
- Use SKA2 to generate within-strain alignments in reference free mode.
- Use IQ-TREE to generate an ML phylogeny using this alignment, and the NJ tree as a starting point.
- Use fastbaps to generate subclusters which are partitions of the phylogeny.

For an example of this analysis, please find data at 10.6084/m9.figshare.28429574
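As a rough illustration of the first step (splitting samples into their PopPUNK strains), the sketch below groups sample names by cluster and drops strains below a minimum size. It is not PopPIPE's own code: the function name group_by_strain is hypothetical, and the 'Taxon' and 'Cluster' column names are an assumption about the PopPUNK clusters CSV, so check the header of your own file.

```python
import csv
import io
from collections import defaultdict


def group_by_strain(clusters_csv, min_cluster_size=6):
    """Group sample names by strain, keeping strains with enough samples.

    Assumes 'Taxon' and 'Cluster' columns (an assumption of this sketch).
    """
    strains = defaultdict(list)
    for row in csv.DictReader(clusters_csv):
        strains[row["Cluster"]].append(row["Taxon"])
    # Drop strains smaller than the threshold (compare min_cluster_size in config.yml)
    return {c: names for c, names in strains.items() if len(names) >= min_cluster_size}


# Tiny made-up example: strain 1 has three samples, strain 2 has one
example = io.StringIO("Taxon,Cluster\ns1,1\ns2,1\ns3,1\ns4,2\n")
kept = group_by_strain(example, min_cluster_size=3)
print(kept)
```

With a threshold of three, only strain 1 is kept in this toy input.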
With make_microreact target
snakemake make_microreact
In addition to the above:
- Create an overall visualisation with both core and accessory distances, as in PopPUNK. The final tree is made by refining the NJ tree, grafting the maximum likelihood trees for subclusters onto their matching nodes.
- Use microreact to display the results.
With transmission target
snakemake transmission --cores 4
In addition to the above, for each strain:
- Use SKA2 map to generate within-strain alignments in reference-based mode.
- Use gubbins to remove recombination.
- Use bactdating to make timed trees.
- Use transphylo to infer transmission events on these timed trees.
This requires a transmission_metadata.csv file containing sampling times; see the PopPIPE configuration section below for a description of its format.
For an example of this analysis, please find data at 10.6084/m9.figshare.28495571.
Input files and the config file are in input/, and the pipeline output after running snakemake transmission --cores 4 is in output/.
Installation
The supported method is to use mamba, which is most easily accessed by first
installing micromamba. Install with:
mamba create -n poppipe --file=environment.yml
mamba activate poppipe
If you are using an ARM Mac, some packages may not be available yet. To use the Intel
packages, prepend CONDA_SUBDIR=osx-64 to the mamba create command.
Running inside a container
An alternative, if you are having trouble with the above, is to use the PopPIPE docker
container. If you are comfortable running commands inside docker containers and mounting
your external files, the whole pipeline is in the container available by running:
docker pull poppunk/poppipe:latest
You can also follow the above process and make a local clone of snakemake, and replace
--use-conda with --use-singularity, which will automatically pull this container, and run
each step inside it.
Use --singularity-args if you need to bind directories.
Usage
- Modify config.yml as appropriate.
- Run snakemake --cores <n_cores>.
In particular, check the three poppunk_ arguments, which should be set to the
full path of the --r-files argument, strain clusters .csv file and .h5 database file,
from your PopPUNK run.
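Before launching a run, it can help to confirm that the three poppunk_ entries are set, are full paths, and point at existing files. The following is a minimal sketch under stated assumptions: check_poppunk_paths is a hypothetical helper (not part of PopPIPE), and the config is passed as a plain dictionary, however you choose to load config.yml.

```python
import os
import tempfile

# The three keys described in the PopPIPE configuration section
POPPUNK_KEYS = ("poppunk_rfile", "poppunk_clusters", "poppunk_h5")


def check_poppunk_paths(config):
    """Return a list of problems found; an empty list means all three look fine."""
    problems = []
    for key in POPPUNK_KEYS:
        path = config.get(key)
        if not path:
            problems.append(f"{key}: not set")
        elif not os.path.isabs(path):
            problems.append(f"{key}: not a full path ({path})")
        elif not os.path.exists(path):
            problems.append(f"{key}: file not found ({path})")
    return problems


# Demo with made-up values: a relative path, a real temporary file, a missing key
with tempfile.NamedTemporaryFile(suffix=".csv") as tmp:
    demo = {"poppunk_rfile": "rfile.txt", "poppunk_clusters": tmp.name}
    demo_problems = check_poppunk_paths(demo)
print(demo_problems)
```

Returning a list rather than raising on the first problem lets all three entries be reported in one pass.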
On a cluster or the cloud, you can use snakemake's built-in --cluster argument:
snakemake --cluster qsub -j 16
See the snakemake docs
for more information on your cluster/cloud provider.
Creating visualisations
The default target is the first in the Snakefile: cluster_summary. This
produces subclusters but no visualisations. To continue the run forward from here
use the microreact target:
snakemake make_microreact
This will create a phylogeny and an embedding, format your strains and their subclusters
for microreact, and save these files to the output. The phylogeny
and clusters will be sent to microreact, and a link to your page will be output to the terminal
and saved in output/microreact_url.txt.
NB: From 2021-10-27, Microreact requires an API key for the final step to work. See the microreact docs for instructions on how to generate one for your account.
Running transmission detection
To run the transmission pipeline, make sure you have provided a transmission metadata CSV file
in config.yml which lists the sampling dates. Then run the transmission target:
snakemake transmission
Usage example
TODO
Config file
PopPIPE configuration
- poppipe_location: The scripts/ directory, if not running from the root of this repository
- poppunk_rfile: The --rfile used with PopPUNK, which lists sample names and files, one per line, tab separated.
- poppunk_clusters: The PopPUNK cluster CSV file, usually poppunk_db/poppunk_db_clusters.csv.
- poppunk_h5: The PopPUNK HDF5 database file.
- transmission_metadata: (optional) a CSV file with strain names in a column labelled 'Name' and sampling dates labelled 'Date'. The date can be in format YYYY-mm-dd, YYYY/mm/dd or just the year YYYY.
- min_cluster_size: The minimum size of a cluster to run the analysis on (recommended at least 6).
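To make the transmission_metadata format concrete, here is a small sketch that reads a CSV with 'Name' and 'Date' columns and accepts the three date formats listed above. The helper name parse_sampling_date and the sample names are made up for illustration, and mapping a bare year to 1 January is an assumption of this sketch, not necessarily how PopPIPE treats year-only dates.

```python
import csv
import io
from datetime import date, datetime


def parse_sampling_date(text):
    """Parse YYYY-mm-dd, YYYY/mm/dd or YYYY into a datetime.date.

    A bare year becomes 1 January of that year (an assumption of this sketch).
    """
    text = text.strip()
    for fmt in ("%Y-%m-%d", "%Y/%m/%d", "%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # try the next accepted format
    raise ValueError(f"Unrecognised date format: {text!r}")


# Made-up metadata using each of the three accepted formats
metadata = io.StringIO(
    "Name,Date\nsample1,2019-03-05\nsample2,2019/03/06\nsample3,2018\n"
)
dates = {row["Name"]: parse_sampling_date(row["Date"]) for row in csv.DictReader(metadata)}
print(dates)
```

Running a check like this over the file before starting the transmission target may surface malformed dates early.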
SKA configuration
- fastq_qual: With read input, the -q option, which ignores k-mers with bases below this score.
- fastq_cov: With read input, the -c option, which sets a minimum k-mer count.
- kmer: The k-mer size, choose longer k-mers for less diverse clusters.
- single_strand: Set to true if sequences are single-stranded and phased (e.g. RNA viruses).
- freq_filter: Minimum frequency of samples a split k-mer must appear in to be in the alignment.
IQ-TREE configuration
- enabled: Set to false to turn off ML tree generation, and use the NJ tree throughout.
- mode: Set to full to run with the specified model, set to fast to run using --fast (like fasttree).
- model: A string for the -m parameter describing the model. Adding +ASC is not recommended.
fastbaps configuration
- levels: Number of levels of recursive subclustering.
- script: Location of the run_fastbaps script. Find by running system.file("run_fastbaps", package = "fastbaps") in R.
mandrake configuration
- knn: Number of nearest neighbours (at least two).
- perplexity: Perplexity parameter for t-SNE (between 5 and 50).
- maxIter: Iterations in the optimisation (at least 10000, default 100000).
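The bounds given for the mandrake options can be checked before a run. The sketch below simply encodes the limits stated above; check_mandrake is a hypothetical helper and is not part of PopPIPE or mandrake.

```python
def check_mandrake(knn, perplexity, maxIter=100000):
    """Return a list of messages for values outside the documented bounds."""
    errors = []
    if knn < 2:
        errors.append("knn must be at least 2")
    if not 5 <= perplexity <= 50:
        errors.append("perplexity must be between 5 and 50")
    if maxIter < 10000:
        errors.append("maxIter must be at least 10000")
    return errors


# Example values chosen for illustration only
ok = check_mandrake(knn=10, perplexity=15)
bad = check_mandrake(knn=1, perplexity=100, maxIter=500)
print(ok, bad)
```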
Microreact configuration
- name: Title of the Microreact to produce
- website: Website link to give in Microreact
- email: Contact email to list in Microreact
- api_token: The API token from your Microreact account
Gubbins configuration
- prefix: Folder name for gubbins results
- tree_builder: Program to use to build trees.
- min_snps: Min SNPs to identify a recombination block
- min_window_size: Minimum window size of blocks
- max_window_size: Maximum window size of blocks
- iterations: Maximum iterations of tree building and removal
Transphylo configuration
dateT: Date when transmission stops. If zero, the maximum date is used. If supplied, this needs to be on the same scale as bactdating, which is number of years since 1970 (i.e. Unix format divided by 365).
Otherwise these options are passed to the inferTTree() method in transphylo and we refer users to the reference documentation.
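If you need to supply dateT, one reading of "years since 1970" is the number of days since 1970-01-01 divided by 365. That interpretation is an assumption of the sketch below, and years_since_1970 is a hypothetical helper; confirm the scale against the dates in your bactdating output before relying on it.

```python
from datetime import date


def years_since_1970(d):
    """Days since 1970-01-01 divided by 365 (assumed reading of the scale)."""
    return (d - date(1970, 1, 1)).days / 365


# 1970 has 365 days, so 1 January 1971 maps to exactly 1.0 on this scale
print(years_since_1970(date(1971, 1, 1)))
print(years_since_1970(date(2020, 6, 30)))
```

Because the divisor is a fixed 365, leap days make later dates drift slightly from calendar years, which is why the value should be computed the same way as the dates it is compared against.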
Updating a run with results from poppunk_assign
You can use the helper script poppipe_assign.py to help you re-run after query assignment. For example, if you assigned to a database with:
poppunk_assign --db listeria_rlist --query qlist.txt --output listeria_qlist
Give the same arguments, and your config file to python poppipe_assign.py:
python poppipe_assign.py --db listeria_rlist --query qlist.txt --output listeria_qlist --config config.yml
This will generate combined input files, and a new config file. Run snakemake again:
snakemake --configfile configv2rfmbr9.yml
Here, the first snakemake pipeline ran on four strains consisting of 83 samples:
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 4
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 cluster_summary
4 fastbaps
4 generate_nj
4 iq_tree
4 ska_align
58 ska_index
4 sketchlib_dists
4 split_strains
83
The second one, with the new queries, had one more strain, and nineteen new samples to index:
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 cluster_summary
5 fastbaps
5 generate_nj
5 iq_tree
5 ska_align
19 ska_index
5 sketchlib_dists
5 split_strains
50
NB: This will re-run all downstream steps for each strain, other than the ska index steps. If you have a small number of strains being changed this is likely to be inefficient. If you would like us to support this type of analysis please get in touch.
Owner
- Name: Bacterial population genetics
- Login: bacpop
- Kind: organization
- Email: contact@bacpop.org
- Location: United Kingdom
- Website: www.bacpop.org
- Repositories: 20
- Profile: https://github.com/bacpop
Pathogen Informatics and Modelling @ EMBL-EBI / Bacterial Evolutionary Epidemiology Group @ Imperial College London
Citation (CITATION.cff)
cff-version: 1.2.0
message: If you use this software, please cite both the article from preferred-citation and the software itself.
authors:
  - family-names: "Lees"
    given-names: "John A."
  - family-names: von Wachsmann
    given-names: Johanna
  - family-names: "Toussaint"
    given-names: "Jacqueline"
title: PopPIPE (Population analysis PIPEline)
version: 1.2.0
url: https://github.com/bacpop/PopPIPE
date-released: '2025-04-29'
preferred-citation:
  authors:
    - family-names: "McHugh"
      given-names: "Martin P."
    - family-names: "Horsfield"
      given-names: "Samuel T."
    - family-names: "von Wachsmann"
      given-names: "Johanna"
    - family-names: "Toussaint"
      given-names: "Jacqueline"
    - family-names: "Pettigrew"
      given-names: "Kerry A."
    - family-names: "Czarniak"
      given-names: "Elzbieta"
    - family-names: "Evans"
      given-names: "Thomas J."
    - family-names: "Leanord"
      given-names: "Alistair"
    - family-names: "Tysall"
      given-names: "Luke"
    - family-names: "Gillespie"
      given-names: "Stephen H."
    - family-names: "Templeton"
      given-names: "Kate E."
    - family-names: "Holden"
      given-names: "Matthew T. G."
    - family-names: "Croucher"
      given-names: "Nicholas J."
    - family-names: "Lees"
      given-names: "John A."
  title: "Integrated population clustering and genomic epidemiology with PopPIPE"
  journal: "Microbial Genomics"
  volume: 11
  issue: 4
  year: 2025
  doi: 10.1099/mgen.0.001404
  url: https://www.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.001404
  type: article
  conference: {}
  publisher: {}
GitHub Events
Total
- Create event: 3
- Release event: 1
- Issues event: 5
- Watch event: 6
- Member event: 2
- Issue comment event: 15
- Push event: 11
- Pull request event: 8
- Fork event: 1
Last Year
- Create event: 3
- Release event: 1
- Issues event: 5
- Watch event: 6
- Member event: 2
- Issue comment event: 15
- Push event: 11
- Pull request event: 8
- Fork event: 1
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 3
- Total pull requests: 4
- Average time to close issues: 1 day
- Average time to close pull requests: 5 days
- Total issue authors: 2
- Total pull request authors: 2
- Average comments per issue: 1.33
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 3
- Pull requests: 4
- Average time to close issues: 1 day
- Average time to close pull requests: 5 days
- Issue authors: 2
- Pull request authors: 2
- Average comments per issue: 1.33
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- sbenvari (2)
- johnlees (2)
- martinmchugh (1)
Pull Request Authors
- johnlees (3)
- sbenvari (2)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- actions/checkout v2 composite
- docker/build-push-action v3 composite
- docker/login-action v1 composite
- docker/metadata-action v4 composite
- docker/setup-buildx-action v1 composite
- docker/setup-qemu-action v1 composite
- ubuntu 20.04 build
- ete3 >=3.1
- iqtree >=2.0.3
- mandrake >=1.2.1
- numpy
- poppunk >=2.3.0
- pp-sketchlib >=2.0.0
- python
- r-fastbaps >=1.0.3
- rapidnj >=v2.3.2
- ska >=1.0