poppunk
PopPUNK 👨‍🎤 (POPulation Partitioning Using Nucleotide Kmers)
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- CITATION.cff file: Found CITATION.cff file
- codemeta.json file: Found codemeta.json file
- .zenodo.json file: Found .zenodo.json file
- DOI references: Found 7 DOI reference(s) in README
- Academic publication links
- Committers with academic emails: 5 of 14 committers (35.7%) from academic institutions
- Institutional organization owner
- JOSS paper metadata
- Scientific vocabulary similarity: Low similarity (14.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: bacpop
- License: apache-2.0
- Language: Python
- Default Branch: master
- Homepage: https://poppunk-docs.bacpop.org/
- Size: 116 MB
Statistics
- Stars: 100
- Watchers: 6
- Forks: 19
- Open Issues: 33
- Releases: 36
Metadata Files
README.md
POPulation Partitioning Using Nucleotide Kmers 
Description
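Links:
- Documentation: https://poppunk-docs.bacpop.org/
- Databases: https://www.bacpop.org/poppunk-databases/
- Paper: doi:10.1101/gr.241455.118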
If you find PopPUNK useful, please cite us:
Lees JA, Harris SR, Tonkin-Hill G, Gladstone RA, Lo SW, Weiser JN, Corander J, Bentley SD, Croucher NJ. Fast and flexible bacterial genomic epidemiology with PopPUNK. Genome Research 29:304-316 (2019). doi:10.1101/gr.241455.118
You can also run your command with --citation to get a list of citations and a suggested methods paragraph.
News and roadmap
The roadmap can be found in the documentation.
2024-08-07
PopPUNK 2.7.0 comes with two changes:
- Distance matrices <db_name>.dists.npy are no longer required or written when using
poppunk_assign, with or without --update-db. These can be very large, especially
with many samples, so this saves space and memory in model reuse and distribution. Note that
the <db_name>.dists.pkl file is still required (but this is small).
- We have added a --stable flag to poppunk_assign. Rather than merging hybrid clusters,
new samples will simply be assigned to their nearest neighbour. This implies --serial and
cannot be run with --update-db. This behaviour mimics the 'stable nomenclature' of schemes
such as LIN.
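The flag constraint described above can be sketched as a small Python helper that assembles a `poppunk_assign` command line. This is an illustrative sketch, not part of PopPUNK: the function name `build_assign_command` is hypothetical, and the `--db`, `--query` and `--output` options are assumed from the PopPUNK documentation rather than stated in this README, so check `poppunk_assign --help` before relying on them.

```python
import shlex


def build_assign_command(db, query, output, stable=False, update_db=False):
    """Return an argument list for poppunk_assign (hypothetical helper).

    --stable and --update-db come from the release note above;
    --db, --query and --output are assumed option names.
    """
    # The release note says --stable cannot be run with --update-db.
    if stable and update_db:
        raise ValueError("--stable cannot be combined with --update-db")

    cmd = ["poppunk_assign", "--db", db, "--query", query, "--output", output]
    if stable:
        # --stable implies --serial, so --serial is not added explicitly.
        cmd.append("--stable")
    if update_db:
        cmd.append("--update-db")
    return cmd


if __name__ == "__main__":
    # Placeholder names: "my_db", "queries.txt" and "assigned" are examples only.
    print(shlex.join(build_assign_command("my_db", "queries.txt", "assigned", stable=True)))
```

The returned list can be passed to `subprocess.run` once the option names have been confirmed for your installed version.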
2023-01-18
We have retired the PopPUNK website. Databases have been expanded, and can be found here: https://www.bacpop.org/poppunk-databases/.
2022-08-04
The change in scikit-learn's API in v1.0.0 and above means that HDBSCAN models
fitted with sklearn <=v0.24 will give an error when loaded. If you run into this,
the solution is one of:
- Downgrade sklearn to v0.24.
- Run model refinement to turn your model into a boundary model instead (this will
change clusters).
- Refit your model in an environment with sklearn >=v1.0.
If this is a common problem let us know, as we could write a script to 'upgrade' HDBSCAN models. See issue #213 for more details.
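As a rough illustration of the condition described above, the following Python sketch flags the case where a model was fitted under scikit-learn 0.x and is being loaded under 1.x. The helper names are hypothetical, and how you learn the version a model was fitted with is left to you (PopPUNK is not assumed to record it); the running version can be read from `sklearn.__version__`.

```python
import re


def parse_version(text):
    """Turn a string such as 'v0.24.2' or '1.0rc1' into a tuple of ints."""
    match = re.match(r"v?(\d+(?:\.\d+)*)", text.strip())
    if match is None:
        raise ValueError(f"cannot parse version: {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def hdbscan_model_needs_action(fitted_with, running):
    """True if a model fitted with sklearn <1.0 is loaded under sklearn >=1.0.

    Only the major version is compared, since the API change described
    above happened at v1.0.0.
    """
    return parse_version(fitted_with)[0] < 1 and parse_version(running)[0] >= 1


if __name__ == "__main__":
    # Example values only; substitute sklearn.__version__ for the second argument.
    if hdbscan_model_needs_action("0.24.2", "1.3.0"):
        print("Downgrade sklearn, refine the model, or refit it.")
```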
2021-03-15
We have fixed a number of bugs which may affect the use of poppunk_assign with
--update-db. We have also fixed a number of bugs with GPU distances. These are
'advanced' features and are not likely to be encountered in most cases, but if you do wish to use either of these features please make sure that you are using
PopPUNK >=v2.4.0 with pp-sketchlib >=v1.7.0.
2020-09-30
We have discovered a bug affecting the interaction of pp-sketchlib and PopPUNK.
If you have used PopPUNK >=v2.0.0 with pp-sketchlib <v1.5.1, label order may
be incorrect (see issue #95).
Please upgrade to PopPUNK >=v2.2 and pp-sketchlib >=v1.5.1. If this is not
possible, you can either:
- Run scripts/poppunk_pickle_fix.py on your .dists.pkl file and re-run
model fits.
- Create the database with poppunk_sketch directly, rather than
PopPUNK --create-db
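The version conditions in the 2021-03-15 and 2020-09-30 notes can be written down as a small check. The sketch below is illustrative only: the function name `advisories` and its `uses_advanced_features` argument are hypothetical, and versions are passed as integer tuples such as `(2, 4, 0)` to keep the comparison explicit.

```python
def advisories(poppunk, sketchlib, uses_advanced_features=False):
    """Return the release-note warnings that apply to a version pair.

    poppunk and sketchlib are (major, minor, patch) tuples.
    uses_advanced_features means poppunk_assign --update-db or GPU distances.
    """
    notes = []

    # 2020-09-30: PopPUNK >=v2.0.0 with pp-sketchlib <v1.5.1.
    if poppunk >= (2, 0, 0) and sketchlib < (1, 5, 1):
        notes.append(
            "Label order may be incorrect (issue #95): "
            "upgrade to PopPUNK >=v2.2 and pp-sketchlib >=v1.5.1."
        )

    # 2021-03-15: the advanced features need PopPUNK >=v2.4.0 and pp-sketchlib >=v1.7.0.
    if uses_advanced_features and (poppunk < (2, 4, 0) or sketchlib < (1, 7, 0)):
        notes.append(
            "--update-db and GPU distances need "
            "PopPUNK >=v2.4.0 with pp-sketchlib >=v1.7.0."
        )

    return notes


if __name__ == "__main__":
    for note in advisories((2, 1, 0), (1, 5, 0), uses_advanced_features=True):
        print(note)
```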
Installation
This is for the command line version. For more details see installation in the documentation.
Our (beta) web interface BeeBOP is now also available: https://beebop.dide.ic.ac.uk/
Through conda (recommended)
The easiest way is through conda, which is most easily accessed by first
installing miniconda. PopPUNK can then
be installed by running:
conda install poppunk
If the package cannot be found you will need to add the necessary channels:
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
Quick usage
See the overview first. There are two ways of running:
With a supported species:
1. Download an existing database.
2. Run assignment.

With a new species:
1. Create sketches of input.
2. Run QC.
3. Build a model.
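The two routes above can be sketched as command lines assembled in Python. This is a hedged illustration rather than the official workflow: the helper names are hypothetical, the file and database names are placeholders, and the options (`--create-db`, `--r-files`, `--qc-db`, `--ref-db`, `--fit-model`, `--db`, `--query`, `--output`) and the `dbscan` model name are assumed from the PopPUNK documentation, so confirm them with `poppunk --help` and `poppunk_assign --help`.

```python
import shlex


def supported_species_commands(db, query_list, output):
    """Assign query genomes to an already downloaded database (assumed options)."""
    return [["poppunk_assign", "--db", db, "--query", query_list, "--output", output]]


def new_species_commands(input_list, db, model="dbscan"):
    """Sketch the input, run QC, then fit a model (assumed options)."""
    return [
        # 1) Create sketches of input.
        ["poppunk", "--create-db", "--r-files", input_list, "--output", db],
        # 2) Run QC on the new database.
        ["poppunk", "--qc-db", "--ref-db", db],
        # 3) Build a model.
        ["poppunk", "--fit-model", model, "--ref-db", db],
    ]


if __name__ == "__main__":
    # Placeholder names: "rfiles.txt" and "my_db" are examples only.
    for cmd in new_species_commands("rfiles.txt", "my_db"):
        print(shlex.join(cmd))
```

Printing the commands first, instead of running them, makes it easy to review each step before passing the lists to `subprocess.run`.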
Docker image
A Docker image is available:
docker pull mrcide/poppunk:bacpop-20
Owner
- Name: Bacterial population genetics
- Login: bacpop
- Kind: organization
- Email: contact@bacpop.org
- Location: United Kingdom
- Website: www.bacpop.org
- Repositories: 20
- Profile: https://github.com/bacpop
Pathogen Informatics and Modelling @ EMBL-EBI / Bacterial Evolutionary Epidemiology Group @ Imperial College London
Citation (CITATION.bib)
@ARTICLE{Lees2019-tw,
title = "Fast and flexible bacterial genomic epidemiology with {PopPUNK}",
author = "Lees, John A and Harris, Simon R and Tonkin-Hill, Gerry and
Gladstone, Rebecca A and Lo, Stephanie W and Weiser, Jeffrey N
and Corander, Jukka and Bentley, Stephen D and Croucher, Nicholas
J",
abstract = "The routine use of genomics for disease surveillance provides the
opportunity for high-resolution bacterial epidemiology. Current
whole-genome clustering and multilocus typing approaches do not
fully exploit core and accessory genomic variation, and they
cannot both automatically identify, and subsequently expand,
clusters of significantly similar isolates in large data sets
spanning entire species. Here, we describe PopPUNK (Population
Partitioning Using Nucleotide K-mers), a software implementing
scalable and expandable annotation- and alignment-free methods
for population analysis and clustering. Variable-length k-mer
comparisons are used to distinguish isolates' divergence in
shared sequence and gene content, which we demonstrate to be
accurate over multiple orders of magnitude using data from both
simulations and genomic collections representing 10 taxonomically
widespread species. Connections between closely related isolates
of the same strain are robustly identified, despite interspecies
variation in the pairwise distance distributions that reflects
species' diverse evolutionary patterns. PopPUNK can process
10^3-10^4 genomes in a single batch, with minimal memory use and
runtimes up to 200-fold faster than existing model-based methods.
Clusters of strains remain consistent as new batches of genomes
are added, which is achieved without needing to reanalyze all
genomes de novo. This facilitates real-time surveillance with
consistent cluster naming between studies and allows for outbreak
detection using hundreds of genomes in minutes. Interactive
visualization and online publication is streamlined through the
automatic output of results to multiple platforms. PopPUNK has
been designed as a flexible platform that addresses important
issues with currently used whole-genome clustering and typing
methods, and has potential uses across bacterial genetics and
public health research.",
journal = "Genome Res.",
volume = 29,
number = 2,
pages = "304--316",
month = jan,
year = 2019,
language = "en"
}
GitHub Events
Total
- Create event: 14
- Release event: 4
- Issues event: 20
- Watch event: 8
- Delete event: 5
- Issue comment event: 39
- Push event: 96
- Pull request event: 17
- Pull request review comment event: 27
- Pull request review event: 28
- Fork event: 2
Last Year
- Create event: 14
- Release event: 4
- Issues event: 20
- Watch event: 8
- Delete event: 5
- Issue comment event: 39
- Push event: 96
- Pull request event: 17
- Pull request review comment event: 27
- Pull request review event: 28
- Fork event: 2
Committers
Last synced: about 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| nickjcroucher | n****r@i****k | 1,055 |
| John Lees | l****6@g****m | 777 |
| Danderson123 | d****1@h****m | 24 |
| Croucher | n****e@i****e | 23 |
| muppi1993 | c****0@i****k | 18 |
| Bin Zhao | 4****5 | 17 |
| Rich FitzJohn | r****n@i****k | 9 |
| Daniel Anderson | 4****3 | 6 |
| Harry Hung | 4****g | 4 |
| muppi1993 | 7****3 | 4 |
| Sam Horsfield | s****9@i****k | 3 |
| Nicholas Croucher | n****3@s****k | 2 |
| Jason Stajich | j****d@g****m | 1 |
| Tommi Mäklin | t****i@m****i | 1 |
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 7
- Total pull requests: 9
- Average time to close issues: 4 days
- Average time to close pull requests: about 20 hours
- Total issue authors: 6
- Total pull request authors: 4
- Average comments per issue: 0.43
- Average comments per pull request: 0.0
- Merged pull requests: 5
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 7
- Pull requests: 9
- Average time to close issues: 4 days
- Average time to close pull requests: about 20 hours
- Issue authors: 6
- Pull request authors: 4
- Average comments per issue: 0.43
- Average comments per pull request: 0.0
- Merged pull requests: 5
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- johnlees (3)
- DOH-JDJ0303 (3)
- luciagrami (2)
- erinyoung (2)
- zjx22018105-coder (1)
- RyanCFink (1)
- drhoads (1)
- ayabtg (1)
- tanzhizhou (1)
- Jamesped (1)
- nermze (1)
- HarryHung (1)
- fgonzalez3 (1)
- RuwiniK (1)
- rderelle (1)
Pull Request Authors
- nickjcroucher (16)
- absternator (6)
- johnlees (3)
- samhorsfield96 (2)
- tgttunstall (1)
- ERBringHorvath (1)
Dependencies
- boost-cpp
- cmake >=3.18
- dendropy >=4.4.0
- eigen
- flask
- flask-apscheduler
- flask-cors
- graph-tool >=2.35
- gunicorn
- h5py
- hdbscan
- libgfortran-ng
- libgomp
- matplotlib
- matplotlib-base
- networkx
- numpy
- openblas
- pandas
- pip
- pp-sketchlib >=1.7.0
- pybind11
- python-dateutil
- rapidnj
- requests
- scikit-learn >=0.24
- scipy
- tqdm
- treeswift
- tzlocal <3.0
- xorg-libxaw
- xorg-libxcomposite
- xorg-libxcursor
- xorg-libxdamage
- xorg-libxfixes
- xorg-libxi
- xorg-libxinerama
- xorg-libxpm
- xorg-libxrandr
- Cython >=0.26.1
- docutils <0.18
- actions/checkout master composite
- azure/docker-login v1 composite
- azure/login v1 composite
- actions/checkout v2 composite
- actions/setup-python v2 composite
- mamba-org/provision-with-micromamba main composite
- actions/checkout v2 composite
- docker/build-push-action v2 composite
- docker/login-action v1 composite
- docker/setup-buildx-action v1 composite
- docker/setup-qemu-action v1 composite
- ubuntu 20.04 build
- python 3.10 build
- biopython *
- h5py *
- hdbscan *
- mandrake *
- matplotlib *
- networkx *
- pandas *
- pp-sketchlib *
- requests *
- scikit-learn *
- tqdm *
- treeswift *