Recent Releases of lorikeet-genome
lorikeet-genome - v0.8.2
What's Changed
- Doco fixes by @wwood in https://github.com/rhysnewell/Lorikeet/pull/54
- Lower mem usage by @rhysnewell in https://github.com/rhysnewell/Lorikeet/pull/55
- Dev by @rhysnewell in https://github.com/rhysnewell/Lorikeet/pull/56
Full Changelog: https://github.com/rhysnewell/Lorikeet/compare/v0.8.1...v0.8.2
- Rust
Published by rhysnewell over 2 years ago
lorikeet-genome - v0.8.1
What's Changed
- Merge 'Dev' in to "master" by @rhysnewell in https://github.com/rhysnewell/Lorikeet/pull/49
- Compilation fix by @wwood in https://github.com/rhysnewell/Lorikeet/pull/50
- Fix VCF annotations by @rhysnewell in https://github.com/rhysnewell/Lorikeet/pull/51
- Dev to main by @rhysnewell in https://github.com/rhysnewell/Lorikeet/pull/53
Full Changelog: https://github.com/rhysnewell/Lorikeet/compare/v0.8.0...v0.8.1
- Rust
Published by rhysnewell almost 3 years ago
lorikeet-genome - v0.8.0
What's Changed
- Catching dev up to master branch by @rhysnewell in https://github.com/rhysnewell/Lorikeet/pull/46
- cli: Allow --profile very-fast. by @wwood in https://github.com/rhysnewell/Lorikeet/pull/47
Full Changelog: https://github.com/rhysnewell/Lorikeet/compare/v0.7.3...v0.8.0
- Rust
Published by rhysnewell about 3 years ago
lorikeet-genome - v0.6.0rc2
Version 0.6.0 - release candidate 2
This release candidate reintroduces consensus genome calling and strain genome discovery. It also updates the linkage algorithm from previous versions, which now uses a more sophisticated graph-based approach for linking clusters.
- Rust
Published by rhysnewell over 4 years ago
lorikeet-genome - v0.6.0rc1
v0.6.0 Release Candidate 1
This release introduces the completely overhauled variant calling setup for Lorikeet. Lorikeet no longer relies on threshold-based variant calling approaches, and instead takes a more sophisticated approach utilising local re-assembly of active regions. This release includes a reimplementation of the GATK HaplotypeCaller algorithm in Rust, so hopefully it is faster. It will at least be easier to pass multiple genomes and samples into the algorithm at once to generate called variants.
Currently, the strain resolving part of lorikeet is hidden and will be re-enabled ASAP.
The HaplotypeCaller algorithm involves breaking up genomes into potential active regions and then performing local re-assembly with the reads that mapped to those locations. The local assembly is then searched for potential haplotypes using a number of techniques and candidate haplotypes are assigned likelihoods using a pairwise HMM model to re-assign reads to the haplotypes. Ultimately, the HaplotypeCaller algorithm produces sets of high confidence variants with depths across samples.
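The first step described above, breaking a genome into potential active regions, can be sketched roughly as follows. This is a simplified illustration and not Lorikeet's or GATK's actual code: the per-base `activity` scores, the `threshold`, and the names `Region` and `find_active_regions` are all hypothetical.

```rust
/// Half-open interval [start, end) on a contig.
#[derive(Debug, PartialEq, Clone, Copy)]
struct Region {
    start: usize,
    end: usize,
}

/// Group consecutive positions whose activity score meets `threshold`
/// into candidate active regions. Hypothetical helper for illustration only.
fn find_active_regions(activity: &[f64], threshold: f64) -> Vec<Region> {
    let mut regions = Vec::new();
    let mut current_start: Option<usize> = None;

    for (pos, &score) in activity.iter().enumerate() {
        match (score >= threshold, current_start) {
            // Entering an active stretch: remember where it began.
            (true, None) => current_start = Some(pos),
            // Leaving an active stretch: close the region.
            (false, Some(start)) => {
                regions.push(Region { start, end: pos });
                current_start = None;
            }
            // Still inside or still outside a stretch: nothing to do.
            _ => {}
        }
    }
    // Close a region that runs to the end of the contig.
    if let Some(start) = current_start {
        regions.push(Region { start, end: activity.len() });
    }
    regions
}
```

Each returned region would then be handed to local re-assembly together with the reads that mapped to it. A real implementation would also pad, merge, and cap the size of regions, which is left out here.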
The HaplotypeCaller code was re-implemented in Rust in order to potentially speed up the variant calling process, make it easier to pass multiple genomes and samples into the algorithm, and hopefully make use of some of the code base in future projects and in the strain resolving pipeline.
The code requires benchmarking, but early indications from tests and small datasets put the Lorikeet variant calling speed on par with the Java implementation. I believe the real speed-up will appear when multiple genomes are supplied to Lorikeet, as they will be run in parallel seamlessly.
Additionally, a number of code clean-ups should be implemented as soon as possible, primarily around the BirdToolRead, SequencesForKmers, and Kmers data structures. Currently, accessing the bytes within a read requires cloning the data, with no option to create a reference pointing to the data (without the added complexity of decoding every encoded base). This means SequencesForKmers and Kmers each hold a clone of the read bases, which is very costly. I believe that by adding a bases field to BirdToolRead that is updated whenever the underlying Read is changed, we can change those clones to references and wrangle the lifetimes to significantly speed up the graph building stage of the algorithm.
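The borrowing idea can be sketched as follows. This is only an illustration under stated assumptions: `CachedRead`, `Kmer`, and `kmers_of` are hypothetical stand-ins, not Lorikeet's actual BirdToolRead or Kmers types, and the real read type wraps an encoded record rather than a plain byte vector.

```rust
/// Hypothetical read wrapper that keeps a decoded copy of its bases.
struct CachedRead {
    bases: Vec<u8>,
}

impl CachedRead {
    fn new(bases: &[u8]) -> Self {
        CachedRead { bases: bases.to_vec() }
    }

    /// Replace the bases. The cache is updated in the same call, so a
    /// borrowed view can never observe stale data.
    fn set_bases(&mut self, bases: &[u8]) {
        self.bases.clear();
        self.bases.extend_from_slice(bases);
    }

    /// Borrow the decoded bases without cloning them.
    fn bases(&self) -> &[u8] {
        &self.bases
    }
}

/// A k-mer that borrows from the read instead of owning a clone.
#[derive(Debug, PartialEq, Eq, Hash, Clone, Copy)]
struct Kmer<'a> {
    bases: &'a [u8],
}

/// Enumerate all k-mers of a read as borrowed slices.
fn kmers_of<'a>(read: &'a CachedRead, k: usize) -> Vec<Kmer<'a>> {
    // `windows` panics on a zero width, so handle that case explicitly.
    if k == 0 {
        return Vec::new();
    }
    read.bases().windows(k).map(|w| Kmer { bases: w }).collect()
}
```

The lifetime `'a` ties every `Kmer` to the read it came from, so the borrow checker rejects any attempt to call `set_bases` while k-mers from that read are still in use. That is the "wrangling" cost paid in exchange for dropping the per-k-mer clones.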
TODO:
- Reimplement strain calling + abundance estimation
- Reimplement consensus calling
- Update README
- Update Workflow image
- Various code improvements
- Rust
Published by rhysnewell over 4 years ago
lorikeet-genome - Revised genotyping
So, in keeping with tradition this release brings a bunch of changes to Lorikeet that make it pretty distant from where it was a month ago. I know only a few people are trying to keep track of all changes that keep being made here, and I'm sorry things are so stochastic. I think the words of my supervisor put it best when I told him about one of the changes I had made... "Ah, so freebayes is out this week, huh?"
Yeah, freebayes is out. Cancelled. For generating illegal instructions and segmentation faults on GPU nodes. I ain't fixing that, I'll just make my own variant caller.
Lorikeet's new best friends are UMAP and HDBSCAN. The curse of dimensionality hexed me pretty good during benchmarking, so UMAP is being used for dimensionality reduction. I chose it over PCA since it seems to discriminate groupings of variants way better. Also, since we now have to use a Python library for UMAP, might as well upgrade fuzzy DBSCAN to its better version: HDBSCAN.
Changes:
- Freebayes. OUT.
- Fuzzy DBSCAN. OUT.
- UMAP. IN.
- HDBSCAN. IN.
- Evolve now reports per sample dNdS and coverage values for each ORF
- Rust
Published by rhysnewell over 5 years ago
lorikeet-genome - v0.4.0
Version 0.4.0
A move from 0.3.x to 0.4.0 is not done lightly. Version 0.4.0 marks a major milestone in the development of lorikeet, and with it come many feature updates that either polish the mechanics of previous releases or are brand new features that I hope users will find useful in understanding what lorikeet is doing.
Major changes:
SNP calling: ✨
- Lorikeet now has an inbuilt SNP calling algorithm that is paired with freebayes to help extract SNPs for each input sample and help with the guided variant calling.
SPEED: :running: :dash:
One of the guiding principles I had in mind when developing lorikeet was speed. Speed is a partial inspiration behind the name "Lorikeet". Lorikeets are strikingly fast birds that tend to fly in groups, much the same way that Lorikeet "flies" in parallel threads. This update reaches what I think is the optimal balance between speed and memory restrictions.
- You can now specify how many genomes to run in parallel.
- Contigs for each genome now run in parallel.
- Multiple iterators have been optimized to better utilize the capabilities of rayon.
Progress: 🔢 👀
No longer will you be bombarded by a ridiculous amount of info messages that won't make much sense to anyone but me. Thanks to indicatif, Lorikeet now has a bunch of fancy progress bars with associated ETA timers which - albeit sometimes inaccurately - provide the user with a better understanding of what is happening under the hood for each sample and each reference in their current run.
Additionally, if a run crashes before completion for whatever reason, Lorikeet will now pick up from specific checkpoints and avoid rerunning entire analyses for specific genomes. This can be overridden with the --force flag.
Outputs: :suspect: :alien:
An additional file is now output for all major modes that helps tell the user how distant a specific reference might be between samples. The adjacency matrix tells the user how many variants are shared between samples for a specific reference. This will provide output similar to the trees that can be generated by taking the consensus genomes generated by polish and passing them to a tool like parsnp.
Speaking of polish, a bug has been fixed which prevented the VCF file being output for any mode other than genotype.
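To make the adjacency matrix mentioned above concrete, here is a minimal sketch of how shared-variant counts between samples could be tallied for one reference. The `Variant` key (position plus alternate allele) and the function name `shared_variant_matrix` are assumptions made for illustration; Lorikeet's actual output format and internal representation may differ.

```rust
use std::collections::HashSet;

/// A variant identified by position and alternate allele (hypothetical key).
type Variant = (usize, String);

/// Build a symmetric matrix where cell [i][j] is the number of variants
/// shared by sample i and sample j for a single reference. The diagonal
/// holds each sample's own variant count.
fn shared_variant_matrix(samples: &[HashSet<Variant>]) -> Vec<Vec<usize>> {
    let n = samples.len();
    let mut matrix = vec![vec![0usize; n]; n];
    for i in 0..n {
        // Only the upper triangle is computed; the result is mirrored.
        for j in i..n {
            let shared = samples[i].intersection(&samples[j]).count();
            matrix[i][j] = shared;
            matrix[j][i] = shared;
        }
    }
    matrix
}
```

Samples whose rows look alike carry similar variant sets for that reference, which is the same kind of signal a tree built from consensus genomes would show.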
Genotyping: 🐀 :mouse2: 🐩 :dog2:
The genotyping algorithm has seen a bunch of changes. Not all of them will be listed here as it is quite a lot.
- DBSCAN now updates parameters for each reference genome based on whether or not the supplied parameters generate clusters that make sense, i.e. not every variant can cluster by itself and not all variants can be in the same cluster (usually).
- The read phasing linkage algorithm now happens after DBSCAN, so DBSCAN is seeding the linkage algorithm now. This will provide much the same results as before but at much faster speeds.
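The "clusters that make sense" check can be illustrated with a small sketch. The rules below are one possible reading of the description above and not Lorikeet's actual criteria: the function name `clustering_makes_sense` is hypothetical, and labels are assumed to be `Some(cluster_id)` for clustered variants and `None` for noise.

```rust
use std::collections::HashMap;

/// Returns true when a clustering looks plausible under two assumed rules:
/// at least one cluster has more than one variant (so not every variant
/// clusters by itself), and the variants are not all in one single cluster.
fn clustering_makes_sense(labels: &[Option<usize>]) -> bool {
    // Count the number of variants assigned to each cluster, skipping noise.
    let mut sizes: HashMap<usize, usize> = HashMap::new();
    for label in labels.iter().flatten() {
        *sizes.entry(*label).or_insert(0) += 1;
    }

    let has_real_cluster = sizes.values().any(|&size| size > 1);
    let clustered: usize = sizes.values().sum();
    let all_in_one = sizes.len() == 1 && clustered == labels.len();

    has_real_cluster && !all_in_one
}
```

A caller could re-run DBSCAN with adjusted parameters for a reference genome whenever this returns false. How the parameters are adjusted is not described in these notes, so that part is left out.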
In addition, there have been a BUNCH of bug fixes.
- Rust
Published by rhysnewell over 5 years ago
lorikeet-genome - v0.3.7
Multiple bug fixes:
- Multiple instances of index out of bounds errors
- Identified cause of freebayes failure on large metagenomes
EM algorithm for strain coverage detection implemented and working. Updated read phasing and clustering to prevent overly similar clusters from occurring.
- Rust
Published by rhysnewell almost 6 years ago
lorikeet-genome - v0.3.6
New features:
- Guided variant calling now working on some MNVs, INS and DEL events
- Can now parse directory of genomes for easier use
- Various bug fixes
- Rust
Published by rhysnewell almost 6 years ago
lorikeet-genome - v0.3.5
NEW RELEASE
- Evolve outputs GFF with dNdS values per reference
- Uses Prokka and Prodigal
- Faster compute times
- Updated help commands
- Using Phi-D as proportionality metric
BUG FIXES
- Update contig ID bug preventing contigs being output into strain genotypes
- Rust
Published by rhysnewell almost 6 years ago
lorikeet-genome - v0.3.4
- valid version string in the Cargo.toml (We didn't think we had the technology for this, but we did it)
- Rust
Published by rhysnewell almost 6 years ago
lorikeet-genome - v0.3.3a
Small update to variant calling. No longer filter out soft and hard clips using samclip.
- Rust
Published by rhysnewell almost 6 years ago
lorikeet-genome - v0.3.3
Updated to use Freebayes for SNP calling and SVIM for structural variant calling. Added a guided variant calling algorithm to rescue low abundance variants. Added a seeded fuzzy DBSCAN algorithm. Updated some help messages; many flags are still hidden for testing purposes.
- Rust
Published by rhysnewell almost 6 years ago
lorikeet-genome - 0.3.2
Updated Lorikeet to use both short and long read variant callers: Snippy and SVIM. VCF files are now generated for each BAM, and reads are used to phase variants between samples.
- Rust
Published by rhysnewell about 6 years ago
lorikeet-genome - 0.2.9
Added experimental genotype method. Updated help messages. Included extra flags: include-supplementary, include-secondary.
- Rust
Published by rhysnewell over 6 years ago
lorikeet-genome - v0.2.5
First release of Lorikeet with current implemented modes:
- Polymorph - Variant calling pipeline
- Summarize - Summarize contig statistics
- Evolve - Calculates dN/dS values of genes present in reference based on read mappings
May contain bugs.
- Rust
Published by rhysnewell over 6 years ago