lorax
A long-read analysis toolbox for cancer and population genomics
Statistics
- Stars: 23
- Watchers: 2
- Forks: 1
- Open Issues: 1
- Releases: 8
Metadata Files
README.md
Lorax: A long-read analysis toolbox for cancer and population genomics
In cancer genomics, long-read de novo assembly approaches may not be applicable because of tumor heterogeneity, normal cell contamination and aneuploid chromosomes. Generating sufficiently high coverage for each derivative, potentially sub-clonal, chromosome is not feasible. Lorax is a targeted approach to reveal complex cancer genomic structures such as telomere fusions, templated insertions or chromothripsis rearrangements. Lorax is NOT a long-read SV caller; that functionality is implemented in delly.
Installing lorax
Lorax is available as a statically linked binary, a singularity container (SIF file) or as a docker container. You can also build lorax from source using a recursive clone and make. Lorax depends on HTSlib and Boost.
git clone --recursive https://github.com/tobiasrausch/lorax.git
cd lorax/
make all
Linear reference genomes
Lorax has several subcommands for alignments to linear reference genomes.
Templated insertion threads
Templated insertion threads can be identified using
lorax tithreads -g hg38.fa -o tithreads.bed -m control.bam tumor.bam
The out.bed file specifies nodes (templated insertion source sequences) and edges (templated insertion adjacencies) of a graph that can be plotted using dot.
cut -f 4,9 out.bed | sed -e '1s/^/graph {\n/' | sed -e '$a}' > out.dot
dot -Tpdf out.dot -o out.pdf
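The sed invocations above rely on GNU sed behavior (a `\n` in the replacement text and the one-line `$a}` append form), which other sed implementations may not accept. A minimal sketch of the same wrapping step using awk follows. The helper name `bed_to_dot` and the toy table are illustrative assumptions, not part of lorax, and the toy column contents do not reflect lorax's real output.

```shell
# Hypothetical helper (not part of lorax): wrap selected columns of a
# tab-separated table in an undirected Graphviz "graph { ... }" block.
bed_to_dot() {
  # $1: input table, $2: column list for cut -f (for example 4,9)
  cut -f "$2" "$1" | awk 'BEGIN { print "graph {" } { print } END { print "}" }'
}

# Toy input with 9 tab-separated columns; the contents are invented for
# illustration only.
tmpdir=$(mktemp -d)
printf 'chrA\t10\t20\tn1;\t.\t.\t.\t.\tn1 -- n2;\n' >  "$tmpdir/toy.bed"
printf 'chrB\t30\t40\tn2;\t.\t.\t.\t.\tn2 -- n1;\n' >> "$tmpdir/toy.bed"

# Keep columns 4 and 9 and wrap them, as the cut/sed pipeline above does.
bed_to_dot "$tmpdir/toy.bed" 4,9 > "$tmpdir/toy.dot"
cat "$tmpdir/toy.dot"
```

Like the sed pipeline, this keeps every input line, so the result should match on GNU systems. Passing a different column list (for example 4,11) would cover tables with another layout.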
The out.reads file lists unique assignments of reads to templated insertion source sequences. To extract the FASTA sequences for all these reads, use the lorax extract subcommand (described below) with the -a option.
tail -n +2 out.reads | cut -f 1 | sort | uniq > reads.lst
lorax extract -a -g hg38.fa -r reads.lst tumor.bam
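As a sketch of what the read-list step above does, the following applies the same pipeline to a toy table: it drops the header line, keeps the first column, and removes duplicates. The toy file, its second column, and all values are invented for illustration. Only the layout implied by the command above (one header line, read identifier in column 1) is assumed.

```shell
# Toy stand-in for out.reads: one header line, read identifier in column 1.
# The second column and all values are made up.
tmpdir=$(mktemp -d)
printf 'read\tsource\nr1\tA\nr2\tA\nr1\tB\n' > "$tmpdir/toy.reads"

# Same steps as above; sort -u is equivalent to sort | uniq.
tail -n +2 "$tmpdir/toy.reads" | cut -f 1 | sort -u > "$tmpdir/reads.lst"

# Guard against an empty list before handing it to a downstream tool.
if [ -s "$tmpdir/reads.lst" ]; then
  echo "reads selected: $(wc -l < "$tmpdir/reads.lst" | tr -d ' ')"
else
  echo "no reads selected" >&2
fi
```

Checking that the list is non-empty first is a cheap way to catch a wrong input path or an unexpected table layout before running the extraction.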
Telomere repeats associated with complex rearrangements
Telomere-associated SVs can be identified with lorax using
lorax telomere -g t2t.fa -o outprefix tumor.bam
The output files cluster reads into distinct telomere junctions that can be locally assembled. Since telomeres are repetitive, common mis-mapping artifacts found in a panel of normal samples are provided in the maps subdirectory. It is recommended to use the telomere-to-telomere assembly as the reference genome for lorax telomere.
Read selection for targeted assembly of amplicons
Given a list of amplicon regions and a phased VCF file, lorax can be used to extract amplicon reads for targeted assembly approaches.
lorax amplicon -g hg38.fa -s sample -v phased.bcf -b amplicons.bed tumor.bam
The amplicon subcommand outputs the selected reads (as a hash list out.reads) and a diagnostic table (out.bed) with amplicon regions and their support by split-reads. Ideally, all amplicon regions are connected and belong to one connected component (one cluster of amplicons). This amplicon graph can be plotted using dot.
cut -f 4,11 out.bed | sed -e '1s/^/graph {\n/' | sed -e '$a}' > out.dot
dot -Tpdf out.dot -o out.pdf
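The expectation that all amplicon regions form one connected component can also be checked numerically. The following is a minimal sketch that counts connected components with a union-find in awk. It assumes a hypothetical edge list with one pair of node names per line; how to derive such a list from out.bed depends on its column layout, which is not specified here. The function name `count_components` and the toy data are illustrative only.

```shell
# Hypothetical helper (not part of lorax): count connected components of an
# undirected graph given as a whitespace-separated edge list.
count_components() {
  # $1: file with "nodeA nodeB" per line; a single-field line is an isolated node
  awk '
    function find(x) {
      while (parent[x] != x) {
        parent[x] = parent[parent[x]]   # path halving keeps trees shallow
        x = parent[x]
      }
      return x
    }
    function add(x) { if (!(x in parent)) parent[x] = x }
    # Appending "" forces string values so node names are never compared as numbers.
    NF >= 1 { u = $1 ""; add(u) }
    NF >= 2 { v = $2 ""; add(v); a = find(u); b = find(v); if (a != b) parent[a] = b }
    END {
      n = 0
      for (x in parent) if (find(x) == x) n++   # one root per component
      print n
    }
  ' "$1"
}

# Toy edge lists with invented node names.
tmpdir=$(mktemp -d)
printf 'amp1\tamp2\namp2\tamp3\n' > "$tmpdir/connected.tsv"
printf 'amp1\tamp2\namp3\tamp4\namp5\n' > "$tmpdir/split.tsv"

echo "connected.tsv: $(count_components "$tmpdir/connected.tsv") component(s)"
echo "split.tsv: $(count_components "$tmpdir/split.tsv") component(s)"
```

A result of 1 corresponds to the ideal case described above. A larger value suggests that some amplicon regions are not linked to the rest.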
To extract the FASTA sequences for all reads, use the lorax extract subcommand (described below) with the -a option.
Extracting pairwise matches and FASTA sequences of reads
To get FASTA sequences and pairwise read-to-genome matches for a list of reads (list.reads), use
lorax extract -g hg38.fa -r list.reads tumor.bam
If the read list contains hashes instead of read names, as produced by the lorax amplicon subcommand, add the -a command-line option.
lorax extract -a -g hg38.fa -r list.reads tumor.bam
Pan-genome graphs
For pan-genome graphs and pan-genome graph alignments, lorax supports the subcommands below; some are work in progress.
Connected components of a pan-genome graph
lorax components pangenome.gfa.gz > comp.tsv
Converting a pan-genome (sub-)graph to dot format
lorax gfa2dot -s s103 -r 3 pangenome.gfa.gz > graph.dot
dot -Tpng graph.dot > graph.png
Converting pan-genome graph alignments to BAM
With long reads aligned to a pan-genome graph, for example using minigraph,
minigraph --vc -cx lr pangenome.gfa.gz input.fastq.gz | bgzip > sample.gaf.gz
lorax can be used to convert the graph alignment to BAM
lorax convert -g pangenome.gfa.gz -f input.fastq.gz sample.gaf.gz | samtools sort -o sample.bam -
Node coverage of pan-genome graph alignments
lorax ncov -g pangenome.gfa.gz sample.gaf.gz > ncov.tsv
Citation
Tobias Rausch, Rene Snajder, Adrien Leger, Milena Simovic, Mădălina Giurgiu, Laura Villacorta, Anton G. Henssen, Stefan Fröhling, Oliver Stegle, Ewan Birney, Marc Jan Bonder, Aurelie Ernst, Jan O. Korbel
Long-read sequencing of diagnosis and post-therapy medulloblastoma reveals complex rearrangement patterns and epigenetic signatures
Cell Genomics, 2023, 100281, DOI: 10.1016/j.xgen.2023.100281
License
Lorax is distributed under the BSD 3-Clause license. Consult the accompanying LICENSE file for more details.
Owner
- Name: Tobias Rausch
- Login: tobiasrausch
- Kind: user
- Location: Germany
- Company: EMBL
- Website: tobiasrausch.com
- Twitter: tobias_757
- Repositories: 7
- Profile: https://github.com/tobiasrausch
Researcher in Computational Genomics
GitHub Events
Total
- Issues event: 1
- Watch event: 3
- Issue comment event: 2
- Push event: 9
Last Year
- Issues event: 1
- Watch event: 3
- Issue comment event: 2
- Push event: 9
Issues and Pull Requests
Last synced: about 1 year ago
All Time
- Total issues: 1
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 1
- Total pull request authors: 0
- Average comments per issue: 2.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 2.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- ucsfpan (1)
Dependencies
- actions/checkout v2 composite
- actions/checkout v2 composite
- docker/build-push-action v1 composite
- alpine latest build
- ubuntu 22.04 build
- actions/checkout v4 composite
- docker/build-push-action 3b5e8027fcad23fda98b2e3ac259d8d67585f671 composite
- docker/login-action f4ef78c080cd8ba55a85445d5b36e214a81df20a composite
- docker/metadata-action 9ec57ed1fcdbf14dcef7dfbe97b2010124a938b7 composite