lassensus
A tool for Lassa virus consensus sequence generation from long-read sequencing data.
Repository
Statistics
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
- Releases: 2
Metadata Files
README.md
Lassensus
A tool for Lassa virus consensus sequence generation from long-read sequencing data. Given the extreme sequence divergence of Lassa viruses, proper consensus generation requires careful reference selection, which this tool automates by identifying appropriate GenBank references.
To do this, all near-complete Lassa virus genomes available in GenBank are downloaded. Sample reads are then mapped to each reference individually, and the average identity across all mapped reads is calculated. The reference with the highest overall read identity is selected as the closest match and used to guide consensus sequence generation.
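As a rough illustration of the selection step described above, the sketch below computes a mean read identity for each reference from minimap2 PAF output and returns the reference with the highest mean. It uses PAF column 10 (number of matching residues) and column 11 (alignment block length). Whether Lassensus parses PAF or BAM, and whether it defines identity exactly this way, is an assumption here, and the function names are hypothetical. The optional min_identity argument assumes that reads below the --min_identity cut-off (described under Optional Arguments) are simply left out of the average.

```python
from collections import defaultdict


def mean_identity_per_reference(paf_lines, min_identity=0.0):
    """Return {reference name: mean read identity in percent} from PAF lines.

    Hypothetical helper: identity = residue matches / alignment block length.
    Alignments below min_identity (percent) are excluded from the average.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # PAF has 12 mandatory columns; skip malformed lines
        reference = fields[5]            # column 6: target (reference) name
        matches = int(fields[9])         # column 10: number of residue matches
        block_length = int(fields[10])   # column 11: alignment block length
        if block_length == 0:
            continue
        identity = 100.0 * matches / block_length
        if identity < min_identity:
            continue
        totals[reference] += identity
        counts[reference] += 1
    return {ref: totals[ref] / counts[ref] for ref in totals}


def select_best_reference(paf_lines, min_identity=0.0):
    """Return the reference with the highest mean read identity, or None."""
    identities = mean_identity_per_reference(paf_lines, min_identity)
    if not identities:
        return None
    return max(identities, key=identities.get)
```

Because each reference is mapped separately, the PAF lines from all per-reference runs can be pooled and grouped by target name, which is what the dictionary keyed on column 6 does.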
Installation
Option 1: Using Conda (Recommended)
Install Lassensus via Conda:
```bash
conda create -n lassensus -c bioconda lassensus -y
conda activate lassensus
```
Option 2: From Source Code
Create and activate a new conda environment:
```bash
conda create -n lassensus -c bioconda python=3.11 minimap2 samtools ivar lassaseq seqtk medaka -y
conda activate lassensus
```
Install lassensus:
```bash
git clone https://github.com/DaanJansen94/lassensus.git
cd lassensus
pip install .
```
Re-installation (when updates are available):
```bash
conda activate lassensus  # Make sure you're in the right environment
cd lassensus
git pull  # Get the latest updates from GitHub
pip uninstall lassensus -y
pip install .
```
Note: Any time you modify the code or pull updates from GitHub, you need to reinstall the package using these commands for the changes to take effect.
Usage
If you installed via conda (Option 1):
```bash
lassensus --input_dir /path/to/input --output_dir /path/to/output [options]
```
If you installed from source (Option 2):
```bash
conda activate lassensus
lassensus --input_dir /path/to/input --output_dir /path/to/output [options]
```
Required Arguments
--input_dir: Directory containing input FASTQ files
--output_dir: Directory where results will be saved
Optional Arguments
Reference Selection Parameters
--min_identity: Minimum identity threshold for reference selection (default: 90.0)
- Minimum percentage identity required for reads to be considered when selecting the best reference
--genome: Genome completeness filter (1=Complete, 2=Partial, 3=None)
- Filter references based on genome completeness annotation
- 1: Only complete genomes
- 2: Only partial genomes
- 3: No filtering (both complete and partial)
--completeness: Minimum sequence completeness (1-100 percent)
- Filter references based on minimum sequence completeness percentage
- Value between 1-100 representing minimum completeness required
--host: Host filter (1=Human, 2=Rodent, 3=Both, 4=None)
- Filter references based on host organism
- 1: Human-derived sequences only
- 2: Rodent-derived sequences only
- 3: Both human and rodent sequences
- 4: No host filtering
--metadata: Metadata filter (1=Location, 2=Date, 3=Both, 4=None)
- Filter references based on available metadata
- 1: Must have location metadata
- 2: Must have date metadata
- 3: Must have both location and date metadata
- 4: No metadata filtering
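The numeric filter codes above can be read as simple predicates over each reference's metadata. The sketch below is a hypothetical illustration only: the ReferenceRecord fields, the passes_filters function and its "no filtering" defaults are assumptions. The Dependencies section lists lassaseq for reference selection, so the actual filtering may happen there and may differ, for example in how records with an unknown host are treated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReferenceRecord:
    """Hypothetical metadata for one GenBank reference (field names are illustrative)."""
    accession: str
    complete_genome: bool          # annotated as a complete genome?
    completeness: float            # percent of the expected segment length present
    host: Optional[str]            # e.g. "human", "rodent", or None if unknown
    location: Optional[str]
    collection_date: Optional[str]


def passes_filters(rec, genome=3, completeness=None, host=4, metadata=4):
    """Apply the --genome, --completeness, --host and --metadata codes to one record."""
    # --genome: 1 = complete only, 2 = partial only, 3 = no filtering
    if genome == 1 and not rec.complete_genome:
        return False
    if genome == 2 and rec.complete_genome:
        return False
    # --completeness: minimum percentage (1-100); None means no threshold
    if completeness is not None and rec.completeness < completeness:
        return False
    # --host: 1 = human, 2 = rodent, 3 = human or rodent, 4 = no filtering
    allowed_hosts = {1: {"human"}, 2: {"rodent"}, 3: {"human", "rodent"}}
    if host in allowed_hosts and rec.host not in allowed_hosts[host]:
        return False
    # --metadata: 1 = location required, 2 = date required, 3 = both, 4 = none
    if metadata in (1, 3) and not rec.location:
        return False
    if metadata in (2, 3) and not rec.collection_date:
        return False
    return True
```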
Consensus Generation Parameters
--max_reads: Maximum number of reads to use for consensus generation (default: 1,000,000)
- If input has more reads than this threshold, it will be rarefied down to this number
- If input has fewer reads, all reads will be used (no rarefaction)
--min_depth: Minimum depth for consensus calling (default: 50)
- This is the minimum number of reads that must cover a position to call a consensus base
- Higher values will result in more stringent consensus calling
- Lower values may allow calling consensus in regions with lower coverage
--min_quality: Minimum quality score for consensus calling (default: 30)
- This is the minimum quality score required for a base to be considered in consensus calling
- Higher values will result in more stringent consensus calling
- Lower values may allow calling consensus with lower quality bases
--majority_threshold: Majority rule threshold (default: 0.7)
- This is the minimum fraction of reads that must support a base to call it in the consensus
- Value must be between 0 and 1
- Higher values (e.g., 0.9) will require stronger support for variant calls
- Lower values (e.g., 0.5) will allow calling variants with weaker support
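To make the interplay of --max_reads, --min_depth, --min_quality and --majority_threshold concrete, the sketch below first subsamples reads and then calls one consensus base per position from a toy pileup of (base, quality) pairs. It is a simplified model, not the seqtk or ivar implementation: subsampling uses reservoir sampling (one standard way to draw a fixed-size random subset), depth is assumed to be counted after the quality filter, and positions failing either cut-off are written as N, whereas ivar may report IUPAC ambiguity codes when no single base reaches the threshold. The function names and the fixed seed are hypothetical.

```python
import random
from collections import Counter


def rarefy(reads, max_reads=1_000_000, seed=42):
    """Reservoir-sample at most max_reads items; keep all reads if there are fewer."""
    rng = random.Random(seed)  # hypothetical fixed seed for reproducibility
    sample = []
    for i, read in enumerate(reads):
        if i < max_reads:
            sample.append(read)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < max_reads:
                sample[j] = read
    return sample


def call_consensus(pileup, min_depth=50, min_quality=30, majority_threshold=0.7):
    """pileup: one list of (base, quality) tuples per reference position."""
    consensus = []
    for column in pileup:
        # Assumption: only bases meeting the quality cut-off count towards depth
        bases = [base for base, qual in column if qual >= min_quality]
        if len(bases) < min_depth:
            consensus.append("N")  # insufficient coverage
            continue
        base, count = Counter(bases).most_common(1)[0]
        if count / len(bases) >= majority_threshold:
            consensus.append(base)
        else:
            consensus.append("N")  # simplification of ivar's behaviour
    return "".join(consensus)
```

Raising --majority_threshold from 0.6 to 0.7 turns a position with a 40/60 base split from a called base into an N, which is the "more stringent" behaviour described above.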
Example
```bash
# Basic usage with default parameters
lassensus --input_dir /path/to/input --output_dir /path/to/output

# Custom parameters for more stringent consensus calling
lassensus --input_dir /path/to/input --output_dir /path/to/output \
  --min_depth 100 \
  --min_quality 40 \
  --majority_threshold 0.9

# Custom parameters for more lenient consensus calling
lassensus --input_dir /path/to/input --output_dir /path/to/output \
  --min_depth 20 \
  --min_quality 20 \
  --majority_threshold 0.5
```
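For readers scripting around the tool, the documented options can be mirrored in a hypothetical argparse parser, for instance to validate a parameter set before launching a run. This is not the Lassensus source: defaults are taken from the documentation where given, the reference filter options default to unset because their defaults are not documented, and the range checks follow the stated limits (0 to 1 for --majority_threshold, 1 to 100 for --completeness).

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented lassensus options."""
    p = argparse.ArgumentParser(prog="lassensus")
    # Required arguments
    p.add_argument("--input_dir", required=True, help="Directory containing input FASTQ files")
    p.add_argument("--output_dir", required=True, help="Directory where results will be saved")
    # Reference selection (None = default not documented, treated as unset)
    p.add_argument("--min_identity", type=float, default=90.0)
    p.add_argument("--genome", type=int, choices=[1, 2, 3])
    p.add_argument("--completeness", type=float)
    p.add_argument("--host", type=int, choices=[1, 2, 3, 4])
    p.add_argument("--metadata", type=int, choices=[1, 2, 3, 4])
    # Consensus generation
    p.add_argument("--max_reads", type=int, default=1_000_000)
    p.add_argument("--min_depth", type=int, default=50)
    p.add_argument("--min_quality", type=int, default=30)
    p.add_argument("--majority_threshold", type=float, default=0.7)
    return p


def parse_args(argv=None):
    parser = build_parser()
    args = parser.parse_args(argv)
    # parser.error() prints usage and exits with status 2
    if not 0 <= args.majority_threshold <= 1:
        parser.error("--majority_threshold must be between 0 and 1")
    if args.completeness is not None and not 1 <= args.completeness <= 100:
        parser.error("--completeness must be between 1 and 100")
    return args
```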
Output
The tool generates the following outputs for each sample:
- {sample_name}_L_consensus_polished.fasta: Polished consensus sequence for the L segment
- {sample_name}_S_consensus_polished.fasta: Polished consensus sequence for the S segment
Additionally, the tool creates an AllConsensus directory containing:
- L_segment/all_L_consensus.fasta: Multi-fasta file containing all L segment consensus sequences
- S_segment/all_S_consensus.fasta: Multi-fasta file containing all S segment consensus sequences
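The AllConsensus files are described as multi-FASTA collections of the per-sample polished consensus sequences. A minimal sketch of that aggregation is shown below, assuming the per-sample files can be found anywhere under the output directory by their {sample_name}_{segment}_consensus_polished.fasta names and that FASTA headers are kept unchanged. collect_consensus is a hypothetical helper, not part of the tool.

```python
from pathlib import Path


def collect_consensus(output_dir, segment):
    """Concatenate per-sample polished consensus FASTA files for one segment ("L" or "S")."""
    output_dir = Path(output_dir)
    dest_dir = output_dir / "AllConsensus" / f"{segment}_segment"
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / f"all_{segment}_consensus.fasta"
    # Assumption: per-sample files may sit in sample subdirectories, so search recursively
    sources = sorted(output_dir.rglob(f"*_{segment}_consensus_polished.fasta"))
    with dest.open("w") as out:
        for src in sources:
            text = src.read_text()
            if not text:
                continue  # skip empty files
            # Ensure each file's records end with a newline before the next file starts
            out.write(text if text.endswith("\n") else text + "\n")
    return dest
```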
Dependencies
The following tools are required and will be installed in the conda environment:
- minimap2 (for read mapping)
- samtools (required by ivar)
- ivar (for consensus generation)
- lassaseq (for reference selection)
- seqtk (for read rarefaction)
- medaka (for consensus polishing)
Python dependencies (installed automatically with pip):
- biopython
- pandas
- requests
Features
- Automatic reference selection
- Consensus generation with ivar
- Consensus polishing with medaka
- Multi-fasta generation for all consensus sequences
- Detailed mapping statistics
- Comprehensive output including JSON and human-readable summaries
Citation
If you use Lassensus in your research, please cite:
Jansen, D., Laumen, J., Siebenmann, E., & Vercauteren, K. (2025). Lassenssus: A Command-Line Tool for Lassa virus consensus sequence generation from long-read sequencing data (Version v0.0.1). Zenodo. https://doi.org/10.5281/zenodo.15209207
License
This project is licensed under the GNU General Public License v3.0 (GPL-3.0) - see the LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Support
If you encounter any problems or have questions, please open an issue on GitHub.
Owner
- Login: DaanJansen94
- Kind: user
- Repositories: 1
- Profile: https://github.com/DaanJansen94
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Jansen
    given-names: Daan
    orcid: https://orcid.org/0000-0003-4612-6891
  - family-names: Laumen
    given-names: Jolein
    orcid: https://orcid.org/0000-0003-0168-5622
  - family-names: Siebenmann
    given-names: Emma
  - family-names: Vercauteren
    given-names: Koen
    orcid: https://orcid.org/0000-0003-1472-9938
doi: "10.5281/zenodo.15209207"
title: "Lassenssus: A Command-Line Tool for Lassa virus consensus sequence generation from long-read sequencing data."
version: v0.0.1
url: github.com/DaanJansen94/LassaSeq
date-released: 2025-04-13
abstract: "A command-line tool for generating consensus genomes of highly divergent Lassa viruses through automated reference selection using long-read sequencing data."
```
GitHub Events
Total
- Watch event: 1
- Push event: 4
- Create event: 1
Last Year
- Watch event: 1
- Push event: 4
- Create event: 1
Dependencies
- biopython >=1.79
- pandas >=1.3.0
- requests >=2.26.0