https://github.com/bramvandijk88/xenoseq
Pipeline to automate the comparison between short-read libraries to detect foreign ("xenotypic") sequences.
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
✓codemeta.json file
Found codemeta.json file -
○.zenodo.json file
-
○DOI references
-
○Academic publication links
-
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (11.2%) to scientific vocabulary
Repository
Pipeline to automate the comparison between short-read libraries to detect foreign ("xenotypic") sequences.
Basic Info
Statistics
- Stars: 4
- Watchers: 1
- Forks: 2
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Xenoseq
__ __
/ | / |
AT | GC | ______ _______ ______ _______ ______ ______
GG \/AG/ / \ / \ / \ / | / \ / \
TA TT< /GATACT |TAAATGG |/CCGTAA |/AATAAAS/ /ATTTCT |/ATGTTA |
TGAC \ AA GA |TG | GG |AG | AG |TT \ TT CG |GA | AT |
AA /AT |GATCCCGT/ TA | AA |CG \__AG | GGAACT |TACGGGTA/ GT \__AT |
AA | GC |CC |GA | GT |GC GT/ / GG/ TA |AG AT |
GG/ TG/ GTAGGCC/ CA/ TT/ TAAATG/ ATGCGCG/ ATGCAAT/ AGGGTTT |
AA |
AA |
AA/
Xenoseq is a simple bioinformatic pipeline to find sequences that appear to be newly introduced into a community. The input are (sets of) query and reference samples, which the pipeline will use to generate the following output:
- unique_contigs.fasta; sequences in query not present in reference (e.g. ancestral) samples
- uniquecontigalllinksL300_P97.tbl; table linking contigs to other communities using BLAST, with options L and P in the filename
- xenotypic_contigs.txt; text file with names of all xenotypic contigs in them
- xenotypic_contigs.fasta; the subset of unique contigs that can be linked to another reference using BLAST
- xenotypic_coverage.txt; text files to estimate the abundance of xenotypic sequences in non-ancestral samples
- source_contigs; directory with fasta files from reference samples that are themselves / are linked to MGEs
Overview
Xenoseq wraps read trimming (fastp), assembly (megahit), read mapping (BWA), read filtering (samtools), and local alignment (blast), to detect putative evidence
of horizontal transfer between communities. This tool was used in a recent publication
(

Install: conda setup (required)
To use xenoseq, a conda environment file is provided to install the above mentioned dependencies (for information on conda, see docs.conda.io/projects/conda/en/latest/)
To install the xenoseq environment, simply use:
git clone https://github.com/bramvandijk88/xenoseq.git
cd xenoseq
conda env create -f environment.yml
conda activate xenoseq
Install: add to path (optional quality-of-life improvement)
To run the pipeline, you either have to provide the full path to the xenoseq binary (e.g. /home/user/XENOSEQ_DIR/xenoseq, or add the xenoseq directory (where you cloned the repository) to your global PATH variable:
export PATH='$PATH:<XENOSEQ_DIR>'
e.g. if you cloned / downloaded xenoseq into your home dir:
export PATH='$PATH:~/xenoseq'
If you don't want to do this each time you login to a new terminal, add the export PATH code to you ~/.bashrc (or ~/.profile) file:
echo export PATH='$PATH:~/xenoseq' >> ~/.bashrc
Metadata file
Usage of the pipeline requires a file listing all the "queries" and "subjects" in a text file. The pipeline will look for unique reads in the query files by comparing them to the corresponding subjects. These reads will be assembled into contigs. A subset of these contigs will be "xenotypic", i.e. having a foreign origin, by aligning the sequences to the remaining subjects, providing extra evidence for horizontal transfer of viral sequences or genes.
The metadata should look like this:
```
SAMPLES (Query, Reference)
Horizontal1 Ancestral1 Horizontal2 Ancestral2 Horizontal3 Ancestral3 Horizontal4 Ancestral4 Vertical1 Ancestral1 Vertical2 Ancestral2 Vertical3 Ancestral3 Vertical4 Ancestral4 ```
To run the pipeline (e.g. on the example data), use one of the following commands: ```
./xenoseq -m examplemetadata.tsv -o Xenoseqexample -l -t # Full pipeline ./xenoseq -m examplemetadata.tsv -o Xenoseqexample # Only identify unique sequences ./xenoseq -m examplemetadata.tsv -o Xenoseqexample -t # Identify and trace unique sequences ./xenoseq -m examplemetadata.tsv -o Xenoseqexample -l -f -P 99.99 # Force a relinking of unique-xenotypic with stricter percent identity ```
This example will use mock reads found in samples/reads and will search for xenotypic contigs in simulated data. Your unique/xenotypic contigs will then be
stored in Xenoseq_example/
The usage of the trace option (-t) will ensure all samples will be mapped back to the unique contigs and generate coverage statistics.
All options
```bash
Usage:
xenoseq -m
```
If you want to modify any more options (e.g. filtering thresholds, quality trimming), you can modify the relevant subroutines of xenoseq given in xenoseq_bin/functions.sh.
```
xenoseq -h
Xenoseq was tested on MAC Monterey 12.5.1 and Ubuntu 20.04.3 (also available on as Windows subsystem)
```