Visualizing genome synteny with xmatchview
Visualizing genome synteny with xmatchview - Published in JOSS (2018)
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
✓DOI references
Found 5 DOI reference(s) in README -
○Academic publication links
-
✓Committers with academic emails
1 of 5 committers (20.0%) from academic institutions -
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (12.9%) to scientific vocabulary
Keywords from Contributors
Scientific Fields
Repository
🗻 Visualization of genome/gene sequence synteny
Basic Info
Statistics
- Stars: 40
- Watchers: 3
- Forks: 7
- Open Issues: 0
- Releases: 10
Metadata Files
README.md

xmatchview
Genome synteny visualization
CONTENTS
- VERSIONS
- SYNOPSIS
- LICENSE PREAMBLE
- IMPLEMENTATION
- COMMUNITY GUIDELINES
- INSTALLATION
- DEPENDENCIES
- USAGE
- RUNNING THE PIPELINES
- TEST
- CITING
12. WHAT'S NEW
VERSIONS
xmatchview.py, xmatchview-hive.py, xmatchview-conifer.py v1.2.5 April/October 2020 xmatchview-hive.pl v1.2.2 February 2020 xmatchview.py v1.2.0 October 2019 xmatchview.py v1.1.1 December 2018 xmatchview.py v1.1 October 2018 xmatchview.py v1.0 January 2018 - Post JOSS review xmatchview.py v0.3.3 January 2018 xmatchview.py v0.3 November 2017 XMatchView.py v0.2 March 2005/May 2005/January 2006
SYNOPSIS
xmatchview, xmatchview-conifer and xmatchview-hive are imaging tools for visualizing DNA/RNA sequence synteny. It allows users to align 2 (or 3) sequences in FASTA format using cross_match, minimap2 or any aligners with .paf output capabilities, and displays the alignments in a variety of image formats (png, tiff). xmatchview-hive outputs xml-scalable vector graphics (svg)
xmatchview

xmatchview-conifer

xmatchview-hive

xmatchview, xmatchview-conifer and xmatchview-hive are written in python and run on linux and windows. They serve as visual tools for analyzing crossmatch and minimap2 alignments. Crossmatch (Green, P. (1994) http://www.phrap.org) uses an implementation of the Smith-Waterman algorithm for comparing DNA sequences that is sensitive.
Additional hive plot information available at: http://www.hiveplot.com/
LICENSE PREAMBLE
Copyright (c) 2005-present Rene Warren, Canada's Michael Smith Genome Science Centre. All rights reserved. xmatchview is a utility for comparing, visually, two DNA/RNA sequences
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.
IMPLEMENTATION
xmatchview, xmatchview-conifer and xmatchview-hive are implemented in PYTHON and run on any OS where PYTHON is installed.
COMMUNITY GUIDELINES
I encourage the community to contribute to the development of this software, by providing suggestions for improving the code and/or directly contributing to the open source code for these tools. Users and developers may report software issues, bug fix requests, comments, etc, at https://github.com/warrenlr/xmatchview
INSTALLATION
Download the tar file and extract the files on your system using:
tar -xvf xmatchview_1-2-5.tar
To install this package with Conda, run one of the following:
conda install -c bioconda xmatchview conda install -c "bioconda/label/cf201901" xmatchview
DEPENDENCIES
xmatchview and xmatchview-conifer:
You will need to do the following before you can proceed:
1) Download python3 from: http://www.python.org/ (The code was tested on python 3.7+)
2) Download the Python Imaging Library (PIL) from: http://www.pythonware.com/products/pil/
3) Either use PIL or true type fonts (ttf) and specify the full path to the font directory in xmatchview.py and xmatchview-conifer.py with the -p option. The fonts used by xmatchview and xmatchview-conifer include:
arial.ttf
arialbd.ttf
arialbi.ttf
-or-
helvR14.pil
helvR18.pil
helvB18.pil
helvBO18.pil
helvB24.pil
helvR24.pil
helvB24.pil
If fontpaths are not provided (not recommended), default fonts will be used to preserve code functionality. However, these systems fonts are very small and of limited utility. To download the above fonts, simply search the internet for "arial.ttf", "arialbd.ttf" and "arialbi.ttf". For convenience, they are included in the "fonts.tar" file in this directory. The "helv*.pil" fonts are distributed with PIL.
4) Download cross_match for academic use, see http://www.phrap.org and http://www.phrap.org/consed/consed.html#howToGet
5) Make sure cross_match is in your $PATH
USAGE
Usage: ['./xmatchview.py'] v1.2.5 -x alignment file (cross_match .rep or Pairwise mApping Format .paf) -s reference genome fasta file -q query contig/genome fasta file -e reference features (eg. exons) coordinates, GFF tsv file - optional -y query features (eg. exons) coordinates, GFF tsv file - optional -m mismatch threshold (e.g. -m 10 allows representation of repeats having up to 10% mismatch -b length (bp) of similarity block to display -c scale (pixel to basepair scale, for displaying the image) -r leap (bp) to evaluate repeat frequency (smaller numbers will increase the resolution, but will affect drastically the run time. recommended -r=50) -a alpha value, from 0 (transparent) to 255 (solid, default) -f output image file format (png, tiff, jpeg, or gif) NOTE: the png and tiff are better. -p full path to the directory with fonts on your system (please refer to the documentation for fonts used) * Files for the -s and -q options must correspond to fasta files used to run cross_match Usage: ['./xmatchview-conifer.py'] v1.2.5 -x alignment file (cross_match .rep or Pairwise mApping Format .paf) -s reference genome fasta file -q query contig/genome fasta file -e reference features (eg. exons) coordinates, GFF tsv file - optional -y query features (eg. exons) coordinates, GFF tsv file - optional -m maximum mismatch threshold (e.g. -m 10 allows representation of repeats having up to 10% mismatch -b minimum length (bp) of similarity block to display -c scale (pixel to basepair scale, for displaying the image) -l label for the tree trunk (6 characters or less for best result) -a alpha value, from 0 (transparent) to 255 (solid, default) -f output image file format (png, tiff, jpeg, or gif) NOTE: the png and tiff are better. -p full path to the directory with fonts on your system (please refer to the documentation for fonts used) * Files for the -s and -q options must correspond to fasta files used to run cross_match Usage: ['./xmatchview-hive.py'] v1.2.5 -x alignment file [1 vs. 2] (cross_match .rep or Pairwise mApping Format .paf) -y alignment file [1 vs. 3] (cross_match .rep or Pairwise mApping Format .paf) -z alignment file [3 vs. 2] (cross_match .rep or Pairwise mApping Format .paf) -q genome text file 1 (format NAME:LENGTH) -r genome text file 2 (format NAME:LENGTH) -s genome text file 3 (format NAME:LENGTH) -d features (eg. exons) coordinates GFF tsv file 1 (start end) - optional -e features (eg. exons) coordinates GFF tsv file 2 (start end) - optional -f features (eg. exons) coordinates GFF tsv file 3 (start end) - optional -i sequence identity threshold (e.g. -i 90 will show colinear blocks >= 90% sequence identity) -b minimum length (bp) of similarity block to display -c scale (pixel to basepair scale, for displaying the image) -a alpha value, from 0 (transparent) to 1 (solid, default) * Files for the -q, -r and -s options must include header_names:base_length, with names that correspond to those in fasta files used to run cross_match or minimap2
! Ensure the config.txt file exists in your run directory
Notes:
xmatchview-hive; you must create a config.txt file with the following information: axis_number:name:length(bp)
1:name1 2:name2 3:name3Keep names short (< 10 characters)
An example config.txt is provided for your convenience in the "test-hive" folder (See TEST section below) Axis labels are shown in the diagram below. Also, when running cross_match/minimap2, align in this order:
1 vs. 2 1 vs. 3 3 vs. 21 | / \ 3 2
The text files supplied in -q -r -s are simply: sequence name:base length
The sequence name must correspond to those in the fasta files used to run cross_match or minimap2
example perl-oneliner to convert:
cat genome1.fa | perl -ne 'chomp;if(/^>(\S+)/){ $head = $1;if($prev ne $head && $prev ne ""){print "$prev:$seq\n";$seq="";} $prev=$head;}else{ $seq += length($_); }if(eof()){ print "$prev:$seq\n"; } ' > genome1.txt
run as : -q genome1.txt
xmatchview and xmatchview-conifer;
Files for the -s and -q options must correspond to fasta files used to run cross_match or minimap2. Those files should each contain a single, non-justified, sequence (no line breaks).
fasta file format:
yourSequence AATAGCAGCTACGACGACGCAGCGCGACGTTTCATCAA...AATACAGACGCGACGACGCAGCATCATCGAGAC
- A 10th column may be added to the GFF files supplied via -e/-y to specify the color of a feature (default feature color is yellow or black, for xmatchview and xmatchview-conifer, respectively). Users may specify any of these color names: yellow, blue, cyan, green, lime, red, sarin, forest, dirtyred, dirtyyellow, grey, lightgrey, orange, beige, black, white. Examples of .gff files are provided in the accompanied ./test folder
Users can control whether to show the position of sequence features on the reference and query (-e and -y options), show co-linear blocks of a certain length (-b option) when their mismatch rates are below a threshold (-m option). The histogram is generated by moving a sliding window with a step length (-r recommended between 10-50). The color space in xmatchview is RGBA and the alpha channel is used for visualizing the relationship between co-linear blocks (-a option, transparent to solid, 0 to 255).
RUNNING THE cross_match/xmatchview/xmatchview-conifer PIPELINES
Demo shell scripts that pipeline cross_match and xmatchview/xmatchview-conifer are included with the distribution and provided for guidance (runCompareTwoGenomesColinear.sh and runSpruceView.sh, respectively).
Refer to:
./runCompareTwoGenomesColinear.shUsage: runCompareTwoGenomesColinear.sh QUERY FASTA REFERENCE/SUBJECT/TARGET FASTA ALPHA TRANSPARENCY 0-255 MISMATCH THRESHOLD BLOCK LENGTH (bp) LEAP LENGTH (bp) SCALE (1:n) QUERY features GFF .tsv REFERENCE features GFF .tsv cross_match/minimap2 PATH-TO-FONTS
and:
./runSpruceView.sh
Usage: runSpruceView.sh QUERY FASTA REFERENCE/SUBJECT/TARGET FASTA ALPHA TRANSPARENCY 0-255 MISMATCH THRESHOLD BLOCK LENGTH (bp) LABEL SCALE (1:n) QUERY features GFF .tsv REFERENCE features GFF .tsv cross_match/minimap2 PATH-TO-FONTS
Examples on how to run: ./runCompareTwoGenomesColinear.sh FTL1pa.fa FTL1ss.fa 200 99 100 1 2 FTL1pa.gff FTL1ss.gff crossmatch ../../tarballs/fonts ./runCompareTwoGenomesColinear.sh FTL1pa.fa FTL1ss.fa 200 99 100 1 2 FTL1pa.gff FTL1_ss.gff minimap2 ../../tarballs/fonts
./runSpruceView.sh FTL1pa.fa FTL1ss.fa 200 99 100 FTL1-test 2 FTL1pa.gff FTL1ss.gff crossmatch ../../tarballs/fonts ./runSpruceView.sh FTL1pa.fa FTL1ss.fa 200 99 100 FTL1-test 2 FTL1pa.gff FTL1_ss.gff minimap2 ../../tarballs/fonts
TEST xmatchview.py / xmatchview-conifer.py / xmatchview-hive.py
To test your xmatchview install on your system, we provide a test folder where you can run xmatchview and xmatchview-conifer, using the shell script commands below. If all goes well, and you used the arial fonts provided, the image you generate should be identical to:
./test/xmv-FTL1pa.favsFTL1ss.fa.repm10b10r1c2success.png ./test/xmvconifer-FTL1pa.favsFTL1ss.fa.repm10b10c2_success.png
Once you confirmed that it works as expected, you may explore the full range of parameters to test functionality (see USAGE above)
At the unix prompt, once the package is installed, change directory to:
cd ./testOnce you have downloaded pyhon and PIL and changed the paths to fonts in the xmatchview.py and xmatchview-conifer.py Execute:
./runXMV.sh FTL1pa.favsFTL1ss.fa.rep FTL1pa.fa FTL1ss.fa 200 10 2 FTL1pa.gff FTL1ss.gff and ./runXMV-conifer.sh FTL1pa.favsFTL1ss.fa.rep FTL1pa.fa FTL1ss.fa 200 10 2 FTL1pa.gff FTL1ss.gff
To test xmatchview-hive:
cd ./test-hivecross_match (.rep output)
../xmatchview-hive.py -q 2019-nCoV.txt -r SARS-CoV.txt -s MERS-CoV.txt -x 2019-nCoV.favsSARS-CoV.fa.rep -y 2019-nCoV.favsMERS-CoV.fa.rep -z MERS-CoV.favsSARS-CoV.fa.rep -e SARScds.gff -i 0 -b 1 -c 30 -a 0.75
If all went well, images such as those provided in the test folder should be generated
CITING xmatchview/xmatchview-conifer/xmatchview-hive
Thank you for your and for using, developing and promoting this free software!
If you use xmatchview/xmatchview-conifer/xmatchview-hive for you research, please cite:
Warren, RL (2018). Visualizing genome synteny with xmatchview. Journal of Open Source Software, 3(21):497.
Hive plots
Krzywinski M, Birol I, Jones S, Marra M (2011). Hive Plots — Rational Approach to Visualizing Networks. Briefings in Bioinformatics (early access 9 December 2011, doi: 10.1093/bib/bbr069)
WHAT'S NEW in v1.2.5
-Adjustments to suggested scaling to factor in border around genomic features
WHAT'S NEW in v1.2.4
-Included error handling when scaling isn't sufficient to keep genomic features within plot bounds
WHAT'S NEW in v1.2.3
-Code refactored for python3
WHAT'S NEW in v1.2.2
-Initial support for 3-way comparisons (hive plot)
WHAT'S NEW in v1.2.0
-Bug fixes -Aesthetic improvements to both xmatchview and xmatchview-conifer -Support for GFF files -Support for Multi-FASTA files
WHAT'S NEW in v1.1.1
-Bug fixes (will now return an error when the alignment files are empty [instead of plotting empty graphs])
WHAT'S NEW in v1.1
-Bug fixes (the forward synteny blocks were always printed, regardless of specified filters. This is fixed in both xmatchview and xmatchview-conifer v1.1)
WHAT'S NEW in v1.0
-Published (JOSS peer-review) version
WHAT'S NEW in v0.3.3
-Initial support for .paf (Pairwise mApping Format) alignment files.
WHAT'S NEW in v0.3.2
-Made options consistent between xmatchview.py and xmatchview-conifer.py -Included new option to specify font path (-p) -Added option to specify feature colors in the tsv files provided with the -e and -y options. A third column may be used to specify the color of a feature (default feature color is yellow or black, for xmatchview and xmatchview-conifer, respectively). Users may specify any of these color names: yellow, blue, cyan, green, lime, red, sarin, forest, dirtyred, dirtyyellow, grey, lightgrey, orange, beige, black, white.
WHAT'S NEW in v0.3
-Plot colinear blocks and sequence relationships with transparent color (alpha, supplied with -a) -Plot the position of exons on the reference and query DNA segments (-e and -y arguments, optional) -Plot the position of Ns in query and reference sequences -Bug fixes
Please find the v0.2 release in the corresponding subdirectory
Owner
- Name: BC Cancer Canada's Michael Smith Genome Sciences Centre
- Login: bcgsc
- Kind: organization
- Email: rwarren@bcgsc.ca
- Location: Vancouver, BC, Canada
- Website: http://www.bcgsc.ca/
- Repositories: 86
- Profile: https://github.com/bcgsc
Sequencing centre for genomics and bioinformatics research. NOTE 01/2019: FTP now closed. Cited data available: http://www.bcgsc.ca/downloads/supplementary
GitHub Events
Total
- Watch event: 3
- Member event: 3
Last Year
- Watch event: 3
- Member event: 3
Committers
Last synced: 5 months ago
Top Committers
| Name | Commits | |
|---|---|---|
| Rene Warren | r****n@b****a | 138 |
| Arfon Smith | a****n | 4 |
| lcoombe | l****e@g****m | 3 |
| Keiran Raine | k****2@s****k | 3 |
| Asan Emirsaleh | a****i@g****m | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 5
- Total pull requests: 10
- Average time to close issues: 3 months
- Average time to close pull requests: about 17 hours
- Total issue authors: 3
- Total pull request authors: 5
- Average comments per issue: 4.6
- Average comments per pull request: 1.2
- Merged pull requests: 8
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- keiranmraine (3)
- asan-emirsaleh (1)
- brook-milligan (1)
Pull Request Authors
- arfon (3)
- lcoombe (3)
- keiranmraine (2)
- zyxue (1)
- asan-emirsaleh (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- actions/stale v4 composite
- ubuntu 20.04 build