https://github.com/asereewit/vadr_hsv
HSV1 and HSV2 annotation using VADR
Basic Info
- Host: GitHub
- Owner: asereewit
- Language: Batchfile
- Default Branch: main
- Size: 9.1 MB
Statistics
- Stars: 0
- Watchers: 2
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Human alphaherpesvirus 1 (HSV1) and human alphaherpesvirus 2 (HSV2) genome annotation using VADR v1.5.1
HSV1 and HSV2 VADR models
Example annotation of HSV sequences
Additional VADR documentation
References
Human alphaherpesvirus 1 (HSV1) and human alphaherpesvirus 2 (HSV2) genome annotation using VADR v1.5.1:
Steps for using VADR for HSV1 and HSV2 annotation:
- Download and install the latest version of VADR (v1.5.1), following the instructions on this page.
By default VADR runs on a single processor, and it can generate many
intermediate files. On macOS the default limit on the number of files a
single process can have open is 256, and VADR sometimes exceeds it. To
raise the limit, open the ~/.zshrc file in a text editor and add the
line ulimit -n 1024, then run the command source ~/.zshrc in the
terminal.
- Download the latest HSV1 (NC_001806) and HSV2 (NC_001798) VADR models from this GitHub repository.
- Run the following v-annotate.pl command (using HSV1 as an example):
v-annotate.pl --mdir <path-to-HSV1-VADR-model-directory> --mkey NC_001806.vadr -s --glsearch -r --alt_pass dupregin,discontn,indfstrn,indfstrp,lowsimis,lowsimil,indf5pst,indf3pst,lowsim5s,lowsim3s --r_lowsimok --alt_mnf_yes insertnp --nmiscftrthr 10 -f --keep <fasta-file-to-annotate> <output-directory-to-create>
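The steps above can be collected into a small wrapper. This is only a sketch: the function name build_annotate_cmd, the placeholder paths, and the idea of printing the command for inspection before running it are assumptions made for illustration. The v-annotate.pl options are exactly those of the command above, and the sketch assumes the same options are appropriate for HSV2 with its own model key.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of VADR): print the README's v-annotate.pl
# command for a given model directory, RefSeq accession, FASTA file and
# output directory. Paths containing spaces are not handled.
build_annotate_cmd() (
  mdir=$1; accn=$2; fasta=$3; outdir=$4
  alt_pass="dupregin,discontn,indfstrn,indfstrp,lowsimis,lowsimil,indf5pst,indf3pst,lowsim5s,lowsim3s"
  printf '%s ' v-annotate.pl --mdir "$mdir" --mkey "$accn.vadr" \
    -s --glsearch -r --alt_pass "$alt_pass" --r_lowsimok \
    --alt_mnf_yes insertnp --nmiscftrthr 10 -f --keep "$fasta"
  printf '%s\n' "$outdir"
)

# Raise the open-file limit for this shell only if it is below 1024 (step 1).
current=$(ulimit -n)
if [ "$current" != "unlimited" ] && [ "$current" -lt 1024 ]; then
  ulimit -n 1024 2>/dev/null || echo "warning: could not raise open-file limit" >&2
fi

# Placeholder paths; replace them with your own model directory and files.
cmd=$(build_annotate_cmd ./NC_001798 NC_001798 my_hsv2.fasta my_hsv2_vadr_out)
echo "$cmd"
```

Printing the command first makes it easy to confirm the model key and the alert list before starting a run that takes several minutes per genome.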
HSV1 and HSV2 VADR models
The HSV1 VADR model library was built from RefSeq NC_001806, and the HSV2 model library from RefSeq NC_001798.
The HSV1 VADR model was built using the following v-build.pl command:
v-build.pl --forcelong --skipbuild -f --keep NC_001806 NC_001806
The HSV2 VADR model was built using the following v-build.pl command:
v-build.pl --forcelong --skipbuild -f --keep NC_001798 NC_001798
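Both builds use the same options, so they can be driven from one loop. The sketch below prints the two v-build.pl commands shown above; DRY_RUN is a hypothetical switch introduced here, and setting DRY_RUN=0 would execute the commands instead (which requires VADR to be installed and on the PATH).

```shell
#!/usr/bin/env bash
# Build (or just print) the README's v-build.pl command for each RefSeq accession.
# DRY_RUN is an assumption of this sketch: 1 = print only, 0 = run v-build.pl.
DRY_RUN=${DRY_RUN:-1}
n_cmds=0
for accn in NC_001806 NC_001798; do
  # The accession is used both as the model accession and as the output directory.
  cmd="v-build.pl --forcelong --skipbuild -f --keep $accn $accn"
  n_cmds=$((n_cmds + 1))
  if [ "$DRY_RUN" = "1" ]; then
    echo "$cmd"
  else
    $cmd || { echo "v-build.pl failed for $accn" >&2; exit 1; }
  fi
done
last_cmd=$cmd
```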
Additional configuration on HSV VADR models
- The genes and CDS in the inverted repeat regions (LAT, RL1, RL2, RS1) have misc_not_failure:"1" in their FEATURE lines in the .minfo file. HSV genomes have a high GC content (68% in HSV1 RefSeq NC_001806 and 70% in HSV2 RefSeq NC_001798) and many repeat sequences, particularly in LAT, RL1, RL2, and RS1, which make sequencing and genome assembly difficult. With misc_not_failure:"1", most alerts in those genes and CDS cause the feature to be annotated as misc_feature instead of causing the sequence to fail.
FEATURE NC_001806 type:"gene" coords:"7569..1:-" parent_idx_str:"GBNULL" gene:"LAT" misc_not_failure:"1"
FEATURE NC_001806 type:"gene" coords:"513..1540:+" parent_idx_str:"GBNULL" gene:"RL1" misc_not_failure:"1"
FEATURE NC_001806 type:"CDS" coords:"513..1259:+" parent_idx_str:"GBNULL" gene:"RL1" product:"neurovirulence protein ICP34.5" misc_not_failure:"1"
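To check which features carry the flag, the .minfo file can be filtered with awk. The sketch below is an illustration only: it runs on a temporary file holding FEATURE lines copied from this README rather than on the real .minfo file, and the variable names are arbitrary.

```shell
#!/usr/bin/env bash
# Sample FEATURE lines copied from this README; in practice set minfo to the
# path of the model's .minfo file instead of creating a temporary file.
minfo=$(mktemp)
cat > "$minfo" <<'EOF'
FEATURE NC_001806 type:"gene" coords:"7569..1:-" parent_idx_str:"GBNULL" gene:"LAT" misc_not_failure:"1"
FEATURE NC_001806 type:"gene" coords:"513..1540:+" parent_idx_str:"GBNULL" gene:"RL1" misc_not_failure:"1"
FEATURE NC_001806 type:"CDS" coords:"513..1259:+" parent_idx_str:"GBNULL" gene:"RL1" product:"neurovirulence protein ICP34.5" misc_not_failure:"1"
FEATURE NC_001806 type:"gene" coords:"145198..144124:-" parent_idx_str:"GBNULL" gene:"US10" alternative_ftr_set:"virion protein US10(gene)" alternative_ftr_set_subn:"163"
EOF

# Print "type gene coords" for every FEATURE line flagged misc_not_failure:"1".
flagged=$(awk '/^FEATURE/ && /misc_not_failure:"1"/ {
  ftype = gene = coords = ""
  for (i = 1; i <= NF; i++) {
    if ($i ~ /^type:/)   { ftype = $i;  gsub(/^type:"|"$/, "", ftype) }
    if ($i ~ /^gene:/)   { gene = $i;   gsub(/^gene:"|"$/, "", gene) }
    if ($i ~ /^coords:/) { coords = $i; gsub(/^coords:"|"$/, "", coords) }
  }
  print ftype, gene, coords
}' "$minfo")
echo "$flagged"
rm -f "$minfo"
```

On the sample lines this lists the LAT gene and the RL1 gene and CDS, and skips the US10 line, which has no misc_not_failure key.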
- Certain CDS have alternative forms/stop codons. To allow this in the VADR model:
- Add extra FEATURE lines for the gene and CDS in the .minfo file like the following:
FEATURE NC_001806 type:"gene" coords:"145198..144124:-" parent_idx_str:"GBNULL" gene:"US10" alternative_ftr_set:"virion protein US10(gene)" alternative_ftr_set_subn:"163"
FEATURE NC_001806 type:"gene" coords:"145198..144144:-" parent_idx_str:"GBNULL" gene:"US10" alternative_ftr_set:"virion protein US10(gene)" alternative_ftr_set_subn:"164"
FEATURE NC_001806 type:"CDS" coords:"145100..144162:-" parent_idx_str:"GBNULL" gene:"US10" product:"virion protein US10" alternative_ftr_set:"virion protein US10(cds)"
FEATURE NC_001806 type:"CDS" coords:"145100..144144:-" parent_idx_str:"GBNULL" gene:"US10" product:"virion protein US10" alternative_ftr_set:"virion protein US10(cds)"
The original CDS for virion protein US10 in HSV1 RefSeq NC_001806 runs from genomic position 145100 to 144162, and the alternative runs from 145100 to 144144. Note that the "gene" FEATURE line for the alternative CDS ends at the same coordinate (144144) as its CDS FEATURE line.
The value of alternative_ftr_set_subn is a line number in the .minfo file minus 1: the "163" in alternative_ftr_set_subn:"163" refers to the FEATURE line on line 164.
- Add the alternative amino acid sequence to the .vadr.protein.fa blastx database using the build-add-to-blast-db.pl script from the VADR GitHub repository:
perl build-add-to-blast-db.pl <path-to-minfo-file> <path-to-blast-db-dir> NC_001806 MG999843 144762..143806:- 145100..144144:- <name-for-output-directory>
In this case, NC_001806 is the VADR model accession. MG999843 is the sequence that
has the alternative CDS at 144762..143806:-, which corresponds to genomic positions
145100..144144:- in the VADR model accession.
After successfully running the build-add-to-blast-db.pl script, you should see this
in the .vadr.protein.fa file:
```
...snip...
>NC_001806.2/145100..144162:-
MIKRRGNVEIRVYYESVRTLRSRSHLKPSDRQQSPGHRVFPGSPGFRDHPENLGNPEYRE
LPETPGYRVTPGIHDNPGLPGSPGLPGSPGLPGSPGPHAPPANHVRLAGLYSPGKYAPLA
SPDPFSPQHGAYARARVGIHTAVRVPPTGSPTHTHLRQDPGDEPTSDDSGLYPLDARALA
HLVMLPADHRAFFRTVVEVSRMCAANVRDPPPPATGAMLGRHARLVHTQWLRANQETSPL
WPWRTAAINFITTMAPRVQTHRHMHDLLMACAFWCCLTHASTCSYAGLYSTHCLHLFGAF
...snip...
>MG999843.1:144762..143806:-/145100..144144:-
MIKRRGNVEIRVYYESVRTLRSRSHLKPSDHQQSPGHRVFPGSPGFRDHPENLGNPEYRE
LPETPGYRVTPGIHDSPGLPGSPGLHXSPGLPGSHGPHAPPANHVRLAGLYSPGKYAPLA
SPDPFSPQDGAYARARVGIHTAVRVPPTGSPTHTHLRHDPGDEPTSDDSGLYPLDARALA
HLVMLPADHRAFFRTVVDVSRMCAANVRDPPPPATGAMLGRHARLVHTQWLRANQETSPL
WPWRTAAINFITTMAPRVQTHRHMHDLLMACAFWCCLTHASTCSYAGLYSTHCLHLFGAF
GCGAPALTPPLCQDNLYP
```
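Before running build-add-to-blast-db.pl it can help to confirm that the two coordinate strings describe CDS of the same length and that the length is a multiple of 3. The sketch below does that arithmetic for the US10 coordinates used above; seg_len is a hypothetical helper written for this example and handles single-segment coordinates only.

```shell
#!/usr/bin/env bash
# Length in nucleotides of a single-segment VADR coordinate string such as
# 145100..144144:- (start..stop:strand). Multi-segment coords are not handled.
seg_len() (
  coords=${1%:*}            # drop the ":+" or ":-" strand suffix
  start=${coords%%..*}      # text before ".."
  stop=${coords##*..}       # text after ".."
  if [ "$start" -ge "$stop" ]; then
    echo $((start - stop + 1))
  else
    echo $((stop - start + 1))
  fi
)

ref_len=$(seg_len "145100..144162:-")   # original US10 CDS in NC_001806
alt_len=$(seg_len "145100..144144:-")   # alternative US10 CDS in NC_001806
src_len=$(seg_len "144762..143806:-")   # corresponding CDS in MG999843

echo "original: $ref_len nt, alternative: $alt_len nt, MG999843: $src_len nt"
if [ "$alt_len" -eq "$src_len" ]; then echo "alternative CDS lengths match"; fi
if [ $((alt_len % 3)) -eq 0 ]; then echo "alternative CDS length is a multiple of 3"; fi
```

The alternative CDS is 957 nt in both coordinate systems, i.e. 319 codons or 318 residues plus a stop codon, which agrees with the 318 residues listed for the MG999843 entry above.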
Example VADR output
Below is an example of VADR output when annotating the sequence MG999860 (HSV1).
The VADR log output includes the options used in the v-annotate.pl command, the
alert codes and their descriptions, and the files saved to the output directory.
```
v-annotate.pl :: classify and annotate sequences using a model library
VADR 1.5.1 (Feb 2023)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
date: Fri Sep 1 16:40:00 2023
$VADRBIOEASELDIR: /Users/greningerlab/Documents/vadr-install-dir/Bio-Easel
$VADRBLASTDIR: /Users/greningerlab/Documents/vadr-install-dir/ncbi-blast/bin
$VADREASELDIR: /Users/greningerlab/Documents/vadr-install-dir/infernal/binaries
$VADRFASTADIR: /Users/greningerlab/Documents/vadr-install-dir/fasta/bin
$VADRINFERNALDIR: /Users/greningerlab/Documents/vadr-install-dir/infernal/binaries
$VADRMODELDIR: /Users/greningerlab/Documents/vadr-install-dir/vadr-models-calici
$VADRSCRIPTSDIR: /Users/greningerlab/Documents/vadr-install-dir/vadr
sequence file: MG999860.fasta
output directory: ../vadroutput09012023/MG999860vadroutput_09012023
force directory overwrite: yes [-f]
leaving intermediate files on disk: yes [--keep]
specify that alert codes in <s> do not cause FAILure: dupregin,discontn,indfstrp,lowsimis,lowsimil,indf5pst,indf3pst [--alt_pass]
alert codes in <s> for 'misc_not_failure' features cause misc_feature-ization, not failure: insertnp [--alt_mnf_yes]
.cm, .minfo, blastn .fa files in $VADRMODELDIR start with key <s>, not 'vadr': NC_001806.vadr [--mkey]
model files are in directory <s>, not in $VADRMODELDIR: /Users/greningerlab/AS/vadr_hsv/NC_001806 [--mdir]
nmiscftr/TOOMANYMISCFEATURES reported if <n> or more misc features: 10 [--nmiscftrthr]
align with glsearch from the FASTA package, not to a cm with cmalign: yes [--glsearch]
use top-scoring HSP from blastn to seed the alignment: yes [-s]
replace stretches of Ns with expected nts, where possible: yes [-r]
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Validating input ... done. [ 0.1 seconds]
Preprocessing for N replacement: blastn classification (1 seq) ... done. [ 1.9 seconds]
Preprocessing for N replacement: coverage determination from blastn results (1 seq) ... done. [ 0.0 seconds]
Replacing Ns based on results of blastn-based pre-processing ... done. [ 0.0 seconds]
Classifying sequences with blastn (1 seq) ... done. [ 2.0 seconds]
Determining sequence coverage from blastn results (1 seq) ... done. [ 0.0 seconds]
Aligning sequences (NC_001806: 2 seqs) ... done. [ 144.7 seconds]
Joining alignments from glsearch and blastn for model NC_001806 (1 seq) ... done. [ 0.1 seconds]
Determining annotation ... done. [ 2.1 seconds]
Validating proteins with blastx (NC_001806: 1 seq) ... done. [ 26.1 seconds]
Generating feature table output ... done. [ 0.0 seconds]
Generating tabular output ... done. [ 0.0 seconds]
Summary of classified sequences:
num num num
idx model group subgroup seqs pass fail
--- --------- ----- -------- ---- ---- ----
1 NC_001806 - - 1 0 1
--- --------- ----- -------- ---- ---- ----
- all - - 1 0 1
- none - - 0 0 0
#--- --------- ----- -------- ---- ---- ----
#
# Summary of reported alerts:
#
#    alert    causes  short                       per      num   num  long
#idx code     failure description                 type     cases seqs description
#--- -------- ------- --------------------------- -------- ----- ---- -----------
1    dupregin no      DUPLICATEREGIONS            sequence    10    1 similarity to a model region occurs more than once
2    discontn no      DISCONTINUOUSSIMILARITY     sequence     1    1 not all hits are in the same order in the sequence and the homology model
3    lowsimis no      LOWSIMILARITY               sequence     2    1 internal region without significant similarity
4    ambgntrp no      NRICHREGIONNOTREPLACED      sequence     4    1 N-rich region of unexpected length not replaced
5    indf5pst no      INDEFINITEANNOTATIONSTART   feature      2    1 protein-based alignment does not extend close enough to nucleotide-based alignment 5' endpoint
6    indf3pst no      INDEFINITEANNOTATIONEND     feature      2    1 protein-based alignment does not extend close enough to nucleotide-based alignment 3' endpoint
7    indfstrp no      INDEFINITESTRAND            feature      1    1 strand mismatch between protein-based and nucleotide-based predictions
8    lowsimic no      LOWFEATURESIMILARITY        feature      1    1 region overlapping annotated feature that is or matches a CDS lacks significant similarity
9    lowsimil no      LOWFEATURESIMILARITY        feature      1    1 long region overlapping annotated feature that does not match a CDS lacks significant similarity
10   ambgnt3f no      AMBIGUITYATFEATUREEND       feature      2    1 final nucleotide of non-CDS feature is an ambiguous nucleotide
#--- -------- ------- --------------------------- -------- ----- ---- -----------
11   mutendex yes*    MUTATIONATEND               feature      1    1 expected stop codon could not be identified, first in-frame stop codon exists 3' of predicted stop position
12   unexleng yes*    UNEXPECTEDLENGTH            feature      3    1 length of complete coding (CDS or matpeptide) feature is not a multiple of 3
13   cdsstopn yes*    CDSHASSTOPCODON             feature      2    1 in-frame stop codon exists 5' of stop position predicted by homology to reference
14   fstukcft yes*    POSSIBLEFRAMESHIFT          feature      3    1 possible frameshift in CDS (frame not restored before end)
15   deletinp yes*    DELETIONOFNT                feature      1    1 too large of a deletion in protein-based alignment
#--- -------- ------- --------------------------- -------- ----- ---- -----------
#
# Output printed to screen saved in: MG999860vadroutput09012023.vadr.log
# List of executed commands saved in: MG999860vadroutput09012023.vadr.cmd
# List and description of all output files saved in: MG999860vadroutput09012023.vadr.filelist
# copy of input fasta file saved in: MG999860vadroutput09012023.vadr.in.fa
# copy of input fasta file with descriptions removed for blastn saved in: MG999860vadroutput09012023.vadr.blastn.fa
# esl-seqstat -a output for input fasta file saved in: MG999860vadroutput09012023.vadr.seqstat
# Stockholm file for >= 1 possible frameshifts for CDS.42.1 for model NC001806 saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.42.1.frameshift.stk
# Stockholm file for >= 1 possible frameshifts for CDS.79.1 for model NC001806 saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.79.1.frameshift.stk
# Stockholm file for >= 1 possible frameshifts for CDS.80.1 for model NC001806 saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.80.1.frameshift.stk
# Stockholm file for >= 1 possible frameshifts for CDS.81.1 for model NC001806 saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.81.1.frameshift.stk
# model NC001806 feature gene#1 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.1.fa
# model NC001806 feature gene#2 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.2.fa
# model NC001806 feature CDS#1 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.1.fa
# model NC001806 feature gene#3 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.3.fa
# model NC001806 feature CDS#2 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.2.fa
# model NC001806 feature gene#4 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.4.fa
# model NC001806 feature CDS#3 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.3.fa
# model NC001806 feature gene#5 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.5.fa
# model NC001806 feature CDS#4 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.4.fa
# model NC001806 feature gene#6 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.6.fa
# model NC001806 feature CDS#5 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.5.fa
# model NC001806 feature gene#7 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.7.fa
# model NC001806 feature gene#8 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.8.fa
# model NC001806 feature CDS#6 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.6.fa
# model NC001806 feature CDS#7 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.7.fa
# model NC001806 feature gene#9 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.9.fa
# model NC001806 feature CDS#8 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.8.fa
# model NC001806 feature gene#10 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.10.fa
# model NC001806 feature CDS#9 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.9.fa
# model NC001806 feature gene#11 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.11.fa
# model NC001806 feature gene#12 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.12.fa
# model NC001806 feature CDS#10 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.10.fa
# model NC001806 feature CDS#11 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.11.fa
# model NC001806 feature gene#13 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.13.fa
# model NC001806 feature CDS#12 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.12.fa
# model NC001806 feature gene#14 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.14.fa
# model NC001806 feature gene#15 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.15.fa
# model NC001806 feature gene#16 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.16.fa
# model NC001806 feature gene#17 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.17.fa
# model NC001806 feature gene#18 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.18.fa
# model NC001806 feature CDS#13 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.13.fa
# model NC001806 feature CDS#14 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.14.fa
# model NC001806 feature CDS#15 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.15.fa
# model NC001806 feature CDS#16 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.16.fa
# model NC001806 feature CDS#17 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.17.fa
# model NC001806 feature gene#19 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.19.fa
# model NC001806 feature gene#20 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.20.fa
# model NC001806 feature CDS#18 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.18.fa
# model NC001806 feature CDS#19 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.19.fa
# model NC001806 feature gene#21 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.21.fa
# model NC001806 feature gene#22 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.22.fa
# model NC001806 feature gene#23 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.23.fa
# model NC001806 feature CDS#20 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.20.fa
# model NC001806 feature CDS#21 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.21.fa
# model NC001806 feature CDS#22 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.22.fa
# model NC001806 feature gene#24 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.24.fa
# model NC001806 feature gene#25 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.25.fa
# model NC001806 feature gene#26 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.26.fa
# model NC001806 feature CDS#23 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.23.fa
# model NC001806 feature CDS#24 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.24.fa
# model NC001806 feature CDS#25 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.25.fa
# model NC001806 feature gene#27 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.27.fa
# model NC001806 feature CDS#26 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.26.fa
# model NC001806 feature gene#28 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.28.fa
# model NC001806 feature CDS#27 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.27.fa
# model NC001806 feature gene#29 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.29.fa
# model NC001806 feature CDS#28 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.28.fa
# model NC001806 feature gene#30 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.30.fa
# model NC001806 feature CDS#29 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.29.fa
# model NC001806 feature gene#31 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.31.fa
# model NC001806 feature CDS#30 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.30.fa
# model NC001806 feature gene#32 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.32.fa
# model NC001806 feature CDS#31 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.31.fa
# model NC001806 feature gene#33 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.33.fa
# model NC001806 feature CDS#32 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.32.fa
# model NC001806 feature gene#34 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.34.fa
# model NC001806 feature gene#35 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.35.fa
# model NC001806 feature CDS#33 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.33.fa
# model NC001806 feature CDS#34 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.34.fa
# model NC001806 feature gene#36 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.36.fa
# model NC001806 feature CDS#35 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.35.fa
# model NC001806 feature gene#37 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.37.fa
# model NC001806 feature CDS#36 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.36.fa
# model NC001806 feature gene#38 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.38.fa
# model NC001806 feature gene#39 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.39.fa
# model NC001806 feature CDS#37 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.37.fa
# model NC001806 feature CDS#38 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.38.fa
# model NC001806 feature gene#40 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.40.fa
# model NC001806 feature CDS#39 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.39.fa
# model NC001806 feature gene#41 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.41.fa
# model NC001806 feature CDS#40 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.40.fa
# model NC001806 feature gene#42 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.42.fa
# model NC001806 feature CDS#41 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.41.fa
# model NC001806 feature gene#43 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.43.fa
# model NC001806 feature CDS#42 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.42.fa
# model NC001806 feature gene#44 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.44.fa
# model NC001806 feature CDS#43 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.43.fa
# model NC001806 feature gene#45 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.45.fa
# model NC001806 feature CDS#44 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.44.fa
# model NC001806 feature gene#46 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.46.fa
# model NC001806 feature CDS#45 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.45.fa
# model NC001806 feature gene#47 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.47.fa
# model NC001806 feature CDS#46 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.46.fa
# model NC001806 feature gene#48 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.48.fa
# model NC001806 feature CDS#47 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.47.fa
# model NC001806 feature gene#49 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.49.fa
# model NC001806 feature gene#50 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.50.fa
# model NC001806 feature CDS#48 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.48.fa
# model NC001806 feature CDS#49 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.49.fa
# model NC001806 feature gene#51 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.51.fa
# model NC001806 feature gene#52 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.52.fa
# model NC001806 feature CDS#50 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.50.fa
# model NC001806 feature CDS#51 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.51.fa
# model NC001806 feature gene#53 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.53.fa
# model NC001806 feature CDS#52 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.52.fa
# model NC001806 feature gene#54 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.54.fa
# model NC001806 feature CDS#53 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.53.fa
# model NC001806 feature gene#55 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.55.fa
# model NC001806 feature gene#56 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.56.fa
# model NC001806 feature CDS#54 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.54.fa
# model NC001806 feature CDS#55 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.55.fa
# model NC001806 feature gene#57 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.57.fa
# model NC001806 feature CDS#56 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.56.fa
# model NC001806 feature gene#58 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.58.fa
# model NC001806 feature gene#59 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.59.fa
# model NC001806 feature CDS#57 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.57.fa
# model NC001806 feature CDS#58 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.58.fa
# model NC001806 feature gene#60 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.60.fa
# model NC001806 feature CDS#59 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.59.fa
# model NC001806 feature gene#61 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.61.fa
# model NC001806 feature CDS#60 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.60.fa
# model NC001806 feature gene#62 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.62.fa
# model NC001806 feature CDS#61 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.61.fa
# model NC001806 feature gene#63 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.63.fa
# model NC001806 feature CDS#62 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.62.fa
# model NC001806 feature gene#64 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.64.fa
# model NC001806 feature CDS#63 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.63.fa
# model NC001806 feature gene#65 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.65.fa
# model NC001806 feature CDS#64 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.64.fa
# model NC001806 feature gene#66 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.66.fa
# model NC001806 feature CDS#65 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.65.fa
# model NC001806 feature gene#67 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.67.fa
# model NC001806 feature gene#68 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.68.fa
# model NC001806 feature CDS#66 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.66.fa
# model NC001806 feature gene#69 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.69.fa
# model NC001806 feature CDS#67 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.67.fa
# model NC001806 feature gene#70 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.70.fa
# model NC001806 feature CDS#68 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.68.fa
# model NC001806 feature gene#71 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.71.fa
# model NC001806 feature CDS#69 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.69.fa
# model NC001806 feature gene#72 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.72.fa
# model NC001806 feature CDS#70 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.70.fa
# model NC001806 feature gene#73 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.73.fa
# model NC001806 feature CDS#71 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.71.fa
# model NC001806 feature gene#74 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.74.fa
# model NC001806 feature CDS#72 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.72.fa
# model NC001806 feature gene#75 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.75.fa
# model NC001806 feature CDS#73 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.73.fa
# model NC001806 feature gene#76 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.76.fa
# model NC001806 feature CDS#74 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.74.fa
# model NC001806 feature gene#77 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.77.fa
# model NC001806 feature CDS#75 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.75.fa
# model NC001806 feature gene#78 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.78.fa
# model NC001806 feature CDS#76 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.76.fa
# model NC001806 feature gene#79 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.79.fa
# model NC001806 feature CDS#77 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.77.fa
# model NC001806 feature gene#80 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.80.fa
# model NC001806 feature CDS#78 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.78.fa
# model NC001806 feature gene#81 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.81.fa
# model NC001806 feature gene#82 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.82.fa
# model NC001806 feature gene#83 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.83.fa
# model NC001806 feature gene#84 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.84.fa
# model NC001806 feature CDS#79 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.79.fa
# model NC001806 feature CDS#80 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.80.fa
# model NC001806 feature CDS#81 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.81.fa
# model NC001806 feature CDS#82 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.82.fa
# model NC001806 feature gene#85 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.gene.85.fa
# model NC001806 feature CDS#83 predicted seqs saved in: MG999860vadroutput09012023.vadr.NC001806.CDS.83.fa
# model NC001806 replaced sequence alignment (stockholm) saved in: MG999860vadroutput09012023.vadr.NC001806.align.rpstk
# model NC001806 full original sequence alignment (stockholm) saved in: MG999860vadroutput09012023.vadr.NC001806.align.stk
# model NC001806 full replaced sequence alignment (afa) saved in: MG999860vadroutput09012023.vadr.NC001806.align.rpafa
# model NC001806 full original sequence alignment (afa) saved in: MG999860vadroutput09012023.vadr.NC001806.align.afa
# 5 column feature table output for passing sequences saved in: MG999860vadroutput09012023.vadr.pass.tbl
# 5 column feature table output for failing sequences saved in: MG999860vadroutput09012023.vadr.fail.tbl
# list of passing sequences saved in: MG999860vadroutput09012023.vadr.pass.list
# list of failing sequences saved in: MG999860vadroutput09012023.vadr.fail.list
# list of alerts in the feature tables saved in: MG999860vadroutput09012023.vadr.alt.list
# fasta file with passing sequences saved in: MG999860vadroutput09012023.vadr.pass.fa
# fasta file with failing sequences saved in: MG999860vadroutput09012023.vadr.fail.fa
# per-sequence tabular annotation summary file saved in: MG999860vadroutput09012023.vadr.sqa
# per-sequence tabular classification summary file saved in: MG999860vadroutput09012023.vadr.sqc
# per-feature tabular summary file saved in: MG999860vadroutput09012023.vadr.ftr
# per-model-segment tabular summary file saved in: MG999860vadroutput09012023.vadr.sgm
# per-model tabular summary file saved in: MG999860vadroutput09012023.vadr.mdl
# per-alert tabular summary file saved in: MG999860vadroutput09012023.vadr.alt
# alert count tabular summary file saved in: MG999860vadroutput09012023.vadr.alc
# alignment doctoring tabular summary file saved in: MG999860vadroutput09012023.vadr.dcr
# seed alignment summary file (-s) saved in: MG999860vadroutput09012023.vadr.sda
# replaced stretches of Ns summary file (-r) saved in: MG999860vadroutput09012023.vadr.rpn
#
# All output files created in directory ./../vadroutput09012023/MG999860vadroutput_09012023/
#
# Elapsed time: 00:02:57.07
#               hh:mm:ss
#
[ok]
```
Detailed information on how to interpret VADR alerts is available here.
Particularly useful files:
- `.pass.fa`: FASTA file of sequences that pass annotation
- `.pass.tbl`: annotation in feature table format for sequences that pass annotation
- `.fail.fa`: FASTA file of sequences that fail annotation
- `.fail.tbl`: annotation in feature table format for sequences that fail annotation (note that the end of the file has error messages)
- `.alt.list`: list of alert codes that cause annotation failure
- `.alt`: list of all alert codes
- `.align.stk`: Stockholm alignment of the input sequence to the VADR model sequence
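A quick way to check a finished run is to count the entries in the pass and fail lists. The sketch below assumes that the output files are prefixed with the base name of the output directory (as in the example log above) and that each `.list` file holds one sequence name per line. `vadr_summary` is a hypothetical helper name, not part of VADR.

```shell
# Sketch: count passing and failing sequences in a VADR output directory.
# vadr_summary is a hypothetical helper, not a VADR command.
vadr_summary() (
  # Strip a trailing slash so basename behaves the same everywhere.
  outdir="${1%/}"
  # Assumption: output files are named <outdir>/<basename of outdir>.vadr.<suffix>
  prefix="$outdir/$(basename "$outdir").vadr"
  for kind in pass fail; do
    list="$prefix.$kind.list"
    if [ -f "$list" ]; then
      # Count non-empty lines; grep exits non-zero when the count is 0.
      n=$(grep -c . "$list" || true)
    else
      n=0
    fi
    echo "$kind: $n"
  done
)
```

Call it as `vadr_summary <output-directory-to-create>`. If the fail count is non-zero, the `.fail.tbl` and `.alt.list` files are the next places to look.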
Explanation of options used in example HSV1 annotation command
The options used in the example command are recommended, but other options might
be more appropriate depending on genome assembly methods and quality of the
genomes. A complete list of options for v-annotate.pl can be found here.
|option|explanation|
|------|-----------|
| --mdir <path-to-HSV1-VADR-model-directory> | specify that the model files to use are in the directory path-to-HSV1-VADR-model-directory |
| --mkey NC_001806.vadr | use the model files with prefix NC_001806.vadr in the directory from --mdir |
| -s | turn on the seed acceleration heuristic: use the max length ungapped region from blastn to seed the alignment, required for HSV annotation |
| --glsearch | use the glsearch program from the FASTA package for the alignment stage, required for HSV annotation |
| -r | turn on the replace-N strategy: replace stretches of Ns with expected nucleotides, where possible |
| --alt_pass dupregin,discontn,indfstrn,indfstrp,lowsimis,lowsimil,indf5pst,indf3pst,lowsim5s,lowsim3s | specify that these alerts, mostly due to inverted repeats and Ns in the sequence, do NOT cause annotation failure |
| --r_lowsimok | with -r, do not report lowsim{5,3,i}s within identified N-rich regions |
| --alt_mnf_yes insertnp | specify that features with `misc_not_failure:"1"` in the .minfo file (LAT, RL1, RL2, RS1) also tolerate the insertnp alert; by default `misc_not_failure:"1"` covers the insertnn alert but not the insertnp alert, and the two are usually raised together |
| --nmiscftrthr 10 | increase the number of misc_feature annotations allowed in a sequence to 10, because `misc_not_failure:"1"` is set for LAT, RL1, RL2, and RS1 |
|-f| overwrite output directory if it exists |
|--keep| keep all intermediate files, useful for debugging |
| `<fasta-file-to-annotate>` | input fasta sequence to be annotated |
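The options in the table can be collected into a small wrapper so that the same settings are reused for every run. This is a minimal sketch built only from the flags in the example command. `run_vadr` and the `VADR_DRY_RUN` variable are hypothetical names introduced here for illustration. They are not part of VADR.

```shell
# Sketch: wrap the example v-annotate.pl command from this README.
# run_vadr and VADR_DRY_RUN are hypothetical names, not part of VADR.
# Usage: run_vadr <model-dir> <model-key> <fasta-file> <output-dir>
run_vadr() (
  mdir="$1"; mkey="$2"; fasta="$3"; outdir="$4"
  # Alerts that should not cause annotation failure (see --alt_pass above).
  alt_pass="dupregin,discontn,indfstrn,indfstrp,lowsimis,lowsimil,indf5pst,indf3pst,lowsim5s,lowsim3s"
  # Rebuild the positional parameters as the full command line.
  set -- v-annotate.pl \
    --mdir "$mdir" --mkey "$mkey" \
    -s --glsearch \
    -r \
    --alt_pass "$alt_pass" \
    --r_lowsimok \
    --alt_mnf_yes insertnp \
    --nmiscftrthr 10 \
    -f --keep \
    "$fasta" "$outdir"
  if [ "${VADR_DRY_RUN:-0}" = 1 ]; then
    # Print the command instead of running it.
    echo "$*"
  else
    "$@"
  fi
)
```

For example, `VADR_DRY_RUN=1 run_vadr <path-to-HSV1-VADR-model-directory> NC_001806.vadr <fasta-file-to-annotate> <output-directory-to-create>` prints the command without running it. For HSV2, the model key would presumably be `NC_001798.vadr` by analogy, but check the file prefix in the HSV2 model directory before relying on that.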
Additional VADR documentation
- VADR README
- VADR installation instructions
- `v-build.pl` example usage and command-line options
- `v-annotate.pl` example usage, command-line options and alert information
- Explanations and examples of `v-annotate.pl` detailed alert and error messages
  - Output fields with detailed alert and error messages
  - Explanation of sequence and model coordinate fields in `.alt` files
  - `toy50` toy model used in examples of alert messages
  - Examples of different alert types and corresponding `.alt` output
  - Posterior probability annotation in VADR output Stockholm alignments
- VADR output file formats
References
- The recommended citation for using VADR is: Alejandro A Schäffer, Eneida L Hatcher, Linda Yankie, Lara Shonkwiler, J Rodney Brister, Ilene Karsch-Mizrachi, Eric P Nawrocki; VADR: validation and annotation of virus sequence submissions to GenBank. BMC Bioinformatics 21, 211 (2020). https://doi.org/10.1186/s12859-020-3537-3
- https://github.com/ncbi/vadr/wiki/Mpox-virus-annotation
Owner
- Login: asereewit
- Kind: user
- Repositories: 1
- Profile: https://github.com/asereewit