Releases | Open Source Science

funannotate - funannotate v1.8.17

Bug fix release.

fix FASTA splitter in iprscan local 83ceee5e4119b296aa6c2c33e796747d592dea54
pin docker and pip installs to biopython<1.80 to avoid deprecated or breaking code in newer releases
update to support EVM v1 or v2; back-port training data from funannotate2 (for testing)
run signalp6 if in PATH
fix pandas compatibility >v2
add config-window parameter to protein2genome script

- Python
Published by nextgenusfs over 2 years ago

funannotate - funannotate v1.8.15

Bug fix release

Trying hard to support augustus v3.5
fix sorting by length #896
update dbCAN Cazyme filtering #890
fixes for parsing eggnog mapper #868 #892
signalp6 fixes #822
and many others

- Python
Published by nextgenusfs about 3 years ago

funannotate - funannotate v1.8.13

Bug fix release

fix to internal BUSCO code to support Augustus v3.4 #742 @IanDMedeiros
several fixes to funannotate update #727 #740
several patches to Docker build

- Python
Published by nextgenusfs almost 4 years ago

funannotate - funannotate v1.8.11

Bug fix release

support for signalp v6 parsing #650 #716
improve protein2genome performance #652 #654
add --tmpdir option to most scripts (defaults to /tmp), ie point it at an SSD if you have access to one to improve speed
upgrade eggnog-mapper to newer version, use in memory if enough is available to improve speed
fix version check #655
fix for abnormal contig identifiers #672
fix for EVM contig memory parsing #646
add -m,--mito-pass-thru to funannotate annotate to add back the mitochondrial genome; mitochondrial contigs should be removed prior to running funannotate because they use a different genetic code
antismash6 support #692
update download urls to https from ftp
fix for EVM if last line of GFF3 is blank #709
simplify runSubprocess functions #707 #710 @mglubber
move temporary files for check for soft masking to output folder #722
conda augustus is apparently broken, fix in docker #722 #724, so need to install augustus a different way (ie apt-get on Debian)
add missing help menu options #714
fix for funannotate update compare annotations function #727
add --ml_model to funannotate compare to control ML model selection in iqtree, default is to run modelfinder which is very slow.

- Python
Published by nextgenusfs almost 4 years ago

funannotate - funannotate v1.8.9

fix for EvidenceModeler partitioning -- in rare cases genes/contigs were not getting partitioned correctly
fix logic for PASA training of glimmer/snap #591
fix logic for pre-trained datasets in funannotate predict #597
re-organize tbl2asn and provide single threaded backup if failure #599 and fix log error #621
replace perl script generating AGP file with a python version, which removes bioperl as a dependency #617
fix cds-transcripts coordinates (not used for any other processing) but on partial transcripts they were incorrect #614
fix bug in EVM filtering if no genes to remove #620
update eggnog parser to support versions >2.1.2 #566
fix tmpdir error in funannotate clean #594
python2 is no longer supported.

- Python
Published by nextgenusfs almost 5 years ago

funannotate - funannotate v1.8.7

Bug fix release

bug fix for eggnog mapper v2 parsing, occasionally some records have more than 1 OG in the "best_OG" field -- this is probably a bug in emapper.py, but funannotate will now ignore those entries as there is no way to tell which is actually the best OG.
make fasta2agp accept more IUPAC characters #532
fix for diamond >v2.0.8 (yet another change in the database info format resulted in error).
remove support for python 2

- Python
Published by nextgenusfs about 5 years ago

funannotate - funannotate v1.8.5

Bug fix release, recommended for all users.

use uuid unique identifier for tmp file names (previously used the process id)
fix diamond command in mapping proteins to genome #529
add some contig name checking for inputs to predict -- some users erroneously pass GFF3 files that reference transcripts or some other assembly, warn user #528
add support for antiSMASH v6 #531 #539
check genome for IUPAC errors #532
re-write long reads mapping to trinity transcripts #539
allow for fewer busco models for training #556
expose some EVM parameters to predict #558
modify EVM partitioning -- some gene models close to partition were not being called correctly #558.
support newly released eggnog-mapper >= v2.0.5 #566, note that v2.x - 2.0.4 will not be supported as the output is broken in the sense that it is impossible to parse the proper EggNog reference ID. If you are using any eggnog results from this code, strongly advised to re-run with new release.
add _pasa to PASA database names #398. Note -- this change could potentially cause pre-existing databases run with older funannotate code to not be recognized properly after v1.8.5.
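
The move from process ids to uuids for temporary file names can be sketched as follows. This is a minimal illustration, not funannotate's actual code: the helper names `unique_tmp_name` and `pid_tmp_name`, the prefix, and the suffix are all hypothetical.

```python
import os
import uuid


def unique_tmp_name(prefix="funannotate", suffix=".tmp"):
    """Return a temp file name built from a random UUID (hypothetical helper)."""
    # uuid4() is random, so the name does not depend on the process id and
    # two calls in the same process never share a name in practice.
    return "{}_{}{}".format(prefix, uuid.uuid4().hex, suffix)


def pid_tmp_name(prefix="funannotate", suffix=".tmp"):
    """Older style shown for contrast: the name depends only on the process id."""
    return "{}_{}{}".format(prefix, os.getpid(), suffix)


if __name__ == "__main__":
    print(pid_tmp_name())     # identical on every call within one process
    print(unique_tmp_name())  # different on every call
```

A pid-based name repeats whenever the same process writes a second file, or when two jobs happen to receive the same pid on different machines sharing a folder; a uuid-based name avoids both cases.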

- Python
Published by nextgenusfs about 5 years ago

funannotate - funannotate v1.8.3

Bug fix release

remove auto fix for RNA-seq data if it does not have Trinity compatible headers -- warn user to fix instead #485
fix antiSMASH v5 parsing #490, update secmet GBK format of clusters
add funannotate util gff-rename script
use natsort for sorting contigs/names so gene names are correctly auto incremented #501
fix for signalp v5 #494
support fasta headers with spaces for clean #504
rstrip(*) from proteins if GFF/FASTA passed; #508
expose --p2g_prefilter to use tblastn for mapping protein evidence #495
add --anysymbol to mafft calls #514
use frameshift diamond if >2.0.5 #503
improve glimmerhmm parsing error report #519
add --no-progress option to clean up terminal output (ie log files from cluster)
use sorted bedtools intersection to reduce memory usage #522
add --trnascan option to predict #523
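
To show why natural sorting matters for auto-incremented gene names, here is a small standard-library sketch. It is a stand-in for the natsort package mentioned above, not the package itself, and `natural_key` is a hypothetical helper name.

```python
import re


def natural_key(name):
    """Split a name into text and integer chunks so numbers compare numerically."""
    # re.split with a capturing group keeps the digit runs: the result always
    # alternates text, digits, text, ... so chunk types line up across names.
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r"(\d+)", name)]


contigs = ["scaffold_10", "scaffold_2", "scaffold_1"]
print(sorted(contigs))                   # plain string sort puts scaffold_10 before scaffold_2
print(sorted(contigs, key=natural_key))  # natural sort keeps numeric order
```

With a plain string sort, genes on scaffold_10 would be numbered before those on scaffold_2; sorting with a natural key keeps contigs, and therefore gene numbers, in the expected order.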

- Python
Published by nextgenusfs over 5 years ago

funannotate - funannotate v1.8.1

highly recommend all users upgrade
Code now python 2/3 compatible (only 9 months late....)
EVM now uses interlap for slicing input files, significant speed increase for larger genomes
if --organism other is passed, codingquarry is now disabled by default -- you can turn it on by specifying a valid weight, ie --weights codingquarry:2
support for signalp version 5
move resources to JSON download via GitHub, will allow for updating/changing resources without re-installing a new version of funannotate
several fixes for more robust GFF3 parsing
fix for cazyme assignments where results were duplicated; also upgraded to database version 8
fix bug where PASA results not properly passed to EVM if funannotate predict was re-run
many other bug fixes

- Python
Published by nextgenusfs over 5 years ago

funannotate - funannotate v1.7.4

Okay, so apparently I lied about bug fixes before py3 release. Here are some quick ones.

Also -- some users have said that the conda recipe fails to find a solution; with the help of @reslp, it appears this may be caused by the ete3 package dependency. Since ete3 is only used in funannotate for the dN/dS calculation in funannotate compare, I'm going to remove it from the conda recipe as a dependency. Users that would like to continue using the dN/dS function in compare will need to manually install ete3 (ie conda could work on your system or pip, etc).

update the log file copy/remove in funannotate annotate which causes a problem in singularity container
encoding bug fixes for some PFAM results
fix for funannotate iprscan which was not correctly combining XML files with newer versions of Interproscan.

- Python
Published by nextgenusfs over 6 years ago

funannotate - funannotate v1.7.3

bug fix release. This will be the last release before updating the code for py3
bug fix for remote #379
improve GFF3 parsing in funannotate annotate allow for non-funannotate identifiers (hopefully this catches most of them)
--aligners option was missing from help menu in train/update
add --debug flag to funannotate iprscan -- there is potentially still an issue here combining results from the newest release of Interproscan -- I need some intermediate results to see what the problem is
fix GO annotations parsing for more versions of goatools #363
fix for bug in the "gene start or end in a gap" function
make sure funannotate annotate re-runs tbl file generation if script is re-run
fix in PASA function logic where PASA db not found if using --gff and --fasta input.

- Python
Published by nextgenusfs over 6 years ago

funannotate - funannotate v1.7.2

busco internal run now uses the local augustus config path, so hopefully write access to $AUGUSTUS_CONFIG_PATH is no longer required
long reads are now processed to remove forward slashes from names if present -- this was causing problems with PASA #326 #250
added augustus hints file generation when using RNA-seq data. Also added a check for --transcript_evidence if found in --transcript_alignments, ie so extra transcripts can be added instead of only those used to generate alignments in funannotate train. #360
fix typo in funannotate check
fix for proteinortho issue in funannotate compare #350

- Python
Published by nextgenusfs over 6 years ago

funannotate - funannotate v1.7.1

A few updates to support bioconda integration -- namely either $TRINITYHOME or $TRINITY_HOME can be used. Also don't trust trinity packaged trimmomatic, use the conda version.
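
Accepting either environment variable could look roughly like the sketch below. The function name `find_trinity_home` and the lookup order are assumptions made for illustration; the release note only states that both variables are accepted.

```python
import os


def find_trinity_home(env=None):
    """Return the Trinity install path from either variable, or None (hypothetical helper)."""
    env = os.environ if env is None else env
    # Check both spellings; the order here is an assumption, not documented behaviour.
    for var in ("TRINITYHOME", "TRINITY_HOME"):
        value = env.get(var)
        if value:
            return value
    return None


if __name__ == "__main__":
    print(find_trinity_home({"TRINITY_HOME": "/opt/trinity"}))
```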

- Python
Published by nextgenusfs over 6 years ago

funannotate - funannotate v1.7.0

Code has been repackaged to conform to a "proper" python package -- which means it now also resides on PyPi and a Bioconda package can be built. Along with the repackaging there are many improvements/fixes.

funannotate now keeps track of "trained species" for all of the ab-initio gene predictors (Augustus, GeneMark (optional), snap, GlimmerHMM, codingquarry). This requires all users to update their database, ie with the funannotate setup command. After running funannotate predict the software will output a JSON file containing the paths to the trained parameter files -- this can be used again for a different genome via the funannotate predict --parameters option. This parameter file can also be added to the database with the funannotate species -s genus_species -a parameters.json command. Running the funannotate species command will output a table in the command line of which species have training data. Addressed #320
antiSMASH remote script fixed and parser updated for v5 output.
added filtering for gene models that start/end in a gap that can sometimes show up after running funannotate update
added a check that the diamond version used to build the database matches the currently installed copy -- a mismatch causes many hidden errors for users, ie diamond databases were created with an older/incompatible version than what is running currently.
updated Augustus functional check
removed RepeatModeler/RepeatMasker as strict dependencies. Due to RepBase change in usage license, repeatmasker/modeler are not available to most users. The funannotate mask command can still run this routine if you have the necessary dependencies installed, however, the current default is simply to run tantan masking. This is probably not sufficient for most genomes, thus happy to integrate a robust solution once one exists for repeat masking.
augustus parameter training now done in the local output folder, so no longer need write access to $AUGUSTUS_CONFIG_PATH

- Python
Published by nextgenusfs over 6 years ago

funannotate - funannotate v1.6.0

support for antiSMASH v5.0 output #292 #299
add snap and glimmerhmm request from #240. BUSCO is now run by default in funannotate predict if there is no PASA data -- BUSCO results used to train glimmerhmm and snap
improved Phobius results parsing #259
multi-threaded funannotate clean, thanks to @bogemad
fix database links #300
multi-threaded hisat2-build #303
write all output files directly from tbl format -- fix bug associated with multi-transcript parsing from GenBank files
bug fix for protein2genome exonerate mapping
bug fixes for funannotate train and funannotate update
added --min_coverage option to trinity workflow and set default to 5
bug fixes for codingquarry predictions (RNA-seq only)
improved error message for repeatmodeler/masker #298
bug fix for remote searches
bug fix for parsing input folders in annotate #302
updated conda install docs --> which seems futile to keep this current....

- Python
Published by nextgenusfs almost 7 years ago

funannotate - funannotate v1.5.3

updated the CAZYme dbCAN link
fix string formatting in GeneMark-ET function

- Python
Published by nextgenusfs about 7 years ago

funannotate - funannotate v1.5.2

restructure augustus accessory scripts calls so that they don't have to be in same location as the exe, i.e. this used to be a bug if you used conda installed augustus as it puts the scripts in a different location than the default augustus release folder
added a check to busco routine to default to a single thread if tblastn version is found that might have multithread issues
allow a min number of gene models to use for Augustus training, --min_training_models in funannotate predict
limit genemark to 64 cpus -- it will die if you try to give it more
fix shortBAM declaration in funannotate update
for genemark-ET set score of introns to 500, otherwise seems to be dying.
updated weights for different gene models -- would be nice to have this be a customizable option....

- Python
Published by nextgenusfs over 7 years ago

funannotate - funannotate v1.5.1

updated dbCAN links
important bug fix for RNA-seq analysis. the bam2gff3 function was not outputting the proper coordinates for crick stranded alignments for PASA, resulting in valid minimap2 alignments being thrown out.
several other minor bug fixes

- Python
Published by nextgenusfs over 7 years ago

funannotate - funannotate v1.5.0

add funannotate test as a unit test script to validate installation #212
add CodingQuarry and StringTie integration into funannotate. Note these are "silent" dependencies, meaning if not installed this method will be skipped. If both tools are installed and RNA-seq data is used they will be run automatically #200
fix contig number reporting in funannotate clean #210
fix bug in a few of the accessory tools funannotate util
update database to MiBIG v1.4

- Python
Published by nextgenusfs almost 8 years ago

funannotate - funannotate v1.4.2

fix bug in train and update where script would die due to symlink error #189
for predict the --protein_alignments option now takes GFF3 input (not exonerate output). This is to make consistent with the --transcript_alignments. Scripts now write hints file for augustus from the GFF3 file.
similar to above, added funannotate util prot2genome which will run the diamond/exonerate mapping of proteins to the genome --> output is GFF3 file compatible with EVM. These data can then be passed to --protein_alignments

- Python
Published by nextgenusfs almost 8 years ago

funannotate - funannotate v1.4.1

bug fix for funannotate predict during parsing the soft masked genome -- for large genomes this was slow and used too much memory. it is now multithreaded and has lower memory footprint. #197
bug fix: ncRNA models are now listed as full length, which should no longer cause NCBI errors #195
support multiple inputs to --other_gff #191
make augustus use --softmasking=1 option
default value for --soft_mask is now set to 2 kb (2000)
output fasta files are now wrapped at 80 characters
tbl2asn is now multithreaded on large genomes or those with more than 10000 contigs
several updates to parsing of GenBank files to deal with unexpected formatting #196
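
Wrapping FASTA output at 80 characters can be sketched in a few lines. This is an illustrative helper only; the name `format_fasta` is hypothetical and funannotate's own writer may be implemented differently.

```python
def format_fasta(header, sequence, width=80):
    """Return one FASTA record with the sequence wrapped at `width` characters."""
    # Slice the sequence into fixed-size chunks; the last chunk may be shorter.
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">{}\n{}\n".format(header, "\n".join(chunks))


if __name__ == "__main__":
    print(format_fasta("contig_1", "ACGT" * 50), end="")
```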

- Python
Published by nextgenusfs almost 8 years ago

funannotate - funannotate v1.4.0

support for long-read RNA-seq data: funannotate train and funannotate update can take PacBio isoSeq (--pacbio_isoseq), Nanopore cDNA reads (--nanopore_cdna), and Nanopore direct mRNA (--nanopore_mrna).
fix for important bug in transcript alignments in funannotate predict -- bug in previous versions related to multi-exon crick alignments not getting correctly parsed into GFF3 alignments
soft masking is now decoupled from funannotate predict, this is now done with funannotate mask. Reason for this switch is to allow more flexibility in how the assembly is soft masked -- this can be done externally with another program. This change will allow users that don't have access to RepBase to use an alternative to RepeatMasker/RepeatModeler. One alternative is RED -- I wrote a wrapper for it called RedMask
funannotate predict can now run without GeneMark being installed -- again to accommodate users that may be unable to use GeneMark due to licensing. Note you can pass gene predictions from any external program to --other_gff and they will be handed off to Evidence Modeler.
spaces in either strain or isolate name will be stripped #180
default program for funannotate clean changed to minimap2 #176
fix errors in partial gene models derived from using EVM script to generate proteins, this is now done internally using exact coordinates #184
added --soft_mask option to funannotate predict which will control the option with same name in GeneMark, i.e. default is --soft_mask 5000 which means that repeat regions less than 5 kb will be ignored for GeneMark prediction, those greater than 5 kb will be fed to Genemark. #185
bug fixes for tbl file generation. all tRNA models will be partial #184
improvement to how data from funannotate train is used in prediction steps
Slight changes for clarity to funannotate predict flags for evidence alignments:
--protein_evidence Proteins to map to genome (prot1.fa prot2.fa uniprot.fa). Default: uniprot.fa
--protein_alignments Pre-computed exonerate protein alignments (see docs for format)
--transcript_evidence mRNA/ESTs to align to genome (trans1.fa ests.fa trinity.fa). Default: none
--transcript_alignments Pre-computed transcript alignments in GFF3 format
added funannotate util bam2gff3 script to convert coordsorted RNA-seq BAM alignments to GFF3 compatible alignment file.
fix bug for input of files+weight in funannotate predict -- script would get hung up if you passed --other_gff snap_alignments.gff3:5 #191
allow for non-standard LocusTags - will now split on last underscore #191
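
Splitting on the last underscore, as described for non-standard locus tags, can be illustrated like this. The helper name `split_locus_tag` and the example identifiers are hypothetical; this is a sketch of the idea rather than funannotate's parser.

```python
def split_locus_tag(gene_id):
    """Split 'PREFIX_NUMBER' on the last underscore (hypothetical helper)."""
    # rpartition splits at the right-most separator, so underscores inside
    # the prefix are preserved.
    prefix, _, number = gene_id.rpartition("_")
    return prefix, number


if __name__ == "__main__":
    print(split_locus_tag("FUN_000001"))
    print(split_locus_tag("MY_STRAIN_A_00042"))
```

Splitting on the first underscore instead would cut a tag such as MY_STRAIN_A_00042 in the wrong place.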

- Python
Published by nextgenusfs almost 8 years ago

funannotate - funannotate v1.3.4

bug fixes for sec met cluster output files and corresponding MiBIG cluster mapping
add tRNAscan-SE to funannotate check and predict
update menu with some params that were missing

- Python
Published by nextgenusfs about 8 years ago

funannotate - funannotate v1.3.3

bug fix for funannotate compare where GO enrichment not being run in parallel from last update
use diamond blastp search for ortholog detection --> speed increased.
don't run seqclean if file present
update docker release to newest version of funannotate as well as newest version of Trinity, PASA

- Python
Published by nextgenusfs about 8 years ago

funannotate - funannotate v1.3.2

added several utility scripts --> accessible by funannotate util submenu. This includes funannotate util compare which will compare multiple annotations to a reference.

```
$ funannotate util

Usage: funannotate util
version: 1.3.2

Commands:
  compare       Compare annotations to reference (GFF3 or GBK annotations)
  tbl2gbk       Convert TBL format to GenBank format
  gbk2parts     Convert GBK file to individual components
  gff2proteins  Convert GFF3 + FASTA files to protein FASTA
  gff2tbl       Convert GFF3 format to NCBI annotation table (tbl)
```

bug fix for funannotate remote moving logfile
bug fix for mapping proteins to genome where tmp folder wasn't being properly removed
run GO enrichment in parallel in funannotate compare
update colors in some graphs from funannotate compare to 24-pack Crayola colors
add option to use iqtree to draw ML phylogeny in funannotate compare
bug fix for funannotate database command where it was not displaying table correctly.

- Python
Published by nextgenusfs about 8 years ago

funannotate - funannotate v1.3.1

bug fix for funannotate setup: added missing shutil library import

- Python
Published by nextgenusfs about 8 years ago

funannotate - funannotate v1.3.0

bug fix for weights being set for Augustus HiQ models in funannotate predict
bug fix for download_buscos function
bug fix for funannotate annotate where tbl file was occasionally not being parsed correctly --> re-write of parsing function
fix bug in antiSMASH/MiBIG parsing
add method to try to recover from failed GeneMark run
several bug fixes for funannotate update related to UTRs and multiple transcripts per locus.
added missing dependencies to funannotate check
updated code to work with PASA > v2.3 - this is important PASA update that allows SQLite usage instead of MySQL
improved terminal log output to tell user which files (with locations) are being re-used if they are found.

- Python
Published by nextgenusfs about 8 years ago

funannotate - funannotate v1.2.0

v1.2.0 now supports multiple transcripts per gene locus. The funannotate pipeline will only generate multiple transcripts per locus if given evidence in the form of RNA-seq data, this is done in the funannotate update command. It should also now support input with multiple transcripts as well.
move installation of busco models to funannotate setup
added annotation edit distance (AED) to funannotate update to record the changes in annotation. The PASA annotation update text file now incorporates these changes as well
accessory script util/compare2annotations.py can compare multiple annotations in either GFF3 or GBK format to a reference, generating summary stats as well as individual gene stats (AED per mRNA and CDS)
added a --drop option to funannotate fix so that you can remove unwanted gene model annotations; to use it, pass a file containing locus_tag ids (1 per line) to the --drop parameter
fix bug in finding high-quality Augustus predictions (HiQ) models in funannotate predict
funannotate predict will now detect if a training folder exists in output directory, if it does it will find the correct PASA, BAM, and Trinity output and use automatically during the prediction step.
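
For readers unfamiliar with annotation edit distance, the sketch below uses the commonly cited base-level definition, AED = 1 - (SN + SP) / 2, where SN is the fraction of reference bases shared with the query and SP is the fraction of query bases shared with the reference. This is an assumption about the general metric; funannotate's own calculation may differ in detail, and the function names are hypothetical.

```python
def covered_bases(exons):
    """Return the set of positions covered by (start, end) inclusive intervals."""
    bases = set()
    for start, end in exons:
        bases.update(range(start, end + 1))
    return bases


def annotation_edit_distance(reference, query):
    """Base-level AED between two exon lists: 0.0 is identical, 1.0 is no overlap."""
    ref, qry = covered_bases(reference), covered_bases(query)
    if not ref or not qry:
        return 1.0
    shared = len(ref & qry)
    sensitivity = shared / len(ref)   # share of reference bases recovered
    specificity = shared / len(qry)   # share of query bases that are in the reference
    return 1.0 - (sensitivity + specificity) / 2.0


if __name__ == "__main__":
    old_model = [(1, 100)]
    new_model = [(1, 50)]
    print(annotation_edit_distance(old_model, new_model))
```

An unchanged model scores 0.0, so larger values flag the gene models that an update altered the most.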

- Python
Published by nextgenusfs over 8 years ago

funannotate - funannotate v1.1.1

fix for braker to work on docker. For some reason (I don't know why) the symlinks that braker tries to create cause an error when run on docker. The error essentially references too many levels of symlinks. To circumvent this, I modified braker.pl code to copy instead of symlink. Also fixed the braker.pl --version option which was broken in the most recent release.
Note for a "normal" system, v1.1.0 should work fine. The updated braker code was run on both docker and Mac native and runs fine on those, hopefully also working well on linux.

- Python
Published by nextgenusfs over 8 years ago

funannotate - funannotate v1.1.0

bumping version to 1.1.0 to highlight that v1.0.X versions have a bug in the tbl annotation file and will not pass GenBank specs. This stemmed from dropping GAG from funannotate: I had the tbl spec wrong for adding transcript_id and protein_id to both CDS and mRNA features.
fixes for funannotate update and properly filtering overlapping genes
fix for funannotate annotate that was switching the 5' and 3' partial gene designations on crick orientated gene models, causing them to look correct after predict step and then become errors after annotate step
added Braker 2.0.3 to funannotate.... this was necessary as braker.pl --version doesn't display the version number so I can't enforce a version requirement. The larger issue has to do with how the different versions of braker save the output data, there are at least 3 different behaviors in the last 4 or 5 versions which makes it impossible for funannotate to determine where output will be.

- Python
Published by nextgenusfs over 8 years ago

funannotate - funannotate v1.0.2

update to GFF to TBL parser to catch some "common" errors in GFF files
added funannotate iprscan which will run Docker InterProScan searches or also local searches. It will split the job into chunks and run those in parallel which seems to be a faster way to run InterProScan. By default it will chunk the proteins into 1000 protein bins and then run 4 cpus each up to as many cpus as you give the script.
fix to docker build (hopefully)
bug fixes for parsing the ncbi error report, properly outputting which genes are causing errors
fix for antiSMASH parsing of plantismash data
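
The chunk-and-parallelize approach described for funannotate iprscan can be sketched as follows. The helper names are hypothetical, and the job-count rule (one job per 4 cpus, at least one job) is my reading of the note rather than confirmed behaviour.

```python
def chunk_records(records, size=1000):
    """Yield successive bins of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def parallel_jobs(total_cpus, cpus_per_job=4):
    """Number of chunks to run at once when each job gets `cpus_per_job` cpus (assumed rule)."""
    return max(1, total_cpus // cpus_per_job)


if __name__ == "__main__":
    proteins = ["protein_{}".format(i) for i in range(2500)]
    bins = list(chunk_records(proteins))
    print([len(b) for b in bins])  # sizes of each bin
    print(parallel_jobs(16))       # concurrent jobs on a 16 cpu machine
```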

- Python
Published by nextgenusfs over 8 years ago

funannotate - funannotate v1.0.1

Wrote a new GFF to TBL parser to accommodate running funannotate annotate on a fasta + GFF file.
Added COGs output to funannotate compare, these annotations are parsed from eggnog-mapper data
several minor bug fixes

- Python
Published by nextgenusfs over 8 years ago

funannotate - funannotate v1.0.0

Major update to funannotate with new RNA-seq modules, new database download and management, new gene name/product definition module, many bug fixes.

RNA-seq modules:
1. funannotate train: Module will run RNA-seq mediated methods for training of GeneMark/Augustus in gene prediction. It will take single or PE RNA-seq FASTQ files, run Trimmomatic quality trimming, run Trinity-mediated read normalization, run Trinity genome-guided RNAseq assembly, run PASA alignment methods. Output is BAM file, trinity transcripts, and PASA GFF3 for use in funannotate predict.
2. funannotate update: Module will run PASA mediated gene model updates. It can be run after running train --> predict --> update, which will add UTR models and refine gene models. The script can also be run on a pre-existing GenBank assembly where it will run the funannotate train methods (quality trimming, normalization, Trinity, PASA) and then followed by the update specific methods to add UTRs, refine models, etc.

funannotate predict enhancements:
1. Dropped use of GAG to write the NCBI tbl file, as it was making mistakes on some partial gene models; funannotate now has native functions to do this.
2. Simplified NCBI tbl generation and gene model filtering --> only running tbl2asn a single time now as bad gene models are properly filtered (previously a regex search was not working perfectly, resulting in some gene models being removed arbitrarily)
3. tRNA gene length filter is now in compliance with NCBI rules (you can safely ignore tbl2asn tRNA gene length warnings --> they will eventually update tbl2asn source code)
4. Numbers of gene models for each "source" are now printed to terminal prior to running Evidence Modeler.
5. Script parses the NCBI error reports and shows the user which gene models need to be manually fixed; after the tbl file is updated, the GBK output files can be regenerated with the new funannotate fix command.

funannotate annotate enhancements:
1. Diamond search has replaced Blast wherever possible, which results in a large increase in speed.
2. HMMer searches are now split across multiple CPUs, which results in increased speed.
3. Gene names and product definitions are now parsed from UniProtKb/SwissProt results and EggNog-Mapper results. The product definitions are cross-referenced to a community resource called gene2product which will serve as a database of curated gene product definitions.
4. Native NCBI tbl generation results in proper annotation of partial gene models.
5. Script will parse tbl2asn errors and alert user of gene models that need to be fixed.

New Database Management modules:
1. Environmental variable addition: FUNANNOTATE_DB allows user to install databases locally, i.e. in a user's home directory on an HPC.
2. funannotate setup script has been re-written from scratch to control the databases, keep track of versions, and allow user to update database.
3. funannotate database is a new command that shows you currently installed databases.
4. Databases have been trimmed down and occupy ~4 GB of space.

I would recommend that all users upgrade. After upgrading, you will need to re-download the databases from scratch. As always, many bugs have been fixed and likely some new ones introduced. Please let me know if you encounter errors.

Docs/Manual/Tutorials will be available soon at http://funannotate.readthedocs.io

- Python
Published by nextgenusfs over 8 years ago

funannotate - funannotate v0.7.2

fix bug in funannotate compare, string conversion to int failed on a check for number of genes
added better error message for duplicate locus_tag ids in funannotate compare

- Python
Published by nextgenusfs almost 9 years ago

funannotate - funannotate v0.7.1

fix menu in funannotate annotate that still had --email as an option -> it is no longer an option, all remote searches moved to funannotate remote
fix eggnog parsing issue where COG and Description are blank -> this happens if you run diamond search with eggnog-mapper. You should run HMM search with the appropriate EggNog database, i.e. for fungi that is the fuNOG database.

- Python
Published by nextgenusfs almost 9 years ago

funannotate - funannotate v0.7.0

Release v0.7.0 notes:

funannotate predict

unified genbank conversion method
added support for repeatmasker_species option
added support for strain flag for genbank conversion
improved filtering of problematic gene models

funannotate annotate

removed all remote searches from script (now funannotate remote see below)
dropped EggNog search, instead --eggnog option will parse the results from eggnog-mapper. Eggnog-mapper does a more comprehensive search and provides some more functional annotation information than the simple HMMer search of EggNog 4.5 database
now outputs a tsv annotation file into the annotate_results output folder
improved functional annotation for Gene and Product names
added support for strain flag for genbank conversion

funannotate compare

increased speed of parsing GBK files
remove EggNog description mapping
fix links to MEROPS database in html output

funannotate remote

new sub command that will run remote searches
currently support Phobius, antiSMASH, and InterProScan
Note: these searches are a free service, don't abuse them. If you can install these software locally it will significantly decrease your run time. They are included here as some are Linux only and/or setup is very difficult.

funannotate setup

Eggnog 4.5 database no longer required

- Python
Published by nextgenusfs almost 9 years ago

funannotate - funannotate v0.6.2

added support to funannotate predict for an --other_gff option that will pass annotation directly to EVM. You can control the weight for EVM, like this --other_gff my_predictions.gff3:10, which would give the gene models a weight of 10 in EVM
better support for --pasa_gff passed to funannotate predict where now input is not hardcoded to have transdecoder in column 2 of the gff file. You can also control the EVM weight like this: --pasa_gff my_pasa.gff:10 to give it a weight of 10
BRAKER1 method now pulls out high quality Augustus models (HiQ) that have >90% exon supported by evidence, these are given a weight of 5 in EVM
Added a few stats for repeat masking genome as well as number of transcripts mapped
updated funannotate so it is compatible with new version of GAG v2.01.
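
The file:weight convention used by --other_gff and --pasa_gff can be parsed along these lines. This is a hedged sketch: `parse_weighted_input` is a hypothetical name, and the default weight of 1 is an assumption for illustration, not a documented value.

```python
def parse_weighted_input(arg, default_weight=1):
    """Split 'path:weight' into (path, int weight); default_weight is an assumed fallback."""
    # Split at the last colon so paths containing colons are left intact.
    path, sep, weight = arg.rpartition(":")
    if sep and weight.isdigit():
        return path, int(weight)
    return arg, default_weight


if __name__ == "__main__":
    print(parse_weighted_input("my_predictions.gff3:10"))
    print(parse_weighted_input("my_pasa.gff"))
```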

- Python
Published by nextgenusfs about 9 years ago

funannotate - funannotate v0.6.1

Numerous bug fixes
- Strip asterisks from protein fasta files to avoid problems with InterProScan
- logfiles folder was not being created if --genbank was passed to funannotate annotate
- Linux bug where last step of funannotate predict was terminating prematurely resulting in partial output files
Re-write of the InterProscan parsing scripts. Now script will parse IPR Domains and GO terms directly from XML file, instead of splitting XML file and then parsing 1 by 1.
Great update by John Longinotto on his pybam native BAM parser which is integrated into funannotate predict to quickly check BAM headers to make sure they match FASTA headers for input into Braker
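
Stripping the trailing stop-codon asterisk before handing proteins to InterProScan can be sketched as below. The helper name is hypothetical, and removing only trailing asterisks is an assumption about how the fix behaves.

```python
def strip_stop_asterisk(protein):
    """Remove trailing '*' stop symbols from a protein sequence (hypothetical helper)."""
    # rstrip only touches the end of the string; an internal '*' is left in
    # place because it may indicate a genuine problem with the gene model.
    return protein.rstrip("*")


if __name__ == "__main__":
    print(strip_stop_asterisk("MKTLLVAG*"))
```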

- Python
Published by nextgenusfs about 9 years ago

funannotate - funannotate v0.6.0

fix tRNA gene model filtering to deal with the tbl2asn >150 error
improve XML parsing in funannotate compare
add diamond alternative for exonerate pre-filtering in funannotate predict
make funannotate docker compatible and create docker image
EggNog and BUSCO2 databases are no longer downloaded in the initial setup, but you can manage EggNog databases with funannotate eggnog. This change was made because downloading the large databases caused problems when building the docker image. The scripts will download on the fly if the default database is not available.
added some external dependency versions in funannotate check

- Python
Published by nextgenusfs about 9 years ago

funannotate - funannotate v0.5.7

bug fixes for logging
bug fix when multiple protein evidence files are passed
add phobius to funannotate annotate to predict secreted proteins in combination with signalp
add test data genome4.fasta that can be used to test the BUSCO2 augustus training method
added support for checking BAM reference sequence headers if they match the genome FASTA headers, this only happens if BAM file passed to --rna_bam
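
The header check described above can be sketched as a simple set comparison. The sketch assumes the BAM reference names have already been read by some BAM parser; it does not read BAM files itself, and the names fasta_ids and missing_references are hypothetical.

```python
def fasta_ids(fasta_text):
    """Return sequence IDs (first word after '>') from FASTA text."""
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.append(fields[0])
    return ids


def missing_references(bam_reference_names, fasta_text):
    """Return BAM reference names that have no matching FASTA header."""
    known = set(fasta_ids(fasta_text))
    return [name for name in bam_reference_names if name not in known]


genome = ">scaffold_1 len=8\nACGTACGT\n>scaffold_2\nACGT\n"
print(missing_references(["scaffold_1", "scaffold_3"], genome))  # ['scaffold_3']
```

An empty result means every reference in the BAM header is present in the genome FASTA.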

- Python
Published by nextgenusfs over 9 years ago

funannotate - funannotate v0.5.5

typo fixes for log file names
typo fix for fuNOG annotations in secondary metabolism module; this now uses the proper --eggnog_db option
added a check to the dN/dS ratio test that asserts the tree drawn by PhyML has the correct number of proteins
new feature for BUSCO models in funannotate predict: if --ploidy is greater than 1, duplicated BUSCO models are also parsed and the one with the highest score is picked
Support for bypassing RepeatModeler/RepeatMasker, you can now enter --masked_genome and --repeatmasker_gff3 options to skip that step. Note that both options are required.
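
The rule of keeping the highest-scoring model among duplicated BUSCO hits can be sketched as below. The tuple layout (busco_id, model_id, score) and the function name are assumptions for illustration; the real BUSCO output format is not reproduced here.

```python
def best_model_per_busco(hits):
    """Keep the highest-scoring model for each BUSCO ID.

    hits: iterable of (busco_id, model_id, score) tuples (hypothetical layout).
    Ties keep the first model seen.
    """
    best = {}
    for busco_id, model_id, score in hits:
        if busco_id not in best or score > best[busco_id][1]:
            best[busco_id] = (model_id, score)
    return best


hits = [
    ("BUSCO_A", "model_1", 410.5),
    ("BUSCO_A", "model_2", 455.0),
    ("BUSCO_B", "model_3", 120.0),
]
print(best_model_per_busco(hits))
```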

- Python
Published by nextgenusfs over 9 years ago

funannotate - funannotate v0.5.4

update to tblastn/exonerate protein mapping for better speed and more thorough searches
added --ploidy option to funannotate predict which controls the max number of hits for the tblastn filter to pass to exonerate, which is set at 2 x ploidy. You should likely only increase this if your assembly is more than haploid - so perhaps newer assemblies with nanopore/pacbio may be able to resolve diploid chromosomes. There shouldn't be negative consequences to increasing --ploidy, but it will increase run time for protein mapping.
happy holidays...

- Python
Published by nextgenusfs over 9 years ago

funannotate - funannotate v0.5.3

re-organize output so that temporary folders are created in the "final" resting place and not in the current directory
modification of the multiprocessing function to include a simple progress percentage output
bug fixes in funannotate compare and the orthology dN/dS output hanging when the dN/dS calculation failed
modification to the logging to capture STDERR/STDOUT from many external tools into the log file, hopefully this will result in catching more errors than piping them to os.devnull
bug fix in funannotate annotate where output folders were not being created if the input was GFF, proteins, and fasta.
made it a requirement to pass --species argument to funannotate annotate if you do not pass in a GenBank file. This is to prevent downstream problems in funannotate compare with how the scripts name the genome/isolates.
made a FAQ section in the docs that includes how to manually adjust gene models using the included tools
all internal tests passed, however guaranteed there are more bugs. please let me know when you find them.

- Python
Published by nextgenusfs over 9 years ago

funannotate - funannotate v0.5.2

update to dN/dS function to have two options: 1) --rundnds estimate (which runs the M0 model only), and 2) --rundnds full (which runs M0, M1, M2, M7, M8 and calculates the LRT, i.e. likelihood ratio test, of M1/M2 and M7/M8).
update to the multiprocessing progress function - a simple progress meter is used on most multiprocessing functions to let user know how many processes have finished
bug fix where transcripts and proteins were getting written to same file in funannotate predict
minor bug fix in funannotate clean where input number of scaffolds was not printed out correctly
change the default location of DB as per requested by some users, now defaults to $HOME/funannotate. Note you can set this to whatever directory you want.
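
For readers unfamiliar with the model comparisons named under --rundnds full, a likelihood ratio test between two nested codon models can be sketched as follows. This is a generic illustration, not funannotate's implementation: it assumes the log-likelihoods have already been obtained, and it assumes 2 degrees of freedom, which is the conventional choice for the M1/M2 and M7/M8 comparisons.

```python
import math


def lrt_df2(lnl_null, lnl_alt):
    """Likelihood ratio test assuming the models differ by 2 parameters.

    Returns (statistic, p_value). For a chi-square distribution with
    2 degrees of freedom the survival function is exactly exp(-x / 2).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Hypothetical log-likelihoods for a null (e.g. M7) and alternative (e.g. M8) model.
stat, p = lrt_df2(-1000.0, -995.0)
print(stat, p)
```

A small p-value favors the alternative model, which allows a class of sites under positive selection.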

- Python
Published by nextgenusfs over 9 years ago

funannotate - funannotate 0.5.1

bug fixes for funannotate compare
bug fix for funannotate predict where, during gene model filtering of large genomes, occasional parent:child features would get missed
added feature of calculating dN/dS ratios in funannotate compare

- Python
Published by nextgenusfs over 9 years ago

funannotate - funannotate v0.4.0

integration of BUSCO2 script and models. Can see the BUSCO2 distribution here, funannotate uses a slightly modified version to be compatible with the BUSCO->EVM workflow.
BUSCO2 models have changed a bit, there are now a lot more options for various taxonomic groups. Something to keep in mind though is that the model names for dikarya are not the same as pezizomycotina so if you use an --outgroup option be sure that the outgroup was generated with same BUSCO DB
The funannotate setup script will remove previous BUSCO DB models and download the new ones because of the extensive change in the BUSCO2 structure.
addition of funannotate outgroups to help you manage the outgroups available to funannotate compare
the scripts will download and format any of the available BUSCO2 eukaryote models, to see a list in a taxonomic tree format you can type funannotate outgroups --show_buscos

- Python
Published by nextgenusfs over 9 years ago

funannotate - funannotate v0.3.14

funannotate annotate will now support a single XML file for InterProScan5; you can pass either a folder of XML files (1 per protein) or a single XML file containing all of the annotations to the --iprscan option
fix in funannotate annotate, which did not alert the user that --email is required if using the remote IPR5 search; this is the default setting

- Python
Published by nextgenusfs over 9 years ago

funannotate - funannotate v0.3.12

bug fix for path issue when running EVM; discovered on new install on Mac - not sure why it wasn't found earlier, but resulted in failed EVM run
some improved logging for EVM module

- Python
Published by nextgenusfs over 9 years ago

funannotate - funannotate v0.3.11

bug fix for funannotate compare where the genome stats was not printing for all genomes
goatools changed their headers on the output of the GO enrichment (again), so re-wrote how the data is parsed, hopefully this fix applies to all versions.

- Python
Published by nextgenusfs over 9 years ago

funannotate - funannotate v0.3.10

explicitly run rmblast/ncbi engine for RepeatMasker to avoid problems if user has default setup as something else, i.e. DFAM. Note you still need to install RepBase Libraries, e.g.

```
wget --user name --password pass http://www.girinst.org/server/RepBase/protected/repeatmaskerlibraries/repeatmaskerlibraries-20150807.tar.gz
tar zxvf repeatmaskerlibraries-20150807.tar.gz -C #{HOMEBREW_PREFIX}/opt/repeatmasker/libexec

cd #{HOMEBREW_PREFIX}/opt/repeatmasker/libexec
./configure <config.txt
```

- Python
Published by nextgenusfs over 9 years ago

funannotate - funannotate v0.3.9

bug fix to funannotate compare that was not pulling orthology groups for the final annotation table
added transcription factors to output of all annotation table.

- Python
Published by nextgenusfs over 9 years ago

funannotate - funannotate v0.3.8

bug fix in funannotate compare that was calculating MEROPS summary stats incorrectly
added --minlen option to funannotate sort to discard short contigs
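
The effect of a minimum-length cutoff such as --minlen can be sketched as below. This is only an illustration: the function name is hypothetical, and both the inclusive threshold and the longest-first ordering are assumptions, not documented behavior of funannotate sort.

```python
def filter_by_length(records, minlen):
    """Drop sequences shorter than minlen and return the rest longest first.

    records: list of (name, sequence) tuples.
    """
    kept = [(name, seq) for name, seq in records if len(seq) >= minlen]
    return sorted(kept, key=lambda rec: len(rec[1]), reverse=True)


contigs = [("a", "ACGT"), ("b", "AC"), ("c", "ACGTACGT")]
print(filter_by_length(contigs, 3))  # [('c', 'ACGTACGT'), ('a', 'ACGT')]
```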

- Python
Published by nextgenusfs over 9 years ago

funannotate - funannotate v0.3.7

move install to the funannotate wrapper, funannotate setup
fix bug with custom input folders in funannotate annotate
output proteins/transcript files for both funannotate predict and funannotate annotate

- Python
Published by nextgenusfs almost 10 years ago

funannotate - funannotate v0.3.6

bug fix in funannotate predict when using BUSCO the EVM input was pulling entire gene models instead of sliced models
bug fix where BUSCO models were within 100 bp of the start or end of contig resulting in a bedtools range slicing error
remove slicing of the hints file for the parallel AUGUSTUS method, as splitting the hints file was slow; it is faster to just pass the entire hints file to each contig chunk and let AUGUSTUS filter it.

- Python
Published by nextgenusfs almost 10 years ago

funannotate - funannotate v0.3.5

update to the way that funannotate predict parses maker2 results, now using maker models directly as opposed to pulling out annotation from each predictor.
bug fix if running funannotate compare with a single species

- Python
Published by nextgenusfs almost 10 years ago

funannotate - funannotate v0.3.4

fix to the braker1 method where augustus output was not properly found
minor update to --optimize_augustus training to align with method used in braker1

- Python
Published by nextgenusfs almost 10 years ago

funannotate - funannotate v0.3.3

fix issue with parallel augustus where very large scaffolds would cause large memory usage, script now chunks the data into 500 kb sections with 10 kb overlaps on each side, runs in parallel, and then combines the results.
re-ordered transcript evidence in funannotate predict to address providing hints to augustus
some minor bug fixes
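
The chunking scheme described above (500 kb sections with 10 kb overlaps on each side) can be sketched as a coordinate calculation. How funannotate pads and later merges the chunks is not stated in these notes, so the padding logic and the name chunk_windows are assumptions.

```python
def chunk_windows(length, chunk=500_000, overlap=10_000):
    """Return 0-based half-open (start, end) windows covering a scaffold.

    Each core section of `chunk` bases is padded by `overlap` bases on both
    sides, clipped to the scaffold boundaries.
    """
    windows = []
    for core_start in range(0, length, chunk):
        core_end = min(core_start + chunk, length)
        start = max(0, core_start - overlap)
        end = min(length, core_end + overlap)
        windows.append((start, end))
    return windows


print(chunk_windows(1_200_000))
# [(0, 510000), (490000, 1010000), (990000, 1200000)]
```

Each window can then be processed independently, with the overlaps giving gene models that span a boundary a chance to be predicted intact in at least one chunk.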

- Python
Published by nextgenusfs almost 10 years ago

funannotate - funannotate v0.3.2

build a check for augustus version and test if it will function with busco and braker1
revamped busco mediated training of augustus to run busco quickly, filter evidence data corresponding to busco models, filter genemark-ES data, run evidence modeler to get high quality gene sets, filter EVM output with busco to build a final augustus training dataset, and finally train augustus
improved system info reporting
due to problems with installing augustus on different operating systems, augustus is not installed by default via brew install funannotate. However, running funannotate predict without a version of augustus installed will give you some hints on how to install it for your system.

- Python
Published by nextgenusfs almost 10 years ago

funannotate - funannotate v0.3.1

important bug fix for augustus: previous versions were running augustus with the stop codon inside the prediction, which caused the gene models to fail validation in evidence modeler, thus this update is recommended for all users
added high quality augustus models to be pulled out of annotation if they are represented by evidence using the --hintsfile, these models are passed to EVM with additional weight
fixed genemark bug where a single contig resulted in an error, thus funannotate predict can now handle a single contig as input correctly.

- Python
Published by nextgenusfs almost 10 years ago

funannotate - funannotate v0.3.0

improvements to the gene model filtering in funannotate predict, ability to keep gene models without proper stop codons if desired, --keep_no_stops
augustus is now multi-threaded
upgrade of packaged BUSCO to v1.2, slightly faster runtime and simplified code
non-fungal options are now included in funannotate; however, use with non-fungal genomes has not been extensively tested. Options of note are --organism, --busco_db, --eggnog_db
secondary metabolism enzymes added to funannotate compare if genomes were annotated with antiSMASH

- Python
Published by nextgenusfs almost 10 years ago