mikado - v2.3.4

Fix #426: Allow codon usage in Mikado prepare (#433) Fix #431: Pin marshmallow dependency versions (#433)

Full Changelog: https://github.com/EI-CoreBioinformatics/mikado/compare/v.2.3.3...v2.3.4

- Python
Published by gemygk about 4 years ago

Bug fix release

Fixes an issue when all inputs are tagged reference and padding is activated.

Fixes #422

What's Changed

HSP uniqueness could fail in rare cases by @ljyanesm in https://github.com/EI-CoreBioinformatics/mikado/pull/423

Full Changelog: https://github.com/EI-CoreBioinformatics/mikado/compare/v2.3.2...v.2.3.3

- Python
Published by ljyanesm over 4 years ago

mikado - Version 2.3.2

Fix issue processing empty blast tsv files.

- Python
Published by ljyanesm almost 5 years ago

mikado - Version 2.3.1

This release improves the performance of Mikado for dense loci when processing long read data. Also, makes Mikado compatible with SQLAlchemy 1.4.0.

Bugs fixed: - #415 - #413 - #412

- Python
Published by ljyanesm almost 5 years ago

Version 2.3.0

Fix #404: Error in extractpromoterregions.py helper script

Fix #405: Error in remove_utrs.py helper script

Also fixes a name clashing issue in daijin's configuration object and other small bugs

- Python
Published by ljyanesm about 5 years ago

mikado - Version 2.2.5

Fix for class2 in daijin pipeline

Fix #401: Cannot run Class2 in Daijin pipeline

- Python
Published by ljyanesm about 5 years ago

mikado -

Fix the Github Action tests, moving to use only Python 3.7+ (due to AsyncIO), and updated the documentation.

Fix #385: clarified and corrected the tutorial for Daijin. Fix #389: Mikado now can theoretically output BED12 files from Mikado prepare. This is still Fix #395: corrected an issue in mikado prepare that left models with incorrect proteins in the input annotations. This caused mikado pick to crash during the padding procedure. Fix #396: corrected and clarified the errors related to missing configuration files (e.g. incorrect scoring files being provided). Fix #397: corrected a small bug in mikado prepare, when providing the input files through the CLI rather than through the configuration file.

- Python
Published by lucventurini about 5 years ago

mikado - v2.2.3: OSX

Testing Mikado also on OSX, and adding OSX to the supported OSes in Conda.

Fix #392: Mikado was having trouble with pipes in the sequence IDs (either present in the first place or added by NCBI+ when using makeblastdb -parse_seqids).

- Python
Published by lucventurini about 5 years ago

mikado - Version 2.2.2

Using TMPDIR by default when creating/reading sqlite databases, this is faster than running directly on NFS Added an option to use /dev/shm for mikado serialise and compare

- Python
Published by ljyanesm over 5 years ago

mikado - SQLA patch

Pinning sqlalchemy to <1.4.0 until sqlalchemy_utils is updated (see #388).

Moreover, solved a small bug in prepare: setting prepare.exclude_redundant to True in the configuration file had no effect. Now it is equivalent to use the -er switch on the CLI (h/t @swarbred).

- Python
Published by lucventurini over 5 years ago

mikado - Version v2.2.0

Version 2.2.0

Removed Cython from the requirements.txt file. This allows to perform the tests correctly in a Conda environment (as Conda disallows installing Cython as part of a distributed package). As a result of this change, the preferred installation procedure from source has to be slightly amended: - either install using pip wheel -w dist . && pip install dist/Mikado*whl - or install with python setup.py bdist_wheel after having forcibly installed Cython, with pip install Cython or the like.

Other changes: - Fix #381: now Mikado will be able to guess correctly the input file format, instead of relying on the file name extension or user's settings. Sniffing for files provided as a stream is disabled though. - Fix #382: now Mikado can accept generic BED12 files as input junctions, not just Portcullis junctions. This allows e.g. a user to provide a set of gene models in BED12 format as sources of valid junctions. - Fix #384: now Mikado convert deals properly with unsorted GTFs/GFFs. - Fix #386: dealing better with unsorted GFFs/GTFs for the stats utility. - Fix #387: now Mikado will always use a static seed, rather than generating a new one per call unless specifically instructed to do so. The old behaviour can still be replicated by either setting the seed parameter to null (ie None) in the configuration file, or by specifying --random-seed during the command invocation. - General increase in code unit-test coverage; in particular:
- Slightly increased the unit-test coverage for the locus classes, e.g. properly covering the as_dict and load_dict methods. Minor bugfixes related to the introduction of these unit-tests. - Mikado.parsers.to_gff has been renamed to Mikado.parsers.parser_factory. - The code related to the transcript padding has been moved to the submodule Mikado.transcripts.pad, rather than being part of the Mikado.loci.locus submodule. - Mikado will error informatively if the scoring configuration file is malformed.

- Python
Published by lucventurini over 5 years ago

mikado - Patch release

Hotfix release: - IMPORTANT Mikado now uses correctly the scores associated to a given source. - IMPORTANT Mikado was not forwarding the original source to transcripts derived by chimera splitting. This compounded the issue above. - Corrected the issue that caused the issues above, ie transcripts where not dumping and reloading all relevant fields. Now implemented properly and tested with specific new routines. - Corrected an issue that caused Mikado to erroneously calculate twice the metrics and scores of loci, therefore reporting some wrong ones in the output files. - affected metrics where e.g. selected_cds_intron_fraction and combined_cds_intron_fraction. - Removed quicksect from the requirements.

- Python
Published by lucventurini over 5 years ago

mikado -

Bugfix and speed improvement release.

Fix a bug that prevented Mikado from reporting the correct metrics/scores in the output of loci files. This bug only affected reporting, not the results themselves. See issue 376
Fix a bug in printing out the statistics for an annotation file with mikado util stats (issue 378)
When doing serialising, Mikado now by default will drop and reload everything. The previous default behaviour results in hard-to-parse errors and is not what is usually desired anyway.
Improved the performance of pick in multiple ways (issue 375):
- now only external metrics that are requested in the scoring file will be printed out in the final metrics files. This reduces runtime in e.g. Minos. The new CLI switch --report-all-external-metrics (both in configure and pick) can be used to revert to the old behaviour.
- the external table in the Mikado database now is indexed properly, increasing speed.
- batch and compress the results before sending them through a queue (@ljyanesm)
- @brentp enhanced the bcbio intervaltree.pyx into quicksect. Copied this new version of interval tree and adapted it to Mikado.
- Using sqlalchemy bakeries for the SQLite queries, as well as LRU caches in various parts of Mikado.
- Removed excessive copying in multiple parts of the program, especially regarding the configuration objects and during padding.
- Using operator.attrgetter instead of a custom (and slower) recursive getattr function.
Removed unsafe calls to tempfile.mktemp and the like, for increased security according to CodeQL.

- Python
Published by lucventurini over 5 years ago

mikado - 2.0.2

Bugfix release.

Fix infinite recursion bug when trying to recover lost transcripts
Fix performance regression by passing the configuration to Excluded locus objects.

- Python
Published by lucventurini over 5 years ago

mikado - Marshmallow mate

Fixed a bug that caused Mikado configure (but not daijin configure, or "mikado configure --daijin") to print out invalid configuration files.
Restored the functionality of "--full" - now Mikado can print out both partial (but still valid) or fully-fledged configuration files.
Ported also the scoring configuration to MarshMallow dataclass. As a direct results, removed from the dependencies jsonschema.
Configured bumpversion
Corrected a small bug in parsing EnsEMBL GFF3
Cured some deprecation warning messages from marshmallow and numpy
Small bug fix in the CLIs of mikado/daijin configure.
Default value of the seed is now 0 (ie: undefined, a random one will be selected). Only integers are allowed values.
Small bugfixes/extensions in the test suite.
Minor code reorganisation, without changes to the API.

- Python
Published by lucventurini over 5 years ago

mikado - Mikado version 2

Official second release of Mikado. All users are advised to update as soon as possible.

See https://github.com/EI-CoreBioinformatics/mikado/milestone/22?closed=1 for a non-comprehensive list of all the issues closed in relation to this release.

- Python
Published by lucventurini over 5 years ago

mikado - Mikado 2, public release candidate 2

Minor amendments to 2.0rc1 - in order to get Mikado to install properly in BioConda.

- Python
Published by lucventurini about 6 years ago

mikado - Mikado 2, public release candidate 1

This version of Mikado is finally ready to go into Conda, DockerHub, PyPI and Singularity Hub. Many thanks to @ljyanesm, thanks to whom Mikado has become much more performant.

Most notable changes: - Mikado serialise will now accept tabular BLAST files (with the extra columns ppos and btop). Both XML and TSV loading have parts written in Cython. Thank you to @srividya22 for first asking about improvements in this sense. #280 - Mikado prepare now will remove redundancies based on intron chains, not perfect to-the-base identity. This should massively reduce the input data. The redundancy filter can be controlled per-source: ie, Mikado is able to keep all transcripts from certain input files (reference annotations, ab initio predictions, transcript assemblies, etc) while removing any redundant transcript from others (long-read alignments). Thanks to @lijing28101. #270 - Mikado prepare now will try to split transcripts with very long introns, rather than outright discard them. - Mikado pick will now operate in stringent mode by default (ie: only split transcripts when there is strong evidence of them being chimeras, as per the BLAST data). - Mikado now uses TOML as default configuration language, as it is much more human-readable than either YAML or JSON (#239). - Various bugfixes.

- Python
Published by lucventurini about 6 years ago

mikado - Version 2.0, release candidate 6

#216: now mikado prepare will explicitly tell users to use the mikado_prepared.fasta for the serialise step. Moreover, mikado serialise will informatively crash if users try to do something different (a common mistake seems to be to use a FASTA file derived directly from the input assemblies).
#220: Fixed a bug in mikado serialise
#222: now daijin will make prodigal or TransDecoder use alternative genetic codes, upon request. IMPORTANT: TransDecoder does not support all of the known genetic codes listed by NCBI.
#223: fixed the start-adjustment method in the ORF module.
#226: mikado compare, mikado util stats and mikado util grep are now compatible with non-standard NCBI GFF3 files (having e.g. pseudogene features without any associated transcript but associated exons, or rRNA transcript features without any parent gene)
#227: now mikado compare will always consider valid transcripts, even if they are multiexonic yet missing a defined strand orientation.
#229:
- mikado pick will now:
- report the padding as INFO, not as WARNING
- report on finishing the analysis of a chromosome, not the parsing
- report the temporary analysis directory
- provide --max-intron-length as a command line option
- fixed a small bug in mikado serialise
- fixed a bug in the ORF module that caused a crash when the sequence was not completely uppercase
#230: fixed some bugs related to the daijin conda environments and to updates to the snakemake code upstream.
Fix a small bug in reference_gene.py and transcript.py, related to sys.intern
#232: typo in the help for mikado serialise.

- Python
Published by lucventurini over 6 years ago

mikado - Version 2.0, release candidate 5

Switched from ujson to rapidjson (actively maintained and as performant)
Fix #209: daijin has been debugged and it is now properly tested. Also, when using daijin mikado, the number of XMLs will be equal or greater than the number of requested threads.
#177: mikado serialise is now completely parallelised. This allows for very significant speed-ups, especially when loading a large number of ORFs.
Speedups for mikado pick: now the GTF will be parsed much more quickly, by avoiding to create a full GTFline object for each line during the parsing (which was extra-slow).
daijin can now optionally use conda environments, using the conda directive of snakemake.
Speedup in mikado pick: now everything is written to databases (#218). This allows for cleaner temporary directories and parsing of the partial outputs.
mikado pick now will not, by default, print out the subloci file.
Speed up in mikado pick: now using a lightweight graph also for the splicing.
Amend #134 - now the minimum CDS overlap is 50%, not 75%.
Fixed a bug for mikado compare in multiprocessing mode
Fixed a bug in mikado configure - the scoring file will not be embedded within the printed file (otherwise it will be impossible to change it dynamically).

- Python
Published by lucventurini over 6 years ago

mikado - Version 2.0, release candidate 4

Users are advised to update as soon as possible. This release fixes a bug that had removed chimera splitting capabilities from Mikado since version 1.2.

In this RC: - solved an extremely serious bug which caused Mikado not to perform the chimera splitting during pick. The behaviour is now properly tested to avoid regressions. - Removed serious bottlenecks in the creation of splicing graphs - now the algorithm is less than quadratic. This should make Mikado more amenable to denser inputs. - #206: now mikado serialise will crash informatively when trying to add transcript ORFs from transcripts that are not present in the mikado_prepared.fasta file. This should prevent a common user error. - Solved #207: improved performance of Mikado - Mikado prepare will correctly keep the CDS of transcripts - Mikado pick will not overload the ORF (or coding/non-coding status) of a transcript if it is marked as reference - redundant class codes (=, _ and n) are now valid splice codes for the alternative splicing stage in pick. This is to allow mikado pick to include e.g. the transcripts from which an ab initio prediction was derived.

General improvements - Now the superlocus class has been revamped a bit: - made the definition of transcript graphs in superloci a O(nlogn) rather than O(n*2) algorithm. - removed the third method to reduce complex loci - rewrote for speed the method one to reduce complex loci - now both reduction methods will consider whether a transcript is reference - The definition of alternative splicing events has also been moved into a O(nlogn) algorithm. - Mikado pick was not leveraging correctly the multiple processors. This was due to the fact that the main process was taking up the job of checking transcripts and creating loci - expensive operations that acted as bottlenecks. Now the main process will only collate transcripts as GTF rows, do a minimal check on the fact that they do not have introns longer that the maximum size, and then and only then dispatch them.

- Python
Published by lucventurini almost 7 years ago

mikado - Version 2.0, release candidate 3

Fixed #203 and #205.

- Python
Published by lucventurini almost 7 years ago

mikado - Version 2.0, release candidate 2

Solved #196, #197 and #198. Inching towards 2.0.

- Python
Published by lucventurini almost 7 years ago

mikado - Version 2.0, release candidate 1

Same as the previous pre-release, differences: - decided to switch to v2 rather than 1.5, due to too many incompatibilities with version 1 from over a year ago - Fixed #194

=====

Please see the CHANGELOG file for details.

Major notes: - this release fixes a bug (#139) whereupon cDNAs completely or partially in letters different from ATGCNn (eg. lowercase, ie soft-masked nucleotides) would not have been reversed-complemented correctly. Therefore, any run on soft-masked genomes with prior releases would be invalid. - this release changes the format of the mikado database. As such, old mikado databases have to be regenerated with Mikado serialise in order for the run not to fail. - this release has completely overhauled the scoring files. We now provide only two ("plant.yaml" and "mammalian.yaml"). "Plant.yaml" should function also for insect or fungal species, but we have not tested it extensively. Old scoring files can be found under "HISTORIC". - this release completes the "padding" functionality. Briefly, if instructed to do so, now Mikado will be able to uniform the ends of transcripts within a single locus (similar to what was done for the last Arabidopsis thaliana annotation release). The behaviour is controlled by the "pad" boolean switch, and by the "tsmaxsplices" and "tsdistance" parameters under "pick". Please note that now "tsdistance" refers to the transcriptomic distance, ie, long introns are not considered for this purpose. Moreover, padding is now enabled by default. - general improvements in speed and multiprocessing, as well as flexibility, for the Mikado compare utility.

With this release, we are also officially dropping support for Python 3.4. Python 3.5 will not be automatically tested for, as many Conda dependencies are not up-to-date, complicating the TRAVIS setup.

- Python
Published by lucventurini almost 7 years ago

mikado - Version 1.5, release candidate

Please see the CHANGELOG file for details.

Major notes: - this release fixes a bug (#139) whereupon cDNAs completely or partially in letters different from ATGCNn (eg. lowercase, ie soft-masked nucleotides) would not have been reversed-complemented correctly. Therefore, any run on soft-masked genomes with prior releases would be invalid. - this release changes the format of the mikado database. As such, old mikado databases have to be regenerated with Mikado serialise in order for the run not to fail. - this release has completely overhauled the scoring files. We now provide only two ("plant.yaml" and "mammalian.yaml"). "Plant.yaml" should function also for insect or fungal species, but we have not tested it extensively. Old scoring files can be found under "HISTORIC". - this release completes the "padding" functionality. Briefly, if instructed to do so, now Mikado will be able to uniform the ends of transcripts within a single locus (similar to what was done for the last Arabidopsis thaliana annotation release). The behaviour is controlled by the "pad" boolean switch, and by the "tsmaxsplices" and "tsdistance" parameters under "pick". Please note that now "tsdistance" refers to the transcriptomic distance, ie, long introns are not considered for this purpose. Moreover, padding is now enabled by default. - general improvements in speed and multiprocessing, as well as flexibility, for the Mikado compare utility.

With this release, we are also officially dropping support for Python 3.4. Python 3.5 will not be automatically tested for, as many Conda dependencies are not up-to-date, complicating the TRAVIS setup.

- Python
Published by lucventurini almost 7 years ago

mikado -

[Beta, the finalised version will be released soon]

One of the major highlights of this release is the completion of the "padding" functionality. Briefly, if instructed to do so, now Mikado will be able to uniform the ends of transcripts within a single locus (similar to what was done for the last Arabidopsis thaliana annotation release). The behaviour is controlled by the "pad" boolean switch, and by the "tsmaxsplices" and "ts_distance" parameters under "pick".

Bugfixes and improvements:

Fixed a bug which caused some loci to crash at the last part of the picking stage
Now coding and non-coding transcripts will be in different loci.
Mikado prepare now can accept models that lack any exon features but still have valid CDS/UTR features
Fixed #34: now Mikado can specify a valid codon table among those provided by NCBI through BioPython. The default is "0", ie the Standard table but with only the canonical "ATG" being accepted as valid start codon.
Fixed #123: now addtranscriptto_feature.gtf automatically splits chimeric transcripts and corrects mistakes related the intron size.
Fixed #126: now reversing the strand of a model will cause its CDS to be stripped.
Fixed #127: previously, Mikado prepare only considered cDNA coordinates when determining the redundancy of two models. In some edge cases, two models could be identical but have a different ORF called. Now Mikado will also consider the CDS before deciding whether to discard a model as redundant.
#129: Mikado is now capable of correctly padding the transcripts so to uniform their ends in a single locus. This will also have the effect of trying to enlarge the ORF of a transcript if it is truncated to begin with.
#130: it is now possible to specify a different metric inside the "filter" section of scoring.
#131: in rare instances, Mikado could have missed loci if they were lost between the sublocus and monosublocus stages. Now Mikado implements a basic backtracking recursive algorithm that should ensure no locus is missed.
#132: Mikado will now evaluate the CDS of transcripts during Mikado prepare.

- Python
Published by lucventurini over 7 years ago

mikado - BED12 galore

Enhancement release. Following version 1.2.3, now Mikado can accept BED12 files as input for convert, compare and stats (see #122). This is becoming necessary as many long-reads alignment tools are preferentially outputting (or can be easily converted to) this format.

- Python
Published by lucventurini almost 8 years ago

mikado - BugFix and BED12

Mainly this is a bug fix release. It has a key advancement though, as now Mikado can accept BED12 files as input assemblies. This makes it compatible with Minimap2 PAF > BED12 system.

- Python
Published by lucventurini almost 8 years ago

mikado - BugFix for 1.2

Minor bugfixes: - Now Daijin should handle correctly the lack of DRMAA - Now Dajin should treat correctly single-end short reads

- Python
Published by lucventurini about 8 years ago

mikado -

Highlights for this version:

The version of the algorithm for retained introns introduced in 1.1 was too stringent compared to previous versions. The code has been updated so that the new version of Mikado will produce results comparable to those of versions 1 and earlier. ALL MIKADO USERS ARE ADVISED TO UPDATE THE SOFTWARE.
Daijin now supports Scallop.
Now Mikado will print out also the alias in the final picking tables, to simplify lookup of final Mikado models with their original assembly (previously, the table for the .loci only contained the Mikado ID).
Various changes on the BED12 internal representation. Now Mikado can also convert a genomic BED12 into a transcriptomic BED12.
Updated the documentation, including a tutorial on how to create scoring files, and how to adapt Daijin to different user cases.
Now finalised transcripts will always contain a dictionary containing the phases of the various CDS exons.
Mikado prepare now will always reverse the strand for mixed-splicing events.
Added unit-tests to keep in check the regression in calling retained introns, and for the new BED12 features.
Minor bugfixes.

- Python
Published by lucventurini about 8 years ago

mikado -

Skip to 1.2.1, see Changelog there

- Python
Published by lucventurini about 8 years ago

mikado - PyPI 1.1.1

Added LICENSE.txt to the setup.py.

- Python
Published by lucventurini over 8 years ago

mikado - PyPI 1.0.2

Updated the release to contain the LICENSE.txt file.

- Python
Published by lucventurini over 8 years ago

mikado - Prodigal

Highlights for this release are the swithing by default to Prodigal in lieu of TransDecoder and to DIAMOND instead of NCBI BLASTX. The rationale behind the change is that the two former default programs scale poorly with the size of datasets, as neither was designed to maintain a good execution speed with potentially million sequences. Prodigal and DIAMOND fare much better with big datasets, and do speed up significantly the execution of the whole Daijin pipeline.

Changes in this release:

Mikado is now compatible with NetworkX v. 2x.
Mikado now accepts ORFs calculated by Prodigal, in GFF3 format, instead of only those by TransDecoder in BED format.
Mikado compare indices now are SQLite3 databases, not compressed JSON files as in previous versions. This should allows for a faster loading and potentially, down the line, the chance to parallelise compare.
By default, Daijin now uses Prodigal and DIAMOND instead of TransDecoder and BLAST. This should lead to massive speed-ups during the pipeline, although at the cost of slightly reduced accuracy.
Improved the algorithm for finding retained introns, using a graph structure instead of potentially checking against every other transcript in the locus.
Mikado configure now has a new flag, "--daijin", which instructs the program to create a Daijin-compatible configuration file, rather than a Mikado-only one.
Fixed some bugs in Daijin regarding the handling of Long Reads.
Fixed a bug in Daijin regarding the calculation of Trinity parameters - previously, Daijin could potentially ask Trinity for parameters for N times, where N is the number of required assemblies, lengthening the startup time.
Solved a bug that created incompatibility with BioPython >= 1.69
Solved some bugs that prevented Daijin from functioning correctly on a local computer
Now Daijin by default recovers the information to load external software from an external configuration file. This allows for having a standard configuration file for external programs, without having to modify the main configuration all the time.
Now Daijin creates and/or load a standard configuration file for the cluster, "daijin_hpc.yaml".

- Python
Published by lucventurini over 8 years ago

mikado - BugFix

BugFix release.

Fixed a bug which caused Mikado to go out of memory with very dense loci, when calculating the AS events.
Fixed a bug which caused the log not to be saved correctly during the indexing for Mikado compare.
Fixed a bug which caused Mikado pick to crash at the end, on rare cases.
Data to be transmitted to picking process children is now stored in a temporary SQLITE3 database, to lessen memory usage and queue hangups.
Fixed a bug while that caused a crash while Mikado was trying to approximate complex loci.
Switched away from using clique-based algorithms, as they tend to be very memory intensive.

- Python
Published by lucventurini about 9 years ago

mikado - Version 1

Version 1.0

Changes in this release:

MAJOR: solved a bug which caused a failure of clustering into loci in rare occasions. Through the graph clustering, now Mikado is guaranteed to group monoloci correctly.
MAJOR: When looking for fragments, now Mikado will consider transcripts without a strand as being on the opposite strand of neighbouring transcripts. This prevents many monoexonic, non-coding fragments from being retained in the final output.
MAJOR: now Mikado serialise also stores the frame information of transcripts. Hits on the opposite strand will be ignored. This requires to regenerate all Mikado databases.
MAJOR: Added the final configuration files used for the article.
Added three new metrics, "blasttargetcoverage", "blastquerycoverage", "blast_identity"
Changed the default repertoire of valid AS events to J, j, G, h (removed C and g).
Bug fix: now Mikado will consider the cDNA/CDS overlap also for monoexonic transcripts, even when the "simpleoverlapformonoexonicloci" flag is set to true.
Solved some issues with the Daijin schemas, which prevented correct referencing.
Bug fix for finding retained introns - Mikado was not accounting for cases where an exon started within an intron and crossed multiple subsequent junctions.
BF: Loci will never purge transcripts
After creating the final loci, now Mikado will check for, and remove, any AS event transcript which would cross into the AS event.

- Python
Published by lucventurini over 9 years ago

mikado - IWGSC1

Changes in this release: - MAJOR: re-written the clustering algorithm for the MonosublocusHolder stage. Now a holder will accept another monosublocus if: - the cDNA and CDS overlap is over a user-specified threshold OR OR - there is some intronic overlap OR - one intron of either transcript is completely contained within an exon of the other. OR - at least one of the transcripts is monoexonic and there is some overlap of any kind. This behaviour (which was the default until this release) can be switched off through pick/clustering/simpleoverlapformonoexonic (default true). - MAJOR: changed slightly the anatomy of the configuration files. Now "pick" has two new subsections, "clustering" and "fragments". - Clustering: dedicated to how to cluster the transcripts in the different steps. Currently it contains the keys: - "flank" - "mincdnaoverlap" and "mincdsoverlap" (for the second clustering during the monosublocusHolder phase) - "cdsonly": to indicate whether we should only consider the CDS for clustering after the initial merging in the Superlocus. - "simpleoverlapformonoexonic": to switch on/off the old default behaviour with monoexonic transcripts - "purge": whether to completely exclude failed loci, previously under "runoptions" - Fragments: dedicated to how to identify and treat putative fragments. Currently it contains the keys: - "remove": whether to exclude fragments, previously under "runoptions" - "validclasscodes": which class codes constitute a fragment match. Only class codes in the "Intronic", "Overlap" (inclusive of _) and "Fragment" categories are allowed. - maxdistance: for non-overlapping fragments (ie p and P), maximum distance from the gene. - Solved a long-standing bug which caused Mikado compare to consider as fusion also hits. - Mikado compare now also provides the location of the matches in TMAP and REFMAP files. - Introduced a new utility, "classcodes", to print out the information of the class codes. The definition of class codes is now contained in a subpackage of "scales". - The "metrics" utility now allows for interactive querying based on category or metric name. - The class code repertoire for putative fragments has been expanded, and made configurable through the "fragments" section. - When printing out putative fragments, now Mikado will indicate the class code of the fragment, the match against which it was deemed a fragment of, and the distance of said fragment (if they are not overlapping). - Deprecated the "discarddefinition" flag in Mikado serialise. Now Mikado will infer on its own whether to use the definition or the ID for serialising BLAST results. - Now AbstractLocus implementations have a private method to check the correctness of the jsonconf. As a corollary, Transcript and children have been moved to their own subpackage ("transcripts") in order to break the circular dependency Mikado.loci.Abstractlocus <- Mikado.configurator <- Mikado.loci.Transcript. _Technical note: checking the consinstency of the configuration is an expensive operation, so it will be executed on demand rather than automatically. - The methods to calculate scores and metrics have been moved to the AbstractLocus class, so to minimize the incidence of bugs due to code duplication and diversion. - Made the checks for the scoring files more robust. - Re-written the "findretainedintrons" method of AbstractLocus, to solve some bugs found during the utilisation of last beta. As a corollary, expanded the intervaltree module to allow searches for "tagged" intervals. - Now the "monolociout" files contain the MonosublocusHolder step, not the Monosublocus step. This should help during fine-tuning. - Minimal requirements for alternative splicing events are now specified with a syntax analogous to that of minimal requirements, and that for not considering a locus as a putative fragment, under the tag "asrequirements". - Fixed a bug which caused transcript requirements to be ignored if pick/clustering/purge was set to False. - Mikado now supports also Python3.6.

- Python
Published by lucventurini over 9 years ago

mikado - External scores

Changes in this release: - Mikado prepare, stats and compare are capable of using GFFs with "match/matchpart" or "cDNAmatch" features as input. This allows eg. to obtain sensible statistics for the alignments of Trinity transfrags or long reads when using GMAP inside Daijin. - When the option "sublocifromcdsonly" is set to true (or the equivalent command line switch is invoked), AS events class codes will be calculated on the coding portion only of the transcript. - Mikado now allows to perform unittest on the installed module, through the function "Mikado.test". This is simply a wrapper to NoseTest, through a call to the numpy tester.
- IMPORTANT: in strand-specific mode, now Mikado prepare will not flip transcripts which have been misassigned to the opposite strand. Those transcripts will be kept on the original strand, and tagged. - IMPORTANT: now Mikado will use all the verified introns found in a Superlocus to determine the fraction of verified introns per locus in each transcript. At the stage of Locus, ie creation of genes, this will revert to check only the verified introns in the locus. Also, solved a bug related to this metric. - IMPORTANT: at the stage of the creation of monosubloci-holders, now Mikado groups together also transcripts for which at least one intron is completely contained within the exon of another. This should solve spurious cases where we called multiple loci instead of one, while probably avoiding trans-splicing. - IMPORTANT: otionally now Mikado can perform the second clustering of transcripts based on simple overlap (either of the CDS or of all exonic features). The option can be invoked from the command line. - Two new metrics: - "suspicioussplicing" will allow to identify transcripts with mixedsplices, or which would have at least one canonical intron if moved to the opposite strand. - "onlynoncanonicalsplicing" will allow to identify transcripts whose splicing sites are all non-canonical. - It is now possible to give Mikado a tab-delimited file of pre-calculated metrics (which must be numeric), during serialise. The file should have the transcript ids in the first column and have a header as first line; this header must have "TID" as first field, and no repeated fields afterwards. External metrics can be specified in the scoring configuration using the syntax "external.{name of the score}". If an inexistent metric is asked for, Mikado will assign a default value of 0 to it. - It is now possible to use metrics with values between 0 and 1, inclusive directly as scoring, by specifying the parameter "useraw: True". This is available only for metrics which have been tagged as being "usable raw", or with externally provided metrics. The option is valid only when looking for the maximum or minimum value for a metric, not when looking for a target. If an incorrect configuration is specified, Mikado will crash. - Minimal requirements for alternative splicing events are now specified with a syntax analogous to that of minimal requirements, and that for not considering a locus as a putative fragment, under the tag "asrequirements". - Mikado prepare in "lenient" mode will keep also transcripts with a mixture of strands for the splicing junctions. - Mikado prepare can be asked to keep all transcripts, even if they are redundant. The new behaviour (disabled by default) is switched on by the boolean parameter "prepare/keepredundant". - Mikado pick can consider transcripts with CDS ending within a CDS intron as truncated due to a retained intron event. This potentially allows Mikado to detect retained introns even when only CDSs are provided. The behaviour is disabled by default, and can be switched on using the boolean configuration parameter "pick/runoptions/considertruncatedfor_retained". - Some bugs have been detected and solved thanks to the collaboration with Hugo Darras. - Many tests, including system ones, have been added to the test suite. Tests have been fixed for Python3.4 as well.

- Python
Published by lucventurini over 9 years ago

mikado - BugFix for Beta 7 (2)

Mainly fixes for Daijin, in order to be able to handle different versions of Trinity, GMAP and PortCullis.

- Python
Published by lucventurini over 9 years ago

mikado - BugFix for Beta 7

BugFix release: - Corrected a bug identified by Hugo Darras, whereby Mikado crashed when asked not to report alternative splicing events - Mikado compare now supports compressing the outputs - Bug fix for Mikado util stats - now it functions also when exonic features contain "derived_from" tags - Bug fix for bam2gtf.py

- Python
Published by lucventurini over 9 years ago

mikado - Storing the stacks

Changes: - Now we use ujson and SQLite to store out-of-memory the data loaded during the "prepare" step, massively reducing the memory needed for this step. - TransDecoder is now optional in the Daijin pipeline - Mikado can be instructed to pad transcripts at their ends, to make all transcripts at a locus compatible in terms of their ends (eventually extended the CDS, if possible). The protocol was inspired by the AtRTD2 release: http://biorxiv.org/content/early/2016/05/06/051938

- Python
Published by lucventurini over 9 years ago

mikado - Hope of final

Beta 6, Hopefully final release for the article. Highlights: - Compatibility with DIAMOND - Essential bugfix for handling correctly verified introns - Updated scoring files

- Python
Published by lucventurini over 9 years ago

mikado - Beta 5

First public release of Mikado.

Changelog: - Added a flank option in Daijin - Documentation ready, inclusive of a tutorial for Daijin itself - Bug fixes, especially to Daijin

- Python
Published by lucventurini almost 10 years ago

mikado - Beta 4

Changelog: - Daijin now has a proper schema and a proper configure command. Still to implement: schema check on starting Daijin itself - Documentation is now more complete (see eg sections on Compare and Configuration) - Now mo and O have been renamed to g and G respectively

- Python
Published by lucventurini almost 10 years ago

mikado - Beta 3

Changelog: - Solved a bug which prevented correct loading of ORFs in monoexonic transcripts with an ORF on the negative strand - Dagger is now daijin - Added controls in Daijin for: - Minimum length of TD ORFs - rerun from a particular target - Minor polishes to command line interfaces and the like.

- Python
Published by lucventurini almost 10 years ago

mikado - Beta 2

Changes: - Small bug fixes for DAGGER - Introduced a very basic configuration utility for DAGGER - Now Mikado programs have a sensible help message, including description.

- Python
Published by lucventurini almost 10 years ago

mikado - Beta 1

Finally almost ready! Major changes: - mikado is now completely integrated with DAGGER, h/t Daniel Mapleson - Both mikado and dagger are called as "mikado" and "dagger", without any trailing ".py" - We are now in feature freeze, with the exception of dagger.

- Python
Published by lucventurini almost 10 years ago

mikado - Blade sharpening

This release is mainly focused on two aims: integration with Dagger, and the possibility of performing a good self-comparison with mikado.py compare.

Detailed changes for the release: - scoring files are now inside a proper subfolder, and can be copied during configuration, for user modification - the configuration process has now been widely rehauled, together with the configuration files, to make Mikado compatible with DAGGER - Introduced a functioning deep copy method for gene objects - Now redundant adding of exons into a transcript object will not throw an exception but will be silently ignored - Mikado parsers now support GZIPped and BZIPped files, through python-magic - added the possibility of specifying which subset of input assemblies are strand-specific during the preparation step - configure and prepare now accept a tab-separated text file (format: , ) as input - compare now can perform two types of self-analysis: - "internal", which outputs only a tmap file and performs a comparison all-vs-all within each multi-isoform gene locus - "self", in which each transcript is analysed as it would during a normal run, but removing itself from the reference. - Now serialise can accept FASTA references, pyFaidx is used to recover the correct FAI - pick and configure can now specify the mode of action (nosplit, split, stringent, lenient, permissive) by CL instead of having to modify the configuration file - cleaned up the sample_data directory, plus now the configuration file is created on the spot by Snakemake (ensuring compatibility with the latest version)

- Python
Published by lucventurini almost 10 years ago

mikado - Stats bugfix

Changes for this micro release: - Bug fixes for some debug log messages - Bug fixes for the stats utility - added the flank option for mikado

- Python
Published by lucventurini about 10 years ago

mikado - Faster stats

Changes: - Stats is now much faster and lighter, as objects are discarded upon analysis, not kept around for the whole run duration - Pick now preferentially outputs only one ORF per transcript, behaviour regulated by pick/outputformat/reportall_orfs - BugFix for the detection of fragments during pick

- Python
Published by lucventurini about 10 years ago

mikado - Clap-o-meter

New features: - Added a converter for (GTF <=> GFF3) > BED12 conversions in mikado.py util convert - Now it is possible to provide a predefined malus/bonus to transcripts according to their sources using the following syntax in the configuration file:

pick: source_score: source1: score1 source2: score ...

It is generally not advised to use such an option unless the user really wants to emphasize or penalise a particular set of transcripts, eg. those coming from PacBio data.

- Python
Published by lucventurini about 10 years ago

mikado - FastXML

Transitioned the XML parser to the experimental Bio.SearchIO module, which uses internally C-level APIs and is therefore twice as fast than the canonical implementation. This required some rewriting at the class level in Mikado and a cloning of the original class to make it a proper iterative class. Moreover, now requirements for the scoring can accept also eg boolean values.

- Python
Published by lucventurini about 10 years ago

mikado - Trim galore

Major changes for this release: - Trim is now functioning - ORFs from transdecoder are treated correctly - even truncated and internal ones. No more faux UTRs! - Changes in the DB structure to accomodate the ORF information - note, this means that the databases have to be regenerated

- Python
Published by lucventurini about 10 years ago

mikado - Refmap stats

Now refmap files also include the F1 statistics. Small bug fixes in ResultStorer and Transcript.

- Python
Published by lucventurini about 10 years ago

mikado - Cythonize tree

Changes: - switched to a Cython-based implementation of intervaltree, from https://bitbucket.org/james_taylor/bx-python/wiki/HowToInstall - Python Versions under 3.5 will now use SortedDict from sortedcontainers rather than OrderedDict (as OrderedDict was much slower before its overhaul in 3.5) - Compare now outputs also the number of exons in the prediction/reference in the tmap - Compare now can take as input also GTF files without the transcript feature in (only GTF files, GFF files will - and always will - raise an error in case of such a shenanigan) - BF for retained introns - Now Transcript and Gene objects are passed the logger from the upper level in compare etc, leading to a massive speed-up - Added lenient mode to compare for exon coordinates (like in RGASP) - added a doc folder - now compare is robust to errors/corruption in the index

- Python
Published by lucventurini about 10 years ago

mikado - Bugfixing

Minor release to fix the discarding of CDS features in many human gene models (due to too-stringent-rules on how a good CDS looks like).

- Python
Published by lucventurini about 10 years ago

mikado - Cython me up

Changelog: - compare, overlap and calcf1 are now in the Cython rather than CPython domain, leading to a ~3 fold speed up for Mikado compare - fixed compilation issues due to the presence of an init file in the root directory - Now Transcript objects store tuples, not Interval objects, to define exons etc. This has led to reduced memory footprint and a sizeable speedup, especially during sorting operations - Expanded the Gene class, which now covers all the user cases also for util stat - Mikado compare index now stores the phases as well - Now the CDS phase is checked on the fly while finalizing the transcript, like in GenomeTools, but unlike in GT, it is permissible for a transcript to have a starting phase different from 0 - Now util stats is much more lightweight thanks to a new function to calculate quantiles from counters, reducing the necessary memory for large annotations (eg human) - Createmodel now is part of mikado.py util and can subset the input by proportion or fixed number.

- Python
Published by lucventurini about 10 years ago

mikado - Phasing in

Changelog: - Corrected the definition of enddistancefrom_junction - Now Mikado will store the phase information in a different way, allowing to have eg transcripts with a starting phase of 1, and/or transcript whose CDS length is not a multiple of 3 - as long as the phase is coherent. - Now Mikado raises a warning, not an exception, if a transcript has not the boundaries defined by exons. Necessary for compatibility with Augustus. - BF for prepare - a couple of flags weren't passed on properly from the CLI.