mikado

Mikado is a lightweight Python3 pipeline whose purpose is to facilitate the identification of expressed loci from RNA-Seq data and to select the best models in each locus.

https://github.com/ei-corebioinformatics/mikado

Science Score: 46.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
○ .zenodo.json file
✓ DOI references: Found 6 DOI reference(s) in README
✓ Academic publication links: Links to ncbi.nlm.nih.gov
✓ Committers with academic emails: 7 of 16 committers (43.8%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (12.9%) to scientific vocabulary

Last synced: 10 months ago

Repository


Basic Info

Host: GitHub
Owner: EI-CoreBioinformatics
License: lgpl-3.0
Language: Python
Default Branch: master
Homepage: https://mikado.readthedocs.io/en/stable/
Size: 123 MB

Statistics

Stars: 101
Watchers: 11
Forks: 18
Open Issues: 28
Releases: 86

Created almost 11 years ago · Last pushed over 1 year ago

Metadata Files

Readme Changelog License Authors

Mikado - pick your transcript: a pipeline to determine and select the best RNA-Seq prediction

Mikado is a lightweight Python3 pipeline to identify the most useful or “best” set of transcripts from multiple transcript assemblies. Our approach leverages transcript assemblies generated by multiple methods to define expressed loci, assign a representative transcript to each, and return a set of gene models that selects against transcripts that are chimeric, fragmented, or have a short or disrupted CDS. Loci are first defined based on overlap criteria, and each transcript therein is scored on up to 50 available metrics relating to ORF and cDNA size, relative position of the ORF within the transcript, UTR length and presence of multiple ORFs. Mikado can also use BLAST data to score transcripts based on protein similarity and to identify and split chimeric transcripts. Optionally, junction confidence data as provided by Portcullis can be used to improve the assessment. The best-scoring transcripts are selected as the primary transcripts of their respective gene loci; additionally, Mikado can bring back other valid splice variants that are compatible with the primary isoform.
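As a rough illustration of the idea described above (group overlapping transcripts into loci, score each transcript, keep the best one per locus), here is a minimal Python sketch. It is not Mikado's implementation: the `Transcript` class, the two metrics and the way they are combined are hypothetical, whereas Mikado draws on many more metrics and on user-specified criteria.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    """Hypothetical, heavily simplified transcript record."""
    tid: str
    start: int
    end: int
    cdna_length: int
    cds_length: int


def group_into_loci(transcripts):
    """Group transcripts whose intervals overlap (one chromosome, one strand)."""
    loci = []
    current, current_end = [], None
    for tr in sorted(transcripts, key=lambda t: (t.start, t.end)):
        if current and tr.start <= current_end:
            current.append(tr)
            current_end = max(current_end, tr.end)
        else:
            if current:
                loci.append(current)
            current, current_end = [tr], tr.end
    if current:
        loci.append(current)
    return loci


def score(tr, locus):
    """Toy score: cDNA length relative to the locus maximum plus coding fraction."""
    max_cdna = max(t.cdna_length for t in locus)
    return tr.cdna_length / max_cdna + tr.cds_length / tr.cdna_length


def pick_primary(transcripts):
    """Return the ID of the best-scoring transcript of each locus."""
    return [max(locus, key=lambda t: score(t, locus)).tid
            for locus in group_into_loci(transcripts)]


models = [
    Transcript("t1", 100, 900, 800, 600),
    Transcript("t2", 150, 700, 500, 150),
    Transcript("t3", 2000, 2600, 600, 450),
]
print(pick_primary(models))
```

Here `t1` and `t2` overlap and form one locus, while `t3` stands alone; the sketch keeps one transcript per locus.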

Mikado uses GTF or GFF files as mandatory input. Non-mandatory but highly recommended input data can be generated by obtaining a set of reliable splicing junctions with Portcullis, by locating coding ORFs on the transcripts using either Transdecoder or Prodigal, and by obtaining homology information through either BLASTX or DIAMOND.
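For readers unfamiliar with the GTF input format, the sketch below parses a single GTF line using only the Python standard library. The example record and the `parse_gtf_line` helper are illustrative assumptions and not part of Mikado's API.

```python
import re

# Hypothetical GTF record: nine tab-separated columns.
GTF_LINE = (
    'Chr1\tStringTie\texon\t1000\t1500\t.\t+\t.\t'
    'gene_id "gene1"; transcript_id "gene1.1"; exon_number "1";'
)

# GTF attributes are written as: key "value";
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)";')


def parse_gtf_line(line):
    """Split a GTF line into its fixed columns plus a dict of attributes."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, found {len(fields)}")
    chrom, source, feature, start, end, score, strand, phase, attributes = fields
    return {
        "chrom": chrom,
        "source": source,
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "attributes": dict(ATTRIBUTE.findall(attributes)),
    }


record = parse_gtf_line(GTF_LINE)
print(record["feature"], record["start"], record["end"],
      record["attributes"]["transcript_id"])
```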

Our approach can also incorporate sequences generated by de novo Illumina assemblers or reads generated by long-read technologies such as PacBio.

Extended documentation is hosted on ReadTheDocs: http://mikado.readthedocs.org/

Installation


Docker Installation

Mikado can be installed with Docker. If you don't have Docker, please install it first. Then you can pull the Docker image with Mikado installed and print the help message:

```console
VERSION=2.3.5rc2
docker run gemygk/mikado:v${VERSION} mikado -h
```

Singularity Installation

Mikado can be installed with Singularity. If you don't have Singularity, please install it first. Then you can pull the Singularity image with Mikado installed:

```console
VERSION=2.3.5rc2
singularity exec docker://gemygk/mikado:v${VERSION} mikado -h
```

Or, we can build and run a Singularity image:

```console
# 1. Create a Singularity definition file
$ cat mikado-2.3.5rc2.def
bootstrap: docker
from: gemygk/mikado:v2.3.5rc2

# 2. Build the Singularity image
$ sudo singularity build mikado-2.3.5rc2.sif mikado-2.3.5rc2.def

# 3. Execute Mikado
$ singularity exec mikado-2.3.5rc2.sif mikado -h
usage: Mikado [-h] [--version] {configure,prepare,serialise,pick,compare,util} ...

Mikado is a program to analyse RNA-Seq data and determine the best transcript
for each locus in accordance to user-specified criteria.

optional arguments:
  -h, --help  show this help message and exit
  --version   Print Mikado current version and exit.

Components:
  {configure,prepare,serialise,pick,compare,util}
              These are the various components of Mikado:
    configure This utility guides the user through the process of creating
              a configuration file for Mikado.
    prepare   Mikado prepare analyses an input GTF file and prepares it for
              the picking analysis by sorting its transcripts and performing
              some simple consistency checks.
    serialise Mikado serialise creates the database used by the pick program.
              It handles Junction and ORF BED12 files as well as BLAST XML
              results.
    pick      Mikado pick analyses a sorted GTF/GFF files in order to identify
              its loci and choose the best transcripts according to
              user-specified criteria. It is dependent on files produced by
              the "prepare" and "serialise" components.
    compare   Mikado compare produces a detailed comparison of reference and
              prediction files. It has been directly inspired by Cufflinks's
              cuffcompare and ParsEval.
    util      Miscellaneous utilities
```

Conda/Mamba/Manual Installation

Mikado can be installed with conda. If you don't have conda, please install mamba first. Then you can create a new environment with Mikado installed.

Install mamba with PyPy 3.9 in the base environment (https://github.com/conda-forge/miniforge?tab=readme-ov-file#miniforge-pypy3)

Replace /path/to with your installation directory when following the steps below:

```console
# /path/to/src
[src]$ wget -c https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge-pypy3-Linux-x86_64.sh
[src]$ bash Miniforge-pypy3-Linux-x86_64.sh
```

In this example, the base environment is installed to /path/to/x86_64/.

If you have chosen not to have conda modify your shell scripts at all, activate conda's base environment in your current shell session with:

```console
# /path/to/src
[src]$ eval "$(/path/to/x86_64/bin/conda shell.bash hook)"
```

Install Git

```console
# /path/to/src
(base) [src]$ mamba install -y git
```

Clone mikado

```console
# /path/to/src
(base) [src]$ git clone git@github.com:EI-CoreBioinformatics/mikado.git
(base) [src]$ cd mikado
```

Install Mikado dependencies

```console
# /path/to/src/mikado
(base) [mikado]$ mamba env create -f environment.yml --prefix /path/to/x86_64/envs/mikado_env
```

Activate mikado_env

```console
# /path/to/src/mikado
(base) [mikado]$ conda activate mikado_env
(mikado_env) [mikado]$
```

Check that all dependencies are installed. A full list of library dependencies can be found in the file requirements.txt.

```console
# /path/to/src/mikado
(mikado_env) [mikado]$ pip3 install wheel
(mikado_env) [mikado]$ pip3 install -r requirements.txt
(mikado_env) [mikado]$ pip3 install Cython
```

- For the above commands, wheel, requirements.txt and Cython should all report the status 'Requirement already satisfied'.

We need gcc for bdist_wheel (tested on gcc v5.2.0, v9.4.0)

```console
# /path/to/src/mikado
(mikado_env) [mikado]$ python3 setup.py bdist_wheel
(mikado_env) [mikado]$ pip3 install dist/*.whl
```

Now that installation is complete, run Mikado help:

```console
# /path/to/src/mikado
(mikado_env) [mikado]$ mikado -h
usage: Mikado [-h] [--version] {configure,prepare,serialise,pick,compare,util} ...

Mikado is a program to analyse RNA-Seq data and determine the best transcript
for each locus in accordance to user-specified criteria.

optional arguments:
  -h, --help  show this help message and exit
  --version   Print Mikado current version and exit.
```
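The dependency check from the installation steps can also be run from Python. The sketch below uses only the standard library's `importlib.metadata`. The `REQUIRED` mapping copies a few minimum versions from requirements.txt; the `check` and `as_tuple` helpers, including the deliberately naive version comparison, are assumptions for illustration and not a Mikado utility.

```python
from importlib import metadata

# A few minimum versions taken from requirements.txt; extend as needed.
REQUIRED = {
    "numpy": "1.17.2",
    "pandas": "1.0",
    "networkx": "2.3",
    "sqlalchemy": "1.4.0",
}


def as_tuple(version):
    """Naive parser: keep the leading numeric part of each dotted component."""
    parts = []
    for chunk in version.split("."):
        digits = ""
        for char in chunk:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check(required):
    """Return {package: status}; status is 'ok', 'too old (x)' or 'missing'."""
    report = {}
    for name, minimum in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = as_tuple(installed) >= as_tuple(minimum)
        report[name] = "ok" if ok else f"too old ({installed})"
    return report


if __name__ == "__main__":
    for package, status in check(REQUIRED).items():
        print(f"{package}: {status}")
```

Run inside the activated environment, every listed package should ideally report `ok`; pre-release suffixes such as `rc2` are ignored by this simple comparison.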

Additional dependencies

Mikado by itself requires only a database solution such as SQLite (MySQL and PostgreSQL are also supported). However, the Daijin pipeline requires additional programs to run.

For driving Mikado through Daijin, the following programs are required:

DIAMOND or Blast+ to provide protein homology. DIAMOND is preferred for its speed.
Prodigal or Transdecoder to calculate ORFs. The versions of Transdecoder that we tested scale poorly in terms of runtime and disk usage, depending on the size of the input dataset. Prodigal is much faster and lighter; however, the data in our paper were generated with Transdecoder, not Prodigal. Prodigal is currently the default.
Mikado also makes use of a dataset of high-quality RNA-Seq junctions. We use Portcullis to calculate these data alongside the alignments and assemblies.

If you plan to generate the alignment and assembly part as well through Daijin, the pipeline requires the following:

SAMTools
If you have short-read RNA-Seq data:
- At least one short-read RNA-Seq aligner, choice between GSNAP, GMAP, STAR, TopHat2, HISAT2
- At least one RNA-Seq assembler, choice between StringTie, Trinity, Cufflinks, CLASS2. Trinity additionally requires GMAP.
- Portcullis is optional, but highly recommended to retrieve high-quality junctions from the data
If you have long-read RNA-Seq data:
- At least one long-read RNA-Seq aligner, current choice between STAR and GMAP
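Before launching Daijin, it can be handy to see which of the external programs listed above are visible on `PATH`. The following sketch uses `shutil.which` from the Python standard library. The executable names are assumptions based on each tool's usual command name and may differ on your system; `find_tools` is a hypothetical helper, not part of Mikado or Daijin.

```python
import shutil

# Assumed executable names; adjust to match your installation.
TOOLS = {
    "homology": ["diamond", "blastx"],
    "orf_calling": ["prodigal", "TransDecoder.LongOrfs"],
    "junctions": ["portcullis"],
    "alignment_utils": ["samtools"],
}


def find_tools(tools):
    """Map each category to the executables from `tools` that are on PATH."""
    return {
        category: [name for name in names if shutil.which(name)]
        for category, names in tools.items()
    }


if __name__ == "__main__":
    for category, found in find_tools(TOOLS).items():
        status = ", ".join(found) if found else "none found"
        print(f"{category}: {status}")
```

Categories with alternatives (for example DIAMOND or Blast+) only need one entry to be found.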

Development guide

We provide Sourcetrail files (https://www.sourcetrail.com/) to aid in development. As required by the Sourcetrail application, these files are present in the master directory, as "Mikado.srctrl*".

Citing Mikado

If you use Mikado in your work, please consider citing:

Venturini L., Caim S., Kaithakottil G., Mapleson D.L., Swarbreck D. Leveraging multiple transcriptome assembly methods for improved gene structure annotation. GigaScience, Volume 7, Issue 8, 1 August 2018, giy093, doi:10.1093/gigascience/giy093

If you also use Portcullis to provide reliable junctions to Mikado, either independently or as part of the Daijin pipeline, please consider citing:

Mapleson D.L., Venturini L., Kaithakottil G., Swarbreck D. Efficient and accurate detection of splice junctions from RNAseq with Portcullis. GigaScience, Volume 7, Issue 12, 12 December 2018, giy131, doi:10.1093/gigascience/giy131

Owner

Name: EI-CoreBioinformatics
Login: EI-CoreBioinformatics
Kind: organization

Repositories: 22
Profile: https://github.com/EI-CoreBioinformatics

GitHub Events

Total

Issues event: 11
Watch event: 5
Issue comment event: 9

Last Year

Issues event: 11
Watch event: 5
Issue comment event: 9

Committers

Last synced: over 2 years ago

All Time

Total Commits: 2,331
Total Committers: 16
Avg Commits per committer: 145.688
Development Distribution Score (DDS): 0.384

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Luca Venturini	L**i@t**k	1,436
Luca Venturini	l**i@g**m	363
Luca Venturini	l**i@t**k	325
Luca Venturini	l**i@e**k	58
ljyanesm	y**s@g**m	41
Luca Venturini	l**i@n**k	32
Luca Venturini	l**i@g**k	27
Daniel Mapleson	d**n@t**k	15
Luca Venturini (TGAC)	v**l@v**r	10
lucventurini	G****s	7
Daniel Mapleson (EI)	m**d@v**r	6
Luis Yanes	l**s@e**k	4
Christian Schudoma	c**a@e**k	3
Gemy Kaithakottil	g**k@g**m	2
Daniel Mapleson (EI)	m**d@v**r	1
Luca Venturini	l**i@g**k	1
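The all-time summary figures can be recomputed from the commit counts in the table. The sketch below does so, assuming that the Development Distribution Score is defined as one minus the share of commits made by the single most active committer entry. That definition is not stated on this page, but it reproduces the reported value. Note that each email address is counted as a separate committer.

```python
# Commit counts per committer entry, copied from the "Top Committers" table.
commits = [1436, 363, 325, 58, 41, 32, 27, 15, 10, 7, 6, 4, 3, 2, 1, 1]

total = sum(commits)
average = total / len(commits)
# Assumed definition: 1 - (commits by the top committer / total commits).
dds = 1 - max(commits) / total

print(total, average, round(dds, 3))
```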

Committer Domains (Top 20 + Academic)

earlham.ac.uk: 3
tgac.ac.uk: 3
v0267.tgaccluster: 2
genomicsengland.co.uk: 2
v0267.hpccluster: 1
nhm.ac.uk: 1

Issues and Pull Requests

Last synced: 10 months ago

All Time

Total issues: 93
Total pull requests: 33
Average time to close issues: 3 months
Average time to close pull requests: 15 days
Total issue authors: 48
Total pull request authors: 5
Average comments per issue: 3.01
Average comments per pull request: 1.12
Merged pull requests: 31
Bot issues: 0
Bot pull requests: 1

Past Year

Issues: 10
Pull requests: 0
Average time to close issues: 1 day
Average time to close pull requests: N/A
Issue authors: 8
Pull request authors: 0
Average comments per issue: 0.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

lucventurini (15)
francicco (7)
xiekunwhy (5)
NJeanray (5)
swarbred (4)
sanyalab (3)
louphey (3)
14zac2 (2)
homonecloco (2)
bbista (2)
ljyanesm (2)
EricDeveaud (2)
lijing28101 (2)
cc-prolix (2)
cooketho (2)

Pull Request Authors

lucventurini (22)
ljyanesm (8)
gemygk (2)
dependabot[bot] (2)
MmasterT (1)

Top Labels

Issue Labels

enhancement (17)
EI-Internal (11)
question (8)
bug (8)
duplicate (1)

Pull Request Labels

enhancement (2)
dependencies (2)
EI-Internal (1)

Packages

Total packages: 2
Total downloads:
- pypi 94 last-month

Total dependent packages: 0
(may contain duplicates)
Total dependent repositories: 2
(may contain duplicates)
Total versions: 29
Total maintainers: 3

pypi.org: mikado

A Python3 annotation program to select the best gene model in each locus

Homepage: https://github.com/EI-CoreBioinformatics/mikado
Documentation: https://mikado.readthedocs.io/
License: LGPL3
Latest release: 2.3.4
published about 4 years ago

Versions: 28
Dependent Packages: 0
Dependent Repositories: 2
Downloads: 94 Last month

Rankings

Stargazers count: 7.8%

Forks count: 9.1%

Dependent packages count: 10.0%

Average: 11.2%

Dependent repos count: 11.6%

Downloads: 17.3%

Maintainers (2)

EI-CoreBio
LucVen

Last synced: 10 months ago

spack.io: py-mikado

Mikado is a lightweight Python3 pipeline whose purpose is to facilitate the identification of expressed loci from RNA-Seq data and to select the best models in each locus.

Homepage: https://github.com/EI-CoreBioinformatics/mikado
License: []
Latest release: 1.2.4
published about 4 years ago

Versions: 1
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Dependent repos count: 0.0%

Stargazers count: 19.7%

Forks count: 21.6%

Average: 24.7%

Dependent packages count: 57.3%

Maintainers (1)

adamjstewart

Last synced: 10 months ago

Dependencies

requirements.txt pypi

biopython >=1.78
dataclasses *
docutils *
drmaa *
hypothesis *
marshmallow ==3.14.1
marshmallow-dataclass ==8.5.3
msgpack >=1.0.0
networkx >=2.3
numpy >=1.17.2
pandas >=1.0
pip *
pyfaidx >=0.5.8
pysam >=0.15.3
pytest >=5.4.1
python-rapidjson >=1.0.0
pyyaml >=5.1.2
scipy >=1.3.1
snakemake >=5.7.0
sqlalchemy >=1.4.0
sqlalchemy-utils >=0.37
tabulate >=0.8.5
toml >=0.10.0
typeguard *

.github/workflows/codeql-analysis.yml actions

actions/checkout v2 composite
github/codeql-action/analyze v1 composite
github/codeql-action/autobuild v1 composite
github/codeql-action/init v1 composite

.github/workflows/publish-to-test-pypi.yml actions

actions/cache v1 composite
actions/checkout v2 composite
actions/checkout master composite
actions/setup-python v1 composite
conda-incubator/setup-miniconda v2 composite
pypa/gh-action-pypi-publish master composite

.github/workflows/python-package.yml actions

actions/cache v2 composite
actions/checkout v2 composite
codecov/codecov-action v1 composite
conda-incubator/setup-miniconda v2 composite

environment.yml pypi

pyproject.toml pypi

setup.py pypi
