mimeo
Scan genomes for internally repeated sequences, elements which are repetitive in another species, or high-identity HGT candidate regions between species.
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
✓CITATION.cff file
Found CITATION.cff file -
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
○DOI references
-
○Academic publication links
-
○Committers with academic emails
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (10.8%) to scientific vocabulary
Repository
Scan genomes for internally repeated sequences, elements which are repetitive in another species, or high-identity HGT candidate regions between species.
Basic Info
- Host: GitHub
- Owner: Adamtaranto
- License: mit
- Language: Python
- Default Branch: main
- Homepage: https://adamtaranto.github.io/mimeo/
- Size: 671 KB
Statistics
- Stars: 1
- Watchers: 2
- Forks: 2
- Open Issues: 2
- Releases: 4
Metadata Files
README.md
Mimeo
A tool for finding and annotating repeats in whole-genome alignments.
Table of contents
Modules
Mimeo comprises three tools for parsing repeats from whole-genome alignments:
mimeo-self
Internal repeat finder. Mimeo-self aligns a genome to itself and extracts high-identity segments above a coverage threshold. This method is less sensitive to disruption by indels and repeat-directed point mutations than kmer-based methods such as RepeatScout. Reported annotations indicate overlapping segments above the coverage threshold, mimeo-self does not attempt to separate nested repeats. Use this tool to identify candidate repeat regions for curated annotation.
mimeo-x
Cross-species repeat finder. A newly acquired or low-copy transposon may slip past copy-number based annotation tools. Mimeo-x searches for features which are abundant in an external reference genome, allowing for annotation of complete elements as they occur in a horizontal-transfer donor species, or of conserved coding segments of related transposon families.
mimeo-map
Find all high-identity segments shared between genomes. Mimeo-map identifies candidate horizontally transferred segments between sufficiently diverged species. When comparing isolates of a single species, aligned segments correspond to directly homologous sequences and internally repetitive features.
Intra/Inter-genomic alignments from Mimeo-self or Mimeo-x can be reprocessed with Mimeo-map to generate annotations of unfiltered/uncollapsed alignments. These raw alignment annotations can be used to interrogate repetitive-segments for coverage breakpoints corresponding to nested transposons with differing abundances across the genome.
mimeo-filter
An additional tool mimeo-filter is now included to allow post-filtering of SSR-rich sequences from FASTA formatted candidate-repeat libraries.
Installing Mimeo
Requirements:
Install from Bioconda:
bash
conda install mimeo
Install from PyPi:
bash
pip install mimeo
Clone and install from this repository:
```bash git clone https://github.com/Adamtaranto/mimeo.git && cd mimeo
pip install -e '.[dev]' ```
Example usage
Demo: mimeo-self
Annotate features in genome A which are > 100bp and occur with >= 80% identity at least 3 times on other scaffolds OR at least 4 times on the same scaffold.
bash
mimeo self --adir data/A_genome_Split --afasta data/A_genome.fasta \
-d MS_outdir --gffout A_genome_Inter3_Intra4_id80_len_100.gff3 \
--outfile A_genome_Self_Align.tab --label A_Rep3 --prefix A_Self --minIdt 80 \
--minLen 100 --minCov 3 --intraCov 4 --strictSelf
Output:
- MSoutdir/AgenomeInter3Intra4id80len_100.gff3
- MSoutdir/AgenomeSelfAlign.tab
- data/AgenomeSplit/*.fa
Demo: mimeo-x
Annotate features in genome A which are > 100bp and occur with >= 80% identity at least 5 times in genome B.
bash
mimeo x --afasta data/A_genome.fasta --bfasta data/B_genome.fasta \
-d MX_outdir --gffout B_Rep5_in_A.gff3 --outfile B_Reps_in_A_id80_len100.tab \
--label B_Rep5 --prefix B_Rep5 --minIdt 80 --minLen 100 --minCov 5
Output:
- MXoutdir/BRep5inA.gff3
- MXoutdir/BRepsinAid80len100.tab
Demo: mimeo-map
Annotate features in genome A which are > 100bp and occur with >= 90% identity in genome B. No coverage filter, all alignments are reported.
bash
mimeo map --afasta data/A_genome.fasta --bfasta data/B_genome.fasta \
-d MM_outdir --gffout B_in_A_id90.gff3 --outfile B_in_A_id90.tab \
--label B_90 --prefix B_90 --minIdt 90 --minLen 100
Output:
- MMoutdir/BinAid90.gff3
- MMoutdir/BinAid90.tab
mimeo-map + SSR filter
Annotate features in genome A which are > 100bp and occur with >= 98% identity in genome B. Reuse B to A-genome alignment from the previous run.
Filter out hits which are >= 40% tandem repeats. Write filtered hits as tab file and GFF3 annotation.
bash
mimeo map --afasta data/A_genome.fasta --bfasta data/B_genome.fasta \
-d MM_outdir --gffout B_in_A_id98_maxSSR40.gff3 --outfile B_in_A_id98.tab \
--label B_98 --prefix B_98 --minIdt 98 --minLen 100 \
--recycle --maxtandem 40 --writeTRF
Output:
- MMoutdir/BinAid98_maxSSR40.gff3
- MMoutdir/BinAid98.tab.trf
Demo: mimeo-filter
Filter sequences comprised of >= 40% short tandem repeats from a multifasta library of candidate transposons.
bash
mimeo filter --infile data/candidate_TEs.fa
Output:
- candidateTEsfiltered.fa
Standard options
mimeo-self
```code Usage: mimeo self [-h] [--adir ADIR] [--afasta AFASTA] [-r] [-d OUTDIR] [--gffout GFFOUT] [--outfile OUTFILE] [--verbose] [--label LABEL] [--prefix PREFIX] [--lzpath LZPATH] [--bedtools BEDTOOLS] [--minIdt MINIDT] [--minLen MINLEN] [--minCov MINCOV] [--hspthresh HSPTHRESH] [--intraCov INTRACOV] [--strictSelf]
Internal repeat finder. Mimeo-self aligns a genome to itself and extracts high-identity segments above a coverage threshold.
Optional arguments: -h, --help Show this help message and exit. --adir Name of the directory containing sequences from the genome. Write split files here if providing genome as multifasta. --afasta Genome as multifasta. -r, --recycle Use existing alignment "--outfile" if found. -d , --outdir Write output files to this directory. (Default: cwd) --gffout Name of GFF3 annotation file. --outfile Name of alignment result file. --verbose If set report LASTZ progress. --label Set annotation TYPE field in gff. --prefix ID prefix for internal repeats. --lzpath Custom path to LASTZ executable if not in $PATH. --bedtools Custom path to bedtools executable if not in $PATH. --minIdt Minimum alignment identity to report. --minLen Minimum alignment length to report. --minCov Minimum depth of aligned segments to report repeat feature. --hspthresh Set HSP min score threshold for LASTZ. --intraCov Minimum depth of aligned segments from the same scaffold to report feature. Used if "--strictSelf" mode is selected. --strictSelf If set process same-scaffold alignments separately with the option to use higher "--intraCov" threshold. Sometimes useful to avoid false repeat calls from staggered alignments over SSRs or short tandem duplication. ```
mimeo-x
```code Usage: mimeo x [-h] [--adir ADIR] [--bdir BDIR] [--afasta AFASTA] [--bfasta BFASTA] [-r] [-d OUTDIR] [--gffout GFFOUT] [--outfile OUTFILE] [--verbose] [--label LABEL] [--prefix PREFIX] [--lzpath LZPATH] [--bedtools BEDTOOLS] [--minIdt MINIDT] [--minLen MINLEN] [--minCov MINCOV] [--hspthresh HSPTHRESH]
Cross-species repeat finder. Mimeo-x searches for features which are abundant in an external reference genome.
Optional arguments: -h, --help Show this help message and exit. --adir Name of the directory containing sequences from A genome. --bdir Name of the directory containing sequences from B genome. --afasta A genome as multifasta. --bfasta B genome as multifasta. -r, --recycle Use existing alignment "--outfile" if found. -d , --outdir Write output files to this directory. (Default: cwd) --gffout Name of GFF3 annotation file. --outfile Name of alignment result file. --verbose If set report LASTZ progress. --label Set annotation TYPE field in GFF. --prefix ID prefix for B-genome repeats annotated in A-genome. --lzpath Custom path to LASTZ executable if not in $PATH. --bedtools Custom path to bedtools executable if not in $PATH. --minIdt Minimum alignment identity to report. --minLen Minimum alignment length to report. --minCov Minimum depth of B-genome hits to report feature in A-genome. --hspthresh Set HSP min score threshold for LASTZ. ```
mimeo-map
```code Usage: mimeo map [-h] [--adir ADIR] [--bdir BDIR] [--afasta AFASTA] [--bfasta BFASTA] [-r] [-d OUTDIR] [--gffout GFFOUT] [--outfile OUTFILE] [--verbose] [--label LABEL] [--prefix PREFIX] [--keeptemp] [--lzpath LZPATH] [--minIdt MINIDT] [--minLen MINLEN] [--hspthresh HSPTHRESH] [--TRFpath TRFPATH] [--tmatch TMATCH] [--tmismatch TMISMATCH] [--tdelta TDELTA] [--tPM TPM] [--tPI TPI] [--tminscore TMINSCORE] [--tmaxperiod TMAXPERIOD] [--maxtandem MAXTANDEM] [--writeTRF]
Find all high-identity segments shared between genomes.
Optional arguments: -h, --help Show this help message and exit. --adir Name of the directory containing sequences from A genome. --bdir Name of the directory containing sequences from B genome. --afasta A genome as multifasta. --bfasta B genome as multifasta. -r, --recycle Use existing alignment "--outfile" if found. -d, --outdir Write output files to this directory. (Default: cwd) --gffout Name of GFF3 annotation file. If not set, suppress output. --outfile Name of alignment result file. --verbose If set report LASTZ progress. --label Set annotation TYPE field in GFF. --prefix ID prefix for B-genome hits annotated in A-genome. --keeptemp If set does not remove temp files. --lzpath Custom path to LASTZ executable if not in $PATH. --minIdt Minimum alignment identity to report. --minLen Minimum alignment length to report. --hspthresh Set HSP min score threshold for LASTZ. --TRFpath Custom path to TRF executable if not in $PATH. --tmatch TRF matching weight. --tmismatch TRF mismatching penalty. --tdelta TRF indel penalty. --tPM TRF match probability. --tPI TRF indel probability. --tminscore TRF minimum alignment score to report. --tmaxperiod TRF maximum period size to report. --maxtandem Max percentage of an A-genome alignment which may be masked by TRF. If exceeded, the alignment will be discarded. --writeTRF If set write TRF filtered alignment file for use with other mimeo modules. ```
mimeo-filter
```code Usage: mimeo filter [-h] --infile INFILE [-d OUTDIR] [--outfile OUTFILE] [--keeptemp] [--verbose] [--TRFpath TRFPATH] [--tmatch TMATCH] [--tmismatch TMISMATCH] [--tdelta TDELTA] [--tPM TPM] [--tPI TPI] [--tminscore TMINSCORE] [--tmaxperiod TMAXPERIOD] [--maxtandem MAXTANDEM]
Filter SSR containing sequences from FASTA library of repeats.
Optional arguments: -h, --help Show this help message and exit. --infile Name of the directory containing sequences from A genome. -d, --outdir Write output files to this directory. (Default: cwd) --outfile Name of alignment result file. --keeptemp If set does not remove temp files. --verbose If set report LASTZ progress. --TRFpath Custom path to TRF executable if not in $PATH. --tmatch TRF matching weight --tmismatch TRF mismatching penalty. --tdelta TRF indel penalty. --tPM TRF match probability. --tPI TRF indel probability. --tminscore TRF minimum alignment score to report. --tmaxperiod TRF maximum period size to report. Note: Setting this score too high may exclude some LTR retrotransposons. Optimal len to exclude only SSRs is 10-50bp. --maxtandem Max percentage of a sequence which may be masked by TRF. If exceeded, the element will be discarded.
```
Importing alignments
Whole genome alignments generated by alternative tools (i.e. BLAT) can be provided to any of the Mimeo modules as a tab-delimited file with the columns:
code
[1] name1 = Name of target sequence in genome A
[2] strand1 = Strand of alignment in target sequence
[3] start1 = 5-prime position of alignment in target (lower value irrespective of strand)
[4] end1 = 3-prime position of alignment in target (higher value irrespective of strand)
[5] name2 = Name of source sequence in genome B
[6] strand2 = Strand of alignment in source
[7] start2+ = 5-prime position of alignment in source (lower value irrespective of strand)
[8] end2+ = 3-prime position of alignment in source (higher value irrespective of strand)
[9] score = Alignment score as int
[10] identity = Identity of alignment as float
File should be sorted by columns 1,3,4
License
Software provided under MIT license.
Owner
- Name: Adam Taranto
- Login: Adamtaranto
- Kind: user
- Location: Melbourne, Australia
- Company: The University of Melbourne
- Repositories: 38
- Profile: https://github.com/Adamtaranto
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "mimeo"
version: 1.2.0
date-released: 2025-04-06
authors:
- family-names: Taranto
given-names: Adam
orcid: https://orcid.org/0000-0003-4759-3475
affiliation: "The University of Melbourne"
repository-code: "https://github.com/Adamtaranto/mimeo"
license: MIT
abstract: >-
A tool for finding and annotating repeats in whole-genome alignments.
Mimeo uses the LASTZ alignment engine to find regions of similarity within
or between genomes, we apply filtering heuristics to identify candidate
repeats or HGT events in a reference independent manner.
keywords:
- genomics
- transposons
- bioinformatics
preferred-citation:
type: software
authors:
- family-names: Taranto
given-names: Adam
orcid: https://orcid.org/0000-0003-4759-3475
affiliation: "The University of Melbourne"
title: "Mimeo: A tool for finding and annotating repeats in whole-genome alignments."
year: 2017
url: "https://github.com/Adamtaranto/mimeo"
repository-code: "https://github.com/Adamtaranto/mimeo"
# doi: TBA
GitHub Events
Total
- Create event: 5
- Release event: 3
- Issues event: 2
- Delete event: 3
- Push event: 18
- Pull request event: 6
Last Year
- Create event: 5
- Release event: 3
- Issues event: 2
- Delete event: 3
- Push event: 18
- Pull request event: 6
Committers
Last synced: over 2 years ago
Top Committers
| Name | Commits | |
|---|---|---|
| Adam Taranto | a****o@g****m | 29 |
Issues and Pull Requests
Last synced: 8 months ago
All Time
- Total issues: 2
- Total pull requests: 3
- Average time to close issues: N/A
- Average time to close pull requests: 2 days
- Total issue authors: 1
- Total pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 2
- Pull requests: 3
- Average time to close issues: N/A
- Average time to close pull requests: 2 days
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- Adamtaranto (2)
Pull Request Authors
- Adamtaranto (6)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
-
Total downloads:
- pypi 91 last-month
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 8
- Total maintainers: 1
pypi.org: mimeo
Scan genomes for internally repeated sequences, elements which are repetitive in another species, or high-identity HGT candidate regions between species.
- Documentation: https://mimeo.readthedocs.io/
- License: MIT
-
Latest release: 1.2.1
published about 1 year ago