mimeo

Scan genomes for internally repeated sequences, elements which are repetitive in another species, or high-identity HGT candidate regions between species.

https://github.com/adamtaranto/mimeo

Last synced: 7 months ago

Repository

Basic Info

Host: GitHub
Owner: Adamtaranto
License: mit
Language: Python
Default Branch: main
Homepage: https://adamtaranto.github.io/mimeo/
Size: 671 KB

Statistics

Stars: 1
Watchers: 2
Forks: 2
Open Issues: 2
Releases: 4

Created over 8 years ago · Last pushed about 1 year ago

Metadata Files

Readme License Citation

Mimeo

A tool for finding and annotating repeats in whole-genome alignments.

Modules

Mimeo comprises three tools for parsing repeats from whole-genome alignments:

mimeo-self

Internal repeat finder. Mimeo-self aligns a genome to itself and extracts high-identity segments above a coverage threshold. This method is less sensitive to disruption by indels and repeat-directed point mutations than k-mer-based methods such as RepeatScout. Reported annotations indicate overlapping segments above the coverage threshold; mimeo-self does not attempt to separate nested repeats. Use this tool to identify candidate repeat regions for curated annotation.
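The coverage threshold described above can be pictured as a sweep over alignment endpoints. The following is a minimal sketch, not Mimeo's own code (Mimeo lists bedtools as a requirement, so its implementation likely differs). The function name `regions_above_depth` and the half-open coordinate convention are assumptions made for illustration.

```python
from collections import defaultdict


def regions_above_depth(intervals, min_cov):
    """Return (name, start, end) runs covered by at least min_cov intervals.

    intervals: iterable of (seq_name, start, end), half-open coordinates
    (an assumption of this sketch, not a statement about Mimeo's files).
    """
    events = defaultdict(list)
    for name, start, end in intervals:
        events[name].append((start, 1))   # an alignment opens
        events[name].append((end, -1))    # an alignment closes

    regions = []
    for name in sorted(events):
        depth = 0
        region_start = None
        # Tuples sort by position, then delta, so closes (-1) are
        # handled before opens (+1) at the same coordinate.
        for pos, delta in sorted(events[name]):
            previous = depth
            depth += delta
            if previous < min_cov <= depth:
                region_start = pos            # depth rose to the threshold
            elif depth < min_cov <= previous:
                regions.append((name, region_start, pos))  # depth fell below it
    return regions


hits = [("chr1", 0, 100), ("chr1", 50, 150), ("chr1", 80, 120), ("chr2", 0, 10)]
print(regions_above_depth(hits, 3))  # [('chr1', 80, 100)]
```

Because closes are processed before opens at a shared coordinate, two runs that merely abut are reported as separate regions rather than merged.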

mimeo-x

Cross-species repeat finder. A newly acquired or low-copy transposon may slip past copy-number-based annotation tools. Mimeo-x searches for features which are abundant in an external reference genome, allowing for annotation of complete elements as they occur in a horizontal-transfer donor species, or of conserved coding segments of related transposon families.

mimeo-map

Find all high-identity segments shared between genomes. Mimeo-map identifies candidate horizontally transferred segments between sufficiently diverged species. When comparing isolates of a single species, aligned segments correspond to directly homologous sequences and internally repetitive features.

Intra- or inter-genomic alignments from Mimeo-self or Mimeo-x can be reprocessed with Mimeo-map to generate annotations of unfiltered, uncollapsed alignments. These raw alignment annotations can be used to interrogate repetitive segments for coverage breakpoints corresponding to nested transposons with differing abundances across the genome.
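As an illustration of what "coverage breakpoints" means here, the sketch below builds a step-wise depth profile from raw alignment intervals on one sequence and lists the positions where depth changes. It is a hedged example only: `depth_profile`, `breakpoints`, and the half-open coordinates are hypothetical and are not part of Mimeo.

```python
def depth_profile(intervals):
    """Return [(start, end, depth), ...] for one sequence (half-open coordinates)."""
    events = {}
    for start, end in intervals:
        events[start] = events.get(start, 0) + 1
        events[end] = events.get(end, 0) - 1

    profile = []
    depth = 0
    positions = sorted(events)
    for left, right in zip(positions, positions[1:]):
        depth += events[left]
        if depth <= 0:
            continue  # uncovered gap between alignments
        if profile and profile[-1][1] == left and profile[-1][2] == depth:
            # Same depth as the abutting segment: extend it instead of splitting.
            profile[-1] = (profile[-1][0], right, depth)
        else:
            profile.append((left, right, depth))
    return profile


def breakpoints(profile):
    """Positions where two abutting segments differ in depth."""
    return [seg[0] for prev, seg in zip(profile, profile[1:]) if prev[1] == seg[0]]


# Toy case: a long element (0-1000) hosting a more abundant nested insert (200-600).
raw = [(0, 1000), (200, 600), (200, 600), (250, 600)]
profile = depth_profile(raw)
print(profile)
print(breakpoints(profile))
```

In the toy case, the jump in depth at 200 and the drop at 600 mark the boundaries of the more abundant nested element.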

mimeo-filter

An additional tool, mimeo-filter, is now included to allow post-filtering of SSR-rich sequences from FASTA-formatted candidate-repeat libraries.

Installing Mimeo

Requirements:

LASTZ genome alignment tool from the Miller Lab, Penn State.
bedtools
trf

Install from Bioconda:

```bash
conda install mimeo
```

Install from PyPI:

```bash
pip install mimeo
```

Clone and install from this repository:

```bash
git clone https://github.com/Adamtaranto/mimeo.git && cd mimeo

pip install -e '.[dev]'
```

Example usage

Demo: mimeo-self

Annotate features in genome A which are > 100bp and occur with >= 80% identity at least 3 times on other scaffolds OR at least 4 times on the same scaffold.

```bash
mimeo self --adir data/A_genome_Split --afasta data/A_genome.fasta \
  -d MS_outdir --gffout A_genome_Inter3_Intra4_id80_len_100.gff3 \
  --outfile A_genome_Self_Align.tab --label A_Rep3 --prefix A_Self --minIdt 80 \
  --minLen 100 --minCov 3 --intraCov 4 --strictSelf
```

Output:

MS_outdir/A_genome_Inter3_Intra4_id80_len_100.gff3
MS_outdir/A_genome_Self_Align.tab
data/A_genome_Split/*.fa
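To make the "at least 3 times on other scaffolds OR at least 4 times on the same scaffold" rule concrete, here is a toy sketch of how separate thresholds for same-scaffold and cross-scaffold alignments could be combined. It counts depth per base for clarity rather than speed. The function names, dictionary keys, and half-open toy coordinates are assumptions, and the sketch should not be read as Mimeo's actual `--strictSelf` implementation.

```python
from collections import Counter


def covered_positions(alignments, min_cov):
    """Set of (scaffold, position) pairs with depth >= min_cov."""
    depth = Counter()
    for aln in alignments:
        for pos in range(aln["start1"], aln["end1"]):  # half-open toy coordinates
            depth[(aln["name1"], pos)] += 1
    return {key for key, count in depth.items() if count >= min_cov}


def strict_self_calls(alignments, min_cov, intra_cov):
    """Apply min_cov to cross-scaffold hits and intra_cov to same-scaffold hits."""
    intra = [a for a in alignments if a["name1"] == a["name2"]]
    inter = [a for a in alignments if a["name1"] != a["name2"]]
    # A base is called if it passes either threshold (the "OR" in the demo text).
    return covered_positions(inter, min_cov) | covered_positions(intra, intra_cov)


def hit(name1, start1, end1, name2):
    return {"name1": name1, "start1": start1, "end1": end1, "name2": name2}


alns = [hit("s1", 0, 5, "s2")] * 3 + [hit("s1", 10, 15, "s1")] * 3
called = strict_self_calls(alns, min_cov=3, intra_cov=4)
print(sorted(called))
```

Here the three cross-scaffold hits are enough to call bases 0-4, while the three same-scaffold hits over bases 10-14 fall short of the stricter threshold of 4.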

Demo: mimeo-x

Annotate features in genome A which are > 100bp and occur with >= 80% identity at least 5 times in genome B.

```bash
mimeo x --afasta data/A_genome.fasta --bfasta data/B_genome.fasta \
  -d MX_outdir --gffout B_Rep5_in_A.gff3 --outfile B_Reps_in_A_id80_len100.tab \
  --label B_Rep5 --prefix B_Rep5 --minIdt 80 --minLen 100 --minCov 5
```

Output:

MX_outdir/B_Rep5_in_A.gff3
MX_outdir/B_Reps_in_A_id80_len100.tab

Demo: mimeo-map

Annotate features in genome A which are > 100bp and occur with >= 90% identity in genome B. No coverage filter is applied; all alignments are reported.

```bash
mimeo map --afasta data/A_genome.fasta --bfasta data/B_genome.fasta \
  -d MM_outdir --gffout B_in_A_id90.gff3 --outfile B_in_A_id90.tab \
  --label B_90 --prefix B_90 --minIdt 90 --minLen 100
```

Output:

MM_outdir/B_in_A_id90.gff3
MM_outdir/B_in_A_id90.tab
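For readers who want to see how an alignment record relates to a GFF3 feature, the sketch below filters one record by identity and length and renders it as a nine-column GFF3 line, with the `--label` value in the type column and the `--prefix` value in the ID. It is an illustration under stated assumptions: the record keys follow the column names listed under "Importing alignments", coordinates are taken to be 1-based and inclusive, and the `source` value, ID pattern, and `identity` attribute are hypothetical rather than Mimeo's real output.

```python
def passes_filters(row, min_idt, min_len):
    """Keep alignments meeting the identity and length thresholds."""
    length = row["end1"] - row["start1"] + 1  # assumes 1-based inclusive coordinates
    return row["identity"] >= min_idt and length >= min_len


def to_gff3_line(row, label, prefix, index, source="mimeo"):
    """Render one alignment as a tab-separated GFF3 feature line.

    The ID pattern and the identity attribute are illustrative choices.
    """
    attributes = f"ID={prefix}_{index};identity={row['identity']:.2f}"
    fields = [
        row["name1"],        # seqid: sequence in genome A
        source,              # source
        label,               # type, as set by --label
        str(row["start1"]),  # start
        str(row["end1"]),    # end
        str(row["score"]),   # score
        row["strand1"],      # strand
        ".",                 # phase (not applicable)
        attributes,
    ]
    return "\t".join(fields)


row = {"name1": "chrA", "strand1": "+", "start1": 101, "end1": 350,
       "name2": "chrB", "strand2": "-", "start2+": 1, "end2+": 250,
       "score": 20000, "identity": 95.5}

if passes_filters(row, min_idt=90, min_len=100):
    print(to_gff3_line(row, label="B_90", prefix="B_90", index=1))
```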

mimeo-map + SSR filter

Annotate features in genome A which are > 100bp and occur with >= 98% identity in genome B. Reuse the B-to-A genome alignment from the previous run.

Filter out hits which are >= 40% tandem repeats. Write filtered hits as tab file and GFF3 annotation.

```bash
mimeo map --afasta data/A_genome.fasta --bfasta data/B_genome.fasta \
  -d MM_outdir --gffout B_in_A_id98_maxSSR40.gff3 --outfile B_in_A_id98.tab \
  --label B_98 --prefix B_98 --minIdt 98 --minLen 100 \
  --recycle --maxtandem 40 --writeTRF
```

Output:

MM_outdir/B_in_A_id98_maxSSR40.gff3
MM_outdir/B_in_A_id98.tab.trf

Demo: mimeo-filter

Filter sequences comprised of >= 40% short tandem repeats from a multifasta library of candidate transposons.

```bash
mimeo filter --infile data/candidate_TEs.fa
```

Output:

candidate_TEs_filtered.fa
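The tandem-repeat cutoff can be thought of as a simple fraction: the share of a sequence covered by tandem-repeat intervals, after merging overlaps, compared against the threshold. The sketch below assumes the tandem intervals have already been obtained (for example from TRF output, which is not parsed here), uses half-open coordinates, and follows the demo text in discarding sequences at or above the cutoff. All function names are hypothetical.

```python
def merged_length(intervals):
    """Total length covered by half-open intervals, counting overlaps once."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            if current_end is not None:
                total += current_end - current_start  # close the previous run
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)       # extend the current run
    if current_end is not None:
        total += current_end - current_start
    return total


def passes_tandem_filter(seq_len, tandem_intervals, max_tandem=40.0):
    """True if less than max_tandem percent of the sequence is tandem repeat."""
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    percent = 100.0 * merged_length(tandem_intervals) / seq_len
    return percent < max_tandem


print(passes_tandem_filter(100, [(0, 10), (50, 60)]))  # 20% masked
print(passes_tandem_filter(100, [(0, 30), (20, 45)]))  # 45% masked after merging
```

Merging before summing matters because repeat finders can report overlapping intervals for the same locus, which would otherwise inflate the masked percentage.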

Standard options

mimeo-self

```code
Usage: mimeo self [-h] [--adir ADIR] [--afasta AFASTA] [-r] [-d OUTDIR]
                  [--gffout GFFOUT] [--outfile OUTFILE] [--verbose]
                  [--label LABEL] [--prefix PREFIX] [--lzpath LZPATH]
                  [--bedtools BEDTOOLS] [--minIdt MINIDT] [--minLen MINLEN]
                  [--minCov MINCOV] [--hspthresh HSPTHRESH]
                  [--intraCov INTRACOV] [--strictSelf]

Internal repeat finder. Mimeo-self aligns a genome to itself and extracts
high-identity segments above a coverage threshold.

Optional arguments:
  -h, --help      Show this help message and exit.
  --adir          Name of the directory containing sequences from the genome.
                  Write split files here if providing genome as multifasta.
  --afasta        Genome as multifasta.
  -r, --recycle   Use existing alignment "--outfile" if found.
  -d, --outdir    Write output files to this directory. (Default: cwd)
  --gffout        Name of GFF3 annotation file.
  --outfile       Name of alignment result file.
  --verbose       If set report LASTZ progress.
  --label         Set annotation TYPE field in gff.
  --prefix        ID prefix for internal repeats.
  --lzpath        Custom path to LASTZ executable if not in $PATH.
  --bedtools      Custom path to bedtools executable if not in $PATH.
  --minIdt        Minimum alignment identity to report.
  --minLen        Minimum alignment length to report.
  --minCov        Minimum depth of aligned segments to report repeat feature.
  --hspthresh     Set HSP min score threshold for LASTZ.
  --intraCov      Minimum depth of aligned segments from the same scaffold to
                  report feature. Used if "--strictSelf" mode is selected.
  --strictSelf    If set process same-scaffold alignments separately with the
                  option to use higher "--intraCov" threshold. Sometimes useful
                  to avoid false repeat calls from staggered alignments over
                  SSRs or short tandem duplication.
```

mimeo-x

```code
Usage: mimeo x [-h] [--adir ADIR] [--bdir BDIR] [--afasta AFASTA]
               [--bfasta BFASTA] [-r] [-d OUTDIR] [--gffout GFFOUT]
               [--outfile OUTFILE] [--verbose] [--label LABEL]
               [--prefix PREFIX] [--lzpath LZPATH] [--bedtools BEDTOOLS]
               [--minIdt MINIDT] [--minLen MINLEN] [--minCov MINCOV]
               [--hspthresh HSPTHRESH]

Cross-species repeat finder. Mimeo-x searches for features which are abundant
in an external reference genome.

Optional arguments:
  -h, --help      Show this help message and exit.
  --adir          Name of the directory containing sequences from A genome.
  --bdir          Name of the directory containing sequences from B genome.
  --afasta        A genome as multifasta.
  --bfasta        B genome as multifasta.
  -r, --recycle   Use existing alignment "--outfile" if found.
  -d, --outdir    Write output files to this directory. (Default: cwd)
  --gffout        Name of GFF3 annotation file.
  --outfile       Name of alignment result file.
  --verbose       If set report LASTZ progress.
  --label         Set annotation TYPE field in GFF.
  --prefix        ID prefix for B-genome repeats annotated in A-genome.
  --lzpath        Custom path to LASTZ executable if not in $PATH.
  --bedtools      Custom path to bedtools executable if not in $PATH.
  --minIdt        Minimum alignment identity to report.
  --minLen        Minimum alignment length to report.
  --minCov        Minimum depth of B-genome hits to report feature in A-genome.
  --hspthresh     Set HSP min score threshold for LASTZ.
```

mimeo-map

```code
Usage: mimeo map [-h] [--adir ADIR] [--bdir BDIR] [--afasta AFASTA]
                 [--bfasta BFASTA] [-r] [-d OUTDIR] [--gffout GFFOUT]
                 [--outfile OUTFILE] [--verbose] [--label LABEL]
                 [--prefix PREFIX] [--keeptemp] [--lzpath LZPATH]
                 [--minIdt MINIDT] [--minLen MINLEN] [--hspthresh HSPTHRESH]
                 [--TRFpath TRFPATH] [--tmatch TMATCH] [--tmismatch TMISMATCH]
                 [--tdelta TDELTA] [--tPM TPM] [--tPI TPI]
                 [--tminscore TMINSCORE] [--tmaxperiod TMAXPERIOD]
                 [--maxtandem MAXTANDEM] [--writeTRF]

Find all high-identity segments shared between genomes.

Optional arguments:
  -h, --help      Show this help message and exit.
  --adir          Name of the directory containing sequences from A genome.
  --bdir          Name of the directory containing sequences from B genome.
  --afasta        A genome as multifasta.
  --bfasta        B genome as multifasta.
  -r, --recycle   Use existing alignment "--outfile" if found.
  -d, --outdir    Write output files to this directory. (Default: cwd)
  --gffout        Name of GFF3 annotation file. If not set, suppress output.
  --outfile       Name of alignment result file.
  --verbose       If set report LASTZ progress.
  --label         Set annotation TYPE field in GFF.
  --prefix        ID prefix for B-genome hits annotated in A-genome.
  --keeptemp      If set does not remove temp files.
  --lzpath        Custom path to LASTZ executable if not in $PATH.
  --minIdt        Minimum alignment identity to report.
  --minLen        Minimum alignment length to report.
  --hspthresh     Set HSP min score threshold for LASTZ.
  --TRFpath       Custom path to TRF executable if not in $PATH.
  --tmatch        TRF matching weight.
  --tmismatch     TRF mismatching penalty.
  --tdelta        TRF indel penalty.
  --tPM           TRF match probability.
  --tPI           TRF indel probability.
  --tminscore     TRF minimum alignment score to report.
  --tmaxperiod    TRF maximum period size to report.
  --maxtandem     Max percentage of an A-genome alignment which may be masked
                  by TRF. If exceeded, the alignment will be discarded.
  --writeTRF      If set write TRF filtered alignment file for use with other
                  mimeo modules.
```

mimeo-filter

```code
Usage: mimeo filter [-h] --infile INFILE [-d OUTDIR] [--outfile OUTFILE]
                    [--keeptemp] [--verbose] [--TRFpath TRFPATH]
                    [--tmatch TMATCH] [--tmismatch TMISMATCH]
                    [--tdelta TDELTA] [--tPM TPM] [--tPI TPI]
                    [--tminscore TMINSCORE] [--tmaxperiod TMAXPERIOD]
                    [--maxtandem MAXTANDEM]

Filter SSR containing sequences from FASTA library of repeats.

Optional arguments:
  -h, --help      Show this help message and exit.
  --infile        Name of the directory containing sequences from A genome.
  -d, --outdir    Write output files to this directory. (Default: cwd)
  --outfile       Name of alignment result file.
  --keeptemp      If set does not remove temp files.
  --verbose       If set report LASTZ progress.
  --TRFpath       Custom path to TRF executable if not in $PATH.
  --tmatch        TRF matching weight.
  --tmismatch     TRF mismatching penalty.
  --tdelta        TRF indel penalty.
  --tPM           TRF match probability.
  --tPI           TRF indel probability.
  --tminscore     TRF minimum alignment score to report.
  --tmaxperiod    TRF maximum period size to report. Note: Setting this score
                  too high may exclude some LTR retrotransposons. Optimal len
                  to exclude only SSRs is 10-50bp.
  --maxtandem     Max percentage of a sequence which may be masked by TRF. If
                  exceeded, the element will be discarded.
```

Importing alignments

Whole-genome alignments generated by alternative tools (e.g. BLAT) can be provided to any of the Mimeo modules as a tab-delimited file with the columns:

```code
[1]  name1    = Name of target sequence in genome A
[2]  strand1  = Strand of alignment in target sequence
[3]  start1   = 5-prime position of alignment in target (lower value irrespective of strand)
[4]  end1     = 3-prime position of alignment in target (higher value irrespective of strand)
[5]  name2    = Name of source sequence in genome B
[6]  strand2  = Strand of alignment in source
[7]  start2+  = 5-prime position of alignment in source (lower value irrespective of strand)
[8]  end2+    = 3-prime position of alignment in source (higher value irrespective of strand)
[9]  score    = Alignment score as int
[10] identity = Identity of alignment as float
```

The file should be sorted by columns 1, 3, and 4.
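A minimal sketch of preparing such a file in Python follows: it reads the ten columns, converts the numeric fields, sorts by columns 1, 3, and 4, and writes the rows back out. The function names are hypothetical, and skipping lines that begin with `#` is an assumption about header handling rather than documented Mimeo behaviour.

```python
import csv
import io

COLUMNS = ["name1", "strand1", "start1", "end1",
           "name2", "strand2", "start2+", "end2+", "score", "identity"]
INT_COLUMNS = ("start1", "end1", "start2+", "end2+", "score")


def read_alignments(handle):
    """Parse tab-delimited alignment rows into dictionaries."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # assumed: blank lines and '#' header lines are ignored
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
        row = dict(zip(COLUMNS, fields))
        for key in INT_COLUMNS:
            row[key] = int(row[key])
        row["identity"] = float(row["identity"])
        rows.append(row)
    return rows


def sort_rows(rows):
    """Sort by target name, then target start, then target end (columns 1, 3, 4)."""
    return sorted(rows, key=lambda r: (r["name1"], r["start1"], r["end1"]))


def write_alignments(rows, handle):
    """Write rows back out as tab-delimited text in the documented column order."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow([row[col] for col in COLUMNS])


unsorted_text = (
    "chr2\t+\t5\t50\tscafA\t+\t1\t46\t4000\t98.5\n"
    "chr1\t+\t300\t400\tscafB\t-\t10\t110\t9000\t91.0\n"
    "chr1\t-\t10\t200\tscafA\t+\t500\t690\t15000\t88.2\n"
)
out = io.StringIO()
write_alignments(sort_rows(read_alignments(io.StringIO(unsorted_text))), out)
print(out.getvalue(), end="")
```

Sorting on parsed integers rather than raw strings avoids the common pitfall where position `1000` sorts before `200`.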

License

Software provided under MIT license.

Owner

Name: Adam Taranto
Login: Adamtaranto
Kind: user
Location: Melbourne, Australia
Company: The University of Melbourne

Repositories: 38
Profile: https://github.com/Adamtaranto

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "mimeo"
version: 1.2.0
date-released: 2025-04-06
authors:
  - family-names: Taranto
    given-names: Adam
    orcid: https://orcid.org/0000-0003-4759-3475
    affiliation: "The University of Melbourne"
repository-code: "https://github.com/Adamtaranto/mimeo"
license: MIT
abstract: >-
  A tool for finding and annotating repeats in whole-genome alignments.
  Mimeo uses the LASTZ alignment engine to find regions of similarity within
  or between genomes, we apply filtering heuristics to identify candidate
  repeats or HGT events in a reference independent manner.
keywords:
  - genomics
  - transposons
  - bioinformatics
preferred-citation:
  type: software
  authors:
    - family-names: Taranto
      given-names: Adam
      orcid: https://orcid.org/0000-0003-4759-3475
      affiliation: "The University of Melbourne"
  title: "Mimeo: A tool for finding and annotating repeats in whole-genome alignments."
  year: 2017
  url: "https://github.com/Adamtaranto/mimeo"
  repository-code: "https://github.com/Adamtaranto/mimeo"
  # doi: TBA

GitHub Events

Total

Create event: 5
Release event: 3
Issues event: 2
Delete event: 3
Push event: 18
Pull request event: 6

Last Year

Create event: 5
Release event: 3
Issues event: 2
Delete event: 3
Push event: 18
Pull request event: 6

Committers

Last synced: over 2 years ago

All Time

Total Commits: 29
Total Committers: 1
Avg Commits per committer: 29.0
Development Distribution Score (DDS): 0.0

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Adam Taranto	a**o@g**m	29

Issues and Pull Requests

Last synced: 8 months ago

All Time

Total issues: 2
Total pull requests: 3
Average time to close issues: N/A
Average time to close pull requests: 2 days
Total issue authors: 1
Total pull request authors: 1
Average comments per issue: 0.0
Average comments per pull request: 0.0
Merged pull requests: 3
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 2
Pull requests: 3
Average time to close issues: N/A
Average time to close pull requests: 2 days
Issue authors: 1
Pull request authors: 1
Average comments per issue: 0.0
Average comments per pull request: 0.0
Merged pull requests: 3
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

Adamtaranto (2)

Pull Request Authors

Adamtaranto (6)

Top Labels

Issue Labels

enhancement (1)

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- pypi 91 last-month

Total dependent packages: 0
Total dependent repositories: 1
Total versions: 8
Total maintainers: 1

pypi.org: mimeo

Documentation: https://mimeo.readthedocs.io/
License: MIT
Latest release: 1.2.1
published about 1 year ago

Versions: 8
Dependent Packages: 0
Dependent Repositories: 1
Downloads: 91 Last month

Rankings

Dependent packages count: 10.0%

Forks count: 19.1%

Dependent repos count: 21.7%

Average: 27.0%

Stargazers count: 31.9%

Downloads: 52.1%

Maintainers (1)

adamtaranto

Last synced: 8 months ago

mimeo

Science Score: 44.0%
