derip2

Reconstruct ancestral state sequences of fungal repeat families by correcting for RIP-like mutations. Mask RIP or deamination events from alignments.

https://github.com/adamtaranto/derip2

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.8%) to scientific vocabulary

Keywords

bioinformatics fungi phylogenetics
Last synced: 6 months ago

Repository


Basic Info
Statistics
  • Stars: 0
  • Watchers: 3
  • Forks: 0
  • Open Issues: 3
  • Releases: 4
Topics
bioinformatics fungi phylogenetics
Created almost 9 years ago · Last pushed 9 months ago
Metadata Files
Readme Contributing License Citation

README.md



deRIP2 scans aligned sequences for evidence of un-RIP'd precursor states, allowing for improved RIP-correction across large repeat families in which members are independently RIP'd.

Use deRIP2 to:

  • Predict ancestral fungal transposon sequences by correcting for RIP-like mutations (CpA --> TpA) and cytosine deamination (C --> T) events.

  • Mask RIP or deamination events as ambiguous bases to remove RIP signal from phylogenetic analyses.
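Masking replaces the C/T (or G/A) states at a position with a single IUPAC ambiguity code (Y = C or T, R = G or A). The transition then no longer carries phylogenetic signal. The sketch below is a hypothetical simplification and is not deRIP2's implementation. `mask_column` is an illustrative name, and it masks a whole column whenever the column holds exactly the two states of one transition pair. deRIP2 itself masks substrate and product motifs in corrected columns.

```python
# Hypothetical sketch, not deRIP2's implementation: whole-column masking
# that ignores dinucleotide context.
IUPAC_MASK = {"C": "Y", "T": "Y", "G": "R", "A": "R"}


def mask_column(column: str) -> str:
    """Replace C/T with Y or G/A with R when a column holds exactly one transition pair.

    Gaps ('-') are kept; any other column is returned unchanged.
    """
    upper = column.upper()
    states = set(upper) - {"-"}
    if states == {"C", "T"} or states == {"G", "A"}:
        return "".join(IUPAC_MASK.get(b, b) for b in upper)
    return column


print(mask_column("CCT-T"))  # YYY-Y
print(mask_column("GAA"))    # RRR
print(mask_column("ACGT"))   # ACGT (not a single transition pair)
```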


Installation

Install from PyPI.

```bash
pip install derip2
```

Pip install latest development version from GitHub.

```bash
pip install git+https://github.com/Adamtaranto/deRIP2.git
```

Test installation.

```bash
# Print version number and exit.
derip2 --version

# Get usage information
derip2 --help
```

Setup Development Environment

If you want to contribute to the project or run the latest development version, you can clone the repository and install the package in editable mode.

```bash
# Clone repository
git clone https://github.com/Adamtaranto/deRIP2.git && cd deRIP2

# Create virtual environment
conda env create -f environment.yml

# Activate environment
conda activate derip2-dev

# Install package in editable mode
pip install -e '.[dev]'
```

Example usage

For aligned sequences in 'mintest.fa':

  • Any column with >= 70% gapped positions will not be corrected, and a gap will be inserted in the corrected sequence.
  • Bases in a column must be >= 80% C/T or G/A (i.e. at most 20% conflicting SNPs, set by --max-snp-noise 0.2).
  • At least 50% of bases in a column must be in RIP dinucleotide context (C/T as CpA / TpA) for correction.
  • Default: Inherit all remaining uncorrected positions from the least RIP'd sequence.
  • Mask all substrate and product motifs in corrected columns as ambiguous bases (e.g. CpA and TpA --> YpA).
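The RIP-context criterion in the list above can be sketched as a small helper. This is hypothetical and not deRIP2's code. `rip_like_proportion` and `_neighbour` are illustrative names, and skipping gaps when looking up the neighbouring base is an assumption. Following the algorithm overview, a C/T base counts as RIP-like when it is followed by A (CpA / TpA). A G/A base counts when it is preceded by T (TpG / TpA), which are the same motifs read on the reverse strand.

```python
from typing import List, Optional


def _neighbour(row: str, j: int, step: int) -> Optional[str]:
    """Nearest non-gap base to the right (step=1) or left (step=-1) of position j."""
    i = j + step
    while 0 <= i < len(row):
        if row[i] != "-":
            return row[i].upper()
        i += step
    return None


def rip_like_proportion(alignment: List[str], j: int) -> float:
    """Fraction of C/T or G/A bases in column j that sit in a RIP-like dinucleotide.

    Hypothetical sketch: forward strand C/T followed by A, reverse strand
    G/A preceded by T. Rows are assumed to be equal-length aligned strings.
    """
    hits = total = 0
    for row in alignment:
        base = row[j].upper()
        if base in ("C", "T"):
            total += 1
            if _neighbour(row, j, 1) == "A":
                hits += 1
        elif base in ("G", "A"):
            total += 1
            if _neighbour(row, j, -1) == "T":
                hits += 1
    return hits / total if total else 0.0


# 2 of the 3 C/T bases in column 1 are in CpA / TpA context.
print(rip_like_proportion(["CCAT", "CTAT", "CTGT"], 1) >= 0.5)  # True
```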

Basic usage with masking

```bash
derip2 -i tests/data/mintest.fa \
  --max-gaps 0.7 \
  --max-snp-noise 0.2 \
  --min-rip-like 0.5 \
  --mask \
  -d results \
  --prefix derip_output
```

Output:

  • results/derip_output.fasta - Corrected sequence
  • results/derip_output_alignment.fasta - Alignment with corrected sequence appended
  • results/derip_output_masked_alignment.fasta - Alignment with masked corrections

With visualization

The --plot option creates a visualization of the alignment with RIP markup. The --plot-rip-type option specifies which RIP events are displayed in the visualization: product, substrate, or both.

```bash
derip2 -i tests/data/mintest.fa \
  --max-gaps 0.7 \
  --max-snp-noise 0.2 \
  --min-rip-like 0.5 \
  --plot \
  --plot-rip-type both \
  -d results \
  --prefix derip_output
```

Output:

  • results/derip_output.fasta - Corrected sequence
  • results/derip_output_masked_alignment.fasta - Alignment with masked corrections
  • results/derip_output_visualization.png - Visualization of the alignment with RIP markup

Visualization of the alignment with RIP markup

Using maximum GC content for filling

By default, uncorrected positions in the output sequence are filled from the sequence with the lowest RIP count. If the --fill-max-gc option is set, remaining positions are instead filled from the sequence with the highest G/C content.

```bash
derip2 -i tests/data/mintest.fa \
  --max-gaps 0.7 \
  --max-snp-noise 0.2 \
  --min-rip-like 0.5 \
  --fill-max-gc \
  -d results \
  --prefix derip_gc_filled
```

Alternatively, the --fill-index option forces uncorrected positions to be filled from a specific alignment row, selected by its row index (indexed from 0). Note: This overrides the --fill-max-gc option.
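The fill-row rules described in this section and in the algorithm overview can be expressed as a small selection function. This is a sketch under stated assumptions, not deRIP2's code. `choose_fill_row` and `gc_content` are hypothetical names. Per-row RIP counts are taken as an input because this README does not specify how deRIP2 counts RIP events. Ties on the lowest RIP count are broken by G/C content, as described in the algorithm overview.

```python
from typing import List, Optional


def gc_content(seq: str) -> float:
    """Fraction of non-gap bases that are G or C."""
    bases = [b for b in seq.upper() if b != "-"]
    return sum(b in "GC" for b in bases) / len(bases) if bases else 0.0


def choose_fill_row(
    alignment: List[str],
    rip_counts: List[int],
    fill_max_gc: bool = False,
    fill_index: Optional[int] = None,
) -> int:
    """Hypothetical: index of the row used to fill uncorrected positions."""
    if fill_index is not None:  # like --fill-index: overrides --fill-max-gc
        return fill_index
    rows = range(len(alignment))
    if fill_max_gc:  # like --fill-max-gc: highest G/C content wins
        return max(rows, key=lambda i: gc_content(alignment[i]))
    # Default: fewest RIP events, ties broken by highest G/C content.
    return min(rows, key=lambda i: (rip_counts[i], -gc_content(alignment[i])))


aln = ["ATTA", "GCCA", "GCTA"]
print(choose_fill_row(aln, [0, 3, 1]))                                  # 0
print(choose_fill_row(aln, [0, 3, 1], fill_max_gc=True))                # 1
print(choose_fill_row(aln, [0, 3, 1], fill_max_gc=True, fill_index=2))  # 2
```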

Correcting all deamination events

If the --reaminate option is set, all deamination events will be corrected, regardless of RIP context.

Setting --plot-rip-type product highlights the products of RIP events in the visualization. Non-RIP deamination events are also highlighted.

```bash
derip2 -i tests/data/mintest.fa \
  --max-gaps 0.7 \
  --reaminate \
  -d results \
  --plot \
  --plot-rip-type product \
  --prefix derip_reaminated
```

Output:

  • results/derip_reaminated.fasta - Corrected sequence using highest GC content sequence for filling
  • results/derip_reaminated_alignment.fasta - Alignment with corrected sequence appended
  • results/derip_reaminated_vizualization.png - Visualization of the alignment with RIP markup

Visualization of the alignment with RIP markup
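To illustrate the effect of --reaminate on a single column, the hypothetical `reaminate_column` below reverts T to C (or A to G) in a variable column dominated by C/T (or G/A). It does not check dinucleotide context. The dominance test reuses the max-snp-noise threshold as described in the algorithm overview. The function name and its exact rules are illustrative assumptions, not deRIP2's code.

```python
from typing import Optional


def reaminate_column(column: str, max_snp_noise: float = 0.5) -> Optional[str]:
    """Hypothetical: 'C' or 'G' if the column looks deaminated, otherwise None."""
    bases = [b for b in column.upper() if b != "-"]
    if len(set(bases)) < 2:
        return None  # empty or invariant column: nothing to revert
    for substrate, product in (("C", "T"), ("G", "A")):
        share = sum(b in (substrate, product) for b in bases) / len(bases)
        # Column must be dominated by this transition pair and contain the product.
        if share >= 1 - max_snp_noise and product in bases:
            return substrate  # revert T -> C or A -> G, regardless of context
    return None


print(reaminate_column("CTTT"))  # C
print(reaminate_column("GAA-"))  # G
print(reaminate_column("TTTT"))  # None (invariant)
```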

Standard Options

```
--version                       Show the version and exit.
-i, --input TEXT                Multiple sequence alignment. [required]
-g, --max-gaps FLOAT            Maximum proportion of gapped positions in column to be tolerated before forcing a gap in final deRIP sequence. [default: 0.7]
-a, --reaminate                 Correct all deamination events independent of RIP context.
--max-snp-noise FLOAT           Maximum proportion of conflicting SNPs permitted before excluding column from RIP/deamination assessment. i.e. By default a column with >= 0.5 'C/T' bases will have 'TpA' positions logged as RIP events. [default: 0.5]
--min-rip-like FLOAT            Minimum proportion of deamination events in RIP context (5' CpA 3' --> 5' TpA 3') required for column to deRIP'd in final sequence. Note: If 'reaminate' option is set all deamination events will be corrected. [default: 0.1]
--fill-max-gc                   By default uncorrected positions in the output sequence are filled from the sequence with the lowest RIP count. If this option is set remaining positions are filled from the sequence with the highest G/C content.
--fill-index INTEGER            Force selection of alignment row to fill uncorrected positions from by row index number (indexed from 0). Note: Will override '--fill-max-gc' option.
--mask                          Mask corrected positions in alignment with degenerate IUPAC codes.
--no-append                     If set, do not append deRIP'd sequence to output alignment.
-d, --out-dir TEXT              Directory for deRIP'd sequence files to be written to.
-p, --prefix TEXT               Prefix for output files. Output files will be named prefix.fasta, prefix_alignment.fasta, etc. [default: deRIPseq]
--plot                          Create a visualization of the alignment with RIP markup.
--plot-rip-type [both|product|substrate]
                                Specify the type of RIP events to be displayed in the alignment visualization. [default: both]
--loglevel [DEBUG|INFO|WARNING|ERROR|CRITICAL]
                                Set logging level. [default: INFO]
--logfile TEXT                  Log file path.
-h, --help                      Show this message and exit.
```

Algorithm overview

For each column in input alignment:

  • Check whether the proportion of gapped rows is greater than the max-gaps threshold. If true, a gap is added to the output sequence.
  • Set invariant column values in the output sequence.
  • Check that at least (1 - max-snp-noise) of the bases are C/T or G/A (e.g. with max-snp-noise = 0.4, at least 0.6 of positions in the column must be C/T or G/A). Otherwise, the column is excluded from RIP/deamination assessment.
  • If the reaminate option is set, revert T --> C or A --> G.
  • If reaminate is not set, count the positions in RIP dinucleotide context (C/TpA or TpG/A).
  • If the proportion of positions in the column in RIP-like context is >= the min-rip-like threshold, AND at least one substrate and one product motif (e.g. CpA and TpA) is present, perform RIP correction in the output sequence.
  • For all remaining positions in the output sequence (not filled by gap, reaminate, or RIP correction), inherit sequence from the input sequence with the fewest observed RIP events (or the greatest GC content if RIP is not detected or if multiple sequences share the minimum RIP count).
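The per-column steps above can be sketched in Python. This is a simplified illustration, not deRIP2's source. The following are hypothetical:

  • the function names `rip_context`, `derip_column` and `derip`;
  • skipping gaps when finding neighbouring bases;
  • the assumption of upper-case, equal-length rows;
  • passing the fill row in directly rather than computing it from RIP counts.

Default thresholds mirror the CLI defaults listed under Standard Options.

```python
from typing import List, Optional


def rip_context(row: str, j: int) -> bool:
    """C/T followed by A, or G/A preceded by T, ignoring gaps (assumption)."""
    base = row[j]
    if base in ("C", "T"):
        return row[j + 1:].replace("-", "")[:1] == "A"
    if base in ("G", "A"):
        return row[:j].replace("-", "")[-1:] == "T"
    return False


def derip_column(alignment: List[str], j: int, max_gaps: float = 0.7,
                 max_snp_noise: float = 0.5, min_rip_like: float = 0.1,
                 reaminate: bool = False) -> Optional[str]:
    """Output symbol for column j, or None to inherit from the fill row."""
    column = [row[j] for row in alignment]
    if column.count("-") / len(column) > max_gaps:
        return "-"                                # too gappy: force a gap
    bases = [b for b in column if b != "-"]
    if not bases:
        return "-"
    if len(set(bases)) == 1:
        return bases[0]                           # invariant column
    for substrate, product in (("C", "T"), ("G", "A")):
        share = sum(b in (substrate, product) for b in bases) / len(bases)
        if share == 0 or share < 1 - max_snp_noise:
            continue                              # too much SNP noise
        if reaminate:
            if product in bases:
                return substrate                  # revert T -> C or A -> G
            continue
        rows = [r for r in alignment if r[j] in (substrate, product)]
        in_ctx = [r for r in rows if rip_context(r, j)]
        motifs = {r[j] for r in in_ctx}           # need substrate AND product
        if len(in_ctx) / len(rows) >= min_rip_like and motifs == {substrate, product}:
            return substrate                      # RIP correction
    return None


def derip(alignment: List[str], fill_row: int, **options) -> str:
    """Assemble the deRIP'd sequence; undecided columns come from fill_row."""
    out = []
    for j in range(len(alignment[0])):
        symbol = derip_column(alignment, j, **options)
        out.append(alignment[fill_row][j] if symbol is None else symbol)
    return "".join(out)


print(derip(["CAGT", "TAGT", "TAGT"], fill_row=1))  # CAGT
```

Column 0 is corrected to C because all three C/T bases sit in CpA/TpA context and both motifs are present. The remaining columns are invariant.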

Issues

Submit feedback to the Issue Tracker

License

Software provided under MIT license.

Owner

  • Name: Adam Taranto
  • Login: Adamtaranto
  • Kind: user
  • Location: Melbourne, Australia
  • Company: The University of Melbourne

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "deRIP2"
version: 0.4.0
date-released: 2025-04-04
authors:
  - family-names: Taranto
    given-names: Adam
    orcid: https://orcid.org/0000-0003-4759-3475
    affiliation: "The University of Melbourne"
repository-code: "https://github.com/Adamtaranto/derip2"
license: MIT
abstract: >-
  deRIP2 analyzes DNA sequence alignments to detect and correct
  Repeat-Induced Point mutations (RIP) in fungal genomes. The tool
  identifies C→T transitions in RIP context and reconstructs the
  ancestral (pre-RIP) sequence, providing visualizations of the
  corrected mutations.
keywords:
  - genomics
  - fungi
  - repeat-induced-point-mutation
  - RIP
  - bioinformatics
preferred-citation:
  type: software
  authors:
    - family-names: Taranto
      given-names: Adam
      orcid: https://orcid.org/0000-0003-4759-3475
      affiliation: "The University of Melbourne"
  title: "deRIP2: A tool for detecting and correcting RIP mutations in fungal DNA sequences"
  year: 2025
  url: "https://github.com/Adamtaranto/derip2"
  repository-code: "https://github.com/Adamtaranto/derip2"
  # doi: TBA

GitHub Events

Total
  • Create event: 11
  • Release event: 3
  • Issues event: 9
  • Delete event: 4
  • Issue comment event: 6
  • Push event: 64
  • Pull request event: 8
Last Year
  • Create event: 11
  • Release event: 3
  • Issues event: 9
  • Delete event: 4
  • Issue comment event: 6
  • Push event: 64
  • Pull request event: 8

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 15
  • Total pull requests: 8
  • Average time to close issues: 4 months
  • Average time to close pull requests: about 2 months
  • Total issue authors: 1
  • Total pull request authors: 1
  • Average comments per issue: 0.53
  • Average comments per pull request: 0.5
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 5
  • Pull requests: 4
  • Average time to close issues: 5 days
  • Average time to close pull requests: about 1 month
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 0.6
  • Average comments per pull request: 1.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • Adamtaranto (15)
Pull Request Authors
  • Adamtaranto (12)
Top Labels
Issue Labels
enhancement (5) help wanted (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 36 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 8
  • Total maintainers: 1
pypi.org: derip2

Predict ancestral sequence of fungal repeat elements by correcting for RIP-like mutations in multi-sequence DNA alignments.

  • Versions: 8
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 36 Last month
Rankings
Dependent packages count: 9.9%
Dependent repos count: 21.8%
Forks count: 29.8%
Average: 31.3%
Stargazers count: 38.9%
Downloads: 56.1%
Maintainers (1)
Last synced: 6 months ago