derip2
Reconstruct ancestral state sequences of fungal repeat families by correcting for RIP-like mutations. Mask RIP or deamination events from alignments.
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.8%) to scientific vocabulary
Keywords
Repository
Basic Info
- Host: GitHub
- Owner: Adamtaranto
- License: mit
- Language: Python
- Default Branch: main
- Homepage: https://adamtaranto.github.io/deRIP2/
- Size: 1.51 MB
Statistics
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 3
- Releases: 4
Topics
Metadata Files
README.md
```
██████╗ ███████╗██████╗ ██╗██████╗ ██████╗
██╔══██╗██╔════╝██╔══██╗██║██╔══██╗╚════██╗
██║  ██║█████╗  ██████╔╝██║██████╔╝ █████╔╝
██║  ██║██╔══╝  ██╔══██╗██║██╔═══╝ ██╔═══╝
██████╔╝███████╗██║  ██║██║██║     ███████╗
╚═════╝ ╚══════╝╚═╝  ╚═╝╚═╝╚═╝     ╚══════╝
```
deRIP2 scans aligned sequences for evidence of un-RIP'd precursor states, allowing for improved RIP-correction across large repeat families in which members are independently RIP'd.
Use deRIP2 to:
- Predict ancestral fungal transposon sequences by correcting for RIP-like mutations (CpA --> TpA) and cytosine deamination (C --> T) events.
- Mask RIP or deamination events as ambiguous bases to remove RIP signal from phylogenetic analyses.
Installation
Install from PyPI.
```bash
pip install derip2
```
Install the latest development version from GitHub with pip.
```bash
pip install git+https://github.com/Adamtaranto/deRIP2.git
```
Test installation.
```bash
# Print version number and exit.
derip2 --version

# Get usage information
derip2 --help
```
Setup Development Environment
If you want to contribute to the project or run the latest development version, you can clone the repository and install the package in editable mode.
```bash
# Clone repository
git clone https://github.com/Adamtaranto/deRIP2.git && cd deRIP2

# Create virtual environment
conda env create -f environment.yml

# Activate environment
conda activate derip2-dev

# Install package in editable mode
pip install -e '.[dev]'
```
Example usage
For aligned sequences in 'mintest.fa':
- Any column with >= 70% gap positions will not be corrected; a gap is inserted in the corrected sequence instead.
- At least 80% of bases in a column must be C/T or G/A.
- At least 50% of bases in a column must be in RIP dinucleotide context (C/T as CpA / TpA) for correction.
- Default: Inherit all remaining uncorrected positions from the least RIP'd sequence.
- Mask all substrate and product motifs from corrected columns as ambiguous bases (i.e. CpA to TpA --> YpA).
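The masking rule in the last bullet can be illustrated with a minimal Python sketch. This is not deRIP2's implementation: the function name `mask_rip_column` is hypothetical, only the forward-strand motif (CpA / TpA to YpA) is handled, and gaps between the two bases of a dinucleotide are ignored.

```python
def mask_rip_column(rows, col):
    """Mask C or T at `col` as Y in every row where the next base is A.

    Hypothetical helper for illustration only; `rows` is a list of
    equal-length aligned sequences and `col` is a 0-based column index.
    """
    masked = []
    for seq in rows:
        base = seq[col].upper()
        # Base immediately 3' of the column; treated as a gap at the row end.
        nxt = seq[col + 1].upper() if col + 1 < len(seq) else "-"
        if base in ("C", "T") and nxt == "A":
            # Substrate (CpA) and product (TpA) both become YpA.
            seq = seq[:col] + "Y" + seq[col + 1:]
        masked.append(seq)
    return masked


rows = ["ACATG", "ATATG", "ACGTG"]
print(mask_rip_column(rows, 1))  # ['AYATG', 'AYATG', 'ACGTG']
```

The third row is left unchanged because its C is followed by G, so it is not in RIP dinucleotide context.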
Basic usage with masking
```bash
derip2 -i tests/data/mintest.fa \
  --max-gaps 0.7 \
  --max-snp-noise 0.2 \
  --min-rip-like 0.5 \
  --mask \
  -d results \
  --prefix derip_output
```
Output:
- results/derip_output.fasta - Corrected sequence
- results/derip_output_alignment.fasta - Alignment with corrected sequence appended
- results/derip_output_masked_alignment.fasta - Alignment with masked corrections
With visualization
The --plot option will create a visualization of the alignment with RIP markup. The --plot-rip-type option specifies which RIP events are displayed in the visualization: product, substrate, or both.
```bash
derip2 -i tests/data/mintest.fa \
  --max-gaps 0.7 \
  --max-snp-noise 0.2 \
  --min-rip-like 0.5 \
  --plot \
  --plot-rip-type both \
  -d results \
  --prefix derip_output
```
Output:
- results/derip_output.fasta - Corrected sequence
- results/derip_output_masked_alignment.fasta - Alignment with masked corrections
- results/derip_output_visualization.png - Visualization of the alignment with RIP markup

Using maximum GC content for filling
By default, uncorrected positions in the output sequence are filled from the sequence with the lowest RIP count. If the --fill-max-gc option is set, remaining positions are filled from the sequence with the highest G/C content instead.
```bash
derip2 -i tests/data/mintest.fa \
  --max-gaps 0.7 \
  --max-snp-noise 0.2 \
  --min-rip-like 0.5 \
  --fill-max-gc \
  -d results \
  --prefix derip_gc_filled
```
Alternatively, the --fill-index option forces a specific alignment row, selected by row index number (indexed from 0), to be used for filling uncorrected positions. Note: This will override the --fill-max-gc option.
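As a rough illustration of how the fill row could be chosen, the sketch below picks the row with the highest G/C content unless an explicit index is given. The helper names `gc_content` and `pick_fill_row` are hypothetical and not part of deRIP2. Computing G/C content over non-gap A/C/G/T bases only, and breaking ties in favour of the first row, are assumptions made here.

```python
def gc_content(seq):
    """Fraction of G/C among the unambiguous, non-gap bases of `seq`."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


def pick_fill_row(rows, fill_index=None):
    """Return the 0-based index of the row used to fill uncorrected positions.

    An explicit `fill_index` takes priority, mirroring how --fill-index
    overrides --fill-max-gc; otherwise the highest-GC row is chosen.
    """
    if fill_index is not None:
        return fill_index
    # max() returns the first maximal index, so ties go to the earliest row.
    return max(range(len(rows)), key=lambda i: gc_content(rows[i]))


rows = ["ATAT-A", "GCGC-A", "ATGC-A"]
print(pick_fill_row(rows))                # 1
print(pick_fill_row(rows, fill_index=2))  # 2
```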
Correcting all deamination events
If the --reaminate option is set, all deamination events will be corrected, regardless of RIP context.
--plot-rip-type product is used to highlight the product of RIP events in the visualization.
Non-RIP deamination events are also highlighted.
```bash
derip2 -i tests/data/mintest.fa \
  --max-gaps 0.7 \
  --reaminate \
  -d results \
  --plot \
  --plot-rip-type product \
  --prefix derip_reaminated
```
Output:
- results/derip_reaminated.fasta - Corrected sequence with all deamination events corrected
- results/derip_reaminated_alignment.fasta - Alignment with corrected sequence appended
- results/derip_reaminated_visualization.png - Visualization of the alignment with RIP markup

Standard Options
```
  --version                       Show the version and exit.
  -i, --input TEXT                Multiple sequence alignment.  [required]
  -g, --max-gaps FLOAT            Maximum proportion of gapped positions in
                                  column to be tolerated before forcing a gap
                                  in final deRIP sequence.  [default: 0.7]
  -a, --reaminate                 Correct all deamination events independent
                                  of RIP context.
  --max-snp-noise FLOAT           Maximum proportion of conflicting SNPs
                                  permitted before excluding column from
                                  RIP/deamination assessment. i.e. By default
                                  a column with >= 0.5 'C/T' bases will have
                                  'TpA' positions logged as RIP events.
                                  [default: 0.5]
  --min-rip-like FLOAT            Minimum proportion of deamination events in
                                  RIP context (5' CpA 3' --> 5' TpA 3')
                                  required for column to deRIP'd in final
                                  sequence. Note: If 'reaminate' option is set
                                  all deamination events will be corrected.
                                  [default: 0.1]
  --fill-max-gc                   By default uncorrected positions in the
                                  output sequence are filled from the sequence
                                  with the lowest RIP count. If this option is
                                  set remaining positions are filled from the
                                  sequence with the highest G/C content.
  --fill-index INTEGER            Force selection of alignment row to fill
                                  uncorrected positions from by row index
                                  number (indexed from 0). Note: Will override
                                  '--fill-max-gc' option.
  --mask                          Mask corrected positions in alignment with
                                  degenerate IUPAC codes.
  --no-append                     If set, do not append deRIP'd sequence to
                                  output alignment.
  -d, --out-dir TEXT              Directory for deRIP'd sequence files to be
                                  written to.
  -p, --prefix TEXT               Prefix for output files. Output files will
                                  be named prefix.fasta,
                                  prefix_alignment.fasta, etc.  [default:
                                  deRIPseq]
  --plot                          Create a visualization of the alignment with
                                  RIP markup.
  --plot-rip-type [both|product|substrate]
                                  Specify the type of RIP events to be
                                  displayed in the alignment visualization.
                                  [default: both]
  --loglevel [DEBUG|INFO|WARNING|ERROR|CRITICAL]
                                  Set logging level.  [default: INFO]
  --logfile TEXT                  Log file path.
  -h, --help                      Show this message and exit.
```
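When deRIP2 is called from a pipeline, it can be convenient to assemble the command line in Python. The sketch below only uses flags listed in the options above; the wrapper function `build_derip2_command` is hypothetical, and actually running the command assumes `derip2` is installed and on the PATH.

```python
import shlex


def build_derip2_command(alignment, out_dir, prefix, max_gaps=0.7,
                         max_snp_noise=0.5, min_rip_like=0.1,
                         mask=False, reaminate=False):
    """Assemble a derip2 argument list from keyword settings (illustrative)."""
    cmd = [
        "derip2", "-i", alignment,
        "--max-gaps", str(max_gaps),
        "--max-snp-noise", str(max_snp_noise),
        "--min-rip-like", str(min_rip_like),
        "-d", out_dir,
        "--prefix", prefix,
    ]
    # Boolean flags are appended only when requested.
    if mask:
        cmd.append("--mask")
    if reaminate:
        cmd.append("--reaminate")
    return cmd


cmd = build_derip2_command("tests/data/mintest.fa", "results", "derip_output",
                           max_snp_noise=0.2, min_rip_like=0.5, mask=True)
print(shlex.join(cmd))
# To execute: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with unusual file names.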
Algorithm overview
For each column in input alignment:
- Check if the proportion of gapped rows is greater than the max gap proportion. If true, then a gap is added to the output sequence.
- Set invariant column values in output sequence.
- If at least X proportion of bases are C/T or G/A (i.e. if max-snp-noise = 0.4, then at least 0.6 of positions in column must be C/T or G/A):
  - If reaminate option is set then revert T-->C or A-->G.
  - If reaminate is not set then check for number of positions in RIP dinucleotide context (C/TpA or TpG/A).
  - If proportion of positions in column in RIP-like context >= min-rip-like threshold, AND at least one substrate and one product motif (i.e. CpA and TpA) is present, perform RIP correction in output sequence.
- For all remaining positions in output sequence (not filled by gap, reaminate, or RIP-correction) inherit sequence from input sequence with the fewest observed RIP events (or greatest GC content if RIP is not detected or multiple sequences share the min-RIP count).
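The per-column decisions above can be sketched in Python as follows. This is a simplified illustration, not deRIP2's code: the function `classify_column` is hypothetical, only the forward strand (C/T followed by A) is considered, proportions other than the gap fraction are computed over non-gap bases, and reamination is assumed to require both C and T in the column.

```python
def classify_column(col, nxt, max_gaps=0.7, max_snp_noise=0.5,
                    min_rip_like=0.1, reaminate=False):
    """Return the action for one alignment column (forward strand only).

    col: bases at this column, one per row.
    nxt: bases at the following column, one per row.
    Result is one of "gap", "invariant", "reaminate", "rip" or "fill".
    """
    col = [b.upper() for b in col]
    nxt = [b.upper() for b in nxt]
    bases = [b for b in col if b != "-"]
    # Too many gapped rows: force a gap in the output sequence.
    if not bases or (len(col) - len(bases)) / len(col) > max_gaps:
        return "gap"
    if len(set(bases)) == 1:
        return "invariant"
    # Column must be mostly C/T to be assessed for deamination.
    ct = [b for b in bases if b in ("C", "T")]
    if len(ct) / len(bases) < 1 - max_snp_noise:
        return "fill"
    if reaminate:
        return "reaminate" if {"C", "T"} <= set(ct) else "fill"
    # C/T bases followed by A: CpA is the substrate, TpA the product.
    motifs = [b for b, n in zip(col, nxt) if b in ("C", "T") and n == "A"]
    rip_like = len(motifs) / len(bases)
    if rip_like >= min_rip_like and "C" in motifs and "T" in motifs:
        return "rip"
    return "fill"


print(classify_column("CTTT", "AAAA"))                  # rip
print(classify_column("CTTT", "GGGG"))                  # fill
print(classify_column("CTTT", "GGGG", reaminate=True))  # reaminate
print(classify_column("----A", "AAAAA"))                # gap
```

Columns classified as "fill" would then inherit their base from the selected fill sequence, as described in the final step above.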
Issues
Submit feedback to the Issue Tracker
License
Software provided under MIT license.
Owner
- Name: Adam Taranto
- Login: Adamtaranto
- Kind: user
- Location: Melbourne, Australia
- Company: The University of Melbourne
- Repositories: 38
- Profile: https://github.com/Adamtaranto
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "deRIP2"
version: 0.4.0
date-released: 2025-04-04
authors:
  - family-names: Taranto
    given-names: Adam
    orcid: https://orcid.org/0000-0003-4759-3475
    affiliation: "The University of Melbourne"
repository-code: "https://github.com/Adamtaranto/derip2"
license: MIT
abstract: >-
  deRIP2 analyzes DNA sequence alignments to detect and correct
  Repeat-Induced Point mutations (RIP) in fungal genomes. The tool
  identifies C→T transitions in RIP context and reconstructs the
  ancestral (pre-RIP) sequence, providing visualizations of the
  corrected mutations.
keywords:
  - genomics
  - fungi
  - repeat-induced-point-mutation
  - RIP
  - bioinformatics
preferred-citation:
  type: software
  authors:
    - family-names: Taranto
      given-names: Adam
      orcid: https://orcid.org/0000-0003-4759-3475
      affiliation: "The University of Melbourne"
  title: "deRIP2: A tool for detecting and correcting RIP mutations in fungal DNA sequences"
  year: 2025
  url: "https://github.com/Adamtaranto/derip2"
  repository-code: "https://github.com/Adamtaranto/derip2"
  # doi: TBA
GitHub Events
Total
- Create event: 11
- Release event: 3
- Issues event: 9
- Delete event: 4
- Issue comment event: 6
- Push event: 64
- Pull request event: 8
Last Year
- Create event: 11
- Release event: 3
- Issues event: 9
- Delete event: 4
- Issue comment event: 6
- Push event: 64
- Pull request event: 8
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 15
- Total pull requests: 8
- Average time to close issues: 4 months
- Average time to close pull requests: about 2 months
- Total issue authors: 1
- Total pull request authors: 1
- Average comments per issue: 0.53
- Average comments per pull request: 0.5
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 5
- Pull requests: 4
- Average time to close issues: 5 days
- Average time to close pull requests: about 1 month
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 0.6
- Average comments per pull request: 1.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- Adamtaranto (15)
Pull Request Authors
- Adamtaranto (12)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads:
  - pypi: 36 last-month
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 8
- Total maintainers: 1
pypi.org: derip2
Predict ancestral sequence of fungal repeat elements by correcting for RIP-like mutations in multi-sequence DNA alignments.
- Documentation: https://derip2.readthedocs.io/
- License: MIT
- Latest release: 0.4.0 (published 11 months ago)