repeatfinder

fast code for searching for direct and indirect repeats in DNA sequences.

https://github.com/linsalrob/repeatfinder

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.9%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: linsalrob
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 102 KB
Statistics
  • Stars: 4
  • Watchers: 1
  • Forks: 1
  • Open Issues: 1
  • Releases: 15
Created about 2 years ago · Last pushed almost 2 years ago
Metadata Files
Readme License Citation

README.md

Repeat Finder - Finding Repeats in DNA sequences

Repeatfinder is a standalone program to quickly find repeats in DNA sequences. You might find that it is remarkably similar to an essential module in PhiSpy.

How to find repeats

Using the command line

You can use the pydna_repeatfinder command to find repeats in a fasta sequence.

By default, pydna_repeatfinder just prints the repeats in a simple tab-separated format that is easy to read and includes the DNA sequence.

For example:

    $ pydna_repeatfinder -f tests/test_long.fasta
    random_sequence Number:1 Len1:52 Len2:52 92 144 1946 1998 TCAGTTGTCGTAGGTTAGCCTAAGGGTATCGCGGAAGTATGGAGTATACGGT TCAGTTGTCGTAGGTTAGCCTAAGGGTATCGCGGAAGTATGGAGTATACGGT
    random_sequence Number:2 Len1:52 Len2:52 92 144 197 145 TCAGTTGTCGTAGGTTAGCCTAAGGGTATCGCGGAAGTATGGAGTATACGGT CACCGTATACTCCATACTTCCGCGATACCCTTAGGCTAACCTACGACAACTG
    random_sequence Number:3 Len1:53 Len2:53 145 198 1998 1945 CACCGTATACTCCATACTTCCGCGATACCCTTAGGCTAACCTACGACAACTGA ATCAGTTGTCGTAGGTTAGCCTAAGGGTATCGCGGAAGTATGGAGTATACGGT

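The output lists three repeat pairs. The first is a direct repeat: the region at 92-144 occurs again in the same orientation at 1946-1998. The second and third are inverted repeats, where the coordinates of the second copy run from high to low (197 to 145, and 1998 to 1945) because that copy lies on the opposite strand.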
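If you want to post-process this output, each line can be split into a small record. The sketch below is not part of pydna_repeatfinder: the `Repeat` class and `parse_repeat_line` function are hypothetical names, and the field order is assumed to be exactly what the example above shows (sequence ID, `Number:`, `Len1:`, `Len2:`, four coordinates, two sequences). It also assumes the sequence ID contains no whitespace, and it infers orientation from whether the second copy's coordinates descend.

```python
from dataclasses import dataclass


@dataclass
class Repeat:
    """One repeat pair as shown in the default output (hypothetical record type)."""
    seq_id: str
    number: int
    first_start: int
    first_end: int
    second_start: int
    second_end: int
    first_seq: str
    second_seq: str

    @property
    def orientation(self) -> str:
        # Assumption: descending coordinates for the second copy mark an
        # inverted repeat, as in repeats 2 and 3 of the example output.
        return "inverted" if self.second_start > self.second_end else "direct"


def parse_repeat_line(line: str) -> Repeat:
    """Parse one whitespace-separated output line into a Repeat."""
    fields = line.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    number = int(fields[1].split(":")[1])      # "Number:2" -> 2
    coords = [int(x) for x in fields[4:8]]     # the four coordinates
    return Repeat(fields[0], number, *coords, fields[8], fields[9])


# Second line of the example output above.
line = (
    "random_sequence Number:2 Len1:52 Len2:52 92 144 197 145 "
    "TCAGTTGTCGTAGGTTAGCCTAAGGGTATCGCGGAAGTATGGAGTATACGGT "
    "CACCGTATACTCCATACTTCCGCGATACCCTTAGGCTAACCTACGACAACTG"
)
rpt = parse_repeat_line(line)
print(rpt.number, rpt.orientation)  # prints: 2 inverted
```

Using `str.split()` with no argument means the parser accepts either tabs or spaces between fields.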

We can also output the results formatted so you can paste them directly into a GenBank file. This is perhaps the easiest way to visualise the repeats.

    $ pydna_repeatfinder -f tests/test_long.fasta -o genbank
         repeat_region   join(92..144,1946..1998)
                         /note="direct repeat number 1 of length 53"
                         /rpt_type="direct"
         repeat_region   join(92..144,complement(197..145))
                         /note="inverted repeat number 2 of length 53"
                         /rpt_type="inverted"
         repeat_region   join(145..198,complement(1998..1945))
                         /note="inverted repeat number 3 of length 54"
                         /rpt_type="inverted"
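To show how coordinates map onto these features, here is a minimal sketch that builds one `repeat_region` entry from four coordinates. It is an illustration only, not the tool's own code: `genbank_repeat_feature` is a hypothetical name, the column positions follow the usual GenBank feature-table layout (key at column 6, location and qualifiers at column 22), and the length in the note is assumed to be the inclusive span of the first copy, which matches the numbers in the example output.

```python
def genbank_repeat_feature(number, first_start, first_end, second_start, second_end):
    """Return a repeat_region feature as text (hypothetical helper)."""
    # Assumption: a second copy with descending coordinates is inverted.
    inverted = second_start > second_end
    rpt_type = "inverted" if inverted else "direct"

    second = f"{second_start}..{second_end}"
    if inverted:
        # Mirrors the example output, which keeps the high..low order.
        second = f"complement({second})"

    length = abs(first_end - first_start) + 1  # inclusive span of the first copy
    pad = " " * 21                             # qualifiers start at column 22
    lines = [
        f"     {'repeat_region':<16}join({first_start}..{first_end},{second})",
        f'{pad}/note="{rpt_type} repeat number {number} of length {length}"',
        f'{pad}/rpt_type="{rpt_type}"',
    ]
    return "\n".join(lines)


print(genbank_repeat_feature(2, 92, 144, 197, 145))
```

For the second repeat this produces the same location, note, and `rpt_type` lines as the example output above.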

In your own code

You can import the pydna_repeatfinder module and use it in your own code:

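```
from PyRepeatFinder import find_repeats

# dna_seq, gap_len and min_len must be defined by the caller
r = find_repeats(dna_seq, gap_len, min_len, 0)
for rpt in r:
    # rpt is a dictionary with keys:
    #   repeat_number
    #   first_start
    #   first_end
    #   second_start
    #   second_end
    print(rpt["repeat_number"], rpt["first_start"], rpt["first_end"],
          rpt["second_start"], rpt["second_end"])
```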
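Once you have the list of dictionaries, ordinary Python is enough to filter and sort it. The sketch below runs without the package installed by using hand-written dictionaries in place of a real result. It assumes the keys are spelled with underscores (`repeat_number`, `first_start`, `first_end`, `second_start`, `second_end`) and borrows the coordinates from the command-line example; whether the library returns the same coordinate convention is also an assumption. `filter_repeats` is a hypothetical helper name.

```python
def filter_repeats(repeats, min_span):
    """Keep repeats whose first copy spans at least min_span, longest first."""
    def span(rpt):
        # Assumption: span is the plain coordinate difference of the first copy.
        return abs(rpt["first_end"] - rpt["first_start"])

    kept = [rpt for rpt in repeats if span(rpt) >= min_span]
    return sorted(kept, key=span, reverse=True)


# Stand-in for the list returned by the repeat finder (coordinates taken
# from the command-line example, not from a real library call).
example = [
    {"repeat_number": 1, "first_start": 92, "first_end": 144,
     "second_start": 1946, "second_end": 1998},
    {"repeat_number": 2, "first_start": 92, "first_end": 144,
     "second_start": 197, "second_end": 145},
    {"repeat_number": 3, "first_start": 145, "first_end": 198,
     "second_start": 1998, "second_end": 1945},
]

for rpt in filter_repeats(example, 53):
    print(rpt["repeat_number"], rpt["first_start"], rpt["first_end"])
```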

We are happy to add more output formats. Please post a GitHub issue and tag it as an enhancement.

Installing pydna_repeatfinder

You should install it using bioconda:

    mamba create -n pydna_repeatfinder -c bioconda pydna_repeatfinder
    mamba activate pydna_repeatfinder
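With the environment activated, the command can also be driven from a Python script. The following is a minimal sketch, not an official wrapper: `build_command` and `run_repeatfinder` are hypothetical names, only the `-f` and `-o genbank` options shown earlier are used, and it assumes the results are written to standard output.

```python
import shutil
import subprocess


def build_command(fasta_path, genbank=False):
    """Assemble the argument list for pydna_repeatfinder."""
    cmd = ["pydna_repeatfinder", "-f", str(fasta_path)]
    if genbank:
        cmd += ["-o", "genbank"]
    return cmd


def run_repeatfinder(fasta_path, genbank=False):
    """Run the command and return its standard output as text."""
    if shutil.which("pydna_repeatfinder") is None:
        raise RuntimeError(
            "pydna_repeatfinder not found on PATH; is the environment activated?"
        )
    result = subprocess.run(
        build_command(fasta_path, genbank),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


print(build_command("tests/test_long.fasta", genbank=True))
```

Checking `shutil.which` first gives a clearer error than a failed `subprocess.run` when the environment has not been activated.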

Citing pydna_repeatfinder

Please see the citation file.

Owner

  • Name: Rob Edwards
  • Login: linsalrob
  • Kind: user
  • Location: Adelaide, Australia
  • Company: Flinders University

Professor of CS and Biology. Writing bioinformatics code to study viruses, phages, and metagenomes.

Citation (citation.cff)

cff-version: 1.2.0
title: pydna_repeatfinder
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Robert
    name-particle: Robert
    family-names: Edwards
    email: raedwards@gmail.com
    affiliation: Flinders University
    orcid: 'https://orcid.org/0000-0001-8383-8949'
identifiers:
  - type: doi
    value: 10.5281/zenodo.11565326
    description: Zenodo repository of release 0.2.9
repository-code: 'https://github.com/linsalrob/repeat_finder'
abstract: >-
  pydna_repeatfinder is a module and C++ code for finding exact
  and inexact repeats in DNA sequences.
keywords:
  - DNA sequencing
  - microbiome
  - bacteria
  - virus
license: MIT

GitHub Events

Total
  • Watch event: 1
  • Pull request event: 1
  • Fork event: 1
Last Year
  • Watch event: 1
  • Pull request event: 1
  • Fork event: 1

Issues and Pull Requests

Last synced: about 1 year ago

All Time
  • Total issues: 0
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Pull Request Authors
  • rizo (2)