repeatfinder
fast code for searching for direct and indirect repeats in DNA sequences.
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 1 DOI reference(s) in README
- ✓ Academic publication links: Links to: zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (9.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: linsalrob
- License: mit
- Language: Python
- Default Branch: main
- Size: 102 KB
Statistics
- Stars: 4
- Watchers: 1
- Forks: 1
- Open Issues: 1
- Releases: 15
Metadata Files
README.md
Repeat Finder - Finding Repeats in DNA sequences
Repeatfinder is a standalone program to quickly find repeats in DNA sequences. You might find that it is remarkably similar to an essential module in PhiSpy.
How to find repeats
Using the command line
You can use the pydna_repeatfinder command to find repeats in a fasta sequence.
By default, pydna_repeatfinder just prints the repeats in a simple tab-separated format that is easy to read and includes the DNA sequence.
For example:
$ pydna_repeatfinder -f tests/test_long.fasta
random_sequence Number:1 Len1:52 Len2:52 92 144 1946 1998 TCAGTTGTCGTAGGTTAGCCTAAGGGTATCGCGGAAGTATGGAGTATACGGT TCAGTTGTCGTAGGTTAGCCTAAGGGTATCGCGGAAGTATGGAGTATACGGT
random_sequence Number:2 Len1:52 Len2:52 92 144 197 145 TCAGTTGTCGTAGGTTAGCCTAAGGGTATCGCGGAAGTATGGAGTATACGGT CACCGTATACTCCATACTTCCGCGATACCCTTAGGCTAACCTACGACAACTG
random_sequence Number:3 Len1:53 Len2:53 145 198 1998 1945 CACCGTATACTCCATACTTCCGCGATACCCTTAGGCTAACCTACGACAACTGA ATCAGTTGTCGTAGGTTAGCCTAAGGGTATCGCGGAAGTATGGAGTATACGGT
There are two long repeats here. The first, from 92-144, is repeated in the same orientation (a direct repeat) at positions 1946-1998, and in the opposite orientation (an inverted repeat) at 197-145.
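If you want to post-process the default output, the lines above can be read with a few lines of Python. This is a minimal sketch, not part of pydna_repeatfinder: the `Repeat` class and `parse_repeat_line` function are hypothetical names, and the field order (sequence ID, `Number:`, `Len1:`, `Len2:`, four coordinates, two sequences) is inferred from the example output. It also assumes that an inverted repeat is reported with the second copy's coordinates in descending order, as in repeat number 2.

```python
from dataclasses import dataclass


@dataclass
class Repeat:
    """One repeat pair as read from the default tab-separated output."""
    seq_id: str
    number: int
    first_start: int
    first_end: int
    second_start: int
    second_end: int
    first_seq: str
    second_seq: str

    @property
    def inverted(self) -> bool:
        # Assumption: inverted repeats list the second copy's
        # coordinates in descending order (e.g. 197 145).
        return self.second_start > self.second_end


def parse_repeat_line(line: str) -> Repeat:
    """Parse one output line into a Repeat (hypothetical helper)."""
    # Splitting on any whitespace handles tabs; it would break on
    # sequence IDs that themselves contain spaces.
    fields = line.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    number = int(fields[1].split(":")[1])      # "Number:2" -> 2
    coords = [int(x) for x in fields[4:8]]     # four coordinates
    return Repeat(fields[0], number, *coords, fields[8], fields[9])
```

Calling `parse_repeat_line` on each line of the command's output gives objects whose `inverted` property separates direct from inverted repeats.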
We can also output the results formatted so you can paste them directly into a GenBank file. This is perhaps the easiest way to visualise the repeats.
$ pydna_repeatfinder -f tests/test_long.fasta -o genbank
repeat_region join(92..144,1946..1998)
/note="direct repeat number 1 of length 53"
/rpt_type="direct"
repeat_region join(92..144,complement(197..145))
/note="inverted repeat number 2 of length 53"
/rpt_type="inverted"
repeat_region join(145..198,complement(1998..1945))
/note="inverted repeat number 3 of length 54"
/rpt_type="inverted"
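To show how the coordinates map onto these features, here is a rough sketch that rebuilds the same text from five numbers. The function name `genbank_feature` is hypothetical and this is not the tool's own implementation. It assumes the reported length is `end - start + 1`, that a second copy with descending coordinates is an inverted repeat, and it uses the standard GenBank feature-table layout (feature key at column 6, location and qualifiers at column 22). The `complement(197..145)` coordinate order is copied from the output above.

```python
def genbank_feature(number: int, first_start: int, first_end: int,
                    second_start: int, second_end: int) -> str:
    """Format one repeat as a GenBank repeat_region feature (sketch)."""
    # Assumption: descending second coordinates mean an inverted repeat.
    inverted = second_start > second_end
    rpt_type = "inverted" if inverted else "direct"
    if inverted:
        second = f"complement({second_start}..{second_end})"
    else:
        second = f"{second_start}..{second_end}"
    # Assumption: coordinates are inclusive, so 92..144 has length 53.
    length = first_end - first_start + 1
    key = " " * 5 + "repeat_region".ljust(16)   # location starts at column 22
    pad = " " * 21                              # qualifiers start at column 22
    return "\n".join([
        f"{key}join({first_start}..{first_end},{second})",
        f'{pad}/note="{rpt_type} repeat number {number} of length {length}"',
        f'{pad}/rpt_type="{rpt_type}"',
    ])


print(genbank_feature(2, 92, 144, 197, 145))
```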
In your own code
You can import the pydna_repeatfinder module and use it in your own code:
```
from PyRepeatFinder import find_repeats

# dna_seq, gap_len and min_len are values you supply
r = find_repeats(dna_seq, gap_len, min_len, 0)
for rpt in r:
    # rpt is a dictionary with keys:
    #   repeat_number
    #   first_start
    #   first_end
    #   second_start
    #   second_end
    print(rpt)
```
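Once you have the list of repeat dictionaries, ordinary Python is enough to filter or rank them. The sketch below is illustrative only: `longest_repeats` is a hypothetical helper, and it assumes the dictionary keys are spelled `first_start` and `first_end` and that the coordinates are inclusive. It does not call the module itself, so it can be tried on hand-written dictionaries.

```python
def longest_repeats(repeats, min_length=0):
    """Return repeats at least min_length long, longest first (sketch)."""
    def length(rpt):
        # Assumption: inclusive coordinates, so length is end - start + 1.
        return rpt["first_end"] - rpt["first_start"] + 1

    kept = [rpt for rpt in repeats if length(rpt) >= min_length]
    return sorted(kept, key=length, reverse=True)


# Hand-written example using coordinates from the output shown earlier.
example = [
    {"repeat_number": 1, "first_start": 92, "first_end": 144,
     "second_start": 1946, "second_end": 1998},
    {"repeat_number": 3, "first_start": 145, "first_end": 198,
     "second_start": 1998, "second_end": 1945},
]
for rpt in longest_repeats(example, min_length=50):
    print(rpt["repeat_number"], rpt["first_start"], rpt["first_end"])
```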
We are happy to add more output formats. Please post a GitHub issue and tag it as an enhancement.
Installing pydna_repeatfinder
You should install it using bioconda:
mamba create -n pydna_repeatfinder -c bioconda pydna_repeatfinder
mamba activate pydna_repeatfinder
Citing pydna_repeatfinder
Please see the citation file.
Owner
- Name: Rob Edwards
- Login: linsalrob
- Kind: user
- Location: Adelaide, Australia
- Company: Flinders University
- Website: http://edwards.flinders.edu.au/
- Twitter: linsalrob
- Repositories: 31
- Profile: https://github.com/linsalrob
Professor of CS and Biology Writing bioinformatics code to study viruses, phages, and metagenomes.
Citation (citation.cff)
cff-version: 1.2.0
title: pydna_repeatfinder
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Robert
    name-particle: Robert
    family-names: Edwards
    email: raedwards@gmail.com
    affiliation: Flinders University
    orcid: 'https://orcid.org/0000-0001-8383-8949'
identifiers:
  - type: doi
    value: 10.5281/zenodo.11565326
    description: Zenodo repository of release 0.2.9
repository-code: 'https://github.com/linsalrob/repeat_finder'
abstract: >-
  pydna_repeatfinder is a module and C++ code for finding exact
  and inexact repeats in DNA sequences.
keywords:
  - DNA sequencing
  - microbiome
  - bacteria
  - virus
license: MIT
GitHub Events
Total
- Watch event: 1
- Pull request event: 1
- Fork event: 1
Last Year
- Watch event: 1
- Pull request event: 1
- Fork event: 1
Issues and Pull Requests
Last synced: about 1 year ago
All Time
- Total issues: 0
- Total pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- rizo (2)