doubletrouble

An R package to identify and classify duplicated genes from whole-genome protein sequence data

https://github.com/almeidasilvaf/doubletrouble

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.9%) to scientific vocabulary

Keywords

bioinformatics comparative-genomics gene-duplication molecular-evolution rstats whole-genome-duplication
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 27
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 3 years ago · Last pushed 7 months ago
Metadata Files
Readme Changelog Contributing Code of conduct Support

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>",
    fig.path = "man/figures/README-",
    out.width = "100%"
)
```

# doubletrouble 


[![GitHub issues](https://img.shields.io/github/issues/almeidasilvaf/doubletrouble)](https://github.com/almeidasilvaf/doubletrouble/issues)
[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)
[![R-CMD-check-bioc](https://github.com/almeidasilvaf/doubletrouble/workflows/R-CMD-check-bioc/badge.svg)](https://github.com/almeidasilvaf/doubletrouble/actions)
[![Codecov test coverage](https://codecov.io/gh/almeidasilvaf/doubletrouble/branch/devel/graph/badge.svg)](https://codecov.io/gh/almeidasilvaf/doubletrouble?branch=devel)


The main goal of __doubletrouble__ is to identify duplicated genes from
whole-genome protein sequences and classify them by mode of duplication.
Duplicates can be classified using four classification schemes, which
increase in complexity and level of detail in a stepwise manner.
The classification schemes and the duplication modes they distinguish are:


| Scheme   | Duplication modes           |
|:---------|:----------------------------|
| binary   | SD, SSD                     |
| standard | SD, TD, PD, DD              |
| extended | SD, TD, PD, TRD, DD         |
| full     | SD, TD, PD, rTRD, dTRD, DD  |

*Legend:* **SD**, segmental duplication. **SSD**, small-scale duplication.
**TD**, tandem duplication. **PD**, proximal duplication. 
**TRD**, transposon-derived duplication. 
**rTRD**, retrotransposon-derived duplication.
**dTRD**, DNA transposon-derived duplication. **DD**, dispersed duplication.


Besides classifying gene pairs, users can also classify genes, so that
each gene is assigned to a unique mode of duplication.
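
As an illustration of how pair-level classes could be collapsed into one mode per gene, here is a minimal base R sketch. It is not the package's implementation: the gene IDs, the column names (`dup1`, `dup2`, `type`), and the priority order SD > TD > PD > TRD > DD are all assumptions made for this example.

```r
# Hypothetical classified gene pairs (gene IDs and column names are made up)
pairs <- data.frame(
    dup1 = c("g1", "g1", "g3", "g4"),
    dup2 = c("g2", "g3", "g4", "g5"),
    type = c("SD", "TD", "DD", "PD")
)

# Assumed priority order, from highest to lowest
priority <- c("SD", "TD", "PD", "TRD", "DD")

# Long format: one row per (gene, mode) observation
long <- data.frame(
    gene = c(pairs$dup1, pairs$dup2),
    type = factor(c(pairs$type, pairs$type), levels = priority)
)

# For each gene, keep the mode with the highest priority (lowest level index)
best <- tapply(as.integer(long$type), long$gene, min)
genes <- data.frame(gene = names(best), type = priority[as.integer(best)])
genes
```

In this toy example, gene `g1` appears in both an SD pair and a TD pair, so it is assigned to SD under the assumed priority order.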

Users can also calculate substitution rates for duplicate pairs (i.e., 
$K_a$, the number of nonsynonymous substitutions per nonsynonymous site; 
$K_s$, the number of synonymous substitutions per synonymous site; 
and their ratio $\frac{K_a}{K_s}$), find peaks in $K_s$ distributions 
with Gaussian Mixture Models (GMMs), and classify gene pairs into 
age groups based on $K_s$ peaks.
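
To make the idea of age groups concrete, the following base R sketch assigns each pair to the mixture component under which its Ks value has the highest weighted Gaussian density. It is only a sketch under stated assumptions: the Ks values and the mixture parameters (means, standard deviations, weights) are invented, and the assignment rule is a simple stand-in rather than the package's own procedure.

```r
# Hypothetical Ks values for six duplicate pairs
ks <- c(0.08, 0.12, 0.15, 0.85, 0.95, 1.10)

# Hypothetical mixture parameters, as if estimated beforehand by a GMM
peaks <- data.frame(
    mean   = c(0.1, 1.0),
    sd     = c(0.05, 0.20),
    weight = c(0.5, 0.5)
)

# Weighted Gaussian density of each Ks value under each component
# (rows = gene pairs, columns = mixture components)
dens <- sapply(seq_len(nrow(peaks)), function(i) {
    peaks$weight[i] * dnorm(ks, mean = peaks$mean[i], sd = peaks$sd[i])
})

# Assign each pair to the component with the highest weighted density
age_group <- max.col(dens, ties.method = "first")
data.frame(ks = ks, age_group = age_group)
```

Here the three pairs with low Ks fall into group 1 (the younger peak) and the three pairs with Ks near 1 fall into group 2.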

## Installation instructions

Get the latest stable `R` release from [CRAN](http://cran.r-project.org/). 
Then install __doubletrouble__ from [Bioconductor](http://bioconductor.org/) 
using the following code:

```{r 'install', eval = FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("doubletrouble")
```

To install the development version from [GitHub](https://github.com/almeidasilvaf/doubletrouble), use:

```{r 'install_dev', eval = FALSE}
BiocManager::install("almeidasilvaf/doubletrouble")
```

## Citation

Below is the output of `citation('doubletrouble')` in R. Please
run this yourself to check for any updates on how to cite __doubletrouble__.

```{r 'citation', eval = requireNamespace('doubletrouble')}
print(citation('doubletrouble'), bibtex = TRUE)
```

Please note that the __doubletrouble__ package was only made possible thanks to many other R and bioinformatics software authors, who are cited in the vignettes and/or the paper(s) describing this package.

## Code of Conduct

Please note that the __doubletrouble__ project is released with 
a [Contributor Code of Conduct](http://bioconductor.org/about/code-of-conduct/). 
By contributing to this project, you agree to abide by its terms.

## Development tools

* Continuous code testing is possible thanks to [GitHub Actions](https://www.tidyverse.org/blog/2020/04/usethis-1-6-0/) through `r BiocStyle::CRANpkg('usethis')`, `r BiocStyle::CRANpkg('remotes')`, and `r BiocStyle::CRANpkg('rcmdcheck')`, customized to use [Bioconductor's docker containers](https://www.bioconductor.org/help/docker/) and `r BiocStyle::Biocpkg('BiocCheck')`.
* Code coverage assessment is possible thanks to [codecov](https://codecov.io/gh) and `r BiocStyle::CRANpkg('covr')`.
* The [documentation website](http://almeidasilvaf.github.io/doubletrouble) is automatically updated thanks to `r BiocStyle::CRANpkg('pkgdown')`.
* The code is styled automatically thanks to `r BiocStyle::CRANpkg('styler')`.
* The documentation is formatted thanks to `r BiocStyle::CRANpkg('devtools')` and `r BiocStyle::CRANpkg('roxygen2')`.

For more details, check the `dev` directory.

This package was developed using `r BiocStyle::Biocpkg('biocthis')`.


Owner

  • Name: Fabrício Almeida-Silva
  • Login: almeidasilvaf
  • Kind: user
  • Location: Technologiepark 71, Ghent, Belgium
  • Company: VIB-UGent Center for Plant Systems Biology

Bioinformatics. Network biology. Plant genomics and evolution. #rstats addict.

GitHub Events

Total
  • Issues event: 6
  • Watch event: 15
  • Issue comment event: 31
  • Push event: 8
  • Create event: 1
Last Year
  • Issues event: 6
  • Watch event: 15
  • Issue comment event: 31
  • Push event: 8
  • Create event: 1

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 61
  • Total Committers: 1
  • Avg Commits per committer: 61.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 12
  • Committers: 1
  • Avg Commits per committer: 12.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
almeidasilvaf f****a@h****m 61

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 10
  • Total pull requests: 1
  • Average time to close issues: 8 days
  • Average time to close pull requests: 8 minutes
  • Total issue authors: 8
  • Total pull request authors: 1
  • Average comments per issue: 6.7
  • Average comments per pull request: 2.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 7
  • Pull requests: 0
  • Average time to close issues: 13 days
  • Average time to close pull requests: N/A
  • Issue authors: 7
  • Pull request authors: 0
  • Average comments per issue: 7.57
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • ilaydagulmez (3)
  • sjfleck (1)
  • shun-oit (1)
  • WWz33 (1)
  • Chenglin20170390 (1)
  • Nieto-CaballeroVE (1)
  • Deeptivarshney (1)
  • veronneaupy (1)
Pull Request Authors
  • hpages (2)

Packages

  • Total packages: 1
  • Total downloads:
    • bioconductor 6,295 total
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 12
  • Total maintainers: 1
bioconductor.org: doubletrouble

Identification and classification of duplicated genes

  • Versions: 12
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 6,295 Total
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 31.9%
Downloads: 95.6%
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • R >= 4.2.0 depends
  • Biostrings * imports
  • GenomicRanges * imports
  • MSA2dist >= 1.1.4 imports
  • ggplot2 * imports
  • mclust * imports
  • stats * imports
  • syntenet * imports
  • utils * imports
  • BiocStyle * suggests
  • covr * suggests
  • feature * suggests
  • knitr * suggests
  • rmarkdown * suggests
  • sessioninfo * suggests
  • testthat >= 3.0.0 suggests
.github/workflows/check-bioc.yml actions
  • actions/cache v2 composite
  • actions/checkout v2 composite
  • actions/upload-artifact v2 composite
  • docker/build-push-action v1 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite