cubar

R Package for Codon Usage Bias Analysis. Comprehensive documentation and tutorials are available at:

https://github.com/mt1022/cubar

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.8%) to scientific vocabulary

Keywords

bioinformatics codon-usage machine-learning r-package sequence-analysis
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 6
  • Watchers: 1
  • Forks: 2
  • Open Issues: 1
  • Releases: 8
Created almost 5 years ago · Last pushed 7 months ago
Metadata Files
Readme Changelog License

README.md

cubar

Comprehensive Codon Usage Bias Analysis in R

Lifecycle: stable

Overview

Codon usage bias refers to the non-uniform usage of synonymous codons (codons that encode the same amino acid) across different organisms, genes, and functional categories. cubar is a comprehensive R package for analyzing codon usage bias in coding sequences. It provides a unified framework for calculating established codon usage metrics, conducting sliding-window analyses or differential usage analyses, and optimizing sequences for heterologous expression.

Features

🧬 Codon-Level Analysis

  • RSCU calculation: Relative synonymous codon usage analysis
  • Amino acid usage: Frequency of each amino acid in sequences
  • Codon weights: Calculate weights based on gene expression, tRNA availability, and mRNA stability
  • Optimal codon inference: Machine learning-based identification of optimal codons
  • Codon-anticodon visualization: Visualization of codon-tRNA pairing relationships
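To make the RSCU definition concrete, here is a minimal base-R sketch that computes it by hand: each codon's observed count divided by the mean count of its synonymous family. The counts are made-up toy numbers, and this does not use cubar's own functions; it only illustrates the quantity.

```r
# Toy codon counts (made-up numbers, for illustration only)
counts <- c(GCT = 30, GCC = 10, GCA = 15, GCG = 5,  # Ala: 4 synonymous codons
            AAA = 12, AAG = 28)                     # Lys: 2 synonymous codons

# Amino acid encoded by each codon
aa <- c(GCT = "Ala", GCC = "Ala", GCA = "Ala", GCG = "Ala",
        AAA = "Lys", AAG = "Lys")

# RSCU = observed count / mean count within the synonymous family
family_mean <- ave(counts, aa, FUN = mean)
rscu <- counts / family_mean

print(round(rscu, 2))
```

A value above 1 means the codon is used more often than expected under uniform synonymous usage, and the values within a family sum to the family size.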

📊 Gene-Level Metrics

  • Codon frequency tabulation: Count codon occurrences across sequences
  • CAI (Codon Adaptation Index): Measure similarity to highly expressed genes
  • ENC (Effective Number of Codons): Assess codon usage bias strength
  • Fop (Fraction of Optimal codons): Calculate proportion of optimal codons
  • tAI (tRNA Adaptation Index): Match codon usage to tRNA availability
  • CSCg (Codon Stabilization Coefficients): Quantify mRNA stability effects
  • Dp (Deviation from Proportionality): Analyze virus-host codon usage relationships
  • GC content metrics: Overall GC, GC3s (3rd codon positions), GC4d (4-fold degenerate sites)
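As a rough illustration of two of these metrics, the base-R sketch below computes CAI as the geometric mean of per-codon weights (each reference count divided by the maximum in its synonymous family) and the GC fraction at third codon positions. The reference counts and the sequence are hypothetical toy data, and the code is independent of cubar's implementation.

```r
# Reference codon counts from a hypothetical set of highly expressed genes
ref <- c(GCT = 30, GCC = 10, GCA = 15, GCG = 5, AAA = 12, AAG = 28)
aa  <- c(GCT = "Ala", GCC = "Ala", GCA = "Ala", GCG = "Ala",
         AAA = "Lys", AAG = "Lys")

# Relative adaptiveness: count of each codon over its family maximum
w <- ref / ave(ref, aa, FUN = max)

# Split a toy coding sequence into codons
cds <- "GCTAAGGCCAAAGCTAAG"
starts <- seq(1, nchar(cds), by = 3)
codons <- substring(cds, starts, starts + 2)

# CAI: geometric mean of the weights of the codons used in the gene
cai <- exp(mean(log(w[codons])))

# GC fraction at third codon positions (every codon here is synonymous)
gc3 <- mean(substr(codons, 3, 3) %in% c("G", "C"))

print(c(CAI = cai, GC3 = gc3))
```

Taking the mean of the logs rather than the product of the weights avoids numerical underflow on long genes.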

🛠️ Utilities & Tools

  • Sliding window analysis: Positional codon usage patterns within genes
  • Sequence optimization: Redesign sequences for optimal expression
  • Differential codon usage: Statistical comparison between sequence sets
  • Quality control: Comprehensive CDS validation and preprocessing
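The idea behind sliding-window analysis can be sketched in base R as follows: a per-codon statistic is averaged over overlapping windows along the gene. The codons, window width, and step size are arbitrary toy choices, and the code does not call cubar's own sliding-window functions.

```r
# Toy gene, already split into codons
codons <- c("GCC", "AAG", "GCG", "AAG", "GCT",
            "AAA", "GCA", "AAA", "GCC", "AAG")

# Per-codon statistic: does the codon end in G or C?
is_gc3 <- substr(codons, 3, 3) %in% c("G", "C")

# Window of `width` codons, moved `step` codons at a time
width <- 4
step <- 2
starts <- seq(1, length(codons) - width + 1, by = step)

# Mean of the statistic within each window
window_gc3 <- vapply(
  starts,
  function(s) mean(is_gc3[s:(s + width - 1)]),
  numeric(1)
)

windows <- data.frame(start = starts, end = starts + width - 1, gc3 = window_gc3)
print(windows)
```

Any gene-level metric can replace the GC3 indicator, as long as it can be computed from the codons inside one window.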

Why Choose cubar?

  • 🚀 High Performance: Process large datasets (>100,000 sequences) efficiently using optimized Biostrings and data.table backends
  • 🧬 Flexible Genetic Codes: Support for all NCBI genetic codes plus custom genetic code tables
  • 🔗 R Ecosystem Integration: Seamlessly integrate with other bioinformatics and data analysis packages
  • 📚 Comprehensive Documentation: Extensive tutorials, examples, and theoretical background
  • 🔬 Research Ready: Implements established metrics with proper citations and validation

Installation

Stable Release (Recommended)

Install the latest stable version from CRAN:

```r
install.packages("cubar")
```

Development Version

Install the latest development version from GitHub:

```r
# Install devtools if not already installed
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Install cubar from GitHub
devtools::install_github("mt1022/cubar", dependencies = TRUE)
```

Dependencies

System Requirements:

  • R (≥ 4.1.0)

Required Packages:

  • Biostrings (≥ 2.60.0) - Bioconductor package for sequence manipulation
  • IRanges (≥ 2.34.0) - Bioconductor infrastructure for range operations
  • data.table (≥ 1.14.0) - High-performance data manipulation
  • ggplot2 (≥ 3.3.5) - Data visualization
  • rlang (≥ 0.4.11) - Language tools

Note: Bioconductor packages will be installed automatically, but you may need to update your R installation if you encounter compatibility issues.
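If you want to see which of these requirements your current setup already meets, a small check along the following lines may help. `check_deps` is a hypothetical helper written for this example and is not part of cubar; the minimum versions are copied from the list above.

```r
# Minimum versions copied from the dependency list above
required <- c(Biostrings = "2.60.0", IRanges = "2.34.0",
              data.table = "1.14.0", ggplot2 = "3.3.5", rlang = "0.4.11")

# Hypothetical helper: report which packages are installed and recent enough
check_deps <- function(required) {
  installed <- vapply(names(required), requireNamespace, logical(1),
                      quietly = TRUE)
  ok <- mapply(
    function(pkg, min_version, is_installed) {
      is_installed && utils::packageVersion(pkg) >= min_version
    },
    names(required), required, installed
  )
  data.frame(package = names(required),
             minimum = unname(required),
             installed = unname(installed),
             meets_minimum = unname(ok))
}

deps <- check_deps(required)
print(deps)

# The R version itself can be checked the same way
r_ok <- getRversion() >= "4.1.0"
print(r_ok)
```

Packages reported as missing or outdated can then be installed from CRAN or Bioconductor as appropriate.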

Documentation & Tutorials

📖 Complete documentation is available within R (?function_name) and on our package website.

🎯 Getting Started

📚 Advanced Topics

Example Workflow

Here's a typical analysis workflow demonstrating key functionality:

```r
library(cubar)
library(ggplot2)

# 1. Load and quality-check sequences
data(yeast_cds)
clean_cds <- check_cds(yeast_cds)

# 2. Calculate codon frequencies
codon_freq <- count_codons(clean_cds)

# 3. Calculate multiple metrics
enc <- get_enc(codon_freq)    # Effective number of codons
gc3s <- get_gc3s(codon_freq)  # GC content at 3rd positions

# 4. Analyze highly expressed genes
data(yeast_exp)
yeast_exp <- yeast_exp[yeast_exp$gene_id %in% rownames(codon_freq), ]
high_expr <- head(yeast_exp[order(-yeast_exp$fpkm), ], 500)
rscu_high <- est_rscu(codon_freq[high_expr$gene_id, ])
cai <- get_cai(codon_freq, rscu_high)

# 5. Visualize results
df <- data.frame(ENC = enc, CAI = cai, GC3s = gc3s)
ggplot(df, aes(color = GC3s, x = ENC, y = CAI)) +
  geom_point(alpha = 0.6) +
  scale_color_viridis_c() +
  labs(title = "Codon Usage Bias Relationships",
       x = "Effective Number of Codons",
       y = "Codon Adaptation Index")
```

🆘 Getting Help

Related Packages

For complementary analysis, consider these R packages:

  • Biostrings - Sequence input/output and manipulation
  • Peptides - Peptide and protein property calculations

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • GitHub Copilot was used to suggest code snippets during development
  • GitHub Education for providing free access to development tools
  • The R and Bioconductor communities for excellent foundational packages
  • Contributors and users who have provided feedback and improvements

**[📚 Documentation](https://mt1022.github.io/cubar/) • [🐛 Report Bug](https://github.com/mt1022/cubar/issues) • [💡 Request Feature](https://github.com/mt1022/cubar/issues)**

Owner

  • Name: Hong Zhang
  • Login: mt1022
  • Kind: user
  • Location: Lanzhou, China
  • Company: Lanzhou University

Evolution & Functional Genomics

GitHub Events

Total
  • Create event: 2
  • Release event: 1
  • Issues event: 3
  • Issue comment event: 2
  • Push event: 24
  • Pull request review comment event: 4
  • Pull request review event: 10
  • Pull request event: 17
Last Year
  • Create event: 2
  • Release event: 1
  • Issues event: 3
  • Issue comment event: 2
  • Push event: 24
  • Pull request review comment event: 4
  • Pull request review event: 10
  • Pull request event: 17

Committers

Last synced: over 1 year ago

All Time
  • Total Commits: 167
  • Total Committers: 2
  • Avg Commits per committer: 83.5
  • Development Distribution Score (DDS): 0.012
Past Year
  • Commits: 78
  • Committers: 1
  • Avg Commits per committer: 78.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Hong Zhang m****2 165
mt1022 m****g@g****m 2

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 6
  • Total pull requests: 12
  • Average time to close issues: 12 days
  • Average time to close pull requests: 6 days
  • Total issue authors: 4
  • Total pull request authors: 2
  • Average comments per issue: 3.5
  • Average comments per pull request: 0.08
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 11
  • Average time to close issues: 26 days
  • Average time to close pull requests: 1 day
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.09
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • mt1022 (3)
  • avvypaks (1)
  • jhtrujillo (1)
  • trilisser (1)
  • maltesemike (1)
Pull Request Authors
  • creaturemoon (20)
  • ZiBuZiBu (1)
Top Labels
Issue Labels
documentation (1) question (1) enhancement (1) no response (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • cran 270 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 7
  • Total maintainers: 1
cran.r-project.org: cubar

Codon Usage Bias Analysis

  • Versions: 7
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 270 Last month
Rankings
Stargazers count: 28.2%
Forks count: 28.3%
Dependent packages count: 28.8%
Dependent repos count: 34.5%
Average: 41.7%
Downloads: 88.5%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/pkgdown.yaml actions
  • JamesIves/github-pages-deploy-action v4.4.1 composite
  • actions/checkout v3 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
DESCRIPTION cran
  • R >= 4.1.0 depends
  • Biostrings >= 2.60.0 imports
  • IRanges >= 2.34.0 imports
  • data.table >= 1.14.0 imports
  • ggplot2 >= 3.3.5 imports
  • rlang >= 0.4.11 imports
  • knitr * suggests
  • rmarkdown * suggests
  • testthat >= 3.0.0 suggests