cubar
R Package for Codon Usage Bias Analysis. Comprehensive documentation and tutorials are available at:
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (9.8%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: mt1022
- License: other
- Language: R
- Default Branch: main
- Homepage: https://mt1022.github.io/cubar/
- Size: 22.8 MB
Statistics
- Stars: 6
- Watchers: 1
- Forks: 2
- Open Issues: 1
- Releases: 8
Metadata Files
README.md
cubar
Comprehensive Codon Usage Bias Analysis in R
Table of Contents
- Overview
- Features
- Why Choose cubar?
- Installation
- Documentation & Tutorials
- Example Workflow
- 🆘 Getting Help
- Related Packages
- License
- Acknowledgments
Overview
Codon usage bias refers to the non-uniform usage of synonymous codons (codons that encode the same amino acid) across different organisms, genes, and functional categories. cubar is a comprehensive R package for analyzing codon usage bias in coding sequences. It provides a unified framework for calculating established codon usage metrics, conducting sliding-window analyses or differential usage analyses, and optimizing sequences for heterologous expression.
Features
🧬 Codon-Level Analysis
- RSCU calculation: Relative synonymous codon usage analysis
- Amino acid usage: Frequency of each amino acid in sequences
- Codon weights: Calculate weights based on gene expression, tRNA availability, and mRNA stability
- Optimal codon inference: Machine learning-based identification of optimal codons
- Codon-anticodon visualization: Visualization of codon-tRNA pairing relationships
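For readers new to RSCU, the following minimal base-R sketch shows the standard definition: the observed count of a codon divided by the count expected if all synonymous codons of its amino acid were used equally. It is not cubar's implementation (cubar's own RSCU function may differ in details such as zero-count handling). The helper `rscu()` and the seven-codon map `aa_of` are hypothetical, and the sketch assumes every codon in `counts` appears in `aa_of`.

```r
# Hypothetical toy map: a small subset of the standard genetic code
aa_of <- c(GCT = "Ala", GCC = "Ala", GCA = "Ala", GCG = "Ala",
           AAA = "Lys", AAG = "Lys", ATG = "Met")

# RSCU = observed count / mean count of the codon's synonymous family
rscu <- function(counts, aa_of) {
  aa <- aa_of[names(counts)]                     # amino acid of each codon
  family_mean <- ave(counts, aa, FUN = mean)     # expected count under equal usage
  counts / family_mean
}

counts <- c(GCT = 10, GCC = 20, GCA = 5, GCG = 5, AAA = 30, AAG = 10, ATG = 8)
print(rscu(counts, aa_of))
# GCT = 1, GCC = 2, GCA = 0.5, GCG = 0.5, AAA = 1.5, AAG = 0.5, ATG = 1
```

An RSCU of 1 means no bias; values above 1 mark codons used more often than expected. A family with zero total count would produce `NaN` here.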
📊 Gene-Level Metrics
- Codon frequency tabulation: Count codon occurrences across sequences
- CAI (Codon Adaptation Index): Measure similarity to highly expressed genes
- ENC (Effective Number of Codons): Assess codon usage bias strength
- Fop (Fraction of Optimal codons): Calculate proportion of optimal codons
- tAI (tRNA Adaptation Index): Match codon usage to tRNA availability
- CSCg (Codon Stabilization Coefficients): Quantify mRNA stability effects
- Dp (Deviation from Proportionality): Analyze virus-host codon usage relationships
- GC content metrics: Overall GC, GC3s (GC at synonymous third codon positions), GC4d (GC at fourfold degenerate sites)
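CAI can be illustrated in the same way. The sketch below, in base R and independent of cubar, derives relative adaptiveness weights (each codon's RSCU divided by the largest RSCU among its synonymous codons in a reference set, such as highly expressed genes) and takes their geometric mean over a gene's codon occurrences, skipping single-codon amino acids. The helpers `relative_adaptiveness()` and `cai()`, the toy codon map, and the numbers are hypothetical. Many implementations also replace zero weights with a small value; this sketch does not, so a used codon with weight 0 gives a CAI of 0.

```r
# Hypothetical toy map: a small subset of the standard genetic code
aa_of <- c(GCT = "Ala", GCC = "Ala", GCA = "Ala", GCG = "Ala",
           AAA = "Lys", AAG = "Lys", ATG = "Met")

# w = RSCU / (max RSCU among the codon's synonymous codons)
relative_adaptiveness <- function(rscu_ref, aa_of) {
  aa <- aa_of[names(rscu_ref)]
  rscu_ref / ave(rscu_ref, aa, FUN = max)
}

# CAI = exp(sum(n_c * log(w_c)) / sum(n_c)): geometric mean of w over all
# codon occurrences; codons of single-codon amino acids (here ATG) are skipped
cai <- function(gene_counts, w, aa_of) {
  fam_size <- ave(rep(1, length(aa_of)), aa_of, FUN = length)
  names(fam_size) <- names(aa_of)
  keep <- fam_size[names(gene_counts)] > 1 & gene_counts > 0
  n <- gene_counts[keep]
  exp(sum(n * log(w[names(n)])) / sum(n))
}

# Reference RSCU (e.g. estimated from highly expressed genes) and one gene
rscu_ref <- c(GCT = 1, GCC = 2, GCA = 0.5, GCG = 0.5, AAA = 1.5, AAG = 0.5, ATG = 1)
w <- relative_adaptiveness(rscu_ref, aa_of)
gene <- c(GCC = 4, GCT = 2, AAA = 3, AAG = 1, ATG = 1)
cai(gene, w, aa_of)  # (0.5^2 * 1/3)^(1/10), about 0.78
```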
🛠️ Utilities & Tools
- Sliding window analysis: Positional codon usage patterns within genes
- Sequence optimization: Redesign sequences for optimal expression
- Differential codon usage: Statistical comparison between sequence sets
- Quality control: Comprehensive CDS validation and preprocessing
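As a rough illustration of a positional (sliding-window) analysis, the hypothetical `sliding_gc3()` below splits a CDS into codons and reports the GC content at third codon positions in overlapping windows. It is a base-R sketch, not cubar's sliding-window function: it computes plain GC3 over all codons (including Met, Trp, and stop codons) rather than GC3s, and the default window and step sizes are arbitrary. It assumes the sequence contains at least `window` complete codons.

```r
# Split a CDS string into consecutive codons (a trailing partial codon is dropped)
split_codons <- function(cds) {
  starts <- seq(1, nchar(cds) - 2, by = 3)
  substring(cds, starts, starts + 2)
}

# Fraction of G/C at the third codon position in windows of `window` codons,
# advancing by `step` codons (hypothetical helper and parameters)
sliding_gc3 <- function(cds, window = 4, step = 2) {
  codons <- split_codons(toupper(cds))
  third_gc <- substr(codons, 3, 3) %in% c("G", "C")
  starts <- seq(1, length(codons) - window + 1, by = step)
  data.frame(
    start_codon = starts,
    gc3 = vapply(starts, function(s) mean(third_gc[s:(s + window - 1)]), numeric(1))
  )
}

sliding_gc3("ATGGCCAAAGCGTTCGGATAA", window = 4, step = 2)
#   start_codon  gc3
# 1           1 0.75
# 2           3 0.50
```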
Why Choose cubar?
- 🚀 High Performance: Process large datasets (>100,000 sequences) efficiently using optimized Biostrings and data.table backends
- 🧬 Flexible Genetic Codes: Support for all NCBI genetic codes plus custom genetic code tables
- 🔗 R Ecosystem Integration: Seamlessly integrate with other bioinformatics and data analysis packages
- 📚 Comprehensive Documentation: Extensive tutorials, examples, and theoretical background
- 🔬 Research Ready: Implements established metrics with proper citations and validation
Installation
Stable Release (Recommended)
Install the latest stable version from CRAN:
```r
install.packages("cubar")
```
Development Version
Install the latest development version from GitHub:
```r
# Install devtools if not already installed
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Install cubar from GitHub
devtools::install_github("mt1022/cubar", dependencies = TRUE)
```
Dependencies
System Requirements:
- R (≥ 4.1.0)
Required Packages:
- Biostrings (≥ 2.60.0) - Bioconductor package for sequence manipulation
- IRanges (≥ 2.34.0) - Bioconductor infrastructure for range operations
- data.table (≥ 1.14.0) - High-performance data manipulation
- ggplot2 (≥ 3.3.5) - Data visualization
- rlang (≥ 0.4.11) - Language tools
Note: Bioconductor packages will be installed automatically, but you may need to update your R installation if you encounter compatibility issues.
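A quick way to confirm that an R session meets the requirements above is to compare installed versions against the listed minimums. The helper `check_deps()` is hypothetical (not part of cubar); it relies only on base R's `requireNamespace()`, `utils::packageVersion()`, and `getRversion()`.

```r
# Minimum versions copied from the "Required Packages" list
required <- c(Biostrings = "2.60.0", IRanges = "2.34.0", data.table = "1.14.0",
              ggplot2 = "3.3.5", rlang = "0.4.11")

# Returns "ok", "missing", or "outdated" for each package (hypothetical helper)
check_deps <- function(required) {
  vapply(names(required), function(pkg) {
    # requireNamespace() returns FALSE instead of erroring if pkg cannot be loaded
    if (!requireNamespace(pkg, quietly = TRUE)) return("missing")
    # package_version objects compare correctly against version strings
    if (utils::packageVersion(pkg) < required[[pkg]]) return("outdated")
    "ok"
  }, character(1))
}

r_ok <- getRversion() >= "4.1.0"
status <- check_deps(required)
print(r_ok)
print(status)
```

If Biostrings or IRanges is reported as `missing`, one option is to install them from Bioconductor with `BiocManager::install(c("Biostrings", "IRanges"))` after running `install.packages("BiocManager")`.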
Documentation & Tutorials
📖 Complete documentation is available within R (?function_name) and on our package website.
🎯 Getting Started
- Introduction to cubar - Basic usage and core functionality
- Non-standard Genetic Codes - Working with alternative genetic codes
- Codon Optimization - Sequence optimization strategies
📚 Advanced Topics
- Mathematical Foundations - Detailed theory behind the metrics
- Function Reference - Complete function documentation
Example Workflow
Here's a typical analysis workflow demonstrating key functionality:
```r
library(cubar)
library(ggplot2)

# 1. Load and quality-check sequences
data(yeast_cds)
clean_cds <- check_cds(yeast_cds)

# 2. Calculate codon frequencies
codon_freq <- count_codons(clean_cds)

# 3. Calculate multiple metrics
enc <- get_enc(codon_freq)    # Effective number of codons
gc3s <- get_gc3s(codon_freq)  # GC content at synonymous 3rd codon positions

# 4. Analyze highly expressed genes
data(yeast_exp)
yeast_exp <- yeast_exp[yeast_exp$gene_id %in% rownames(codon_freq), ]
high_expr <- head(yeast_exp[order(-yeast_exp$fpkm), ], 500)
rscu_high <- est_rscu(codon_freq[high_expr$gene_id, ])
cai <- get_cai(codon_freq, rscu_high)

# 5. Visualize results
df <- data.frame(ENC = enc, CAI = cai, GC3s = gc3s)
ggplot(df, aes(color = GC3s, x = ENC, y = CAI)) +
  geom_point(alpha = 0.6) +
  scale_color_viridis_c() +
  labs(title = "Codon Usage Bias Relationships",
       x = "Effective Number of Codons", y = "Codon Adaptation Index")
```
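The workflow above reports ENC for each gene. To show what that number measures, here is a minimal base-R sketch of Wright's (1990) original estimator on a toy codon map. For each amino acid, the homozygosity is F = (n * sum(p^2) - 1) / (n - 1), where n is the number of codons observed for that amino acid and p their frequencies; ENC is then one per single-codon amino acid plus, for each degeneracy class, the number of amino acids in the class divided by their mean F. This is an illustration only: cubar's own ENC function may use a different or improved estimator, and `homozygosity()`, `wright_enc()`, and `aa_of` are hypothetical names. The sketch does not handle classes whose F is missing or zero, nor does it cap the result at the number of sense codons.

```r
# Hypothetical toy map: a small subset of the standard genetic code
aa_of <- c(GCT = "Ala", GCC = "Ala", GCA = "Ala", GCG = "Ala",
           AAA = "Lys", AAG = "Lys", ATG = "Met")

# Wright's homozygosity F for one amino acid's codon counts
homozygosity <- function(x) {
  n <- sum(x)
  if (n < 2) return(NA_real_)  # undefined for fewer than two observed codons
  p <- x / n
  (n * sum(p^2) - 1) / (n - 1)
}

wright_enc <- function(counts, aa_of) {
  # Align counts to the codon map; codons absent from `counts` get 0
  x <- counts[names(aa_of)]
  x[is.na(x)] <- 0
  names(x) <- names(aa_of)
  aas <- unique(aa_of)
  fam_size <- vapply(aas, function(a) sum(aa_of == a), integer(1))
  enc <- sum(fam_size == 1)  # each single-codon amino acid contributes 1
  for (k in setdiff(unique(fam_size), 1L)) {
    members <- aas[fam_size == k]
    f <- vapply(members, function(a) homozygosity(x[aa_of == a]), numeric(1))
    enc <- enc + length(members) / mean(f, na.rm = TRUE)
  }
  enc
}

wright_enc(c(GCT = 12, AAA = 5, ATG = 3), aa_of)  # maximal bias: 3
```

With maximal bias every F equals 1, so ENC equals the number of amino acids; more even usage lowers F and raises ENC.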
🆘 Getting Help
- 📋 GitHub Issues: Report bugs, request features, or ask questions
- 📖 Documentation: Check function help (?function_name) and online docs
Related Packages
For complementary analysis, consider these R packages:
- Biostrings - Sequence input/output and manipulation
- Peptides - Peptide and protein property calculations
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- GitHub Copilot was used to suggest code snippets during development
- GitHub Education for providing free access to development tools
- The R and Bioconductor communities for excellent foundational packages
- Contributors and users who have provided feedback and improvements
Owner
- Name: Hong Zhang
- Login: mt1022
- Kind: user
- Location: Lanzhou, China
- Company: Lanzhou University
- Repositories: 2
- Profile: https://github.com/mt1022
Evolution & Functional Genomics
GitHub Events
Total
- Create event: 2
- Release event: 1
- Issues event: 3
- Issue comment event: 2
- Push event: 24
- Pull request review comment event: 4
- Pull request review event: 10
- Pull request event: 17
Last Year
- Create event: 2
- Release event: 1
- Issues event: 3
- Issue comment event: 2
- Push event: 24
- Pull request review comment event: 4
- Pull request review event: 10
- Pull request event: 17
Committers
Last synced: over 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Hong Zhang | m****2 | 165 |
| mt1022 | m****g@g****m | 2 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 6
- Total pull requests: 12
- Average time to close issues: 12 days
- Average time to close pull requests: 6 days
- Total issue authors: 4
- Total pull request authors: 2
- Average comments per issue: 3.5
- Average comments per pull request: 0.08
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 11
- Average time to close issues: 26 days
- Average time to close pull requests: 1 day
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 1.0
- Average comments per pull request: 0.09
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- mt1022 (3)
- avvypaks (1)
- jhtrujillo (1)
- trilisser (1)
- maltesemike (1)
Pull Request Authors
- creaturemoon (20)
- ZiBuZiBu (1)
Packages
- Total packages: 1
- Total downloads:
  - cran: 270 last month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 7
- Total maintainers: 1
cran.r-project.org: cubar
Codon Usage Bias Analysis
- Homepage: https://github.com/mt1022/cubar
- Documentation: http://cran.r-project.org/web/packages/cubar/cubar.pdf
- License: MIT + file LICENSE
- Latest release: 1.2.0 (published 6 months ago)
Maintainers (1)
Dependencies
- JamesIves/github-pages-deploy-action v4.4.1 composite
- actions/checkout v3 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
- R >= 4.1.0 depends
- Biostrings >= 2.60.0 imports
- IRanges >= 2.34.0 imports
- data.table >= 1.14.0 imports
- ggplot2 >= 3.3.5 imports
- rlang >= 0.4.11 imports
- knitr * suggests
- rmarkdown * suggests
- testthat >= 3.0.0 suggests