seqArchR

seqArchR: Identifying (promoter) sequence architectures de novo using NMF

https://github.com/snikumbh/seqarchr

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.7%) to scientific vocabulary

Keywords

clustering nmf nonnegative-matrix-factorization promoter-sequence-architectures r r-package scikit-learn sequence-analysis sequence-architectures unsupervised-machine-learning
Last synced: 6 months ago

Repository

Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 1
  • Open Issues: 2
  • Releases: 0
Created about 4 years ago · Last pushed almost 2 years ago
Metadata Files
Readme Changelog License

README.md

seqArchR


seqArchR is an unsupervised, non-negative matrix factorization (NMF)-based algorithm for de novo discovery of sequence architectures.

[Figure: schematic of seqArchR's algorithm]
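To make the NMF idea concrete, here is a minimal base-R sketch of a factorization V ≈ W H using Lee-Seung multiplicative updates. It is an illustration only: the function `nmf_mu`, the toy matrix, and the convention of features in rows and sequences in columns are assumptions for this example, and seqArchR itself relies on scikit-learn for the factorization rather than on this code.

```r
# Toy NMF via multiplicative updates (illustration, not seqArchR's implementation).
# V: non-negative matrix (assumed here: features x sequences); k: number of factors.
nmf_mu <- function(V, k, n_iter = 200, eps = 1e-9, seed = 1) {
  stopifnot(all(V >= 0), k >= 1)
  set.seed(seed)
  W <- matrix(runif(nrow(V) * k), nrow(V), k)   # basis vectors (features x k)
  H <- matrix(runif(k * ncol(V)), k, ncol(V))   # coefficients (k x sequences)
  for (i in seq_len(n_iter)) {
    # Each update keeps entries non-negative and does not increase ||V - WH||_F
    H <- H * (t(W) %*% V) / (t(W) %*% W %*% H + eps)
    W <- W * (V %*% t(H)) / (W %*% H %*% t(H) + eps)
  }
  list(W = W, H = H)
}

# Hypothetical data: an exactly rank-2 non-negative matrix
set.seed(42)
V <- matrix(runif(6 * 2), 6, 2) %*% matrix(runif(2 * 8), 2, 8)
fit <- nmf_mu(V, k = 2)

# Each column (sequence) can be assigned to the factor with the largest coefficient
cluster_of_seq <- apply(fit$H, 2, which.max)
```

In this picture the columns of `W` play the role of candidate architectures and `H` says how strongly each sequence uses each one.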

Installation

Python scikit-learn dependency

This package requires the Python module scikit-learn. Please see the scikit-learn installation instructions.
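Before installing, it can help to confirm that scikit-learn is visible from R. The sketch below assumes the reticulate package (listed among this package's imports) is what bridges R and Python; the helper name `sklearn_available` is made up for this example.

```r
# Check whether Python's scikit-learn can be found from R (sketch).
sklearn_available <- function() {
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    return(FALSE)
  }
  # scikit-learn is imported in Python under the module name "sklearn"
  isTRUE(tryCatch(reticulate::py_module_available("sklearn"),
                  error = function(e) FALSE))
}

if (!sklearn_available()) {
  message("scikit-learn not found; one option is ",
          "reticulate::py_install(\"scikit-learn\")")
}
```

Which Python environment reticulate picks up depends on the local setup, so a `FALSE` here may simply mean that a different environment needs to be selected.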

To install this package, use

```r
if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
}

remotes::install_github("snikumbh/seqArchR", build_vignettes = FALSE)
```
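The Packages section of this page also lists seqArchR on Bioconductor, so a released version can presumably be installed through BiocManager instead of from GitHub. This is a sketch of the standard Bioconductor route, not an instruction taken from the README:

```r
# Install the Bioconductor release only if seqArchR is not already available.
if (!requireNamespace("seqArchR", quietly = TRUE)) {
    if (!requireNamespace("BiocManager", quietly = TRUE)) {
        install.packages("BiocManager")
    }
    BiocManager::install("seqArchR")
}
```

The GitHub route above may carry newer, unreleased changes, while the Bioconductor route follows the Bioconductor release cycle.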

Usage

```r
# Load packages
library(seqArchR)
library(Biostrings)

# Creation of one-hot encoded data matrix from FASTA file
# You can use your own FASTA file instead
inputFastaFilename <- system.file("extdata", "example_data.fa",
                                  package = "seqArchR",
                                  mustWork = TRUE)

# Specifying dinuc generates dinucleotide features
inputSeqsMat <- seqArchR::prepare_data_from_FASTA(inputFastaFilename,
                                                  sinuc_or_dinuc = "dinuc")

# Read the same sequences again, this time as raw sequences
inputSeqsRaw <- seqArchR::prepare_data_from_FASTA(inputFastaFilename,
                                                  raw_seq = TRUE)

nSeqs <- length(inputSeqsRaw)
positions <- seq(1, Biostrings::width(inputSeqsRaw[1]))

# Set seqArchR configuration
# Most arguments have default values
seqArchRconfig <- seqArchR::set_config(
    parallelize = TRUE,
    n_cores = 2,
    n_runs = 100,
    k_min = 1,
    k_max = 20,
    mod_sel_type = "stability",
    bound = 10^-6,
    chunk_size = 100,
    result_aggl = "ward.D",
    result_dist = "euclid",
    flags = list(debug = FALSE, time = TRUE, verbose = TRUE,
                 plot = FALSE)
)

# Call/Run seqArchR
seqArchRresult <- seqArchR::seqArchR(config = seqArchRconfig,
                                     seqs_ohe_mat = inputSeqsMat,
                                     seqs_raw = inputSeqsRaw,
                                     seqs_pos = positions,
                                     total_itr = 2,
                                     set_ocollation = c(TRUE, FALSE))
```
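The data-preparation step above turns sequences into a one-hot encoded matrix. As a rough picture of what such a matrix looks like, here is a base-R sketch for mononucleotide features. The helper `one_hot_encode`, the toy sequences, and the row layout (one row per nucleotide-position pair, one column per sequence) are assumptions for illustration; the package's own encoding, especially for dinucleotide features, may be organized differently.

```r
# One-hot encode equal-length DNA strings (illustrative helper, not part of seqArchR).
one_hot_encode <- function(seqs, alphabet = c("A", "C", "G", "T")) {
  stopifnot(length(unique(nchar(seqs))) == 1)   # all sequences must share one length
  len <- nchar(seqs[1])
  # n x len character matrix, one row per sequence
  chars <- do.call(rbind, strsplit(toupper(seqs), ""))
  # One n x len indicator block per letter, bound side by side, then transposed
  blocks <- lapply(alphabet, function(a) (chars == a) * 1L)
  mat <- t(do.call(cbind, blocks))              # (4 * len) x n
  rownames(mat) <- paste0(rep(alphabet, each = len), seq_len(len))
  colnames(mat) <- names(seqs)
  mat
}

toy <- c(s1 = "ACGT", s2 = "AAGT")   # hypothetical sequences
ohe <- one_hot_encode(toy)
dim(ohe)   # 16 features (4 letters x 4 positions) by 2 sequences
```

Every column sums to the sequence length, since exactly one letter is switched on at each position. That non-negativity is what makes the matrix a suitable input for NMF.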

Contact

Comments, suggestions, enquiries/requests are welcome! Feel free to email sarvesh.nikumbh@gmail.com or create a new issue.

Owner

  • Name: Sarvesh Nikumbh
  • Login: snikumbh
  • Kind: user
  • Location: London

Post-Doc@MRC LMS; MPI-INF PhD. Other profile at MPI-GitHub: https://github.molgen.mpg.de/snikumbh

Committers

Last synced: about 2 years ago

All Time
  • Total Commits: 127
  • Total Committers: 2
  • Avg Commits per committer: 63.5
  • Development Distribution Score (DDS): 0.016
Past Year
  • Commits: 7
  • Committers: 1
  • Avg Commits per committer: 7.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
snikumbh s****h@g****m 125
Nitesh Turaga n****a@g****m 2
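The DDS values reported above are consistent with a common definition: one minus the top committer's share of all commits. That definition is an assumption here, since this page does not state how the score is computed. A short check against the commit counts in the table:

```r
# Development Distribution Score under the assumed definition:
# 1 - (commits by the top committer / total commits)
dds <- function(commits) 1 - max(commits) / sum(commits)

all_time <- c(snikumbh = 125, "Nitesh Turaga" = 2)
round(dds(all_time), 3)   # 0.016, matching the All Time figure

past_year <- c(7)         # a single committer with 7 commits
dds(past_year)            # 0, matching the Past Year figure
```

A score near 0 means that almost all commits come from one person, which fits a project with a single main developer.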

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 3
  • Total pull requests: 0
  • Average time to close issues: 9 days
  • Average time to close pull requests: N/A
  • Total issue authors: 3
  • Total pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • johannesnicolaus (1)
  • snikumbh (1)
  • parsboy66 (1)
Pull Request Authors

Packages

  • Total packages: 1
  • Total downloads:
    • bioconductor 6,615 total
  • Total dependent packages: 1
  • Total dependent repositories: 0
  • Total versions: 5
  • Total maintainers: 1
bioconductor.org: seqArchR

Identify Different Architectures of Sequence Elements

  • Versions: 5
  • Dependent Packages: 1
  • Dependent Repositories: 0
  • Downloads: 6,615 Total
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 30.6%
Downloads: 91.8%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/R-CMD-check.yaml actions
  • actions/cache v2 composite
  • actions/checkout v2 composite
  • actions/upload-artifact main composite
  • r-lib/actions/setup-pandoc v1 composite
  • r-lib/actions/setup-r v2 composite
.github/workflows/pkgdown.yaml actions
  • actions/cache v2 composite
  • actions/checkout v2 composite
  • r-lib/actions/setup-pandoc v1 composite
  • r-lib/actions/setup-r v1 composite
.github/workflows/test-coverage.yaml actions
  • actions/cache v2 composite
  • actions/checkout v2 composite
  • r-lib/actions/setup-pandoc v1 composite
  • r-lib/actions/setup-r v1 composite
DESCRIPTION cran
  • R >= 4.2.0 depends
  • BiocParallel * imports
  • Biostrings * imports
  • MASS * imports
  • Matrix * imports
  • cli * imports
  • cluster * imports
  • cvTools >= 0.3.2 imports
  • fpc * imports
  • ggplot2 >= 3.1.1 imports
  • ggseqlogo >= 0.1 imports
  • grDevices * imports
  • graphics * imports
  • matrixStats * imports
  • methods * imports
  • prettyunits * imports
  • reshape2 >= 1.4.3 imports
  • reticulate >= 1.22 imports
  • stats * imports
  • utils * imports
  • BiocStyle * suggests
  • covr * suggests
  • cowplot * suggests
  • hopach >= 2.42.0 suggests
  • knitr >= 1.22 suggests
  • rmarkdown >= 1.12 suggests
  • testthat >= 3.0.2 suggests
  • vdiffr >= 0.3.0 suggests