https://github.com/bioconductor-source/seqarchr
Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (14.2%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: bioconductor-source
- License: gpl-3.0
- Language: R
- Default Branch: devel
- Size: 2.95 MB
Statistics
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
seqArchR
Note: This package is currently under development, so please bear with me while I put the final pieces together. Thanks for your understanding!
seqArchR is an unsupervised, non-negative matrix factorization (NMF)-based algorithm for the de novo discovery of sequence architectures. Below is a schematic of seqArchR's algorithm.
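To give a feel for the factorization step, here is a minimal base-R sketch of NMF using the Lee and Seung multiplicative updates. This is a swapped-in textbook technique, not seqArchR's implementation (seqArchR relies on scikit-learn for NMF). The helper `nmf_mu()` and the toy matrix are hypothetical, and assigning each column to the factor with its largest coefficient is just one simple way to read clusters off a factorization, not necessarily the rule seqArchR uses.

```r
# Hypothetical helper: NMF X ~ W %*% H via multiplicative updates
# (Frobenius loss). Illustrative only; seqArchR uses scikit-learn.
nmf_mu <- function(X, k, n_iter = 1000, eps = 1e-9, seed = 1) {
  stopifnot(all(X >= 0))
  set.seed(seed)
  n <- nrow(X); m <- ncol(X)
  W <- matrix(runif(n * k), n, k)  # basis vectors (features x k)
  H <- matrix(runif(k * m), k, m)  # coefficients (k x samples)
  for (i in seq_len(n_iter)) {
    # eps in the denominators avoids division by zero
    H <- H * (t(W) %*% X) / (t(W) %*% W %*% H + eps)
    W <- W * (X %*% t(H)) / (W %*% H %*% t(H) + eps)
  }
  list(W = W, H = H)
}

# Toy data: 8 features x 6 samples built from 2 non-negative parts
W_true <- cbind(c(1, 1, 1, 1, 0, 0, 0, 0), c(0, 0, 0, 0, 1, 1, 1, 1))
H_true <- rbind(c(3, 2, 3, 0.1, 0.2, 0.1), c(0.1, 0.2, 0.1, 3, 2, 3))
X <- W_true %*% H_true

fit <- nmf_mu(X, k = 2)
rel_err <- norm(X - fit$W %*% fit$H, "F") / norm(X, "F")
# Assign each sample (column) to the factor with the largest coefficient
clusters <- apply(fit$H, 2, which.max)
```

Because the updates only multiply by non-negative ratios, `W` and `H` stay non-negative throughout, which is what makes the basis vectors interpretable as additive parts.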

Installation
Python scikit-learn dependency
This package requires the Python module scikit-learn. Please see the scikit-learn installation instructions.
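Before installing, it can help to confirm that R can see scikit-learn. The sketch below assumes that seqArchR reaches Python through the reticulate package; `sklearn_available()` is a hypothetical helper, while `reticulate::py_module_available()` and `reticulate::py_install()` are existing reticulate functions.

```r
# Hypothetical helper: TRUE if the Python module "sklearn" (scikit-learn)
# is importable from R via reticulate (an assumption about how seqArchR
# talks to Python), FALSE otherwise.
sklearn_available <- function() {
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    return(FALSE)  # reticulate not installed: Python cannot be reached
  }
  isTRUE(tryCatch(reticulate::py_module_available("sklearn"),
                  error = function(e) FALSE))
}

if (!sklearn_available()) {
  message("scikit-learn not found; one option is ",
          "reticulate::py_install(\"scikit-learn\")")
}
```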
To install this package, use
```r
if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
}
remotes::install_github("snikumbh/seqArchR", build_vignettes = FALSE)
```
Usage
```r
# Load packages
library(seqArchR)
library(Biostrings)

# Creation of one-hot encoded data matrix from FASTA file
# You can use your own FASTA file instead
inputFastaFilename <- system.file("extdata", "example_data.fa",
                                  package = "seqArchR", mustWork = TRUE)

# Specifying `dinuc` generates dinucleotide features
inputSeqsMat <- seqArchR::prepare_data_from_FASTA(inputFastaFilename,
                                                  sinuc_or_dinuc = "dinuc")
inputSeqsRaw <- seqArchR::prepare_data_from_FASTA(inputFastaFilename,
                                                  raw_seq = TRUE)
nSeqs <- length(inputSeqsRaw)
positions <- seq(1, Biostrings::width(inputSeqsRaw[1]))

# Set seqArchR configuration
# Most arguments have default values
seqArchRconfig <- seqArchR::set_config(
    parallelize = TRUE, n_cores = 2, n_runs = 100, k_min = 1, k_max = 20,
    mod_sel_type = "stability", bound = 10^-6, chunk_size = 100,
    result_aggl = "ward.D", result_dist = "euclid",
    flags = list(debug = FALSE, time = TRUE, verbose = TRUE, plot = FALSE))

# Call/Run seqArchR
seqArchRresult <- seqArchR::seqArchR(config = seqArchRconfig,
    seqs_ohe_mat = inputSeqsMat, seqs_raw = inputSeqsRaw,
    seqs_pos = positions, total_itr = 2, set_ocollation = c(TRUE, FALSE))
```
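The first step above turns sequences into a one-hot encoded matrix, optionally with dinucleotide features. The following base-R sketch illustrates the idea on toy sequences. `one_hot()` is a hypothetical helper, not part of seqArchR; it assumes equal-length sequences, a features x sequences layout, and a position-major row order, and seqArchR's actual row ordering may differ.

```r
# Hypothetical helper (not part of seqArchR): one-hot encode equal-length
# DNA sequences into a (positions * alphabet) x sequences matrix.
one_hot <- function(seqs, dinuc = FALSE) {
  bases <- c("A", "C", "G", "T")
  # Dinucleotide alphabet in order AA, AC, AG, AT, CA, ..., TT
  alphabet <- if (dinuc) paste0(rep(bases, each = 4), bases) else bases
  k <- if (dinuc) 2L else 1L
  L <- unique(nchar(seqs))
  stopifnot(length(L) == 1L)  # all sequences must have the same length
  n_pos <- L - k + 1L
  mat <- matrix(0L, nrow = n_pos * length(alphabet), ncol = length(seqs))
  for (j in seq_along(seqs)) {
    for (p in seq_len(n_pos)) {
      token <- substr(seqs[j], p, p + k - 1L)
      idx <- match(token, alphabet)
      # Tokens outside the alphabet (e.g. containing N) stay all-zero
      if (!is.na(idx)) mat[(p - 1L) * length(alphabet) + idx, j] <- 1L
    }
  }
  mat
}

seqs <- c("ACGT", "AAGT")
X1 <- one_hot(seqs)                # 4 positions x 4 bases = 16 rows
X2 <- one_hot(seqs, dinuc = TRUE)  # 3 positions x 16 dinucleotides = 48 rows
```

Dinucleotide features make the matrix four times taller per position but let the factorization pick up pairwise patterns that single-nucleotide features cannot express.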
Contact
Comments, suggestions, enquiries and requests are welcome! Feel free to email sarvesh.nikumbh@gmail.com or create a new issue.
Owner
- Name: (WIP DEV) Bioconductor Packages
- Login: bioconductor-source
- Kind: organization
- Email: maintainer@bioconductor.org
- Website: https://bioconductor.org
- Repositories: 1
- Profile: https://github.com/bioconductor-source
Source code for packages accepted into Bioconductor