https://github.com/bioconductor-source/seqarchr

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.2%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: bioconductor-source
  • License: gpl-3.0
  • Language: R
  • Default Branch: devel
  • Size: 2.95 MB
Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 2 years ago · Last pushed almost 2 years ago
Metadata Files
Readme Changelog License

README.md

seqArchR

Note: This package is currently under development, so please bear with me while I put the final blocks together. Thanks for your understanding!

seqArchR is an unsupervised, non-negative matrix factorization (NMF)-based algorithm for discovery of sequence architectures de novo. Below is a schematic of seqArchR's algorithm.
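For readers new to NMF, here is a minimal, self-contained sketch of the general idea in base R, independent of the schematic. It factorizes a small non-negative matrix `V` into two non-negative factors `W` and `H` using the classic multiplicative update rules. The toy data, the helper name `nmf_mu`, and the choice of update rule are assumptions made for illustration; this is not seqArchR's implementation, which relies on scikit-learn.

```r
# Toy NMF via multiplicative updates (illustrative only;
# NOT seqArchR's own implementation).
set.seed(1)

# Hypothetical non-negative data: 8 features x 6 samples built from 2 patterns
W_true <- matrix(runif(8 * 2), nrow = 8)
H_true <- matrix(runif(2 * 6), nrow = 2)
V <- W_true %*% H_true

# nmf_mu is a hypothetical helper name, not part of any package
nmf_mu <- function(V, k, n_iter = 500, eps = 1e-9) {
  # Random non-negative starting factors
  W <- matrix(runif(nrow(V) * k), nrow = nrow(V))
  H <- matrix(runif(k * ncol(V)), nrow = k)
  for (i in seq_len(n_iter)) {
    # Element-wise updates keep every entry non-negative
    H <- H * (t(W) %*% V) / (t(W) %*% W %*% H + eps)
    W <- W * (V %*% t(H)) / (W %*% H %*% t(H) + eps)
  }
  list(W = W, H = H)
}

fit <- nmf_mu(V, k = 2)

# Relative reconstruction error in the Frobenius norm
rel_err <- norm(V - fit$W %*% fit$H, "F") / norm(V, "F")
print(rel_err)
```

In this sketch the columns of `W` play the role of basis patterns and the columns of `H` give each sample's weights on those patterns.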

Installation

Python scikit-learn dependency

This package requires the Python module scikit-learn. Please see the scikit-learn installation instructions.
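One way to check whether scikit-learn is visible from R is sketched below. It assumes the R package reticulate is the bridge to Python; the helper name `has_sklearn` is hypothetical and not part of seqArchR.

```r
# Sketch: check whether Python's scikit-learn can be found from R.
# Assumption: 'reticulate' is the R-to-Python bridge in use.
# has_sklearn is a hypothetical helper, not a seqArchR function.
has_sklearn <- function() {
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    return(NA)  # cannot tell without reticulate
  }
  # "sklearn" is the import name of the scikit-learn distribution
  reticulate::py_module_available("sklearn")
}

status <- has_sklearn()
if (isTRUE(status)) {
  message("scikit-learn is visible to R via reticulate.")
} else if (is.na(status)) {
  message("Install the R package 'reticulate' to run this check.")
} else {
  message("scikit-learn was not found; install it in the Python environment that R uses.")
}
```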

To install this package, use

```r
if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
}

remotes::install_github("snikumbh/seqArchR", build_vignettes = FALSE)
```

Usage

```r
# Load packages
library(seqArchR)
library(Biostrings)

# Creation of one-hot encoded data matrix from FASTA file
# You can use your own FASTA file instead
inputFastaFilename <- system.file("extdata", "example_data.fa",
                                  package = "seqArchR",
                                  mustWork = TRUE)

# Specifying dinuc generates dinucleotide features
inputSeqsMat <- seqArchR::prepare_data_from_FASTA(inputFastaFilename,
                                                  sinuc_or_dinuc = "dinuc")

# Raw sequences, read from the same FASTA file
inputSeqsRaw <- seqArchR::prepare_data_from_FASTA(inputFastaFilename,
                                                  raw_seq = TRUE)

nSeqs <- length(inputSeqsRaw)
positions <- seq(1, Biostrings::width(inputSeqsRaw[1]))

# Set seqArchR configuration
# Most arguments have default values
seqArchRconfig <- seqArchR::set_config(
    parallelize = TRUE,
    n_cores = 2,
    n_runs = 100,
    k_min = 1,
    k_max = 20,
    mod_sel_type = "stability",
    bound = 10^-6,
    chunk_size = 100,
    result_aggl = "ward.D",
    result_dist = "euclid",
    flags = list(debug = FALSE, time = TRUE, verbose = TRUE,
                 plot = FALSE)
)

# Call/Run seqArchR
seqArchRresult <- seqArchR::seqArchR(config = seqArchRconfig,
                                     seqs_ohe_mat = inputSeqsMat,
                                     seqs_raw = inputSeqsRaw,
                                     seqs_pos = positions,
                                     total_itr = 2,
                                     set_ocollation = c(TRUE, FALSE))
```
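To make the phrase "one-hot encoded data matrix" concrete, the following base-R sketch encodes a few short, equal-length DNA strings by hand. The helper `one_hot_dna`, the toy sequences, and the row/column layout (features along rows, sequences along columns) are assumptions for illustration; the package's own encoder may order or name features differently, and this sketch covers only single-nucleotide features.

```r
# Sketch: one-hot encode equal-length DNA strings in base R.
# one_hot_dna is a hypothetical helper, not a seqArchR function.
one_hot_dna <- function(seqs, alphabet = c("A", "C", "G", "T")) {
  # All sequences must share one length
  stopifnot(length(unique(nchar(seqs))) == 1)
  L <- nchar(seqs[1])
  mat <- vapply(seqs, function(s) {
    chars <- strsplit(s, "")[[1]]
    # One block of length L per letter: 1 where the sequence has that letter
    as.numeric(unlist(lapply(alphabet, function(a) chars == a)))
  }, numeric(L * length(alphabet)))
  # Row names such as "A1" mean "letter A at position 1"
  rownames(mat) <- paste0(rep(alphabet, each = L), seq_len(L))
  mat
}

seqs <- c("ACGT", "AAGT", "TCGA")  # toy input, assumed for illustration
ohe <- one_hot_dna(seqs)
print(dim(ohe))
```

Each column holds exactly one 1 per sequence position, so every column sums to the sequence length.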

Contact

Comments, suggestions, and enquiries/requests are welcome! Feel free to email sarvesh.nikumbh@gmail.com or create a new issue.

Owner

  • Name: (WIP DEV) Bioconductor Packages
  • Login: bioconductor-source
  • Kind: organization
  • Email: maintainer@bioconductor.org

Source code for packages accepted into Bioconductor
