https://github.com/bioconductor-source/sitepath

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.4%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: bioconductor-source
  • License: mit
  • Language: R
  • Default Branch: devel
  • Size: 7.61 MB
Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 2 years ago · Last pushed almost 2 years ago
Metadata Files
Readme Changelog License

README.Rmd

---
title: "sitePath: phylogeny-based sequence clustering using site polymorphism"
output: github_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  fig.path = "inst/"
)
```

The example below demonstrates the result of phylogeny-based sequence clustering for an H3N2 virus dataset (included in the package).

```{r example}
library(sitePath)

data(h3n2_align) # load the H3N2 sequences
data(h3n2_tree) # load the corresponding phylogenetic tree

options(list("cl.cores" = 10)) # Use 10 cores for multiprocessing

paths <- lineagePath(h3n2_tree, alignment = h3n2_align, Nmin = 0.05)
minEntropy <- sitesMinEntropy(paths)

p1 <- plotSingleSite(paths, site = 208) # The site polymorphism of site 208 on the tree
p2 <- plotSingleSite(minEntropy, site = 208) # The result of clustering using site 208
gridExtra::grid.arrange(p1, p2, ncol = 2)
```

```{r extractTips}
grp1 <- extractTips(paths, 208) # Grouping result using site polymorphism only
grp2 <- extractTips(minEntropy, 208) # Phylogeny-based clustering result
```

# Installation

[R programming language](https://cran.r-project.org/) >= 4.1.0 is required to use `sitePath`.

The stable release is available on [Bioconductor](https://bioconductor.org/packages/sitePath/).
```r
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("sitePath")
```

Installing from [GitHub](https://github.com/wuaipinglab/sitePath/) is experimental but provides the newest features:
```r
if (!requireNamespace("remotes", quietly = TRUE))
    install.packages("remotes")

remotes::install_github("wuaipinglab/sitePath")
```

# QuickStart

The following is a quick tutorial on how to use `sitePath` to find fixation and parallel sites. It covers importing data, running the analysis, and visualizing the results.

## 1. Data preparation
You need a _tree_ file and an _MSA_ (multiple sequence alignment) file, and the sequence names in the two files must match.
```{r data_prep}
library(sitePath) # Load the sitePath package

# The path to your tree and MSA files
tree_file <- system.file("extdata", "ZIKV.newick", package = "sitePath")
alignment_file <- system.file("extdata", "ZIKV.fasta", package = "sitePath")


tree <- read.tree(tree_file) # Read the tree file into R
align <- read.alignment(alignment_file, format = "fasta") # Read the MSA file into R

```
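The name-matching requirement can be checked before running the analysis. The sketch below uses only base R; `compare_names` is a hypothetical helper written for this illustration (it is not part of `sitePath`), and the toy vectors stand in for the real names.

```r
# Hypothetical helper (not part of sitePath): report names that appear in
# only one of the two inputs.
compare_names <- function(tip_labels, seq_names) {
  list(
    only_in_tree = setdiff(tip_labels, seq_names),
    only_in_msa = setdiff(seq_names, tip_labels)
  )
}

# Toy vectors standing in for the tip labels of a tree and the
# sequence names of an alignment
tip_labels <- c("seqA", "seqB", "seqC")
seq_names <- c("seqA", "seqB", "seqD")

mismatch <- compare_names(tip_labels, seq_names)
mismatch$only_in_tree # "seqC"
mismatch$only_in_msa # "seqD"
```

With the objects read above, the call would be `compare_names(tree$tip.label, align$nam)`, since `ape` stores tip labels in `tip.label` and `seqinr` stores sequence names in `nam`. Both results should be empty when the names match.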

## 2. Run analysis
`Nmin` and `minSNP` are the parameters for finding fixation sites and parallel sites, respectively (18 and 1 are used as examples for this dataset). If you don't specify them, the default values are used.

```{r run_analysis}
options(list("cl.cores" = 1)) # Set this bigger than 1 to use multiprocessing

# Run analysis to find fixation and parallel sites
paraFix <- paraFixSites(tree, alignment = align, Nmin = 18, minSNP = 1)
paraFix
```
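If you would rather not hard-code the number of cores, one possible approach is to derive it from the machine. This is a minimal sketch that assumes leaving one core free is acceptable; `available` and `n_cores` are illustrative names, and `parallel::detectCores()` comes from base R's `parallel` package, not from `sitePath`.

```r
# Number of cores reported by the machine; detectCores() can return NA
available <- parallel::detectCores()
if (is.na(available)) available <- 1L

# Leave one core free, but never go below 1
n_cores <- max(1L, available - 1L)

# Equivalent to options(list("cl.cores" = n_cores))
options(cl.cores = n_cores)
getOption("cl.cores")
```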

## 3. Fixation sites
Use `allSitesName` with `type` set to "fixation" to retrieve the names of the fixation sites.
```{r fixSites_name}
allSitesName(paraFix, type = "fixation")
```

Use `plotFixationSites` to view the fixation sites.
```{r plot_fixSites}
plotFixationSites(paraFix) # View all fixation sites on the tree
plotFixationSites(paraFix, site = 139) # View a single site

```

## 4. Parallel sites
Use `allSitesName` with `type` set to "parallel" to retrieve the names of the parallel sites.
```{r paraSites_name}
allSitesName(paraFix, type = "parallel")
```

Use `plotParallelSites` to view the parallel sites.
```{r}
plotParallelSites(paraFix) # View all parallel sites on the tree
plotParallelSites(paraFix, site = 105) # View a single site
```

# Read more

The examples above use wrapper functions, but the analysis can be broken down into step functions, so you can view the result of each step and modify its parameters. Click [here](https://wuaipinglab.github.io/sitePath/articles/sitePath.html) for a detailed breakdown of the functionality.
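As a rough sketch of what the step-by-step route might look like for the ZIKV data: `lineagePath` and `sitesMinEntropy` appear earlier in this README, while the use of `fixationSites` and `parallelSites` (and the arguments passed to them) is an assumption based on the package's documentation, so check the linked article for the exact signatures. The `requireNamespace` guard is only there so the snippet does nothing if `sitePath` is not installed.

```r
has_sitePath <- requireNamespace("sitePath", quietly = TRUE)

if (has_sitePath) {
  library(sitePath)

  # Same example files as in the QuickStart
  tree <- read.tree(system.file("extdata", "ZIKV.newick", package = "sitePath"))
  align <- read.alignment(
    system.file("extdata", "ZIKV.fasta", package = "sitePath"),
    format = "fasta"
  )

  options(cl.cores = 1) # Single process for this sketch

  # Step 1: resolve the lineage paths on the tree
  paths <- lineagePath(tree, alignment = align, Nmin = 18)
  # Step 2: minimum-entropy clustering for each site
  minEntropy <- sitesMinEntropy(paths)
  # Step 3 (assumed calls): extract fixation and parallel sites
  fixations <- fixationSites(minEntropy)
  paraSites <- parallelSites(minEntropy, minSNP = 1)

  allSitesName(fixations)
  allSitesName(paraSites)
}
```

Each intermediate object (`paths`, `minEntropy`) can be inspected or plotted before moving on, which is the main reason to prefer this route over the wrapper.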

# Getting help

Post on the Bioconductor [support site](https://support.bioconductor.org/) if you have trouble using `sitePath`, or open an [issue](https://github.com/wuaipinglab/sitePath/issues/new?assignees=&labels=&template=bug_report.md&title=) if you find a bug.

Owner

  • Name: (WIP DEV) Bioconductor Packages
  • Login: bioconductor-source
  • Kind: organization
  • Email: maintainer@bioconductor.org

Source code for packages accepted into Bioconductor

Dependencies

.github/workflows/check-bioc.yml actions
  • actions/cache v2 composite
  • actions/checkout v2 composite
  • actions/upload-artifact master composite
  • docker/build-push-action v1 composite
  • r-lib/actions/setup-pandoc master composite
  • r-lib/actions/setup-r master composite
DESCRIPTION cran
  • R >= 4.1 depends
  • RColorBrewer * imports
  • Rcpp * imports
  • ape * imports
  • aplot * imports
  • ggplot2 * imports
  • ggrepel * imports
  • ggtree * imports
  • grDevices * imports
  • graphics * imports
  • gridExtra * imports
  • methods * imports
  • parallel * imports
  • seqinr * imports
  • stats * imports
  • tidytree * imports
  • utils * imports
  • BiocStyle * suggests
  • devtools * suggests
  • knitr * suggests
  • magick * suggests
  • rmarkdown * suggests
  • testthat * suggests