microhaplot

microhaplotype visualizer and analyzer

https://github.com/ngthomas/microhaplot

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
○ DOI references
○ Academic publication links
✓ Committers with academic emails: 1 of 2 committers (50.0%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (18.2%) to scientific vocabulary

Keywords

amplicon-sequencing microhaplot-shiny shiny vcf

Last synced: 6 months ago

Repository

microhaplotype visualizer and analyzer

Basic Info

Host: GitHub
Owner: ngthomas
License: gpl-3.0
Language: Perl
Default Branch: master
Homepage: https://ngthomas.github.io/microhaplot
Size: 19.8 MB

Statistics

Stars: 19
Watchers: 1
Forks: 7
Open Issues: 4
Releases: 0

Topics

amplicon-sequencing microhaplot-shiny shiny vcf

Created over 9 years ago · Last pushed over 4 years ago

Metadata Files

Readme License

README.Rmd

---
output: github_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# microhaplot 


  [![CRAN status](https://www.r-pkg.org/badges/version/microhaplot)](https://CRAN.R-project.org/package=microhaplot)

  
`microhaplot` generates visual summaries of microhaplotypes found in short read alignments. All you need are alignment SAM 
files and a variant call VCF file. (The latter tells `microhaplot` which SNPs to include in microhaplotypes.)  It was 
designed for extracting and visualizing haplotypes from high-quality amplicon sequencing data.  We have used it extensively
to process amplicon sequencing data (with 100 to 500 amplicons) from rockfish and Chinook salmon, generated on an Illumina 
MiSeq sequencer.  It should be extensible to sequences from capture arrays, like RAPTURE data.

This software exists as an R package `microhaplot` that includes within it the code to set up and 
establish an RStudio/Shiny server to visualize and manipulate the data.  There are two key steps in 
the `microhaplot` workflow:

1. The first step is to summarize alignment and variant (SNP) data into a single data frame that is 
easily operated upon.  This is done using the function `microhaplot::prepHaplotFiles`.  You must supply a 
VCF file that includes the variants you are interested in extracting, and as many SAM files 
(one for each individual) as you want to extract read information from at each of the variants. 
The function `microhaplot::prepHaplotFiles` calls Perl to parse the CIGAR strings in the SAM files, 
extract the variant information from each read, and store this information in a data frame, which gets 
saved with the installed Shiny app (see below) for later use.  Depending on the size of the data set, this can take a few minutes.  

2. The second step is to run the microhaplot Shiny app to visualize the sequence information, call genotypes using
simple read-depth based filtering criteria, and curate the loci. microhaplot is suitable for quick assessment
and quality control of haplotypes generated from library runs. Plot summaries include read depth, fraction of callable haplotypes, Hardy-Weinberg
equilibrium plots, and more. 
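
To make the first step more concrete, here is a minimal sketch of how a CIGAR string can be walked to find the read base aligned to one reference position (for example, a SNP position taken from the VCF). It is an illustration in R only and is not the Perl code that `prepHaplotFiles` calls; the function name `base_at_ref_pos`, its arguments, and the `"-"` placeholder for deleted positions are assumptions made for this example.

```r
# Return the read base aligned to reference position `target`.
# pos:      1-based leftmost mapping position of the read (SAM POS field)
# cigar:    CIGAR string of the read (SAM CIGAR field)
# read.seq: read sequence (SAM SEQ field)
# Hypothetical helper: returns "-" inside a deletion and NA when the
# read does not cover the target position.
base_at_ref_pos <- function(pos, cigar, read.seq, target) {
  ops   <- regmatches(cigar, gregexpr("[0-9]+[MIDNSHP=X]", cigar))[[1]]
  lens  <- as.integer(sub("[MIDNSHP=X]$", "", ops))
  types <- sub("^[0-9]+", "", ops)

  ref <- pos  # next reference coordinate to be consumed
  qry <- 1    # next index into read.seq to be consumed

  for (i in seq_along(ops)) {
    n <- lens[i]
    covers <- target >= ref && target < ref + n
    if (types[i] %in% c("M", "=", "X")) {
      # consumes both reference and read
      if (covers) {
        idx <- qry + (target - ref)
        return(substr(read.seq, idx, idx))
      }
      ref <- ref + n
      qry <- qry + n
    } else if (types[i] %in% c("I", "S")) {
      # insertion or soft clip: consumes read only
      qry <- qry + n
    } else if (types[i] %in% c("D", "N")) {
      # deletion or skipped region: consumes reference only
      if (covers) return("-")
      ref <- ref + n
    }
    # "H" and "P" consume neither reference nor read
  }
  NA_character_
}

base_at_ref_pos(100, "5M2I3M", "ACGTAGGCCC", 102)  # "G"
```

Applying such a lookup at every variant position of a locus and pasting the bases together gives one microhaplotype string per read, which is the kind of per-read information the summary data frame holds.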

   

   
     
See the **Example Data** section to learn about how to run each of these steps on the example data that are provided
with the package.  

   
## Installation and Quick Start

### Required Perl dependencies
You need to have Perl (version >5.014) installed on your system in order to run `microhaplot`.  
For Windows users, we recommend installing it from http://strawberryperl.com/.  
For Mac and Linux users, Perl can be downloaded from https://www.perl.org/get.html  
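
If you are unsure whether a suitable Perl is on your `PATH`, a quick check from R can help. The sketch below is only a convenience and is not part of `microhaplot`; the helper name `extract_perl_version` is hypothetical, and the threshold treats the stated ">5.014" as Perl 5.14.0 and accepts 5.14.0 itself, which is an assumption.

```r
# Pull a version such as "5.30.0" out of the banner printed by `perl -v`.
# Hypothetical helper; returns NULL when no version string is found.
extract_perl_version <- function(banner) {
  hit <- regmatches(banner, regexpr("v[0-9]+\\.[0-9]+\\.[0-9]+", banner))
  if (length(hit) == 0) return(NULL)
  numeric_version(sub("^v", "", hit[1]))
}

perl.path <- unname(Sys.which("perl"))
if (!nzchar(perl.path)) {
  message("Perl was not found on the PATH.")
} else {
  found <- extract_perl_version(system2(perl.path, "-v", stdout = TRUE))
  if (is.null(found)) {
    message("Could not read a version from `perl -v`.")
  } else if (found >= numeric_version("5.14.0")) {  # assumed threshold
    message("Found Perl ", as.character(found))
  } else {
    message("Perl ", as.character(found), " may be too old.")
  }
}
```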

You can either clone the repository and build the `microhaplot` package yourself, or, more easily, you can
install it using  [devtools](https://github.com/hadley/devtools). You can get `devtools` by `install.packages("devtools")`.
  
**To Mac users: remember to install [XQuartz](https://www.xquartz.org/) when upgrading macOS to a new major version.**   
 
Once you have `devtools` available in R, you can get `microhaplot` this way:
```r
devtools::install_github("ngthomas/microhaplot", build_vignettes = TRUE, build_opts = c("--no-resave-data", "--no-manual"))
```

Once you have installed the `microhaplot` R package with devtools, you need to use `microhaplot::mvShinyHaplot`
to establish the microhaplot Shiny app in a convenient location on your system. The following line
creates the directory `Shiny` in my home directory, creates the 
directory `microhaplot` within it, and fills that with the Shiny app as well as the example data that go 
along with it.  

```r
microhaplot::mvShinyHaplot("~/Shiny") # provide a directory path to host the microhaplot app
```
To start familiarizing yourself with microhaplot using the provided example data, we recommend
going through our first vignette.  Call it up with:
```r
browseVignettes("microhaplot")
```
and check out `microhaplot-walkthrough`.

Now, having done that, we can launch Shiny microhaplot on the example data:
```r
library(microhaplot)
app.path <- "~/Shiny/microhaplot"
runShinyHaplot(app.path)
```
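
Once the app is running, its genotype calls rest on simple read-depth based filtering. As a rough illustration of that idea only, the sketch below calls a genotype for one individual at one locus from haplotype read counts. It is not the rule implemented in the Shiny app: the function `call_genotype`, the parameters `min.depth` and `min.ratio`, and their default values are all hypothetical.

```r
# Call a diploid genotype from haplotype read depths (illustrative only).
# haplo:     character vector of observed haplotype strings
# depth:     read depth supporting each haplotype
# min.depth: hypothetical minimum depth for a haplotype to be considered
# min.ratio: hypothetical minimum depth relative to the deepest haplotype
call_genotype <- function(haplo, depth, min.depth = 10, min.ratio = 0.2) {
  keep  <- depth >= min.depth
  haplo <- haplo[keep]
  depth <- depth[keep]
  if (length(depth) == 0) return(NA_character_)  # nothing passes the filter

  ord   <- order(depth, decreasing = TRUE)
  haplo <- haplo[ord]
  depth <- depth[ord]

  # keep haplotypes whose depth is close enough to the deepest one
  top <- haplo[depth / depth[1] >= min.ratio]

  if (length(top) == 1) {
    paste(top, top, sep = "/")         # homozygote
  } else if (length(top) == 2) {
    paste(sort(top), collapse = "/")   # heterozygote
  } else {
    NA_character_                      # more than two candidates: no call
  }
}

call_genotype(c("AC", "GT", "AT"), c(50, 40, 3))  # "AC/GT"
```

Loci where many individuals end up with more than two candidate haplotypes, or with no call at all, are natural candidates for closer inspection during curation.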

## Quick Guide to Using microhaplot to Parse SAM and VCF Files

This microhaplot package comes with a small customized sample data set drawn from an actual 
short read sequencing run on rockfish species. The sample data
contain sequences of eight genomic loci for four populations of five individuals each, 
for a total of twenty individuals. 

First you need to create a tab-separated **label** file with three columns, in this order: SAM file name, individual ID, and group label. If you do not want to assign a group label to the individuals, you can just leave it as "NA". We recommend keeping all of the SAM files in one directory to make this labeling task easier.

The `label` file looks like this:
```txt
s6.sam  s6      copper
s11.sam s11     copper
s13.sam s13     gold
s14.sam s14     kelp
s18.sam s18     gold
``` 
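
If you have many SAM files, the label file can be generated rather than typed by hand. The following is a minimal sketch, not a function provided by `microhaplot`: the helper names `make_label_table` and `write_label_file` are hypothetical, and deriving the individual ID from the file name by stripping `.sam` is an assumption that may not match your naming scheme.

```r
# Build a three-column label table: SAM file name, individual ID, group label.
# Hypothetical helper; `groups` defaults to NA when no grouping is wanted.
make_label_table <- function(sam.files, groups = NA) {
  data.frame(
    file  = basename(sam.files),
    id    = sub("\\.sam$", "", basename(sam.files)),  # assumed ID convention
    group = groups,
    stringsAsFactors = FALSE
  )
}

# Write the table tab-separated, without header, row names, or quotes.
# Missing groups are written as the literal string "NA".
write_label_file <- function(label.tbl, path) {
  write.table(label.tbl, file = path, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
}

# Example: label every SAM file found in one directory
# sam.files <- list.files("path/to/sam", pattern = "\\.sam$")
# write_label_file(make_label_table(sam.files), "path/to/sam/label.txt")
```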

Once you have the label file in place, you can run `prepHaplotFiles`, an R function that generates tables of microhaplotypes, by providing the following:
 * a run label to display in the app
 * path to the directory with all SAM files 
 * path to the `label` file you just created
 * path to the VCF file  
 * optional number of threads (for non-Windows users); we recommend 2 × the number of processors 
 
```R
library(microhaplot)

# to access package sample case study dataset of rockfish
run.label <- "sebastes"

sam.path <- tempdir()
untar(system.file("extdata",
                  "sebastes_sam.tar.gz",
                  package="microhaplot"),
      exdir = sam.path)
      
label.path <- file.path(sam.path, "label.txt")
vcf.path <- file.path(sam.path, "sebastes.vcf")
out.path <- tempdir()
app.path <- "~/Shiny/microhaplot"

# for your dataset: customize the following paths
# sam.path <- "~/microhaplot/extdata/"
# label.path <- "~/microhaplot/extdata/label.txt"
# vcf.path <- "~/microhaplot/extdata/sebastes.vcf"
# app.path <- "~/Shiny/microhaplot"

haplo.read.tbl <- prepHaplotFiles(run.label = run.label,
                            sam.path = sam.path,
                            out.path = out.path,
                            label.path = label.path,
                            vcf.path = vcf.path,
                            app.path = app.path,
                            n.jobs = 4) # assume running on dual core
                            

runShinyHaplot(app.path)
```
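
Because `prepHaplotFiles` can take a few minutes on larger data sets, it may be worth checking the label file before starting a run. The sketch below is a hypothetical pre-flight check and not part of the package: `check_label_file` is an invented name, and it assumes the first column holds file names relative to `sam.path`, as in the example label file above.

```r
# Report malformed rows and SAM files that cannot be found.
# Hypothetical helper; returns a list with
#   bad.rows:    line numbers (among non-empty lines) without exactly 3 fields
#   missing.sam: file names from column 1 that do not exist under sam.path
check_label_file <- function(label.path, sam.path) {
  lines  <- readLines(label.path)
  lines  <- lines[nzchar(lines)]                 # ignore empty lines
  fields <- strsplit(lines, "\t", fixed = TRUE)

  bad.rows  <- which(lengths(fields) != 3)
  sam.files <- vapply(fields, `[`, character(1), 1)
  found     <- file.exists(file.path(sam.path, sam.files))

  list(bad.rows = bad.rows, missing.sam = sam.files[!found])
}

# Example:
# problems <- check_label_file(label.path, sam.path)
# if (length(problems$bad.rows) || length(problems$missing.sam)) print(problems)
```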


## Suggestions
- SAM files: For paired-end experiments, the forward and reverse reads should be merged into one (for example, with FLASH).

Owner

Name: Thomas C Ng
Login: ngthomas
Kind: user

Repositories: 7
Profile: https://github.com/ngthomas

GitHub Events

Total

Issues event: 1
Watch event: 1
Issue comment event: 2
Pull request event: 1
Fork event: 2

Last Year

Issues event: 1
Watch event: 1
Issue comment event: 2
Pull request event: 1
Fork event: 2

Committers

Last synced: about 2 years ago

All Time

Total Commits: 198
Total Committers: 2
Avg Commits per committer: 99.0
Development Distribution Score (DDS): 0.162

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
ngthomas	t**g@g**m	166
Eric C. Anderson	e**n@n**v	32

Committer Domains (Top 20 + Academic)

noaa.gov: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 14
Total pull requests: 14
Average time to close issues: 7 months
Average time to close pull requests: 11 days
Total issue authors: 10
Total pull request authors: 2
Average comments per issue: 1.21
Average comments per pull request: 0.43
Merged pull requests: 11
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 1
Pull requests: 2
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 1
Pull request authors: 1
Average comments per issue: 2.0
Average comments per pull request: 0.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

eriqande (4)
ettorefedele (2)
mhopken (1)
sckieran (1)
ltalignani (1)
LZarri (1)
standage (1)
SoraiaB (1)
Miffy-yan (1)
ngthomas (1)

Pull Request Authors

eriqande (12)
stevemussmann (2)

Packages

Total packages: 1
Total downloads:
- cran 174 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 1
Total maintainers: 1

cran.r-project.org: microhaplot

Microhaplotype Constructor and Visualizer

Homepage: https://github.com/ngthomas/microhaplot
Documentation: http://cran.r-project.org/web/packages/microhaplot/microhaplot.pdf
License: GPL-3
Latest release: 1.0.1
published over 6 years ago

Versions: 1
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 174 Last month

Rankings

Forks count: 11.3%

Stargazers count: 14.2%

Dependent packages count: 29.8%

Average: 31.6%

Dependent repos count: 35.5%

Downloads: 67.3%

Maintainers (1)

tngthomasng@gmail.com

Last synced: 7 months ago

Dependencies

DESCRIPTION cran

R >= 3.5.0 depends
DT >= 0.1 imports
dplyr >= 0.4.3 imports
ggiraph >= 0.6.0 imports
ggplot2 >= 2.1.0 imports
grid >= 3.1.2 imports
gtools >= 3.5.0 imports
magrittr >= 1.5 imports
scales >= 0.4.0 imports
shiny >= 0.13.2 imports
shinyBS >= 0.61 imports
shinyWidgets >= 0.4.3 imports
tidyr >= 0.4.1 imports
knitr * suggests
rmarkdown * suggests
