DNAtools

DNAtools: Tools for Analysing Forensic Genetic DNA Data - Published in JOSS (2020)

https://github.com/mikldk/dnatools

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software
Last synced: 6 months ago

Repository

Development version of the DNAtools R-package

Basic Info
  • Host: GitHub
  • Owner: mikldk
  • License: gpl-3.0
  • Language: R
  • Default Branch: master
  • Size: 1.43 MB
Statistics
  • Stars: 0
  • Watchers: 3
  • Forks: 3
  • Open Issues: 1
  • Releases: 4
Created about 7 years ago · Last pushed almost 2 years ago
Metadata Files
Readme Changelog License

README.Rmd

---
output: github_document
---



```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE
)
```

```{r, echo = FALSE}
library(DNAtools)
```

# DNAtools

[![Build Status](https://travis-ci.org/mikldk/DNAtools.svg?branch=master)](https://travis-ci.org/mikldk/DNAtools)
[![Build status](https://ci.appveyor.com/api/projects/status/1861od7todeskm5p/branch/master?svg=true)](https://ci.appveyor.com/project/mikldk/DNAtools/branch/master)
[![DOI](https://joss.theoj.org/papers/10.21105/joss.01981/status.svg)](https://doi.org/10.21105/joss.01981)

There are two main features of this package:

* Computation of the distribution of the numbers of alleles in DNA mixtures.
* Empirical testing of DNA match probabilities.

Each feature is described in a separate vignette, and a small example is given 
below under "Getting started". 
The documentation (vignettes and manual) is included in the package 
and is also available for reading online.


## Install

### With internet access

To build and install from GitHub using R 3.3.0 (or later) and the `devtools` package 1.11.0 (or later), run this command from within `R`:

```
devtools::install_github("mikldk/DNAtools", 
                         build_opts = c("--no-resave-data", "--no-manual"))
```

You can also install the package without vignettes if needed as follows:

```
devtools::install_github("mikldk/DNAtools")
```

### Without internet access

To install on a computer without internet access:

1. Download `DNAtools` as a `.tar.gz` archive from GitHub and transfer it to the destination computer, e.g. using removable media.
1. Install `devtools` and the `DNAtools` prerequisites (`multicool`, `Rcpp`, `RcppParallel`, `RcppProgress`, `Rsolnp`).
1. Install `DNAtools` in `R` using the `devtools::install_local()` function.
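
The steps above can be sketched as follows. This is only a sketch: the archive file name is a hypothetical placeholder, and the prerequisite packages are assumed to have been made available offline by some other means (for example from pre-downloaded source archives).

```r
# Hypothetical file name of the archive copied from removable media;
# adjust it to wherever the file actually lives.
archive <- "DNAtools-master.tar.gz"

# Prerequisites named in the list above.
prereqs <- c("multicool", "Rcpp", "RcppParallel", "RcppProgress", "Rsolnp")

# Check which prerequisites are not yet installed.
installed <- vapply(prereqs, requireNamespace, logical(1), quietly = TRUE)
not_installed <- prereqs[!installed]

if (length(not_installed) > 0) {
  message("Install these packages first: ",
          paste(not_installed, collapse = ", "))
} else if (!file.exists(archive)) {
  message("Archive not found: ", archive)
} else {
  # Install from the local archive; no internet access is needed here.
  devtools::install_local(archive)
}
```

The checks run before `devtools::install_local()` so that a missing prerequisite or a wrong path gives a readable message rather than a failed build.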


## Contribute, issues, and support

Please use the issue tracker of the GitHub repository 
if you want to notify us of an issue or need support.
If you want to contribute, please either create an issue or make a pull request.

## Getting started


Please read the vignettes for more detailed explanations than those given below. 
The example below illustrates, in compact form, some of the functionality 
the package provides.

Say that we have a reference database:

```{r}
data(dbExample, package = "DNAtools")
head(dbExample)[, 2:7]
dim(dbExample)
```

We now find the allele frequencies:

```{r}
allele_freqs <- lapply(1:10, function(x){
  al_freq <- table(c(dbExample[[x*2]], dbExample[[1+x*2]]))/(2*nrow(dbExample))
  al_freq[sort.list(as.numeric(names(al_freq)))]
})
names(allele_freqs) <- sub("\\.1", "", names(dbExample)[(1:10)*2])
```
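
To see what this computation does, here is a minimal sketch on a made-up database. The data frame `toy_db` and its locus names are hypothetical; the only assumption carried over from `dbExample` is the layout of one id column followed by two allele columns per locus.

```r
# Hypothetical database: 4 profiles, 2 loci, two allele columns per locus.
toy_db <- data.frame(
  id     = 1:4,
  LocA.1 = c(10, 11, 11, 12),
  LocA.2 = c(11, 11, 12, 13),
  LocB.1 = c(7, 7, 8, 9),
  LocB.2 = c(8, 9, 9, 9)
)

n_loci <- (ncol(toy_db) - 1) / 2

toy_freqs <- lapply(seq_len(n_loci), function(x) {
  # Pool both allele columns of locus x; each profile contributes 2 alleles.
  alleles <- c(toy_db[[2 * x]], toy_db[[2 * x + 1]])
  freq <- table(alleles) / (2 * nrow(toy_db))
  # Order by numeric allele value rather than by character string.
  freq[order(as.numeric(names(freq)))]
})
names(toy_freqs) <- sub("\\.1$", "", names(toy_db)[2 * seq_len(n_loci)])

toy_freqs$LocA
```

At `LocA` the eight pooled alleles are one 10, four 11s, two 12s and one 13, so the frequencies are 0.125, 0.5, 0.25 and 0.125.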


```{r, include=FALSE}
# Print a text bar chart: one "|" per percentage point of probability.
txtbar <- function(x) {
  y <- round(100*x)  # use the argument `x`, not the global `noa`
  y2 <- lapply(y, rep.int, x = "|")
  y3 <- lapply(y2, paste0, collapse = "")
  ret <- data.frame(`Number of alleles` = names(x), Frequency = unlist(y3), 
                    check.names = FALSE)
  print(ret, quote = FALSE, row.names = FALSE, right = FALSE)
  return(invisible(ret))
}
```

### Number of alleles

One could ask: What is the distribution of the number of alleles observed in a three-person mixture?

This package can calculate that distribution. 
We focus on the D16S539 locus:

```{r}
allele_freqs$D16S539
noa <- Pnm_locus(m = 3, theta = 0, alleleProbs = allele_freqs$D16S539)
names(noa) <- seq_along(noa)
noa
```

This can be illustrated by a bar chart:

```{r, echo=FALSE, results='markup', comment=''}
txtbar(noa)
```

So it is most likely that a three-person mixture on D16S539 has `r names(noa)[which.max(noa)]` alleles.
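
For intuition, the same kind of distribution can be obtained by brute force on a small example. The sketch below is not the algorithm used by `Pnm_locus`; it assumes that with `theta = 0` the `2 * m` alleles of an `m`-person mixture are independent draws from the allele frequencies, and it uses hypothetical frequencies `toy_probs`.

```r
# Hypothetical allele frequencies for one locus (they sum to 1).
toy_probs <- c("8" = 0.05, "9" = 0.15, "10" = 0.10,
               "11" = 0.30, "12" = 0.25, "13" = 0.15)
m <- 3  # number of contributors, i.e. 2 * m alleles in total

# Enumerate every ordered combination of 2 * m alleles (6^6 = 46656 rows).
grid <- expand.grid(rep(list(seq_along(toy_probs)), 2 * m))

# Probability of each combination under independence (assumed theta = 0).
pr <- apply(grid, 1, function(idx) prod(toy_probs[idx]))

# Number of distinct alleles in each combination.
k <- apply(grid, 1, function(idx) length(unique(idx)))

# Sum the probabilities by number of distinct alleles.
brute_noa <- tapply(pr, factor(k, levels = 1:(2 * m)), sum)
brute_noa
```

Enumeration grows as the number of alleles to the power `2 * m`, so it is only practical for toy inputs; it serves here as a check on what the distribution means.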

This can be done for all loci at once:

```{r}
noa <- Pnm_all(m = 3, theta = 0, probs = allele_freqs, locuswise = TRUE)
noa
```

We can also convolve the locus-wise distributions to obtain the distribution of the total number of distinct alleles across all loci:

```{r}
noa <- Pnm_all(m = 3, theta = 0, probs = allele_freqs)
noa
```

This can be illustrated by a bar chart:

```{r, echo=FALSE, results='markup', comment=''}
txtbar(noa)
```

So it is most likely that a three-person mixture has `r names(noa)[which.max(noa)]` distinct alleles on all loci combined.
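
The convolution step can be sketched in base R. The helper `convolve_counts` and the three locus-wise distributions below are hypothetical, and the sketch assumes that the loci are independent; it is not the package's implementation.

```r
# Convolve two distributions of allele counts. The names of each vector
# are the allele counts, the values are their probabilities.
convolve_counts <- function(p1, p2) {
  joint <- outer(p1, p2)                       # P(i at locus 1, j at locus 2)
  total <- outer(as.integer(names(p1)),
                 as.integer(names(p2)), `+`)   # i + j for every cell
  res <- tapply(joint, total, sum)             # sum cells with equal totals
  setNames(as.vector(res), names(res))
}

# Hypothetical locus-wise distributions of the number of alleles.
locus_a <- c("1" = 0.1, "2" = 0.6, "3" = 0.3)
locus_b <- c("1" = 0.2, "2" = 0.8)
locus_c <- c("1" = 0.5, "2" = 0.5)

# Two loci: totals range from 2 to 5.
convolve_counts(locus_a, locus_b)

# Any number of loci: fold the pairwise convolution over the list.
total_noa <- Reduce(convolve_counts, list(locus_a, locus_b, locus_c))
total_noa
```

Carrying the allele counts in the names lets the same helper be applied repeatedly with `Reduce`, since the support of an intermediate result no longer starts at 1.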


### Empirical testing of DNA match probabilities

Another relevant question is how many matches and near-matches there are in the database. 
This can be calculated as follows:

```{r}
db_summary <- dbCompare(dbExample, hit = 6, trace = FALSE)
db_summary
```

The `hit` argument makes `dbCompare` return the pairs of profiles that fully match at `hit` (here 6) or more loci.

The summary matrix gives, for each combination $(i,j)$, the number of pairs matching at $i$ loci and partially matching at $j$ loci. 
For example, the row
```
     partial
match     0     1     2     3     4     5     6     7     8     9    10
   5      6    19    44    41    26     5                              
```
means that there are 6+19+44+41+26+5 = 141 pairs of profiles matching at 
exactly 5 loci. 
Among those 141 pairs, there are 
6 pairs not matching on any of the remaining 5 loci, 
19 pairs partially matching on 1 locus and not matching on the remaining 4 loci, 
and so on.
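
To make the layout of the summary matrix concrete, the sketch below builds one by hand for a tiny made-up database. It is not how `dbCompare` is implemented. The data, the helper names, and the per-locus rule are assumptions for illustration: two genotypes "match" if they are identical, "partially match" if they share at least one allele, and "mismatch" otherwise.

```r
# Hypothetical database: 4 profiles, 3 loci, two allele columns per locus.
toy_db <- data.frame(
  id = 1:4,
  LocA.1 = c(10, 10, 11, 12), LocA.2 = c(11, 11, 12, 13),
  LocB.1 = c(7, 7, 7, 9),     LocB.2 = c(8, 8, 9, 9),
  LocC.1 = c(15, 15, 16, 18), LocC.2 = c(16, 17, 17, 18)
)
n_loci <- 3

# Classify one locus for a pair of genotypes (assumed rule, see above).
locus_status <- function(g1, g2) {
  g1 <- sort(g1); g2 <- sort(g2)
  if (all(g1 == g2)) return("match")
  if (any(g1 %in% g2)) return("partial")
  "mismatch"
}

# Count matching and partially matching loci for profiles i and j.
compare_pair <- function(i, j) {
  status <- vapply(seq_len(n_loci), function(x) {
    cols <- c(2 * x, 2 * x + 1)
    locus_status(unlist(toy_db[i, cols]), unlist(toy_db[j, cols]))
  }, character(1))
  c(match = sum(status == "match"), partial = sum(status == "partial"))
}

# Compare all pairs of profiles and tabulate (match, partial) counts.
pairs <- combn(nrow(toy_db), 2)
counts <- apply(pairs, 2, function(p) compare_pair(p[1], p[2]))
summary_tab <- table(
  match   = factor(counts["match", ], levels = 0:n_loci),
  partial = factor(counts["partial", ], levels = 0:n_loci)
)
summary_tab

# Row sums give the number of pairs matching at exactly i loci,
# which is the same sum as the 141 computed for row 5 above.
rowSums(summary_tab)
```

With 4 profiles there are 6 pairs in total, so the entries of `summary_tab` sum to 6; only profiles 1 and 2 match at two loci, and they partially match at the third.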


Owner

  • Name: Mikkel Meyer Andersen
  • Login: mikldk
  • Kind: user
  • Location: Denmark

JOSS Publication

DNAtools: Tools for Analysing Forensic Genetic DNA Data
Published
January 16, 2020
Volume 5, Issue 45, Page 1981
Authors
Torben Tvedebrink
Department of Mathematical Sciences, Aalborg University, Denmark
Mikkel Meyer Andersen
Department of Mathematical Sciences, Aalborg University, Denmark
James Michael Curran
Department of Statistics, University of Auckland, New Zealand
Editor
Charlotte Soneson
Tags
short tandem repeat markers, forensic genetics, autosomal markers, population genetics, weight of evidence


Committers

Last synced: 7 months ago

All Time
  • Total Commits: 86
  • Total Committers: 5
  • Avg Commits per committer: 17.2
  • Development Distribution Score (DDS): 0.163
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
|------|-------|---------|
| Mikkel Meyer Andersen | m****l@m****k | 72 |
| tvedebrink | t****e@m****k | 8 |
| jmcurran | j****n@a****z | 4 |
| Torben Tvedebrink | t****k | 1 |
| Charlotte Soneson | c****n@g****m | 1 |

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 4
  • Total pull requests: 2
  • Average time to close issues: 3 days
  • Average time to close pull requests: 21 days
  • Total issue authors: 3
  • Total pull request authors: 2
  • Average comments per issue: 2.5
  • Average comments per pull request: 1.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • standage (2)
  • thoree (1)
  • tomsing1 (1)
Pull Request Authors
  • andrjohns (2)
  • csoneson (1)

Packages

  • Total packages: 1
  • Total downloads:
    • cran 376 last-month
  • Total dependent packages: 1
  • Total dependent repositories: 2
  • Total versions: 15
  • Total maintainers: 1
cran.r-project.org: DNAtools

Tools for Analysing Forensic Genetic DNA Data

  • Versions: 15
  • Dependent Packages: 1
  • Dependent Repositories: 2
  • Downloads: 376 Last month
Rankings
Forks count: 14.2%
Dependent packages count: 18.1%
Dependent repos count: 19.3%
Average: 23.8%
Downloads: 32.7%
Stargazers count: 34.6%
Maintainers (1)
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • R >= 3.3.0 depends
  • Rcpp >= 0.12.12 imports
  • RcppParallel >= 4.3.20 imports
  • Rsolnp >= 1.16 imports
  • multicool >= 0.1 imports
  • knitr * suggests
  • rmarkdown * suggests
  • testthat * suggests
  • testthis * suggests