wgslr

Forensic likelihood ratios for whole-genome sequencing accounting for errors

https://github.com/mikldk/wgslr

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.2%) to scientific vocabulary
Last synced: 8 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: mikldk
  • Language: R
  • Default Branch: main
  • Size: 1.08 MB
Statistics
  • Stars: 1
  • Watchers: 2
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed 10 months ago
Metadata Files
Readme Changelog Citation

README.Rmd

---
output: github_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# `wgsLR`: Shotgun sequencing for human identification: Dynamic SNP marker sets and likelihood ratio calculations accounting for errors

Please refer to the online documentation, including the vignettes.

## Scientific publication

The research associated with this software is described in 

> Andersen, M. M., Kampmann, M.-L., Jepsen, A. H., Morling, N., Eriksen, P. S., Børsting, C., & Andersen, J. D. (2025). 
> *Shotgun DNA sequencing for human identification: Dynamic SNP selection and likelihood ratio calculations accounting for errors.* 
> Forensic Science International: Genetics, 74, 103146. [doi:10.1016/j.fsigen.2024.103146](https://doi.org/10.1016/j.fsigen.2024.103146).


## Installation

To install from GitHub with vignettes, run this command from 
within `R` (please install `remotes` first if it is not already installed):

```
# install.packages('remotes')
remotes::install_github("mikldk/wgsLR", build_vignettes = TRUE)
```

You can also install the package without vignettes if needed as follows:

```
remotes::install_github("mikldk/wgsLR")
```

## A few small examples

### Estimating the genotype error probability, $w$

```{r}
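# Simulate 1000 cases with genotype error probability w = 0.1
cases <- wgsLR::sample_data_Hp_w(n = 1000, w = 0.1, p = c(0.25, 0.25, 0.5))
# Cross-tabulate the genotypes in X_D against those in X_S
tab <- table(cases$X_D, cases$X_S)
tab
# Estimate w from the table
w_mle <- wgsLR::estimate_w(tab)
w_mle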
```
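
As a rough illustration of what such an estimate involves, the sketch below maximises a likelihood for $w$ over a 3x3 table in base R. It rests on assumptions that are not taken from this README: each allele is misread independently with probability $w$, the genotype probabilities `p` are treated as known, and the helper names (`error_matrix`, `joint_Hp`, `estimate_w_sketch`) are hypothetical. `wgsLR::estimate_w()` is called with the table only, so its actual method may well differ.

```r
# ASSUMED error model (not from the README): each allele of a biallelic
# genotype is independently misread with probability w.
# Rows: true genotype z = 0, 1, 2; columns: observed genotype x = 0, 1, 2.
error_matrix <- function(w) {
  rbind(c((1 - w)^2,   2 * w * (1 - w),   w^2),
        c(w * (1 - w), (1 - w)^2 + w^2,   w * (1 - w)),
        c(w^2,         2 * w * (1 - w),   (1 - w)^2))
}

# Joint probability of two observed genotypes that share one true genotype:
# Q[i, j] = sum_z p[z] * E[z, i] * E[z, j]
joint_Hp <- function(w, p) {
  E <- error_matrix(w)
  t(E) %*% (p * E)  # p * E scales row z of E by p[z]
}

# Hypothetical helper: maximise the multinomial log-likelihood over w,
# treating p as known.
estimate_w_sketch <- function(tab, p) {
  loglik <- function(w) sum(tab * log(joint_Hp(w, p)))
  optimize(loglik, interval = c(1e-8, 0.5), maximum = TRUE, tol = 1e-10)$maximum
}

p <- c(0.25, 0.25, 0.5)
# Expected counts for 1000 cases at w = 0.1 (used instead of random draws
# so that the example is deterministic)
tab_expected <- 1000 * joint_Hp(w = 0.1, p = p)
w_hat <- estimate_w_sketch(tab_expected, p)
w_hat
```

Because the table holds expected rather than sampled counts, the estimate should land very close to the value of $w$ used to build it; with simulated data it would scatter around that value.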


#### Cautionary note: not just standard VCF files

It is necessary to obtain sequencing results for all bases in the
selected segments (including read depth and genotype quality). 
Thus, it is **not** sufficient to use only the
"confirmed"/high-probability variants relative to the reference genome (the variants reported in a standard VCF file), as this can introduce bias in the results. 
**Information from all bases in the chosen genomic areas of interest is needed**. 
One way to achieve this is to use [GATK HaplotypeCaller](https://gatk.broadinstitute.org/hc/en-us/articles/9570334998171-HaplotypeCaller) 
with the additional argument `--emit-ref-confidence BP_RESOLUTION` for the 
genomic areas of interest (using `-L areas.interval_list`).



### Calculating likelihood ratios ($LR$s)

Assume that a trace sample had four loci with genotypes (0/1 = 1, 0/0 = 0, 1/1 = 2, 1/1 = 2).
A person of interest is then typed for the same four loci and 
has genotypes (0/0 = 0, 0/0 = 0, 1/1 = 2, 1/1 = 2), i.e. a mismatch on the first locus.

For simplicity, assume that the genotype probabilities are 
P(0/0 = 0) = 0.25, 
P(0/1 = 1/0 = 1) = 0.25, and 
P(1/1 = 2) = 0.5.

If no errors are possible, then $w=0$ and

```{r}
wgsLR::calc_LRs_w(xs = c(0, 0, 2, 2), 
                  xd = c(1, 0, 2, 2), 
                  w = 0, 
                  p = c(0.25, 0.25, 0.5))
```

and the product is 0 due to the mismatch at the first locus.
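
The error-free case can be checked by hand without the package. The base R sketch below uses the standard result that, when no errors are possible, a matching genotype $x$ contributes $1 / P(x)$ and a mismatch contributes 0. The function name `lr_no_error` is hypothetical and is not part of `wgsLR`.

```r
# Per-locus LR when genotyping errors are impossible (w = 0):
# a match on genotype x gives 1 / P(x); a mismatch gives 0.
lr_no_error <- function(xd, xs, p) {
  ifelse(xd == xs, 1 / p[xs + 1], 0)  # genotypes 0, 1, 2 index p at 1, 2, 3
}

p  <- c(0.25, 0.25, 0.5)
xs <- c(0, 0, 2, 2)  # person of interest
xd <- c(1, 0, 2, 2)  # trace sample

lr <- lr_no_error(xd, xs, p)
lr        # 0 4 2 2
prod(lr)  # 0
```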

If instead we acknowledge that errors are possible, then for $w = 0.001$ we obtain that

```{r}
LR_contribs <- wgsLR::calc_LRs_w(xs = c(0, 0, 2, 2), 
                                 xd = c(1, 0, 2, 2), 
                                 w = 0.001, 
                                 p = c(0.25, 0.25, 0.5))
LR_contribs
prod(LR_contribs)
```
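
To see why a mismatch no longer forces the $LR$ to zero, one possible formulation is given here. It assumes, and this is not stated in this README, that each observed genotype arises from a latent true genotype $z$ with probability $p_z$ through an error distribution $e_w(x \mid z)$, independently for the trace ($x_D$) and the person of interest ($x_S$):

$$
LR = \frac{\sum_{z} p_z \, e_w(x_D \mid z) \, e_w(x_S \mid z)}
          {\left(\sum_{z} p_z \, e_w(x_D \mid z)\right) \left(\sum_{z} p_z \, e_w(x_S \mid z)\right)}
$$

If $e_w(x \mid z) > 0$ for all $x$ and $z$ when $w > 0$, the numerator is positive even when $x_D \neq x_S$. For $w = 0$, where $e_0(x \mid z)$ is 1 if $x = z$ and 0 otherwise, the expression reduces to $1 / p_x$ for a match and 0 for a mismatch.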

We can also consider the $LR$s for a range of plausible values of $w$:

```{r}
ws <- c(1e-6, 1e-3, 1e-2, 1e-1)
LRs <- sapply(ws, \(w) wgsLR::calc_LRs_w(xs = c(0, 0, 2, 2), 
                                         xd = c(1, 0, 2, 2), 
                                         w = w, 
                                         p = c(0.25, 0.25, 0.5)) |> 
                prod())
data.frame(log10w = log10(ws), w = ws, 
           LR = LRs, WoElog10LR = log10(LRs))
```


## Different error rates

Assume that the trace donor profile has $w_D = 10^{-4}$ and 
the suspect reference profile has $w_S = 10^{-8}$. Then 
the $LR$ is:

```{r}
LR_contribs <- wgsLR::calc_LRs_wDwS(xs = c(0, 0, 2, 2), 
                                    xd = c(1, 0, 2, 2), 
                                    wD = 1e-4, 
                                    wS = 1e-8,
                                    p = c(0.25, 0.25, 0.5))
prod(LR_contribs)
```
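
For intuition about how two different error probabilities could enter the calculation, here is a minimal base R sketch. It is not the package's implementation: it assumes that each allele is misread independently (with probability $w_D$ in the trace and $w_S$ in the reference), and the names `error_matrix` and `lr_two_rates` are hypothetical.

```r
# ASSUMED error model (not from the README): each allele of a biallelic
# genotype is independently misread with probability w.
# Rows: true genotype z = 0, 1, 2; columns: observed genotype x = 0, 1, 2.
error_matrix <- function(w) {
  rbind(c((1 - w)^2,   2 * w * (1 - w),   w^2),
        c(w * (1 - w), (1 - w)^2 + w^2,   w * (1 - w)),
        c(w^2,         2 * w * (1 - w),   (1 - w)^2))
}

# Hypothetical helper: per-locus LR with separate error probabilities
lr_two_rates <- function(xd, xs, wD, wS, p) {
  ED <- error_matrix(wD)  # errors in the trace profile
  ES <- error_matrix(wS)  # errors in the reference profile
  mapply(function(d, s) {
    # Numerator: both observations stem from the same true genotype
    num <- sum(p * ED[, d + 1] * ES[, s + 1])
    # Denominator: the observations stem from two unrelated individuals
    den <- sum(p * ED[, d + 1]) * sum(p * ES[, s + 1])
    num / den
  }, xd, xs)
}

p <- c(0.25, 0.25, 0.5)
lr <- lr_two_rates(xd = c(1, 0, 2, 2), xs = c(0, 0, 2, 2),
                   wD = 1e-4, wS = 1e-8, p = p)
lr
prod(lr)
```

Setting `wD` and `wS` to the same value gives the single-$w$ case under the same assumptions, and setting both to 0 reproduces the error-free result (0 for a mismatch, $1 / P(x)$ for a match).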

Owner

  • Name: Mikkel Meyer Andersen
  • Login: mikldk
  • Kind: user
  • Location: Denmark

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this work, please cite it as below."
authors:
    - family-names: Andersen
      given-names: Mikkel Meyer
    - family-names: Eriksen
      given-names: Poul Svante
title: "wgsLR"
preferred-citation:
  type: article
  authors:
    - family-names: Andersen
      given-names: Mikkel Meyer
    - family-names: Kampmann
      given-names: Marie-Louise
    - family-names: Jepsen
      given-names: Alberte Honoré
    - family-names: Morling
      given-names: Niels
    - family-names: Eriksen
      given-names: Poul Svante
    - family-names: Børsting
      given-names: Claus
    - family-names: Andersen
      given-names: Jeppe Dyrberg
  title: "Shotgun DNA sequencing for human identification: Dynamic SNP selection and likelihood ratio calculations accounting for errors"
  doi: 10.1016/j.fsigen.2024.103146
  url: https://doi.org/10.1016/j.fsigen.2024.103146
  year: 2025
  journal: "Forensic Science International: Genetics"
  volume: 74
  publisher: "Elsevier BV"
  issn: 1872-4973

GitHub Events

Total
  • Watch event: 1
  • Delete event: 1
  • Issue comment event: 1
  • Push event: 9
  • Pull request event: 6
  • Fork event: 1
  • Create event: 1
Last Year
  • Watch event: 1
  • Delete event: 1
  • Issue comment event: 1
  • Push event: 9
  • Pull request event: 6
  • Fork event: 1
  • Create event: 1

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 0
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 1.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 1.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • mikldk (2)
  • Magnus-Krogh (1)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

DESCRIPTION cran
  • R >= 3.0 depends
  • methods * depends
  • Rmpfr * suggests
  • knitr * suggests
  • rmarkdown * suggests
  • testthat >= 3.0.0 suggests
  • tidygraph * suggests