wgslr
Forensic likelihood ratios for whole-genome sequencing accounting for errors
Repository
Basic Info
- Host: GitHub
- Owner: mikldk
- Language: R
- Default Branch: main
- Size: 1.08 MB
Statistics
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 0
- Releases: 0
Created over 2 years ago
· Last pushed 10 months ago
Metadata Files
Readme
Changelog
Citation
README.Rmd
---
output: github_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# `wgsLR`: Shotgun sequencing for human identification: Dynamic SNP marker sets and likelihood ratio calculations accounting for errors
Please refer to the online documentation, including the vignettes.
## Scientific publication
The research associated with this software is described in
> Andersen, M. M., Kampmann, M.-L., Jepsen, A. H., Morling, N., Eriksen, P. S., Børsting, C., & Andersen, J. D. (2025).
> *Shotgun DNA sequencing for human identification: Dynamic SNP selection and likelihood ratio calculations accounting for errors.*
> Forensic Science International: Genetics, 74, 103146. [doi:10.1016/j.fsigen.2024.103146](https://doi.org/10.1016/j.fsigen.2024.103146).
## Installation
To install from GitHub with vignettes, run this command from
within `R` (please install `remotes` first if it is not already installed):
```
# install.packages('remotes')
remotes::install_github("mikldk/wgsLR", build_vignettes = TRUE)
```
You can also install the package without vignettes if needed as follows:
```
remotes::install_github("mikldk/wgsLR")
```
## A few small examples
### Estimating the genotype error probability, $w$
```{r}
cases <- wgsLR::sample_data_Hp_w(n = 1000, w = 0.1, p = c(0.25, 0.25, 0.5))
tab <- table(cases$X_D, cases$X_S)
tab
w_mle <- wgsLR::estimate_w(tab)
w_mle
```
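To get a feel for what such a table looks like, the data-generating step can be mimicked in base R. This is only a sketch under an assumed error model (each of the two alleles of the true genotype is misread independently with probability $w$, and both samples come from the same individual); the package's own model may differ, and `simulate_pairs` is a hypothetical helper that is not part of `wgsLR`.

```r
set.seed(1)

# Hypothetical helper (not part of wgsLR): simulate n paired genotype
# observations from the same individual under an ASSUMED per-allele error model.
simulate_pairs <- function(n, w, p) {
  # True genotype: 0 = 0/0, 1 = 0/1, 2 = 1/1
  z <- sample(0:2, size = n, replace = TRUE, prob = p)
  observe <- function(z) {
    a1 <- as.integer(z == 2)   # first allele is 1 only for genotype 1/1
    a2 <- as.integer(z >= 1)   # second allele is 1 for genotypes 0/1 and 1/1
    flip1 <- rbinom(length(z), size = 1, prob = w)
    flip2 <- rbinom(length(z), size = 1, prob = w)
    abs(a1 - flip1) + abs(a2 - flip2)  # flip each allele, then count 1-alleles
  }
  data.frame(X_D = observe(z), X_S = observe(z))
}

sim <- simulate_pairs(n = 1000, w = 0.1, p = c(0.25, 0.25, 0.5))

# Fixing the factor levels keeps the table 3 x 3 even if a genotype is unobserved
tab_sim <- table(factor(sim$X_D, levels = 0:2), factor(sim$X_S, levels = 0:2))
tab_sim

# Share of discordant pairs: a crude diagnostic, not an estimate of w
discordance <- 1 - sum(diag(tab_sim)) / sum(tab_sim)
discordance
```

The off-diagonal cells of the table carry the information about errors; with `w = 0` in this sketch, all counts fall on the diagonal.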
#### Cautionary note: not just standard VCF files
It is necessary to obtain sequencing results for all bases in the
selected segments (including read depth and genotype quality).
Thus, it is **not** sufficient to just use information from
"confirmed"/high-probability variants relative to the reference genome (the variants reported in a standard VCF file), as this can introduce bias in the results.
**Information from all bases in the chosen genomic areas of interest is needed**.
One way to achieve this is by using [GATK HaplotypeCaller](https://gatk.broadinstitute.org/hc/en-us/articles/9570334998171-HaplotypeCaller)
with the additional argument `--emit-ref-confidence BP_RESOLUTION` for the
genomic areas of interest (using `-L areas.interval_list`).
### Calculating likelihood ratios ($LR$s)
Assume that a trace sample had four loci with genotypes (0/1 = 1, 0/0 = 0, 1/1 = 2, 1/1 = 2).
A person of interest is then typed for the same four loci and
has genotypes (0/0 = 0, 0/0 = 0, 1/1 = 2, 1/1 = 2), i.e. a mismatch on the first locus.
For simplicity, assume that the genotype probabilities are
P(0/0 = 0) = 0.25,
P(0/1 = 1/0 = 1) = 0.25, and
P(1/1 = 2) = 0.5.
If no errors are possible, then $w=0$ and
```{r}
wgsLR::calc_LRs_w(xs = c(0, 0, 2, 2),
                  xd = c(1, 0, 2, 2),
                  w = 0,
                  p = c(0.25, 0.25, 0.5))
```
and the product is 0 due to the mismatch at the first locus.
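The $w = 0$ case can also be worked out by hand, which may help when reading the output. Under the usual definition $LR = P(x_D, x_S \mid H_p) / P(x_D, x_S \mid H_d)$ and with no errors possible, a locus where both samples show genotype $x$ contributes $1/p_x$, and a mismatching locus contributes 0. The base R sketch below illustrates this; `lr_no_error` is a hypothetical helper and not part of `wgsLR`.

```r
# Hypothetical helper (not part of wgsLR): per-locus LR when w = 0.
# xs, xd: genotypes coded 0, 1, 2; p: genotype probabilities for 0, 1, 2.
lr_no_error <- function(xs, xd, p) {
  stopifnot(length(xs) == length(xd), length(p) == 3)
  # Match: 1 / P(genotype); mismatch: impossible under Hp without errors
  ifelse(xs == xd, 1 / p[xs + 1], 0)
}

lr <- lr_no_error(xs = c(0, 0, 2, 2),
                  xd = c(1, 0, 2, 2),
                  p = c(0.25, 0.25, 0.5))
lr        # 0 4 2 2
prod(lr)  # 0
```

A single mismatch therefore drives the whole product to 0, no matter how many other loci match.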
If instead we acknowledge that errors are possible, then for $w = 0.001$ we obtain that
```{r}
LR_contribs <- wgsLR::calc_LRs_w(xs = c(0, 0, 2, 2),
                                 xd = c(1, 0, 2, 2),
                                 w = 0.001,
                                 p = c(0.25, 0.25, 0.5))
LR_contribs
prod(LR_contribs)
```
We can also consider the $LR$s for a range of plausible values of $w$:
```{r}
ws <- c(1e-6, 1e-3, 1e-2, 1e-1)
LRs <- sapply(ws, \(w) wgsLR::calc_LRs_w(xs = c(0, 0, 2, 2),
                                         xd = c(1, 0, 2, 2),
                                         w = w,
                                         p = c(0.25, 0.25, 0.5)) |>
                prod())
data.frame(log10w = log10(ws), w = ws,
           LR = LRs, WoElog10LR = log10(LRs))
```
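With only four loci, taking `prod()` and then `log10()` works fine. For profiles with many loci, the product can overflow (or underflow) in double precision, so it may be safer to sum the per-locus $\log_{10} LR$ contributions instead. The sketch below uses base R only; the contributions are illustrative values ($1/p$ for matching loci with $w = 0$ and the genotype probabilities above), repeated to mimic a hypothetical profile with 1,500 matching loci.

```r
# Illustrative per-locus LR contributions (1 / 0.25, 1 / 0.5, 1 / 0.5),
# repeated 500 times to mimic a hypothetical 1,500-locus profile.
lr_contribs <- rep(c(4, 2, 2), times = 500)

# The raw product exceeds the largest representable double
prod(lr_contribs)   # Inf

# Summing on the log10 scale stays finite
woe <- sum(log10(lr_contribs))
woe                 # approximately 602.06
```

The same idea would apply to any vector of per-locus contributions, provided none of them is exactly 0.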
## Different error rates
Assume that the trace donor profile has $w_D = 10^{-4}$ and
the suspect reference profile has $w_S = 10^{-8}$. Then
the $LR$ is:
```{r}
LR_contribs <- wgsLR::calc_LRs_wDwS(xs = c(0, 0, 2, 2),
                                    xd = c(1, 0, 2, 2),
                                    wD = 1e-4,
                                    wS = 1e-8,
                                    p = c(0.25, 0.25, 0.5))
prod(LR_contribs)
```
Owner
- Name: Mikkel Meyer Andersen
- Login: mikldk
- Kind: user
- Location: Denmark
- Repositories: 64
- Profile: https://github.com/mikldk
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this work, please cite it as below."
authors:
  - family-names: Andersen
    given-names: Mikkel Meyer
  - family-names: Eriksen
    given-names: Poul Svante
title: "wgsLR"
preferred-citation:
  type: article
  authors:
    - family-names: Andersen
      given-names: Mikkel Meyer
    - family-names: Kampmann
      given-names: Marie-Louise
    - family-names: Jepsen
      given-names: Alberte Honoré
    - family-names: Morling
      given-names: Niels
    - family-names: Eriksen
      given-names: Poul Svante
    - family-names: Børsting
      given-names: Claus
    - family-names: Andersen
      given-names: Jeppe Dyrberg
  title: "Shotgun DNA sequencing for human identification: Dynamic SNP selection and likelihood ratio calculations accounting for errors"
  doi: 10.1016/j.fsigen.2024.103146
  url: https://doi.org/10.1016/j.fsigen.2024.103146
  year: 2025
  journal: "Forensic Science International: Genetics"
  volume: 74
  publisher: "Elsevier BV"
  issn: 1872-4973
GitHub Events
Total
- Watch event: 1
- Delete event: 1
- Issue comment event: 1
- Push event: 9
- Pull request event: 6
- Fork event: 1
- Create event: 1
Last Year
- Watch event: 1
- Delete event: 1
- Issue comment event: 1
- Push event: 9
- Pull request event: 6
- Fork event: 1
- Create event: 1
Issues and Pull Requests
Last synced: 8 months ago
All Time
- Total issues: 0
- Total pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: less than a minute
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 1.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: less than a minute
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 1.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- mikldk (2)
- Magnus-Krogh (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
DESCRIPTION
cran
- R >= 3.0 depends
- methods * depends
- Rmpfr * suggests
- knitr * suggests
- rmarkdown * suggests
- testthat >= 3.0.0 suggests
- tidygraph * suggests