Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to arxiv.org, pubmed.ncbi, ncbi.nlm.nih.gov, nature.com
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.2%) to scientific vocabulary
Keywords
1000genomes
bioinformatics
genetics
genomics
metadata
population-genetics
sequencing
Last synced: 4 months ago
Repository
1000 Genomes Project Metadata R Package
Basic Info
- Host: GitHub
- Owner: stephenturner
- License: other
- Language: R
- Default Branch: main
- Homepage: https://stephenturner.github.io/kgp/
- Size: 1000 KB
Statistics
- Stars: 20
- Watchers: 2
- Forks: 4
- Open Issues: 5
- Releases: 2
Topics
1000genomes
bioinformatics
genetics
genomics
metadata
population-genetics
sequencing
Created over 3 years ago · Last pushed about 3 years ago
Metadata Files
Readme
Changelog
License
Citation
README.Rmd
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# kgp
[CRAN status](https://CRAN.R-project.org/package=kgp)
[Lifecycle: stable](https://lifecycle.r-lib.org/articles/stages.html#stable)
[arXiv:2210.00539](https://arxiv.org/abs/2210.00539)
This kgp data package provides metadata about populations and data about samples from the 1000 Genomes Project, including the 2,504 samples sequenced for the Phase 3 release and the expanded collection of 3,202 samples with 602 additional trios.
## Installation
You can install the released version of kgp from [CRAN](https://CRAN.R-project.org/package=kgp) with:
```r
install.packages("kgp")
```
You can install the development version of kgp from [GitHub](https://github.com/stephenturner/kgp) with:
```r
# install.packages("devtools")
devtools::install_github("stephenturner/kgp")
```
## About the data
The 1000 Genomes Project Phase 3 data contains 2,504 samples with sequence data available, and was later expanded to 3,202 samples sequenced at high coverage, adding 602 trios. Data is available through the [1000 Genomes FTP site](http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/) and [GitHub](https://github.com/igsr/1000Genomes_data_indexes/).
- Pilot publication: [A map of human genome variation from population-scale sequencing](https://www.nature.com/articles/nature09534)
- Phase 1 publication: [An integrated map of genetic variation from 1,092 human genomes](https://www.nature.com/articles/nature11632)
- Phase 3 publication: [A global reference for human genetic variation](https://www.nature.com/articles/nature15393)
- Expanded high-coverage publication: [High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios](https://pubmed.ncbi.nlm.nih.gov/36055201/)
There are three data sets available in the kgp package.
```{r example}
library(kgp)
data(kgp)
```
The `kgp3` data contains pedigree and population information for the 2,504 samples included in the Phase 3 release of the 1000 Genomes Project data.
```{r}
kgp3
```
The `kgpe` data contains pedigree and population information for all 3,202 samples included in the expanded 1000 Genomes Project data, which includes 602 trios.
```{r}
kgpe
```
The `kgpmeta` data contains population metadata for the 26 populations across five continental regions.
```{r}
kgpmeta
```
## Examples
```{r, message=FALSE, warning=FALSE}
library(dplyr)
library(ggplot2)
library(kgp)
data(kgp)
```
Count the number of samples in each region, or in each population:
```{r}
kgp3 %>%
  count(region) %>%
  knitr::kable()
```
```{r}
kgp3 %>%
  count(region, population) %>%
  knitr::kable()
```
```{r kgp3barplot, fig.width=9, fig.height=12}
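kgp3 %>%
  count(region, population) %>%
  arrange(region, n) %>%
  # Fix the factor order so bars are drawn sorted by region, then by count
  mutate(population=forcats::fct_inorder(population)) %>%
  ggplot(aes(population, n)) +
  geom_col(aes(fill=region)) +
  # x is the population axis (unlabeled); y is the sample count
  labs(fill=NULL, x=NULL, y="N") +
  coord_flip() +
  theme_bw() +
  theme(legend.position="bottom")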
```
The latitude and longitude coordinates in `kgpmeta` can be used to plot a map of the locations of the 1000 Genomes populations. There is also a column for region color, which provides a hexadecimal color code to enable reproduction of the population data map as shown on the IGSR population data page. The figure below shows a static map produced using ggplot2, but interactive maps such as that shown on the IGSR population data portal can be created with the leaflet package.
```{r kgpmap, fig.cap="Map showing locations of the 1000 Genomes Phase 3 populations.", fig.width=8, fig.height=6}
pal <- kgpmeta %>% distinct(reg, regcolor) %>% tibble::deframe()
ggplot() +
  geom_polygon(data=map_data("world"),
               aes(long, lat, group=group),
               col="gray30", fill="gray95", lwd=.2, alpha=.5) +
  geom_point(data=kgpmeta, aes(lng, lat, col=reg), size=4) +
  scale_colour_manual(values=pal) +
  theme_minimal() +
  theme(axis.ticks = element_blank(),
        axis.text = element_blank(),
        axis.title = element_blank(),
        legend.title = element_blank(),
        panel.grid = element_blank(),
        legend.position = "bottom")
```
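As an interactive alternative to the static map, the following is a minimal sketch using the leaflet package. It assumes leaflet is installed and reuses only the `kgpmeta` columns that appear in the static map code (`lng`, `lat`, `reg`, `regcolor`). The marker size and opacity are arbitrary choices for illustration, not settings taken from the IGSR portal.

```r
library(leaflet)
library(kgp)
data(kgp)  # loaded the same way as in the examples above

# Interactive counterpart to the static ggplot2 map.
m <- leaflet(kgpmeta) %>%
  addTiles() %>%                 # default OpenStreetMap base layer
  addCircleMarkers(
    lng = ~lng, lat = ~lat,      # coordinate columns used in the static map
    color = ~regcolor,           # hexadecimal color code per region
    label = ~reg,                # region code shown on hover
    radius = 6, stroke = FALSE, fillOpacity = 0.8
  )

m  # printing the widget renders the map in the viewer or browser
```

Because the result is an HTML widget, it renders in HTML output such as a pkgdown site or R Markdown HTML document, but not in a `github_document` README.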
The table below shows a selection of samples from `kgpe` showing pedigree information for each sample. This pedigree information could be used in downstream analysis to filter out related individuals, select only trios, or to visualize family structure.
```{r kgpe}
kgpe %>%
  filter(pid!="0" & mid!="0") %>%
  group_by(pop) %>%
  slice(1) %>%
  head(12) %>%
  arrange(reg, pop) %>%
  select(fid:reg) %>%
  select(-sexf) %>%
  knitr::kable()
```
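The filtering mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the helper names `drop_children()` and `complete_trio_children()` are hypothetical and not part of kgp, the toy table uses made-up sample IDs, and a missing parent is assumed to be coded as `"0"` in `pid` and `mid`, as in the filter used for the table above. Relatedness that is not recorded in the pedigree columns is not detected by this approach.

```r
library(dplyr)

# Hypothetical helper: keep samples with no recorded father or mother,
# i.e. drop every sample listed as somebody's child.
drop_children <- function(ped) {
  ped %>% filter(pid == "0" & mid == "0")
}

# Hypothetical helper: keep children whose father AND mother are both
# present as samples in the same table (complete trios).
complete_trio_children <- function(ped) {
  ped %>% filter(pid %in% ped$id, mid %in% ped$id)
}

# Toy pedigree with made-up IDs, using the same column names as kgpe.
toy <- data.frame(
  fid = c("F1", "F1", "F1", "F2", "F3"),
  id  = c("S1", "S2", "S3", "S4", "S5"),
  pid = c("0",  "0",  "S1", "0",  "S9"),  # S9 is not a sample in the table
  mid = c("0",  "0",  "S2", "0",  "0"),
  stringsAsFactors = FALSE
)

founders      <- drop_children(toy)           # S1, S2, S4
trio_children <- complete_trio_children(toy)  # S3 only
```

The same helpers could be applied to the package data, for example `drop_children(kgpe)` or `complete_trio_children(kgpe)`.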
The figure below shows an example of a pedigree plot made by parsing the pedigree information using [skater](https://cran.r-project.org/package=skater) and plotting using [kinship2](https://cran.r-project.org/package=kinship2). The skater package provides documentation, examples, and a vignette demonstrating how to iteratively plot all pedigrees in a given data set.
```{r pedplot, fig.height=5, fig.width=8, fig.cap="Trios in 1000 Genomes Project family 13291."}
kgpe %>%
  filter(fid=="13291") %>%
  transmute(fid, id, dadid=pid, momid=mid, sex, affected=1) %>%
  skater::fam2ped() %>%
  pull(ped) %>%
  purrr::pluck(1) %>%
  kinship2::plot.pedigree(mar=c(4,2,4,2), cex=.8)
```
Owner
- Name: Stephen Turner
- Login: stephenturner
- Kind: user
- Location: Charlottesville, VA
- Company: @colossal-compsci
- Website: http://StephenTurner.us
- Twitter: strnr
- Repositories: 125
- Profile: https://github.com/stephenturner
Data scientist in biotech, former academic, Principal Scientist and Head of Genomic Strategy at Colossal Biosciences
Citation (CITATION.cff)
# -----------------------------------------------------------
# CITATION file created with {cffr} R package, v0.2.3
# See also: https://docs.ropensci.org/cffr/
# -----------------------------------------------------------
cff-version: 1.2.0
message: 'To cite package "kgp" in publications use:'
type: software
license: Apache-2.0
title: 'kgp: 1000 Genomes Project Metadata'
version: 1.0.0
abstract: Metadata about populations and data about samples from the 1000 Genomes
  Project, including the 2,504 samples sequenced for the Phase 3 release and the expanded
  collection of 3,202 samples with 602 additional trios. The data is described in
  Auton et al. (2015) <doi:10.1038/nature15393> and Byrska-Bishop et al. (2022) <doi:10.1016/j.cell.2022.08.004>,
  and raw data is available at <http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/>.
authors:
- family-names: Turner
  given-names: Stephen
  email: vustephen@gmail.com
  orcid: https://orcid.org/0000-0001-9140-9028
preferred-citation:
  type: manual
  title: 'kgp: 1000 Genomes Project Metadata'
  authors:
  - family-names: Turner
    given-names: Stephen
    email: vustephen@gmail.com
    orcid: https://orcid.org/0000-0001-9140-9028
  version: 1.0.0
  abstract: Metadata about populations and data about samples from the 1000 Genomes
    Project, including the 2,504 samples sequenced for the Phase 3 release and the
    expanded collection of 3,202 samples with 602 additional trios. The data is described
    in Auton et al. (2015) <doi:10.1038/nature15393> and Byrska-Bishop et al. (2022)
    <doi:10.1016/j.cell.2022.08.004>, and raw data is available at <http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/>.
  repository-code: https://github.com/stephenturner/kgp
  url: https://stephenturner.github.io/kgp/
  contact:
  - family-names: Turner
    given-names: Stephen
    email: vustephen@gmail.com
    orcid: https://orcid.org/0000-0001-9140-9028
  keywords:
  - 1000genomes
  - bioinformatics
  - genetics
  - genomics
  - metadata
  - population-genetics
  - sequencing
  license: Apache-2.0
  year: '2022'
repository-code: https://github.com/stephenturner/kgp
url: https://stephenturner.github.io/kgp/
contact:
- family-names: Turner
  given-names: Stephen
  email: vustephen@gmail.com
  orcid: https://orcid.org/0000-0001-9140-9028
keywords:
- 1000genomes
- bioinformatics
- genetics
- genomics
- metadata
- population-genetics
- sequencing
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1
Committers
Last synced: about 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Stephen Turner | v****n@g****m | 30 |
Issues and Pull Requests
Last synced: 5 months ago
All Time
- Total issues: 7
- Total pull requests: 2
- Average time to close issues: about 15 hours
- Average time to close pull requests: 3 minutes
- Total issue authors: 3
- Total pull request authors: 1
- Average comments per issue: 0.43
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- stephenturner (5)
- pamonlan (1)
- carolhuaxia (1)
Pull Request Authors
- stephenturner (2)
Packages
- Total packages: 1
- Total downloads:
  - cran: 222 last month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 3
- Total maintainers: 1
cran.r-project.org: kgp
1000 Genomes Project Metadata
- Homepage: https://github.com/stephenturner/kgp
- Documentation: http://cran.r-project.org/web/packages/kgp/kgp.pdf
- License: Apache License (≥ 2)
- Latest release: 1.1.1 (published about 3 years ago)
Rankings
Stargazers count: 14.6%
Forks count: 14.9%
Dependent packages count: 29.8%
Average: 32.6%
Dependent repos count: 35.5%
Downloads: 68.4%
Maintainers (1)
Last synced: 5 months ago
Dependencies
DESCRIPTION
cran
- R >= 2.10 depends
- tibble * suggests
.github/workflows/pkgdown.yaml
actions
- JamesIves/github-pages-deploy-action 4.1.4 composite
- actions/checkout v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite