mungesumstats
Rapid standardisation and quality control of GWAS or QTL summary statistics
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
✓DOI references
Found 4 DOI reference(s) in README -
○Academic publication links
-
✓Committers with academic emails
2 of 12 committers (16.7%) from academic institutions -
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (13.4%) to scientific vocabulary
Keywords
bioconductor-package
bioinformatics
database-api
genomics
gwas
qtl
r
r-package
standardisation
summary-statistics
vcf-files
Keywords from Contributors
rnaseq
recount3
recount
mouse
junction
illumina
human
gene
exon
derfinder
Last synced: 6 months ago
·
JSON representation
Repository
Rapid standardisation and quality control of GWAS or QTL summary statistics
Basic Info
- Host: GitHub
- Owner: Al-Murphy
- Language: R
- Default Branch: master
- Homepage: https://al-murphy.github.io/MungeSumstats/
- Size: 9.41 MB
Statistics
- Stars: 86
- Watchers: 3
- Forks: 18
- Open Issues: 10
- Releases: 0
Topics
bioconductor-package
bioinformatics
database-api
genomics
gwas
qtl
r
r-package
standardisation
summary-statistics
vcf-files
Created almost 5 years ago
· Last pushed 6 months ago
Metadata Files
Readme
Changelog
README.Rmd
--- title: "`MungeSumstats`: Standardise the format of GWAS summary statistics" author: "Authors: Alan Murphy, Brian Schilder and Nathan Skene
" date: "Updated: `r format(Sys.Date(), '%b-%d-%Y')`
" bibliography: vignettes/MungeSumstats.bib csl: vignettes/nature.csl output: rmarkdown::github_document vignette: > %\VignetteIndexEntry{MungeSumstats} %\VignetteEngine{knitr::rmarkdown} %\usepackage[utf8]{inputenc} --- ```{r, echo=FALSE} knitr::opts_chunk$set(tidy = FALSE, warning = FALSE, message = FALSE) ``` `r badger::badge_bioc_release(color = "black")` `r badger::badge_github_version(color = "black")` `r badger::badge_last_commit(branch = "master")` `r badger::badge_bioc_download(by = "total", color = "blue")` `r badger::badge_license()` `r badger::badge_doi(doi = "https://doi.org/10.1093/bioinformatics/btab665", color="blue")` # Introduction The `MungeSumstats` package is designed to facilitate the standardisation of GWAS summary statistics. ## Overview The package is designed to handle the lack of standardisation of output files by the GWAS community. The [MRC IEU Open GWAS](https://gwas.mrcieu.ac.uk/) team have provided full summary statistics for >10k GWAS, which are API-accessible via the [`ieugwasr`](https://mrcieu.github.io/ieugwasr/) and [`gwasvcf`](https://github.com/MRCIEU/gwasvcf) packages. But these GWAS are only standardised in the sense that they are VCF format, and can be fully standardised with `MungeSumstats`. `MungeSumstats` provides a framework to standardise the format for any GWAS summary statistics, including those in VCF format, enabling downstream integration and analysis. It addresses the most common discrepancies across summary statistic files, and offers a range of adjustable Quality Control (QC) steps. ## Citation If you use `MungeSumstats`, please cite the original authors of the GWAS as well as: > Alan E Murphy, Brian M Schilder, Nathan G Skene (2021) MungeSumstats: A Bioconductor package for the standardisation and quality control of many GWAS summary statistics. *Bioinformatics*, btab665, https://doi.org/10.1093/bioinformatics/btab665 # Installing `MungeSumstats` `MungeSumstats` is available on [Bioconductor](https://bioconductor.org/packages/MungeSumstats). To install `MungeSumstats` on Bioconductor run: ```R if (!require("BiocManager")) install.packages("BiocManager") BiocManager::install("MungeSumstats") ``` You can then load the package and data package: ```R library(MungeSumstats) ``` Note that there is also a [docker image for MungeSumstats](https://hub.docker.com/r/neurogenomicslab/mungesumstats). Note that for a number of the checks implored by `MungeSumstats` a reference genome is used. If your GWAS summary statistics file of interest relates to *GRCh38*, you will need to install `SNPlocs.Hsapiens.dbSNP155.GRCh38` and `BSgenome.Hsapiens.NCBI.GRCh38` from Bioconductor as follows: ```R BiocManager::install("SNPlocs.Hsapiens.dbSNP155.GRCh38") BiocManager::install("BSgenome.Hsapiens.NCBI.GRCh38") ``` If your GWAS summary statistics file of interest relates to *GRCh37*, you will need to install `SNPlocs.Hsapiens.dbSNP155.GRCh37` and `BSgenome.Hsapiens.1000genomes.hs37d5` from Bioconductor as follows: ```R BiocManager::install("SNPlocs.Hsapiens.dbSNP155.GRCh37") BiocManager::install("BSgenome.Hsapiens.1000genomes.hs37d5") ``` These may take some time to install and are not included in the package as some users may only need one of *GRCh37*/*GRCh38*. If you are unsure of the genome build, MungeSumstats can also infer this information from your data. # Getting started See the [Getting started vignette website](https://al-murphy.github.io/MungeSumstats/articles/MungeSumstats.html) for up-to-date instructions on usage. See the [OpenGWAS vignette website](https://al-murphy.github.io/MungeSumstats/articles/OpenGWAS.html) for information on how to use MungeSumstats to access, standardise and perform quality control on GWAS Summary Statistics from the MRC IEU [Open GWAS Project](https://gwas.mrcieu.ac.uk/). **NOTE** to authenticate, you need to generate a token from the OpenGWAS website. The token behaves like a password, and it will be used to authorise the requests you make to the OpenGWAS API. Here are the steps to generate the token and then have `ieugwasr` automatically use it for your queries: 1. Login to https://api.opengwas.io/profile/ 2. Generate a new token 3. Add `OPENGWAS_JWT=` to your .Renviron file, thi can be edited in R by running `usethis::edit_r_environ()` 4. Restart your R session 5. To check that your token is being recognised, run `ieugwasr::get_opengwas_jwt()`. If it returns a long random string then you are authenticated. 6. To check that your token is working, run `ieugwasr::user()`. It will make a request to the API for your user information using your token. It should return a list with your user information. If it returns an error, then your token is not working. 7. Make sure you have submitted use Please read carefully through the [FAQ website](https://github.com/Al-Murphy/MungeSumstats/wiki/FAQ) for an queries about running MungeSumstats. If you have any outside of this problems please do file an [Issue](https://github.com/al-murphy/MungeSumstats/issues) here on GitHub. # Future Enhancements The `MungeSumstats` package aims to be able to handle the most common summary statistic file formats including VCF. If your file can not be formatted by `MungeSumstats` feel free to report the [Issue](https://github.com/al-murphy/MungeSumstats/issues) on GitHub along with your summary statistics file header. We also encourage people to edit the code to resolve their particular issues too and are happy to incorporate these through pull requests on github. If your summary statistic file headers are not recognised by `MungeSumstats` but correspond to one of ``` SNP, BP, CHR, A1, A2, P, Z, OR, BETA, LOG_ODDS, SIGNED_SUMSTAT, N, N_CAS, N_CON, NSTUDY, INFO or FRQ, ``` Feel free to update the `data("sumstatsColHeaders")` following the approach in the *data.R* file and add your mapping. Then use a [Pull Request](https://github.com/al-murphy/MungeSumstats/pulls) on GitHub and we will incorporate this change into the package. # Contributors We would like to acknowledge all those who have contributed to `MungeSumstats` development: * [Alan Murphy](https://github.com/Al-Murphy) * [Nathan Skene](https://github.com/NathanSkene) * [Brian Schilder](https://github.com/bschilder) * [Shea Andrews](https://github.com/sjfandrews) * [Jonathan Griffiths](https://github.com/jonathangriffiths) * [Kitty Murphy](https://github.com/KittyMurphy) * [Mykhaylo Malakhov](https://github.com/MykMal) * [Alasdair Warwick](https://github.com/rmgpanw) * [Ao Lu](https://github.com/leoarrow1) * [Sufyan Sualeman](https://github.com/sufyansuleman)
Owner
- Name: Alan Murphy
- Login: Al-Murphy
- Kind: user
- Location: London
- Company: UK DRI at Imperial College London
- Website: https://al-murphy.github.io/
- Twitter: Al_Murphy_
- Repositories: 1
- Profile: https://github.com/Al-Murphy
Computational Biologist, PhD Student, Neurogenomics
GitHub Events
Total
- Issues event: 21
- Watch event: 8
- Issue comment event: 22
- Push event: 16
- Pull request event: 2
- Gollum event: 9
- Fork event: 3
- Create event: 3
Last Year
- Issues event: 21
- Watch event: 8
- Issue comment event: 22
- Push event: 16
- Pull request event: 2
- Gollum event: 9
- Fork event: 3
- Create event: 3
Committers
Last synced: 9 months ago
Top Committers
| Name | Commits | |
|---|---|---|
| Al-Murphy | a****4@h****m | 235 |
| Brian M. Schilder | 3****r | 120 |
| sjfandrews | s****s@g****m | 13 |
| J Wokaty | j****y@s****u | 10 |
| Nitesh Turaga | n****a@g****m | 6 |
| Jonathan Griffiths | 7****s | 5 |
| Brian Fulton-Howard | f****1@g****m | 4 |
| A Wokaty | a****y@s****u | 2 |
| rmgpanw | a****6@g****m | 1 |
| lshep | l****d@r****g | 1 |
| cfbeuchel | c****l@c****e | 1 |
| MykMal | m****v@d****m | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 7
- Total pull requests: 3
- Average time to close issues: about 2 months
- Average time to close pull requests: 2 days
- Total issue authors: 6
- Total pull request authors: 1
- Average comments per issue: 1.29
- Average comments per pull request: 0.33
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 7
- Pull requests: 3
- Average time to close issues: about 2 months
- Average time to close pull requests: 2 days
- Issue authors: 6
- Pull request authors: 1
- Average comments per issue: 1.29
- Average comments per pull request: 0.33
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- Al-Murphy (3)
- shaman-yellow (2)
- LouChen-med (2)
- jksull (1)
- songlyzz (1)
- rmgpanw (1)
- AhmedArslan (1)
- htqqdd (1)
- svenkatesh25 (1)
- bschilder (1)
- jaqueytw (1)
- JacksonDU630 (1)
- karpat90 (1)
- AndyYang0924 (1)
- DaXuanGarden (1)
Pull Request Authors
- sufyansuleman (3)
- Al-Murphy (2)
- MC4R (1)
Top Labels
Issue Labels
bug (7)
help wanted (2)
enhancement (1)
Pull Request Labels
help wanted (1)
Dependencies
DESCRIPTION
cran
- R >= 4.1 depends
- BSgenome * imports
- Biostrings * imports
- GenomeInfoDb * imports
- GenomicRanges * imports
- IRanges * imports
- R.utils * imports
- RCurl * imports
- VariantAnnotation * imports
- data.table * imports
- dplyr * imports
- googleAuthR * imports
- httr * imports
- jsonlite * imports
- magrittr * imports
- methods * imports
- parallel * imports
- rtracklayer * imports
- stats * imports
- stringr * imports
- utils * imports
- BSgenome.Hsapiens.1000genomes.hs37d5 * suggests
- BSgenome.Hsapiens.NCBI.GRCh38 * suggests
- BiocGenerics * suggests
- BiocParallel * suggests
- BiocStyle * suggests
- GenomicFiles * suggests
- MatrixGenerics * suggests
- Rsamtools * suggests
- S4Vectors * suggests
- SNPlocs.Hsapiens.dbSNP144.GRCh37 * suggests
- SNPlocs.Hsapiens.dbSNP144.GRCh38 * suggests
- SNPlocs.Hsapiens.dbSNP155.GRCh37 * suggests
- SNPlocs.Hsapiens.dbSNP155.GRCh38 * suggests
- UpSetR * suggests
- badger * suggests
- covr * suggests
- knitr * suggests
- markdown * suggests
- rmarkdown * suggests
- seqminer * suggests
- testthat >= 3.0.0 suggests
.github/workflows/rworkflows.yml
actions
- neurogenomics/rworkflows master composite