snpsea

:bar_chart: Identify cell types and pathways affected by genetic risk loci.

https://github.com/slowkow/snpsea

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 4 DOI reference(s) in README
✓ Academic publication links: Links to ncbi.nlm.nih.gov
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (10.8%) to scientific vocabulary

Keywords

algorithm bioinformatics enrichment gene gene-sets gwas risk-loci tissue

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: slowkow
License: other
Language: C++
Default Branch: master
Homepage: http://www.broadinstitute.org/mpg/snpsea/
Size: 20.5 MB

Statistics

Stars: 37
Watchers: 2
Forks: 9
Open Issues: 4
Releases: 3

Created about 12 years ago · Last pushed almost 2 years ago

SNPsea: an algorithm to identify cell types, tissues, and pathways affected by risk loci

Home Page: http://www.broadinstitute.org/mpg/snpsea

Documentation: HTML | PDF | Epub

Executable: snpsea-v1.0.3.tar.gz

Data: SNPseadata20140520.zip

License: GNU GPLv3

Citation

If you benefit from this method, please cite:

Slowikowski, K. et al. SNPsea: an algorithm to identify cell types, tissues, and pathways affected by risk loci. Bioinformatics (2014). doi:10.1093/bioinformatics/btu326

See the first description of the algorithm and additional examples here:

Hu, X. et al. Integrating autoimmune risk loci with gene-expression data identifies specific pathogenic immune cell subsets. The American Journal of Human Genetics 89, 496–506 (2011). PubMed

Description

SNPsea is an algorithm to identify cell types and pathways likely to be affected by risk loci. It requires a list of SNP identifiers and a matrix of genes and conditions.

Genome-wide association studies (GWAS) have discovered multiple genomic loci associated with risk for different types of disease. SNPsea provides a simple way to determine the types of cells influenced by genes in these risk loci.

Suppose disease-associated alleles influence a small number of pathogenic cell types. We hypothesize that genes with critical functions in those cell types are likely to be within risk loci for that disease. We assume that a gene's specificity to a cell type is a reasonable indicator of its importance to the unique function of that cell type.

First, we identify the genes in linkage disequilibrium (LD) with the given trait-associated SNPs and score the gene set for specificity to each cell type. Next, we define a null distribution of scores for each cell type by sampling random SNP sets matched on the number of linked genes. Finally, we evaluate the significance of the original gene set's specificity by comparison to the null distributions: we calculate an exact permutation p-value.
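The three steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not SNPsea's C++ implementation: the function names, the toy specificity values, the use of mean specificity as the score, and the `(r + 1) / (n + 1)` estimator are all chosen for the example. For brevity, null sets are matched only on gene-set size rather than on the number of linked genes per SNP.

```python
import random

def score(genes, specificity, cell_type):
    """Mean specificity of a gene set to one cell type (assumed scoring rule)."""
    return sum(specificity[g][cell_type] for g in genes) / len(genes)

def permutation_p_value(observed, null_scores):
    """Share of null scores at least as large as the observed score.

    The +1 terms count the observed set itself, so the estimate is never 0.
    """
    at_least = sum(1 for s in null_scores if s >= observed)
    return (at_least + 1) / (len(null_scores) + 1)

# Toy data (hypothetical): specificity[gene][cell_type], values in [0, 1].
specificity = {
    "G1": {"erythroid": 0.9, "t_cell": 0.1},
    "G2": {"erythroid": 0.8, "t_cell": 0.2},
    "G3": {"erythroid": 0.2, "t_cell": 0.7},
    "G4": {"erythroid": 0.1, "t_cell": 0.6},
    "G5": {"erythroid": 0.3, "t_cell": 0.3},
}

observed_genes = ["G1", "G2"]   # genes linked to the input SNPs
rng = random.Random(0)          # fixed seed for reproducibility
all_genes = sorted(specificity)

# Null gene sets with the same size as the observed set.
null_sets = [rng.sample(all_genes, len(observed_genes)) for _ in range(999)]

p_values = {}
for cell_type in ("erythroid", "t_cell"):
    obs = score(observed_genes, specificity, cell_type)
    nulls = [score(s, specificity, cell_type) for s in null_sets]
    p_values[cell_type] = permutation_p_value(obs, nulls)
```

In this toy setup the observed genes are most specific to `erythroid`, so that column receives the smaller p-value.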

SNPsea is a general algorithm. You may provide your own:

- Continuous gene matrix with gene expression profiles (or other values).
- Binary gene annotation matrix with presence/absence 1/0 values.

We provide you with three expression matrices and one annotation matrix. See the Data section of the Manual.

The columns of the matrix may be tissues, cell types, GO annotation codes, or other conditions. Continuous matrices must be normalized before running SNPsea: columns must be directly comparable to each other.
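As one possible way to make columns comparable, each column can be converted to percentile ranks so that every condition is on the same 0 to 1 scale. The sketch below is only an assumption about what a suitable normalization might look like. The function name and toy values are hypothetical, and other methods (for example quantile normalization) may suit your data better.

```python
def column_percentile_ranks(matrix):
    """Replace each value by the fraction of values in its column that are <= it."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    ranked = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        column = [row[j] for row in matrix]
        for i, value in enumerate(column):
            ranked[i][j] = sum(1 for v in column if v <= value) / n_rows
    return ranked

# Rows are genes, columns are conditions measured on very different scales.
raw = [
    [10.0, 1000.0],
    [20.0, 3000.0],
    [30.0, 2000.0],
]
normalized = column_percentile_ranks(raw)
```

After this step, the largest value in each column maps to 1.0 regardless of the column's original units.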

Example

SNPsea results for RBC count-associated SNPs in the Gene Atlas.

The heatmap shows Pearson correlation coefficients between pairs of tissue expression profiles. The blue bars show p-values. Statistically significant p-values cross the Bonferroni multiple testing threshold (black line).

We identified BM-CD71+Early Erythroid as the cell type with the most significant enrichment (P < 2e-7) for cell type-specific gene expression relative to the 78 other tissues in the Gene Atlas (Su et al. 2004).
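To see how the reported p-value compares with a Bonferroni threshold, the arithmetic can be sketched as follows. The family-wise error rate of 0.05 is an assumption made for illustration. The number of tests (79) and the bound on the p-value (2e-7) come from the text above.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

n_cell_types = 79   # one test per Gene Atlas column
alpha = 0.05        # assumed family-wise error rate
threshold = bonferroni_threshold(alpha, n_cell_types)

# The reported enrichment (P < 2e-7) lies well below the threshold.
is_significant = 2e-7 < threshold
```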

SNPsea tested the genes in linkage disequilibrium (LD) with 45 input SNPs associated with count of red blood cells (P <= 5e-8 in Europeans) (Harst et al. 2012). For each of the 79 cell types in the Gene Atlas, we tested a maximum of 1e7 null SNP sets where each null SNP was matched to an input SNP on the number of genes in LD.
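The matching of null SNPs to input SNPs can be illustrated with a short Python sketch. The SNP identifiers, gene counts, and function names below are hypothetical, and the code is a simplified stand-in for the idea rather than the sampling procedure SNPsea actually implements.

```python
import random
from collections import defaultdict

def build_bins(gene_counts):
    """Group SNP identifiers by their number of linked genes."""
    bins = defaultdict(list)
    for snp, n_genes in gene_counts.items():
        bins[n_genes].append(snp)
    return bins

def sample_matched_set(input_snps, input_counts, null_bins, rng):
    """Draw one null SNP per input SNP from the bin with the same gene count.

    Raises IndexError if no null SNP has the required gene count.
    """
    return [rng.choice(null_bins[input_counts[snp]]) for snp in input_snps]

# Hypothetical identifiers: number of genes in LD with each SNP.
input_counts = {"rsA": 1, "rsB": 3}
null_counts = {"rs1": 1, "rs2": 1, "rs3": 3, "rs4": 3, "rs5": 2}

null_bins = build_bins(null_counts)
rng = random.Random(1)
null_set = sample_matched_set(["rsA", "rsB"], input_counts, null_bins, rng)
```

Each call to `sample_matched_set` yields one null SNP set. Repeating it many times builds the null distribution for a cell type.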

We ran SNPsea like this:

```bash
options=(
    --snps              Redbloodcellcount-Harst2012-45SNPs.gwas
    --gene-matrix       GeneAtlas2004.gct.gz
    --gene-intervals    NCBIgenes2013.bed.gz
    --snp-intervals     TGP2011.bed.gz
    --null-snps         Lango2010.txt.gz
    --out               out
    --slop              10e3
    --threads           8
    --null-snpsets      0
    --min-observations  100
    --max-iterations    1e7
)
# Pass each array element as a separate argument.
snpsea "${options[@]}"

# Time elapsed: 2 minutes 36 seconds

# Create the figure shown above:
snpsea-barplot out
```

Contributing

Please submit an issue to report bugs or ask questions.

Please contribute bug fixes or new features with a pull request to this repository.

Owner

Name: Kamil Slowikowski
Login: slowkow
Kind: user
Company: Mass General Brigham

Website: https://slowkow.com
Twitter: slowkow
Repositories: 22
Profile: https://github.com/slowkow

Computational biologist. Using transcriptomics to learn about inflammation and cancer.

Citation (CITATION)

Plain text
----------

Kamil Slowikowski, Xinli Hu, and Soumya Raychaudhuri. "SNPsea: an algorithm to
identify cell types, tissues and pathways affected by risk loci."
Bioinformatics (2014) 30 (17): 2496-2497.


Web
---

http://bioinformatics.oxfordjournals.org/content/30/17/2496.full

doi:10.1093/bioinformatics/btu326


BibTeX
------

@article{Slowikowski2014,
    author = {Slowikowski, Kamil and Hu, Xinli and Raychaudhuri, Soumya}, 
    title = {SNPsea: an algorithm to identify cell types, tissues and pathways
    affected by risk loci},
    volume = {30}, 
    number = {17}, 
    pages = {2496-2497}, 
    year = {2014}, 
    doi = {10.1093/bioinformatics/btu326}, 
    abstract = {Summary: We created a fast, robust and general C++
        implementation of a single-nucleotide polymorphism (SNP) set
        enrichment algorithm to identify cell types, tissues and pathways
        affected by risk loci. It tests trait-associated genomic loci for
        enrichment of specificity to conditions (cell types, tissues and
        pathways). We use a non-parametric statistical approach to compute
        empirical P-values by comparison with null SNP sets. As a proof of
        concept, we present novel applications of our method to four sets of
        genome-wide significant SNPs associated with red blood cell count,
        multiple sclerosis, celiac disease and HDL cholesterol. Availability
        and implementation: http://broadinstitute.org/mpg/snpsea Contact:
        soumya@broadinstitute.org Supplementary information: Supplementary data
        are available at Bioinformatics online.}, 
    URL = {http://bioinformatics.oxfordjournals.org/content/30/17/2496.abstract}, 
    eprint = {http://bioinformatics.oxfordjournals.org/content/30/17/2496.full.pdf+html}, 
    journal = {Bioinformatics} 
}

GitHub Events

Total

Watch event: 2

Last Year

Watch event: 2

Committers

Last synced: 8 months ago

All Time

Total Commits: 318
Total Committers: 1
Avg Commits per committer: 318.0
Development Distribution Score (DDS): 0.0

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Kamil Slowikowski	k**i@g**m	318

Issues and Pull Requests

Last synced: 8 months ago

All Time

Total issues: 5
Total pull requests: 1
Average time to close issues: about 3 years
Average time to close pull requests: 6 months
Total issue authors: 4
Total pull request authors: 1
Average comments per issue: 4.0
Average comments per pull request: 0.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 1

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0
