https://github.com/broadinstitute/tensorqtl

Ultrafast GPU-enabled QTL mapper

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
○ .zenodo.json file
✓ DOI references: Found 1 DOI reference(s) in README
✓ Academic publication links: Links to nature.com
✓ Committers with academic emails: 3 of 5 committers (60.0%) from academic institutions
✓ Institutional organization owner: Organization broadinstitute has institutional domain (www.broadinstitute.org)
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (14.4%) to scientific vocabulary

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: broadinstitute
License: bsd-3-clause
Language: Python
Default Branch: master
Homepage:
Size: 39.1 MB

Statistics

Stars: 190
Watchers: 13
Forks: 56
Open Issues: 56
Releases: 10

Created almost 8 years ago · Last pushed 12 months ago

Metadata Files

Readme, License

tensorQTL

tensorQTL is a GPU-enabled QTL mapper, achieving ~200-300 fold faster cis- and trans-QTL mapping compared to CPU-based implementations.

If you use tensorQTL in your research, please cite the following paper: Taylor-Weiner, Aguet, et al., Genome Biol., 2019.
Empirical beta-approximated p-values are computed as described in Ongen et al., Bioinformatics, 2016.

Install

You can install tensorQTL using pip:
```
pip3 install tensorqtl
```
or directly from this repository:
```
$ git clone git@github.com:broadinstitute/tensorqtl.git
$ cd tensorqtl
# install into a new virtual environment and load
$ mamba env create -f install/tensorqtl_env.yml
$ conda activate tensorqtl
```
To install the latest version from this repository, run
```
pip install git+https://github.com/broadinstitute/tensorqtl.git
```

To use PLINK 2 binary files (pgen/pvar/psam), pgenlib must be installed, either with pip install Pgenlib (this is included in tensorqtl_env.yml above) or from source:

    git clone git@github.com:chrchang/plink-ng.git
    cd plink-ng/2.0/Python/
    python3 setup.py build_ext
    python3 setup.py install

Requirements

tensorQTL requires an environment configured with a GPU for optimal performance, but can also be run on a CPU. Instructions for setting up a virtual machine on Google Cloud Platform are provided here.

Input formats

Three inputs are required for QTL analyses with tensorQTL: genotypes, phenotypes, and covariates.

* Phenotypes must be provided in BED format, with a single header line starting with # and the first four columns corresponding to: chr, start, end, phenotype_id, with the remaining columns corresponding to samples (the identifiers must match those in the genotype input). In addition to .bed/.bed.gz, BED input in .parquet is also supported. The BED file can specify the center of the cis-window (usually the TSS), with start == end-1, or alternatively, start and end positions, in which case the cis-window is [start-window, end+window]. A function for generating a BED template from a gene annotation in GTF format is available in pyqtl (io.gtf_to_tss_bed).
* Covariates can be provided as a tab-delimited text file (covariates x samples) or dataframe (samples x covariates), with row and column headers.
* Genotypes should preferably be in PLINK2 pgen/pvar/psam format, which can be generated from a VCF as follows:

        plink2 \
            --output-chr chrM \
            --vcf ${plink_prefix_path}.vcf.gz \
            --out ${plink_prefix_path}

  If using --make-bed with PLINK 1.9 or earlier, add the --keep-allele-order flag.

Alternatively, the genotypes can be provided in bed/bim/fam format, or as a parquet dataframe (genotypes x samples).

The examples notebook below contains examples of all input files. The input formats for phenotypes and covariates are identical to those used by FastQTL.
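As a rough illustration of the phenotype layout described above, the following sketch builds a tiny BED table with pandas and derives a cis-window for each row. The gene and sample identifiers, the values, the helper `cis_window`, and the 1 Mb window size are all hypothetical. The sketch also assumes standard 0-based, half-open BED coordinates, so that a TSS-centered row (start == end-1) refers to the 1-based position `end`; it does not claim to reproduce tensorQTL's internal boundary handling.

```python
import io
import pandas as pd

# Hypothetical phenotype table: TSS-centered rows, so start == end - 1.
bed_text = (
    "#chr\tstart\tend\tphenotype_id\tsample_1\tsample_2\tsample_3\n"
    "chr1\t11868\t11869\tgene_A\t0.12\t-1.30\t0.85\n"
    "chr1\t29569\t29570\tgene_B\t1.02\t0.33\t-0.47\n"
)
bed_df = pd.read_csv(io.StringIO(bed_text), sep="\t")

# Basic format checks mirroring the description above.
assert list(bed_df.columns[:4]) == ["#chr", "start", "end", "phenotype_id"]
tss_centered = bool((bed_df["start"] == bed_df["end"] - 1).all())


def cis_window(start, end, window=1_000_000):
    """Return (lower, upper) bounds of the cis-window (hypothetical helper).

    Assumption: a TSS-centered row is centered on `end`; otherwise the
    window is [start - window, end + window]. Lower bounds are clipped at 0.
    """
    if start == end - 1:
        return max(end - window, 0), end + window
    return max(start - window, 0), end + window


windows = [cis_window(s, e) for s, e in zip(bed_df["start"], bed_df["end"])]
```

Both example genes lie close to the chromosome start, so their lower bounds are clipped to 0.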

Examples

For examples illustrating cis- and trans-QTL mapping, please see tensorqtl_examples.ipynb.

Running tensorQTL

This section describes how to run the different modes of tensorQTL, both from the command line and within Python. For a full list of options, run python3 -m tensorqtl --help

Loading input files

This section is only relevant when running tensorQTL in Python. The following imports are required:
```
import pandas as pd
import tensorqtl
from tensorqtl import genotypeio, cis, trans
```
Phenotypes and covariates can be loaded as follows:
```
phenotype_df, phenotype_pos_df = tensorqtl.read_phenotype_bed(phenotype_bed_file)
covariates_df = pd.read_csv(covariates_file, sep='\t', index_col=0).T  # samples x covariates
```
Genotypes can be loaded as follows, where plink_prefix_path is the path to the genotype files in PLINK format (excluding .bed/.bim/.fam extensions):
```
pr = genotypeio.PlinkReader(plink_prefix_path)
# load genotypes and variants into data frames
genotype_df = pr.load_genotypes()
variant_df = pr.bim.set_index('snp')[['chrom', 'pos']]
```
To save memory, genotypes can be loaded for a subset of samples only (this is not strictly necessary, since tensorQTL will otherwise select the relevant samples from `genotype_df`):
```
pr = genotypeio.PlinkReader(plink_prefix_path, select_samples=phenotype_df.columns)
```
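Because sample identifiers must agree across the three inputs, it can help to check and align them before mapping. The sketch below does this with plain pandas on toy data frames that stand in for the ones loaded above; all values and identifiers are hypothetical, and the check is an optional sanity step rather than part of the tensorQTL API.

```python
import pandas as pd

# Toy stand-ins for the data frames loaded above (all values hypothetical).
genotype_df = pd.DataFrame(
    [[0, 1, 2, 1], [2, 0, 1, 0]],
    index=["snp_1", "snp_2"],
    columns=["s1", "s2", "s3", "s4"],   # variants x samples
)
phenotype_df = pd.DataFrame(
    [[0.1, -0.4, 1.2]],
    index=["gene_A"],
    columns=["s1", "s2", "s3"],         # phenotypes x samples
)
covariates_df = pd.DataFrame(
    {"pc1": [0.3, -0.2, 0.1], "sex": [0, 1, 1]},
    index=["s3", "s1", "s2"],           # samples x covariates
)

# Every phenotype sample must be present in the other two inputs.
samples = phenotype_df.columns
missing_geno = samples.difference(genotype_df.columns)
missing_cov = samples.difference(covariates_df.index)
if len(missing_geno) or len(missing_cov):
    raise ValueError("Phenotype samples missing from genotypes or covariates")

# Reorder genotypes and covariates to the phenotype sample order.
genotype_df = genotype_df[samples]
covariates_df = covariates_df.loc[samples]
```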

cis-QTL mapping: permutations

This is the main mode for cis-QTL mapping. It generates phenotype-level summary statistics with empirical p-values, enabling calculation of genome-wide FDR. In Python:

    cis_df = cis.map_cis(genotype_df, variant_df, phenotype_df, phenotype_pos_df, covariates_df)
    tensorqtl.calculate_qvalues(cis_df, qvalue_lambda=0.85)

Shell command:

    python3 -m tensorqtl ${plink_prefix_path} ${expression_bed} ${prefix} \
        --covariates ${covariates_file} \
        --mode cis

${prefix} specifies the output file name.
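To make the idea of a phenotype-level empirical p-value concrete, here is a minimal NumPy sketch for a single phenotype. It is not tensorQTL's implementation: it uses a direct permutation count rather than the beta approximation cited above, ignores covariates, and the function name `empirical_pvalue`, the simulated data, and the number of permutations are assumptions made for illustration.

```python
import numpy as np


def empirical_pvalue(genotypes, phenotype, n_perm=1000, seed=0):
    """Permutation p-value for the best squared correlation in a cis-window.

    genotypes: (variants x samples) array without monomorphic variants.
    phenotype: (samples,) array.
    """
    rng = np.random.default_rng(seed)
    # Center and scale each variant so that dot products are correlations.
    g = genotypes - genotypes.mean(axis=1, keepdims=True)
    g /= np.linalg.norm(g, axis=1, keepdims=True)

    def max_r2(y):
        y = y - y.mean()
        y = y / np.linalg.norm(y)
        return np.max((g @ y) ** 2)

    observed = max_r2(phenotype)
    null = np.array([max_r2(rng.permutation(phenotype)) for _ in range(n_perm)])
    # Add-one correction keeps the estimate strictly positive.
    return (1 + np.sum(null >= observed)) / (1 + n_perm)


# Simulated window: 20 variants, 100 samples, one causal variant.
rng = np.random.default_rng(42)
G = rng.integers(0, 3, size=(20, 100)).astype(float)
y = 0.8 * G[3] + rng.normal(size=100)
pval = empirical_pvalue(G, y)
```

Taking the maximum over all variants in the window is what makes the p-value phenotype-level: each permutation is compared against the best association, not against a single variant.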

cis-QTL mapping: summary statistics for all variant-phenotype pairs

In Python:

    cis.map_nominal(genotype_df, variant_df, phenotype_df, phenotype_pos_df,
                    prefix, covariates_df, output_dir='.')

Shell command:

    python3 -m tensorqtl ${plink_prefix_path} ${expression_bed} ${prefix} \
        --covariates ${covariates_file} \
        --mode cis_nominal

The results are written to a parquet file for each chromosome. These files can be read using pandas:

    df = pd.read_parquet(file_name)
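For readers who want to see what a single nominal test amounts to, the following NumPy sketch computes the slope, standard error, and t-statistic for one variant-phenotype pair after regressing out covariates. It is a plain-CPU illustration of the underlying linear model, not tensorQTL's GPU code; the helper names `residualize` and `nominal_stats` and the simulated data are hypothetical.

```python
import numpy as np


def residualize(y, covariates):
    """Remove the fit on [intercept, covariates] from y (samples on axis 0)."""
    n = covariates.shape[0]
    design = np.column_stack([np.ones(n), covariates])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta


def nominal_stats(genotype, phenotype, covariates):
    """Slope, standard error and t-statistic for phenotype ~ genotype + covariates."""
    g = residualize(genotype, covariates)
    p = residualize(phenotype, covariates)
    # Degrees of freedom: samples minus intercept, genotype and covariates.
    dof = len(g) - 2 - covariates.shape[1]
    r = (g @ p) / np.sqrt((g @ g) * (p @ p))
    tstat = r * np.sqrt(dof / (1.0 - r**2))
    slope = (g @ p) / (g @ g)
    return slope, slope / tstat, tstat


# Simulated data: 50 samples, 2 covariates, true effect size 0.5.
rng = np.random.default_rng(0)
n = 50
cov = rng.normal(size=(n, 2))
geno = rng.integers(0, 3, size=n).astype(float)
pheno = 0.5 * geno + cov @ np.array([1.0, -1.0]) + rng.normal(size=n)
slope, slope_se, tstat = nominal_stats(geno, pheno, cov)
```

Residualizing both genotype and phenotype gives the same slope as fitting the full model with covariates, which is why the correlation between residuals is enough to recover the test statistic.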

cis-QTL mapping: conditionally independent QTLs

This mode maps conditionally independent cis-QTLs using the stepwise regression procedure described in GTEx Consortium, 2017. The output from the permutation step (see map_cis above) is required. In Python:

    indep_df = cis.map_independent(genotype_df, variant_df, cis_df,
                                   phenotype_df, phenotype_pos_df, covariates_df)

Shell command:

    python3 -m tensorqtl ${plink_prefix_path} ${expression_bed} ${prefix} \
        --covariates ${covariates_file} \
        --cis_output ${prefix}.cis_qtl.txt.gz \
        --mode cis_independent
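The intuition behind conditional mapping can be sketched as a greedy forward pass: find the top variant, add it to the model, and search again on the residuals. The code below is a deliberately simplified illustration of that idea only. It uses a fixed, hypothetical r² threshold and omits the significance thresholds and any further steps of the published procedure; `forward_select` and the simulated data are assumptions, not part of tensorQTL.

```python
import numpy as np


def forward_select(genotypes, phenotype, r2_threshold, max_steps=5):
    """Greedy forward pass: pick the top variant, condition on it, repeat.

    genotypes: (variants x samples) array; phenotype: (samples,) array.
    Returns the row indices of the selected variants, in selection order.
    """
    n = phenotype.size
    design = np.ones((n, 1))          # start with an intercept only
    selected = []
    for _ in range(max_steps):
        # Project out the current design from the phenotype and all variants.
        proj = design @ np.linalg.pinv(design)
        y = phenotype - proj @ phenotype
        g = genotypes - genotypes @ proj.T
        norms = np.linalg.norm(g, axis=1) * np.linalg.norm(y)
        r2 = np.zeros(len(g))
        ok = norms > 1e-8             # skips variants already in the model
        r2[ok] = (g[ok] @ y / norms[ok]) ** 2
        best = int(np.argmax(r2))
        if r2[best] < r2_threshold:
            break
        selected.append(best)
        design = np.column_stack([design, genotypes[best]])
    return selected


# Simulated window with two independent signals (variants 2 and 10).
rng = np.random.default_rng(7)
G = rng.integers(0, 3, size=(30, 200)).astype(float)
y = 1.5 * G[2] + 1.0 * G[10] + rng.normal(size=200)
selected = forward_select(G, y, r2_threshold=0.1)
```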

cis-QTL mapping: interactions

Instead of mapping the standard linear model (p ~ g), this mode includes an interaction term (p ~ g + i + gi) and returns full summary statistics for the model. The interaction variable is provided as a tab-delimited text file or dataframe mapping sample ID to interaction value(s) (if multiple interactions are used, the file must include a header with variable names). With the run_eigenmt=True option, eigenMT-adjusted p-values are computed. In Python:

    cis.map_nominal(genotype_df, variant_df, phenotype_df, phenotype_pos_df, prefix,
                    covariates_df=covariates_df,
                    interaction_df=interaction_df, maf_threshold_interaction=0.05,
                    run_eigenmt=True, output_dir='.', write_top=True, write_stats=True)

The input options write_top and write_stats control whether the top association per phenotype and full summary statistics, respectively, are written to file.

Shell command:

    python3 -m tensorqtl ${plink_prefix_path} ${expression_bed} ${prefix} \
        --covariates ${covariates_file} \
        --interaction ${interactions_file} \
        --best_only \
        --mode cis_nominal

The option --best_only disables output of full summary statistics.
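The interaction model p ~ g + i + gi can be written out explicitly as an ordinary least-squares fit. The sketch below does this with NumPy for one simulated variant-phenotype pair, without covariates; the simulated effect sizes and variable names are hypothetical, and the code only illustrates the model, not how tensorQTL evaluates it.

```python
import numpy as np

# Simulated data with known effects: 0.3 (g), 0.2 (i), 0.5 (g x i).
rng = np.random.default_rng(1)
n = 200
g = rng.integers(0, 3, size=n).astype(float)    # genotype dosages
inter = rng.normal(size=n)                      # interaction variable i
p = 0.3 * g + 0.2 * inter + 0.5 * g * inter + rng.normal(scale=0.1, size=n)

# Design matrix: intercept, g, i, and the product term gi.
X = np.column_stack([np.ones(n), g, inter, g * inter])
beta, *_ = np.linalg.lstsq(X, p, rcond=None)

# Standard errors from the residual variance and (X'X)^-1.
resid = p - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))

b_g, b_i, b_gi = beta[1:]
b_g_se, b_i_se, b_gi_se = se[1:]
```

Each coefficient divided by its standard error gives the t-statistic for the corresponding term in the model.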

Full summary statistics are saved as parquet files for each chromosome, in ${output_dir}/${prefix}.cis_qtl_pairs.${chr}.parquet, and the top association for each phenotype is saved to ${output_dir}/${prefix}.cis_qtl_top_assoc.txt.gz. In these files, the columns b_g, b_g_se, pval_g are the effect size, standard error, and p-value of g in the model, with matching columns for i and gi.

In the *.cis_qtl_top_assoc.txt.gz file, tests_emt is the effective number of independent variants in the cis-window estimated with eigenMT, i.e., based on the eigenvalue decomposition of the regularized genotype correlation matrix (Davis et al., AJHG, 2016). The eigenMT-adjusted p-value is pval_emt = pval_gi * tests_emt, and pval_adj_bh contains the Benjamini-Hochberg adjusted p-values corresponding to pval_emt.
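The two adjustment steps described above can be sketched in a few lines of NumPy. The p-values and test counts below are made up for illustration, `bh_adjust` is a hypothetical helper implementing the standard Benjamini-Hochberg step-up adjustment, and capping pval_emt at 1 is an assumption not stated in the text.

```python
import numpy as np


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, starting from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adj = np.empty(m)
    adj[order] = np.minimum(scaled, 1.0)
    return adj


# Hypothetical top associations for four phenotypes.
pval_gi = np.array([1e-6, 2e-4, 0.01, 0.3])
tests_emt = np.array([120, 85, 200, 40])

# Within-window correction, capped at 1 (assumption), then across phenotypes.
pval_emt = np.minimum(pval_gi * tests_emt, 1.0)
pval_adj_bh = bh_adjust(pval_emt)
```

The first step corrects for the number of effectively independent variants tested per phenotype; the second controls the false discovery rate across phenotypes.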

trans-QTL mapping

This mode computes nominal associations between all phenotypes and genotypes. tensorQTL generates sparse output by default (associations with p-value < 1e-5). cis-associations are filtered out. The output is in parquet format, with four columns: phenotype_id, variant_id, pval, maf. In Python:
```
trans_df = trans.map_trans(genotype_df, phenotype_df, covariates_df,
                           return_sparse=True, pval_threshold=1e-5, maf_threshold=0.05,
                           batch_size=20000)
# remove cis-associations
trans_df = trans.filter_cis(trans_df, phenotype_pos_df.T.to_dict(), variant_df, window=5000000)
```
Shell command:
```
python3 -m tensorqtl ${plink_prefix_path} ${expression_bed} ${prefix} \
    --covariates ${covariates_file} \
    --mode trans
```
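To show what removing cis-associations means in practice, the following pandas sketch drops variant-phenotype pairs that lie on the same chromosome within a given distance. It is an illustration of the filtering logic only, not the code behind trans.filter_cis; the helper `drop_cis_pairs`, the layout of `phenotype_pos` (indexed by phenotype ID, with columns chr and pos), and the toy data are assumptions.

```python
import pandas as pd


def drop_cis_pairs(trans_df, phenotype_pos, variant_df, window=5_000_000):
    """Drop variant-phenotype pairs on the same chromosome within `window` bp."""
    var_chr = trans_df["variant_id"].map(variant_df["chrom"])
    var_pos = trans_df["variant_id"].map(variant_df["pos"])
    phe_chr = trans_df["phenotype_id"].map(phenotype_pos["chr"])
    phe_pos = trans_df["phenotype_id"].map(phenotype_pos["pos"])
    is_cis = (var_chr == phe_chr) & ((var_pos - phe_pos).abs() <= window)
    return trans_df[~is_cis]


# Toy inputs: one gene on chr1 and three candidate variants.
variant_df = pd.DataFrame(
    {"chrom": ["chr1", "chr1", "chr2"], "pos": [1_000_000, 9_000_000, 500_000]},
    index=["v1", "v2", "v3"],
)
phenotype_pos = pd.DataFrame({"chr": ["chr1"], "pos": [2_000_000]}, index=["gene_A"])
trans_df = pd.DataFrame(
    {
        "variant_id": ["v1", "v2", "v3"],
        "phenotype_id": ["gene_A"] * 3,
        "pval": [1e-8, 1e-7, 1e-6],
        "maf": [0.1, 0.2, 0.3],
    }
)

filtered = drop_cis_pairs(trans_df, phenotype_pos, variant_df)
```

Here v1 lies 1 Mb from gene_A on the same chromosome and is removed, while v2 (7 Mb away) and v3 (different chromosome) are kept.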

Owner

Name: Broad Institute
Login: broadinstitute
Kind: organization
Location: Cambridge, MA

Website: http://www.broadinstitute.org/
Twitter: broadinstitute
Repositories: 1,083
Profile: https://github.com/broadinstitute

Broad Institute of MIT and Harvard

GitHub Events

Total

Create event: 1
Release event: 1
Issues event: 49
Watch event: 21
Issue comment event: 70
Push event: 6
Pull request event: 1
Fork event: 5

Last Year

Create event: 1
Release event: 1
Issues event: 49
Watch event: 21
Issue comment event: 70
Push event: 6
Pull request event: 1
Fork event: 5

Committers

Last synced: about 1 year ago

All Time

Total Commits: 323
Total Committers: 5
Avg Commits per committer: 64.6
Development Distribution Score (DDS): 0.025

Past Year

Commits: 14
Committers: 2
Avg Commits per committer: 7.0
Development Distribution Score (DDS): 0.071

Top Committers

Name	Email	Commits
Francois Aguet	f**s@b**g	315
faguet	f**t@i**m	4
susie-song	7****g	2
Thouis (Ray) Jones	t**s@b**g	1
Khalid Shakir	k**r@b**g	1

Committer Domains (Top 20 + Academic)

broadinstitute.org: 3
illumina.com: 1

Issues and Pull Requests

Last synced: 10 months ago

All Time

Total issues: 123
Total pull requests: 8
Average time to close issues: 4 months
Average time to close pull requests: 5 days
Total issue authors: 84
Total pull request authors: 5
Average comments per issue: 2.64
Average comments per pull request: 0.13
Merged pull requests: 4
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 27
Pull requests: 2
Average time to close issues: 10 days
Average time to close pull requests: N/A
Issue authors: 23
Pull request authors: 1
Average comments per issue: 0.41
Average comments per pull request: 0.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

liaguiro (9)
maxozo (7)
lintingyi2014 (5)
snesic (4)
ariadnacilleros (4)
cora97846 (4)
guandailu (4)
JonMarten (4)
daniel-munro (3)
JingjingBai2021 (3)
Junjun-Xu (3)
zhangpicb (3)
anglixue (3)
idinsmore1 (2)
iamjli (2)

Pull Request Authors

francois-a (2)
royoelen (2)
susie-song (2)
anglixue (1)
thouis (1)
jvierstra (1)
kshakir (1)

Top Labels

Issue Labels

Pull Request Labels

Packages

Total packages: 1
Total downloads: 612 last month (pypi)

Total dependent packages: 0
Total dependent repositories: 1
Total versions: 12
Total maintainers: 1

pypi.org: tensorqtl

GPU-accelerated QTL mapper

Documentation: https://tensorqtl.readthedocs.io/
License: BSD 3-Clause License

Copyright (c) 2018-2019, The Broad Institute, Inc. and The General Hospital Corporation. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Latest release: 1.0.10
published over 1 year ago

Versions: 12
Dependent Packages: 0
Dependent Repositories: 1
Downloads: 612 Last month

Rankings

Dependent packages count: 10.1%

Downloads: 14.4%

Average: 15.4%

Dependent repos count: 21.6%

Maintainers (1)

francois-a

Last synced: 10 months ago

Dependencies

Dockerfile docker

nvidia/cuda 11.3.1-cudnn8-runtime-ubuntu20.04 build

install/requirements.txt pypi

Cython *
ipython *
jupyter *
matplotlib *
numpy *
pandas *
pandas-plink *
pyarrow *
qtl *
rpy2 *
scipy *
torch *

setup.py pypi

Cython *
numpy *
pandas *
pandas-plink *
pyarrow *
qtl *
scipy *
torch *
