geneview

Genomics data visualization in Python by using matplotlib.

Keywords

bioinformatics bioinformatics-tool data-visualization genomics-data-visualization matplotlib plotting python visualization

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: ShujiaHuang
License: gpl-3.0
Language: Python
Default Branch: master
Homepage:
Size: 10.8 MB

Statistics

Stars: 65
Watchers: 5
Forks: 9
Open Issues: 2
Releases: 0

Created about 10 years ago · Last pushed about 2 years ago

Metadata Files

Readme License

geneview: A python package for visualizing genomics data

geneview is a library for making attractive and informative genomics graphics in Python. It is built on top of matplotlib and tightly integrated with the PyData stack, including support for numpy and pandas data structures. It is under active development.

Some of the features that geneview offers are:

High-level abstractions for structuring grids of plots that let you easily build complex visualizations.
Functions for drawing common genomics plots.

Installation

To install the released version, run:

```bash
pip install geneview
```

This command installs geneview and all of its dependencies.
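A quick way to confirm that the installation worked is to check that the interpreter can locate the package. The sketch below uses only the standard library; the helper name `is_installed` is made up for this example and is not part of geneview.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the top-level package can be located by the import system."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # geneview plus the libraries it is documented to depend on
    for name in ("numpy", "scipy", "pandas", "matplotlib", "geneview"):
        status = "found" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

If any entry is reported as missing, rerunning the pip command in the same environment is usually the first thing to try.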

Quick start

Manhattan and Q-Q plot

We use gwas.csv, a PLINK 2.x association output file in the geneview-data directory, as the input for the plots below. Here is a preview of its format:

|#CHROM|POS|ID|REF|ALT|A1|TEST|OBS_CT|BETA|SE|T_STAT|P|
|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|
|chr1|904165|1_904165|G|A|A|ADD|282|-0.0908897|0.195476|-0.464967|0.642344|
|chr1|1563691|1_1563691|T|G|G|ADD|271|0.447021|0.422194|1.0588|0.290715|
|chr1|1707740|1_1707740|T|G|G|ADD|283|0.149911|0.161387|0.928888|0.353805|
|chr1|2284195|1_2284195|T|C|C|ADD|275|-0.024704|0.13966|-0.176887|0.859739|
|chr1|2779043|1_2779043|T|C|T|ADD|272|-0.111771|0.139929|-0.79877|0.425182|
|chr1|2944527|1_2944527|G|A|A|ADD|276|-0.054472|0.166038|-0.32807|0.743129|
|chr1|3803755|1_3803755|T|C|T|ADD|283|-0.0392713|0.128528|-0.305547|0.760193|
|chr1|4121584|1_4121584|A|G|G|ADD|279|0.120902|0.127063|0.951511|0.342239|
|chr1|4170048|1_4170048|C|T|T|ADD|280|0.250807|0.143423|1.74873|0.0815274|
|chr1|4180842|1_4180842|C|T|T|ADD|277|0.209195|0.146122|1.43165|0.153469|
|chr1|6053630|1_6053630|T|G|G|ADD|269|-0.210917|0.129069|-1.63414|0.103503|
|chr1|7569602|1_7569602|C|T|C|ADD|281|-0.136834|0.13265|-1.03154|0.303249|
|chr1|7575666|1_7575666|T|C|C|ADD|277|-0.231278|0.159448|-1.45049|0.14815|

Manhattan plot with default parameters

The manhattanplot() function in geneview takes a data frame with columns containing the chromosome name/ID, chromosomal position, P-value, and optionally the SNP name (e.g. rsID in dbSNP).

By default, manhattanplot() looks for the column names output by PLINK 2 association results, namely #CHROM, POS, P, and ID, although the user can specify different column names. Calling manhattanplot() with a data frame of GWAS results as the single argument draws a basic Manhattan plot, defaulting to a dark blue and light blue color scheme.

```python
import matplotlib.pyplot as plt
import geneview as gv

# Load the example GWAS dataset
df = gv.load_dataset("gwas")

# Plot a basic Manhattan plot with horizontal xtick labels and display it on screen
ax = gv.manhattanplot(data=df)
plt.show()
```
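If your association results come from another tool, one option is to rename the columns to the PLINK 2 defaults before plotting. The sketch below does this for plain dictionaries with the standard library; the input column names (`chromosome`, `bp`, `pvalue`, `rsid`) and the helper `to_plink2_names` are hypothetical. With a pandas data frame, the same mapping can be passed to `DataFrame.rename(columns=...)`.

```python
# Hypothetical column names from another tool, mapped to the PLINK 2 defaults
COLUMN_MAP = {"chromosome": "#CHROM", "bp": "POS", "pvalue": "P", "rsid": "ID"}


def to_plink2_names(rows, column_map=COLUMN_MAP):
    """Return copies of `rows` with keys renamed; unmapped keys are kept as-is."""
    return [{column_map.get(key, key): value for key, value in row.items()}
            for row in rows]


rows = [
    {"chromosome": "chr1", "bp": 904165, "pvalue": 0.642344, "rsid": "1_904165"},
    {"chromosome": "chr1", "bp": 1563691, "pvalue": 0.290715, "rsid": "1_1563691"},
]
renamed = to_plink2_names(rows)
print(sorted(renamed[0]))  # column names after renaming
```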

Rotate the x-axis tick labels by setting xticklabel_kws to avoid label overlap:

```python
ax = gv.manhattanplot(data=df, xticklabel_kws={"rotation": "vertical"})
```

Or rotate the labels by 45 degrees by setting xticklabel_kws={"rotation": 45}.

When run with default parameters, manhattanplot() draws horizontal lines at $-log_{10}{(1e-5)}$ for "suggestive" associations and at $-log_{10}{(5e-8)}$ for the "genome-wide significant" threshold. These lines can be moved to different locations or turned off completely with the arguments suggestiveline and genomewideline, respectively.

```python
ax = gv.manhattanplot(data=df,
                      suggestiveline=None,  # Turn off suggestiveline
                      genomewideline=None,  # Turn off genomewideline
                      xticklabel_kws={"rotation": "vertical"})
```
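For orientation, the two default thresholds translate to the following heights on the y-axis. This is plain arithmetic with the standard library, not a call into geneview; `neg_log10` is a helper defined only for this example.

```python
import math


def neg_log10(p_value):
    """Convert a P-value to the -log10 scale used on the y-axis."""
    return -math.log10(p_value)


suggestive_y = neg_log10(1e-5)  # height of the "suggestive" line
genomewide_y = neg_log10(5e-8)  # height of the "genome-wide significant" line
print(f"suggestive: {suggestive_y:.2f}, genome-wide: {genomewide_y:.2f}")
# prints "suggestive: 5.00, genome-wide: 7.30"
```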

The behavior of manhattanplot() changes slightly when results from only a single chromosome are used. Instead of plotting alternating colors and chromosome IDs on the x-axis, the SNP's position on the chromosome is plotted on the x-axis:

```python
# Plot only the results for chromosome 8
gv.manhattanplot(data=df, CHR="chr8", xlabel="Chromosome 8")
```
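The difference between the two layouts comes down to how x coordinates are assigned. Here is a minimal sketch of one common approach, assuming chromosomes are laid end to end and the largest observed position stands in for each chromosome's length. geneview's actual layout logic may differ, and `x_coordinates` is a hypothetical helper.

```python
def x_coordinates(records):
    """Map (chrom, pos) pairs to x values.

    With one chromosome the raw position is used; with several, each
    chromosome is shifted by the summed lengths of those before it.
    """
    chroms = []
    max_pos = {}
    for chrom, pos in records:
        if chrom not in max_pos:
            chroms.append(chrom)  # keep first-seen order
            max_pos[chrom] = pos
        else:
            max_pos[chrom] = max(max_pos[chrom], pos)

    if len(chroms) == 1:
        return [pos for _, pos in records]

    offsets, running = {}, 0
    for chrom in chroms:
        offsets[chrom] = running
        running += max_pos[chrom]
    return [offsets[chrom] + pos for chrom, pos in records]


single = x_coordinates([("chr8", 100), ("chr8", 250)])
multi = x_coordinates([("chr1", 100), ("chr1", 300), ("chr2", 50)])
print(single)  # [100, 250]
print(multi)   # [100, 300, 350]
```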

The manhattanplot() function can highlight SNPs with a significant GWAS signal and annotate the top SNP, which has the lowest P-value:

```python
ax = gv.manhattanplot(data=df,
                      sign_marker_p=1e-6,  # Highlight the significant SNPs with the ``sign_marker_color`` color
                      is_annotate_topsnp=True,  # Annotate the top SNP
                      xticklabel_kws={"rotation": "vertical"})
```

Additionally, highlighting SNPs of interest can be combined with limiting the plot to a single chromosome to enable "zooming" into a particular region containing SNPs of interest.
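The selection behind these two options can be sketched without any plotting. The example below takes a few rows from the gwas preview table and picks out the SNPs that pass a P-value cutoff as well as the single top SNP. The helper names are hypothetical, and the cutoff of 0.1 is chosen only so that this small sample has a hit; whether geneview compares with `<` or `<=` is not stated in the source.

```python
# (ID, P) pairs taken from the gwas preview table
snps = [
    ("1_904165", 0.642344),
    ("1_4170048", 0.0815274),
    ("1_4180842", 0.153469),
    ("1_6053630", 0.103503),
]


def significant_snps(snps, cutoff):
    """Return the SNPs whose P-value is below `cutoff`."""
    return [(snp_id, p) for snp_id, p in snps if p < cutoff]


def top_snp(snps):
    """Return the SNP with the lowest P-value."""
    return min(snps, key=lambda item: item[1])


print(significant_snps(snps, cutoff=0.1))  # [('1_4170048', 0.0815274)]
print(top_snp(snps))                       # ('1_4170048', 0.0815274)
```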

Show a better manhattan plot

Further graphical parameters can be passed to manhattanplot() to control things like the plot title, point character, size, and colors. Here is an example:

```python
import matplotlib.pyplot as plt
import geneview as gv

# Common parameters for plotting
pltparams = {
    "pdf.fonttype": 42,
    "font.sans-serif": "Arial",
    "legend.fontsize": 14,
    "axes.titlesize": 18,
    "axes.labelsize": 16,
    "xtick.labelsize": 14,
    "ytick.labelsize": 14
}
plt.rcParams.update(pltparams)

# Load the example GWAS dataset
df = gv.load_dataset("gwas")

# Create a Manhattan plot
f, ax = plt.subplots(figsize=(12, 4), facecolor="w", edgecolor="k")
xtick = set(["chr" + i for i in list(map(str, range(1, 10))) + ["11", "13", "15", "18", "21", "X"]])
_ = gv.manhattanplot(data=df,
                     marker=".",
                     sign_marker_p=1e-6,  # Genome wide significant p-value
                     sign_marker_color="r",
                     snp="ID",  # The column name of annotation information for top SNPs.

                     title="Test",
                     xtick_label_set=xtick,

                     xlabel="Chromosome",
                     ylabel=r"$-log_{10}{(P)}$",

                     sign_line_cols=["#D62728", "#2CA02C"],
                     hline_kws={"linestyle": "--", "lw": 1.3},

                     is_annotate_topsnp=True,
                     ld_block_size=50000,  # 50000 bp
                     text_kws={"fontsize": 12,
                               "arrowprops": dict(arrowstyle="-", color="k", alpha=0.6)},
                     ax=ax)
```
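The `ld_block_size` argument suggests that only one top SNP is annotated per block. One plausible reading, sketched below as an assumption and not as geneview's actual algorithm, is a greedy pass that walks the SNPs from the lowest P-value upward and skips any SNP within `block_size` base pairs of one already chosen. The function name and the sample data are made up for illustration.

```python
def pick_block_tops(snps, block_size):
    """Greedily pick SNPs so that no two picks on the same chromosome lie
    within `block_size` bp of each other, preferring lower P-values.

    `snps` is a list of (chrom, pos, p_value) tuples.
    """
    picked = []
    for chrom, pos, p_value in sorted(snps, key=lambda s: s[2]):
        too_close = any(chrom == c and abs(pos - q) < block_size
                        for c, q, _ in picked)
        if not too_close:
            picked.append((chrom, pos, p_value))
    return picked


# Illustrative values only, not taken from the gwas dataset
snps = [
    ("chr1", 100_000, 1e-9),
    ("chr1", 120_000, 1e-8),  # within 50 kb of the first SNP, so skipped
    ("chr1", 400_000, 1e-7),
    ("chr2", 110_000, 1e-7),
]
tops = pick_block_tops(snps, block_size=50_000)
print([(chrom, pos) for chrom, pos, _ in tops])
```

In practice such a pass would probably run only over SNPs that already pass the significance cutoff, so that annotations stay sparse.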

QQ plot with default parameters

The qqplot() function generates a Q-Q plot to visualize the distribution of association P-values. It takes a vector of P-values as its only required argument.

```python
import matplotlib.pyplot as plt
import geneview as gv

# Load the example GWAS dataset
df = gv.load_dataset("gwas")

# Plot a basic Q-Q plot and display it on screen
ax = gv.qqplot(data=df["P"])
plt.show()
```
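Conceptually, a Q-Q plot compares the sorted observed P-values with the values expected under a uniform null distribution, both on the -log10 scale. A minimal sketch of that computation follows, assuming the common plotting position (i - 0.5)/n for the i-th smallest P-value. The source does not say which convention qqplot() uses, and `qq_points` is a hypothetical helper.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(P) pairs for a Q-Q plot."""
    n = len(p_values)
    observed = sorted(p_values)  # smallest P first
    points = []
    for i, p in enumerate(observed, start=1):
        expected_p = (i - 0.5) / n  # uniform quantile under the null
        points.append((-math.log10(expected_p), -math.log10(p)))
    return points


# P-values taken from the gwas preview table
points = qq_points([0.642344, 0.0815274, 0.153469, 0.103503])
for expected, observed in points:
    print(f"expected={expected:.3f} observed={observed:.3f}")
```

Points that fall close to the diagonal indicate P-values consistent with the null distribution; a systematic departure at the upper end is what association signals look like.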

Show a better QQ plot

Further graphical parameters can be passed to qqplot() to control the plot title, axis labels, point characters, colors, point sizes, etc. Here is an example:

```python
import matplotlib.pyplot as plt
import geneview as gv

f, ax = plt.subplots(figsize=(6, 6), facecolor="w", edgecolor="k")
_ = gv.qqplot(data=df["P"],
              marker="o",
              title="Test",
              xlabel=r"Expected $-log_{10}{(P)}$",
              ylabel=r"Observed $-log_{10}{(P)}$",
              ax=ax)
```

More tutorials about GWAS

Admixture plot

Generate an admixture plot from the raw ADMIXTURE output:

simple example for admixtureplot

```python
import matplotlib.pyplot as plt
from geneview import load_dataset
from geneview import admixtureplot

f, ax = plt.subplots(1, 1, figsize=(14, 2), facecolor="w", constrained_layout=True, dpi=300)
admixtureplot(data=load_dataset("admixture_output.Q"),
              population_info=load_dataset("admixture_population.info"),
              ylabel_kws={"rotation": 45, "ha": "right"},
              ax=ax)
```

or

```python
import matplotlib.pyplot as plt
import geneview as gv

admixture_output_fn = gv.load_dataset("admixture_output.Q")
population_group_fn = gv.load_dataset("admixture_population.info")

# Define the order in which populations are plotted
pop_group_1kg = ["KHV", "CDX", "CHS", "CHB", "JPT", "BEB", "STU", "ITU", "GIH", "PJL", "FIN", "CEU", "GBR",
                 "IBS", "TSI", "PEL", "PUR", "MXL", "CLM", "ASW", "ACB", "GWD", "MSL", "YRI", "ESN", "LWK"]

f, ax = plt.subplots(1, 1, figsize=(14, 2), facecolor="w", constrained_layout=True, dpi=300)
gv.popgen.admixtureplot(data=admixture_output_fn,
                        population_info=population_group_fn,
                        edgewidth=2.0,
                        group_order=pop_group_1kg,
                        shuffle_popsample_kws={"frac": 0.5},
                        ylabel_kws={"rotation": 45, "ha": "right"},
                        ax=ax)
```

The format of input files and more details about admixtureplot
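An ADMIXTURE `.Q` file holds one row per sample and one column per ancestral component, with each row summing to 1. The sketch below shows the kind of data preparation an admixture plot needs: ordering samples by population and computing where each stacked bar segment starts. It is an illustration under stated assumptions, not geneview's implementation; the sample values, the per-sample population list, and the helper names are invented for the example.

```python
def order_by_population(q_rows, populations, group_order):
    """Sort samples so that populations appear in the order of `group_order`."""
    rank = {pop: i for i, pop in enumerate(group_order)}
    indexed = sorted(range(len(q_rows)), key=lambda i: rank[populations[i]])
    return [q_rows[i] for i in indexed], [populations[i] for i in indexed]


def stacked_bottoms(q_row):
    """Return the starting height of each component in a stacked bar."""
    bottoms, running = [], 0.0
    for fraction in q_row:
        bottoms.append(running)
        running += fraction
    return bottoms


# Invented ancestry fractions for K=3; each row sums to 1
q_rows = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.6, 0.3, 0.1]]
populations = ["CHB", "CEU", "CHB"]

rows, pops = order_by_population(q_rows, populations, group_order=["CHB", "CEU"])
print(pops)  # ['CHB', 'CHB', 'CEU']
print(stacked_bottoms(rows[0]))
```

Each sample then becomes one stacked bar, with component k drawn from its bottom value up by its ancestry fraction.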

Venn plots

Venn diagrams for 2, 3, 4, 5, 6 sets.

Minimal venn plot example

```python
import geneview as gv

table = {
    "Dataset 1": {"A", "B", "D", "E"},
    "Dataset 2": {"C", "F", "B", "G"},
    "Dataset 3": {"J", "C", "K"}
}
ax = gv.venn(table)
```
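Each petal of a Venn diagram corresponds to the elements that belong to one specific combination of sets. The sketch below counts those regions for the three datasets of the minimal example using plain set operations. The binary-string keys (for example `110` for "in the first two sets only") follow the idea of the `{logic}` format field used in this section, but the function itself is a hypothetical illustration, not geneview's code.

```python
from itertools import product


def region_sizes(datasets):
    """Count elements in every exclusive region of a Venn diagram.

    Keys are strings of 0/1 flags, one per dataset, in input order.
    """
    sets = list(datasets)
    universe = set().union(*sets)
    sizes = {}
    for flags in product((0, 1), repeat=len(sets)):
        if not any(flags):
            continue  # skip the region outside all sets
        members = set(universe)
        for flag, dataset in zip(flags, sets):
            members = members & dataset if flag else members - dataset
        sizes["".join(map(str, flags))] = len(members)
    return sizes


table = {
    "Dataset 1": {"A", "B", "D", "E"},
    "Dataset 2": {"C", "F", "B", "G"},
    "Dataset 3": {"J", "C", "K"},
}
sizes = region_sizes(table.values())
print(sizes["100"], sizes["110"], sizes["011"], sizes["111"])  # 3 1 1 0
```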

Manual adjustment of petal labels

If necessary, the labels on the petals (i.e., various intersections in the Venn diagram) can be adjusted manually.

For this, generate_petal_labels() can be called first to get the petal_labels dictionary, which can be modified.

After modification, pass petal_labels to venn().

```python
from numpy.random import choice
import geneview as gv

dataset_dict = {
    name: set(choice(1000, 250, replace=False))
    for name in list("ABCD")
}

petal_labels = gv.generate_petal_labels(dataset_dict.values(), fmt="{logic}\n({percentage:.1f}%)")
ax = gv.venn(data=petal_labels, names=list(dataset_dict.keys()), legend_use_petal_color=True)
```
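Because the petal labels are an ordinary dictionary, adjusting them is plain dictionary manipulation. The sketch below works on a hand-written dictionary that maps a logic string to its label text and blanks the labels of empty regions. The dictionary contents, the percentage base (share of all elements), and the `hide_empty` helper are assumptions made for illustration, so inspect the real output of generate_petal_labels() before relying on this layout.

```python
def hide_empty(petal_labels, sizes):
    """Return a copy of `petal_labels` with the labels of empty regions blanked."""
    return {logic: ("" if sizes.get(logic, 0) == 0 else label)
            for logic, label in petal_labels.items()}


# Hand-written stand-ins for region sizes and formatted labels
sizes = {"10": 3, "01": 0, "11": 1}
total = sum(sizes.values())
fmt = "{logic}\n({percentage:.1f}%)"
petal_labels = {logic: fmt.format(logic=logic, percentage=100 * n / total)
                for logic, n in sizes.items()}

adjusted = hide_empty(petal_labels, sizes)
print(adjusted)
```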

More tutorials about venn

Dependencies

geneview supports Python 3 only; Python 2 is no longer supported.

Installation requires numpy, scipy, pandas, and matplotlib. Some functions also use statsmodels.

geneview relies on two pandas data structures, DataFrame and Series. They are easy to learn and worth the effort; the pandas documentation has a detailed tutorial on both.

License

Released under a GPL-3.0 license

Owner

Name: Shujia Huang
Login: ShujiaHuang
Kind: user
Location: Guangzhou, China
Company: Guangzhou Women and Children's Medical Center

Website: https://scholar.google.com/citations?user=J4frGNMAAAAJ
Twitter: huangshujia
Repositories: 67
Profile: https://github.com/ShujiaHuang

A bioinformatician, human genome researcher and programmer.

GitHub Events

Total

Watch event: 4

Last Year

Watch event: 4

Committers

Last synced: over 2 years ago

All Time

Total Commits: 348
Total Committers: 3
Avg Commits per committer: 116.0
Development Distribution Score (DDS): 0.009

Past Year

Commits: 5
Committers: 1
Avg Commits per committer: 5.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Shujia Huang	h**9@g**m	345
Shujia Huang	h**a@S**l	2
Shujia Huang	h**a@S**l	1

Issues and Pull Requests

Last synced: 7 months ago

All Time

Total issues: 3
Total pull requests: 0
Average time to close issues: almost 5 years
Average time to close pull requests: N/A
Total issue authors: 3
Total pull request authors: 0
Average comments per issue: 1.33
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

zhaouu (1)
ghost (1)
amssljc (1)

Pull Request Authors

Top Labels

Issue Labels

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- pypi 84 last-month

Total dependent packages: 1
Total dependent repositories: 2
Total versions: 22
Total maintainers: 1

pypi.org: geneview

Geneview: A python package for genomics data visualization.

Homepage: https://github.com/ShujiaHuang/geneview
Documentation: https://geneview.readthedocs.io/
License: BSD (3-clause)
Latest release: 0.2.1
published over 2 years ago

Versions: 22
Dependent Packages: 1
Dependent Repositories: 2
Downloads: 84 Last month

Rankings

Dependent packages count: 4.7%

Stargazers count: 9.3%

Dependent repos count: 11.6%

Forks count: 11.9%

Average: 12.2%

Downloads: 23.6%

Maintainers (1)

huangshujia

Last synced: 7 months ago

geneview

Science Score: 13.0%
