pygwalker

PyGWalker: Turn your dataframe into an interactive UI for visual analysis

https://github.com/kanaries/pygwalker

Science Score: 64.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
    1 of 25 committers (4.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.3%) to scientific vocabulary

Keywords

data-analysis data-exploration dataframe matplotlib pandas plotly tableau tableau-alternative visualization

Keywords from Contributors

augmented-analytics automated-data-analysis automated-visualization autovis causal-discovery causal-inference causality datamining eda k6s
Last synced: 4 months ago

Repository


Basic Info
Statistics
  • Stars: 15,098
  • Watchers: 88
  • Forks: 806
  • Open Issues: 69
  • Releases: 65
Topics
data-analysis data-exploration dataframe matplotlib pandas plotly tableau tableau-alternative visualization
Created almost 3 years ago · Last pushed 6 months ago
Metadata Files
Readme Contributing License Citation

README.md


PyGWalker: A Python Library for Exploratory Data Analysis with Visualization


PyGWalker can simplify your Jupyter Notebook data analysis and data visualization workflow, by turning your pandas dataframe into an interactive user interface for visual exploration.

PyGWalker (pronounced like "Pig Walker", just for fun) is named as an abbreviation of "Python binding of Graphic Walker". It integrates Jupyter Notebook with Graphic Walker, an open-source alternative to Tableau. It allows data scientists to visualize, clean, and annotate data with simple drag-and-drop operations and even natural language queries.

https://github.com/Kanaries/pygwalker/assets/22167673/2b940e11-cf8b-4cde-b7f6-190fb10ee44b

[!TIP] If you want more AI features, we also build runcell, an AI code agent in Jupyter that understands your code, data, and cells, and can generate code, execute cells, and take actions for you. It can be used in JupyterLab after running `pip install runcell`.

https://github.com/user-attachments/assets/9ec64252-864d-4bd1-8755-83f9b0396d38

Visit Google Colab, Kaggle Code or Graphic Walker Online Demo to test it out!

If you prefer using R, check GWalkR, the R wrapper of Graphic Walker. If you prefer a Desktop App that can be used offline and without any coding, check out PyGWalker Desktop.

Features

PyGWalker is a Python library that simplifies data analysis and visualization workflows by turning pandas DataFrames into interactive visual interfaces. It offers a variety of features that make it a powerful tool for data exploration:

Interactive Data Exploration:
  • Drag-and-drop interface for easy visualization creation.
  • Real-time updates as you make changes to the visualization.
  • Ability to zoom, pan, and filter the data.

Data Cleaning and Transformation:
  • Visual data cleaning tools to identify and remove outliers or inconsistencies.
  • Ability to create new variables and features based on existing data.

Advanced Visualization Capabilities:
  • Support for various chart types (bar charts, line charts, scatter plots, etc.).
  • Customization options for colors, labels, and other visual elements.
  • Interactive features like tooltips and drill-down capabilities.

Integration with Jupyter Notebooks:
  • Seamless integration with Jupyter Notebooks for a smooth workflow.

Open-Source and Free:
  • Available for free and allows for customization and extension.

Getting Started

Check out our video tutorials on using pygwalker, pygwalker + streamlit, and pygwalker + snowflake: How to explore data with PyGWalker in Python

| Run in Kaggle | Run in Colab |
|---------------|--------------|
| Kaggle Code   | Google Colab |

Setup pygwalker

Before using pygwalker, make sure to install the packages through the command line using pip or conda.

pip

```bash
pip install pygwalker
```

Note

For an early trial, you can run `pip install pygwalker --upgrade` to keep your version up to date with the latest release, or `pip install pygwalker --upgrade --pre` to get the latest features and bug fixes from pre-releases.

Conda-forge

```bash
conda install -c conda-forge pygwalker
```

or

```bash
mamba install -c conda-forge pygwalker
```

See conda-forge feedstock for more help.
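To confirm which version ended up in the active environment after either install route, a check along the following lines may help. This is only a sketch using the standard library's `importlib.metadata` (Python 3.8 or newer); the helper name `installed_version` is hypothetical and not part of pygwalker.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version string, or a hint when the package is missing.
    version = installed_version("pygwalker")
    print(version or "pygwalker is not installed in this environment")
```

Run it with the same interpreter your notebook kernel uses; otherwise it may report a different environment than the one Jupyter sees.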

Use pygwalker in Jupyter Notebook

Quick Start

Import pygwalker and pandas to your Jupyter Notebook to get started.

```python
import pandas as pd
import pygwalker as pyg
```

You can use pygwalker without breaking your existing workflow. For example, you can call up PyGWalker with the dataframe loaded in this way:

```python
df = pd.read_csv('./bike_sharing_dc.csv')
walker = pyg.walk(df)
```

That's it. Now you have an interactive UI to analyze and visualize data with simple drag-and-drop operations.
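If you do not have `bike_sharing_dc.csv` at hand, the quick start can be tried against any small CSV. The sketch below writes a few made-up rows with the standard library and then loads them the same way as above. The column names, the sample values, and the helper names `write_sample_csv` and `explore` are assumptions for illustration only; they are not the real dataset's schema or part of pygwalker.

```python
import csv
from pathlib import Path

# Hypothetical sample rows; the real bike_sharing_dc.csv has its own columns.
SAMPLE_ROWS = [
    {"date": "2024-01-01", "season": "winter", "count": 120},
    {"date": "2024-01-02", "season": "winter", "count": 98},
    {"date": "2024-07-01", "season": "summer", "count": 310},
]


def write_sample_csv(path: str) -> Path:
    """Write SAMPLE_ROWS to `path` as a CSV file with a header row."""
    target = Path(path)
    with target.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(SAMPLE_ROWS[0]))
        writer.writeheader()
        writer.writerows(SAMPLE_ROWS)
    return target


def explore(path: str):
    """Load the CSV and open it in PyGWalker (call this from a notebook cell)."""
    # Imported lazily so write_sample_csv works without pandas or pygwalker.
    import pandas as pd
    import pygwalker as pyg

    df = pd.read_csv(path)
    return pyg.walk(df)
```

In a notebook you would call `explore(write_sample_csv("./sample.csv"))` in a cell to get the interactive UI.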

Cool things you can do with PyGWalker:

  • You can change the mark type to make different charts, for example a line chart.

  • To compare different measures, you can create a concat view by adding more than one measure to rows/columns.

  • To make a facet view of several subviews divided by the values of a dimension, put dimensions into rows or columns.

  • PyGWalker contains a powerful data table, which provides a quick view of the data, its distribution, and profiling. You can also add filters or change the data types in the table.

  • You can save the data exploration result to a local file.

Better Practices

There are some important parameters you should know when using pygwalker:

  • spec: saves and loads the chart config (JSON string or file path).
  • kernel_computation: uses duckdb as the computing engine, which lets you handle larger datasets faster on your local machine.
  • use_kernel_calc: Deprecated, use kernel_computation instead.

```python
df = pd.read_csv('./bike_sharing_dc.csv')
walker = pyg.walk(
    df,
    # This JSON file stores your chart state. Click the save button in the UI
    # manually when you finish a chart; 'autosave' will be supported in the future.
    spec="./chart_meta_0.json",
    # With `kernel_computation=True`, pygwalker uses duckdb as the computing engine,
    # which supports exploring bigger datasets (<=100GB).
    kernel_computation=True,
)
```
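One way to keep these two parameters consistent across notebooks is to build the keyword arguments in a small helper. The sketch below is an illustration only: the row threshold is a made-up value (the README does not state when kernel computation starts to pay off), and `build_walk_options` and `walk_with_options` are hypothetical names, not pygwalker functions.

```python
from typing import Any, Dict

# Hypothetical cut-off; tune it for your own machine and data.
KERNEL_ROW_THRESHOLD = 1_000_000


def build_walk_options(spec_path: str, row_count: int) -> Dict[str, Any]:
    """Build keyword arguments for pyg.walk from a spec path and a row count."""
    return {
        "spec": spec_path,
        # Switch to the duckdb-backed engine only for larger dataframes.
        "kernel_computation": row_count >= KERNEL_ROW_THRESHOLD,
    }


def walk_with_options(df, spec_path: str = "./chart_meta_0.json"):
    """Open `df` in PyGWalker using the options built above."""
    # Imported lazily so build_walk_options can be used without pygwalker installed.
    import pygwalker as pyg

    return pyg.walk(df, **build_walk_options(spec_path, len(df)))
```

Passing `kernel_computation=True` unconditionally, as in the example above, is equally valid; the helper only makes the choice explicit.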

Example in local notebook

Example in cloud notebook

Programmatic Export of Charts

After saving a chart from the UI, you can retrieve the image directly from Python.

```python
walker = pyg.walk(df, spec="./chart_meta_0.json")

# Edit the chart in the UI and click the save button, then:
walker.save_chart_to_file("Chart 1", "chart1.svg", save_type="svg")
png_bytes = walker.export_chart_png("Chart 1")
svg_bytes = walker.export_chart_svg("Chart 1")
```
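The two export calls hand the image back to Python rather than writing a file. Assuming they return `bytes` (as the variable names in the example above suggest, though this is not confirmed here), a small helper like the following could persist them. `write_chart_bytes` is a hypothetical name, not part of pygwalker.

```python
from pathlib import Path


def write_chart_bytes(data: bytes, path: str) -> int:
    """Write exported chart bytes to `path`, creating parent folders as needed.

    Returns the number of bytes written.
    """
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    return target.write_bytes(data)


# Usage in a notebook, after saving "Chart 1" in the UI (assumes bytes are returned):
# write_chart_bytes(walker.export_chart_png("Chart 1"), "exports/chart1.png")
```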

Use pygwalker in Streamlit

Streamlit allows you to host a web version of pygwalker without having to work out the details of how a web application works.

Here are some example apps built with pygwalker and streamlit:
  • PyGWalker + streamlit for Bike sharing dataset
  • Earthquake Dashboard

```python
from pygwalker.api.streamlit import StreamlitRenderer
import pandas as pd
import streamlit as st

# Adjust the width of the Streamlit page
st.set_page_config(
    page_title="Use Pygwalker In Streamlit",
    layout="wide"
)

# Add Title
st.title("Use Pygwalker In Streamlit")


# You should cache your pygwalker renderer, if you don't want your memory to explode
@st.cache_resource
def get_pyg_renderer() -> "StreamlitRenderer":
    df = pd.read_csv("./bike_sharing_dc.csv")
    # If you want to use the feature of saving chart config, set `spec_io_mode="rw"`
    return StreamlitRenderer(df, spec="./gw_config.json", spec_io_mode="rw")


renderer = get_pyg_renderer()

renderer.explorer()
```

API Reference

pygwalker.walk

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| dataset | Union[DataFrame, Connector] | - | The dataframe or connector to be used. |
| gid | Union[int, str] | None | ID for the GraphicWalker container div, formatted as 'gwalker-{gid}'. |
| env | Literal['Jupyter', 'JupyterWidget'] | 'JupyterWidget' | Environment using pygwalker. |
| field_specs | Optional[Dict[str, FieldSpec]] | None | Specifications of fields. Will be automatically inferred from dataset if not specified. |
| hide_data_source_config | bool | True | If True, hides DataSource import and export button. |
| theme_key | Literal['vega', 'g2'] | 'g2' | Theme type for the GraphicWalker. |
| appearance | Literal['media', 'light', 'dark'] | 'media' | Theme setting. 'media' will auto-detect the OS theme. |
| spec | str | "" | Chart configuration data. Can be a configuration ID, JSON, or remote file URL. |
| use_preview | bool | True | If True, uses the preview function. |
| kernel_computation | bool | False | If True, uses kernel computation for data. |
| **kwargs | Any | - | Additional keyword arguments. |
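Because several of these parameters accept only a fixed set of string values, it can be handy to catch typos before the UI is rendered. The following is a minimal sketch that checks options against the literals listed in the table; `ALLOWED_VALUES` and `check_walk_options` are hypothetical names, and `env` is left out on purpose because other parts of this README mention values beyond those in the table.

```python
from typing import Any, Dict

# Allowed values copied from the parameter table above.
ALLOWED_VALUES = {
    "theme_key": {"vega", "g2"},
    "appearance": {"media", "light", "dark"},
}


def check_walk_options(**options: Any) -> Dict[str, Any]:
    """Return `options` unchanged, or raise ValueError on an unknown literal value."""
    for name, value in options.items():
        allowed = ALLOWED_VALUES.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")
    return options


# Usage in a notebook (requires pygwalker and a dataframe `df`):
# import pygwalker as pyg
# walker = pyg.walk(df, **check_walk_options(appearance="dark", theme_key="vega"))
```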

Development

See: local-development

Tested Environments

  • [x] Jupyter Notebook
  • [x] Google Colab
  • [x] Kaggle Code
  • [x] Jupyter Lab
  • [x] Jupyter Lite
  • [x] Databricks Notebook (Since version 0.1.4a0)
  • [x] Jupyter Extension for Visual Studio Code (Since version 0.1.4a0)
  • [x] Most web applications compatible with IPython kernels. (Since version 0.1.4a0)
  • [x] Streamlit (Since version 0.1.4.9), enabled with pyg.walk(df, env='Streamlit')
  • [x] DataCamp Workspace (Since version 0.1.4a0)
  • [x] Panel. See panel-graphic-walker.
  • [x] marimo (Since version 0.4.9.11)
  • [ ] Hex Projects
  • [ ] ...feel free to raise an issue for more environments.

Configuration And Privacy Policy (pygwalker >= 0.3.10)

You can use pygwalker config to set your privacy configuration.

```bash
$ pygwalker config --help

usage: pygwalker config [-h] [--set [key=value ...]] [--reset [key ...]] [--reset-all] [--list]

Modify configuration file. (default: ~/Library/Application Support/pygwalker/config.json)
Available configurations:

- privacy 'offline', 'update-only', 'events'.
    "offline": fully offline, no data is send or api is requested
    "update-only": only check whether this is a new version of pygwalker to update
    "events": share which events about which feature is used in pygwalker, it only contains events data about which feature you arrive for product optimization. No DATA YOU ANALYSIS IS SEND. Events data will bind with a unique id, which is generated by pygwalker when it is installed based on timestamp. We will not collect any other information about you.

- kanaries_token 'your kanaries token'.
    your kanaries token, you can get it from https://kanaries.net.
    refer: https://space.kanaries.net/t/how-to-get-api-key-of-kanaries.
    by kanaries token, you can use kanaries service in pygwalker, such as share chart, share config.

options:
  -h, --help            show this help message and exit
  --set [key=value ...]
                        Set configuration. e.g. "pygwalker config --set privacy=update-only"
  --reset [key ...]     Reset user configuration and use default values instead. e.g. "pygwalker config --reset privacy"
  --reset-all           Reset all user configuration and use default values instead. e.g. "pygwalker config --reset-all"
  --list                List current used configuration.
```

For more details, see: How to set your privacy configuration?
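If you want to apply a privacy level from a setup script rather than typing the command by hand, the CLI call shown in the help text can be wrapped as below. This is a sketch under the assumption that the `pygwalker` executable is on your PATH; `privacy_command` and `apply_privacy` are hypothetical helper names, and the accepted levels are copied from the help output above.

```python
import subprocess
from typing import List

# Privacy levels listed in the `pygwalker config --help` output.
PRIVACY_LEVELS = ("offline", "update-only", "events")


def privacy_command(level: str) -> List[str]:
    """Build the argument list for `pygwalker config --set privacy=<level>`."""
    if level not in PRIVACY_LEVELS:
        raise ValueError(f"privacy must be one of {PRIVACY_LEVELS}, got {level!r}")
    return ["pygwalker", "config", "--set", f"privacy={level}"]


def apply_privacy(level: str) -> None:
    """Run the command; raises CalledProcessError if the CLI exits with an error."""
    subprocess.run(privacy_command(level), check=True)
```

Calling `apply_privacy("offline")` is then equivalent to running `pygwalker config --set privacy=offline` in a shell.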

License

Apache License 2.0

Contribution Guideline

You are encouraged to contribute to PyGWalker in any way that suits your interests. This may include:
  • Answering questions and providing support
  • Sharing ideas for new features
  • Reporting bugs and glitches
  • Contributing code to the project
  • Offering suggestions for website improvements and better documentation

Resources

PyGWalker Cloud is released! You can now save your charts to the cloud, publish an interactive cell as a web app, and use advanced GPT-powered features. Check out PyGWalker Cloud for more details.

Owner

  • Name: Kanaries
  • Login: Kanaries
  • Kind: organization
  • Email: support@kanaries.org

Build data tools from the future

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: PyGWalker
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - website: 'https://kanaries.net/'
    name: Kanaries Open Source Community
repository-code: 'https://github.com/Kanaries/pygwalker'
url: 'https://kanaries.net/pygwalker'
abstract: >-
  PyGWalker is a Python Library for Exploratory Data
  Analysis with Visualization that can simplify your Jupyter
  Notebook data analysis and data visualization workflow, by
  turning your pandas dataframe into an interactive user
  interface for visual exploration.
keywords:
  - Data Analysis
  - Exploratory Data Analysis
  - Data Visualization tools
  - Python Library
  - interactive
license: Apache-2.0

GitHub Events

Total
  • Create event: 14
  • Release event: 2
  • Issues event: 45
  • Watch event: 2,084
  • Delete event: 2
  • Issue comment event: 73
  • Push event: 60
  • Pull request review comment event: 2
  • Pull request review event: 11
  • Pull request event: 41
  • Fork event: 159
Last Year
  • Create event: 14
  • Release event: 2
  • Issues event: 45
  • Watch event: 2,084
  • Delete event: 2
  • Issue comment event: 73
  • Push event: 60
  • Pull request review comment event: 2
  • Pull request review event: 11
  • Pull request event: 41
  • Fork event: 159

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 597
  • Total Committers: 25
  • Avg Commits per committer: 23.88
  • Development Distribution Score (DDS): 0.275
Past Year
  • Commits: 80
  • Committers: 12
  • Avg Commits per committer: 6.667
  • Development Distribution Score (DDS): 0.425
Top Committers
Name Email Commits
longxiaofei l****2@g****m 433
Asm.Def w****n@z****n 63
observedobserver 2****1@q****m 54
islxyqwe i****3@g****m 9
rickhg12hs 6****s 8
Bruk07 a****6@g****m 4
Vignesh Skanda a****a@g****m 3
ysj0226 y****j@k****g 3
DeastinY p****d@g****m 2
Srihari Thyagarajan h****3@g****m 2
jojocys y****y@g****m 2
0warning0error z****g@q****m 1
Abhinav 6****p 1
Akshay Agrawal a****7@g****m 1
BHznJNs 6****s 1
Bernd Schrooten b****n@d****m 1
Eduard l****3@g****m 1
Ian Mayo i****n@p****m 1
Julius Plehn j****n@m****m 1
Marc Skov Madsen m****a@o****m 1
RenChu Wang p****g@g****m 1
Swapnil Patel s****7@g****m 1
Viddesh 6****1 1
unknown d****k@s****m 1
蓝友和 3****2@q****m 1

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 140
  • Total pull requests: 140
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 12 days
  • Total issue authors: 114
  • Total pull request authors: 22
  • Average comments per issue: 2.54
  • Average comments per pull request: 0.23
  • Merged pull requests: 121
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 48
  • Pull requests: 48
  • Average time to close issues: 28 days
  • Average time to close pull requests: about 1 month
  • Issue authors: 37
  • Pull request authors: 12
  • Average comments per issue: 1.63
  • Average comments per pull request: 0.4
  • Merged pull requests: 41
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • ObservedObserver (10)
  • longxiaofei (6)
  • vanbolin (6)
  • relakuman (4)
  • ilyanoskov (3)
  • Asm-Def (3)
  • JeevankumarDharmalingam (3)
  • rotcx (3)
  • DonYum (3)
  • Julius-Plehn (3)
  • MarcSkovMadsen (3)
  • thienphuoc86 (2)
  • dataxcount (2)
  • dickhfchan (2)
  • Json-Woo (2)
Pull Request Authors
  • longxiaofei (196)
  • Asm-Def (32)
  • ObservedObserver (11)
  • islxyqwe (11)
  • vignesh1507 (10)
  • blondon1 (9)
  • ysj0226 (3)
  • ikohu-66 (2)
  • Haleshot (2)
  • dwestjohn (2)
  • thomasbs17 (2)
  • BHznJNs (2)
  • rickhg12hs (2)
  • akshayka (2)
  • MarcSkovMadsen (2)
Top Labels
Issue Labels
bug (51) enhancement (19) P1 (17) fixed but needs feedback (15) graphic-walker (13) good first issue (11) P2 (5) Vote if you want it (4) linear (3) proposal (3) new idea to discuss (3) documentation (3) High priority (1)
Pull Request Labels
codex (5) documentation (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 129,759 last-month
  • Total docker downloads: 849
  • Total dependent packages: 5
  • Total dependent repositories: 10
  • Total versions: 218
  • Total maintainers: 1
pypi.org: pygwalker

pygwalker: turn your data into an interactive UI for data exploration and visualization

  • Versions: 218
  • Dependent Packages: 5
  • Dependent Repositories: 10
  • Downloads: 129,759 Last month
  • Docker Downloads: 849
Rankings
Downloads: 1.8%
Dependent packages count: 2.4%
Average: 3.2%
Docker downloads count: 3.9%
Dependent repos count: 4.6%
Maintainers (1)
Last synced: 4 months ago

Dependencies

pyproject.toml pypi
  • ipython *
  • jinja2 *
  • pandas *
  • python ^3.5
.github/workflows/auto-ci.yml actions
  • actions/checkout v3 composite
  • actions/download-artifact v3 composite
  • actions/setup-node v3 composite
  • actions/setup-python v4 composite
  • actions/upload-artifact v3 composite
.github/workflows/publish.yml actions
  • actions/checkout v3 composite
  • actions/download-artifact v3 composite
  • actions/setup-node v3 composite
  • actions/setup-python v4 composite
  • actions/upload-artifact v3 composite
  • pypa/gh-action-pypi-publish v1.8.5 composite
app/package.json npm
  • @rollup/plugin-commonjs ^24.0.x development
  • @rollup/plugin-replace ^5.0.x development
  • @rollup/plugin-terser ^0.4.x development
  • @rollup/plugin-typescript ^11.0.x development
  • @types/react ^17.x development
  • @types/react-dom ^17.x development
  • @types/react-syntax-highlighter ^15.5.7 development
  • @types/styled-components ^5.1.26 development
  • @vitejs/plugin-react ^3.1.x development
  • typescript ^4.9.5 development
  • vite ^4.1.4 development
  • vite-plugin-wasm ^3.2.2 development
  • @headlessui/react ^1.7.14
  • @heroicons/react ^2.0.8
  • @kanaries-temp/gw-dsl-parser 0.1.3
  • @kanaries/graphic-walker 0.4.12
  • autoprefixer ^10.3.5
  • buffer ^6.0.3
  • html-to-image ^1.11.11
  • mobx ^6.9.0
  • mobx-react-lite ^3.4.3
  • postcss ^8.3.7
  • react ^17.x
  • react-dom ^17.x
  • react-syntax-highlighter ^15.5.0
  • styled-components ^5.3.6
  • tailwindcss ^3.2.4
app/yarn.lock npm
  • 419 dependencies
environment.yml pypi
  • pygwalker >=0.1