chdb

chDB is an in-process OLAP SQL Engine 🚀 powered by ClickHouse

https://github.com/chdb-io/chdb

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • ○
    CITATION.cff file
  • ✓
    codemeta.json file
    Found codemeta.json file
  • ✓
    .zenodo.json file
    Found .zenodo.json file
  • ○
    DOI references
  • ○
    Academic publication links
  • ○
    Committers with academic emails
  • ○
    Institutional organization owner
  • ○
    JOSS paper metadata
  • ○
    Scientific vocabulary similarity
    Low similarity (11.3%) to scientific vocabulary

Keywords

chdb clickhouse clickhouse-database clickhouse-server data-science database embedded-database olap python sql

Keywords from Contributors

cloud-native dbms distributed embedded lakehouse mpp self-hosted
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: chdb-io
  • License: apache-2.0
  • Language: C++
  • Default Branch: main
  • Homepage: https://clickhouse.com/chdb
  • Size: 874 MB
Statistics
  • Stars: 2,455
  • Watchers: 34
  • Forks: 85
  • Open Issues: 53
  • Releases: 0
Created almost 3 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog Contributing Funding License Code of conduct Citation Security Authors

README-zh.md

chDB joins the ClickHouse family

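chDB

chDB is an in-process SQL OLAP engine powered by ClickHouse. For background, see "chDB: ClickHouse as a Function".

  • In-process SQL OLAP engine for Python, powered by ClickHouse
  • No need to install ClickHouse
  • Input and output support for Parquet, CSV, JSON, Arrow, ORC, and more than 60 other formats
  • Supports Python DB API 2.0; see the DB-API example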

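Installation

Currently, chDB supports Python 3.8+ on macOS (x86_64 and ARM64) and Linux.

```bash
pip install chdb
```

Usage

Run from the command line

Syntax: `python3 -m chdb SQL [OutputFormat]`

```bash
python3 -m chdb "SELECT 1,'abc'" Pretty
```

Data input

chdb can be queried through its Python query API or through the DB-API.

Query files (Parquet, CSV, JSON, Arrow, ORC, and more than 60 other formats)

You can run SQL and get the result back in the output format you choose.

```python
import chdb

res = chdb.query('select version()', 'Pretty'); print(res)
```

Work with Parquet or CSV

```python
import chdb

# See tests/format_output.py for more output format examples
res = chdb.query('select * from file("data.parquet", Parquet)', 'JSON'); print(res)
res = chdb.query('select * from file("data.csv", CSV)', 'CSV'); print(res)
print(f"SQL read {res.rows_read()} rows, {res.bytes_read()} bytes, elapsed {res.elapsed()} seconds")
```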
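The `file(path, Format)` calls above name the input format by hand. As a minimal sketch, the format can instead be picked from the file extension. The helper names `build_file_query` and `query_file` and the extension mapping are illustrative assumptions, not part of chDB; only `chdb.query(sql, output_format)` comes from the examples above.

```python
from pathlib import Path
from typing import Optional

# Extension -> ClickHouse input format name. This mapping is an illustrative
# assumption; extend it for the formats you actually use.
FORMAT_BY_SUFFIX = {
    ".parquet": "Parquet",
    ".csv": "CSV",
    ".jsonl": "JSONEachRow",
    ".arrow": "Arrow",
    ".orc": "ORC",
}


def build_file_query(path: str, limit: Optional[int] = None) -> str:
    """Return a SELECT over the file() table function for `path`."""
    suffix = Path(path).suffix.lower()
    if suffix not in FORMAT_BY_SUFFIX:
        raise ValueError(f"no format registered for {suffix!r}")
    # Escape backslashes and single quotes so the path is a valid SQL literal.
    escaped = path.replace("\\", "\\\\").replace("'", "\\'")
    sql = f"SELECT * FROM file('{escaped}', {FORMAT_BY_SUFFIX[suffix]})"
    if limit is not None:
        sql += f" LIMIT {int(limit)}"
    return sql


def query_file(path: str, output_format: str = "CSV"):
    """Run the generated query with chDB (hypothetical convenience wrapper)."""
    import chdb  # imported lazily so build_file_query works without chDB
    return chdb.query(build_file_query(path), output_format)


print(build_file_query("data.parquet"))
print(build_file_query("data.csv", limit=5))
```

Keeping the SQL construction separate from the chDB call makes the string-building part easy to check without any data files on disk.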

Pandas DataFrame output

```python
import chdb

# See https://clickhouse.com/docs/en/interfaces/formats for more formats
chdb.query('select * from file("data.parquet", Parquet)', 'Dataframe')
```

Query tables (Pandas DataFrame, Parquet file/bytes, Arrow file/bytes)

### Query a Pandas DataFrame

```python
import chdb.dataframe as cdf
import pandas as pd

# Join 2 DataFrames
df1 = pd.DataFrame({'a': [1, 2, 3], 'b': ["one", "two", "three"]})
df2 = pd.DataFrame({'c': [1, 2, 3], 'd': ["", "", ""]})
ret_tbl = cdf.query(sql="select * from __tbl1__ t1 join __tbl2__ t2 on t1.a = t2.c",
                    tbl1=df1, tbl2=df2)
print(ret_tbl)

# Query on the DataFrame Table
print(ret_tbl.query('select b, sum(a) from __table__ group by b'))
```

Query with a stateful Session

```python
from chdb import session as chs

# Create a database, table, and view in a session
sess = chs.Session()
sess.query("CREATE DATABASE IF NOT EXISTS db_xxx ENGINE = Atomic")
sess.query("CREATE TABLE IF NOT EXISTS db_xxx.log_table_xxx (x String, y Int) ENGINE = Log;")
sess.query("INSERT INTO db_xxx.log_table_xxx VALUES ('a', 1), ('b', 3), ('c', 2), ('d', 5);")
sess.query(
    "CREATE VIEW db_xxx.view_xxx AS SELECT * FROM db_xxx.log_table_xxx LIMIT 4;"
)
print("Select from view:\n")
print(sess.query("SELECT * FROM db_xxx.view_xxx", "Pretty"))
```

See also: [test_stateful.py](tests/test_stateful.py)

Python DB-API 2.0

```python
import chdb.dbapi as dbapi

print("chdb driver version: {0}".format(dbapi.get_client_info()))

conn1 = dbapi.connect()
cur1 = conn1.cursor()
cur1.execute('select version()')
print("description: ", cur1.description)
print("data: ", cur1.fetchone())
cur1.close()
conn1.close()
```
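Because the DB-API example uses only the generic PEP 249 calls (`cursor()`, `execute()`, `description`, `close()`), helper code can be written against that interface rather than against one driver. The sketch below is demonstrated with the standard library's `sqlite3` so that it runs without chDB installed. The helper name `fetch_dicts` is hypothetical, and it is an assumption here that a `chdb.dbapi` cursor also provides `fetchall()`.

```python
import sqlite3


def fetch_dicts(conn, sql):
    """Run `sql` on any PEP 249 connection and return the rows as dicts."""
    cur = conn.cursor()
    try:
        cur.execute(sql)
        # description holds one 7-item sequence per column; item 0 is the name.
        columns = [col[0] for col in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        cur.close()


# sqlite3 stands in for the driver here; it is not chDB.
conn = sqlite3.connect(":memory:")
rows = fetch_dicts(conn, "SELECT 1 AS id, 'abc' AS name")
conn.close()
print(rows)
```

Under that assumption, passing a connection from `dbapi.connect()` in place of the `sqlite3` connection would follow the same pattern.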

Query with UDF (User Defined Functions)

```python
from chdb.udf import chdb_udf
from chdb import query

@chdb_udf()
def sum_udf(lhs, rhs):
    return int(lhs) + int(rhs)

print(query("select sum_udf(12,22)"))
```

See also: [test_udf.py](tests/test_udf.py).

Streaming query

```python
from chdb import session as chs

sess = chs.Session()

# Example 1: iterate over the chunks inside a `with` block
rows_cnt = 0
with sess.send_query("SELECT * FROM numbers(200000)", "CSV") as stream_result:
    for chunk in stream_result:
        rows_cnt += chunk.rows_read()

print(rows_cnt) # 200000

# Example 2: manual iteration with fetch()
rows_cnt = 0
stream_result = sess.send_query("SELECT * FROM numbers(200000)", "CSV")
while True:
    chunk = stream_result.fetch()
    if chunk is None:
        break
    rows_cnt += chunk.rows_read()

print(rows_cnt) # 200000

# Example 3: early cancellation
rows_cnt = 0
stream_result = sess.send_query("SELECT * FROM numbers(200000)", "CSV")
while True:
    chunk = stream_result.fetch()
    if chunk is None:
        break
    if rows_cnt > 0:
        stream_result.close()
        break
    rows_cnt += chunk.rows_read()

print(rows_cnt) # 65409

# Example 4: PyArrow RecordBatchReader for batch export
import pyarrow as pa
from deltalake import write_deltalake

# Get the streaming result in Arrow format
stream_result = sess.send_query("SELECT * FROM numbers(100000)", "Arrow")

# Create a RecordBatchReader with a custom batch size (default rows_per_batch=1000000)
batch_reader = stream_result.record_batch(rows_per_batch=10000)

# Pass the RecordBatchReader to an external library such as Delta Lake
write_deltalake(
    table_or_uri="./my_delta_table",
    data=batch_reader,
    mode="overwrite"
)

stream_result.close()
sess.close()
```

**Note:** If a `StreamingResult` is not fully consumed, call `stream_result.close()` explicitly, or use a `with` statement so that it is closed automatically.

See also: [test_streaming_query.py](tests/test_streaming_query.py) and [test_arrow_record_reader_deltalake.py](tests/test_arrow_record_reader_deltalake.py)
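When a `with` block is not convenient, for example when reading stops after a fixed number of chunks, a `try`/`finally` wrapper gives the same guarantee that `close()` is called. The sketch below uses a stand-in class, `FakeStream`, that only mimics the `fetch()`/`close()` shape seen in the streaming examples. Both `FakeStream` and `take_chunks` are hypothetical names for illustration and are not part of chDB.

```python
class FakeStream:
    """Stand-in with the fetch()/close() shape; not a chDB class."""

    def __init__(self, chunks):
        self._chunks = list(chunks)
        self.closed = False

    def fetch(self):
        # Return the next chunk, or None once the stream is exhausted.
        return self._chunks.pop(0) if self._chunks else None

    def close(self):
        self.closed = True


def take_chunks(stream, max_chunks):
    """Read at most `max_chunks` chunks and always close the stream."""
    taken = []
    try:
        while len(taken) < max_chunks:
            chunk = stream.fetch()
            if chunk is None:
                break
            taken.append(chunk)
    finally:
        # Runs on normal exit, early exit, and exceptions alike.
        stream.close()
    return taken


stream = FakeStream(["a", "b", "c"])
first_two = take_chunks(stream, 2)
print(first_two, stream.closed)
```

The same function shape should apply to any object exposing `fetch()` and `close()`, which is the only behavior it relies on.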

For more examples, see examples and tests.


  • Star the project

For version information, see VERSION-GUIDE.md.

chDB is licensed under Apache 2.0; see LICENSE.

chDB is based mainly on ClickHouse.

Owner

  • Name: chdb-io
  • Login: chdb-io
  • Kind: organization

GitHub Events

Total
  • Create event: 23
  • Release event: 8
  • Issues event: 51
  • Watch event: 416
  • Issue comment event: 124
  • Push event: 95
  • Pull request review comment event: 9
  • Pull request review event: 13
  • Pull request event: 78
  • Fork event: 20
Last Year
  • Create event: 23
  • Release event: 8
  • Issues event: 51
  • Watch event: 416
  • Issue comment event: 124
  • Push event: 95
  • Pull request review comment event: 9
  • Pull request review event: 13
  • Pull request event: 78
  • Fork event: 20

Committers

Last synced: 10 months ago

All Time
  • Total Commits: 1,445
  • Total Committers: 15
  • Avg Commits per committer: 96.333
  • Development Distribution Score (DDS): 0.253
Past Year
  • Commits: 869
  • Committers: 14
  • Avg Commits per committer: 62.071
  • Development Distribution Score (DDS): 0.22
Top Committers
Name Email Commits
auxten a****c@g****m 1,079
Daniel-Robbins e****s@g****m 180
Lorenzo Mangani l****i@g****m 72
nmreadelf f****g@h****m 36
Yunyu Lin m****l@y****n 22
allcontributors[bot] 4****] 20
wudidapaopao x****u@c****m 17
Nevin n****1@g****m 6
laodouya l****a@y****t 6
reema93jain 1****n 2
Alex Bocharov a****x@x****i 1
DemoYeti 1****i 1
Michael Eastham m****m@g****m 1
xinhuitian x****n@1****m 1
Michael Razuvaev r****v@y****u 1
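The all-time summary figures can be reproduced from the commit counts in the table above. The sketch below assumes that the Development Distribution Score is defined as one minus the top committer's share of all commits. That definition is an assumption, though it does reproduce the reported value.

```python
# Commit counts per committer, copied from the Top Committers table.
commits = [1079, 180, 72, 36, 22, 20, 17, 6, 6, 2, 1, 1, 1, 1, 1]

total = sum(commits)
avg_per_committer = total / len(commits)

# Assumed definition: DDS = 1 - (commits by top committer / total commits).
dds = 1 - max(commits) / total

print(total, round(avg_per_committer, 3), round(dds, 3))
```

A score near 0 means one person wrote almost all commits. Under this definition, 0.253 means the top committer accounts for roughly three quarters of the history.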

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 141
  • Total pull requests: 249
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 10 days
  • Total issue authors: 71
  • Total pull request authors: 16
  • Average comments per issue: 2.52
  • Average comments per pull request: 0.51
  • Merged pull requests: 198
  • Bot issues: 0
  • Bot pull requests: 6
Past Year
  • Issues: 42
  • Pull requests: 88
  • Average time to close issues: 24 days
  • Average time to close pull requests: 4 days
  • Issue authors: 27
  • Pull request authors: 6
  • Average comments per issue: 1.48
  • Average comments per pull request: 0.2
  • Merged pull requests: 65
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • auxten (16)
  • danthegoodman1 (9)
  • lmangani (7)
  • yidigo (6)
  • djouallah (6)
  • wonb168 (5)
  • arnaudbriche (4)
  • blackrez (4)
  • LPauzies (4)
  • DanielMao1 (3)
  • l1t1 (3)
  • jovezhong (2)
  • agoncear-mwb (2)
  • zhuzhuyan93 (2)
  • nalgeon (2)
Pull Request Authors
  • auxten (136)
  • wudidapaopao (39)
  • nmreadelf (21)
  • Daniel-Robbins (15)
  • lmangani (10)
  • allcontributors[bot] (6)
  • yunyu (4)
  • reema93jain (3)
  • DemoYeti (2)
  • xinhuitian (2)
  • bocharov (2)
  • agoncear-mwb (2)
  • nevinpuri (2)
  • laodouya (2)
  • meastham (2)
Top Labels
Issue Labels
question (39) help wanted (18) feature request (14) bug (12) enhancement (10) test wanted (6) Arrow (4) Session (4) fixed (2) good first issue (2) Notebook (2) documentation (1) WASM (1) Json (1) duplicate (1)
Pull Request Labels
enhancement (2) documentation (1) Arrow (1)

Packages

  • Total packages: 3
  • Total downloads:
    • pypi 908,210 last-month
  • Total docker downloads: 5,783
  • Total dependent packages: 1
    (may contain duplicates)
  • Total dependent repositories: 2
    (may contain duplicates)
  • Total versions: 129
  • Total maintainers: 1
proxy.golang.org: github.com/chdb-io/chdb
  • Versions: 55
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Stargazers count: 1.8%
Forks count: 3.5%
Average: 6.4%
Dependent packages count: 9.5%
Dependent repos count: 10.7%
Last synced: 6 months ago
pypi.org: chdb

chDB is an in-process SQL OLAP Engine powered by ClickHouse

  • Versions: 69
  • Dependent Packages: 1
  • Dependent Repositories: 2
  • Downloads: 908,210 Last month
  • Docker Downloads: 5,783
Rankings
Stargazers count: 2.5%
Downloads: 3.7%
Average: 7.1%
Forks count: 7.6%
Dependent packages count: 10.1%
Dependent repos count: 11.5%
Maintainers (1)
Last synced: 6 months ago
proxy.golang.org: github.com/chdb-io/chdb/chdb/golang
  • Versions: 5
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 9.5%
Average: 10.1%
Dependent repos count: 10.7%
Last synced: 7 months ago