pytables

A Python package to manage extremely large amounts of data

https://github.com/pytables/pytables

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.0%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: PyTables
  • License: bsd-3-clause
  • Language: Python
  • Default Branch: master
  • Homepage: http://www.pytables.org
  • Size: 39.4 MB
Statistics
  • Stars: 1,341
  • Watchers: 61
  • Forks: 276
  • Open Issues: 156
  • Releases: 19
Created about 15 years ago · Last pushed 12 months ago
Metadata Files
  • Readme
  • Contributing
  • Funding
  • License
  • Code of conduct
  • Citation
  • Security

README.rst

===========================================
 PyTables: hierarchical datasets in Python
===========================================

.. image:: https://badges.gitter.im/Join%20Chat.svg
   :alt: Join the chat at https://gitter.im/PyTables/PyTables
   :target: https://gitter.im/PyTables/PyTables

.. image:: https://github.com/PyTables/PyTables/workflows/CI/badge.svg
   :target: https://github.com/PyTables/PyTables/actions?query=workflow%3ACI

.. image:: https://img.shields.io/pypi/v/tables.svg
  :target: https://pypi.org/project/tables/

.. image:: https://img.shields.io/pypi/pyversions/tables.svg
  :target: https://pypi.org/project/tables/

.. image:: https://img.shields.io/pypi/l/tables
  :target: https://github.com/PyTables/PyTables/


:URL: http://www.pytables.org/


PyTables is a package for managing hierarchical datasets, designed
to efficiently cope with extremely large amounts of data.

It is built on top of the HDF5 library and the NumPy package. It
features an object-oriented interface that, combined with C extensions
for the performance-critical parts of the code (generated using
Cython), makes it a fast yet extremely easy-to-use tool for
interactively saving and retrieving very large amounts of data. One
important feature of PyTables is that it optimizes memory and disk
usage so that data takes much less space (between 3 and 5 times
less, and even more so if the data is compressible) than in other
solutions such as relational or object-oriented databases.
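
As a rough illustration of the object-oriented interface, the sketch
below writes a NumPy array into a group and reads it back. The file,
group and array names (``demo.h5``, ``measurements``, ``samples``) are
made up for this example.

```python
import os
import tempfile

import numpy as np
import tables


def roundtrip(path):
    """Write a small array under a group, then read it back."""
    data = np.arange(10, dtype=np.int64)

    # Mode "w" creates a new HDF5 file (overwriting any existing one).
    with tables.open_file(path, mode="w", title="Demo file") as h5:
        group = h5.create_group("/", "measurements", "Hypothetical group")
        h5.create_array(group, "samples", data, "Hypothetical array")

    # Nodes can be reached as attributes of h5.root ("natural naming").
    with tables.open_file(path, mode="r") as h5:
        return h5.root.measurements.samples.read()


# A temporary directory keeps the example from leaving files behind.
with tempfile.TemporaryDirectory() as tmp:
    restored = roundtrip(os.path.join(tmp, "demo.h5"))

print(restored)
```

The ``with`` blocks close the file even if an error occurs, which
matters because unflushed data may otherwise be lost.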

State-of-the-art compression
----------------------------

PyTables supports the Blosc compressor out of the box.
This allows for extremely high compression speed while keeping decent
compression ratios. As a result, I/O can be accelerated to a large extent, and
you may end up achieving higher effective throughput than the raw bandwidth of
your I/O subsystem. See the "Tuning The Chunksize" section of the
"Optimization Tips" chapter of the user documentation for some benchmarks.
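
A minimal sketch of how Blosc compression might be enabled for a
chunked array is shown below. The node name ``signal``, the compression
level 5 and the all-zeros test data are arbitrary choices for
illustration, not recommendations; real data will compress differently.

```python
import os
import tempfile

import numpy as np
import tables


def write_compressed(path, data):
    """Store `data` in a chunked array compressed with Blosc."""
    # complevel ranges from 0 (off) to 9; 5 is an arbitrary middle value.
    filters = tables.Filters(complevel=5, complib="blosc")
    with tables.open_file(path, mode="w") as h5:
        carray = h5.create_carray(
            "/",
            "signal",
            atom=tables.Atom.from_dtype(data.dtype),
            shape=data.shape,
            filters=filters,
        )
        carray[:] = data
    return os.path.getsize(path)


# Highly repetitive data, chosen only because it compresses well.
data = np.zeros(1_000_000, dtype=np.float64)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "compressed.h5")
    size_on_disk = write_compressed(path, data)
    with tables.open_file(path, mode="r") as h5:
        restored = h5.root.signal.read()
        used_complib = h5.root.signal.filters.complib

print(f"in memory: {data.nbytes} bytes, on disk: {size_on_disk} bytes")
```

Compression is transparent on read: ``read()`` returns the
decompressed NumPy array without any extra arguments.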

Not an RDBMS replacement
------------------------

PyTables is not designed to work as a relational database replacement,
but rather as a teammate. If you want to work with large datasets of
multidimensional data (for example, for multidimensional analysis), or
just provide a categorized structure for some portions of your
cluttered RDBMS, then give PyTables a try. It works well for storing
data from data acquisition systems, simulation software, network
data monitoring systems (for example, traffic measurements of IP
packets on routers), or as a centralized repository for system logs,
to name only a few possible use cases.

Tables
------

A table is defined as a collection of records whose values are stored
in fixed-length fields. All records have the same structure, and all
values in each field have the same data type. "Fixed-length" fields
and "strict data types" may seem like strange requirements for an
interpreted language like Python, but they serve a useful function if
the goal is to save very large quantities of data (such as those
generated by many scientific applications) in an efficient manner
that reduces demand on CPU time and I/O.
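
The following sketch shows one way such a fixed-length record layout
could be declared and filled. The ``Reading`` class, its field names
and the ``readings`` table are hypothetical and exist only for this
example.

```python
import os
import tempfile

import tables


class Reading(tables.IsDescription):
    """Hypothetical record layout: every field has a fixed size and type."""

    sensor = tables.StringCol(16)  # byte string of at most 16 bytes
    seq = tables.Int32Col()        # 32-bit signed integer
    value = tables.Float64Col()    # double-precision float


def build_table(path):
    """Create a table, append five records, and run a simple query."""
    with tables.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "readings", Reading, "Demo table")

        # table.row is a reusable buffer for the record being written.
        row = table.row
        for i in range(5):
            row["sensor"] = f"s{i}"
            row["seq"] = i
            row["value"] = i * 1.5
            row.append()
        table.flush()  # push buffered rows to disk

        # Select the sequence numbers of rows whose value exceeds 3.0.
        high = [int(r["seq"]) for r in table.where("value > 3.0")]
        return table.nrows, table.rowsize, high


with tempfile.TemporaryDirectory() as tmp:
    nrows, rowsize, high = build_table(os.path.join(tmp, "table.h5"))

print(f"{nrows} rows of {rowsize} bytes each; seq with value > 3.0: {high}")
```

Because every record occupies the same number of bytes, the position
of any row on disk can be computed directly from its index.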

Arrays
------

There are other useful objects, such as arrays, enlargeable arrays,
and variable-length arrays, that can cope with different use cases in
your project.
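
The three kinds of array mentioned above can be sketched as follows.
The node names (``fixed``, ``growing``, ``ragged``) and the shapes are
assumptions chosen for illustration.

```python
import os
import tempfile

import numpy as np
import tables


def build_arrays(path):
    """Create one array of each kind and report their sizes."""
    with tables.open_file(path, mode="w") as h5:
        # Array: fixed shape, written in one go from an existing object.
        fixed = h5.create_array("/", "fixed", np.arange(6).reshape(2, 3))

        # EArray: enlargeable along the dimension declared with length 0.
        growing = h5.create_earray(
            "/", "growing", atom=tables.Float64Atom(), shape=(0, 3)
        )
        growing.append(np.ones((2, 3)))
        growing.append(np.zeros((4, 3)))

        # VLArray: each row may hold a different number of items.
        ragged = h5.create_vlarray("/", "ragged", atom=tables.Int32Atom())
        ragged.append([1, 2, 3])
        ragged.append([4])

        row_lengths = [len(r) for r in ragged]
        return tuple(fixed.shape), tuple(growing.shape), row_lengths


with tempfile.TemporaryDirectory() as tmp:
    fixed_shape, growing_shape, row_lengths = build_arrays(
        os.path.join(tmp, "arrays.h5")
    )

print(fixed_shape, growing_shape, row_lengths)
```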

Easy to use
-----------

One of the principal objectives of PyTables is to be user-friendly.
In addition, many different iterators have been implemented to
make interactive work as productive as possible.
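
A few of the available iterators are sketched below: stepping through
rows with ``iterrows``, selecting rows with ``where``, and walking the
object tree with ``walk_nodes``. The ``Point`` description and the
``points`` table are hypothetical names used only here.

```python
import os
import tempfile

import tables


class Point(tables.IsDescription):
    """Hypothetical two-column record; `pos` fixes the column order."""

    idx = tables.Int32Col(pos=0)
    y = tables.Float64Col(pos=1)


def iterate(path):
    """Fill a small table and read it back through three iterators."""
    with tables.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "points", Point)
        table.append([(i, float(i * i)) for i in range(10)])
        table.flush()

        # Every third row, using slice-like start/stop/step arguments.
        every_third = [
            int(row["idx"]) for row in table.iterrows(start=0, stop=10, step=3)
        ]

        # Only the rows that satisfy a condition on column `y`.
        matching = [
            int(row["idx"]) for row in table.where("(y > 10) & (y < 50)")
        ]

        # All Table nodes found anywhere below the root group.
        table_paths = [
            node._v_pathname for node in h5.walk_nodes("/", classname="Table")
        ]
        return every_third, matching, table_paths


with tempfile.TemporaryDirectory() as tmp:
    every_third, matching, table_paths = iterate(os.path.join(tmp, "iter.h5"))

print(every_third, matching, table_paths)
```

Row iterators reuse a single row object, so the values of interest
are copied out inside each loop rather than keeping the row itself.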

Platforms
---------

We use Linux on top of Intel32 and Intel64 boxes as the main
development platforms, but PyTables should be easy to compile/install
on other UNIX (including macOS) or Windows machines.

Compiling
---------

To compile PyTables, you will need a recent version of the HDF5
(C flavor) library, the Zlib compression library, and the NumPy and
Numexpr packages. In addition, PyTables comes with support for the
Blosc, LZO, and bzip2 compressor libraries. Blosc is mandatory, but
PyTables ships with the Blosc sources, so you do not need to install
it separately, although having Blosc installed on your system is
recommended. The LZO and bzip2 compression libraries are optional.

Make sure you have HDF5 version 1.10.5 or above. On Debian-based Linux
distributions, you can install it with::

   $ sudo apt install libhdf5-serial-dev

Installation
------------

1. Install with pip::

       $ python3 -m pip install tables

2. To run the test suite::

       $ python3 -m tables.tests.test_all

   If any test fails, please send us the complete output using the
   GitHub Issue Tracker.
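
Once the package is installed, a quick sanity check from Python might
look like the sketch below. It only reads the version strings exposed
by the package, and the variable names are arbitrary.

```python
import tables

# Version strings reported by the installed package.
pytables_version = tables.__version__
hdf5_version = tables.hdf5_version

print("PyTables:", pytables_version)
print("HDF5:", hdf5_version)

# Prints a fuller report, including the NumPy and compressor versions.
tables.print_versions()
```

The reported HDF5 version can be compared against the minimum given
in the Compiling section.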


**Enjoy data!** -- The PyTables Team

.. Local Variables:
.. mode: text
.. coding: utf-8
.. fill-column: 70
.. End:

Owner

  • Name: PyTables
  • Login: PyTables
  • Kind: organization

GitHub Events

Total
  • Create event: 22
  • Release event: 1
  • Issues event: 17
  • Watch event: 33
  • Delete event: 20
  • Issue comment event: 58
  • Push event: 60
  • Gollum event: 1
  • Pull request review event: 27
  • Pull request review comment event: 2
  • Pull request event: 60
  • Fork event: 8
Last Year
  • Create event: 22
  • Release event: 1
  • Issues event: 17
  • Watch event: 33
  • Delete event: 20
  • Issue comment event: 58
  • Push event: 60
  • Gollum event: 1
  • Pull request review event: 27
  • Pull request review comment event: 2
  • Pull request event: 60
  • Fork event: 8

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 132
  • Total pull requests: 176
  • Average time to close issues: 8 months
  • Average time to close pull requests: 29 days
  • Total issue authors: 108
  • Total pull request authors: 40
  • Average comments per issue: 5.73
  • Average comments per pull request: 2.17
  • Merged pull requests: 147
  • Bot issues: 0
  • Bot pull requests: 51
Past Year
  • Issues: 4
  • Pull requests: 24
  • Average time to close issues: 3 days
  • Average time to close pull requests: about 17 hours
  • Issue authors: 4
  • Pull request authors: 3
  • Average comments per issue: 1.75
  • Average comments per pull request: 0.08
  • Merged pull requests: 20
  • Bot issues: 0
  • Bot pull requests: 22
Top Authors
Issue Authors
  • mgorny (6)
  • alpae (4)
  • andreabedini (3)
  • joycebrum (3)
  • ivilata (3)
  • froody (2)
  • kostasmarkakis (2)
  • dependabot[bot] (2)
  • jpjarnoux (2)
  • FrancescAlted (2)
  • maxnoe (2)
  • sunilshah (2)
  • joshmoore (2)
  • KoStehner (2)
  • ulfllorenz (2)
Pull Request Authors
  • dependabot[bot] (84)
  • KoStehner (54)
  • ivilata (19)
  • avalentino (19)
  • larsoner (13)
  • maxnoe (7)
  • FrancescAlted (7)
  • matham (6)
  • pyssling (4)
  • cbrnr (3)
  • xmatthias (3)
  • eumiro (3)
  • graingert (2)
  • Joshuaalbert (2)
  • joycebrum (2)
Top Labels
Issue Labels
  • defect (37)
  • help wanted (24)
  • enhancement (19)
  • setup (12)
  • good first issues (11)
  • wheel (9)
  • osx-arm64 (8)
  • invalid (5)
  • windows (4)
  • website (3)
  • dependencies (3)
  • duplicate (3)
  • python (2)
  • documentation (2)
  • osx (2)
  • from_trac (2)
  • tests (2)
  • python3 (2)
  • strings (1)
  • iteration (1)
  • CI (1)
  • aarch64/arm (1)
Pull Request Labels
  • dependencies (87)
  • python (60)
  • enhancement (58)
  • typing (29)
  • github_actions (24)
  • defect (17)
  • documentation (8)
  • CI (7)
  • help wanted (4)
  • python3 (3)
  • setup (3)
  • good first issues (2)
  • osx-arm64 (2)
  • tests (1)

Packages

  • Total packages: 5
  • Total downloads:
    • pypi 1,681,462 last-month
  • Total docker downloads: 363,811,515
  • Total dependent packages: 500
    (may contain duplicates)
  • Total dependent repositories: 11,445
    (may contain duplicates)
  • Total versions: 98
  • Total maintainers: 7
pypi.org: tables

Hierarchical datasets for Python

  • Versions: 47
  • Dependent Packages: 496
  • Dependent Repositories: 11,004
  • Downloads: 1,681,462 Last month
  • Docker Downloads: 363,811,515
Rankings
Dependent packages count: 0.0%
Dependent repos count: 0.1%
Average: 0.2%
Docker downloads count: 0.3%
Downloads: 0.4%
Last synced: 10 months ago
proxy.golang.org: github.com/PyTables/PyTables
  • Versions: 19
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Forks count: 1.7%
Stargazers count: 1.9%
Average: 6.0%
Dependent packages count: 9.6%
Dependent repos count: 10.8%
Last synced: 11 months ago
proxy.golang.org: github.com/pytables/pytables
  • Versions: 19
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Forks count: 1.7%
Stargazers count: 1.9%
Average: 6.0%
Dependent packages count: 9.6%
Dependent repos count: 10.8%
Last synced: 10 months ago
anaconda.org: pytables

PyTables is a package for managing hierarchical datasets and designed to efficiently and easily cope with extremely large amounts of data. PyTables is built on top of the HDF5 library, using the Python language and the NumPy package.

  • Versions: 11
  • Dependent Packages: 4
  • Dependent Repositories: 440
Rankings
Dependent repos count: 8.0%
Dependent packages count: 11.1%
Average: 15.3%
Forks count: 20.7%
Stargazers count: 21.6%
Last synced: 10 months ago
anaconda.org: tables

PyTables is a package for managing hierarchical datasets and designed to efficiently and easily cope with extremely large amounts of data. PyTables is built on top of the HDF5 library, using the Python language and the NumPy package.

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 1
Rankings
Forks count: 20.3%
Stargazers count: 21.4%
Average: 36.1%
Dependent packages count: 51.2%
Dependent repos count: 51.4%
Last synced: 10 months ago

Dependencies

environment.yml conda
  • hdf5
requirements.txt pypi
  • numexpr >=2.6.2
  • numpy >=1.19.0
  • packaging *
.github/workflows/ci.yml actions
  • actions/checkout v3 composite
  • conda-incubator/setup-miniconda v2 composite
.github/workflows/ubuntu.yml actions
  • actions/checkout v3 composite
.github/workflows/wheels.yml actions
  • actions/cache v2 composite
  • actions/checkout v3 composite
  • actions/download-artifact v3 composite
  • actions/setup-python v4 composite
  • actions/upload-artifact v3 composite
  • conda-incubator/setup-miniconda v2 composite
  • docker/setup-qemu-action v1 composite
pyproject.toml pypi
  • numexpr >= 2.6.2
  • numpy >= 1.19.0
  • packaging *
  • py-cpuinfo *