pandas-dataclasses

:zap: pandas data creation by data classes

https://github.com/astropenguin/pandas-dataclasses

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
    1 of 1 committers (100.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.9%) to scientific vocabulary

Keywords

dataclasses pandas python specifications typing
Last synced: 6 months ago

Repository

Statistics
  • Stars: 53
  • Watchers: 2
  • Forks: 3
  • Open Issues: 4
  • Releases: 16
Created over 4 years ago · Last pushed about 1 year ago
Metadata Files
Readme License Citation

README.md

pandas-dataclasses

pandas data creation by data classes

Overview

pandas-dataclasses makes it easy to create pandas data (DataFrame and Series) by specifying their data types, attributes, and names using Python's dataclass:

```python
from dataclasses import dataclass
from pandas_dataclasses import AsFrame, Data, Index


@dataclass
class Weather(AsFrame):
    """Weather information."""

    year: Index[int]
    month: Index[int]
    temp: Data[float]
    wind: Data[float]


df = Weather.new(
    [2020, 2020, 2021, 2021, 2022],
    [1, 7, 1, 7, 1],
    [7.1, 24.3, 5.4, 25.9, 4.9],
    [2.4, 3.1, 2.3, 2.4, 2.6],
)
```

where df will become a DataFrame object like:

                temp  wind
    year month
    2020 1       7.1   2.4
         7      24.3   3.1
    2021 1       5.4   2.3
         7      25.9   2.4
    2022 1       4.9   2.6

Features

  • Specifying data types and names of each element in pandas data
  • Specifying metadata stored in pandas data attributes (attrs)
  • Support for hierarchical index and columns
  • Support for custom factory for data creation
  • Support for full dataclass features
  • Support for static type check by mypy and Pyright

Installation

    pip install pandas-dataclasses

How it works

pandas-dataclasses provides the following features:

  • Type hints for dataclass fields (Attr, Data, Index) to specify the data type and name of each element in pandas data
  • Mix-in classes for dataclasses (As, AsFrame, AsSeries) to create pandas data by a classmethod (new) that takes the same arguments as dataclass initialization

When you call new, it first creates a dataclass object and then creates a Series or DataFrame object from it according to the type hints and values in it. In the example above, df = Weather.new(...) is thus equivalent to:

```python
from pandas_dataclasses import asframe

obj = Weather([2020, ...], [1, ...], [7.1, ...], [2.4, ...])
df = asframe(obj)
```

where asframe is a conversion function. pandas-dataclasses does not touch the dataclass object creation itself; this allows you to fully customize your dataclass before conversion by the dataclass features (field, __post_init__, ...).
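Because the dataclass object is created first, ordinary dataclass hooks run before any conversion takes place. The following is a minimal standard-library sketch of that first step only. It uses plain list annotations rather than the library's Index and Data hints, and the temp_f field and the Fahrenheit-to-Celsius conversion are hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Weather:
    """Weather information (plain dataclass; no pandas involved)."""

    year: list[int]
    month: list[int]
    temp_f: list[float]  # hypothetical input in degrees Fahrenheit
    temp: list[float] = field(init=False)  # derived in __post_init__

    def __post_init__(self) -> None:
        # Validate and derive values before any conversion would happen.
        if not len(self.year) == len(self.month) == len(self.temp_f):
            raise ValueError("all fields must have the same length")
        self.temp = [round((t - 32) * 5 / 9, 1) for t in self.temp_f]


obj = Weather([2020, 2020], [1, 7], [44.8, 75.7])
print(obj.temp)
```

With the library's type hints in place of the plain lists, the same hooks would run before a conversion function such as asframe receives the object.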

Basic usage

DataFrame creation

As shown in the example above, a dataclass that has the AsFrame (or AsDataFrame as an alias) mix-in will create DataFrame objects:

```python
from dataclasses import dataclass
from pandas_dataclasses import AsFrame, Data, Index


@dataclass
class Weather(AsFrame):
    """Weather information."""

    year: Index[int]
    month: Index[int]
    temp: Data[float]
    wind: Data[float]


df = Weather.new(...)
```

where fields typed by Index are index fields, each value of which will become an index or a part of a hierarchical index of a DataFrame object. Fields typed by Data are data fields, each value of which will become a data column of a DataFrame object. Fields typed by other types are just ignored in the DataFrame creation.

Each data or index will be cast to the data type specified in a type hint like Index[int]. Use Any or None (like Index[Any]) if you do not want type casting. See also data typing rules for more examples.

By default, a field name (i.e. an argument name) is used for the name of corresponding data or index. See also custom naming and naming rules if you want customization.
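For comparison, here is a sketch of building a similarly shaped DataFrame by hand in plain pandas, assuming the same values as in the overview example. It illustrates what the index fields, data fields, and type casting correspond to in the result. It is not a description of how the library constructs the object internally.

```python
import pandas as pd

# Index fields correspond to the levels of a (hierarchical) index,
# named after the fields by default.
index = pd.MultiIndex.from_arrays(
    [[2020, 2020, 2021, 2021, 2022], [1, 7, 1, 7, 1]],
    names=["year", "month"],
)

# Data fields correspond to columns; astype mirrors the cast to Data[float].
df = pd.DataFrame(
    {
        "temp": [7.1, 24.3, 5.4, 25.9, 4.9],
        "wind": [2.4, 3.1, 2.3, 2.4, 2.6],
    },
    index=index,
).astype({"temp": float, "wind": float})

print(df)
```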

Series creation

A dataclass that has the AsSeries mix-in will create Series objects:

```python
from dataclasses import dataclass
from pandas_dataclasses import AsSeries, Data, Index


@dataclass
class Weather(AsSeries):
    """Weather information."""

    year: Index[int]
    month: Index[int]
    temp: Data[float]


ser = Weather.new(...)
```

Unlike AsFrame, the second and subsequent data fields are ignored in the Series creation even if they exist. Other rules are the same as for the DataFrame creation.

Advanced usage

Metadata storing

Fields typed by Attr are attribute fields, each value of which will become an item of attributes of a DataFrame or a Series object:

```python
from dataclasses import dataclass
from pandas_dataclasses import AsFrame, Attr, Data, Index


@dataclass
class Weather(AsFrame):
    """Weather information."""

    year: Index[int]
    month: Index[int]
    temp: Data[float]
    wind: Data[float]
    loc: Attr[str] = "Tokyo"
    lon: Attr[float] = 139.69167
    lat: Attr[float] = 35.68944


df = Weather.new(...)
```

where df.attrs will become like:

    {"loc": "Tokyo", "lon": 139.69167, "lat": 35.68944}

Custom naming

The name of attribute, data, or index can be explicitly specified by adding a hashable annotation to the corresponding type:

```python
from dataclasses import dataclass
from typing import Annotated as Ann
from pandas_dataclasses import AsFrame, Attr, Data, Index


@dataclass
class Weather(AsFrame):
    """Weather information."""

    year: Ann[Index[int], "Year"]
    month: Ann[Index[int], "Month"]
    temp: Ann[Data[float], "Temperature (deg C)"]
    wind: Ann[Data[float], "Wind speed (m/s)"]
    loc: Ann[Attr[str], "Location"] = "Tokyo"
    lon: Ann[Attr[float], "Longitude (deg)"] = 139.69167
    lat: Ann[Attr[float], "Latitude (deg)"] = 35.68944


df = Weather.new(...)
```

where df and df.attrs will become like:

                Temperature (deg C)  Wind speed (m/s)
    Year Month
    2020 1                      7.1               2.4
         7                     24.3               3.1
    2021 1                      5.4               2.3
         7                     25.9               2.4
    2022 1                      4.9               2.6

    {"Location": "Tokyo", "Longitude (deg)": 139.69167, "Latitude (deg)": 35.68944}
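To make the mechanism behind such annotations more concrete, the sketch below reads the metadata attached with typing.Annotated using only the standard library. The display_names helper is hypothetical, and plain list types stand in for the library's Index and Data hints. It shows one way annotations can be recovered from a dataclass, not the library's own implementation.

```python
from dataclasses import dataclass, fields
from typing import Annotated, get_args, get_origin, get_type_hints


@dataclass
class Weather:
    year: Annotated[list[int], "Year"]
    temp: Annotated[list[float], "Temperature (deg C)"]
    wind: list[float]  # no annotation: falls back to the field name


def display_names(cls) -> dict[str, str]:
    """Map each field name to its first annotation, or to itself."""
    hints = get_type_hints(cls, include_extras=True)
    names = {}
    for f in fields(cls):
        hint = hints[f.name]
        if get_origin(hint) is Annotated:
            # get_args returns (underlying type, *metadata).
            names[f.name] = get_args(hint)[1]
        else:
            names[f.name] = f.name
    return names


print(display_names(Weather))
```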

If an annotation is a format string, it will be formatted by a dataclass object before the data creation:

```python
from dataclasses import dataclass
from typing import Annotated as Ann
from pandas_dataclasses import AsFrame, Data, Index


@dataclass
class Weather(AsFrame):
    """Weather information."""

    year: Ann[Index[int], "Year"]
    month: Ann[Index[int], "Month"]
    temp: Ann[Data[float], "Temperature ({.temp_unit})"]
    wind: Ann[Data[float], "Wind speed ({.wind_unit})"]
    temp_unit: str = "deg C"
    wind_unit: str = "m/s"


df = Weather.new(..., temp_unit="deg F", wind_unit="km/h")
```

where units of the temperature and the wind speed will be dynamically updated (see also naming rules).
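The "{.temp_unit}" syntax is ordinary Python string formatting: a replacement field that starts with a dot reads an attribute from the first positional argument of str.format. A small standard-library illustration follows, where the Units class is a hypothetical stand-in for the dataclass object:

```python
from dataclasses import dataclass


@dataclass
class Units:
    """Hypothetical stand-in for a dataclass object with unit fields."""

    temp_unit: str = "deg C"
    wind_unit: str = "m/s"


template = "Temperature ({.temp_unit})"

# "{.temp_unit}" looks up the attribute on the object passed to format().
default_name = template.format(Units())
custom_name = template.format(Units(temp_unit="deg F"))

print(default_name)
print(custom_name)
```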

Hierarchical columns

Adding tuple annotations to data fields will create DataFrame objects with hierarchical columns:

```python
from dataclasses import dataclass
from typing import Annotated as Ann
from pandas_dataclasses import AsFrame, Data, Index


@dataclass
class Weather(AsFrame):
    """Weather information."""

    year: Ann[Index[int], "Year"]
    month: Ann[Index[int], "Month"]
    temp_avg: Ann[Data[float], ("Temperature (deg C)", "Average")]
    temp_max: Ann[Data[float], ("Temperature (deg C)", "Maximum")]
    wind_avg: Ann[Data[float], ("Wind speed (m/s)", "Average")]
    wind_max: Ann[Data[float], ("Wind speed (m/s)", "Maximum")]


df = Weather.new(...)
```

where df will become like:

               Temperature (deg C)         Wind speed (m/s)
                           Average Maximum          Average Maximum
    Year Month
    2020 1                     7.1    11.1              2.4     8.8
         7                    24.3    27.7              3.1    10.2
    2021 1                     5.4    10.3              2.3    10.7
         7                    25.9    30.3              2.4     9.0
    2022 1                     4.9     9.4              2.6     8.8

Column names can be (explicitly) specified by dictionary annotations:

```python
from dataclasses import dataclass
from typing import Annotated as Ann
from pandas_dataclasses import AsFrame, Data, Index


def name(meas: str, stat: str) -> dict[str, str]:
    """Create a dictionary annotation for a column name."""
    return {"Measurement": meas, "Statistic": stat}


@dataclass
class Weather(AsFrame):
    """Weather information."""

    year: Ann[Index[int], "Year"]
    month: Ann[Index[int], "Month"]
    temp_avg: Ann[Data[float], name("Temperature (deg C)", "Average")]
    temp_max: Ann[Data[float], name("Temperature (deg C)", "Maximum")]
    wind_avg: Ann[Data[float], name("Wind speed (m/s)", "Average")]
    wind_max: Ann[Data[float], name("Wind speed (m/s)", "Maximum")]


df = Weather.new(...)
```

where df will become like:

    Measurement Temperature (deg C)         Wind speed (m/s)
    Statistic               Average Maximum          Average Maximum
    Year Month
    2020 1                      7.1    11.1              2.4     8.8
         7                     24.3    27.7              3.1    10.2
    2021 1                      5.4    10.3              2.3    10.7
         7                     25.9    30.3              2.4     9.0
    2022 1                      4.9     9.4              2.6     8.8

If a tuple or dictionary annotation has format strings, they will also be formatted by a dataclass object (see also naming rules).
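In plain pandas terms, tuple annotations correspond to the entries of a column MultiIndex, and the keys of dictionary annotations correspond to its level names. The sketch below builds such columns by hand for the first two rows of the example data. It is meant to show the target structure, not the library's internal construction.

```python
import pandas as pd

# Each tuple is one column; names label the column levels.
columns = pd.MultiIndex.from_tuples(
    [
        ("Temperature (deg C)", "Average"),
        ("Temperature (deg C)", "Maximum"),
        ("Wind speed (m/s)", "Average"),
        ("Wind speed (m/s)", "Maximum"),
    ],
    names=["Measurement", "Statistic"],
)
index = pd.MultiIndex.from_arrays(
    [[2020, 2020], [1, 7]],
    names=["Year", "Month"],
)
df = pd.DataFrame(
    [[7.1, 11.1, 2.4, 8.8], [24.3, 27.7, 3.1, 10.2]],
    index=index,
    columns=columns,
)

# Selecting a top-level name returns the sub-columns beneath it.
print(df["Temperature (deg C)"])
```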

Multiple-item fields

Multiple (and possibly extra) attributes, data, or indices can be added by fields with corresponding type hints wrapped by Multiple:

```python
from dataclasses import dataclass
from pandas_dataclasses import AsFrame, Data, Index, Multiple


@dataclass
class Weather(AsFrame):
    """Weather information."""

    year: Index[int]
    month: Index[int]
    temp: Data[float]
    wind: Data[float]
    extra_index: Multiple[Index[int]]
    extra_data: Multiple[Data[float]]


df = Weather.new(
    [2020, 2020, 2021, 2021, 2022],
    [1, 7, 1, 7, 1],
    [7.1, 24.3, 5.4, 25.9, 4.9],
    [2.4, 3.1, 2.3, 2.4, 2.6],
    extra_index={
        "day": [1, 1, 1, 1, 1],
        "week": [2, 2, 4, 3, 5],
    },
    extra_data={
        "humid": [65, 89, 57, 83, 52],
        "press": [1013.8, 1006.2, 1014.1, 1007.7, 1012.7],
    },
)
```

where df will become like:

                         temp  wind  humid   press
    year month day week
    2020 1     1   2      7.1   2.4   65.0  1013.8
         7     1   2     24.3   3.1   89.0  1006.2
    2021 1     1   4      5.4   2.3   57.0  1014.1
         7     1   3     25.9   2.4   83.0  1007.7
    2022 1     1   5      4.9   2.6   52.0  1012.7

If multiple items share the same name, the last-defined one is used. For example, if the extra_index field contains "month": [2, 8, 2, 8, 2], it overwrites the values given by the month field.
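The overwrite rule behaves much like merging Python dictionaries, where a later key replaces an earlier one. The following is only an analogy using plain dictionaries. The variable names are hypothetical, and nothing here is taken from the library's code.

```python
# Items from ordinary index fields, in definition order.
index_items = {
    "year": [2020, 2020, 2021, 2021, 2022],
    "month": [1, 7, 1, 7, 1],
}

# Items supplied later through a multiple-item field.
extra_index = {
    "month": [2, 8, 2, 8, 2],
    "day": [1, 1, 1, 1, 1],
}

# Later entries win when names collide.
merged = {**index_items, **extra_index}
print(merged["month"])
```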

Custom pandas factory

A custom class can be specified as a factory for the Series or DataFrame creation by As, the generic version of AsFrame and AsSeries. Note that the custom class must be a subclass of either pandas.Series or pandas.DataFrame:

```python
import pandas as pd
from dataclasses import dataclass
from pandas_dataclasses import As, Data, Index


class CustomSeries(pd.Series):
    """Custom pandas Series."""

    pass


@dataclass
class Temperature(As[CustomSeries]):
    """Temperature information."""

    year: Index[int]
    month: Index[int]
    temp: Data[float]


ser = Temperature.new(...)
```

where ser is statically regarded as CustomSeries and will become a CustomSeries object.

The generic Series type (Series[T]) is also supported; however, in current pandas versions it is only for static type checking. In such cases, you can additionally give a factory that works at runtime as a class argument:

```python
import pandas as pd
from dataclasses import dataclass
from pandas_dataclasses import As, Data, Index


@dataclass
class Temperature(As["pd.Series[float]"], factory=pd.Series):
    """Temperature information."""

    year: Index[int]
    month: Index[int]
    temp: Data[float]


ser = Temperature.new(...)
```

where ser is statically regarded as Series[float] but will become a Series object at runtime.

Appendix

Data typing rules

The data type (dtype) of data or index is determined from the first Data or Index type of the corresponding field, respectively. The following table shows how the data type is inferred:

```python
from typing import Any, Annotated as Ann, Literal as L
from pandas_dataclasses import Data
```

| Type hint | Inferred data type |
| --- | --- |
| Data[Any] | None (no type casting) |
| Data[None] | None (no type casting) |
| Data[int] | numpy.int64 |
| Data[int \| str] | numpy.int64 |
| Data[numpy.int32] | numpy.int32 |
| Data[L["datetime64[ns]"]] | numpy.dtype("<M8[ns]") |
| Data[L["category"]] | pandas.CategoricalDtype() |
| Data[int] \| str | numpy.int64 |
| Data[int] \| Data[float] | numpy.int64 |
| Ann[Data[int], "spam"] | numpy.int64 |
| Data[Ann[int, "spam"]] | numpy.int64 |

Naming rules

The name of attribute, data, or index is determined from the first annotation of the first Attr, Data, or Index type of the corresponding field, respectively. If the annotation is a format string or a tuple that has format strings, it (they) will be formatted by a dataclass object before the data creation. Otherwise, the field name (i.e. argument name) will be used. The following table shows how the name is inferred:

```python
from typing import Any, Annotated as Ann
from pandas_dataclasses import Data
```

| Type hint | Inferred name |
| --- | --- |
| Data[Any] | (field name) |
| Ann[Data[Any], ..., "spam"] | (field name) |
| Ann[Data[Any], "spam"] | "spam" |
| Ann[Data[Any], "spam", "ham"] | "spam" |
| Ann[Data[Any], "spam"] \| Ann[str, "ham"] | "spam" |
| Ann[Data[Any], "spam"] \| Ann[Data[float], "ham"] | "spam" |
| Ann[Data[Any], "{.name}"] | "{.name}".format(obj) |
| Ann[Data[Any], ("spam", "ham")] | ("spam", "ham") |
| Ann[Data[Any], ("{.name}", "ham")] | ("{.name}".format(obj), "ham") |

where obj is a dataclass object that is expected to have obj.name.
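The rules above can be summarized as a small resolver. This is a hedged sketch written from the naming rules as stated here, not the library's implementation. The resolve_name function, its signature, and the Obj class are all hypothetical. It takes the annotations of the first matching type as a tuple and returns the inferred name.

```python
from dataclasses import dataclass
from typing import Any, Hashable


def resolve_name(annotations: tuple, field_name: str, obj: Any) -> Hashable:
    """Infer a name from the annotations of a field (illustrative only)."""
    # No annotation, or Ellipsis first: fall back to the field name.
    if not annotations or annotations[0] is Ellipsis:
        return field_name
    first = annotations[0]  # only the first annotation is considered
    if isinstance(first, str):
        return first.format(obj)
    if isinstance(first, tuple):
        # Format string elements; leave other elements untouched.
        return tuple(a.format(obj) if isinstance(a, str) else a for a in first)
    return first  # any other hashable is used as-is


@dataclass
class Obj:
    """Hypothetical dataclass object that has a name attribute."""

    name: str = "eggs"


obj = Obj()
print(resolve_name(("{.name}",), "temp", obj))
print(resolve_name((("{.name}", "ham"),), "temp", obj))
```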

Development roadmap

| Release version | Features |
| --- | --- |
| v0.5 | Support for dynamic naming |
| v0.6 | Support for extension array and dtype |
| v0.7 | Support for hierarchical columns |
| v0.8 | Support for mypy and callable pandas factory |
| v0.9 | Support for Ellipsis (...) as an alias of field name |
| v0.10 | Support for union type in type hints |
| v0.11 | Support for Python 3.11 and drop support for Python 3.7 |
| v0.12 | Support for multiple items received in a single field |
| v1.0 | Initial major release (freezing public features until v2.0) |

Owner

  • Name: Akio Taniguchi
  • Login: astropenguin
  • Kind: user
  • Location: Nagoya, Japan
  • Company: Nagoya University

Project assistant professor (LMT-FINER)

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: pandas-dataclasses
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Akio
    family-names: Taniguchi
    email: taniguchi.akio@gmail.com
    affiliation: Kitami Institute of Technology
    orcid: 'https://orcid.org/0000-0002-9695-6183'
identifiers:
  - type: doi
    value: 10.5281/zenodo.10652375
repository-code: 'https://github.com/astropenguin/pandas-dataclasses'
url: 'https://astropenguin.github.io/pandas-dataclasses/v1.0.0'
abstract: pandas data creation by data classes
keywords:
  - python
  - dataclasses
  - pandas
  - specifications
  - typing
license: MIT
version: 1.0.0
date-released: '2025-01-01'

GitHub Events

Total
  • Create event: 5
  • Release event: 2
  • Issues event: 5
  • Watch event: 5
  • Delete event: 2
  • Push event: 9
  • Pull request event: 6
Last Year
  • Create event: 5
  • Release event: 2
  • Issues event: 5
  • Watch event: 5
  • Delete event: 2
  • Push event: 9
  • Pull request event: 6

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 423
  • Total Committers: 1
  • Avg Commits per committer: 423.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 18
  • Committers: 1
  • Avg Commits per committer: 18.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Akio Taniguchi t****i@a****p 423

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 58
  • Total pull requests: 51
  • Average time to close issues: 9 days
  • Average time to close pull requests: about 16 hours
  • Total issue authors: 5
  • Total pull request authors: 1
  • Average comments per issue: 0.28
  • Average comments per pull request: 0.02
  • Merged pull requests: 49
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 3
  • Average time to close issues: 14 minutes
  • Average time to close pull requests: 8 minutes
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • astropenguin (54)
  • adri0 (1)
  • Kimonili (1)
  • westurner (1)
  • callumwebb (1)
Pull Request Authors
  • astropenguin (54)
Top Labels
Issue Labels
feature (36) release (13) bug (8) environment (3) docs (1)
Pull Request Labels
feature (32) release (15) bug (7) environment (5)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 412 last-month
  • Total dependent packages: 2
  • Total dependent repositories: 1
  • Total versions: 16
  • Total maintainers: 1
pypi.org: pandas-dataclasses

pandas data creation by data classes

  • Documentation: https://pandas-dataclasses.readthedocs.io/
  • License: MIT License

    Copyright (c) 2021-2025 Akio Taniguchi

    Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

    The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

  • Latest release: 1.0.0
    published about 1 year ago
  • Versions: 16
  • Dependent Packages: 2
  • Dependent Repositories: 1
  • Downloads: 412 Last month
Rankings
Dependent packages count: 4.8%
Stargazers count: 10.2%
Downloads: 13.3%
Average: 14.5%
Dependent repos count: 21.5%
Forks count: 22.6%
Maintainers (1)
Last synced: 6 months ago

Dependencies

poetry.lock pypi
  • alabaster 0.7.12 develop
  • appnope 0.1.3 develop
  • asttokens 2.0.5 develop
  • atomicwrites 1.4.0 develop
  • attrs 21.4.0 develop
  • babel 2.10.3 develop
  • backcall 0.2.0 develop
  • beautifulsoup4 4.11.1 develop
  • black 22.3.0 develop
  • certifi 2022.6.15 develop
  • charset-normalizer 2.0.12 develop
  • click 8.1.3 develop
  • colorama 0.4.5 develop
  • decorator 5.1.1 develop
  • docutils 0.18.1 develop
  • executing 0.8.3 develop
  • idna 3.3 develop
  • imagesize 1.3.0 develop
  • importlib-metadata 4.11.4 develop
  • iniconfig 1.1.1 develop
  • ipython 7.34.0 develop
  • ipython 8.4.0 develop
  • jedi 0.18.1 develop
  • jinja2 3.1.2 develop
  • markdown-it-py 2.1.0 develop
  • markupsafe 2.1.1 develop
  • matplotlib-inline 0.1.3 develop
  • mdit-py-plugins 0.3.0 develop
  • mdurl 0.1.1 develop
  • mypy-extensions 0.4.3 develop
  • myst-parser 0.18.0 develop
  • nodeenv 1.6.0 develop
  • packaging 21.3 develop
  • pandas-stubs 1.2.0.62 develop
  • parso 0.8.3 develop
  • pathspec 0.9.0 develop
  • pexpect 4.8.0 develop
  • pickleshare 0.7.5 develop
  • platformdirs 2.5.2 develop
  • pluggy 1.0.0 develop
  • prompt-toolkit 3.0.29 develop
  • ptyprocess 0.7.0 develop
  • pure-eval 0.2.2 develop
  • py 1.11.0 develop
  • pydata-sphinx-theme 0.9.0 develop
  • pygments 2.12.0 develop
  • pyparsing 3.0.9 develop
  • pyright 1.1.255 develop
  • pytest 7.1.2 develop
  • pyyaml 6.0 develop
  • requests 2.28.0 develop
  • snowballstemmer 2.2.0 develop
  • soupsieve 2.3.2.post1 develop
  • sphinx 5.0.2 develop
  • sphinxcontrib-applehelp 1.0.2 develop
  • sphinxcontrib-devhelp 1.0.2 develop
  • sphinxcontrib-htmlhelp 2.0.0 develop
  • sphinxcontrib-jsmath 1.0.1 develop
  • sphinxcontrib-qthelp 1.0.3 develop
  • sphinxcontrib-serializinghtml 1.1.5 develop
  • stack-data 0.3.0 develop
  • tomli 2.0.1 develop
  • traitlets 5.3.0 develop
  • typed-ast 1.5.4 develop
  • urllib3 1.26.9 develop
  • wcwidth 0.2.5 develop
  • zipp 3.8.0 develop
  • morecopy 0.2.4
  • numpy 1.23.0
  • numpy 1.21.6
  • pandas 1.4.2
  • pandas 1.3.5
  • python-dateutil 2.8.2
  • pytz 2022.1
  • six 1.16.0
  • typing-extensions 4.2.0
pyproject.toml pypi
  • black ^22.3 develop
  • ipython ^7.34 (python >=3.7.1, <3.8); ^8.4 (python >=3.8, <3.11) develop
  • myst-parser ^0.18 develop
  • pandas-stubs ^1.2 develop
  • pydata-sphinx-theme ^0.9 develop
  • pyright ^1.1 develop
  • pytest ^7.1 develop
  • sphinx ^5.0 develop
  • morecopy ^0.2
  • numpy >=1.20, <1.22 (python >=3.7.1, <3.8); ^1.20 (python >=3.8, <3.11)
  • pandas >=1.3, <1.4 (python >=3.7.1, <3.8); ^1.3 (python >=3.8, <3.11)
  • python >=3.7.1, <3.11
  • typing-extensions ^4.1
.github/workflows/gh-pages.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • peaceiris/actions-gh-pages v3 composite
.github/workflows/pypi.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
.github/workflows/tests.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.devcontainer/Dockerfile docker
  • python 3.11-slim build