https://github.com/NKeleher/statsframe
Customizable data and model summaries in Python.
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.0%) to scientific vocabulary
Repository
Customizable data and model summaries in Python.
Basic Info
- Host: GitHub
- Owner: NKeleher
- License: mit
- Language: Python
- Default Branch: main
- Size: 908 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 9
- Releases: 0
Metadata Files
README.md
statsframe
Customizable data and model summaries in Python.
statsframe creates tables that provide descriptive statistics of
numeric and categorical data.
The goal is to provide a simple -- yet customizable -- way to summarize data and models in Python.
statsframe is heavily inspired by modelsummary
in R. The goal is not to replicate all that modelsummary does, but to provide
a way of achieving similar results in Python.
In order to achieve this, statsframe builds on the polars
library to produce tables that can be easily customized and exported to other formats.
Basic Usage
As an example of statsframe usage, the skim_frame function provides a
summary of a DataFrame (either polars.DataFrame or pandas.DataFrame).
The default summary statistics returned by statsframe.skim_frame() are unique values,
percentage missing, mean, standard deviation, minimum, median, and maximum.
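To make the default statistics concrete, here is a minimal sketch that computes the same seven quantities for a single column using only the Python standard library. It is not statsframe's implementation: the function name `summarize_column`, the dictionary keys, the treatment of `None` as missing, and the use of the sample standard deviation are all assumptions made for illustration.

```python
import statistics


def summarize_column(values):
    """Compute seven summary statistics for one numeric column.

    Hypothetical helper, not part of statsframe. `None` marks a missing value.
    Assumes at least two non-missing values (required by statistics.stdev).
    """
    present = [v for v in values if v is not None]
    n_missing = len(values) - len(present)
    return {
        "unique": len(set(present)),  # distinct non-missing values
        "missing_pct": 100.0 * n_missing / len(values),
        "mean": statistics.fmean(present),
        "sd": statistics.stdev(present),  # sample SD (n - 1), an assumption
        "min": min(present),
        "median": statistics.median(present),
        "max": max(present),
    }


# Small illustrative column with one missing value
column = [21.0, 22.8, None, 18.7, 21.0]
summary = summarize_column(column)
print(summary["unique"], summary["missing_pct"], summary["median"])
```

Whether statsframe counts missing values among the unique values, or which standard deviation estimator it uses, is not stated here, so check the package itself before relying on either detail.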
Where possible, statsframe will print a table to the console and return a
polars DataFrame with the summary statistics. This allows for easy customization.
For example, the polars.DataFrame with statistics from statsframe can be
modified using the
Great Tables package.
```python
import polars as pl
import statsframe as sf

# Load the mtcars dataset and drop the row-name column
df = (
    pl.read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/datasets/mtcars.csv")
    .drop("rownames")
)

# Prints the summary table and returns it as a polars DataFrame
stats = sf.skim_frame(df)

# Summary Statistics
# Rows: 32, Columns: 11
# ┌──────┬────────────┬─────────────┬───────┬───────┬──────┬────────┬───────┐
# │      ┆ Unique (#) ┆ Missing (%) ┆ Mean  ┆ SD    ┆ Min  ┆ Median ┆ Max   │
# ╞══════╪════════════╪═════════════╪═══════╪═══════╪══════╪════════╪═══════╡
# │ mpg  ┆ 25         ┆ 0.0         ┆ 20.1  ┆ 6.0   ┆ 10.4 ┆ 19.2   ┆ 33.9  │
# │ cyl  ┆ 3          ┆ 0.0         ┆ 6.2   ┆ 1.8   ┆ 4.0  ┆ 6.0    ┆ 8.0   │
# │ disp ┆ 27         ┆ 0.0         ┆ 230.7 ┆ 123.9 ┆ 71.1 ┆ 196.3  ┆ 472.0 │
# │ hp   ┆ 22         ┆ 0.0         ┆ 146.7 ┆ 68.6  ┆ 52.0 ┆ 123.0  ┆ 335.0 │
# │ drat ┆ 22         ┆ 0.0         ┆ 3.6   ┆ 0.5   ┆ 2.8  ┆ 3.7    ┆ 4.9   │
# │ wt   ┆ 29         ┆ 0.0         ┆ 3.2   ┆ 1.0   ┆ 1.5  ┆ 3.3    ┆ 5.4   │
# │ qsec ┆ 30         ┆ 0.0         ┆ 17.8  ┆ 1.8   ┆ 14.5 ┆ 17.7   ┆ 22.9  │
# │ vs   ┆ 2          ┆ 0.0         ┆ 0.4   ┆ 0.5   ┆ 0.0  ┆ 0.0    ┆ 1.0   │
# │ am   ┆ 2          ┆ 0.0         ┆ 0.4   ┆ 0.5   ┆ 0.0  ┆ 0.0    ┆ 1.0   │
# │ gear ┆ 3          ┆ 0.0         ┆ 3.7   ┆ 0.7   ┆ 3.0  ┆ 4.0    ┆ 5.0   │
# │ carb ┆ 6          ┆ 0.0         ┆ 2.8   ┆ 1.6   ┆ 1.0  ┆ 2.0    ┆ 8.0   │
# └──────┴────────────┴─────────────┴───────┴───────┴──────┴────────┴───────┘
```
The same kind of summary can be produced from a pandas DataFrame.
```python
import pandas as pd
import statsframe as sf

# Load the trees dataset and drop the row-name column
trees_df = pd.read_csv(
    "https://vincentarelbundock.github.io/Rdatasets/csv/datasets/trees.csv"
).drop(columns=["rownames"])

trees_stats = sf.skim_frame(trees_df)

# Summary Statistics
# Rows: 31, Columns: 3
# ┌────────┬────────────┬─────────────┬──────┬──────┬──────┬────────┬──────┐
# │        ┆ Unique (#) ┆ Missing (%) ┆ Mean ┆ SD   ┆ Min  ┆ Median ┆ Max  │
# ╞════════╪════════════╪═════════════╪══════╪══════╪══════╪════════╪══════╡
# │ Girth  ┆ 27         ┆ 0.0         ┆ 13.2 ┆ 3.1  ┆ 8.3  ┆ 12.9   ┆ 20.6 │
# │ Height ┆ 21         ┆ 0.0         ┆ 76.0 ┆ 6.4  ┆ 63.0 ┆ 76.0   ┆ 87.0 │
# │ Volume ┆ 30         ┆ 0.0         ┆ 30.2 ┆ 16.4 ┆ 10.2 ┆ 24.2   ┆ 77.0 │
# └────────┴────────────┴─────────────┴──────┴──────┴──────┴────────┴──────┘
```
Contributing
If you encounter a bug, have usage questions, or want to share ideas to make
the statsframe package more useful, please feel free to file an
issue.
Code of Conduct
Please note that the statsframe project is released with a contributor code of conduct.
By participating in this project you agree to abide by its terms.
License
statsframe is licensed under the MIT license.
Governance
This project is primarily maintained by Niall Keleher. Contributions from other authors are welcome.
Owner
- Name: Niall Keleher
- Login: NKeleher
- Kind: user
- Location: Seattle, WA
- Company: @textioHQ
- Website: https://www.nkeleher.com/
- Repositories: 1
- Profile: https://github.com/NKeleher
GitHub Events
Total
- Push event: 38
Last Year
- Push event: 38
Dependencies
- babel 2.14.0
- colorama 0.4.6
- commonmark 0.9.1
- contourpy 1.2.0
- cycler 0.12.1
- exceptiongroup 1.2.0
- fonttools 4.47.2
- great-tables 0.1.5
- htmltools 0.5.1
- importlib-metadata 7.0.1
- importlib-resources 6.1.1
- iniconfig 2.0.0
- kiwisolver 1.4.5
- matplotlib 3.8.2
- numpy 1.26.3
- packaging 23.2
- pandas 2.1.4
- pillow 10.2.0
- pluggy 1.3.0
- polars 0.20.5
- pyparsing 3.1.1
- pytest 7.4.4
- python-dateutil 2.8.2
- pytz 2023.3.post1
- seaborn 0.13.1
- six 1.16.0
- tomli 2.0.1
- typing-extensions 4.9.0
- tzdata 2023.4
- webcolors 1.13
- zipp 3.17.0
- great-tables ^0.1.5
- pandas ^2.1.4
- polars ^0.20.5
- python ^3.9
- seaborn ^0.13.1
- pytest ^7.4.4 test
- actions/checkout v4 composite
- actions/setup-python v5 composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- actions/checkout v4 composite
- actions/download-artifact v4 composite
- actions/setup-python v5 composite
- actions/upload-artifact v4 composite
- extractions/setup-just v1 composite
- peaceiris/actions-gh-pages v3 composite
- quarto-dev/quarto-actions/setup v2 composite
- snok/install-poetry v1 composite