pecking

pecking identifies the set of lowest-ranked groups and set of highest-ranked groups in a dataset using nonparametric statistical tests

https://github.com/mmore500/pecking

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.5%) to scientific vocabulary
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: mmore500
  • License: mit
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 332 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 1
  • Releases: 1
Created about 2 years ago · Last pushed almost 2 years ago
Metadata Files
Readme License Citation

README.md


:hatching_chick: **_pecking_** identifies the set of lowest-ranked groups and set of highest-ranked groups in a dataset using nonparametric statistical tests.

Install

python3 -m pip install pecking

Example Usage

```python3
>>> import pecking
>>> samples = [[1, 2, 3, 4, 5], [2, 3, 4, 4, 4], [8, 9, 7, 6, 4]]
>>> labels = ['Group 1', 'Group 2', 'Group 3']
>>> pecking.skim_highest(samples, labels)
['Group 1']
```


```python3
import functools

from matplotlib import pyplot as plt
import pecking
import seaborn as sns

# Box plot of passenger age, hatching the oldest and youngest groups
# within each "survived" facet.
g = pecking.peckplot(
    sns.load_dataset("titanic"),
    score="age",
    x="who",
    y="age",
    hue="class",
    col="survived",
    legend_kws=dict(prop={"size": 8}, bbox_to_anchor=(0.88, 0.5)),
    skimmers=(
        functools.partial(
            pecking.skim_highest, alpha=0.05, min_obs=8, nan_policy="omit"
        ),
        functools.partial(
            pecking.skim_lowest, alpha=0.05, min_obs=8, nan_policy="omit"
        ),
    ),
    skim_labels=["Oldest", "Youngest"],
    palette=sns.color_palette("tab10")[:3],
)
assert g is not None
# Overlay the individual observations on top of the boxes.
g.map_dataframe(
    sns.stripplot,
    x="who",
    y="age",
    hue="class",
    s=2,
    color="black",
    dodge=True,
    jitter=0.3,
)

plt.show()
```

Example Plot

API

See function docstrings for full parameter and return value descriptions.

pecking.skim_lowest/pecking.skim_highest

Direct interface to the underlying statistical tests.

```python3
def skim_highest(
    samples: typing.Sequence[typing.Sequence[float]],
    labels: typing.Optional[typing.Sequence[typing.Union[str, int]]] = None,
    alpha: float = 0.05,
) -> typing.List[typing.Union[str, int]]:
    """Identify the set of highest-ranked groups that are statistically
    indistinguishable amongst themselves based on a Kruskal-Wallis H-test
    followed by multiple Mann-Whitney U-tests."""
```

```python3
def skim_lowest(
    samples: typing.Sequence[typing.Sequence[float]],
    labels: typing.Optional[typing.Sequence[typing.Union[str, int]]] = None,
    alpha: float = 0.05,
) -> typing.List[typing.Union[str, int]]:
    """Identify the set of lowest-ranked groups that are statistically
    indistinguishable amongst themselves based on a Kruskal-Wallis H-test
    followed by multiple Mann-Whitney U-tests."""
```
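The docstrings above describe a two-stage procedure: an omnibus Kruskal-Wallis H-test, followed by pairwise Mann-Whitney U-tests. The following is a minimal sketch of that idea using SciPy, not pecking's actual implementation. The name `sketch_skim_highest`, the choice of reference group by mean pooled rank, the Bonferroni correction, and the empty-list return when the omnibus test is not significant are all assumptions made for illustration.

```python
import numpy as np
from scipy import stats


def sketch_skim_highest(samples, labels=None, alpha=0.05):
    """Illustrative only: groups indistinguishable from the top-ranked group."""
    labels = list(range(len(samples))) if labels is None else list(labels)

    # Stage 1: omnibus test. If no difference among groups is detected,
    # this sketch reports no outstanding groups (an assumption).
    _, p_omnibus = stats.kruskal(*samples)
    if p_omnibus >= alpha:
        return []

    # Pick a reference group: the one with the highest mean rank when all
    # observations are pooled (an assumption).
    pooled_ranks = stats.rankdata(np.concatenate(samples))
    bounds = np.cumsum([len(s) for s in samples])[:-1]
    mean_ranks = [chunk.mean() for chunk in np.split(pooled_ranks, bounds)]
    top = int(np.argmax(mean_ranks))

    # Stage 2: compare every other group against the reference group,
    # with a Bonferroni-corrected threshold (an assumption).
    corrected_alpha = alpha / max(len(samples) - 1, 1)
    result = []
    for i, sample in enumerate(samples):
        if i == top:
            result.append(labels[i])
            continue
        _, p = stats.mannwhitneyu(
            samples[top], sample, alternative="two-sided"
        )
        if p >= corrected_alpha:  # not distinguishable from the top group
            result.append(labels[i])
    return result


groups = [
    list(range(0, 20)),     # low
    list(range(100, 120)),  # mid
    list(range(200, 220)),  # high_a
    list(range(201, 221)),  # high_b, overlaps heavily with high_a
]
print(sketch_skim_highest(groups, ["low", "mid", "high_a", "high_b"]))
```

A lowest-ranked variant would follow the same pattern with `np.argmin` selecting the reference group.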

pecking.mask_skimmed_rows

Tidy-data interface to calculate the results of skim_lowest/skim_highest among row groups in a DataFrame.

```python3
def mask_skimmed_rows(
    data: pd.DataFrame,
    score: str,
    groupby_inner: typing.Union[typing.Sequence[str], str],
    groupby_outer: typing.Union[typing.Sequence[str], str] = tuple(),
    skimmer: typing.Callable = skim_highest,
    **kwargs: dict,
) -> pd.Series:
    """Create a boolean mask for a DataFrame, identifying rows within
    significantly outstanding groups.

    This function applies a two-level grouping to the input DataFrame: an outer
    grouping ('groupby_outer') followed by an inner grouping ('groupby_inner').
    For each inner group, it uses a 'skimmer' function to determine which rows
    are part of significantly outstanding groups based on a specified 'score'
    column. Only inner groups within the same outer group are compared.

    Rows identified as members of significantly outstanding inner groups are
    marked True in the returned Series, while all others are marked False."""
```
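The two-level grouping described in the docstring can be sketched with plain pandas. This is an illustration under stated assumptions, not the library's code: `sketch_mask_skimmed_rows` and `toy_skimmer` are hypothetical names, the toy skimmer simply picks the group with the largest median instead of running statistical tests, and only single column names are handled for each grouping level.

```python
import pandas as pd


def toy_skimmer(samples, labels):
    """Hypothetical stand-in skimmer: label of the group with the top median."""
    medians = [pd.Series(s).median() for s in samples]
    return [labels[medians.index(max(medians))]]


def sketch_mask_skimmed_rows(
    data, score, groupby_inner, groupby_outer=None, skimmer=toy_skimmer
):
    """Illustrative only: mark rows belonging to skimmed inner groups."""
    mask = pd.Series(False, index=data.index)
    # Without an outer grouping, the whole frame is one comparison scope.
    if groupby_outer is None:
        outer_groups = [(None, data)]
    else:
        outer_groups = data.groupby(groupby_outer)

    for _, outer_df in outer_groups:
        # Inner groups are only compared within the same outer group.
        inner = {key: grp for key, grp in outer_df.groupby(groupby_inner)}
        labels = list(inner)
        samples = [inner[key][score].tolist() for key in labels]
        for key in skimmer(samples, labels):
            mask.loc[inner[key].index] = True
    return mask


df = pd.DataFrame(
    {
        "trial": ["a"] * 4 + ["b"] * 4,
        "arm": ["x", "x", "y", "y", "x", "x", "y", "y"],
        "score": [1, 2, 5, 6, 9, 8, 3, 2],
    }
)
mask = sketch_mask_skimmed_rows(
    df, score="score", groupby_inner="arm", groupby_outer="trial"
)
print(df[mask])
```

Because the mask shares the DataFrame's index, it can be used directly for boolean indexing or assigned as a new column.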

pecking.peckplot

Wraps seaborn.catplot to add hatched backgrounds behind the best and worst groups within each row/col facet. (Comparison scope/pooling can be controlled with *_group parameters.)

```python3
def peckplot(
    data: pd.DataFrame,
    score: str,
    x: typing.Optional[str] = None,
    y: typing.Optional[str] = None,
    hue: typing.Optional[str] = None,
    col: typing.Optional[str] = None,
    row: typing.Optional[str] = None,
    x_group: typing.Literal["inner", "outer", "ignore"] = "inner",
    y_group: typing.Literal["inner", "outer", "ignore"] = "inner",
    hue_group: typing.Literal["inner", "outer", "ignore"] = "inner",
    col_group: typing.Literal["inner", "outer", "ignore"] = "outer",
    row_group: typing.Literal["inner", "outer", "ignore"] = "outer",
    skimmers: typing.Sequence[typing.Callable] = (
        functools.partial(skim_highest, alpha=0.05),
        functools.partial(skim_lowest, alpha=0.05),
    ),
    skim_hatches: typing.Sequence[str] = ("*", "O.", "xx", "++"),
    skim_labels: typing.Sequence[str] = ("Best", "Worst"),
    skim_title: typing.Optional[str] = "Rank",
    orient: typing.Literal["v", "h"] = "v",
    **kwargs: dict,
) -> sns.FacetGrid:
    """Boxplot the distribution of a score across various categories,
    highlighting the best (and/or worst) performing groups.

    Uses nonparametric `skim_highest`/`skim_lowest` to distinguish the sets of
    groups with statistically indistinguishable highest/lowest scores. Uses
    `backstrip`'s `backplot` to add hatched backgrounds behind the best and
    worst groups."""
```
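One plausible reading of the `*_group` parameters is that they sort the plotted columns into an inner grouping (groups compared against each other) and an outer grouping (separate comparison scopes), with "ignore" pooling across a column. The sketch below illustrates that reading only; `partition_group_columns` is a hypothetical helper, and skipping the score column is an assumption rather than documented behavior.

```python
def partition_group_columns(columns, modes, score):
    """Illustrative only: split plot columns into inner/outer groupings.

    columns maps a plot role ("x", "hue", ...) to a column name or None;
    modes maps the same roles to "inner", "outer", or "ignore".
    """
    inner, outer = [], []
    for role, column in columns.items():
        # Unused roles and the score column itself do not define groups
        # (the latter is an assumption of this sketch).
        if column is None or column == score:
            continue
        mode = modes[role]
        if mode == "inner":
            inner.append(column)
        elif mode == "outer":
            outer.append(column)
        elif mode != "ignore":
            raise ValueError(f"unknown group mode {mode!r} for {role!r}")
    return inner, outer


# Defaults taken from the signature above.
default_modes = {
    "x": "inner",
    "y": "inner",
    "hue": "inner",
    "col": "outer",
    "row": "outer",
}
# Column assignments from the Titanic example in this README.
titanic_columns = {
    "x": "who",
    "y": "age",
    "hue": "class",
    "col": "survived",
    "row": None,
}
inner, outer = partition_group_columns(
    titanic_columns, default_modes, score="age"
)
print(inner, outer)
```

Under this reading, the Titanic example would compare `who`/`class` combinations against each other separately within each `survived` facet.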

Citing

If pecking contributes to a scientific publication, please cite it as

Matthew Andres Moreno. (2024). mmore500/pecking. Zenodo. https://doi.org/10.5281/zenodo.10701185

```bibtex
@software{moreno2024pecking,
  author = {Matthew Andres Moreno},
  title = {mmore500/pecking},
  month = feb,
  year = 2024,
  publisher = {Zenodo},
  doi = {10.5281/zenodo.10701185},
  url = {https://doi.org/10.5281/zenodo.10701185}
}
```

Consider also citing matplotlib, seaborn, and SciPy. And don't forget to leave a star on GitHub!

Owner

  • Name: Matthew Andres Moreno
  • Login: mmore500
  • Kind: user
  • Location: East Lansing, MI
  • Company: @devosoft

doctoral student, Computer Science and Engineering at Michigan State University

Citation (CITATION.cff)

cff-version: 1.1.0
message: "If you use this software, please cite it as below."
title: 'pecking: a Python library for nonparametric comparison between groups'
abstract: "pecking identifies the set of lowest-ranked groups and set of highest-ranked groups in a dataset using nonparametric statistical tests."
authors:
- family-names: Moreno
  given-names: Matthew Andres
  orcid: 0000-0003-4726-4479
date-released: 2024-02-24
doi: 10.5281/zenodo.10701185
license: MIT
repository-code: https://github.com/mmore500/pecking
url: "https://github.com/mmore500/pecking"

GitHub Events

Total
Last Year

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 36
  • Total Committers: 1
  • Avg Commits per committer: 36.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Matthew Andres Moreno m****g@g****m 36

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 1
  • Total pull requests: 3
  • Average time to close issues: N/A
  • Average time to close pull requests: about 6 hours
  • Total issue authors: 1
  • Total pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • mmore500 (1)
Pull Request Authors
  • mmore500 (6)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

.github/workflows/ci.yaml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • pypa/gh-action-pypi-publish release/v1 composite
pyproject.toml pypi
  • numpy *
  • scipy *
requirements.txt pypi
  • attrs ==23.2.0
  • black ==22.10.0
  • build ==1.0.3
  • bump2version ==1.0.1
  • click ==8.1.7
  • importlib-metadata ==7.0.1
  • iniconfig ==2.0.0
  • isort ==5.12.0
  • mypy-extensions ==1.0.0
  • numpy ==1.24.4
  • packaging ==23.2
  • pathspec ==0.12.1
  • pip-tools ==7.3.0
  • platformdirs ==4.2.0
  • pluggy ==1.4.0
  • py ==1.11.0
  • pyproject-hooks ==1.0.0
  • pytest ==6.2.5
  • ruff ==0.1.11
  • scipy ==1.10.1
  • toml ==0.10.2
  • tomli ==2.0.1
  • typing-extensions ==4.9.0
  • wheel ==0.42.0
  • zipp ==3.17.0
setup.py pypi