pecking

pecking identifies the set of lowest-ranked groups and set of highest-ranked groups in a dataset using nonparametric statistical tests

https://github.com/mmore500/pecking

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 6 DOI reference(s) in README
✓ Academic publication links: Links to: zenodo.org
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (10.5%) to scientific vocabulary

Last synced: 10 months ago

Repository


Basic Info

Host: GitHub
Owner: mmore500
License: mit
Language: Python
Default Branch: master
Homepage:
Size: 332 KB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 1
Releases: 1

Created over 2 years ago · Last pushed about 2 years ago

Metadata Files

Readme, License, Citation

README.md

:hatching_chick: **_pecking_** identifies the set of lowest-ranked groups and set of highest-ranked groups in a dataset using nonparametric statistical tests.

Free software: MIT license
Repository: https://github.com/mmore500/pecking
Documentation: https://github.com/mmore500/pecking/blob/master/README.md

Install

python3 -m pip install pecking

Example Usage

```python3
>>> import pecking
>>> samples = [[1, 2, 3, 4, 5], [2, 3, 4, 4, 4], [8, 9, 7, 6, 4]]
>>> labels = ['Group 1', 'Group 2', 'Group 3']
>>> pecking.skim_highest(samples, labels)
['Group 1']
```

```python3
import functools

from matplotlib import pyplot as plt
import pecking
import seaborn as sns

# Box plot of passenger age, faceted by survival, with the oldest and
# youngest groups in each facet marked by hatched backgrounds.
g = pecking.peckplot(
    sns.load_dataset("titanic"),
    score="age",
    x="who",
    y="age",
    hue="class",
    col="survived",
    legend_kws=dict(prop={"size": 8}, bbox_to_anchor=(0.88, 0.5)),
    skimmers=(
        functools.partial(
            pecking.skim_highest, alpha=0.05, min_obs=8, nan_policy="omit"
        ),
        functools.partial(
            pecking.skim_lowest, alpha=0.05, min_obs=8, nan_policy="omit"
        ),
    ),
    skim_labels=["Oldest", "Youngest"],
    palette=sns.color_palette("tab10")[:3],
)
assert g is not None

# Overlay the individual observations on top of the box plots.
g.map_dataframe(
    sns.stripplot,
    x="who",
    y="age",
    hue="class",
    s=2,
    color="black",
    dodge=True,
    jitter=0.3,
)

plt.show()
```

Example Plot

API

See function docstrings for full parameter and return value descriptions.

`pecking.skim_lowest`/`pecking.skim_highest`

Direct interface to the underlying statistical tests.

```python3
def skim_highest(
    samples: typing.Sequence[typing.Sequence[float]],
    labels: typing.Optional[typing.Sequence[typing.Union[str, int]]] = None,
    alpha: float = 0.05,
) -> typing.List[typing.Union[str, int]]:
    """Identify the set of highest-ranked groups that are statistically
    indistinguishable amongst themselves based on a Kruskal-Wallis H-test
    followed by multiple Mann-Whitney U-tests."""
```

```python3
def skim_lowest(
    samples: typing.Sequence[typing.Sequence[float]],
    labels: typing.Optional[typing.Sequence[typing.Union[str, int]]] = None,
    alpha: float = 0.05,
) -> typing.List[typing.Union[str, int]]:
    """Identify the set of lowest-ranked groups that are statistically
    indistinguishable amongst themselves based on a Kruskal-Wallis H-test
    followed by multiple Mann-Whitney U-tests."""
```
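To make the two named tests concrete, here is a minimal pure-Python sketch of the statistics they rest on: the Kruskal-Wallis H statistic (shown without tie correction) and the Mann-Whitney U statistic. It is illustrative only and is not pecking's implementation. The helper names `average_ranks`, `kruskal_wallis_h`, and `mann_whitney_u` are hypothetical, and p-values, multiple-comparison handling, and the group-selection logic are all left out.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks of `values`; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # mean of 1-based positions pos+1..end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def kruskal_wallis_h(samples):
    """Kruskal-Wallis H statistic for several groups (no tie correction)."""
    pooled = list(chain.from_iterable(samples))
    ranks = average_ranks(pooled)
    n = len(pooled)
    weighted, start = 0.0, 0
    for sample in samples:
        rank_sum = sum(ranks[start:start + len(sample)])
        weighted += rank_sum ** 2 / len(sample)
        start += len(sample)
    return 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)


def mann_whitney_u(a, b):
    """Mann-Whitney U for `a` versus `b`: pairs where a > b, ties count 0.5."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    return rank_sum_a - len(a) * (len(a) + 1) / 2


# Sample data from the Example Usage section.
samples = [[1, 2, 3, 4, 5], [2, 3, 4, 4, 4], [8, 9, 7, 6, 4]]
h_statistic = kruskal_wallis_h(samples)
u_group3_vs_group1 = mann_whitney_u(samples[2], samples[0])
```

A larger H indicates that at least one group's ranks differ from the rest, which is what motivates the follow-up pairwise U comparisons. In practice, SciPy (listed among this project's dependencies) provides `scipy.stats.kruskal` and `scipy.stats.mannwhitneyu`, which also return p-values.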

`pecking.mask_skimmed_rows`

Tidy-data interface to calculate the results of skim_lowest/skim_highest among row groups in a DataFrame.

```python3
def mask_skimmed_rows(
    data: pd.DataFrame,
    score: str,
    groupby_inner: typing.Union[typing.Sequence[str], str],
    groupby_outer: typing.Union[typing.Sequence[str], str] = tuple(),
    skimmer: typing.Callable = skim_highest,
    **kwargs: dict,
) -> pd.Series:
    """Create a boolean mask for a DataFrame, identifying rows within
    significantly outstanding groups.

    This function applies a two-level grouping to the input DataFrame: an outer
    grouping ('groupby_outer') followed by an inner grouping ('groupby_inner').
    For each inner group, it uses a 'skimmer' function to determine which rows
    are part of significantly outstanding groups based on a specified 'score'
    column. Only inner groups within the same outer group are compared.

    Rows identified as members of significantly outstanding inner groups are
    marked True in the returned Series, while all others are marked False."""
```
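The two-level grouping described in the docstring can be sketched without pandas. The example below is an illustration of the idea only, not the library's code: `mask_rows` and `toy_skimmer` are hypothetical names, rows are plain dictionaries instead of a DataFrame, and the stand-in skimmer simply picks the group with the largest median instead of running statistical tests.

```python
from collections import defaultdict
from statistics import median


def toy_skimmer(samples, labels):
    """Hypothetical stand-in skimmer: return the label(s) with the top median.

    A real skimmer would apply statistical tests instead.
    """
    medians = [median(sample) for sample in samples]
    best = max(medians)
    return [label for label, m in zip(labels, medians) if m == best]


def mask_rows(rows, score, inner, outer, skimmer=toy_skimmer):
    """Return one boolean per row: True if the row's inner group was skimmed."""
    # outer key -> inner key -> indices of the rows in that inner group
    nested = defaultdict(lambda: defaultdict(list))
    for idx, row in enumerate(rows):
        nested[row[outer]][row[inner]].append(idx)

    mask = [False] * len(rows)
    for inner_groups in nested.values():
        labels = list(inner_groups)
        samples = [[rows[i][score] for i in inner_groups[lab]] for lab in labels]
        # Inner groups are compared only with others in the same outer group.
        for lab in skimmer(samples, labels):
            for i in inner_groups[lab]:
                mask[i] = True
    return mask


rows = [
    {"site": "A", "treatment": "x", "yield": 1.0},
    {"site": "A", "treatment": "x", "yield": 2.0},
    {"site": "A", "treatment": "y", "yield": 5.0},
    {"site": "A", "treatment": "y", "yield": 6.0},
    {"site": "B", "treatment": "x", "yield": 9.0},
    {"site": "B", "treatment": "y", "yield": 3.0},
]
mask = mask_rows(rows, score="yield", inner="treatment", outer="site")
```

Here treatment "y" is marked at site A and treatment "x" at site B, because each site is evaluated separately.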

`pecking.peckplot`

Wraps seaborn.catplot to add hatched backgrounds behind the best and worst groups within each row/col facet. (Comparison scope/pooling can be controlled with *_group parameters.)

```python3
def peckplot(
    data: pd.DataFrame,
    score: str,
    x: typing.Optional[str] = None,
    y: typing.Optional[str] = None,
    hue: typing.Optional[str] = None,
    col: typing.Optional[str] = None,
    row: typing.Optional[str] = None,
    x_group: typing.Literal["inner", "outer", "ignore"] = "inner",
    y_group: typing.Literal["inner", "outer", "ignore"] = "inner",
    hue_group: typing.Literal["inner", "outer", "ignore"] = "inner",
    col_group: typing.Literal["inner", "outer", "ignore"] = "outer",
    row_group: typing.Literal["inner", "outer", "ignore"] = "outer",
    skimmers: typing.Sequence[typing.Callable] = (
        functools.partial(skim_highest, alpha=0.05),
        functools.partial(skim_lowest, alpha=0.05),
    ),
    skim_hatches: typing.Sequence[str] = ("*", "O.", "xx", "++"),
    skim_labels: typing.Sequence[str] = ("Best", "Worst"),
    skim_title: typing.Optional[str] = "Rank",
    orient: typing.Literal["v", "h"] = "v",
    **kwargs: dict,
) -> sns.FacetGrid:
    """Boxplot the distribution of a score across various categories,
    highlighting the best (and/or worst) performing groups.

    Uses nonparametric `skim_highest`/`skim_lowest` to distinguish the sets of
    groups with statistically indistinguishable highest/lowest scores. Uses
    `backstrip`'s `backplot` to add hatched backgrounds behind the best and
    worst groups."""
```
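One plausible reading of the `*_group` parameters is that each plotted column is assigned to the inner grouping, the outer grouping, or neither. The sketch below illustrates that reading only. `split_grouping` is a hypothetical helper, not part of pecking, and both the exclusion of the score column and the treatment of "ignore" as pooling are assumptions.

```python
def split_grouping(columns, modes, score):
    """Sort plotted columns into inner and outer grouping keys.

    `columns` maps a plot dimension (e.g. "x", "hue") to a column name or None.
    `modes` maps the same dimensions to "inner", "outer", or "ignore".
    """
    inner, outer = [], []
    for dim, column in columns.items():
        if column is None or column == score:
            continue  # unused dimensions and the score itself never group rows
        if modes[dim] == "ignore":
            continue  # assumed meaning: pool observations across this column
        target = inner if modes[dim] == "inner" else outer
        if column not in inner and column not in outer:
            target.append(column)  # a column reused across dimensions counts once
    return inner, outer


# Column assignments from the Example Usage plot, with the default modes
# shown in the signature above.
columns = {"x": "who", "y": "age", "hue": "class", "col": "survived", "row": None}
modes = {"x": "inner", "y": "inner", "hue": "inner", "col": "outer", "row": "outer"}
inner_keys, outer_keys = split_grouping(columns, modes, score="age")
```

Under these assumptions, the who/class combinations are compared with one another separately within each `survived` facet.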

Citing

If pecking contributes to a scientific publication, please cite it as

Matthew Andres Moreno. (2024). mmore500/pecking. Zenodo. https://doi.org/10.5281/zenodo.10701185

```bibtex
@software{moreno2024pecking,
  author = {Matthew Andres Moreno},
  title = {mmore500/pecking},
  month = feb,
  year = 2024,
  publisher = {Zenodo},
  doi = {10.5281/zenodo.10701185},
  url = {https://doi.org/10.5281/zenodo.10701185}
}
```

Consider also citing matplotlib, seaborn, and SciPy. And don't forget to leave a star on GitHub!

Owner

Name: Matthew Andres Moreno
Login: mmore500
Kind: user
Location: East Lansing, MI
Company: @devosoft

Website: mmore500.github.io
Twitter: MorenoMathewA
Repositories: 43
Profile: https://github.com/mmore500

doctoral student, Computer Science and Engineering at Michigan State University

Citation (CITATION.cff)

cff-version: 1.1.0
message: "If you use this software, please cite it as below."
title: 'pecking: a Python library for nonparametric comparison between groups'
abstract: "pecking identifies the set of lowest-ranked groups and set of highest-ranked groups in a dataset using nonparametric statistical tests."
authors:
- family-names: Moreno
  given-names: Matthew Andres
  orcid: 0000-0003-4726-4479
date-released: 2024-02-24
doi: 10.5281/zenodo.10701185
license: MIT
repository-code: https://github.com/mmore500/pecking
url: "https://github.com/mmore500/pecking"

Committers

Last synced: about 1 year ago

All Time

Total Commits: 36
Total Committers: 1
Avg Commits per committer: 36.0
Development Distribution Score (DDS): 0.0
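The figures above are consistent with a commonly used definition of the Development Distribution Score: one minus the top committer's share of all commits. This page does not state the formula, so the sketch below is an assumption, and the function names are hypothetical.

```python
def development_distribution_score(commits_per_committer):
    """Return 1 - (top committer's share of commits); 0.0 if there are none."""
    total = sum(commits_per_committer)
    if total == 0:
        return 0.0
    return 1 - max(commits_per_committer) / total


def average_commits(commits_per_committer):
    """Mean commits per committer; 0.0 when nobody committed."""
    if not commits_per_committer:
        return 0.0
    return sum(commits_per_committer) / len(commits_per_committer)


all_time = [36]  # one committer with 36 commits, matching the All Time figures
past_year = []   # no commits in the past year

dds_all_time = development_distribution_score(all_time)
avg_all_time = average_commits(all_time)
```

With a single committer the score is 0.0. It rises toward 1 as commits are spread across more people.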

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Matthew Andres Moreno	m**g@g**m	36

Issues and Pull Requests

Last synced: about 1 year ago

All Time

Total issues: 1
Total pull requests: 3
Average time to close issues: N/A
Average time to close pull requests: about 6 hours
Total issue authors: 1
Total pull request authors: 1
Average comments per issue: 0.0
Average comments per pull request: 0.0
Merged pull requests: 3
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

mmore500 (1)

Pull Request Authors

mmore500 (6)

Dependencies

.github/workflows/ci.yaml actions

actions/checkout v2 composite
actions/setup-python v2 composite
pypa/gh-action-pypi-publish release/v1 composite

pyproject.toml pypi

numpy *
scipy *

requirements.txt pypi

attrs ==23.2.0
black ==22.10.0
build ==1.0.3
bump2version ==1.0.1
click ==8.1.7
importlib-metadata ==7.0.1
iniconfig ==2.0.0
isort ==5.12.0
mypy-extensions ==1.0.0
numpy ==1.24.4
packaging ==23.2
pathspec ==0.12.1
pip-tools ==7.3.0
platformdirs ==4.2.0
pluggy ==1.4.0
py ==1.11.0
pyproject-hooks ==1.0.0
pytest ==6.2.5
ruff ==0.1.11
scipy ==1.10.1
toml ==0.10.2
tomli ==2.0.1
typing-extensions ==4.9.0
wheel ==0.42.0
zipp ==3.17.0

setup.py pypi
