https://github.com/tomkyle/binning

Determine optimal number of bins 𝒌 for histogram creation and optimal bin width 𝒉 using various statistical methods.

Science Score: 39.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • ○ CITATION.cff file
  • ✓ codemeta.json file
    Found codemeta.json file
  • ✓ .zenodo.json file
    Found .zenodo.json file
  • ✓ DOI references
    Found 3 DOI reference(s) in README
  • ○ Academic publication links
  • ○ Academic email domains
  • ○ Institutional organization owner
  • ○ JOSS paper metadata
  • ○ Scientific vocabulary similarity
    Low similarity (10.3%) to scientific vocabulary

Keywords

binning data-analysis distributions doanes-rule freedman-diaconis histogram histogram-binning math php-math rice-rule scotts-rule square-root statistics sturges-rule terrell-scotts-rule
Last synced: 5 months ago

Repository

Determine optimal number of bins 𝒌 for histogram creation and optimal bin width 𝒉 using various statistical methods.

Basic Info
  • Host: GitHub
  • Owner: tomkyle
  • License: mit
  • Language: PHP
  • Default Branch: main
  • Homepage:
  • Size: 66.4 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
binning data-analysis distributions doanes-rule freedman-diaconis histogram histogram-binning math php-math rice-rule scotts-rule square-root statistics sturges-rule terrell-scotts-rule
Created 8 months ago · Last pushed 8 months ago
Metadata Files
Readme License

README.md

tomkyle/binning

Determine the optimal number of bins 𝒌 for histogram creation and the optimal bin width 𝒉 using various statistical methods. The unified interface includes implementations of well-known binning rules such as:

  • Square Root Rule (1892)
  • Sturges’ Rule (1926)
  • Doane’s Rule (1976)
  • Scott’s Rule (1979)
  • Freedman-Diaconis Rule (1981)
  • Terrell-Scott’s Rule (1985)
  • Rice University Rule

Requirements

This library requires PHP 8.3 or newer. Support for older PHP versions, such as the PHP 7.2+ support that markrogoyski/math-php offers, is not planned.

Installation

```bash
composer require tomkyle/binning
```

Usage

The BinSelection class provides several methods for determining the optimal number of bins for histogram creation and optimal bin width. You can either use specific methods directly or the general suggestBins() and suggestBinWidth() methods with different strategies.

Determine Bin Width

Use the suggestBinWidth method to get the optimal bin width based on the selected method. The method returns the bin width, often referred to as 𝒉, as a float value.

```php
<?php
use tomkyle\Binning\BinSelection;

$data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15];

// Default method: Freedman-Diaconis Rule (1981)
$h = BinSelection::suggestBinWidth($data);
$h = BinSelection::suggestBinWidth($data, BinSelection::DEFAULT);

// Explicitly set method
$h = BinSelection::suggestBinWidth($data, BinSelection::FREEDMAN_DIACONIS);
$h = BinSelection::suggestBinWidth($data, BinSelection::SCOTT);
```

Determine Number of Bins

Use the suggestBins method to get the optimal number of bins based on the selected method. The method returns the number of bins, often referred to as 𝒌, as an integer value.

```php
<?php
use tomkyle\Binning\BinSelection;

$data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15];

// Defaults to Freedman-Diaconis Rule
$k = BinSelection::suggestBins($data);
$k = BinSelection::suggestBins($data, BinSelection::DEFAULT);

// Square Root Rule (Pearson, 1892)
$k = BinSelection::suggestBins($data, BinSelection::SQUARE_ROOT);
$k = BinSelection::suggestBins($data, BinSelection::PEARSON);

// Sturges' Rule (1926)
$k = BinSelection::suggestBins($data, BinSelection::STURGES);

// Doane's Rule (1976) in 2 variants for samples (default) or populations
$k = BinSelection::suggestBins($data, BinSelection::DOANE);
$k = BinSelection::suggestBins($data, BinSelection::DOANE, population: true);

// Scott's Rule (1979)
$k = BinSelection::suggestBins($data, BinSelection::SCOTT);

// Freedman-Diaconis Rule (1981)
$k = BinSelection::suggestBins($data, BinSelection::FREEDMAN_DIACONIS);

// Terrell-Scott’s Rule (1985)
$k = BinSelection::suggestBins($data, BinSelection::TERRELL_SCOTT);

// Rice University Rule
$k = BinSelection::suggestBins($data, BinSelection::RICE);
```
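Once 𝒌 is known, the data can be counted into equal-width bins. The following is a minimal sketch of that step in plain PHP. `histogramCounts()` is a hypothetical helper and not part of this library, and `$k = 3` is hard-coded here to stand in for a value returned by `suggestBins()`.

```php
<?php
declare(strict_types=1);

/**
 * Count values per bin for $k equal-width bins (hypothetical helper).
 *
 * @param list<int|float> $data
 * @return list<int>
 */
function histogramCounts(array $data, int $k): array
{
    if ($data === [] || $k < 1) {
        throw new InvalidArgumentException('Need data and at least one bin.');
    }

    $min    = min($data);
    $max    = max($data);
    $width  = ($max - $min) / $k;
    $counts = array_fill(0, $k, 0);

    foreach ($data as $x) {
        // The maximum sits on the right edge, so clamp it into the last bin.
        // If all values are identical, everything goes into the first bin.
        $i = $width > 0 ? min($k - 1, (int) floor(($x - $min) / $width)) : 0;
        $counts[$i]++;
    }

    return $counts;
}

$k = 3; // assumed value, standing in for BinSelection::suggestBins($data)
print_r(histogramCounts(range(1, 15), $k));
```

For the values 1 to 15 and three bins, each bin receives five values.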


Explicit method calls

You can also call the specific methods directly to get the bin width 𝒉 or number of bins 𝒌.

  • Most of the methods return the number of bins 𝒌 as an integer value.
  • Two methods, Scott’s Rule and Freedman-Diaconis Rule, provide both 𝒌 and 𝒉 as an array.

The result array contains additional information like the data range 𝑹, the inter-quartile range IQR, or the standard deviation stddev, which can be useful for further analysis.


1. Pearson’s Square Root Rule (1892)

Simple rule using the square root of the sample size.

$$ k = \left \lceil \sqrt{n} \ \right \rceil $$

```php
$k = BinSelection::squareRoot($data);
```
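As a worked example, take the 15-element `$data` array from the usage section, so that n = 15. This is plain arithmetic on the formula, not output captured from the library:

$$ k = \left \lceil \sqrt{15} \ \right \rceil = \left \lceil 3.873 \right \rceil = 4 $$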


2. Sturges’ Rule (1926)

Based on the logarithm of the sample size. Good for normal distributions.

$$ k = 1 + \left \lceil \ \log_2(n) \ \right \rceil $$

```php
$k = BinSelection::sturges($data);
```
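Worked by hand for a sample of n = 15 values (plain arithmetic on the formula, not library output):

$$ k = 1 + \left \lceil \ \log_2(15) \ \right \rceil = 1 + \left \lceil 3.907 \right \rceil = 5 $$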


3. Doane’s Rule (1976)

Improvement of Sturges’ rule that accounts for data skewness.

$$ k = 1 + \left\lceil \ \log_2(n) + \log_2\left(1 + \frac{|g_1|}{\sigma_{g_1}}\right) \ \right \rceil $$

```php
// Using sample-based calculation (default)
$k = BinSelection::doane($data);

// Using population-based calculation
$k = BinSelection::doane($data, population: true);
```
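The formula can be sketched in plain PHP to show where the skewness correction enters. This is an illustration under stated assumptions, not the library's implementation: `doaneBins()` is a hypothetical helper, g₁ is taken as the moment coefficient of skewness m₃ / m₂^(3/2), and σ_g₁ is taken as √(6(n − 2) / ((n + 1)(n + 3))). Which estimators the library uses for its sample and population variants is not stated in this README.

```php
<?php
declare(strict_types=1);

/**
 * Sketch of Doane's Rule (hypothetical helper, assumptions as described above).
 *
 * @param list<int|float> $data
 */
function doaneBins(array $data): int
{
    $n = count($data);
    if ($n < 3) {
        throw new InvalidArgumentException('At least three values are required.');
    }

    $mean = array_sum($data) / $n;

    // Second and third central moments.
    $m2 = 0.0;
    $m3 = 0.0;
    foreach ($data as $x) {
        $d = $x - $mean;
        $m2 += $d ** 2;
        $m3 += $d ** 3;
    }
    $m2 /= $n;
    $m3 /= $n;

    // Constant data has no skewness; treat g1 as zero in that case.
    $g1 = $m2 > 0.0 ? $m3 / ($m2 ** 1.5) : 0.0;

    // Assumed standard error of the skewness for a sample of size n.
    $sigmaG1 = sqrt(6 * ($n - 2) / (($n + 1) * ($n + 3)));

    return 1 + (int) ceil(log($n, 2) + log(1 + abs($g1) / $sigmaG1, 2));
}

$symmetric = range(1, 15);
$skewed    = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 6, 8, 12, 20];

echo doaneBins($symmetric), "\n"; // g1 = 0, so this equals the Sturges result for n = 15
echo doaneBins($skewed), "\n";    // positive skewness adds bins
```

For symmetric data the correction term vanishes and the result matches Sturges’ Rule; the right-skewed sample yields more bins.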


4. Scott’s Rule (1979)

Based on the standard deviation and sample size. Good for continuous data.

$$ h = \frac{3.49\,\hat{\sigma}}{\sqrt[3]{n}} $$

$$ R = \max_i x_i - \min_i x_i $$

$$ k = \left \lceil \ \frac{R}{h} \ \right \rceil $$

The result is an array with keys width, bins, range, and stddev. Map them to variables like so:

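```php
// The result is keyed by name, so destructure by key rather than by position
['width' => $h, 'bins' => $k, 'range' => $R, 'stddev' => $stddev] = BinSelection::scott($data);
```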
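The three formulas above can be traced step by step in plain PHP. This is a minimal sketch, not the library's code: `scottSketch()` is a hypothetical helper, and it assumes the sample standard deviation (n − 1 in the denominator), which the README does not specify.

```php
<?php
declare(strict_types=1);

/**
 * Sketch of Scott's Rule (hypothetical helper, sample standard deviation assumed).
 *
 * @param list<int|float> $data
 * @return array{width: float, bins: int, range: float, stddev: float}
 */
function scottSketch(array $data): array
{
    $n = count($data);
    if ($n < 2) {
        throw new InvalidArgumentException('At least two values are required.');
    }

    $mean  = array_sum($data) / $n;
    $sumSq = 0.0;
    foreach ($data as $x) {
        $sumSq += ($x - $mean) ** 2;
    }
    $stddev = sqrt($sumSq / ($n - 1));

    $h = 3.49 * $stddev / ($n ** (1 / 3)); // bin width
    $R = (float) (max($data) - min($data)); // data range

    // Guard against a zero width when all values are identical.
    $k = $h > 0.0 ? (int) ceil($R / $h) : 1;

    return ['width' => $h, 'bins' => $k, 'range' => $R, 'stddev' => $stddev];
}

['width' => $h, 'bins' => $k] = scottSketch(range(1, 15));
printf("h = %.3f, k = %d\n", $h, $k);
```

For the values 1 to 15 the standard deviation is √20 ≈ 4.472, which gives h ≈ 6.329 and k = ⌈14 / 6.329⌉ = 3.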


5. Freedman-Diaconis Rule (1981)

Based on the interquartile range (IQR). Robust against outliers.

$$ \mathrm{IQR} = Q_3 - Q_1 $$

$$ h = 2 \times \frac{\mathrm{IQR}}{\sqrt[3]{n}} $$

$$ R = \max_i x_i - \min_i x_i $$

$$ k = \left \lceil \frac{R}{h} \right \rceil $$

The result is an array with keys width, bins, range, and IQR. Map them to variables like so:

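```php
// The result is keyed by name, so destructure by key rather than by position
['width' => $h, 'bins' => $k, 'range' => $R, 'IQR' => $IQR] = BinSelection::freedmanDiaconis($data);
```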
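The same steps can be sketched in plain PHP. Note that quartiles can be computed in several ways; this sketch assumes linear interpolation between the closest ranks, which may differ from the method the library uses. `quantile()` and `freedmanDiaconisSketch()` are hypothetical helpers.

```php
<?php
declare(strict_types=1);

/**
 * Quantile of an ascending, zero-indexed list by linear interpolation
 * (hypothetical helper; other quartile conventions give other values).
 *
 * @param list<int|float> $sorted
 */
function quantile(array $sorted, float $p): float
{
    $pos = $p * (count($sorted) - 1);
    $lo  = (int) floor($pos);
    $hi  = (int) ceil($pos);

    return $sorted[$lo] + ($pos - $lo) * ($sorted[$hi] - $sorted[$lo]);
}

/**
 * Sketch of the Freedman-Diaconis Rule (hypothetical helper).
 *
 * @param list<int|float> $data
 * @return array{width: float, bins: int, range: float, IQR: float}
 */
function freedmanDiaconisSketch(array $data): array
{
    $n = count($data);
    if ($n < 2) {
        throw new InvalidArgumentException('At least two values are required.');
    }

    sort($data); // sorts the local copy only

    $iqr = quantile($data, 0.75) - quantile($data, 0.25);
    $h   = 2 * $iqr / ($n ** (1 / 3));
    $R   = (float) ($data[$n - 1] - $data[0]);

    // Guard against a zero width when the IQR collapses to zero.
    $k = $h > 0.0 ? (int) ceil($R / $h) : 1;

    return ['width' => $h, 'bins' => $k, 'range' => $R, 'IQR' => $iqr];
}

print_r(freedmanDiaconisSketch(range(1, 15)));
```

With this quartile convention, the values 1 to 15 give Q₁ = 4.5 and Q₃ = 11.5, so IQR = 7, h ≈ 5.677 and k = 3.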


6. Terrell-Scott’s Rule (1985)

Uses the cube root of twice the sample size and generally provides more bins than Sturges’ Rule. This is the original Rice Rule:

$$ k = \left \lceil \ \sqrt[3]{2n} \enspace \right \rceil = \left \lceil \ (2n)^{1/3} \ \right \rceil $$

```php
$k = BinSelection::terrellScott($data);
```
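Worked by hand for a sample of n = 15 values (plain arithmetic on the formula, not library output):

$$ k = \left \lceil \ \sqrt[3]{2 \times 15} \enspace \right \rceil = \left \lceil \ \sqrt[3]{30} \enspace \right \rceil = \left \lceil 3.107 \right \rceil = 4 $$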


7. Rice University Rule

Uses the cube root of the sample size and generally provides more bins than Sturges’ Rule. The formula is given as taught by David M. Lane at Rice University. N.B.: This Rice Rule does not seem to be the original; Terrell-Scott’s (1985) appears to be. Note also that the two variants can yield different results under certain circumstances. Lane’s variant from the early 2000s is, however, the more commonly cited one:

$$ k = 2 \times \left \lceil \ \sqrt[3]{n} \enspace \right \rceil = 2 \times \left \lceil \ n^{1/3} \ \right \rceil $$

```php
$k = BinSelection::rice($data);
```
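To see how the two variants diverge, both formulas can be evaluated side by side. This is a small sketch in plain PHP that works on the sample size n directly; `terrellScottBins()` and `riceBins()` are hypothetical helpers, not library methods.

```php
<?php
declare(strict_types=1);

// Terrell-Scott: ceil of the cube root of 2n.
function terrellScottBins(int $n): int
{
    return (int) ceil((2 * $n) ** (1 / 3));
}

// Rice University variant: twice the ceil of the cube root of n.
function riceBins(int $n): int
{
    return 2 * (int) ceil($n ** (1 / 3));
}

// The sample sizes avoid perfect cubes, where floating-point cube roots
// can land slightly above or below the exact integer before ceil() is applied.
foreach ([15, 100, 2000] as $n) {
    printf(
        "n = %4d  Terrell-Scott: %2d  Rice: %2d\n",
        $n,
        terrellScottBins($n),
        riceBins($n)
    );
}
```

For n = 15 this yields 4 bins for Terrell-Scott and 6 for the Rice University variant; for n = 100 the results are 6 and 10.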


Method Selection Guidelines

| Rule | Strengths & Weaknesses |
| --------------------- | ------------------------------------------------------------ |
| Freedman–Diaconis | Uses the IQR to set 𝒉, so it is robust against outliers and adapts to data spread.<br>⚠️ May over‐smooth heavily skewed or multi‐modal data when IQR is small. |
| Sturges’ Rule | Very simple, works well for roughly normal, moderate-sized datasets.<br>⚠️ Ignores outliers and underestimates bin count for large or skewed samples. |
| Rice Rule | Independent of data shape and easy to compute.<br>⚠️ Prone to over‐ or under‐smoothing when the distribution is heavy‐tailed or skewed. |
| Terrell–Scott | Similar approach to the Rice Rule but with asymptotically optimal MISE properties; gives more bins than Sturges and adapts better at large 𝒏.<br>⚠️ Still ignores skewness and outliers. |
| Square Root Rule | Simply the square root of the sample size, so it requires no distributional estimates.<br>⚠️ May produce too few bins for complex distributions, or too many for very noisy data. |
| Doane’s Rule | Extends Sturges’ Rule by adding a skewness correction, improving performance on asymmetric data.<br>⚠️ Requires estimating the third moment (skewness), which can be unstable for small 𝒏. |
| Scott’s Rule | Uses standard deviation to minimize MISE, providing good balance for unimodal, symmetric data.<br>⚠️ Sensitive to outliers (inflated $\sigma$) and may underperform on skewed distributions. |

Literature

Rubia, J.M.D.L. (2024): Rice University Rule to Determine the Number of Bins. Open Journal of Statistics, 14, 119-149. DOI: 10.4236/ojs.2024.141006

Wikipedia: Histogram / Number of bins and width https://en.wikipedia.org/wiki/Histogram#Number_of_bins_and_width

Practical Example

```php
<?php
use tomkyle\Binning\BinSelection;

// Generate sample data (e.g., from measurements)
$measurements = [
    12.3, 14.1, 13.8, 15.2, 12.9, 14.7, 13.1, 15.8, 12.5, 14.3,
    13.6, 15.1, 12.8, 14.9, 13.4, 15.5, 12.7, 14.2, 13.9, 15.0,
];

echo "Data points: " . count($measurements) . "\n\n";

// Compare different methods
$methods = [
    'Sturges’ Rule'          => BinSelection::STURGES,
    'Rice University Rule'   => BinSelection::RICE,
    'Terrell-Scott’s Rule'   => BinSelection::TERRELL_SCOTT,
    'Square Root Rule'       => BinSelection::SQUARE_ROOT,
    'Doane’s Rule'           => BinSelection::DOANE,
    'Scott’s Rule'           => BinSelection::SCOTT,
    'Freedman-Diaconis Rule' => BinSelection::FREEDMAN_DIACONIS,
];

foreach ($methods as $name => $method) {
    $bins = BinSelection::suggestBins($measurements, $method);
    // Left-align the label, right-align the bin count
    printf("%-22s: %2d bins\n", $name, $bins);
}
```

Error Handling

All methods will throw InvalidArgumentException for invalid inputs:

```php
try {
    // This will throw an exception
    $bins = BinSelection::sturges([]);
} catch (InvalidArgumentException $e) {
    echo "Error: " . $e->getMessage();
    // Output: "Dataset cannot be empty to apply the Sturges' Rule."
}

try {
    // This will throw an exception
    $bins = BinSelection::suggestBins($data, 'invalid-method');
} catch (InvalidArgumentException $e) {
    echo "Error: " . $e->getMessage();
    // Output: "Unknown binning method: invalid-method"
}
```

Development

Clone repo and install requirements

```bash
$ git clone git@github.com:tomkyle/binning.git
$ cd binning
$ composer install
$ pnpm install
```

Watch source and run various tests

This will watch changes inside the src/ and tests/ directories and run a series of tests:

  1. Find and run the corresponding unit test with PHPUnit.
  2. Find possible bugs and documentation issues using PHPStan.
  3. Analyse code style and give hints on newer syntax using Rector.

```bash
$ npm run watch
```

Run PHPUnit

```bash
$ npm run phpunit
```

Owner

  • Name: Carsten Witt
  • Login: tomkyle
  • Kind: user
  • Location: Kiel, Germany
  • Company: PLÖTZBROT

GitHub Events

Total
  • Push event: 5
  • Create event: 2
Last Year
  • Push event: 5
  • Create event: 2

Issues and Pull Requests

Last synced: 6 months ago

Packages

  • Total packages: 1
  • Total downloads:
    • packagist 3 total
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 3
  • Total maintainers: 1
packagist.org: tomkyle/binning

Determine optimal number of bins 𝒌 for histogram creation and optimal bin width 𝒉 using various statistical methods.

  • License: MIT
  • Latest release: 1.0.2
    published 8 months ago
  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 3 Total
Rankings
Dependent packages count: 17.0%
Dependent repos count: 29.1%
Average: 44.5%
Downloads: 87.4%
Maintainers (1)
Funding
Last synced: 6 months ago

Dependencies

.github/workflows/dependency-review.yml actions
  • actions/checkout v4 composite
  • actions/dependency-review-action v4 composite
.github/workflows/php.yml actions
  • actions/checkout v4 composite
  • ramsey/composer-install v2 composite
  • shivammathur/setup-php v2 composite
package.json npm
  • chokidar-cli ^3.0.0 development
  • npm-run-all ^4.1.5 development
pnpm-lock.yaml npm
composer.json packagist
  • friendsofphp/php-cs-fixer ^3.67 development
  • phpstan/phpstan ^2.1 development
  • phpunit/phpunit ^12.0 development
  • rector/rector ^2.1 development
  • tomkyle/find-run-test ^1.0 development
  • markrogoyski/math-php ^2.11
  • php ^8.3
composer.lock packagist
  • clue/ndjson-react v1.3.0 development
  • composer/pcre 3.3.2 development
  • composer/semver 3.4.3 development
  • composer/xdebug-handler 3.0.5 development
  • evenement/evenement v3.0.2 development
  • fidry/cpu-core-counter 1.2.0 development
  • friendsofphp/php-cs-fixer v3.75.0 development
  • myclabs/deep-copy 1.13.1 development
  • nikic/php-parser v5.5.0 development
  • phar-io/manifest 2.0.4 development
  • phar-io/version 3.2.1 development
  • phpstan/phpstan 2.1.17 development
  • phpunit/php-code-coverage 12.3.1 development
  • phpunit/php-file-iterator 6.0.0 development
  • phpunit/php-invoker 6.0.0 development
  • phpunit/php-text-template 5.0.0 development
  • phpunit/php-timer 8.0.0 development
  • phpunit/phpunit 12.2.3 development
  • psr/container 2.0.2 development
  • psr/event-dispatcher 1.0.0 development
  • psr/log 3.0.2 development
  • react/cache v1.2.0 development
  • react/child-process v0.6.6 development
  • react/dns v1.13.0 development
  • react/event-loop v1.5.0 development
  • react/promise v3.2.0 development
  • react/socket v1.16.0 development
  • react/stream v1.4.0 development
  • rector/rector 2.1.0 development
  • sebastian/cli-parser 4.0.0 development
  • sebastian/comparator 7.1.0 development
  • sebastian/complexity 5.0.0 development
  • sebastian/diff 7.0.0 development
  • sebastian/environment 8.0.2 development
  • sebastian/exporter 7.0.0 development
  • sebastian/global-state 8.0.0 development
  • sebastian/lines-of-code 4.0.0 development
  • sebastian/object-enumerator 7.0.0 development
  • sebastian/object-reflector 5.0.0 development
  • sebastian/recursion-context 7.0.0 development
  • sebastian/type 6.0.2 development
  • sebastian/version 6.0.0 development
  • staabm/side-effects-detector 1.0.5 development
  • symfony/console v7.3.0 development
  • symfony/deprecation-contracts v3.6.0 development
  • symfony/event-dispatcher v7.3.0 development
  • symfony/event-dispatcher-contracts v3.6.0 development
  • symfony/filesystem v7.3.0 development
  • symfony/finder v7.3.0 development
  • symfony/options-resolver v7.3.0 development
  • symfony/polyfill-ctype v1.32.0 development
  • symfony/polyfill-intl-grapheme v1.32.0 development
  • symfony/polyfill-intl-normalizer v1.32.0 development
  • symfony/polyfill-mbstring v1.32.0 development
  • symfony/polyfill-php80 v1.32.0 development
  • symfony/polyfill-php81 v1.32.0 development
  • symfony/process v7.3.0 development
  • symfony/service-contracts v3.6.0 development
  • symfony/stopwatch v7.3.0 development
  • symfony/string v7.3.0 development
  • theseer/tokenizer 1.2.3 development
  • tomkyle/find-run-test 1.0.7 development
  • markrogoyski/math-php v2.11.0