https://github.com/dallaylaen/stats-logscale-js

Memory efficient, fast approximate statistical analysis tool


Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.2%) to scientific vocabulary

Keywords

approximate math statistics univariate
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: dallaylaen
  • Language: JavaScript
  • Default Branch: main
  • Homepage:
  • Size: 841 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
approximate math statistics univariate
Created almost 4 years ago · Last pushed almost 2 years ago
Metadata Files
Readme Changelog

README.md

stats-logscale

A memory-efficient approximate statistical analysis tool using logarithmic binning.

Example plot: repeated setTimeout(0) execution times

Description

  • data is split into bins (aka buckets), linear close to zero and logarithmic for large numbers (hence the name), thus maintaining the desired absolute and relative precision;

  • can calculate mean, variance, median, moments, percentiles, cumulative distribution function (i.e. probability that a value is less than x), and expected values of arbitrary functions over the sample;

  • can generate histograms for plotting the data;

  • all calculated values are cached; the cache is reset whenever new data is added;

  • (almost) every function has a "neat" counterpart which rounds the result to the shortest possible number within the precision bounds. E.g. foo.mean() // 1.0100047, but foo.neat.mean() // 1.01;

  • is (de)serializable;

  • can split out partial data or combine multiple samples into one.
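The following minimal sketch makes the binning idea concrete by mapping a value to the lower boundary of its bin. It is not taken from stats-logscale. The function name binLower and the switch-over point (the magnitude at which a logarithmic bin would become wider than the absolute precision) are assumptions chosen for illustration.

```javascript
// Hypothetical illustration of mixed linear/logarithmic binning;
// not the stats-logscale implementation.
function binLower(x, precision, base) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  // Assumed switch-over point: above it, a logarithmic bin (relative width
  // base - 1) is wider than a linear bin of width `precision`.
  const threshold = precision / (base - 1);
  if (ax < threshold) {
    // Linear region: fixed absolute width, so absolute error < precision.
    return sign * Math.floor(ax / precision) * precision;
  }
  // Logarithmic region: each bin is `base` times wider than the previous one,
  // so the relative error stays below base - 1.
  const k = Math.floor(Math.log(ax / threshold) / Math.log(base));
  return sign * threshold * base ** k;
}

console.log(binLower(1000, 0.01, 1.1) === binLower(1001, 0.01, 1.1)); // true: same bin
```

With a coarse base of 1.1, values that differ by 0.1% land in the same bin. Near zero, the bins keep a fixed absolute width instead.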

Usage

Creating the sample container:

```javascript
const { Univariate } = require('stats-logscale');
const stat = new Univariate();
```

Specifying absolute and relative precision. The defaults are 10^-9 and 1.001, respectively. Coarser precision means less memory usage and faster data querying (but not faster insertion):

```javascript
const stat = new Univariate({base: 1.01, precision: 0.001});
```

Use the flat switch to disable logarithmic binning entirely:

```javascript
// this assumes the data consists of integers only
const stat = new Univariate({precision: 1, flat: true});
```

Adding data points, either one by one or as (value, frequency) pairs. Strings are OK (e.g. after parsing user input), but non-numeric values will cause an exception:

```javascript
stat.add(3.14);
stat.add("Foo"); // Nope! Throws an exception
stat.add("3.14 3.15 3.16".split(" "));
stat.addWeighted([[0.5, 1], [1.5, 3], [2.5, 5]]);
```
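The input rules above (numbers or numeric strings, plus optional weights) can be sketched with a small hypothetical accumulator. For brevity, it counts exact values in a Map instead of bins. The names toFiniteNumber and WeightedCounter are illustrative and not part of stats-logscale. Rejecting Infinity and blank strings is also an assumption.

```javascript
// Hypothetical sketch, not the stats-logscale API.
// Accepts numbers and numeric strings; anything else throws.
function toFiniteNumber(value) {
  if (typeof value !== 'number' && typeof value !== 'string') {
    throw new TypeError(`Not a number: ${String(value)}`);
  }
  // Number('') and Number('  ') are 0, so treat blank strings as invalid.
  const n = typeof value === 'string' && value.trim() === '' ? NaN : Number(value);
  if (!Number.isFinite(n)) throw new TypeError(`Not a number: ${value}`);
  return n;
}

// Minimal accumulator: exact values as keys, total weight as values.
class WeightedCounter {
  constructor() {
    this.bins = new Map();
    this.total = 0;
  }

  // pairs: iterable of [value, weight]
  addWeighted(pairs) {
    for (const [value, weight] of pairs) {
      const x = toFiniteNumber(value);
      const w = toFiniteNumber(weight);
      this.bins.set(x, (this.bins.get(x) ?? 0) + w);
      this.total += w;
    }
    return this;
  }

  // Accepts single values or arrays of values, each with weight 1.
  add(...values) {
    return this.addWeighted(values.flat().map(v => [v, 1]));
  }
}

const c = new WeightedCounter();
c.add(3.14).add('3.14 3.15 3.16'.split(' '));
c.addWeighted([[0.5, 1], [1.5, 3], [2.5, 5]]);
console.log(c.total); // 13
```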

Querying data:

```javascript
stat.count();          // number of data points
stat.mean();           // average
stat.stdev();          // standard deviation
stat.median();         // half of the data is lower than this value
stat.percentile(90);   // 90% of the data is below this point
stat.quantile(0.9);    // ditto
stat.cdf(0.5);         // cumulative distribution function, i.e.
                       // the probability that a data point is less than 0.5
stat.moment(3);        // central moment of an integer power (here 3)
stat.momentAbs(1.5);   // < |x - <x>| ** power >; the power may be fractional
stat.E(x => x * x);    // expected value of an arbitrary function
```
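Because only (value, weight) pairs are kept, every query above reduces to a weighted sum over the bins. The following is a minimal sketch under two assumptions: each bin is represented by a single value, and the standard deviation is the population one (the library's convention is not stated here). The name makeStats is hypothetical.

```javascript
// Hypothetical sketch: summary statistics from [value, weight] pairs.
function makeStats(bins) {
  const sorted = [...bins].sort((a, b) => a[0] - b[0]);
  const count = sorted.reduce((sum, [, w]) => sum + w, 0);
  // Expected value of an arbitrary function over the weighted sample.
  const E = f => sorted.reduce((sum, [x, w]) => sum + w * f(x), 0) / count;
  const mean = E(x => x);
  // Smallest bin value at which the cumulative weight reaches q * count.
  const quantile = q => {
    let acc = 0;
    for (const [x, w] of sorted) {
      acc += w;
      if (acc >= q * count) return x;
    }
    return sorted[sorted.length - 1][0];
  };
  return {
    count,
    mean,
    E,
    quantile,
    median: () => quantile(0.5),
    stdev: () => Math.sqrt(E(x => (x - mean) ** 2)), // population variance
    moment: p => E(x => (x - mean) ** p),
    momentAbs: p => E(x => Math.abs(x - mean) ** p),
    // Share of the total weight strictly below t.
    cdf: t => sorted.reduce((sum, [x, w]) => sum + (x < t ? w : 0), 0) / count,
  };
}

const s = makeStats([[0.5, 1], [1.5, 3], [2.5, 5]]);
console.log(s.count, s.median()); // 9 2.5
```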

Each querying primitive has a "neat" counterpart that rounds its output to the shortest possible decimal number in the respective bin:

```javascript
stat.neat.mean();
stat.neat.stdev();
stat.neat.median();
```
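One way to pick such a "neat" number is to round the value to progressively more significant digits until the result still falls inside its bin. This is a hypothetical heuristic: shortestInRange is an illustrative name, and the library's own rounding (e.g. its shorten method) may work differently. Because it prefers candidates close to the value, it does not always find the absolute shortest number in the bin.

```javascript
// Hypothetical heuristic, not the stats-logscale implementation:
// round `value` to 1, 2, 3, ... significant digits and return the first
// candidate that still lies within [lower, upper).
function shortestInRange(value, lower, upper) {
  for (let digits = 1; digits <= 17; digits++) {
    const candidate = Number(value.toPrecision(digits));
    if (candidate >= lower && candidate < upper) return candidate;
  }
  // 17 significant digits reproduce any double exactly, so this line is
  // only reached if `value` itself lies outside [lower, upper).
  return value;
}

console.log(shortestInRange(1.0100047, 1.0095, 1.0105)); // 1.01
```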

Extract partial samples:

```javascript
stat.clone({ min: 0.5, max: 0.7 });
stat.clone({ ltrim: 1, rtrim: 1 }); // cut off the outer 1% of data
stat.clone({ ltrim: 1, rtrim: 1, winsorize: true }); // ditto, but truncate outliers instead of discarding them
```
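The difference between trimming and winsorizing can be shown on a plain sorted array. This sketch assumes that ltrim and rtrim are percentages, as the comment above suggests. trimSample is a hypothetical helper, not the library's clone.

```javascript
// Hypothetical helper: drop (or clamp) the lowest ltrim% and highest rtrim%.
function trimSample(values, { ltrim = 0, rtrim = 0, winsorize = false } = {}) {
  const sorted = [...values].sort((a, b) => a - b);
  const lo = Math.floor(sorted.length * ltrim / 100);                 // first kept index
  const hi = sorted.length - Math.floor(sorted.length * rtrim / 100); // one past last kept index
  if (lo >= hi) return [];
  if (!winsorize) return sorted.slice(lo, hi);
  // Winsorizing: outliers are replaced by the nearest kept value,
  // so the number of data points stays the same.
  return sorted.map((x, i) => (i < lo ? sorted[lo] : i >= hi ? sorted[hi - 1] : x));
}

const data = Array.from({ length: 100 }, (_, i) => i + 1); // 1..100
console.log(trimSample(data, { ltrim: 1, rtrim: 1 }).length);                  // 98
console.log(trimSample(data, { ltrim: 1, rtrim: 1, winsorize: true }).length); // 100
```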

Serialize, deserialize, and combine data from multiple sources:

```javascript
const str = JSON.stringify(stat);
// send over the network here
const copy = new Univariate(JSON.parse(str));

// assuming main and partialStat are existing Univariate instances
main.addWeighted(partialStat.getBins());
main.addWeighted(JSON.parse(str).bins); // ditto
```
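Combining samples works because binned data is just a set of (value, weight) pairs, so merging amounts to summing the weights per bin. The following sketch assumes that both samples use the same precision and base, so that identical bins have identical keys. mergeBins is a hypothetical name.

```javascript
// Hypothetical sketch: merge lists of [binValue, weight] pairs by summing
// the weights of identical bins. Only meaningful if all inputs share
// the same bin layout (same precision and base).
function mergeBins(...binLists) {
  const merged = new Map();
  for (const bins of binLists) {
    for (const [value, weight] of bins) {
      merged.set(value, (merged.get(value) ?? 0) + weight);
    }
  }
  return [...merged.entries()].sort((a, b) => a[0] - b[0]);
}

// A JSON round trip preserves the pairs, so local and remote parts merge alike.
const local = [[0.5, 1], [1.5, 3]];
const remote = JSON.parse(JSON.stringify({ bins: [[1.5, 2], [2.5, 5]] })).bins;
console.log(JSON.stringify(mergeBins(local, remote))); // [[0.5,1],[1.5,5],[2.5,5]]
```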

Create histograms and plot data:

```javascript
stat.histogram({scale: 768, count: 1024});
// this produces 1024 bars of the form
// [ bar_height, lower_boundary, upper_boundary ]
// The intervals are consecutive.
// The bar heights are limited to 768.

stat.histogram({scale: 70, count: 20})
    .map( x => stat.shorten(x[1], x[2]) + '\t' + '+'.repeat(x[0]) )
    .join('\n');
// "Draw" a vertical histogram for text console
// You'll use PNG in production instead, right? Right?
```
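The following hypothetical sketch shows what the bar format above implies: it counts consecutive equal-width intervals, sums the weights falling into each, and scales the bars so that the tallest one equals scale. histogramSketch is an illustrative name, and equal-width bars are an assumption; the library may place boundaries differently.

```javascript
// Hypothetical sketch, not the library's histogram():
// `count` consecutive equal-width bars over [min, max] of the data,
// each returned as [height, lower, upper], scaled so the tallest is `scale`.
function histogramSketch(bins, { scale = 1, count = 10 } = {}) {
  if (bins.length === 0) return [];
  const xs = bins.map(([x]) => x);
  const min = Math.min(...xs);
  const max = Math.max(...xs);
  const width = (max - min) / count || 1; // all-equal data: avoid zero width
  const heights = new Array(count).fill(0);
  for (const [x, w] of bins) {
    // Clamp so that x === max lands in the last bar.
    const i = Math.min(count - 1, Math.floor((x - min) / width));
    heights[i] += w;
  }
  const top = Math.max(...heights) || 1; // avoid dividing by zero
  return heights.map((h, i) => [h * scale / top, min + i * width, min + (i + 1) * width]);
}

const bars = histogramSketch([[0, 1], [1, 2], [2, 4], [3, 1]], { scale: 8, count: 2 });
console.log(JSON.stringify(bars)); // [[4.8,0,1.5],[8,1.5,3]]
```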

See the playground.

See also the full documentation.

Performance

Data inserts are optimized for speed, and querying is cached where possible. The script example/speed.js can be used to benchmark the module on your system.
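Query caching of this kind commonly follows a simple pattern: compute a derived value once, keep it, and clear every cached value when new data arrives. The following is a minimal hypothetical sketch; CachedSample and its methods are illustrative, not the library's internals.

```javascript
// Hypothetical sketch of cache-until-modified; not the library's internals.
class CachedSample {
  constructor() {
    this.values = [];
    this.cache = new Map();
  }

  add(x) {
    this.values.push(x);
    this.cache.clear(); // new data invalidates every cached result
    return this;
  }

  // Compute once per key, then serve from the cache until the next add().
  cached(key, compute) {
    if (!this.cache.has(key)) this.cache.set(key, compute());
    return this.cache.get(key);
  }

  mean() {
    return this.cached('mean', () => this.values.reduce((s, x) => s + x, 0) / this.values.length);
  }
}

const sample = new CachedSample().add(1).add(3);
console.log(sample.mean()); // 2
sample.add(5);
console.log(sample.mean()); // 3
```

Inserts stay cheap (a push plus a cache clear), while repeated queries between inserts cost a single lookup.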

Memory usage for a dense sample spanning 6 orders of magnitude was around 1.6 MB in Chromium: ~230 KB for the data itself plus ~1.2 MB for the cache.
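For a rough sense of scale, the number of logarithmic bins needed to cover a range grows with the logarithm of that range divided by log(base). The following back-of-the-envelope calculation assumes the default base of 1.001 and ignores the linear region near zero. Under those assumptions, six orders of magnitude need roughly 13,800 bins. This is an estimate, not a figure from the library, and the per-bin memory cost is not stated here.

```javascript
// Back-of-the-envelope estimate (assumption: pure logarithmic bins of ratio `base`).
function logBinCount(ratio, base) {
  // Smallest n such that base ** n >= ratio.
  return Math.ceil(Math.log(ratio) / Math.log(base));
}

console.log(logBinCount(1e6, 1.001)); // 13823
```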

Bugs

Please report bugs and request features via the GitHub bug tracker.

Copyright and license

Copyright (c) 2022-2023 Konstantin Uvarin

This is free software, available under the MIT license.

Owner

  • Name: Konstantin S. Uvarin
  • Login: dallaylaen
  • Kind: user

I'm a humble software developer. I love to sing, cycle, and make jokes & puns about my job.


Committers

Last synced: 8 months ago

All Time
  • Total Commits: 190
  • Total Committers: 1
  • Avg Commits per committer: 190.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • Konstantin S. Uvarin (k****n@g****m): 190 commits

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 0
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • dallaylaen (1)

Packages

  • Total packages: 1
  • Total downloads:
    • npm 597 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 10
  • Total maintainers: 1
npmjs.org: stats-logscale

Approximate statistical analysis using logarithmic bins

  • Versions: 10
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 597 Last month
Rankings
Downloads: 10.0%
Dependent repos count: 10.3%
Forks count: 15.4%
Stargazers count: 20.9%
Average: 21.7%
Dependent packages count: 51.9%
Maintainers (1)
Last synced: 7 months ago

Dependencies

package.json npm
  • chai ^4.3.4 development
  • eslint ^7.32.0 development
  • eslint-config-standard ^16.0.3 development
  • eslint-plugin-align-assignments ^1.1.2 development
  • eslint-plugin-import ^2.25.4 development
  • eslint-plugin-node ^11.1.0 development
  • eslint-plugin-promise ^5.2.0 development
  • mocha ^9.1.3 development
  • nyc ^15.1.0 development
  • webpack ^5.65.0 development