permutations-stats
Statistical package for fast exact tests and simulations
Statistics
- Stars: 6
- Watchers: 1
- Forks: 0
- Open Issues: 3
- Releases: 2
Metadata Files
README.md
permutations-stats
Python-only permutation-based statistical tests, accelerated with numba.
Statistical tests
The Brunner-Munzel [1], Mann-Whitney-Wilcoxon [2, 3], Wilcoxon signed-rank [3], and Friedman [4] tests are implemented.
Both exact tests (all combinations) and an approximate method (simulation) are available.
Functions that compute bootstrap confidence intervals for the mean, median, and standard deviation are also implemented (not documented yet).
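Since the bootstrap functions are not documented yet, the general idea can be illustrated with a minimal percentile-bootstrap sketch in plain NumPy. This is not the package's implementation: the helper name `bootstrap_ci`, its parameters, and the sample values are assumptions made for illustration only.

```python
import numpy as np


def bootstrap_ci(data, stat=np.median, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Draw n_resamples samples of the same size as the data, with replacement.
    idx = rng.integers(0, data.size, size=(n_resamples, data.size))
    # Evaluate the statistic on each resample (one row per resample).
    estimates = stat(data[idx], axis=1)
    # The interval bounds are empirical quantiles of the resampled estimates.
    low, high = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    return low, high


# Made-up sample, for illustration only.
sample = np.array([2.1, 3.4, 1.9, 5.6, 4.4, 3.8, 2.7, 4.9])
for name, stat in [("mean", np.mean), ("median", np.median), ("std", np.std)]:
    low, high = bootstrap_ci(sample, stat)
    print(f"{name}: [{low:.2f}, {high:.2f}]")
```

The percentile method is only one of several bootstrap interval constructions; it is used here because it is the shortest to write down.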
Why this package?
This work aims to provide fast permutation-based statistical tests in Python.
Some tests are not publicly available in an exact mode (computing all
possible permutations) or with simulations. If certain assumptions about the
data (such as normality) cannot be made, or if the sample is not large enough,
the existing implementations should not be used.
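To make the difference between the exact mode and the simulation mode concrete, here is a minimal sketch in plain NumPy. It is not the package's code: the function names, the absolute difference in means used as the statistic, and the sample values are assumptions chosen to keep the example short.

```python
from itertools import combinations

import numpy as np


def mean_diff(x, y):
    """Illustrative two-sided statistic: absolute difference in means."""
    return abs(x.mean() - y.mean())


def exact_pvalue(x, y, stat=mean_diff):
    """Enumerate every way of splitting the pooled data into two groups."""
    pooled = np.concatenate([x, y])
    observed = stat(x, y)
    count = total = 0
    for chosen in combinations(range(pooled.size), len(x)):
        mask = np.zeros(pooled.size, dtype=bool)
        mask[list(chosen)] = True
        # Small tolerance so ties with the observed value are counted.
        if stat(pooled[mask], pooled[~mask]) >= observed - 1e-12:
            count += 1
        total += 1
    return count / total


def simulated_pvalue(x, y, stat=mean_diff, n_iter=10_000, seed=0):
    """Approximate the same p-value from random shuffles of the pooled data."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    observed = stat(x, y)
    count = 0
    for _ in range(n_iter):
        shuffled = rng.permutation(pooled)
        if stat(shuffled[:len(x)], shuffled[len(x):]) >= observed - 1e-12:
            count += 1
    # Adding 1 to both terms is a common convention that avoids p = 0.
    return (count + 1) / (n_iter + 1)


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([6.0, 7.0, 8.0, 9.0])
print("exact:    ", exact_pvalue(x, y))
print("simulated:", simulated_pvalue(x, y))
```

With these samples there are 126 possible splits, and only 2 of them are at least as extreme as the observed one, so the exact p-value is 2/126. The number of splits grows combinatorially with the sample sizes, which is why simulation becomes the practical option for larger samples.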
For example, the Brunner-Munzel [1] test is implemented in scipy, but not with
an exact calculation. The statistic can be obtained through the public API, but
this can take some time if run several thousand times (e.g. because the p-value
is also calculated at each iteration).
This package reimplements the looping over permutations and the statistical
tests with numba. A few seconds are required to compile on the fly at the first
function call. After that, the speed-up is substantial, as shown in this output
from tests comparing the Brunner-Munzel statistic calculation with scipy (the
first run is slower because it includes compilation):
tests/stat_tests.py
200 tests with 5 and 4 data points - Permutations-stats: 2.375s, Scipy: 0.076s, diff: -2.299s
200 tests with 3 and 3 data points - Permutations-stats: 0.002s, Scipy: 0.077s, diff: 0.075s
200 tests with 10 and 12 data points - Permutations-stats: 0.004s, Scipy: 0.079s, diff: 0.075s
10000 tests with 18 and 10 data points - Permutations-stats: 0.220s, Scipy: 3.806s, diff: 3.586s
20000 tests with 18 and 15 data points - Permutations-stats: 0.477s, Scipy: 7.645s, diff: 7.168s
30000 tests with 18 and 19 data points - Permutations-stats: 0.769s, Scipy: 11.468s, diff: 10.699s
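The one-off compilation cost mentioned above can be observed with a small timing sketch. The function `pairwise_wins` is a hypothetical stand-in for a rank-based statistic, not one of the package's functions, and the fallback decorator is an assumption added only so the sketch still runs where numba is not installed.

```python
import time

import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs without numba (no speed-up then).
    def njit(func):
        return func


@njit
def pairwise_wins(x, y):
    """Count pairs (i, j) with x[i] < y[j]; a stand-in for a rank statistic."""
    wins = 0
    for xi in x:
        for yj in y:
            if xi < yj:
                wins += 1
    return wins


rng = np.random.default_rng(0)
x, y = rng.normal(size=18), rng.normal(size=19)

# With numba installed, the first call triggers compilation; later calls
# reuse the compiled code.
for label in ("first call", "second call"):
    start = time.perf_counter()
    result = pairwise_wins(x, y)
    print(f"{label}: {time.perf_counter() - start:.4f}s (result={result})")
```

When benchmarking, it is usually fairer to call the function once before timing, so that compilation is not attributed to the measured runs.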
Dependencies
- numpy
- numba

For development and testing only:
- scipy ≥1.5
- pytest
Usage
Basic usage:
```python
import numpy as np
from permutations_stats.permutations import permutation_test

# Sample data
x = np.arange(9)
y = (np.arange(9) - 0.2) * 1.1

permutation_test(x, y, test="brunner_munzel")
# PermutationsResults(statistic=0.2776044311308564, pvalue=0.7475935828877005, permutations=24310, test='brunner_munzel', alternative='TWO_SIDED', method='exact')
```
More examples are available in usage.ipynb, and a detailed demonstration in doc/demo.ipynb.
Perspective
- If sample sizes and/or the number of iterations are small, no acceleration is
  expected from `numba`, and using `numpy` alone should be the fastest option.
  Thresholds for `numba` use will be better determined, so that the function to call can be chosen without user intervention.
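One way such automatic selection could look is sketched below. Everything here is hypothetical: the function names, the `"numba"`/`"numpy"` labels, and especially the threshold value, which would have to come from benchmarks rather than from this example.

```python
from math import comb

# Hypothetical cut-off; a real value would need to be measured.
JIT_THRESHOLD = 10_000


def n_exact_splits(n_x, n_y):
    """Number of ways to split n_x + n_y pooled values into two groups."""
    return comb(n_x + n_y, n_x)


def choose_backend(n_x, n_y, n_iter=None, threshold=JIT_THRESHOLD):
    """Pick a backend from the amount of work: iterations, or exact splits."""
    work = n_iter if n_iter is not None else n_exact_splits(n_x, n_y)
    return "numba" if work >= threshold else "numpy"


print(choose_backend(3, 3))                   # 20 exact splits
print(choose_backend(9, 9))                   # 48620 exact splits
print(choose_backend(18, 19, n_iter=30_000))  # simulation with 30000 iterations
```

A single threshold on the number of evaluations is the simplest possible rule; a measured rule might also need to account for the sample sizes themselves and for the compilation cost of the first call.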
License
GNU General Public License v3.0 only.
Cite
If you find this software useful for your academic work, please cite as below:
Florian Charlier. (2022). permutations-stats (v0.2). Zenodo.
https://doi.org/10.5281/zenodo.7213305
Acknowledgements
We would like to thank Marianne Paesmans, Lieveke Ameye, and Luigi Moretti at Institut Jules Bordet for their support during the development of this package.
References
[1] Brunner, E. and Munzel, U. (2000), The Nonparametric Behrens‐Fisher Problem: Asymptotic Theory and a Small‐Sample Approximation. Biom. J., 42: 17-25. doi:10.1002/(SICI)1521-4036(200001)42:1<17::AID-BIMJ17>3.0.CO;2-U
[2] Mann, H. B. and D. R. Whitney (1947). "On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other." Ann. Math. Statist. 18(1): 50-60.
[3] Wilcoxon, F. (1945). "Individual Comparisons by Ranking Methods." Biometrics Bulletin 1(6): 80-83.
[4] Friedman, M. (1937). "The Use of Ranks to Avoid the Assumption of Normality Implicit in the Analysis of Variance." Journal of the American Statistical Association 32(200): 675-701.
Owner
- Name: Florian Charlier
- Login: trevismd
- Kind: user
- Location: Belgium
- Repositories: 3
- Profile: https://github.com/trevismd
Developing with passion
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1
Committers
Last synced: almost 3 years ago
All Time
- Total Commits: 46
- Total Committers: 4
- Avg Commits per committer: 11.5
- Development Distribution Score (DDS): 0.196
Top Committers
| Name | Email | Commits |
|---|---|---|
| Florian Charlier | f****r@u****e | 37 |
| Florian Charlier | f****r@u****e | 4 |
| Florian Charlier | 4****d@u****m | 4 |
| Florian Charlier | t****s@c****e | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 3
- Total pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: about 1 year
- Total issue authors: 2
- Total pull request authors: 1
- Average comments per issue: 0.67
- Average comments per pull request: 1.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- trevismd (2)
- mesamuels (1)
Pull Request Authors
- trevismd (1)
Packages
- Total packages: 1
- Total downloads: 15 last month (pypi)
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 1
- Total maintainers: 1
pypi.org: permutations-stats
Permutation-based statistical tests in Python
- Homepage: https://github.com/trevismd/permutations-stats
- Documentation: https://permutations-stats.readthedocs.io/
- License: GPL-3.0-only
- Latest release: 0.2 (published over 3 years ago)
Dependencies
- numba *
- numpy *
- actions/checkout v3 composite
- actions/setup-python v4 composite
- codecov/codecov-action v3 composite