https://github.com/kukuster/ci_methods_analyser
Analyse efficacy of your own confidence interval (CI) methods
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 2 DOI reference(s) in README
- ✓ Academic publication links: links to pubmed.ncbi, ncbi.nlm.nih.gov
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.5%) to scientific vocabulary
Repository
Analyse efficacy of your own confidence interval (CI) methods
Basic Info
- Host: GitHub
- Owner: Kukuster
- License: mit
- Language: Python
- Default Branch: master
- Homepage: https://pypi.org/project/CI-methods-analyser/
- Size: 5.89 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
CI methods analyser
A toolkit for measuring the efficacy of various methods for calculating a confidence interval (CI). It currently supports CI methods for the following statistics:
- proportion
- the difference between two proportions
This library was mainly inspired by the article "Five Confidence Intervals for Proportions That You Should Know About" by Dr. Dennis Robert.
Dependencies
- python >=3.8
- python libs:
  - numpy
  - scipy
  - matplotlib
  - tqdm
Installation
Install from PyPI (https://pypi.org/project/CI-methods-analyser/): `pip install CI-methods-analyser`
Applications
Applied statistics and data science: compare multiple CI methods to select the most appropriate one for a specific scenario (by its accuracy over a specific range of true population parameters, by computational performance, etc.)
Education on statistics and CIs: demonstrate how different CI methods perform under various conditions, and build intuition for the concept of a CI by comparing how accurate different CI methods are
Usage
Testing the Wald interval, a popular method for calculating a confidence interval for a proportion
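The Wald interval is defined as:

$$\hat{p} \pm z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}, \qquad \hat{p} = \frac{x}{n},$$

where $x$ is the number of successes in a sample of size $n$ and $z_{1-\alpha/2}$ is the two-tailed critical value of the standard normal distribution for the confidence level $1-\alpha$.

How well does it achieve the requested confidence level?

Let's assess the quality of the 95% CIs produced by this method by testing it on a range of true proportions. We'll take 100 true proportions with a 1% step: [0.001, 0.011, 0.021, ..., 0.991].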
```python
from CI_methods_analyser import CImethodForProportion_efficacyToolkit as toolkit, methods_for_CI_for_proportion

toolkit(
    method=methods_for_CI_for_proportion.wald_interval,
    method_name="Wald Interval"
).calculate_coverage_and_show_plot(
    sample_size=100, proportions=('0.001', '0.999', '0.01'), confidence=0.95,
    plt_figure_title="Wald Interval coverage"
)

input('press Enter to exit')
```
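To make the formula concrete, here is a minimal standalone sketch of the Wald interval. It follows the `(x, n, conf_level)` to `(lower, upper)` calling convention used by the custom methods later in this README. The name `wald_interval_sketch` is hypothetical, and it uses `scipy.stats.norm.ppf` for the critical value instead of the library's own helper; it is not the library's `wald_interval`.

```python
from typing import Tuple

from scipy.stats import norm


def wald_interval_sketch(x: int, n: int, conf_level: float = 0.95) -> Tuple[float, float]:
    """Hypothetical standalone Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    # two-tailed critical value of the standard normal, about 1.96 for 95%
    z = float(norm.ppf(1 - (1 - conf_level) / 2))
    p_hat = x / n
    half_width = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return (p_hat - half_width, p_hat + half_width)


lower, upper = wald_interval_sketch(50, 100)
print(round(lower, 4), round(upper, 4))  # 0.402 0.598
```

Note that for `x = 0` (or `x = n`) the half-width is zero, so the interval collapses to a single point.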
This outputs the following plot:

[Figure: Wald Interval coverage]
The plot shows that the method performs poorly overall, and particularly poorly for extreme proportions. While for some true proportions the calculated CI has a true confidence (coverage) of around 95%, most of the time the confidence is significantly lower. For true proportions below 0.05 or above 0.95, the true confidence of the generated CI is generally below 90%, as indicated by the steep descent at the left-most and right-most parts of the plot.
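A quick back-of-the-envelope check helps explain the left edge of the plot. When the true proportion is tiny, the most likely sample contains zero successes, and for x = 0 the Wald formula collapses to the zero-width interval [0, 0], which cannot contain a positive true proportion. The standalone sketch below (the helper name `wald_bounds` is hypothetical; it uses `scipy.stats.binom` and `scipy.stats.norm`) computes how often that happens for p = 0.001 and n = 100.

```python
from scipy.stats import binom, norm


def wald_bounds(x: int, n: int, conf_level: float = 0.95):
    # hypothetical helper: plain Wald interval
    z = float(norm.ppf(1 - (1 - conf_level) / 2))
    p_hat = x / n
    half_width = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half_width, p_hat + half_width


n, p_true = 100, 0.001

# for x = 0 the interval degenerates to a single point at 0
print(wald_bounds(0, n))  # (0.0, 0.0)

# probability of drawing x = 0, i.e. of getting an interval that misses p_true
p_zero = binom.pmf(0, n, p_true)
print(round(p_zero, 4))  # 0.9048, which equals 0.999 ** 100
```

So at p = 0.001 the coverage of the Wald interval cannot exceed roughly 1 - 0.9048, i.e. about 9.5%, no matter what confidence level was requested.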
You might really want to use a different method. Check out this wonderful medium.com article by Dr. Dennis Robert: *Five Confidence Intervals for Proportions That You Should Know About* [code in R]
The function calculate_coverage_and_show_plot that we just used is a shortcut. The code below does the same calculations and yields the same result. It relies on the public properties and methods, giving more control over parts of the calculation:
```python
from CI_methods_analyser import CImethodForProportion_efficacyToolkit as toolkit, methods_for_CI_for_proportion

# take an already implemented method for calculating CI for proportions
wald_interval = methods_for_CI_for_proportion.wald_interval

# initialize the toolkit
wald_interval_test_toolkit = toolkit(method=wald_interval, method_name="Wald Interval")

# calculate the real coverage that the method produces
# for each case of a true population proportion (taken from the list proportions)
wald_interval_test_toolkit.calculate_coverage_analytically(
    sample_size=100, proportions=('0.001', '0.999', '0.01'), confidence=0.95)

# now you can access the calculated coverage and a few statistics:
wald_interval_test_toolkit.coverage  # 1-d array of 0-100, the same shape as passed proportions
# NOTE: proportions, when passed as a tuple of 3 float strings, expands to a list of evenly spaced
# float values where the #0 value is begin, #1 is end, #2 is step.
wald_interval_test_toolkit.average_coverage  # np.longdouble 0-100, avg of coverage
wald_interval_test_toolkit.average_deviation  # np.longdouble 0-100, avg abs diff w/ confidence

# plots the calculated coverage in a matplotlib.pyplot figure
wald_interval_test_toolkit.plot_coverage(plt_figure_title="Wald Interval coverage")

# you can access the figure here:
wald_interval_test_toolkit.figure

# shows the figure (non-blocking)
wald_interval_test_toolkit.show_plot()

# because show_plot() is non-blocking,
# you have to pause the execution in order for the figure to be rendered completely
input('press Enter to exit')
```
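The comments above describe the coverage array and the average coverage / average deviation statistics informally. The sketch below shows one way to reproduce those definitions from a coverage array, together with a hypothetical `expand_proportions` helper that turns the `(begin, end, step)` tuple of float strings into the grid of true proportions. It uses `decimal.Decimal` so the 0.01 step does not accumulate binary rounding error; that is an assumption about why strings are accepted, not a description of the library's internals, and both function names are made up for illustration.

```python
from decimal import Decimal
from typing import List, Tuple

import numpy as np


def expand_proportions(spec: Tuple[str, str, str]) -> List[float]:
    """Hypothetical helper: ('0.001', '0.999', '0.01') -> [0.001, 0.011, ..., 0.991]."""
    begin, end, step = (Decimal(s) for s in spec)
    values = []
    current = begin
    while current <= end:  # exact decimal arithmetic, no drift
        values.append(float(current))
        current += step
    return values


def summarize(coverage: np.ndarray, confidence: float) -> Tuple[float, float]:
    """Average coverage and average absolute deviation from the nominal level, both on a 0-100 scale."""
    average_coverage = float(coverage.mean())
    average_deviation = float(np.abs(coverage - confidence * 100).mean())
    return average_coverage, average_deviation


proportions = expand_proportions(('0.001', '0.999', '0.01'))
print(len(proportions), proportions[:3], proportions[-1])  # 100 [0.001, 0.011, 0.021] 0.991

# made-up coverage values, only to illustrate the arithmetic
toy_coverage = np.array([90.0, 95.0, 97.0])
print(summarize(toy_coverage, 0.95))
```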
I expose some of the style and color settings used by matplotlib.
I prefer night-light-friendly styling:
```python
from CI_methods_analyser import CImethodForProportion_efficacyToolkit as toolkit, methods_for_CI_for_proportion

toolkit(
    method=methods_for_CI_for_proportion.wald_interval,
    method_name="Wald Interval"
).calculate_coverage_and_show_plot(
    sample_size=100, proportions=('0.001', '0.999', '0.01'), confidence=0.95,
    plt_figure_title="Wald Interval coverage",
    theme='dark_background', plot_color="green", line_color="orange"
)

input('press Enter to exit')
```

Testing a custom method for CI for proportion
You can implement your own methods and test them:
```python
from CI_methods_analyser import CImethodForProportion_efficacyToolkit as toolkit
from CI_methods_analyser.math_functions import normal_z_score_two_tailed
from functools import lru_cache

# not a particularly good method for calculating CI for proportion
@lru_cache(100000)
def im_telling_ya_test(x: int, n: int, conf_level: float = 0.95):
    z = normal_z_score_two_tailed(conf_level)

    p = float(x)/n

    return (
        p - 0.02*z,
        p + 0.02*z
    )

toolkit(
    method=im_telling_ya_test,
    method_name='"I\'m telling ya" test'
).calculate_coverage_and_show_plot(
    sample_size=100, proportions=('0.001', '0.999', '0.01'), confidence=0.95,
    plt_figure_title='"I\'m telling ya" coverage',
    theme='dark_background', plot_color="green", line_color="orange"
)

input('press Enter to exit')
```

This is the kind of method one would not trust. It performs very unreliably for most true proportions, as shown by the large gap between the requested ("ordered") confidence level of 95% and the true confidence of the CIs it produces. In other words, its CIs are generally narrower than they should be, so the true value falls inside them less often than claimed. One could say this method overestimates its ability to produce a confident range.
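The half-widths alone hint at why. The "I'm telling ya" interval is always p ± 0.02·z, about ±0.039 at 95%, regardless of p and n, whereas the Wald half-width z·sqrt(p(1 - p)/n) is about 0.098 at p = 0.5 and n = 100. The standalone sketch below uses the Wald half-width only as a rough reference (Wald is itself imperfect) and `scipy.stats.norm.ppf` in place of the library's z-score helper.

```python
from scipy.stats import norm

z = float(norm.ppf(0.975))  # two-tailed critical value for 95%, about 1.96
n = 100

for p in (0.05, 0.2, 0.5):
    telling_ya_half_width = 0.02 * z  # constant: ignores both p and the sample size
    wald_half_width = z * (p * (1 - p) / n) ** 0.5  # shrinks toward the extremes, grows near 0.5
    print(f"p={p}: 'telling ya' ±{telling_ya_half_width:.4f}, Wald ±{wald_half_width:.4f}")
```

Near p = 0.5 the constant interval is less than half as wide as the reference, which matches the deep dip in coverage for mid-range proportions.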
Let's try another custom method: "God is my witness" score
```python
from CI_methods_analyser import CImethodForProportion_efficacyToolkit as toolkit
from CI_methods_analyser.math_functions import normal_z_score_two_tailed
from functools import lru_cache

# you could say, this method is "too good"
@lru_cache(100000)
def God_is_my_witness_score(x: int, n: int, conf_level: float = 0.95):
    z = normal_z_score_two_tailed(conf_level)

    p = float(x)/n

    return (
        (0 + p)/2 - 0.005*z,
        (1 + p)/2 + 0.005*z
    )

toolkit(
    method=God_is_my_witness_score,
    method_name='"God is my witness" score'
).calculate_coverage_and_show_plot(
    sample_size=100, proportions=('0.001', '0.999', '0.01'), confidence=0.95,
    plt_figure_title='"God is my witness" score coverage', theme='dark_background'
)

input('press Enter to exit')
```

This method clearly overshoots. While one asks for a 95% CI, the output range is much wider than necessary and therefore uninformative, as it admits a very wide range of possibilities. In stats lingo, one would say this method is far too conservative.
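Whatever the data, this interval spans about half of [0, 1]: its width is (1 + p)/2 - p/2 + 2·0.005·z = 0.5 + 0.01·z, roughly 0.52 at 95%. The standalone re-implementation below (hypothetical name, with `scipy.stats.norm.ppf` standing in for the library's z-score helper) confirms that the width does not depend on the observed count.

```python
from scipy.stats import norm


def god_is_my_witness_sketch(x: int, n: int, conf_level: float = 0.95):
    # hypothetical re-implementation of the method above, for illustration only
    z = float(norm.ppf(1 - (1 - conf_level) / 2))
    p = float(x) / n
    return (0 + p) / 2 - 0.005 * z, (1 + p) / 2 + 0.005 * z


for x in (0, 50, 100):
    lower, upper = god_is_my_witness_sketch(x, 100)
    print(x, round(upper - lower, 4))  # same width, about 0.5196, for every x
```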
Testing methods for CI for the difference between two proportions
Let's use the implemented pooled Z test:
```python
from CI_methods_analyser import CImethodForDiffBetwTwoProportions_efficacyToolkit as toolkit_d, methods_for_CI_for_diff_betw_two_proportions as methods

toolkit_d(
    method=methods.Z_test_pooled,
    method_name='Z test pooled'
).calculate_coverage_and_show_plot(
    sample_size1=100, sample_size2=100, proportions=('0.001', '0.999', '0.01'), confidence=0.95,
    plt_figure_title='Z test pooled', theme='dark_background',
)

input('press Enter to exit')
```
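For reference, the textbook pooled form of this interval is (p1 - p2) ± z·sqrt(p̄(1 - p̄)(1/n1 + 1/n2)), with the pooled proportion p̄ = (x1 + x2)/(n1 + n2). Treat this as an assumption about what the library's pooled Z method computes and check its source; the function name and the `(x1, n1, x2, n2, conf_level)` signature below are hypothetical.

```python
from scipy.stats import norm


def z_pooled_interval_sketch(x1: int, n1: int, x2: int, n2: int, conf_level: float = 0.95):
    """Hypothetical pooled-SE interval for p1 - p2 (textbook form, not the library's code)."""
    z = float(norm.ppf(1 - (1 - conf_level) / 2))
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)  # both samples combined into one proportion
    se_pooled = (p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2)) ** 0.5
    diff = p1 - p2
    return diff - z * se_pooled, diff + z * se_pooled


print(z_pooled_interval_sketch(30, 100, 20, 100))  # roughly (-0.020, 0.220)
```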

As the plot shows, this test is nearly perfect when the two proportions are close (along the y = x line) [WHITE], except when the proportions take extreme values, where the true confidence of the output CIs is lower than requested [PURPLE].
Also, this test is extremely conservative for large differences between the two proportions, i.e. when the proportions are far apart [GREEN].
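One plausible explanation for the [GREEN] region, assuming the pooled standard error has the usual textbook form: when the two proportions are far apart, pooling pulls p̄ toward the middle, so the pooled standard error exceeds the unpooled one, sqrt(p1(1 - p1)/n1 + p2(1 - p2)/n2), and the interval becomes wider than needed. The sketch compares the two (both helper names are hypothetical).

```python
def pooled_se(p1: float, n1: int, p2: float, n2: int) -> float:
    # pooled proportion, weighted by sample size
    p_bar = (p1 * n1 + p2 * n2) / (n1 + n2)
    return (p_bar * (1 - p_bar) * (1 / n1 + 1 / n2)) ** 0.5


def unpooled_se(p1: float, n1: int, p2: float, n2: int) -> float:
    # each sample keeps its own variance
    return (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5


for p1, p2 in ((0.5, 0.5), (0.9, 0.1)):
    print(p1, p2, round(pooled_se(p1, 100, p2, 100), 4), round(unpooled_se(p1, 100, p2, 100), 4))
```

For equal proportions the two standard errors coincide; for 0.9 vs 0.1 the pooled one is about 0.0707 versus about 0.0424 unpooled, i.e. roughly 1.67 times wider.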
You may want to change the color palette (although I wouldn't):
```python
from CI_methods_analyser import CImethodForDiffBetwTwoProportions_efficacyToolkit as toolkit_d, methods_for_CI_for_diff_betw_two_proportions as methods

toolkit_d(
    method=methods.Z_test_pooled,
    method_name='Z test pooled'
).calculate_coverage_and_show_plot(
    sample_size1=100, sample_size2=100, proportions=('0.001', '0.999', '0.01'), confidence=0.95,
    plt_figure_title='Z test pooled', theme='dark_background',
    colors=("gray", "purple", "white", "orange", "#d62728")
)

input('press Enter to exit')
```

NOTES
Methods for measuring the efficacy of CI methods
Two ways can be used to calculate the efficacy of CI methods for a given confidence and a true population proportion:
- approximately, with random simulation (as implemented in R by Dr. Dennis Robert, see link above). Here: calculate_coverage_randomly.
- precisely, with the analytical solution. Here: calculate_coverage_analytically
By default, always prefer the analytical solution.
Repeatedly sampling the same binomial distribution, as is typically done (so-called "random experiments" or "simulations"), is inefficient, because the binomial distribution is already fully determined by the true population proportion and the sample size.
By relying on the binomial distribution from scipy, the analytical solution gives exact coverage (up to floating-point precision) for any method (defined as a python function), any confidence level, any true population proportion(s), and any sample and population size(s).
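Both approaches can be sketched in a few lines for any method with the `(x, n, conf_level)` to `(lower, upper)` signature used in this README. The analytical version sums the binomial probabilities of exactly those counts x whose interval contains the true proportion; the simulation draws random counts and checks how often the CI contains it. The function names are hypothetical, and this is not the library's implementation of `calculate_coverage_analytically` or `calculate_coverage_randomly`.

```python
import numpy as np
from scipy.stats import binom, norm


def wald(x: int, n: int, conf_level: float = 0.95):
    # plain Wald interval, used here only as an example method
    z = float(norm.ppf(1 - (1 - conf_level) / 2))
    p_hat = x / n
    half_width = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half_width, p_hat + half_width


def coverage_analytical(method, n: int, p: float, conf_level: float = 0.95) -> float:
    """Exact coverage in %: sum P(X = x) over the counts whose CI contains p."""
    total = 0.0
    for x in range(n + 1):
        lower, upper = method(x, n, conf_level)
        if lower <= p <= upper:
            total += float(binom.pmf(x, n, p))
    return 100 * total


def coverage_random(method, n: int, p: float, conf_level: float = 0.95,
                    experiments: int = 100_000, seed: int = 0) -> float:
    """Approximate coverage in %: simulate counts and measure how often the CI contains p."""
    # only n + 1 distinct counts exist, so decide once per count whether its CI contains p
    contains = np.array([lo <= p <= hi for lo, hi in (method(x, n, conf_level) for x in range(n + 1))])
    rng = np.random.default_rng(seed)
    xs = rng.binomial(n, p, size=experiments)  # simulated success counts
    return 100 * float(contains[xs].mean())


print(coverage_analytical(wald, 100, 0.3))
print(coverage_random(wald, 100, 0.3))
```

The simulated value fluctuates around the exact one and only converges as `experiments` grows, while the analytical sum needs just n + 1 evaluations of the method.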
Mathematical proof of the analytical solution:
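In its simplest form (without the `z_precision` optimization described below), the exact coverage of a method that maps an observed count $x$ to an interval $[L(x), U(x)]$ is a finite sum over all possible outcomes, and analogously a double sum over $(x_1, x_2)$ for the difference between two proportions. The notation is ours, a sketch of the identity rather than a transcription of the original proof:

```latex
\operatorname{coverage}(p) = \sum_{x=0}^{n} \binom{n}{x} p^{x} (1-p)^{n-x}\,
  \mathbf{1}\!\left[ L(x) \le p \le U(x) \right]

\operatorname{coverage}(p_1, p_2) = \sum_{x_1=0}^{n_1} \sum_{x_2=0}^{n_2}
  \binom{n_1}{x_1} p_1^{x_1} (1-p_1)^{n_1-x_1}
  \binom{n_2}{x_2} p_2^{x_2} (1-p_2)^{n_2-x_2}\,
  \mathbf{1}\!\left[ L(x_1, x_2) \le p_1 - p_2 \le U(x_1, x_2) \right]
```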

Both "simulation" and "analytical" methods are implemented for CIs for both statistics: proportion, and the difference between two proportions. For the precise analytical solution, an optimization was made. Theoretically it is lossy, but in practice the error is always negligible (as shown by test_z_precision_difference.py) and is smaller than the rounding error of a 64-bit float (the gap between a true real value and its closest float representation). The optimization is regulated by the parameter z_precision, which is estimated automatically by default.
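The README does not spell out how `z_precision` works. One plausible reading, offered purely as an assumption, is that outcomes far in the binomial tails are skipped: only counts x within `z_precision` standard deviations of the mean n·p are evaluated, because the probability mass outside that window is negligible. The sketch below implements that idea with a hypothetical `z_precision` argument and compares it with the full sum.

```python
import math
from typing import Optional

from scipy.stats import binom, norm


def wald(x: int, n: int, conf_level: float = 0.95):
    # plain Wald interval, used here only as an example method
    z = float(norm.ppf(1 - (1 - conf_level) / 2))
    p_hat = x / n
    half_width = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half_width, p_hat + half_width


def coverage_truncated(method, n: int, p: float, conf_level: float = 0.95,
                       z_precision: Optional[float] = None) -> float:
    """Coverage in %; if z_precision is given, only counts near the mean are evaluated."""
    if z_precision is None:
        xs = range(n + 1)  # full, exact sum
    else:
        mean = n * p
        sd = math.sqrt(n * p * (1 - p))
        lo = max(0, math.floor(mean - z_precision * sd))
        hi = min(n, math.ceil(mean + z_precision * sd))
        xs = range(lo, hi + 1)  # the far tails are skipped
    total = 0.0
    for x in xs:
        lower, upper = method(x, n, conf_level)
        if lower <= p <= upper:
            total += float(binom.pmf(x, n, p))
    return 100 * total


n, p = 1000, 0.3
full = coverage_truncated(wald, n, p)
fast = coverage_truncated(wald, n, p, z_precision=8.0)  # evaluates 233 of the 1001 counts
print(full, fast, abs(full - fast))
```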
Various links
1. Equivalence and noninferiority testing (as I understand it, these are fancy terms for 2-sided and 1-sided p tests for the difference between two proportions)
   - https://ncss-wpengine.netdna-ssl.com/wp-content/themes/ncss/pdf/Procedures/PASS/ConfidenceIntervalsfortheDifferenceBetweenTwo_Proportions.pdf
   - https://ncss-wpengine.netdna-ssl.com/wp-content/themes/ncss/pdf/Procedures/PASS/Non-InferiorityTestsfortheDifferenceBetweenTwo_Proportions.pdf
   - https://www.ncss.com/wp-content/themes/ncss/pdf/Procedures/NCSS/TwoProportions-Non-Inferiority,Superiority,Equivalence,andTwo-SidedTestsvsa_Margin.pdf
   - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3019319/
   - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2701110/
   - https://pubmed.ncbi.nlm.nih.gov/9595617/
   - http://thescipub.com/pdf/10.3844/amjbsp.2010.23.31
2. Biostatistics course (Dr. Nicolas Padilla Raygoza, et al.)
   - https://docs.google.com/presentation/d/1t1DowyVDDRFYGHDlJgmYMRN4JCrvFl3q/edit#slide=id.p1
   - https://www.google.com/search?q=Dr.+Sc.+Nicolas+Padilla+Raygoza+Biostatistics+course+Part+10&oq=Dr.+Sc.+Nicolas+Padilla+Raygoza+Biostatistics+course+Part+10&aqs=chrome..69i57.3448j0j7&sourceid=chrome&ie=UTF-8
   - https://slideplayer.com/slide/9837395/
3. Using a z-test instead of a binomial test:
   - When one can use it: https://stats.stackexchange.com/questions/424446/when-can-we-use-a-z-test-instead-of-a-binomial-test
   - How to use it: https://cogsci.ucsd.edu/~dgroppe/STATZ/binomial_ztest.pdf
I accept donations!
Paypal
Cryptocurrency
You can add a transaction message with the name of a project or a custom message if your wallet and the blockchain support this
Preferred blockchains:
| blockchain | address | note |
| --- | --- | --- |
| Bitcoin | bc1pjd2c4xcgq978979htc9admycue4nqqhda3vwsc38agked8yya50qz454xc | |
| Ethereum | 0x176D1b6c3Fc1db5f7f967Fdc735f8267cCe741F3 | supports USDT ERC-20 |
| Tron | TMuNqEgEeBQ2GseWsqgaSdbtqasnJi8ePw | supports USDT TRC-20 |
Owner
- Name: Mykyta Matushyn
- Login: Kukuster
- Kind: user
- Location: Ukraine
- Repositories: 4
- Profile: https://github.com/Kukuster
GitHub Events
Total
- Push event: 6
Last Year
- Push event: 6
Committers
Last synced: almost 3 years ago
All Time
- Total Commits: 21
- Total Committers: 1
- Avg Commits per committer: 21.0
- Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
|---|---|---|
| Kukuster | K****P@g****m | 21 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Packages
- Total packages: 1
- Total downloads:
  - pypi: 14 last month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 5
- Total maintainers: 1
pypi.org: ci-methods-analyser
Analyse efficacy of your own confidence interval (CI) methods
- Homepage: https://github.com/Kukuster/CI_methods_analyser
- Documentation: https://ci-methods-analyser.readthedocs.io/
- License: MIT
- Latest release: 1.1.0 (published over 4 years ago)