statistical_missteps
Supplement to Three common statistical missteps we make in reservoir characterization
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 2 DOI reference(s) in README
- ✓ Academic publication links: Links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (5.8%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: frank1010111
- License: gpl-3.0
- Language: Jupyter Notebook
- Default Branch: master
- Size: 3.95 MB
Statistics
- Stars: 14
- Watchers: 2
- Forks: 6
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Statistical Missteps
Supplement to "Three common statistical missteps we make in reservoir characterization"
Authors: Frank Male and Jerry Jensen
Here we show, through Monte Carlo experiments, three examples of statistical missteps that we have seen in the reservoir characterization literature.
The mistakes are:
- Applying algebra to a linear least squares regression model to swap the roles of the explanatory and response variables
- De-transforming a log-transformed variable from a linear least squares model without accounting for the resulting bias
- Misapplying R² (the coefficient of determination)
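The first misstep can be illustrated with a minimal Monte Carlo sketch. It assumes a hypothetical bivariate normal setup (x ~ N(0, 1), y = 2 + 1.5x + noise with unit variance) instead of the data used in the notebooks. Fitting y on x and then solving the fitted line for x gives a slope of about 1/1.5 ≈ 0.67. The best linear predictor of x given y has slope Cov(x, y)/Var(y) = 1.5/3.25 ≈ 0.46, and regressing x on y directly recovers it. The function names `fit_line` and `reversal_experiment` are illustrative only.

```python
import numpy as np


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    b, a = np.polyfit(x, y, deg=1)  # polyfit returns highest degree first
    return a, b


def reversal_experiment(n=200, n_trials=500, noise_sd=1.0, seed=0):
    """Monte Carlo comparison of two ways to predict x from y.

    Hypothetical setup: x ~ N(0, 1) and y = 2 + 1.5*x + N(0, noise_sd**2).
    Returns the mean slope dx/dy from (1) algebraically inverting the
    fit of y on x, (2) regressing x on y directly, and (3) the slope of
    the true conditional mean E[x | y], Cov(x, y) / Var(y).
    """
    rng = np.random.default_rng(seed)
    true_b = 1.5
    algebra_slopes, direct_slopes = [], []
    for _ in range(n_trials):
        x = rng.normal(0.0, 1.0, n)
        y = 2.0 + true_b * x + rng.normal(0.0, noise_sd, n)
        _, b_yx = fit_line(x, y)
        algebra_slopes.append(1.0 / b_yx)  # from x = (y - a) / b
        _, b_xy = fit_line(y, x)
        direct_slopes.append(b_xy)
    # With Var(x) = 1: Cov(x, y) = true_b and Var(y) = true_b**2 + noise_sd**2
    target = true_b / (true_b**2 + noise_sd**2)
    return np.mean(algebra_slopes), np.mean(direct_slopes), target


if __name__ == "__main__":
    algebra, direct, target = reversal_experiment()
    print(f"algebra: {algebra:.3f}  direct: {direct:.3f}  target: {target:.3f}")
```

In this setup the algebraic inversion overstates the slope. As a result, its predictions of x at extreme values of y are pushed too far from the mean.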
All of the examples are in "Statistics pitfalls.ipynb". The first misstep is
discussed in more detail in the notebook "Applying algebra to regression.ipynb",
the second in "Regression on transformed variables.ipynb", and the third in
"Misinterpreting R-squared.ipynb".
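The second misstep can be sketched the same way. The sketch assumes a hypothetical log-linear model, ln(y) = 1 + 0.5x + ε with ε ~ N(0, 0.8²), which is not taken from the notebooks. Exponentiating the log-space prediction estimates the median of y rather than its mean. For normal log-space errors, the median is exp(−σ²/2) ≈ 0.73 times the mean here. The sketch compares two standard corrections: the lognormal adjustment exp(s²/2) and Duan's smearing estimator. These are textbook remedies and are not necessarily the approaches used in the notebooks. `detransform_experiment` is a hypothetical name.

```python
import numpy as np


def detransform_experiment(n=20000, sigma=0.8, x0=2.0, seed=1):
    """Back-transform a log-space OLS prediction three ways.

    Hypothetical model: ln(y) = 1 + 0.5*x + eps with eps ~ N(0, sigma**2).
    Returns (true mean of y at x0, naive exp(prediction),
    lognormal-corrected prediction, Duan smearing prediction).
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 4.0, n)
    log_y = 1.0 + 0.5 * x + rng.normal(0.0, sigma, n)

    b, a = np.polyfit(x, log_y, deg=1)  # ordinary least squares in log space
    resid = log_y - (a + b * x)
    s2 = resid.var(ddof=2)  # residual variance, two fitted parameters
    fitted_log = a + b * x0

    # For lognormal noise, E[y | x0] = exp(mu + sigma**2 / 2), not exp(mu)
    true_mean = np.exp(1.0 + 0.5 * x0 + sigma**2 / 2)
    naive = np.exp(fitted_log)  # estimates the median of y, not its mean
    lognormal = np.exp(fitted_log + s2 / 2)  # assumes normal log-space errors
    smearing = naive * np.mean(np.exp(resid))  # does not assume normal errors
    return true_mean, naive, lognormal, smearing


if __name__ == "__main__":
    true_mean, naive, lognormal, smearing = detransform_experiment()
    print(f"true {true_mean:.2f}  naive {naive:.2f}  "
          f"lognormal {lognormal:.2f}  smearing {smearing:.2f}")
```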
Interactive examples
Anyone can run these examples on Binder or Google Colab by clicking the launch buttons in the repository README.
Citing this work
The citation is
Male, F. and Jensen, J.L., 2022. Three common statistical missteps we make in reservoir characterization. AAPG Bulletin, 106(11), pp.2149-2161. https://doi.org/10.1306/07202120076
The official version is published in AAPG Bulletin. A preprint is available on EarthArXiv.
Owner
- Name: Frank Male
- Login: frank1010111
- Kind: user
- Location: State College, PA
- Company: Penn State University
- Repositories: 20
- Profile: https://github.com/frank1010111
Full stack scientific programmer - from raw data to decisions
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: >-
  Three common statistical missteps we make in reservoir
  characterization
message: >-
  If these notebooks are useful for you, consider citing
  this paper
authors:
  - given-names: Frank
    family-names: Male
    email: frank.male@psu.edu
    affiliation: Penn State University
    orcid: 'https://orcid.org/0000-0002-3402-5578'
  - given-names: Jerry L.
    family-names: Jensen
    affiliation: University of Texas at Austin
identifiers:
  - type: doi
    value: 10.1306/07202120076
  - type: url
    value: 'https://doi.org/10.1306/07202120076'
repository-code: 'https://github.com/frank1010111/statistical_missteps'
abstract: >-
  Reservoir characterization analysis resulting from
  incorrect applications of statistics can be found in the
  literature, particularly in applications where integration
  of various disciplines is needed. Here, we look at three
  misapplications of ordinary least squares linear
  regression (LSLR), show how they can lead to poor results,
  and offer better alternatives, where available. The issues
  are: Application of algebra to an LSLR-derived model to
  reverse the roles of explanatory and response variables
  that may give biased predictions. In particular, we
  examine pore-throat size equations (e.g., Winland’s and
  Pittman’s equations) and find that claims of overpredicted
  permeability may in part be because of statistical
  mistakes. Using a log-transformed variable in an LSLR
  model, detransforming without accounting for the role of
  noise. This gives an equation that underpredicts the mean
  value. Several approaches exist to address this
  problem. Misapplication of the coefficient of determination
  (R²) in three cases that lead to misleading results. For
  example, model fitting in decline curve analysis gives
  optimistic R² values, as is also the case where a
  multimodal explanatory variable is present. Using actual
  and synthetic data sets, we illustrate the effects that
  these errors have on analysis and some implications for
  using machine learning results.
date-released: '2022-11-01'
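The abstract above notes that a multimodal explanatory variable can produce optimistic R² values. A minimal sketch of that effect uses two hypothetical, well-separated groups (for example, two facies), where x and y are independent within each group. With unit within-group variances and group means 5 apart, the pooled correlation is 6.25/7.25 ≈ 0.86, so the pooled R² is about 0.74. Within either group, however, x explains essentially nothing. The function names are illustrative, and the setup is not the data set used in the paper.

```python
import numpy as np


def r_squared(x, y):
    """Coefficient of determination for an OLS straight-line fit of y on x."""
    b, a = np.polyfit(x, y, deg=1)
    resid = y - (a + b * x)
    return 1.0 - np.sum(resid**2) / np.sum((y - y.mean()) ** 2)


def bimodal_experiment(n_per_group=500, separation=5.0, seed=2):
    """Pooled vs within-group R^2 for a hypothetical two-group data set.

    Within each group, x and y are drawn independently, so any pooled
    correlation comes only from the separation between group means.
    Returns (pooled R^2, R^2 in group 1, R^2 in group 2).
    """
    rng = np.random.default_rng(seed)
    x1 = rng.normal(0.0, 1.0, n_per_group)
    y1 = rng.normal(0.0, 1.0, n_per_group)
    x2 = rng.normal(separation, 1.0, n_per_group)
    y2 = rng.normal(separation, 1.0, n_per_group)
    x = np.concatenate([x1, x2])
    y = np.concatenate([y1, y2])
    return r_squared(x, y), r_squared(x1, y1), r_squared(x2, y2)


if __name__ == "__main__":
    pooled, within1, within2 = bimodal_experiment()
    print(f"pooled R^2 {pooled:.2f}  group 1 {within1:.3f}  group 2 {within2:.3f}")
```

The high pooled R² here reflects only the difference between the two groups, not a predictive relationship within either one.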
GitHub Events
Total
- Delete event: 1
- Push event: 1
- Pull request event: 1
- Pull request review event: 1
Last Year
- Delete event: 1
- Push event: 1
- Pull request event: 1
- Pull request review event: 1
Issues and Pull Requests
Last synced: over 1 year ago
All Time
- Total issues: 0
- Total pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: over 1 year
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 1
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Pull Request Authors
- dependabot[bot] (2)
Dependencies
- matplotlib >=3.1
- notebook >=5.0
- numpy >=1.18
- pandas >=1.0
- scipy >=1.4
- seaborn >=0.10.0
- statsmodels >=0.11.0
- matplotlib ==3.5.2
- numpy ==1.23.1
- pandas ==1.4.3
- scipy ==1.6.1
- seaborn ==0.11.2
- statsmodels ==0.13.2