statistical_missteps

Supplement to "Three common statistical missteps we make in reservoir characterization"

https://github.com/frank1010111/statistical_missteps

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 2 DOI reference(s) in README
✓ Academic publication links: Links to arxiv.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (5.8%) to scientific vocabulary

Last synced: 10 months ago

Repository


Basic Info

Host: GitHub
Owner: frank1010111
License: gpl-3.0
Language: Jupyter Notebook
Default Branch: master
Size: 3.95 MB

Statistics

Stars: 14
Watchers: 2
Forks: 6
Open Issues: 0
Releases: 0

Created about 6 years ago · Last pushed over 1 year ago

Metadata Files

Readme, License, Citation

Statistical Missteps

Supplement to "Three common statistical missteps we make in reservoir characterization"

Authors: Frank Male and Jerry Jensen

Here we show, through Monte Carlo experiments, three examples of statistical missteps that we have seen in the reservoir characterization literature.

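The missteps are:

1. Applying algebra to a linear least squares regression model to swap the roles of the explanatory and response variables
2. Detransforming a log-transformed variable in a linear least squares model without accounting for the resulting bias
3. Misapplying R² (the coefficient of determination)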
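The first misstep can be illustrated with a small Monte Carlo sketch. This is not code from the repository's notebooks: the data are hypothetical (a standard normal `x` and `y = 1 + 2x + noise`), and `ols_fit` is a helper written here for illustration. Fitting `y` on `x` and then solving for `x` implies a slope of about 0.5, whereas regressing `x` on `y` directly recovers the slope of the conditional mean, about 0.25, and gives a smaller prediction error.

```python
import numpy as np


def ols_fit(x, y):
    """Hypothetical helper: least squares fit of y = a + b*x, returns (a, b)."""
    b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    a = y.mean() - b * x.mean()
    return a, b


rng = np.random.default_rng(42)
n = 10_000

# Hypothetical synthetic data: y depends on x plus noise (noise variance 4)
x = rng.normal(0.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 2.0, n)

# Misstep: fit y on x, then rearrange algebraically to predict x from y
a_yx, b_yx = ols_fit(x, y)
pred_inverted = (y - a_yx) / b_yx  # implied slope 1/b_yx, about 0.5

# Alternative when x is the quantity to predict: regress x on y directly
a_xy, b_xy = ols_fit(y, x)
pred_direct = a_xy + b_xy * y  # slope about cov(x, y) / var(y) = 2 / 8 = 0.25

mse_inverted = np.mean((x - pred_inverted) ** 2)  # about 1.0 in expectation
mse_direct = np.mean((x - pred_direct) ** 2)      # about 0.5 in expectation
print(f"implied slope {1 / b_yx:.3f} vs direct slope {b_xy:.3f}")
print(f"MSE inverted {mse_inverted:.3f} vs direct {mse_direct:.3f}")
```

The y-on-x model is a reasonable description of `y` given `x`; the trouble arises only when it is rearranged to predict `x`, which pushes predictions too far from the mean. Which direction to regress depends on which variable is to be predicted.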

All of the examples are in Statistics pitfalls.ipynb. The first misstep is explained in more depth in the notebook Applying algebra to regression.ipynb, the second in Regression on transformed variables.ipynb, and the third in Misinterpreting R-squared.ipynb.
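The second misstep (detransforming a log-scale regression) can be sketched the same way, again with hypothetical data rather than the notebook's: `log_y` is linear in a predictor `z` with normal noise of standard deviation 1. Exponentiating the fitted log-scale prediction estimates the conditional median, so its average falls well below the sample mean of `y`. Two standard corrections are shown, the lognormal factor exp(s²/2), which assumes normal residuals, and Duan's smearing estimator, which multiplies by the mean of the exponentiated residuals; these are not necessarily the approaches used in the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
sigma = 1.0  # hypothetical noise standard deviation on the log scale

# Hypothetical model: ln(y) = 1 + 0.5 z + noise, noise ~ N(0, sigma^2)
z = rng.normal(0.0, 1.0, n)
log_y = 1.0 + 0.5 * z + rng.normal(0.0, sigma, n)
y = np.exp(log_y)

# Least squares fit on the log scale
X = np.column_stack([np.ones(n), z])
coef, *_ = np.linalg.lstsq(X, log_y, rcond=None)
fitted_log = X @ coef
resid = log_y - fitted_log

# Naive detransform: exp(fitted log) estimates the conditional median, not the mean
naive = np.exp(fitted_log)

# Correction 1: lognormal factor exp(s^2 / 2), assumes normally distributed residuals
s2 = resid.var(ddof=2)
lognormal = naive * np.exp(s2 / 2)

# Correction 2: Duan's smearing estimator, uses the empirical residual distribution
smearing = naive * np.mean(np.exp(resid))

print(f"sample mean of y:     {y.mean():.2f}")
print(f"naive detransform:    {naive.mean():.2f}")
print(f"lognormal correction: {lognormal.mean():.2f}")
print(f"smearing correction:  {smearing.mean():.2f}")
```

With σ = 1 the naive estimate averages roughly exp(-1/2) ≈ 0.61 of the true mean, while both corrected estimates land close to it.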

Interactive examples

Anyone can run these examples on Binder or Google Colab using the launch badges at the top of the repository README.

Citing this work

The citation is

Male, F. and Jensen, J.L., 2022. Three common statistical missteps we make in reservoir characterization. AAPG Bulletin, 106(11), pp.2149-2161. https://doi.org/10.1306/07202120076

The official version is published in AAPG Bulletin. A preprint is available on EarthArXiv.

Owner

Name: Frank Male
Login: frank1010111
Kind: user
Location: State College, PA
Company: Penn State University

Repositories: 20
Profile: https://github.com/frank1010111

Full stack scientific programmer - from raw data to decisions

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  Three common statistical missteps we make in reservoir
  characterization
message: >-
  If these notebooks are useful for you, consider citing
  this paper
authors:
  - given-names: Frank
    family-names: Male
    email: frank.male@psu.edu
    affiliation: Penn State University
    orcid: 'https://orcid.org/0000-0002-3402-5578'
  - given-names: Jerry L.
    family-names: Jensen
    affiliation: University of Texas at Austin
identifiers:
  - type: doi
    value: 10.1306/07202120076
  - type: url
    value: 'https://doi.org/10.1306/07202120076'
repository-code: 'https://github.com/frank1010111/statistical_missteps'
abstract: >-
  Reservoir characterization analysis resulting from
  incorrect applications of statistics can be found in the
  literature, particularly in applications where integration
  of various disciplines is needed. Here, we look at three
  misapplications of ordinary least squares linear
  regression (LSLR), show how they can lead to poor results,
  and offer better alternatives, where available. The issues
  are as follows: (1) Application of algebra to an LSLR-derived
  model to reverse the roles of explanatory and response
  variables, which may give biased predictions. In particular, we
  examine pore-throat size equations (e.g., Winland’s and
  Pittman’s equations) and find that claims of overpredicted
  permeability may in part be because of statistical
  mistakes. (2) Using a log-transformed variable in an LSLR
  model and detransforming without accounting for the role of
  noise. This gives an equation that underpredicts the mean
  value. Several approaches exist to address this
  problem. (3) Misapplication of the coefficient of determination
  (R²) in three cases that lead to misleading results. For
  example, model fitting in decline curve analysis gives
  optimistic R² values, as is also the case where a
  multimodal explanatory variable is present. Using actual
  and synthetic data sets, we illustrate the effects that
  these errors have on analysis and some implications for
  using machine learning results.
date-released: '2022-11-01'
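The abstract's point that a multimodal explanatory variable can give an optimistic R² can be seen in a sketch with two hypothetical clusters. Within each cluster, `y` is generated independently of `x`, so any apparent relationship comes only from the separation between the clusters; pooling them nonetheless yields a high R². The `r_squared` and `fit_r2` helpers are written here for illustration and use `numpy.polyfit` for the straight-line fit.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_r2(x, y):
    """Hypothetical helper: R^2 of a straight-line least squares fit of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    return r_squared(y, intercept + slope * x)


rng = np.random.default_rng(1)
m = 500  # points per cluster

# Two hypothetical clusters; within each, y is drawn independently of x
x = np.concatenate([rng.normal(0.0, 0.5, m), rng.normal(5.0, 0.5, m)])
y = np.concatenate([rng.normal(0.0, 1.0, m), rng.normal(5.0, 1.0, m)])

r2_pooled = fit_r2(x, y)        # high: driven by the gap between clusters
r2_low = fit_r2(x[:m], y[:m])   # near zero: no relationship within the cluster
r2_high = fit_r2(x[m:], y[m:])  # near zero: no relationship within the cluster
print(f"pooled R^2 {r2_pooled:.2f}, within clusters {r2_low:.3f} and {r2_high:.3f}")
```

Here a high pooled R² says only that the two clusters differ, not that `x` predicts `y` within either one.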

GitHub Events

Total

Delete event: 1
Push event: 1
Pull request event: 1
Pull request review event: 1

Last Year

Delete event: 1
Push event: 1
Pull request event: 1
Pull request review event: 1

Issues and Pull Requests

Last synced: over 1 year ago

All Time

Total issues: 0
Total pull requests: 1
Average time to close issues: N/A
Average time to close pull requests: over 1 year
Total issue authors: 0
Total pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 1

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

Pull Request Authors

dependabot[bot] (2)

Top Labels

Issue Labels

Pull Request Labels

dependencies (2)

Dependencies

pyproject.toml pypi

matplotlib >=3.1
notebook >=5.0
numpy >=1.18
pandas >=1.0
scipy >=1.4
seaborn >=0.10.0
statsmodels >=0.11.0

requirements.txt pypi

matplotlib ==3.5.2
numpy ==1.23.1
pandas ==1.4.3
scipy ==1.6.1
seaborn ==0.11.2
statsmodels ==0.13.2
