statistical_missteps

Supplement to Three common statistical missteps we make in reservoir characterization

https://github.com/frank1010111/statistical_missteps

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (5.8%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: frank1010111
  • License: gpl-3.0
  • Language: Jupyter Notebook
  • Default Branch: master
  • Size: 3.95 MB
Statistics
  • Stars: 14
  • Watchers: 2
  • Forks: 6
  • Open Issues: 0
  • Releases: 0
Created almost 6 years ago · Last pushed over 1 year ago
Metadata Files
  • Readme
  • License
  • Citation

README.md

Statistical Missteps

Supplement to "Three common statistical missteps we make in reservoir characterization"

Authors: Frank Male and Jerry Jensen

Binder Open In Colab

Here we show, through Monte Carlo experiments, three examples of statistical missteps that we have seen in the reservoir characterization literature.

The mistakes are

  1. Applying algebra to linear least squares regression models
  2. De-transforming a log-transformed variable in a linear least squares model without accounting for bias
  3. Misapplying R²
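
As a rough illustration of the first two missteps, here is a minimal Monte Carlo sketch. It is not taken from the notebooks: the coefficients, noise levels, and variable names are all hypothetical, and the `exp(variance / 2)` factor is only one common bias correction, valid when the errors are normal in log space.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
n = 100_000

# Misstep 1: rearranging a fitted line to swap the roles of x and y.
# Hypothetical truth: y = 2x + 1, with noise in y only.
x = rng.normal(0.0, 1.0, n)
y = 2.0 * x + 1.0 + rng.normal(0.0, 2.0, n)

slope_yx, intercept_yx = np.polyfit(x, y, 1)  # least squares fit of y on x
slope_xy, intercept_xy = np.polyfit(y, x, 1)  # least squares fit of x on y
inverted_slope = 1.0 / slope_yx               # slope from rearranging the y-on-x fit

print(f"x-on-y slope, fitted directly: {slope_xy:.3f}")
print(f"x-on-y slope, by algebra:      {inverted_slope:.3f}")

# Misstep 2: exponentiating a fit made in log space.
# Hypothetical truth: log(k) = 1 + 0.5x, with unit-variance normal noise.
log_k = 1.0 + 0.5 * x + rng.normal(0.0, 1.0, n)
k = np.exp(log_k)

b1, b0 = np.polyfit(x, log_k, 1)
fitted_log_k = b0 + b1 * x
residual_variance = np.var(log_k - fitted_log_k)

naive = np.exp(fitted_log_k)                         # ignores the noise term
corrected = naive * np.exp(residual_variance / 2.0)  # assumes lognormal errors

print(f"mean of k:                    {k.mean():.3f}")
print(f"mean of naive prediction:     {naive.mean():.3f}")
print(f"mean of corrected prediction: {corrected.mean():.3f}")
```

With these settings the directly fitted x-on-y slope comes out near 0.25, while rearranging the y-on-x fit gives roughly 0.5. The naive back-transform underpredicts the mean of `k`, and the corrected version recovers it closely.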

All of the examples are in Statistics pitfalls.ipynb. Exposition around the first misstep is available in the notebook Applying algebra to regression.ipynb; the second is discussed in Regression on transformed variables.ipynb. The third misstep is detailed in Misinterpreting R-squared.ipynb.
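
One way R² can mislead is sketched below, under assumed settings rather than the notebook's own: the same slope and the same noise level give a much higher R² when the explanatory variable falls into two well-separated clusters. The helper `r_squared` and all numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the sketch is repeatable
n = 50_000


def r_squared(x, y):
    """R² of a straight-line least squares fit of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    return 1.0 - residuals.var() / y.var()


# Explanatory variable drawn from a single cluster.
x_single = rng.normal(0.0, 1.0, n)

# Explanatory variable drawn from two well-separated clusters (bimodal).
x_two = np.concatenate([
    rng.normal(-5.0, 1.0, n // 2),
    rng.normal(5.0, 1.0, n // 2),
])

# Same slope (1) and same noise standard deviation (2) in both cases.
y_single = x_single + rng.normal(0.0, 2.0, n)
y_two = x_two + rng.normal(0.0, 2.0, n)

r2_single = r_squared(x_single, y_single)
r2_two = r_squared(x_two, y_two)

print(f"R² with one cluster:  {r2_single:.3f}")
print(f"R² with two clusters: {r2_two:.3f}")
```

The prediction error is the same in both cases, so the higher R² in the two-cluster case reflects the wider spread of the explanatory variable, not a better model.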

Interactive examples

Anyone can run these examples on Binder or Google Colab by clicking on the buttons above.

Citing this work

The citation is

Male, F. and Jensen, J.L., 2022. Three common statistical missteps we make in reservoir characterization. AAPG Bulletin, 106(11), pp.2149-2161. https://doi.org/10.1306/07202120076

The official version is at the AAPG Bulletin. A preprint is available at EarthArXiv.

Owner

  • Name: Frank Male
  • Login: frank1010111
  • Kind: user
  • Location: State College, PA
  • Company: Penn State University

Full stack scientific programmer - from raw data to decisions

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  Three common statistical missteps we make in reservoir
  characterization
message: >-
  If these notebooks are useful for you, consider citing
  this paper
authors:
  - given-names: Frank
    family-names: Male
    email: frank.male@psu.edu
    affiliation: Penn State University
    orcid: 'https://orcid.org/0000-0002-3402-5578'
  - given-names: Jerry L.
    family-names: Jensen
    affiliation: University of Texas at Austin
identifiers:
  - type: doi
    value: 10.1306/07202120076
  - type: url
    value: 'https://doi.org/10.1306/07202120076'
repository-code: 'https://github.com/frank1010111/statistical_missteps'
abstract: >-
  Reservoir characterization analysis resulting from
  incorrect applications of statistics can be found in the
  literature, particularly in applications where integration
  of various disciplines is needed. Here, we look at three
  misapplications of ordinary least squares linear
  regression (LSLR), show how they can lead to poor results,
  and offer better alternatives, where available. The issues
  are: (1) Application of algebra to an LSLR-derived model to
  reverse the roles of explanatory and response variables
  that may give biased predictions. In particular, we
  examine pore-throat size equations (e.g., Winland’s and
  Pittman’s equations) and find that claims of overpredicted
  permeability may in part be because of statistical
  mistakes. (2) Using a log-transformed variable in an LSLR
  model, detransforming without accounting for the role of
  noise. This gives an equation that underpredicts the mean
  value. Several approaches exist to address this
  problem. (3) Misapplication of the coefficient of
  determination (R2) in three cases that lead to misleading
  results. For
  example, model fitting in decline curve analysis gives
  optimistic R2 values, as is also the case where a
  multimodal explanatory variable is present. Using actual
  and synthetic data sets, we illustrate the effects that
  these errors have on analysis and some implications for
  using machine learning results.
date-released: '2022-11-01'

GitHub Events

Total
  • Delete event: 1
  • Push event: 1
  • Pull request event: 1
  • Pull request review event: 1
Last Year
  • Delete event: 1
  • Push event: 1
  • Pull request event: 1
  • Pull request review event: 1

Issues and Pull Requests

Last synced: over 1 year ago

All Time
  • Total issues: 0
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: over 1 year
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 1
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • dependabot[bot] (2)
Top Labels
Issue Labels
Pull Request Labels
  • dependencies (2)

Dependencies

pyproject.toml pypi
  • matplotlib >=3.1
  • notebook >=5.0
  • numpy >=1.18
  • pandas >=1.0
  • scipy >=1.4
  • seaborn >=0.10.0
  • statsmodels >=0.11.0
requirements.txt pypi
  • matplotlib ==3.5.2
  • numpy ==1.23.1
  • pandas ==1.4.3
  • scipy ==1.6.1
  • seaborn ==0.11.2
  • statsmodels ==0.13.2