thresholdmodeling

thresholdmodeling: A Python package for modeling excesses over a threshold using the Peak-Over-Threshold Method and the Generalized Pareto Distribution - Published in JOSS (2020)

https://github.com/iagolemos1/thresholdmodeling

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 8 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: springer.com, joss.theoj.org, zenodo.org
  • Committers with academic emails
    1 of 4 committers (25.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Scientific Fields

Mathematics Computer Science - 63% confidence
Last synced: 6 months ago · JSON representation

Repository

Basic Info
  • Host: GitHub
  • Owner: iagolemos1
  • License: lgpl-3.0
  • Language: Python
  • Default Branch: master
  • Size: 534 KB
Statistics
  • Stars: 34
  • Watchers: 3
  • Forks: 15
  • Open Issues: 1
  • Releases: 1
Created about 6 years ago · Last pushed about 5 years ago
Metadata Files
Readme License

README.md

DOI DOI

thresholdmodeling: A Python package for modeling excesses over a threshold using the Peak-Over-Threshold Method and the Generalized Pareto Distribution

This package is intended for those who wish to conduct an extreme values analysis. It provides the whole toolkit necessary to create a threshold model in a simple and efficient way, presenting the main methods towards the Peak-Over-Threshold method and the fit in the Generalized Pareto Distribution.

In this repository you can find the main files of the package, the Functions Documenation, the dataset used in some examples, the paper submitted to the Jounal of Open Source Software and some tutorials.

Installing Package

It is necessary to have internet connection and use Anaconda distribution (Python 3).

  • For installing Anaconda on Linux, go to this link. For installing on Windows, go to this one. For istalling on macOS, go to this one.

  • For creating your own environment by using the terminal or Anaconda Prompt, go here.

Windows Users

Firstly, it will necessary to install R on your environment and considering that rpy2 (a python dependency package for thresholdmodeling) does not have Windows support, installing it from pip install thresholdmodeling will result in an error, the same occurs with pip install rpy2. Then, it is necessary to download it from an unuofficial website: https://www.lfd.uci.edu/~gohlke/pythonlibs/ Here, you must find the rpy2 realese which works on your machine and install it manually going to the download folder with the Anaconda Prompt and run this line, for example (it will depend on the name of the downloaded file): pip install rpy2‑2.9.5‑cp37‑cp37m‑win_amd64.whl Or you can install it from the the Anaconda Prompt by activating your environment and running: conda activate my_env conda install r conda install -c r rpy2=2.9.4 After that, rpy2 and R will be installed on your machine. Follow the next steps.

For installing the package just use the following command on your Anaconda Prompt (it is already in PyPi): pip install thresholdmodeling The others Python dependencies for runing the software will install automatically with this command.

Once the package is installed, it is necessary to run these lines on your IDE for installing POT R package (package that our software uses by means of rpy2 for computing GPD estimatives): ```python from rpy2.robjects.packages import importr import rpy2.robjects.packages as rpackages

base = importr('base') utils = importr('utils') utils.chooseCRANmirror(ind=1) utils.install_packages('POT') #installing POT package ```

Linux Users

Firstly, run this lines on your terminal in order to install R and rpy2 package on your environment: conda activate my_env (my_env is your environment name) conda install r conda install -c r rpy2=2.9.4 After installing R and rpy2, find your anaconda directory, and find the actual environment folder. It should be somewhere like ~/anaconda3/envs/my_env. Open the terminal in this folder and run this line (the others dependencies will automatically install): pip install thresholdmodeling Once the package is installed, it is necessary to run this lines on your IDE for installing POT R package (package that our software uses by means of rpy2 for computing GPD estimatives):

```python from rpy2.robjects.packages import importr import rpy2.robjects.packages as rpackages

base = importr('base') utils = importr('utils') utils.chooseCRANmirror(ind=1) utils.install_packages('POT') #installing POT package Or, it is possible to download this file in order to run it in yout IDE and installing POT. ```

User's guide and Reproducibility

In the file example it is possible to see how the package should be used. In Functions Documenation it may be seen a complete documentation on how to use the functions presented in the package.

In order to present a tutorial on how to use the package and its results, a guide is presented below, using the example on the Coles's book with the Daily Rainfall in South-West England dataset.

Threshold Selection

Firstly, it is necessary to conduct a threshold value analysis using the first two functions of the package: MRL and Parameter_Stability_Plot, in order to select a reasonable threshold value. Runing this: ```python from thresholdmodeling import thresh_modeling #importing package import pandas as pd #importing pandas

url = 'https://raw.githubusercontent.com/iagolemos1/thresholdmodeling/master/dataset/rain.csv' #saving url df = pd.readcsv(url, errorbad_lines=False) #getting data data = df.values.ravel() #turning data into an array

threshmodeling.MRL(data, 0.05)
thresh
modeling.ParameterStabilityplot(data, 0.05) ``` The results must be:

Then, by analysing the three graphics, it is reasonable taking the threshold value as 30.

Model Fit

Once the threshold value is defined, it is possible to fit the dataset to a GPD model by using the function gpdfitrunning the following line and using the maximum likelihood estimation method:

python thresh_modeling.gpdfit(data, 30, 'mle')

The results must be in Terminal like: ``` Estimator: MLE

Deviance: 970.1874

  AIC: 974.1874

Varying Threshold: FALSE

Threshold Call: 30L

Number Above: 152

Proportion Above: 0.0087

Estimates

scale shape

7.4411 0.1845

Standard Error Type: observed

Standard Errors

scale shape

0.9587 0.1012

Asymptotic Variance Covariance

   scale     shape

scale 0.91920 -0.06554

shape -0.06554 0.01025

Optimization Information

Convergence: successful

Function Evaluations: 14

Gradient Evaluations: 6 ``` These are the GPD model estimatives using the maximum likelihood estimator.

Model Checking

Once the GPD model is defined, it is necessary to verify if the model is reasonable and describes well the empirical observations. Plots like probability density function, cumulative distribution function, quantile-quantile and probability-probability can show to us if the model is good. It is possible to obtain these plots using some functions of the package: gpdpdf, gpdcdf, qqplot and ppplot. By running these lines: python thresh_modeling.gpdpdf(data, 30, 'mle', 'sturges', 0.05) thresh_modeling.gpdcdf(data, 30, 'mle', 0.05) thresh_modeling.qqplot(data,30, 'mle', 0.05) thresh_modeling.ppplot(data, 30, 'mle', 0.05) The results must be:

Once it is possible to verifiy that the theoretical model describes very well the empirical observations, the next step is to use the main tool of the extreme values approach: extrapolation over the unit of the return period.

Return Value Analysis

The first thing that must be defined is: what is the unit of the return period? In this example, the unit is days because the observations are daily, but in other applications, like corrosion engineering, the unit may be number of observations.

Using the function return_value is possible to get two informations: * 1 : The return value for a given return period and; * 2 : The return value plot, that works very well for a model diagnostic.

By running this line (go to Model Diagnostics and Return Level Analysis for more information about the function): python thresh_modeling.return_value(data, 30, 0.05, 365, 36500, 'mle') It means, the return period we want to know the exact return value is 36500 days or 100 years. With the 365, we are saying that the annual number of observations is 365.

The results must be:

The return value for the given return period is 106.34386649996667 ± 40.86691363790978 Hence, by the graphic, it is possible to say that the theoretical model is very well fitted. Also, it was possible to compute the return value in 100 years. In other words, the rainfall preciptation once in every 100 years must be between 65.4470 and 147.2108 mm.

Declustering

Stuart Coles's in his book says that if the extremes assume a tendency to be clustered in a stationary series, another pratice would be need to model these values. The pratice consists in declustering, which is: cluster data and decluster by its maximums. For this example, it is clear that, at least initialy, the dataset is not orgnanized in clusters. With the function decluster it is possible to observe the dataset plot against its unit of return period, but, also it is possible to cluster it using a given block size (in this example it will be monthly, then the block size will be 30 days), and then decluster it by taking the maximum of each block.

By running these lines: python thresh_modeling.decluster(data, 30, 30) The result must be:

It is important to say that the unit of the return period after the decluster changes (monthly). With the first graph is possible to observe that, at least initialy, there is not any pattern. However, it does not means that it is not possible to descluter the data set to a given block size, which is possible to see in the second graphic.

In a case that it is necessary to decluster the dataset, the second one, shown in the declustered graphic must be used.

Further Functions

The other functions that are not in this tutorial can be used as it is shown in the test file. The discription of each one is in the Functions Documenation.

Doubts

Any doubts about the package, don't hesitate to contact me.

General License

Copyright (c) 2019 Iago Pereira Lemos

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/

Referencing

For referencing the repository, use the following code: @misc{thresholdmodeling, author = {Iago P. Lemos and Antonio Marcos G. Lima and Marcus Antonio Viana Duarte}, title = {thresholdmodeling package}, month = Feb, year = 2020, doi = {10.5281/zenodo.3661338}, version = {0.0.1}, publisher = {Zenodo}, url = {https://github.com/iagolemos1/thresholdmodeling} }

Background

I am a mechanical engineering undergraduate student in the Federal University of Uberlândia and this package was made in the Acoustics and Vibration Laboratory, in the School of Mechanical Engineering.

JOSS Publication

thresholdmodeling: A Python package for modeling excesses over a threshold using the Peak-Over-Threshold Method and the Generalized Pareto Distribution
Published
February 10, 2020
Volume 5, Issue 46, Page 2013
Authors
Iago Pereira Lemos ORCID
Acoustics and Vibration Laboratory, School of Mechanical Engineering, Federal University of Uberlândia
Antônio Marcos Gonçalves Lima ORCID
Associate Professor, School of Mechanical Engineering, Federal University of Uberlândia
Marcus Antônio Viana Duarte ORCID
Associate Professor, Acoustics and Vibration Laboratory, School of Mechanical Engineering, Federal University of Uberlândia
Editor
Vincent Knight ORCID
Tags
Threshold Models Peak-Over-Threshold Method Generalized Pareto Distribution Estatistical Modeling

GitHub Events

Total
  • Watch event: 3
Last Year
  • Watch event: 3

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 183
  • Total Committers: 4
  • Avg Commits per committer: 45.75
  • Development Distribution Score (DDS): 0.022
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Iago Pereira Lemos l****3@g****m 179
Joshua Harrison j****a@c****y 2
Vince Knight v****t@g****m 1
Daniel S. Katz d****z@i****g 1
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 9
  • Total pull requests: 3
  • Average time to close issues: 4 days
  • Average time to close pull requests: about 1 hour
  • Total issue authors: 5
  • Total pull request authors: 3
  • Average comments per issue: 4.44
  • Average comments per pull request: 0.33
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • kellieotto (4)
  • bahung (2)
  • rjskene (1)
  • IanBurger (1)
  • andreowhite (1)
Pull Request Authors
  • drvinceknight (1)
  • JoshuaCrestone (1)
  • danielskatz (1)
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 10 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 1
  • Total maintainers: 1
pypi.org: thresholdmodeling

This package is intended for those who wish to conduct an extreme values analysis. It provides the whole toolkit necessary to create a threshold model in a simple and efficient way, presenting the main methods towards the Peak-Over-Threshold Method and the fit in the Generalized Pareto Distribution. For installing and use it, go to https://github.com/iagolemos1/thresholdmodeling

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 10 Last month
Rankings
Forks count: 9.8%
Dependent packages count: 10.1%
Stargazers count: 11.5%
Average: 18.7%
Dependent repos count: 21.6%
Downloads: 40.5%
Maintainers (1)
Last synced: 6 months ago

Dependencies

requirements.txt pypi
  • matplotlib *
  • numpy *
  • pandas *
  • scipy *
  • thresholdmodeling *
setup.py pypi
  • matplotlib *
  • numpy *
  • rpy2 *
  • scipy *
  • seaborn *