thresholdmodeling

thresholdmodeling: A Python package for modeling excesses over a threshold using the Peak-Over-Threshold Method and the Generalized Pareto Distribution - Published in JOSS (2020)

https://github.com/iagolemos1/thresholdmodeling

Scientific Fields

Mathematics Computer Science - 63% confidence

Last synced: 9 months ago · JSON representation

Repository

Basic Info

Host: GitHub
Owner: iagolemos1
License: lgpl-3.0
Language: Python
Default Branch: master
Size: 534 KB

Statistics

Stars: 34
Watchers: 3
Forks: 15
Open Issues: 1
Releases: 1

Created over 6 years ago · Last pushed over 5 years ago

Metadata Files

Readme License

`thresholdmodeling`: A Python package for modeling excesses over a threshold using the Peak-Over-Threshold Method and the Generalized Pareto Distribution

This package is intended for those who wish to conduct an extreme values analysis. It provides the whole toolkit necessary to create a threshold model in a simple and efficient way, presenting the main methods towards the Peak-Over-Threshold method and the fit in the Generalized Pareto Distribution.

In this repository you can find the main files of the package, the Functions Documenation, the dataset used in some examples, the paper submitted to the Jounal of Open Source Software and some tutorials.

Installing Package

It is necessary to have internet connection and use Anaconda distribution (Python 3).

For installing Anaconda on Linux, go to this link. For installing on Windows, go to this one. For istalling on macOS, go to this one.
For creating your own environment by using the terminal or Anaconda Prompt, go here.

Windows Users

Firstly, it will necessary to install R on your environment and considering that rpy2 (a python dependency package for thresholdmodeling) does not have Windows support, installing it from pip install thresholdmodeling will result in an error, the same occurs with pip install rpy2. Then, it is necessary to download it from an unuofficial website: https://www.lfd.uci.edu/~gohlke/pythonlibs/ Here, you must find the rpy2 realese which works on your machine and install it manually going to the download folder with the Anaconda Prompt and run this line, for example (it will depend on the name of the downloaded file): pip install rpy2‑2.9.5‑cp37‑cp37m‑win_amd64.whl Or you can install it from the the Anaconda Prompt by activating your environment and running: conda activate my_env conda install r conda install -c r rpy2=2.9.4 After that, rpy2 and R will be installed on your machine. Follow the next steps.

For installing the package just use the following command on your Anaconda Prompt (it is already in PyPi): pip install thresholdmodeling The others Python dependencies for runing the software will install automatically with this command.

Once the package is installed, it is necessary to run these lines on your IDE for installing POT R package (package that our software uses by means of rpy2 for computing GPD estimatives): ```python from rpy2.robjects.packages import importr import rpy2.robjects.packages as rpackages

base = importr('base') utils = importr('utils') utils.chooseCRANmirror(ind=1) utils.install_packages('POT') #installing POT package ```

Linux Users

Firstly, run this lines on your terminal in order to install R and rpy2 package on your environment: conda activate my_env (my_env is your environment name) conda install r conda install -c r rpy2=2.9.4 After installing R and rpy2, find your anaconda directory, and find the actual environment folder. It should be somewhere like ~/anaconda3/envs/my_env. Open the terminal in this folder and run this line (the others dependencies will automatically install): pip install thresholdmodeling Once the package is installed, it is necessary to run this lines on your IDE for installing POT R package (package that our software uses by means of rpy2 for computing GPD estimatives):

```python from rpy2.robjects.packages import importr import rpy2.robjects.packages as rpackages

base = importr('base') utils = importr('utils') utils.chooseCRANmirror(ind=1) utils.install_packages('POT') #installing POT package Or, it is possible to download this file in order to run it in yout IDE and installing POT. ```

User's guide and Reproducibility

In the file example it is possible to see how the package should be used. In Functions Documenation it may be seen a complete documentation on how to use the functions presented in the package.

In order to present a tutorial on how to use the package and its results, a guide is presented below, using the example on the Coles's book with the Daily Rainfall in South-West England dataset.

Threshold Selection

Firstly, it is necessary to conduct a threshold value analysis using the first two functions of the package: MRL and Parameter_Stability_Plot, in order to select a reasonable threshold value. Runing this: ```python from thresholdmodeling import thresh_modeling #importing package import pandas as pd #importing pandas

url = 'https://raw.githubusercontent.com/iagolemos1/thresholdmodeling/master/dataset/rain.csv' #saving url df = pd.readcsv(url, errorbad_lines=False) #getting data data = df.values.ravel() #turning data into an array

threshmodeling.MRL(data, 0.05)
threshmodeling.ParameterStabilityplot(data, 0.05) ``` The results must be:

Then, by analysing the three graphics, it is reasonable taking the threshold value as 30.

Model Fit

Once the threshold value is defined, it is possible to fit the dataset to a GPD model by using the function gpdfitrunning the following line and using the maximum likelihood estimation method:

python thresh_modeling.gpdfit(data, 30, 'mle')

The results must be in Terminal like: ``` Estimator: MLE

Deviance: 970.1874

  AIC: 974.1874

Varying Threshold: FALSE

Threshold Call: 30L

Number Above: 152

Proportion Above: 0.0087

Estimates

scale shape

7.4411 0.1845

Standard Error Type: observed

Standard Errors

scale shape

0.9587 0.1012

Asymptotic Variance Covariance

   scale     shape

scale 0.91920 -0.06554

shape -0.06554 0.01025

Optimization Information

Convergence: successful

Function Evaluations: 14

Gradient Evaluations: 6 ``` These are the GPD model estimatives using the maximum likelihood estimator.

Model Checking

Once the GPD model is defined, it is necessary to verify if the model is reasonable and describes well the empirical observations. Plots like probability density function, cumulative distribution function, quantile-quantile and probability-probability can show to us if the model is good. It is possible to obtain these plots using some functions of the package: gpdpdf, gpdcdf, qqplot and ppplot. By running these lines: python thresh_modeling.gpdpdf(data, 30, 'mle', 'sturges', 0.05) thresh_modeling.gpdcdf(data, 30, 'mle', 0.05) thresh_modeling.qqplot(data,30, 'mle', 0.05) thresh_modeling.ppplot(data, 30, 'mle', 0.05) The results must be:

Once it is possible to verifiy that the theoretical model describes very well the empirical observations, the next step is to use the main tool of the extreme values approach: extrapolation over the unit of the return period.

Return Value Analysis

The first thing that must be defined is: what is the unit of the return period? In this example, the unit is days because the observations are daily, but in other applications, like corrosion engineering, the unit may be number of observations.

Using the function return_value is possible to get two informations: * 1 : The return value for a given return period and; * 2 : The return value plot, that works very well for a model diagnostic.

By running this line (go to Model Diagnostics and Return Level Analysis for more information about the function): python thresh_modeling.return_value(data, 30, 0.05, 365, 36500, 'mle') It means, the return period we want to know the exact return value is 36500 days or 100 years. With the 365, we are saying that the annual number of observations is 365.

The results must be:

The return value for the given return period is 106.34386649996667 ± 40.86691363790978 Hence, by the graphic, it is possible to say that the theoretical model is very well fitted. Also, it was possible to compute the return value in 100 years. In other words, the rainfall preciptation once in every 100 years must be between 65.4470 and 147.2108 mm.

Declustering

Stuart Coles's in his book says that if the extremes assume a tendency to be clustered in a stationary series, another pratice would be need to model these values. The pratice consists in declustering, which is: cluster data and decluster by its maximums. For this example, it is clear that, at least initialy, the dataset is not orgnanized in clusters. With the function decluster it is possible to observe the dataset plot against its unit of return period, but, also it is possible to cluster it using a given block size (in this example it will be monthly, then the block size will be 30 days), and then decluster it by taking the maximum of each block.

By running these lines: python thresh_modeling.decluster(data, 30, 30) The result must be:

It is important to say that the unit of the return period after the decluster changes (monthly). With the first graph is possible to observe that, at least initialy, there is not any pattern. However, it does not means that it is not possible to descluter the data set to a given block size, which is possible to see in the second graphic.

In a case that it is necessary to decluster the dataset, the second one, shown in the declustered graphic must be used.

Further Functions

The other functions that are not in this tutorial can be used as it is shown in the test file. The discription of each one is in the Functions Documenation.

Doubts

Any doubts about the package, don't hesitate to contact me.

General License

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/

Referencing

For referencing the repository, use the following code: @misc{thresholdmodeling, author = {Iago P. Lemos and Antonio Marcos G. Lima and Marcus Antonio Viana Duarte}, title = {thresholdmodeling package}, month = Feb, year = 2020, doi = {10.5281/zenodo.3661338}, version = {0.0.1}, publisher = {Zenodo}, url = {https://github.com/iagolemos1/thresholdmodeling} }

Background

I am a mechanical engineering undergraduate student in the Federal University of Uberlândia and this package was made in the Acoustics and Vibration Laboratory, in the School of Mechanical Engineering.

JOSS Publication

thresholdmodeling: A Python package for modeling excesses over a threshold using the Peak-Over-Threshold Method and the Generalized Pareto Distribution

Published

February 10, 2020

DOI

10.21105/joss.02013

Volume 5, Issue 46, Page 2013

Authors

Iago Pereira Lemos

Acoustics and Vibration Laboratory, School of Mechanical Engineering, Federal University of Uberlândia

Antônio Marcos Gonçalves Lima

Associate Professor, School of Mechanical Engineering, Federal University of Uberlândia

Marcus Antônio Viana Duarte

Associate Professor, Acoustics and Vibration Laboratory, School of Mechanical Engineering, Federal University of Uberlândia

Editor

Vincent Knight

GitHub Events

Total

Watch event: 3

Last Year

Watch event: 3

Committers

Last synced: 10 months ago

All Time

Total Commits: 183
Total Committers: 4
Avg Commits per committer: 45.75
Development Distribution Score (DDS): 0.022

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Iago Pereira Lemos	l**3@g**m	179
Joshua Harrison	j**a@c**y	2
Vince Knight	v**t@g**m	1
Daniel S. Katz	d**z@i**g	1

Committer Domains (Top 20 + Academic)

ieee.org: 1 crestone.energy: 1

Issues and Pull Requests

Last synced: 9 months ago

All Time

Total issues: 9
Total pull requests: 3
Average time to close issues: 4 days
Average time to close pull requests: about 1 hour
Total issue authors: 5
Total pull request authors: 3
Average comments per issue: 4.44
Average comments per pull request: 0.33
Merged pull requests: 3
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

View more stats

Top Authors

Issue Authors

kellieotto (4)
bahung (2)
rjskene (1)
IanBurger (1)
andreowhite (1)

Pull Request Authors

drvinceknight (1)
JoshuaCrestone (1)
danielskatz (1)

Top Labels

Issue Labels

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- pypi 10 last-month

Total dependent packages: 0
Total dependent repositories: 1
Total versions: 1
Total maintainers: 1

pypi.org: thresholdmodeling

This package is intended for those who wish to conduct an extreme values analysis. It provides the whole toolkit necessary to create a threshold model in a simple and efficient way, presenting the main methods towards the Peak-Over-Threshold Method and the fit in the Generalized Pareto Distribution. For installing and use it, go to https://github.com/iagolemos1/thresholdmodeling

Homepage: https://github.com/iagolemos1/thresholdmodeling
Documentation: https://thresholdmodeling.readthedocs.io/
License: GNU General Public License (GPL)
Latest release: 0.0.1
published over 6 years ago

Versions: 1
Dependent Packages: 0
Dependent Repositories: 1
Downloads: 10 Last month

Rankings

Forks count: 9.8%

Dependent packages count: 10.1%

Stargazers count: 11.5%

Average: 18.7%

Dependent repos count: 21.6%

Downloads: 40.5%

Maintainers (1)

iagolemos1

Last synced: 9 months ago

thresholdmodeling

Science Score: 95.0%

Scientific Fields

Repository

Basic Info

Statistics

Metadata Files

README.md

thresholdmodeling: A Python package for modeling excesses over a threshold using the Peak-Over-Threshold Method and the Generalized Pareto Distribution

Installing Package

Windows Users

Linux Users

User's guide and Reproducibility

Threshold Selection

Model Fit

Model Checking

Return Value Analysis

Declustering

Further Functions

Doubts

General License

Referencing

Background

JOSS Publication

thresholdmodeling: A Python package for modeling excesses over a threshold using the Peak-Over-Threshold Method and the Generalized Pareto Distribution

Authors

Editor

Tags

GitHub Events

Total

Last Year

Committers

All Time

Past Year

Top Committers

Committer Domains (Top 20 + Academic)

Issues and Pull Requests

All Time

Past Year

Top Authors

Issue Authors

Pull Request Authors

Top Labels

Issue Labels

Pull Request Labels

Packages

pypi.org: thresholdmodeling

Rankings

Maintainers (1)

Dependencies

`thresholdmodeling`: A Python package for modeling excesses over a threshold using the Peak-Over-Threshold Method and the Generalized Pareto Distribution