thresholdmodeling
thresholdmodeling: A Python package for modeling excesses over a threshold using the Peak-Over-Threshold Method and the Generalized Pareto Distribution - Published in JOSS (2020)
Science Score: 95.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 8 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: links to springer.com, joss.theoj.org, zenodo.org
- ✓ Committers with academic emails: 1 of 4 committers (25.0%) from academic institutions
- ○ Institutional organization owner
- ✓ JOSS paper metadata: published in Journal of Open Source Software
Repository
Basic Info
- Host: GitHub
- Owner: iagolemos1
- License: lgpl-3.0
- Language: Python
- Default Branch: master
- Size: 534 KB
Statistics
- Stars: 34
- Watchers: 3
- Forks: 15
- Open Issues: 1
- Releases: 1
Metadata Files
README.md
thresholdmodeling: A Python package for modeling excesses over a threshold using the Peak-Over-Threshold Method and the Generalized Pareto Distribution
This package is intended for those who wish to conduct an extreme value analysis. It provides the toolkit needed to build a threshold model simply and efficiently, covering the main steps of the Peak-Over-Threshold method and the fit to the Generalized Pareto Distribution.
In this repository you can find the main files of the package, the Functions Documentation, the dataset used in some examples, the paper submitted to the Journal of Open Source Software and some tutorials.
Installing Package
An internet connection and the Anaconda distribution (Python 3) are required.
For installing Anaconda on Linux, go to this link. For installing on Windows, go to this one. For installing on macOS, go to this one.
To create your own environment using the terminal or the Anaconda Prompt, go here.
Windows Users
First, R must be installed in your environment. Because rpy2 (a Python dependency of thresholdmodeling) does not have Windows support, running pip install thresholdmodeling results in an error, and so does pip install rpy2. It is therefore necessary to download rpy2 from an unofficial website:
https://www.lfd.uci.edu/~gohlke/pythonlibs/
There, find the rpy2 release that works on your machine and install it manually: go to the download folder with the Anaconda Prompt and run a line like the following (the exact name depends on the downloaded file):
pip install rpy2-2.9.5-cp37-cp37m-win_amd64.whl
Or you can install it from the Anaconda Prompt by activating your environment and running:
conda activate my_env
conda install r
conda install -c r rpy2=2.9.4
After that, rpy2 and R will be installed on your machine. Follow the next steps.
To install the package, run the following command in your Anaconda Prompt (the package is already on PyPI):
pip install thresholdmodeling
The other Python dependencies needed to run the software are installed automatically by this command.
Once the package is installed, run these lines in your IDE to install the POT R package (which our software uses through rpy2 to compute the GPD estimates):
```python
from rpy2.robjects.packages import importr
import rpy2.robjects.packages as rpackages

base = importr('base')
utils = importr('utils')
utils.chooseCRANmirror(ind=1)  # select the first CRAN mirror in the list
utils.install_packages('POT')  # installing POT package
```
Linux Users
First, run these lines in your terminal to install R and the rpy2 package in your environment:
conda activate my_env (my_env is your environment name)
conda install r
conda install -c r rpy2=2.9.4
After installing R and rpy2, find your Anaconda directory and, inside it, the folder of the current environment. It should be somewhere like ~/anaconda3/envs/my_env. Open the terminal in this folder and run this line (the other dependencies are installed automatically):
pip install thresholdmodeling
Once the package is installed, run these lines in your IDE to install the POT R package (which our software uses through rpy2 to compute the GPD estimates):
```python
from rpy2.robjects.packages import importr
import rpy2.robjects.packages as rpackages

base = importr('base')
utils = importr('utils')
utils.chooseCRANmirror(ind=1)  # select the first CRAN mirror in the list
utils.install_packages('POT')  # installing POT package
```
Alternatively, you can download this file and run it in your IDE to install POT.
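After either installation route, a quick way to confirm that the Python side is in place is to check that each module can be located. The sketch below is only an illustration: the module names come from the dependency list of this package, the helper name find_missing is hypothetical, and the check says nothing about whether R or the POT R package is installed.

```python
import importlib.util

# Module names taken from the dependency list of this package.
REQUIRED_MODULES = [
    "thresholdmodeling", "rpy2", "numpy", "scipy",
    "pandas", "matplotlib", "seaborn",
]


def find_missing(modules):
    """Return the names in `modules` that cannot be located for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All Python dependencies were found.")
```

Any name reported as missing can then be installed in the active environment before moving on.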
User's guide and Reproducibility
The example file shows how the package should be used. The Functions Documentation gives complete documentation of the functions in the package.
As a tutorial on how to use the package and interpret its results, a guide is presented below, using the example from Coles's book with the Daily Rainfall in South-West England dataset.
Threshold Selection
First, a threshold value analysis is needed in order to select a reasonable threshold, using the first two functions of the package: MRL and Parameter_Stability_plot.
Running this:
```python
from thresholdmodeling import thresh_modeling #importing package
import pandas as pd #importing pandas

url = 'https://raw.githubusercontent.com/iagolemos1/thresholdmodeling/master/dataset/rain.csv' #saving url
# Note: error_bad_lines was deprecated in pandas 1.3 and removed in 2.0;
# on newer pandas versions use on_bad_lines='skip' instead.
df = pd.read_csv(url, error_bad_lines=False) #getting data
data = df.values.ravel() #turning data into an array

thresh_modeling.MRL(data, 0.05)
thresh_modeling.Parameter_Stability_plot(data, 0.05)
```
The results should be:



Then, by analysing the three plots, it is reasonable to take the threshold value as 30.
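The quantity behind a mean residual life plot is the mean excess over each candidate threshold: for a threshold u, the average of x − u over the observations with x > u. Above a threshold where the GPD is a valid model, this mean excess is expected to be roughly linear in u, as described in Coles's book. The following is a minimal sketch of that computation only, not the package's MRL implementation; the function name mean_excess and the sample values are hypothetical.

```python
def mean_excess(sample, threshold):
    """Mean of (x - threshold) over the observations above the threshold.

    Returns None when no observation exceeds the threshold.
    """
    excesses = [x - threshold for x in sample if x > threshold]
    if not excesses:
        return None
    return sum(excesses) / len(excesses)


# Hypothetical daily rainfall values (mm), for illustration only.
sample = [0.0, 2.5, 31.0, 12.0, 45.5, 8.0, 60.0, 33.0, 0.5, 38.5]

# Evaluate the mean excess on a grid of candidate thresholds.
for u in (10, 20, 30, 40):
    print(u, mean_excess(sample, u))
```

Plotting these values against u, with confidence bands, gives the kind of picture used to judge where the curve becomes approximately linear.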
Model Fit
Once the threshold value is defined, the dataset can be fitted to a GPD model with the function gpdfit. Running the following line fits the model using the maximum likelihood estimation method:
```python
thresh_modeling.gpdfit(data, 30, 'mle')
```
The results printed in the terminal should look like:
```
Estimator: MLE
Deviance: 970.1874
AIC: 974.1874

Varying Threshold: FALSE

Threshold Call: 30L
Number Above: 152
Proportion Above: 0.0087

Estimates
 scale   shape
7.4411  0.1845

Standard Error Type: observed

Standard Errors
 scale   shape
0.9587  0.1012

Asymptotic Variance Covariance
       scale     shape
scale   0.91920  -0.06554
shape  -0.06554   0.01025

Optimization Information
Convergence: successful
Function Evaluations: 14
Gradient Evaluations: 6
```
These are the GPD model estimates using the maximum likelihood estimator.
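Some of the numbers in this output can be cross-checked by hand. With two fitted parameters (scale and shape), the AIC equals the deviance plus 2 × 2, and each standard error is the square root of the matching diagonal entry of the variance-covariance matrix. The sketch below redoes that arithmetic from the printed values and adds an approximate 95% Wald interval (estimate ± 1.96 × standard error) for each parameter; the Wald interval is an illustrative assumption here, not something the package prints.

```python
import math

# Values copied from the fit output above.
deviance = 970.1874
estimates = {"scale": 7.4411, "shape": 0.1845}
variances = {"scale": 0.91920, "shape": 0.01025}  # diagonal of the covariance matrix

# AIC = deviance + 2 * (number of fitted parameters).
aic = deviance + 2 * len(estimates)

# Standard error = square root of the parameter's variance.
std_errors = {name: math.sqrt(var) for name, var in variances.items()}

# Approximate 95% Wald interval (an assumption made for illustration).
z = 1.96
intervals = {
    name: (estimates[name] - z * std_errors[name],
           estimates[name] + z * std_errors[name])
    for name in estimates
}

print(f"AIC: {aic:.4f}")
for name in estimates:
    low, high = intervals[name]
    print(f"{name}: se={std_errors[name]:.4f}, interval=({low:.4f}, {high:.4f})")
```

The recomputed AIC and standard errors should agree with the printed ones up to rounding.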
Model Checking
Once the GPD model is defined, it is necessary to verify that the model is reasonable and describes the empirical observations well. Plots such as the probability density function, the cumulative distribution function, quantile-quantile and probability-probability show whether the model is good. These plots can be obtained with the package functions gpdpdf, gpdcdf, qqplot and ppplot, by running these lines:
```python
thresh_modeling.gpdpdf(data, 30, 'mle', 'sturges', 0.05)
thresh_modeling.gpdcdf(data, 30, 'mle', 0.05)
thresh_modeling.qqplot(data, 30, 'mle', 0.05)
thresh_modeling.ppplot(data, 30, 'mle', 0.05)
```
The results should be:




Once it is verified that the theoretical model describes the empirical observations very well, the next step is to use the main tool of the extreme value approach: extrapolation over the unit of the return period.
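A quantile-quantile comparison pairs the sorted excesses over the threshold with the quantiles of the fitted GPD at matching probabilities. For shape ξ ≠ 0 and scale σ, the GPD quantile function for excesses is Q(p) = (σ/ξ)·((1 − p)^(−ξ) − 1). The sketch below builds such pairs with the scale and shape reported by the fit; the excess values, the plotting positions i/(n + 1) and the function names are illustrative assumptions and may differ from what qqplot does internally.

```python
def gpd_quantile(p, scale, shape):
    """Quantile of the GPD for excesses (shape != 0), with 0 <= p < 1."""
    return scale / shape * ((1.0 - p) ** (-shape) - 1.0)


def qq_pairs(excesses, scale, shape):
    """Pair each sorted excess with the model quantile at probability i/(n+1)."""
    ordered = sorted(excesses)
    n = len(ordered)
    return [
        (gpd_quantile(i / (n + 1), scale, shape), observed)
        for i, observed in enumerate(ordered, start=1)
    ]


# Hypothetical excesses over the threshold (mm), for illustration only.
excesses = [1.0, 15.5, 3.0, 8.5, 30.0, 0.5, 5.5]

for model_q, observed in qq_pairs(excesses, scale=7.4411, shape=0.1845):
    print(f"model={model_q:7.3f}  observed={observed:6.2f}")
```

Pairs that fall close to the line model = observed suggest that the fitted distribution describes the excesses well.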
Return Value Analysis
The first thing that must be defined is: what is the unit of the return period? In this example, the unit is days because the observations are daily, but in other applications, like corrosion engineering, the unit may be number of observations.
The function return_value provides two pieces of information:
* 1 : The return value for a given return period; and
* 2 : The return value plot, which works very well as a model diagnostic.
By running this line (go to Model Diagnostics and Return Level Analysis for more information about the function):
```python
thresh_modeling.return_value(data, 30, 0.05, 365, 36500, 'mle')
```
Here, 36500 is the return period for which the return value is wanted: 36500 days, or 100 years. The 365 indicates that the annual number of observations is 365.
The results should be:

The return value for the given return period is 106.34386649996667 ± 40.86691363790978
Hence, from the plot, it is possible to say that the theoretical model fits very well.
The return value for 100 years was also computed. In other words, the rainfall precipitation expected once every 100 years should be between 65.4770 and 147.2108 mm.
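The reported return value can be approximated by hand with the standard m-observation return level formula from Coles's book, x_m = u + (σ/ξ)·((m·ζ_u)^ξ − 1), where u is the threshold, σ and ξ are the fitted scale and shape, and ζ_u is the proportion of observations above the threshold. The sketch below plugs in the rounded values printed by the fit; whether return_value uses exactly this expression is an assumption, and the function name return_level is hypothetical. It also redoes the interval arithmetic from the reported value and its ± margin.

```python
def return_level(threshold, scale, shape, prop_above, n_obs):
    """m-observation return level for a GPD threshold model (shape != 0)."""
    return threshold + scale / shape * ((n_obs * prop_above) ** shape - 1.0)


# Rounded values taken from the fit output above.
level = return_level(threshold=30, scale=7.4411, shape=0.1845,
                     prop_above=0.0087, n_obs=36500)
print(f"approximate 100-year return level: {level:.2f} mm")

# Interval implied by the reported return value and its margin.
reported_value = 106.34386649996667
margin = 40.86691363790978
lower, upper = reported_value - margin, reported_value + margin
print(f"interval: {lower:.4f} to {upper:.4f} mm")
```

With these rounded inputs the formula should land close to the reported return value; small differences are expected because the printed estimates are rounded.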
Declustering
In his book, Stuart Coles says that if the extremes tend to be clustered in a stationary series, another practice is needed to model these values. The practice consists of declustering: grouping the data into clusters and keeping the maximum of each cluster. For this example, it is clear that, at least initially, the dataset is not organized in clusters. The function decluster plots the dataset against its unit of return period, and it can also cluster the data using a given block size (monthly in this example, so the block size is 30 days) and then decluster it by taking the maximum of each block.
By running this line:
```python
thresh_modeling.decluster(data, 30, 30)
```
The result should be:


It is important to note that the unit of the return period changes after declustering (here it becomes monthly). The first plot shows that, at least initially, there is no pattern. However, this does not mean that the dataset cannot be declustered with a given block size, as the second plot shows.
If it is necessary to decluster the dataset, the second series, shown in the declustered plot, must be used.
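The block-maxima step described in this section can be sketched in a few lines: split the series into consecutive blocks of a fixed size and keep the largest value of each block. This is a minimal illustration, not the package's decluster implementation; the function name block_maxima, the handling of a shorter final block (kept here) and the sample values are assumptions.

```python
def block_maxima(series, block_size):
    """Maximum of each consecutive block of `block_size` observations.

    A shorter final block, if any, is kept and contributes its own maximum.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [
        max(series[start:start + block_size])
        for start in range(0, len(series), block_size)
    ]


# Hypothetical daily values; with block_size=30 each block would be one month.
series = [3.0, 0.0, 12.5, 7.0, 1.5, 9.0, 4.0]
print(block_maxima(series, block_size=3))
```

Each value in the result stands for one block, which is why the unit of the return period changes from days to blocks after declustering.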
Further Functions
The functions not covered in this tutorial can be used as shown in the test file. The description of each one is in the Functions Documentation.
Doubts
If you have any doubts about the package, don't hesitate to contact me.
General License
Copyright (c) 2019 Iago Pereira Lemos
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/
Referencing
To reference the repository, use the following BibTeX entry:
@misc{thresholdmodeling,
author = {Iago P. Lemos and Antonio Marcos G. Lima and Marcus Antonio Viana Duarte},
title = {thresholdmodeling package},
month = Feb,
year = 2020,
doi = {10.5281/zenodo.3661338},
version = {0.0.1},
publisher = {Zenodo},
url = {https://github.com/iagolemos1/thresholdmodeling}
}
Background
I am a mechanical engineering undergraduate student at the Federal University of Uberlândia, and this package was developed in the Acoustics and Vibration Laboratory of the School of Mechanical Engineering.
JOSS Publication
thresholdmodeling: A Python package for modeling excesses over a threshold using the Peak-Over-Threshold Method and the Generalized Pareto Distribution
Authors
Acoustics and Vibration Laboratory, School of Mechanical Engineering, Federal University of Uberlândia
Tags
Threshold Models, Peak-Over-Threshold Method, Generalized Pareto Distribution, Statistical Modeling
GitHub Events
Total
- Watch event: 3
Last Year
- Watch event: 3
Committers
Last synced: 7 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Iago Pereira Lemos | l****3@g****m | 179 |
| Joshua Harrison | j****a@c****y | 2 |
| Vince Knight | v****t@g****m | 1 |
| Daniel S. Katz | d****z@i****g | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 9
- Total pull requests: 3
- Average time to close issues: 4 days
- Average time to close pull requests: about 1 hour
- Total issue authors: 5
- Total pull request authors: 3
- Average comments per issue: 4.44
- Average comments per pull request: 0.33
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- kellieotto (4)
- bahung (2)
- rjskene (1)
- IanBurger (1)
- andreowhite (1)
Pull Request Authors
- drvinceknight (1)
- JoshuaCrestone (1)
- danielskatz (1)
Packages
- Total packages: 1
- Total downloads: 10 last month (pypi)
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 1
- Total maintainers: 1
pypi.org: thresholdmodeling
This package is intended for those who wish to conduct an extreme value analysis. It provides the toolkit needed to build a threshold model simply and efficiently, covering the main steps of the Peak-Over-Threshold method and the fit to the Generalized Pareto Distribution. To install and use it, go to https://github.com/iagolemos1/thresholdmodeling
- Homepage: https://github.com/iagolemos1/thresholdmodeling
- Documentation: https://thresholdmodeling.readthedocs.io/
- License: GNU General Public License (GPL)
- Latest release: 0.0.1 (published about 6 years ago)
Dependencies
- matplotlib *
- numpy *
- pandas *
- scipy *
- thresholdmodeling *
- matplotlib *
- numpy *
- rpy2 *
- scipy *
- seaborn *