frouros
Frouros: an open-source Python library for drift detection in machine learning systems.
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 64 DOI reference(s) in README
- ✓ Academic publication links: Links to researchgate.net, sciencedirect.com, acm.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (14.6%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: IFCA-Advanced-Computing
- License: bsd-3-clause
- Language: Python
- Default Branch: main
- Homepage: https://frouros.readthedocs.io
- Size: 22.3 MB
Statistics
- Stars: 224
- Watchers: 4
- Forks: 17
- Open Issues: 17
- Releases: 22
Metadata Files
README.md
Frouros is a Python library for drift detection in machine learning systems that provides a combination of classical and more recent algorithms for both concept and data drift detection.
"Everything changes and nothing stands still"
"You could not step twice into the same river"
Heraclitus of Ephesus (535-475 BCE.)
⚡️ Quickstart
🔄 Concept drift
As a quick example, we can take the breast cancer dataset, induce concept drift in it, and show how a concept drift detector such as DDM (Drift Detection Method) is used. The example also shows how concept drift affects performance in terms of accuracy.
```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

from frouros.detectors.concept_drift import DDM, DDMConfig
from frouros.metrics import PrequentialError

np.random.seed(seed=31)

# Load breast cancer dataset
X, y = load_breast_cancer(return_X_y=True)

# Split train (70%) and test (30%)
(
    X_train,
    X_test,
    y_train,
    y_test,
) = train_test_split(X, y, train_size=0.7, random_state=31)

# Define and fit model
pipeline = Pipeline(
    [
        ("scaler", StandardScaler()),
        ("model", LogisticRegression()),
    ]
)
pipeline.fit(X=X_train, y=y_train)

# Detector configuration and instantiation
config = DDMConfig(
    warning_level=2.0,
    drift_level=3.0,
    min_num_instances=25,  # minimum number of instances before checking for concept drift
)
detector = DDM(config=config)

# Metric to compute accuracy
metric = PrequentialError(alpha=1.0)  # alpha=1.0 is equivalent to normal accuracy


def stream_test(X_test, y_test, y, metric, detector):
    """Simulate data stream over X_test and y_test. y is the true label."""
    drift_flag = False
    for i, (X, y) in enumerate(zip(X_test, y_test)):
        y_pred = pipeline.predict(X.reshape(1, -1))
        error = 1 - (y_pred.item() == y.item())
        metric_error = metric(error_value=error)
        _ = detector.update(value=error)
        status = detector.status
        if status["drift"] and not drift_flag:
            drift_flag = True
            print(f"Concept drift detected at step {i}. Accuracy: {1 - metric_error:.4f}")
    if not drift_flag:
        print("No concept drift detected")
    print(f"Final accuracy: {1 - metric_error:.4f}\n")


# Simulate data stream (assuming test label available after each prediction)
# No concept drift is expected to occur
stream_test(
    X_test=X_test,
    y_test=y_test,
    y=y,
    metric=metric,
    detector=detector,
)
# >> No concept drift detected
# >> Final accuracy: 0.9766

# IMPORTANT: Induce/simulate concept drift in the last part (20%)
# of y_test by modifying some labels (50% approx). Therefore, changing P(y|X)
drift_size = int(y_test.shape[0] * 0.2)
y_test_drift = y_test[-drift_size:]
modify_idx = np.random.rand(*y_test_drift.shape) <= 0.5
y_test_drift[modify_idx] = (y_test_drift[modify_idx] + 1) % len(np.unique(y_test))
y_test[-drift_size:] = y_test_drift

# Reset detector and metric
detector.reset()
metric.reset()

# Simulate data stream (assuming test label available after each prediction)
# Concept drift is expected to occur because of the label modification
stream_test(
    X_test=X_test,
    y_test=y_test,
    y=y,
    metric=metric,
    detector=detector,
)
# >> Concept drift detected at step 142. Accuracy: 0.9510
# >> Final accuracy: 0.8480
```
More concept drift examples can be found here.
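To give some intuition for what the `warning_level`, `drift_level` and `min_num_instances` settings control, here is a minimal standard-library sketch of the DDM rule described by Gama et al. (2004). It is an illustration only and not Frouros' implementation: the `SimpleDDM` class, the `first_drift` helper and the synthetic error stream are hypothetical names and data made up for this example.

```python
import math


class SimpleDDM:
    """Toy DDM over a stream of 0/1 prediction errors (illustrative, not Frouros' code)."""

    def __init__(self, warning_level=2.0, drift_level=3.0, min_num_instances=25):
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.min_num_instances = min_num_instances
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf
        self.warning = False
        self.drift = False

    def update(self, error):
        self.n += 1
        self.errors += error
        p = self.errors / self.n              # running error rate
        s = math.sqrt(p * (1 - p) / self.n)   # its standard deviation
        if self.n < self.min_num_instances:
            return
        if p + s < self.p_min + self.s_min:   # remember the best point seen so far
            self.p_min, self.s_min = p, s
        # Strict comparison (a choice made here) avoids flagging a stream with no errors yet.
        self.warning = p + s > self.p_min + self.warning_level * self.s_min
        self.drift = p + s > self.p_min + self.drift_level * self.s_min


def first_drift(stream, detector):
    """Return the 1-based step at which drift is first flagged, or None."""
    for step, error in enumerate(stream, start=1):
        detector.update(error)
        if detector.drift:
            return step
    return None


# Synthetic stream: about 10% errors for 200 steps, then about 50% errors.
stable = [1 if i % 10 == 0 else 0 for i in range(1, 201)]
drifted = [1 if i % 2 == 0 else 0 for i in range(201, 401)]
stream = stable + drifted

drift_step = first_drift(stream, SimpleDDM())
print(f"Drift flagged at step {drift_step}")
```

In this sketch the detector only starts checking once `min_num_instances` errors have been seen, and it flags drift when the error rate plus its standard deviation rises more than `drift_level` standard deviations above the lowest level recorded so far.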
📊 Data drift
As a quick example, we can take the iris dataset, induce data drift in it, and show how a data drift detector such as the Kolmogorov-Smirnov test is used.
```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

from frouros.detectors.data_drift import KSTest

np.random.seed(seed=31)

# Load iris dataset
X, y = load_iris(return_X_y=True)

# Split train (70%) and test (30%)
(
    X_train,
    X_test,
    y_train,
    y_test,
) = train_test_split(X, y, train_size=0.7, random_state=31)

# Set the feature index to which detector is applied
feature_idx = 0

# IMPORTANT: Induce/simulate data drift in the selected feature of X_test by
# applying some gaussian noise. Therefore, changing P(X)
X_test[:, feature_idx] += np.random.normal(
    loc=0.0,
    scale=3.0,
    size=X_test.shape[0],
)

# Define and fit model
model = DecisionTreeClassifier(random_state=31)
model.fit(X=X_train, y=y_train)

# Set significance level for hypothesis testing
alpha = 0.001

# Define and fit detector
detector = KSTest()
_ = detector.fit(X=X_train[:, feature_idx])

# Apply detector to the selected feature of X_test
result, _ = detector.compare(X=X_test[:, feature_idx])

# Check if drift is taking place
if result.p_value <= alpha:
    print(f"Data drift detected at feature {feature_idx}")
else:
    print(f"No data drift detected at feature {feature_idx}")
# >> Data drift detected at feature 0
# Therefore, we can reject H0 (both samples come from the same distribution).
```
More data drift examples can be found here.
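For readers who want to see what a two-sample Kolmogorov-Smirnov comparison does, the following standard-library sketch computes the KS statistic and an asymptotic p-value, then applies the same `p_value <= alpha` decision used in the example. It is an approximation written for illustration under stated assumptions, not the computation Frouros performs: `ks_statistic`, `ks_p_value` and the synthetic samples are hypothetical, and the p-value uses the large-sample Kolmogorov series rather than an exact method.

```python
import math
import random


def ks_statistic(reference, sample):
    """Return the largest gap between the two empirical CDFs."""
    a, b = sorted(reference), sorted(sample)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance both pointers past x so each ECDF is evaluated at x.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_p_value(d, n, m, terms=100):
    """Asymptotic p-value from the Kolmogorov series (large-sample approximation)."""
    if d == 0.0:
        return 1.0
    lam = math.sqrt(n * m / (n + m)) * d
    series = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
        for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * series))


random.seed(31)
reference = [random.gauss(0.0, 1.0) for _ in range(200)]
shifted = [random.gauss(3.0, 1.0) for _ in range(200)]  # mean shift simulates data drift

alpha = 0.001
d = ks_statistic(reference, shifted)
p_value = ks_p_value(d, len(reference), len(shifted))
print("Data drift detected" if p_value <= alpha else "No data drift detected")
```

A small p-value means the gap between the two empirical distributions is too large to be explained by sampling noise, which is the condition under which H0 is rejected.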
🛠 Installation
Frouros can be installed via pip:
```bash
pip install frouros
```
🕵🏻‍♂️ Drift detection methods
The currently implemented detectors are listed in the following table.
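| Drift detector | Type | Family | Univariate (U) / Multivariate (M) | Numerical (N) / Categorical (C) | Method | Reference |
|---|---|---|---|---|---|---|
| Concept drift | Streaming | Change detection | U | N | BOCD | Adams and MacKay (2007) |
| Concept drift | Streaming | Change detection | U | N | CUSUM | Page (1954) |
| Concept drift | Streaming | Change detection | U | N | Geometric moving average | Roberts (1959) |
| Concept drift | Streaming | Change detection | U | N | Page Hinkley | Page (1954) |
| Concept drift | Streaming | Statistical process control | U | N | DDM | Gama et al. (2004) |
| Concept drift | Streaming | Statistical process control | U | N | ECDD-WT | Ross et al. (2012) |
| Concept drift | Streaming | Statistical process control | U | N | EDDM | Baena-García et al. (2006) |
| Concept drift | Streaming | Statistical process control | U | N | HDDM-A | Frias-Blanco et al. (2014) |
| Concept drift | Streaming | Statistical process control | U | N | HDDM-W | Frias-Blanco et al. (2014) |
| Concept drift | Streaming | Statistical process control | U | N | RDDM | Barros et al. (2017) |
| Concept drift | Streaming | Window based | U | N | ADWIN | Bifet and Gavalda (2007) |
| Concept drift | Streaming | Window based | U | N | KSWIN | Raab et al. (2020) |
| Concept drift | Streaming | Window based | U | N | STEPD | Nishida and Yamauchi (2007) |
| Data drift | Batch | Distance based | U | N | Bhattacharyya distance | Bhattacharyya (1946) |
| Data drift | Batch | Distance based | U | N | Earth Mover's distance | Rubner et al. (2000) |
| Data drift | Batch | Distance based | U | N | Energy distance | Székely et al. (2013) |
| Data drift | Batch | Distance based | U | N | Hellinger distance | Hellinger (1909) |
| Data drift | Batch | Distance based | U | N | Histogram intersection normalized complement | Swain and Ballard (1991) |
| Data drift | Batch | Distance based | U | N | Jensen-Shannon distance | Lin (1991) |
| Data drift | Batch | Distance based | U | N | Kullback-Leibler divergence | Kullback and Leibler (1951) |
| Data drift | Batch | Distance based | M | N | Maximum Mean Discrepancy | Gretton et al. (2012) |
| Data drift | Batch | Distance based | U | N | Population Stability Index | Wu and Olson (2010) |
| Data drift | Batch | Statistical test | U | N | Anderson-Darling test | Scholz and Stephens (1987) |
| Data drift | Batch | Statistical test | U | N | Baumgartner-Weiss-Schindler test | Baumgartner et al. (1998) |
| Data drift | Batch | Statistical test | U | C | Chi-square test | Pearson (1900) |
| Data drift | Batch | Statistical test | U | N | Cramér-von Mises test | Cramér (1902) |
| Data drift | Batch | Statistical test | U | N | Kolmogorov-Smirnov test | Massey Jr (1951) |
| Data drift | Batch | Statistical test | U | N | Kuiper's test | Kuiper (1960) |
| Data drift | Batch | Statistical test | U | N | Mann-Whitney U test | Mann and Whitney (1947) |
| Data drift | Batch | Statistical test | U | N | Welch's t-test | Welch (1947) |
| Data drift | Streaming | Distance based | M | N | Maximum Mean Discrepancy | Gretton et al. (2012) |
| Data drift | Streaming | Statistical test | U | N | Incremental Kolmogorov-Smirnov test | dos Reis et al. (2016) |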
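Distance-based detectors in the table differ from statistical tests in that they return a distance between two distributions rather than a p-value. As a rough illustration, the sketch below computes the Hellinger distance between two binned samples using only the standard library. The helper names `to_probabilities` and `hellinger_distance`, the bin edges and the sample values are all assumptions made for this example; how Frouros bins data or exposes this detector is not described in this README.

```python
import math


def to_probabilities(values, edges):
    """Relative frequencies of values over shared bin edges (last bin is right-inclusive)."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]


def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions; 0 = identical, 1 = disjoint."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared) / math.sqrt(2)


edges = [0.0, 0.5, 1.0]                          # hypothetical shared bins
reference = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7]       # hypothetical reference sample
shifted = [0.4, 0.6, 0.7, 0.8, 0.9, 0.95]        # hypothetical drifted sample

p = to_probabilities(reference, edges)
q = to_probabilities(shifted, edges)
print(f"Hellinger distance: {hellinger_distance(p, q):.4f}")
```

Because the result is a distance, deciding whether drift occurred requires a threshold chosen by the user, unlike the significance level used with a statistical test.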
❗ What is and what is not Frouros?
Unlike other libraries that, in addition to providing drift detection algorithms, include other functionality such as anomaly/outlier detection, adversarial detection, or imbalance learning, Frouros has and will ONLY have one purpose: drift detection.
We firmly believe that machine learning libraries and frameworks should not follow the "Jack of all trades, master of none" principle. Instead, they should focus on a single task and do it well.
✅ Who is using Frouros?
Frouros is actively being used by the following projects to implement drift detection in machine learning pipelines:
If you want your project listed here, do not hesitate to send us a pull request.
👍 Contributing
Check out the contribution section.
💬 Citation
If you want to cite Frouros you can use the SoftwareX publication.
```bibtex
@article{CESPEDESSISNIEGA2024101733,
  title = {Frouros: An open-source Python library for drift detection in machine learning systems},
  journal = {SoftwareX},
  volume = {26},
  pages = {101733},
  year = {2024},
  issn = {2352-7110},
  doi = {https://doi.org/10.1016/j.softx.2024.101733},
  url = {https://www.sciencedirect.com/science/article/pii/S2352711024001043},
  author = {Jaime {Céspedes Sisniega} and Álvaro {López García}},
  keywords = {Machine learning, Drift detection, Concept drift, Data drift, Python},
  abstract = {Frouros is an open-source Python library capable of detecting drift in machine learning systems. It provides a combination of classical and more recent algorithms for drift detection, covering both concept and data drift. We have designed it to be compatible with any machine learning framework and easily adaptable to real-world use cases. The library is developed following best development and continuous integration practices to ensure ease of maintenance and extensibility.}
}
```
📝 License
Frouros is open-source software licensed under the BSD-3-Clause license.
🙏 Acknowledgements
Frouros has received funding from the Agencia Estatal de Investigación, Unidad de Excelencia María de Maeztu, ref. MDM-2017-0765.
Owner
- Name: IFCA Advanced Computing and e-Science group
- Login: IFCA-Advanced-Computing
- Kind: organization
- Location: Santander, Spain
- Website: http://computing.ifca.es/
- Twitter: IFCA_Computing
- Repositories: 56
- Profile: https://github.com/IFCA-Advanced-Computing
Citation (CITATION.cff)
cff-version: 1.2.0
title: >-
Frouros: An open-source Python library for drift detection
in machine learning systems
message: >-
If you use this software, please cite it using the
metadata from this file.
type: software
authors:
- given-names: Jaime
family-names: Céspedes Sisniega
email: cespedes@ifca.unican.es
orcid: 'https://orcid.org/0000-0002-6010-1212'
affiliation: >-
Institute of Physics of Cantabria, Spanish National
Research Council — IFCA (CSIC—UC)
- given-names: Álvaro
family-names: López García
email: aloga@ifca.unican.es
orcid: 'https://orcid.org/0000-0002-0013-4602'
affiliation: >-
Institute of Physics of Cantabria, Spanish National
Research Council — IFCA (CSIC—UC)
identifiers:
- type: doi
value: 10.1016/j.softx.2024.101733
description: SoftwareX
- type: doi
value: 10.48550/arXiv.2208.06868
description: arXiv
repository-code: 'https://github.com/IFCA-Advanced-Computing/frouros'
url: 'https://frouros.readthedocs.io'
repository: 'https://github.com/ElsevierSoftwareX/SOFTX-D-24-00119'
repository-artifact: 'https://pypi.org/project/frouros'
abstract: >-
Frouros is an open-source Python library capable of detecting drift in machine learning systems. It provides a combination of classical and more recent algorithms for drift detection, covering both concept and data drift. We have designed it to be compatible with any machine learning framework and easily adaptable to real-world use cases. The library is developed following best development and continuous integration practices to ensure ease of maintenance and extensibility.
keywords:
- Machine learning
- Drift detection
- Concept drift
- Data drift
- Python
license: BSD-3-Clause
commit: 4e1e27ee73507b15090f0038d8dda7c67485b728
version: 0.8.0
date-released: '2024-04-03'
GitHub Events
Total
- Issues event: 1
- Watch event: 37
- Delete event: 11
- Issue comment event: 9
- Push event: 19
- Pull request event: 33
- Pull request review event: 8
- Fork event: 3
- Create event: 20
Last Year
- Issues event: 1
- Watch event: 37
- Delete event: 11
- Issue comment event: 9
- Push event: 19
- Pull request event: 33
- Pull request review event: 8
- Fork event: 3
- Create event: 20
Committers
Last synced: about 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Jaime Céspedes Sisniega | j****a@g****m | 611 |
| dependabot[bot] | 4****] | 23 |
| Jaime Céspedes Sisniega | c****s@i****s | 6 |
| Alvaro Lopez Garcia | a****a@i****s | 4 |
| Jaime Céspedes Sisniega | 7****a | 2 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 12
- Total pull requests: 174
- Average time to close issues: 6 months
- Average time to close pull requests: 12 days
- Total issue authors: 2
- Total pull request authors: 3
- Average comments per issue: 0.75
- Average comments per pull request: 0.09
- Merged pull requests: 159
- Bot issues: 0
- Bot pull requests: 61
Past Year
- Issues: 0
- Pull requests: 31
- Average time to close issues: N/A
- Average time to close pull requests: 3 months
- Issue authors: 0
- Pull request authors: 3
- Average comments per issue: 0
- Average comments per pull request: 0.39
- Merged pull requests: 17
- Bot issues: 0
- Bot pull requests: 24
Top Authors
Issue Authors
- jaime-cespedes-sisniega (11)
- Tiffany-TW (1)
Pull Request Authors
- jaime-cespedes-sisniega (136)
- dependabot[bot] (92)
- MarcBresson (1)
Packages
- Total packages: 1
- Total downloads:
  - pypi: 4,531 last month
- Total dependent packages: 0
- Total dependent repositories: 2
- Total versions: 22
- Total maintainers: 2
pypi.org: frouros
An open-source Python library for drift detection in machine learning systems
- Homepage: https://github.com/IFCA-Advanced-Computing/frouros
- Documentation: https://frouros.readthedocs.io/
- License: BSD-3-Clause
- Latest release: 0.9.0 (published over 1 year ago)
Maintainers (2)
Dependencies
- actions/checkout v3 composite
- actions/setup-python v4 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- codecov/codecov-action v3 composite
- readthedocs/actions/preview v1 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- pypa/gh-action-pypi-publish release/v1 composite
- matplotlib >=3.6.0,<3.8
- numpy >=1.24.0,<1.26
- requests >=2.31.0,<2.32
- scipy >=1.10.0,<1.11
- tqdm >=4.65,<5