https://github.com/atrcheema/ai4water
framework for developing machine (and deep) learning models for structured data
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: Found 5 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (14.4%) to scientific vocabulary
Keywords
Repository
framework for developing machine (and deep) learning models for structured data
Basic Info
- Host: GitHub
- Owner: AtrCheema
- License: mit
- Language: Python
- Default Branch: master
- Homepage: https://ai4water.readthedocs.io
- Size: 77.9 MB
Statistics
- Stars: 70
- Watchers: 3
- Forks: 23
- Open Issues: 3
- Releases: 4
Topics
Metadata Files
README.md
AI4Water
A uniform and simplified framework for rapid experimentation with deep learning and machine learning based models for time series and tabular data. To put it in Andrej Karpathy's words:
Because deep learning is so empirical, success in it is to a large extent proportional to raw experimental throughput,
the ability to babysit a large number of experiments at once, staring at plots and tweaking/re-launching what works.
This is necessary, but not sufficient.
The specific purposes of the repository are

- complement the functionality of keras/pytorch/sklearn by making pre- and post-processing easier for time-series prediction/classification problems (this also holds true for any tabular data).
- save, load/reload or build models from a readable json file. This repository provides a framework to build layered models using a python dictionary, along with several helper tools which speed up the process of modeling time-series forecasting.
- provide a uniform interface for optimizing hyper-parameters with skopt; sklearn based grid and random search; hyperopt based tpe, atpe; or optuna based tpe, cmaes etc. See the example using its application.
- cut short the time needed to write boilerplate code when developing machine learning based models.

It should be possible to overwrite/customize any of the functionality of AI4Water's `Model` by subclassing `Model` (a small sketch is given after this paragraph). So at the highest level you just need to initiate the `Model`, and then need the `fit`, `predict` and `view_model` methods of the `Model` class, but you can go as low as you could go with tensorflow/keras. All the above functionalities should be available without complicating the keras implementation.
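For instance, a minimal sketch of such customization (not taken from the README; it assumes only the `fit` method described above) could look like this:

```python
from ai4water import Model


class MyModel(Model):
    """Hypothetical subclass that customizes training behaviour."""

    def fit(self, *args, **kwargs):
        # add custom logic (logging, callbacks, checks) here,
        # then delegate to the parent implementation
        print("starting training ...")
        return super().fit(*args, **kwargs)
```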
Installation
An easy way to install ai4water is using pip
pip install ai4water
You can also use the GitHub link
python -m pip install git+https://github.com/AtrCheema/AI4Water.git
or use the setup file; go to the folder where the repo is downloaded
python setup.py install
The latest code, however (possibly with fewer bugs and more features), can be installed from the dev branch instead
python -m pip install git+https://github.com/AtrCheema/AI4Water.git@dev
To install the latest branch (dev) with all requirements use the following command
python -m pip install "AI4Water[all] @ git+https://github.com/AtrCheema/AI4Water.git@dev"
Installation options

The all keyword will install all the dependencies. You can also choose the dependencies of a particular sub-module
by using the specific keyword; an example command is shown after this list. The following keywords are available:

- `hpo` if you want hyperparameter optimization
- `post_process` if you want post-processing
- `exp` for the experiments sub-module
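For example, assuming the same pip "extras" syntax used above, installing only the hyperparameter-optimization dependencies from the dev branch would look like
python -m pip install "AI4Water[hpo] @ git+https://github.com/AtrCheema/AI4Water.git@dev"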
Sub-modules
AI4Water consists of several sub-modules, each of which is responsible for a specific task; the modules are also linked with each other. For an understanding of the sub-module structure of ai4water, see this article. A rough import sketch is given below.
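As an illustration of that layout, here is a minimal sketch using only imports that appear elsewhere in this README (the exact set of sub-modules may differ between versions):

```python
from ai4water import Model                                 # core modelling class
from ai4water.models import MLP, LSTM                      # model/architecture definitions
from ai4water.datasets import busan_beach                  # example datasets
from ai4water.hyperopt import Real, Integer                # hyperparameter-optimization utilities
from ai4water.experiments import MLRegressionExperiments   # model-comparison experiments
```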
How to use
Build a Model by providing all the arguments to initiate it.
```python
from ai4water import Model
from ai4water.models import MLP
from ai4water.datasets import mg_photodegradation

data, *_ = mg_photodegradation(encoding="le")

model = Model(
    # define the model/algorithm
    model=MLP(units=24, activation="relu", dropout=0.2),
    # columns in data file to be used as input
    input_features=data.columns.tolist()[0:-1],
    # columns in csv file to be used as output
    output_features=data.columns.tolist()[-1:],
    lr=0.001,      # learning rate
    batch_size=8,  # batch size
    epochs=500,    # number of epochs to train the neural network
    patience=50,   # used for early stopping
)
```
Train the model by calling the fit() method
```python
history = model.fit(data=data)
```
After training, we can make predictions from it on test/training data
```python
prediction = model.predict_on_test_data(data=data)
```
The model object returned from initiating AI4Water's `Model` is the same as Keras' `Model`.
We can verify this by checking its type:
```python
import tensorflow as tf

isinstance(model, tf.keras.Model)  # True
```
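Because the returned object is a tf.keras.Model, standard Keras introspection methods should also be available on it; a small sketch under that assumption (not taken from the README):

```python
# since `model` is a tf.keras.Model, the usual Keras methods can be
# called on it once the model has been built/trained
model.summary()                  # print the layer-by-layer architecture
weights = model.get_weights()    # learned weights as a list of numpy arrays
```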
Using your own pre-processed data
You can use your own pre-processed data without using any of the pre-processing tools of AI4Water. You will need to provide
input/output pairs to the fit and/or predict methods, as shown below.
```python
import numpy as np
from ai4water import Model # import any of the above model
from ai4water.models import LSTM
batch_size = 16
lookback = 15
inputs = ['dummy1', 'dummy2', 'dummy3', 'dummy4', 'dummy5']  # just dummy names for plotting and saving results
outputs = ['DummyTarget']

model = Model(
    model=LSTM(units=64),
    batch_size=batch_size,
    ts_args={'lookback': lookback},
    input_features=inputs,
    output_features=outputs,
    lr=0.001
)

x = np.random.random((batch_size * 10, lookback, len(inputs)))
y = np.random.random((batch_size * 10, len(outputs)))
model.fit(x=x,y=y)
```
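As a follow-up (not in the README), predictions on arrays of the same shape should work the same way, assuming the predict method accepts the same `x` keyword as `fit`:

```python
# hypothetical example: predict on new arrays shaped like the training data
x_new = np.random.random((batch_size, lookback, len(inputs)))
prediction = model.predict(x=x_new)
```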
Using for scikit-learn/xgboost/lgbm/catboost based models
The repository can also be used for machine learning based models such as scikit-learn/xgboost based models, for both
classification and regression problems, by making use of the model keyword argument of the Model function.
However, integration of ML based models is not complete yet.
```python
from ai4water import Model
from ai4water.datasets import busan_beach
data = busan_beach() # path for data file
model = Model(
    # columns in data to be used as input
    input_features=['tide_cm', 'wat_temp_c', 'sal_psu', 'rel_hum', 'pcp_mm'],
    # columns in data file to be used as output
    output_features=['tetx_coppml'],
    seed=1872,
    val_fraction=0.0,
    split_random=True,
    # any regressor from https://scikit-learn.org/stable/modules/classes.html
    model={"RandomForestRegressor": {}},
    # set any of the regressor's parameters, e.g. for the RandomForestRegressor used above
    # some of the parameters are listed at https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html#sklearn.ensemble.RandomForestRegressor
)
history = model.fit(data=data)
model.predict_on_test_data(data=data)
```
Hyperparameter optimization
For hyperparameter optimization, replace the actual values of the hyperparameters with their search space.
```python
from ai4water.functional import Model
from ai4water.datasets import MtropicsLaos
from ai4water.hyperopt import Real, Integer

data = MtropicsLaos().make_regression(lookback_steps=1)

model = Model(
    model={"RandomForestRegressor": {
        "n_estimators": Integer(low=5, high=30, name='n_estimators', num_samples=10),
        "max_leaf_nodes": Integer(low=2, high=30, prior='log', name='max_leaf_nodes', num_samples=10),
        "min_weight_fraction_leaf": Real(low=0.0, high=0.5, name='min_weight_fraction_leaf', num_samples=10),
        "max_depth": Integer(low=2, high=10, name='max_depth', num_samples=10),
        "min_samples_split": Integer(low=2, high=10, name='min_samples_split', num_samples=10),
        "min_samples_leaf": Integer(low=1, high=5, name='min_samples_leaf', num_samples=10),
    }},
    input_features=data.columns.tolist()[0:-1],
    output_features=data.columns.tolist()[-1:],
    cross_validator={"KFold": {"n_splits": 5}},
    x_transformation="zscore",
    y_transformation="log",
)

# first check the performance on test data with default parameters
model.fit_on_all_training_data(data=data)
print(model.evaluate_on_test_data(data=data, metrics=["r2_score", "r2"]))

# optimize the hyperparameters
optimizer = model.optimize_hyperparameters(
    algorithm="bayes",  # you can choose between random, grid or tpe
    data=data,
    num_iterations=60,
)

# now check the performance on test data with the optimized parameters
print(model.evaluate_on_test_data(data=data, metrics=["r2_score", "r2"]))
```
Running the above code will optimize the hyperparameters and generate the following figures.
Experiments
The experiments module is for comparison of multiple models on a single dataset, or for comparison of one model under different conditions.
```python
from ai4water.datasets import busan_beach
from ai4water.experiments import MLRegressionExperiments

data = busan_beach()

comparisons = MLRegressionExperiments(
    input_features=data.columns.tolist()[0:-1],
    output_features=data.columns.tolist()[-1:],
    split_random=True
)

# train all the available machine learning models
comparisons.fit(data=data)

# compare R2 of models
best_models = comparisons.compare_errors(
    'r2',
    data=data,
    cutoff_type='greater',
    cutoff_val=0.1,
    figsize=(8, 9),
    colors=['salmon', 'cadetblue']
)

# compare model performance using Taylor diagram
_ = comparisons.taylor_plot(
    data=data,
    figsize=(5, 9),
    exclude=["DummyRegressor", "XGBRFRegressor", "SGDRegressor", "KernelRidge", "PoissonRegressor"],
    leg_kws={'facecolor': 'white',
             'edgecolor': 'black',
             'bbox_to_anchor': (2.0, 0.9),
             'fontsize': 10,
             'labelspacing': 1.0,
             'ncol': 2},
)
```
For more comprehensive and detailed examples, see the documentation at https://ai4water.readthedocs.io
Disclaimer
The library is still under development. Fundamental changes are expected without prior notice and without regard for backward compatibility.
Related
sktime: A Unified Interface for Machine Learning with Time Series
Seglearn: A Python Package for Learning Sequences and Time Series
Pastas: Open Source Software for the Analysis of Groundwater Time Series
Time Series FeatuRe Extraction on basis of Scalable Hypothesis tests (tsfresh -- A Python package)
pyts: A Python Package for Time Series Classification
Tslearn, A Machine Learning Toolkit for Time Series Data
TSFEL: Time Series Feature Extraction Library
pyunicorn (Unified Complex Network and RecurreNce analysis toolbox)
TSFuse Python package for automatically constructing features from multi-view time series data
tsai - A state-of-the-art deep learning library for time series and sequential data
Owner
- Name: Ather Abbas
- Login: AtrCheema
- Kind: user
- Location: South Korea
- Company: Environmental Modeling and Monitoring Lab, UNIST
- Repositories: 7
- Profile: https://github.com/AtrCheema
GitHub Events
Total
- Watch event: 5
- Push event: 2
Last Year
- Watch event: 5
- Push event: 2
Committers
Last synced: almost 3 years ago
All Time
- Total Commits: 1,546
- Total Committers: 4
- Avg Commits per committer: 386.5
- Development Distribution Score (DDS): 0.061
Top Committers
| Name | Email | Commits |
|---|---|---|
| AtrCheema | a****6@y****m | 1,451 |
| Sara Iftikhar | s****k@g****m | 92 |
| kwon3969 | k****9@g****m | 2 |
| eggworld | e****2@g****m | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 4
- Total pull requests: 35
- Average time to close issues: 7 months
- Average time to close pull requests: 6 days
- Total issue authors: 3
- Total pull request authors: 3
- Average comments per issue: 1.0
- Average comments per pull request: 0.23
- Merged pull requests: 28
- Bot issues: 0
- Bot pull requests: 7
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- jmp75 (2)
- Zahid600 (1)
- binxiaoxiaobin (1)
Pull Request Authors
- AtrCheema (27)
- dependabot[bot] (7)
- Sara-Iftikhar (1)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads: 41 last-month (pypi)
- Total dependent packages: 0
- Total dependent repositories: 4
- Total versions: 14
- Total maintainers: 1
pypi.org: ai4water
Platform for developing data driven based models for sequential/tabular data
- Homepage: https://github.com/AtrCheema/AI4Water
- Documentation: https://ai4water.readthedocs.io/
- License: mit
- Latest release: 1.6 (published about 3 years ago)
Rankings
Maintainers (1)
Dependencies
- ai4water *
- catboost *
- lightgbm *
- optuna *
- seaborn *
- sphinx-gallery *
- tensorflow ==2.7
- xgboost *
- SeqMetrics *
- catboost *
- easy_mpl *
- keras-tcn *
- lightgbm *
- matplotlib *
- numpy *
- optuna *
- pandas *
- plotly *
- pyshp *
- scikit-learn *
- scikit-optimize *
- scipy *
- seaborn *
- shapely *
- sphinx *
- sphinx-gallery *
- sphinx-prompt *
- sphinx_copybutton *
- sphinx_issues *
- sphinx_rtd_theme *
- sphinx_toggleprompt *
- tensorflow ==2.7
- torch *
- xgboost *
- SeqMetrics >=1.3.3
- easy_mpl >=0.20.4
- joblib *
- matplotlib *
- numpy *
- pandas *
- requests *
- scikit-learn *
- SHAP *
- SeqMetrics >=1.3.3
- catboost *
- dill *
- easy_mpl >=0.20.4
- h5py <2.11.0
- hyperopt *
- imageio *
- joblib *
- lightgbm *
- matplotlib *
- numpy >=1.16.5
- openpyxl *
- optuna *
- pandas *
- plotly *
- psutil *
- pyshp *
- scikit-learn >=0.22
- scikit-optimize >=0.8.1
- seaborn *
- tensorflow *
- tpot *
- wandb *
- wrapt *
- xarray *
- xgboost *
- marvinpinto/action-automatic-releases latest composite
- actions/checkout v2 composite
- actions/setup-python v2 composite