https://github.com/atrcheema/ai4water
framework for developing machine (and deep) learning models for structured data
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: Found 5 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (14.4%) to scientific vocabulary
Keywords
Repository
framework for developing machine (and deep) learning models for structured data
Basic Info
- Host: GitHub
- Owner: AtrCheema
- License: mit
- Language: Python
- Default Branch: master
- Homepage: https://ai4water.readthedocs.io
- Size: 77.9 MB
Statistics
- Stars: 70
- Watchers: 3
- Forks: 23
- Open Issues: 3
- Releases: 4
Topics
Metadata Files
README.md
AI4Water
A uniform and simplified framework for rapid experimentation with deep learning and machine learning based models for time series and tabular data. To put it in Andrej Karpathy's words:
Because deep learning is so empirical, success in it is to a large extent proportional to raw experimental throughput,
the ability to babysit a large number of experiments at once, staring at plots and tweaking/re-launching what works.
This is necessary, but not sufficient.
The specific purposes of the repository are

- complement the functionality of keras/pytorch/sklearn by making pre- and post-processing easier for time-series prediction/classification problems (this also holds true for any tabular data).
- save, load/reload or build models from a readable json file. This repository provides a framework to build layered models using a python dictionary, along with several helper tools which speed up the process of modeling time-series forecasting.
- provide a uniform interface for optimizing hyper-parameters with skopt; sklearn based grid and random search; hyperopt based tpe, atpe; or optuna based tpe, cmaes etc. See the example using its application.
- cut short the time needed to write boilerplate code when developing machine learning based models.

It should be possible to overwrite/customize any of the functionality of AI4Water's `Model` by subclassing `Model` (a small sketch is given after this paragraph). So at the highest level you just need to initiate the `Model`, and then need the `fit`, `predict` and `view_model` methods of the `Model` class, but you can go as low as you could go with tensorflow/keras. All the above functionalities should be available without complicating the keras implementation.
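For instance, a minimal sketch of such customization (not taken from the README; it assumes only the `fit` method described above) could look like this:

```python
from ai4water import Model


class MyModel(Model):
    """Hypothetical subclass that customizes training behaviour."""

    def fit(self, *args, **kwargs):
        # add custom logic (logging, callbacks, checks) here,
        # then delegate to the parent implementation
        print("starting training ...")
        return super().fit(*args, **kwargs)
```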
Installation
An easy way to install ai4water is using pip
pip install ai4water
You can also use the GitHub link
python -m pip install git+https://github.com/AtrCheema/AI4Water.git
or use the setup file; go to the folder where the repo is downloaded
python setup.py install
The latest code, however (possibly with fewer bugs and more features), can be installed from the dev branch instead
python -m pip install git+https://github.com/AtrCheema/AI4Water.git@dev
To install the latest branch (dev) with all requirements use the following command
python -m pip install "AI4Water[all] @ git+https://github.com/AtrCheema/AI4Water.git@dev"
Installation options

The all keyword will install all the dependencies. You can also choose the dependencies of a particular sub-module
by using the specific keyword; an example command is shown after this list. The following keywords are available:

- `hpo` if you want hyperparameter optimization
- `post_process` if you want post-processing
- `exp` for the experiments sub-module
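For example, assuming the same pip "extras" syntax used above, installing only the hyperparameter-optimization dependencies from the dev branch would look like
python -m pip install "AI4Water[hpo] @ git+https://github.com/AtrCheema/AI4Water.git@dev"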
Sub-modules
AI4Water consists of several sub-modules, each of which is responsible for a specific task; the modules are also linked with each other. For an understanding of the sub-module structure of ai4water, see this article. A rough import sketch is given below.
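As an illustration of that layout, here is a minimal sketch using only imports that appear elsewhere in this README (the exact set of sub-modules may differ between versions):

```python
from ai4water import Model                                 # core modelling class
from ai4water.models import MLP, LSTM                      # model/architecture definitions
from ai4water.datasets import busan_beach                  # example datasets
from ai4water.hyperopt import Real, Integer                # hyperparameter-optimization utilities
from ai4water.experiments import MLRegressionExperiments   # model-comparison experiments
```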
How to use
Build a Model by providing all the arguments to initiate it.
```python
from ai4water import Model
from ai4water.models import MLP
from ai4water.datasets import mg_photodegradation

data, *_ = mg_photodegradation(encoding="le")

model = Model(
    # define the model/algorithm
    model=MLP(units=24, activation="relu", dropout=0.2),
    # columns in data file to be used as input
    input_features=data.columns.tolist()[0:-1],
    # columns in csv file to be used as output
    output_features=data.columns.tolist()[-1:],
    lr=0.001,      # learning rate
    batch_size=8,  # batch size
    epochs=500,    # number of epochs to train the neural network
    patience=50,   # used for early stopping
)
```
Train the model by calling the fit() method
```python
history = model.fit(data=data)
```
After training, we can make predictions from it on test/training data
```python
prediction = model.predict_on_test_data(data=data)
```
The model object returned from initiating AI4Water's `Model` is the same as Keras' `Model`.
We can verify this by checking its type:
```python
import tensorflow as tf

isinstance(model, tf.keras.Model)  # True
```
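Because the returned object is a tf.keras.Model, standard Keras introspection methods should also be available on it; a small sketch under that assumption (not taken from the README):

```python
# since `model` is a tf.keras.Model, the usual Keras methods can be
# called on it once the model has been built/trained
model.summary()                  # print the layer-by-layer architecture
weights = model.get_weights()    # learned weights as a list of numpy arrays
```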
Using your own pre-processed data
You can use your own pre-processed data without using any of the pre-processing tools of AI4Water. You will need to provide
input/output pairs to the fit and/or predict methods, as shown below.
```python
import numpy as np
from ai4water import Model # import any of the above model
from ai4water.models import LSTM
batch_size = 16
lookback = 15
inputs = ['dummy1', 'dummy2', 'dummy3', 'dummy4', 'dummy5']  # just dummy names for plotting and saving results
outputs = ['DummyTarget']

model = Model(
    model=LSTM(units=64),
    batch_size=batch_size,
    ts_args={'lookback': lookback},
    input_features=inputs,
    output_features=outputs,
    lr=0.001
)

x = np.random.random((batch_size * 10, lookback, len(inputs)))
y = np.random.random((batch_size * 10, len(outputs)))
model.fit(x=x,y=y)
```
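As a follow-up (not in the README), predictions on arrays of the same shape should work the same way, assuming the predict method accepts the same `x` keyword as `fit`:

```python
# hypothetical example: predict on new arrays shaped like the training data
x_new = np.random.random((batch_size, lookback, len(inputs)))
prediction = model.predict(x=x_new)
```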
Using for scikit-learn/xgboost/lgbm/catboost based models
The repository can also be used for machine learning based models such as scikit-learn/xgboost based models, for both
classification and regression problems, by making use of the model keyword argument of the Model function.
However, integration of ML based models is not complete yet.
```python
from ai4water import Model
from ai4water.datasets import busan_beach
data = busan_beach() # path for data file
model = Model(
    # columns in data to be used as input
    input_features=['tide_cm', 'wat_temp_c', 'sal_psu', 'rel_hum', 'pcp_mm'],
    # columns in data file to be used as output
    output_features=['tetx_coppml'],
    seed=1872,
    val_fraction=0.0,
    split_random=True,
    # any regressor from https://scikit-learn.org/stable/modules/classes.html
    model={"RandomForestRegressor": {}},
    # set any of the regressor's parameters, e.g. for the RandomForestRegressor used above
    # some of the parameters are listed at https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html#sklearn.ensemble.RandomForestRegressor
)
history = model.fit(data=data)
model.predict_on_test_data(data=data)
```
Hyperparameter optimization
For hyperparameter optimization, replace the actual values of the hyperparameters with their search space.
```python
from ai4water.functional import Model
from ai4water.datasets import MtropicsLaos
from ai4water.hyperopt import Real, Integer

data = MtropicsLaos().make_regression(lookback_steps=1)

model = Model(
    model={"RandomForestRegressor": {
        "n_estimators": Integer(low=5, high=30, name='n_estimators', num_samples=10),
        "max_leaf_nodes": Integer(low=2, high=30, prior='log', name='max_leaf_nodes', num_samples=10),
        "min_weight_fraction_leaf": Real(low=0.0, high=0.5, name='min_weight_fraction_leaf', num_samples=10),
        "max_depth": Integer(low=2, high=10, name='max_depth', num_samples=10),
        "min_samples_split": Integer(low=2, high=10, name='min_samples_split', num_samples=10),
        "min_samples_leaf": Integer(low=1, high=5, name='min_samples_leaf', num_samples=10),
    }},
    input_features=data.columns.tolist()[0:-1],
    output_features=data.columns.tolist()[-1:],
    cross_validator={"KFold": {"n_splits": 5}},
    x_transformation="zscore",
    y_transformation="log",
)

# first check the performance on test data with default parameters
model.fit_on_all_training_data(data=data)
print(model.evaluate_on_test_data(data=data, metrics=["r2_score", "r2"]))

# optimize the hyperparameters
optimizer = model.optimize_hyperparameters(
    algorithm="bayes",  # you can choose between random, grid or tpe
    data=data,
    num_iterations=60,
)

# now check the performance on test data with the optimized parameters
print(model.evaluate_on_test_data(data=data, metrics=["r2_score", "r2"]))
```
Running the above code will optimize the hyperparameters and generate the following figures.
Experiments
The experiments module is for comparison of multiple models on a single dataset, or for comparison of one model under different conditions.
```python
from ai4water.datasets import busan_beach
from ai4water.experiments import MLRegressionExperiments

data = busan_beach()

comparisons = MLRegressionExperiments(
    input_features=data.columns.tolist()[0:-1],
    output_features=data.columns.tolist()[-1:],
    split_random=True
)

# train all the available machine learning models
comparisons.fit(data=data)

# compare R2 of models
best_models = comparisons.compare_errors(
    'r2',
    data=data,
    cutoff_type='greater',
    cutoff_val=0.1,
    figsize=(8, 9),
    colors=['salmon', 'cadetblue']
)

# compare model performance using Taylor diagram
_ = comparisons.taylor_plot(
    data=data,
    figsize=(5, 9),
    exclude=["DummyRegressor", "XGBRFRegressor", "SGDRegressor", "KernelRidge", "PoissonRegressor"],
    leg_kws={'facecolor': 'white',
             'edgecolor': 'black',
             'bbox_to_anchor': (2.0, 0.9),
             'fontsize': 10,
             'labelspacing': 1.0,
             'ncol': 2},
)
```
For more comprehensive and detailed examples, see the documentation at https://ai4water.readthedocs.io
Disclaimer
The library is still under development. Fundamental changes are expected without prior notice and without regard for backward compatibility.
Related
sktime: A Unified Interface for Machine Learning with Time Series
Seglearn: A Python Package for Learning Sequences and Time Series
Pastas: Open Source Software for the Analysis of Groundwater Time Series
Time Series FeatuRe Extraction on basis of Scalable Hypothesis tests (tsfresh -- A Python package)
pyts: A Python Package for Time Series Classification
Tslearn, A Machine Learning Toolkit for Time Series Data
TSFEL: Time Series Feature Extraction Library
pyunicorn (Unified Complex Network and RecurreNce analysis toolbox)
TSFuse Python package for automatically constructing features from multi-view time series data
tsai - A state-of-the-art deep learning library for time series and sequential data
Owner
- Name: Ather Abbas
- Login: AtrCheema
- Kind: user
- Location: South Korea
- Company: Environmental Modeling and Monitoring Lab, UNIST
- Repositories: 7
- Profile: https://github.com/AtrCheema
GitHub Events
Total
- Watch event: 5
- Push event: 2
Last Year
- Watch event: 5
- Push event: 2
Committers
Last synced: almost 3 years ago
All Time
- Total Commits: 1,546
- Total Committers: 4
- Avg Commits per committer: 386.5
- Development Distribution Score (DDS): 0.061
Top Committers
| Name | Email | Commits |
|---|---|---|
| AtrCheema | a****6@y****m | 1,451 |
| Sara Iftikhar | s****k@g****m | 92 |
| kwon3969 | k****9@g****m | 2 |
| eggworld | e****2@g****m | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 4
- Total pull requests: 35
- Average time to close issues: 7 months
- Average time to close pull requests: 6 days
- Total issue authors: 3
- Total pull request authors: 3
- Average comments per issue: 1.0
- Average comments per pull request: 0.23
- Merged pull requests: 28
- Bot issues: 0
- Bot pull requests: 7
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- jmp75 (2)
- Zahid600 (1)
- binxiaoxiaobin (1)
Pull Request Authors
- AtrCheema (27)
- dependabot[bot] (7)
- Sara-Iftikhar (1)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads: 41 last-month (pypi)
- Total dependent packages: 0
- Total dependent repositories: 4
- Total versions: 14
- Total maintainers: 1
pypi.org: ai4water
Platform for developing data driven based models for sequential/tabular data
- Homepage: https://github.com/AtrCheema/AI4Water
- Documentation: https://ai4water.readthedocs.io/
- License: mit
- Latest release: 1.6 (published about 3 years ago)
Rankings
Maintainers (1)
Dependencies
- ai4water *
- catboost *
- lightgbm *
- optuna *
- seaborn *
- sphinx-gallery *
- tensorflow ==2.7
- xgboost *
- SeqMetrics *
- catboost *
- easy_mpl *
- keras-tcn *
- lightgbm *
- matplotlib *
- numpy *
- optuna *
- pandas *
- plotly *
- pyshp *
- scikit-learn *
- scikit-optimize *
- scipy *
- seaborn *
- shapely *
- sphinx *
- sphinx-gallery *
- sphinx-prompt *
- sphinx_copybutton *
- sphinx_issues *
- sphinx_rtd_theme *
- sphinx_toggleprompt *
- tensorflow ==2.7
- torch *
- xgboost *
- SeqMetrics >=1.3.3
- easy_mpl >=0.20.4
- joblib *
- matplotlib *
- numpy *
- pandas *
- requests *
- scikit-learn *
- SHAP *
- SeqMetrics >=1.3.3
- catboost *
- dill *
- easy_mpl >=0.20.4
- h5py <2.11.0
- hyperopt *
- imageio *
- joblib *
- lightgbm *
- matplotlib *
- numpy >=1.16.5
- openpyxl *
- optuna *
- pandas *
- plotly *
- psutil *
- pyshp *
- scikit-learn >=0.22
- scikit-optimize >=0.8.1
- seaborn *
- tensorflow *
- tpot *
- wandb *
- wrapt *
- xarray *
- xgboost *
- marvinpinto/action-automatic-releases latest composite
- actions/checkout v2 composite
- actions/setup-python v2 composite