tfaip - a Generic and Powerful Research Framework for Deep Learning based on Tensorflow

https://github.com/planet-ai-gmbh/tfaip

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in JOSS metadata
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Scientific Fields

Artificial Intelligence and Machine Learning (Computer Science) - 78% confidence
Last synced: 4 months ago

Repository

Python-based research framework for developing, organizing, and deploying Deep Learning models powered by Tensorflow.

Basic Info
  • Host: GitHub
  • Owner: Planet-AI-GmbH
  • License: gpl-3.0
  • Language: Python
  • Default Branch: master
  • Homepage: https://tfaip.readthedocs.io
  • Size: 2.77 MB
Statistics
  • Stars: 12
  • Watchers: 3
  • Forks: 3
  • Open Issues: 0
  • Releases: 10
Archived
Created about 5 years ago · Last pushed over 3 years ago
Metadata Files
Readme License

README.md

tfaip - A Generic and Powerful Research Framework for Deep Learning based on Tensorflow

tfaip is a Python-based research framework for developing, organizing, and deploying Deep Learning models powered by Tensorflow. It enables implementing both simple and complex scenarios that are structured and highly configurable via parameters that can be modified directly from the command line (read the docs). For example, the tutorial.full scenario for learning MNIST allows modifying the graph during training as well as other hyper-parameters such as the optimizer:

```bash
export PYTHONPATH=$PWD  # set the PYTHONPATH so that the examples dir is found

# Change the graph
tfaip-train examples.tutorial.full --model.graph MLP --model.graph.nodes 200 100 50 --model.graph.activation relu
tfaip-train examples.tutorial.full --model.graph MLP --model.graph.nodes 200 100 50 --model.graph.activation tanh
tfaip-train examples.tutorial.full --model.graph CNN --model.graph.filters 40 20 --model.graph.dense 100

# Change the optimizer
tfaip-train examples.tutorial.full --trainer.optimizer RMSprop --trainer.optimizer.beta1 0.01 --trainer.optimizer.clipglobalnorm 1

# ...
```

A trained model can then easily be integrated into a workflow to predict on provided data:

```python
predictor = TutorialScenario.create_predictor("PATH_TO_TRAINED_MODEL", PredictorParams())
for sample in predictor.predict(data):
    print(sample.outputs)
```

In practice, tfaip follows the rules of object orientation, i.e., the code for a scenario (e.g., image classification (MNIST), text recognition, NLP, etc.) is organized by implementing classes. By default, each Scenario must implement Model and Data. See here for the complete code to run the above example for MNIST, and see here for the minimal setup.

Setup

To set up tfaip, create a virtual Python (at least 3.7) environment and install the tfaip pip package:

```bash
virtualenv -p python3 venv
source venv/bin/activate
pip install tfaip
pip install tfaip[devel]  # to install additional development/test requirements
```

Have a look at the wiki for further setup instructions.

Run the Tutorial

After the setup succeeded, launch a training of the tutorial, which is an implementation of the common MNIST scenario:

```bash
export PYTHONPATH=$PWD  # set the PYTHONPATH so that the examples dir is found
tfaip-train examples.tutorial.full

# If you have a GPU, select it by specifying its ID
tfaip-train examples.tutorial.full --device.gpus 0
```

Next Steps

Start reading the Minimum Tutorial, and optionally have a look at the Full Tutorial to see more features. The docs provide a full description of tfaip.

To set up a new custom scenario, copy the general template and implement the abstract methods. Consider renaming the classes! Launch the training by providing the path or package-name of the new scenario which must be located in the PYTHONPATH!

Features of tfaip

tfaip provides various features that allow designing generic scenarios with maximum flexibility and high performance.

Code design

  • Fully Object-Oriented: Implement classes and abstract functions or overwrite any function to extend, adapt, or modify its default functionality.
  • Typing support: tfaip is fully typed, which simplifies working with an IDE (e.g., use PyCharm!).
  • Using Python's dataclasses module to set up parameters, which are automatically converted to command-line parameters by our paiargparse package.
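
The idea can be sketched with the standard library alone. The following illustration uses stdlib argparse in place of paiargparse, and the `ModelParams` fields are hypothetical, chosen only to mirror the flags shown in the tutorial commands above:

```python
from dataclasses import dataclass, fields
import argparse

@dataclass
class ModelParams:
    # hypothetical parameters, for illustration only
    nodes: int = 100
    activation: str = "relu"

def parser_from_dataclass(cls, prefix="model"):
    """Build an argparse parser whose flags mirror the dataclass fields."""
    parser = argparse.ArgumentParser()
    for f in fields(cls):
        # expose each field as --<prefix>.<name>, keeping its type and default
        parser.add_argument(f"--{prefix}.{f.name}", type=f.type,
                            default=f.default, dest=f.name)
    return parser

parser = parser_from_dataclass(ModelParams)
args = parser.parse_args(["--model.nodes", "200"])
params = ModelParams(**vars(args))
print(params)  # ModelParams(nodes=200, activation='relu')
```

paiargparse additionally handles nested parameter dataclasses (e.g., `--model.graph.nodes`), which this flat sketch omits.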

Data-Pipeline

Every scenario requires the setup of a data-pipeline to read and transform data. tfaip makes it easy to implement and modify even complex pipelines by defining multiple DataProcessors, each of which usually implements a small operation mapping an input sample to an output sample. E.g., one DataProcessor loads the data (input=filename, output=image), another one applies normalization rules, yet another one applies data augmentation, etc. The great advantage of this setup is that the data processors run in Python and can automatically be parallelized by tfaip for a speed-up by setting run_parallel=True.
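
As an illustration of the concept only (this is not the real tfaip DataProcessor API), a chain of small sample-to-sample operations might look like:

```python
# Each "processor" is a small callable mapping one sample dict to the next.
def load(sample):
    # stand-in for loading: turn a filename into pixel data
    return {"filename": sample["filename"], "image": [0, 128, 255]}

def normalize(sample):
    # scale pixel values to [0, 1]
    sample["image"] = [p / 255.0 for p in sample["image"]]
    return sample

def run_pipeline(samples, processors):
    """Apply every processor to every sample, in order."""
    for sample in samples:
        for proc in processors:
            sample = proc(sample)
        yield sample

out = list(run_pipeline([{"filename": "img0.png"}], [load, normalize]))
print(out[0]["image"])  # first pixel 0.0, last pixel 1.0
```

Because each step is an independent per-sample function, a framework can transparently distribute samples across workers, which is what run_parallel=True enables in tfaip.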

Deep-Learning-Features

Since tfaip is based on Tensorflow, its full API is available for designing models, graphs, and even data pipelines. Furthermore, tfaip supports additional common techniques for improving the performance of a Deep-Learning model out of the box:

  • Warm-starting (i.e., loading a pretrained model)
  • EMA-weights
  • Early-Stopping
  • Weight-Decay
  • various optimizers and learning-rate schedules
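
As an example of one of these techniques, EMA weights keep an exponentially smoothed shadow copy of the trainable weights that is typically used for evaluation. A minimal pure-Python sketch of the update rule (not tfaip's actual implementation):

```python
def ema_update(shadow, weights, decay=0.9):
    """shadow <- decay * shadow + (1 - decay) * weights, element-wise."""
    return [decay * s + (1 - decay) * w for s, w in zip(shadow, weights)]

# after each training step, fold the current weights into the shadow copy
shadow = [0.0, 0.0]
for step_weights in ([1.0, 2.0], [1.2, 1.8], [0.8, 2.2]):
    shadow = ema_update(shadow, step_weights)
print(shadow)  # smoothed weights, less noisy than any single step
```

The shadow weights lag behind the raw weights but fluctuate less, which often yields slightly better validation metrics.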

Contributing

We highly encourage users to contribute own scenarios and improvements of tfaip. Please read the contribution guidelines.

Benchmarks

All timings were obtained on an Intel Core i7 (10th Gen) CPU.

MNIST

The following table compares the MNIST tutorial of Keras to the Minimum Tutorial. The Keras code was adapted to use the same network architecture and hyperparameter settings (batch size of 16, 10 epochs of training).

| Code  | Time per Epoch | Train Acc | Val Acc | Best Val Acc |
|:------|---------------:|----------:|--------:|-------------:|
| Keras | 16 s           | 99.65%    | 98.24%  | 98.60%       |
| tfaip | 18 s           | 99.76%    | 98.66%  | 98.66%       |

tfaip and Keras reach comparable accuracies, as is to be expected since the actual code for training the graph is fundamentally identical. tfaip is, however, a bit slower due to some overhead in the input pipeline and additional functionality (e.g., benchmarks or automatic tracking of the best model). This overhead is negligible for almost any real-world scenario because, with a clearly larger network architecture, the computation times for inference and backpropagation become the bottleneck.

Data Pipeline

Integrating pure-Python operations (e.g., numpy) into a tf.data.Dataset to apply high-level preprocessing is slow by default, since tf.data.Dataset.map in combination with tf.py_function does not run in parallel and is therefore blocked by Python's GIL. tfaip circumvents this issue by providing an (optional) parallelizable input pipeline. The following table shows the time in seconds for two different tasks:

  • PYTHON: applying some pure python functions on the data
  • NUMPY: applying several numpy operations on the data

| Mode           | Task   | Threads 1 | Threads 2 | Threads 4 | Threads 6 |
|:---------------|:-------|----------:|----------:|----------:|----------:|
| tf.py_function | PYTHON |     23.47 |     22.78 |     24.38 |     25.76 |
| tfaip          | PYTHON |     26.68 |     14.48 |      8.11 |      8.13 |
| tf.py_function | NUMPY  |    104.10 |     82.78 |     76.33 |     77.56 |
| tfaip          | NUMPY  |     97.07 |     56.93 |     43.78 |     42.73 |

The PYTHON task clearly shows that tf.data.Dataset.map is not able to utilize multiple threads. The speed-up in the NUMPY task likely occurs because numpy releases the GIL when calling into its C implementation.
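
The pattern tfaip exploits can be sketched with the standard library: per-sample pure-Python work is distributed over worker processes rather than threads, which side-steps the GIL entirely. This is a conceptual sketch, not tfaip's actual pipeline code:

```python
from concurrent.futures import ProcessPoolExecutor

def preprocess(x):
    # stand-in for an expensive pure-Python transformation on one sample
    return sum(i * i for i in range(x)) % 97

def run_sequential(samples):
    return [preprocess(x) for x in samples]

def run_parallel(samples, workers=4):
    # worker processes each run their own interpreter, so the GIL of the
    # main process no longer serializes the per-sample work
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(preprocess, samples))

if __name__ == "__main__":
    samples = list(range(100, 120))
    # both variants must produce identical results; only the timing differs
    assert run_parallel(samples) == run_sequential(samples)
```

Processes pay a serialization cost per sample, which is why the table above shows tfaip slightly slower than tf.py_function at a single thread but far faster once multiple workers are available.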

Owner

  • Name: Planet AI GmbH
  • Login: Planet-AI-GmbH
  • Kind: organization
  • Location: Rostock

Creating truly intelligent, cognitive systems

JOSS Publication

tfaip - a Generic and Powerful Research Framework for Deep Learning based on Tensorflow
Published
June 22, 2021
Volume 6, Issue 62, Page 3297
Authors
Christoph Wick ORCID
Planet AI GmbH, Warnowufer 60, 18059 Rostock, Germany
Benjamin Kühn
Planet AI GmbH, Warnowufer 60, 18059 Rostock, Germany
Gundram Leifert
Planet AI GmbH, Warnowufer 60, 18059 Rostock, Germany
Konrad Sperfeld
Institute of Mathematics, University of Rostock, 18051 Rostock, Germany
Tobias Strauß
Planet AI GmbH, Warnowufer 60, 18059 Rostock, Germany
Jochen Zöllner ORCID
Planet AI GmbH, Warnowufer 60, 18059 Rostock, Germany, Institute of Mathematics, University of Rostock, 18051 Rostock, Germany
Tobias Grüning ORCID
Planet AI GmbH, Warnowufer 60, 18059 Rostock, Germany
Editor
Arfon Smith ORCID
Tags
Deep Learning Tensorflow Keras Research High-Level Framework Generic

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 72
  • Total Committers: 4
  • Avg Commits per committer: 18.0
  • Development Distribution Score (DDS): 0.194
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
C. Wick w****r@g****m 58
jochen j****r@p****e 9
planetai-gmbh 8****h 3
Arfon Smith a****n 2

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 8
  • Total pull requests: 4
  • Average time to close issues: 6 months
  • Average time to close pull requests: about 1 month
  • Total issue authors: 4
  • Total pull request authors: 3
  • Average comments per issue: 3.38
  • Average comments per pull request: 1.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • bertsky (3)
  • andbue (3)
  • Het-Shah (1)
  • swiftRetreat (1)
Pull Request Authors
  • arfon (2)
  • JochenZoellner (1)
  • p42ul (1)

Dependencies

devel_requirements.txt pypi
  • black ==21.6b0 development
  • flake8 * development
  • pre-commit * development
  • pytest * development
  • pytest-timeout * development
  • pytest-xdist * development
docs/requirements.txt pypi
  • Sphinx-Substitution-Extensions *
  • myst-parser *
  • sphinx *
  • sphinx-rtd-theme *
examples/requirements.txt pypi
  • Levenshtein *
  • opencv-python-headless *
  • transformers *
requirements.txt pypi
  • GitPython *
  • adabelief-tf *
  • dataclasses-json *
  • imageio *
  • nptyping *
  • opencv-python-headless *
  • openpyxl *
  • paiargparse ==1.1.2
  • pandas *
  • pillow *
  • pooch ==1.4.0
  • prettytable *
  • python-Levenshtein *
  • scikit-image *
  • tensorflow >=2.4.0,<2.7.0
  • tensorflow_addons >=0.12.0
  • tqdm *
  • typeguard *
  • xlrd ==1.2.0