targetran

Python library for data augmentation in object detection or image classification model training

https://github.com/bhky/targetran

Keywords

data-augmentation image-classification machine-learning object-detection pytorch tensorflow2

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: bhky
License: MIT
Language: Python
Default Branch: main
Homepage:
Size: 2.75 MB

Statistics

Stars: 20
Watchers: 5
Forks: 2
Open Issues: 0
Releases: 29

Topics

data-augmentation image-classification machine-learning object-detection pytorch tensorflow2

Created almost 5 years ago · Last pushed over 1 year ago

Metadata Files

Readme License Citation

Motivation

Data augmentation is a technique commonly used for training machine learning models in the computer vision field, where one can increase the amount of image data by creating transformed copies of the original images.

In the object detection sub-field, the transformation must also be applied to the target rectangular bounding-boxes. However, such functionality is not readily available in frameworks such as TensorFlow and PyTorch.

While other powerful augmentation tools are available, many of them do not work well with the TPU when accessed from Google Colab or Kaggle Notebooks, which are popular options for people who do not have their own hardware resources.

Targetran is intended to fill this gap.

What is Targetran?

- A light-weight data augmentation library to assist object detection or image classification model training.
- Has a simple Python API to transform both the images and the target rectangular bounding-boxes.
- Uses a dataset-idiomatic approach for TensorFlow and PyTorch.
- Can be used with the TPU for acceleration (TensorFlow Dataset only).

(Figure produced by the example code here.)

Installation

Tested for Python 3.9, 3.10, and 3.11.

The best way to install Targetran with its dependencies is from PyPI:

```shell
python3 -m pip install --upgrade targetran
```

Alternatively, to obtain the latest version from this repository:

```shell
git clone https://github.com/bhky/targetran.git
cd targetran
python3 -m pip install .
```

Usage

Notations

- `NDFloatArray`: NumPy float array type. The values are converted to `np.float32` internally.
- `tf.Tensor`: General TensorFlow Tensor type. The values are converted to `tf.float32` internally.

Data format

For object detection model training, which is the primary usage here, the following data are needed.

- `image_seq` (Sequence of `NDFloatArray` or `tf.Tensor` of shape `(height, width, num_channels)`):
  - images in channel-last format;
  - image sizes can be different.
- `bboxes_seq` (Sequence of `NDFloatArray` or `tf.Tensor` of shape `(num_bboxes_per_image, 4)`):
  - each `bboxes` array/tensor provides the bounding-boxes associated with an image;
  - each single bounding-box is given as `[top_left_x, top_left_y, bbox_width, bbox_height]`;
  - empty array/tensor means no bounding-boxes (and labels) for that image.
- `labels_seq` (Sequence of `NDFloatArray` or `tf.Tensor` of shape `(num_bboxes_per_image,)`):
  - each `labels` array/tensor provides the bounding-box labels associated with an image;
  - empty array/tensor means no labels (and bounding-boxes) for that image.

Some dummy data are created below for illustration. Please note the required format.

```python
import numpy as np

# Each image could have different sizes, but they must follow the channel-last format,
# i.e., (height, width, num_channels).
image_seq = [np.random.rand(480, 512, 3) for _ in range(3)]

# The bounding-boxes (bboxes) are given as a sequence of NumPy arrays (or TF tensors).
# Each array represents the bboxes for one corresponding image.
# Each bbox is given as [top_left_x, top_left_y, bbox_width, bbox_height].
# In case an image has no bboxes, an empty array should be provided.
bboxes_seq = [
    np.array([  # Image with 2 bboxes.
        [214, 223, 10, 11],
        [345, 230, 21, 9],
    ]),
    np.array([]),  # Empty array for image with no bboxes.
    np.array([  # Image with 3 bboxes.
        [104, 151, 22, 10],
        [99, 132, 20, 15],
        [340, 220, 31, 12],
    ]),
]

# Labels for the bboxes are also given as a sequence of NumPy arrays (or TF tensors).
# The number of bboxes and labels should match. An empty array indicates no bboxes/labels.
labels_seq = [
    np.array([0, 1]),  # 2 labels.
    np.array([]),  # No labels.
    np.array([2, 3, 0]),  # 3 labels.
]

# During operation, all the data values will be converted to float32.
```
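Annotations from other tools are often stored in a corner format instead. The following is a minimal sketch, assuming boxes given as `[x_min, y_min, x_max, y_max]`, of how they could be converted into the `[top_left_x, top_left_y, bbox_width, bbox_height]` format required here. The helper name `corners_to_xywh` is hypothetical and is not part of Targetran.

```python
import numpy as np


def corners_to_xywh(bboxes: np.ndarray) -> np.ndarray:
    """
    Hypothetical helper (not part of Targetran): convert boxes given as
    [x_min, y_min, x_max, y_max] to [top_left_x, top_left_y, width, height].
    An empty input yields an empty array of shape (0, 4).
    """
    bboxes = np.asarray(bboxes, dtype=np.float32).reshape(-1, 4)
    x_min, y_min, x_max, y_max = bboxes.T
    return np.stack([x_min, y_min, x_max - x_min, y_max - y_min], axis=-1)


corner_bboxes = np.array([
    [214, 223, 224, 234],
    [345, 230, 366, 239],
])
xywh_bboxes = corners_to_xywh(corner_bboxes)
print(xywh_bboxes)
```

The result can then be placed in `bboxes_seq` in the same way as the dummy arrays above.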

Design principles

- Bounding-boxes will always be rectangular with sides parallel to the image frame.
- After transformation, each resulting bounding-box is determined by the smallest rectangle (with sides parallel to the image frame) enclosing the original transformed bounding-box.
- After transformation, resulting bounding-boxes with their centroids outside the image frame will be removed, together with the corresponding labels.
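The principles above can be pictured with a small NumPy sketch. It is an illustration under stated assumptions, not Targetran's implementation: the function names `enclosing_bboxes` and `keep_centroids_inside` are hypothetical, the transformation is given as a 2x3 affine matrix acting on `(x, y, 1)`, and the treatment of centroids lying exactly on the frame boundary is an arbitrary choice.

```python
import numpy as np


def enclosing_bboxes(bboxes, matrix):
    """
    Transform the four corners of each [x, y, w, h] bbox with a 2x3 affine
    matrix, then return the smallest axis-aligned rectangles enclosing them.
    """
    bboxes = np.asarray(bboxes, dtype=np.float32).reshape(-1, 4)
    matrix = np.asarray(matrix, dtype=np.float32)
    x, y, w, h = bboxes.T
    # Corners of every bbox, shape (num_bboxes, 4, 2).
    corners = np.stack([
        np.stack([x, y], axis=-1),
        np.stack([x + w, y], axis=-1),
        np.stack([x + w, y + h], axis=-1),
        np.stack([x, y + h], axis=-1),
    ], axis=1)
    ones = np.ones((*corners.shape[:2], 1), dtype=np.float32)
    moved = np.concatenate([corners, ones], axis=-1) @ matrix.T
    mins = moved.min(axis=1)
    maxs = moved.max(axis=1)
    return np.concatenate([mins, maxs - mins], axis=-1)


def keep_centroids_inside(bboxes, labels, image_height, image_width):
    """Drop bboxes (and their labels) whose centroids fall outside the frame."""
    centroid_x = bboxes[:, 0] + bboxes[:, 2] / 2
    centroid_y = bboxes[:, 1] + bboxes[:, 3] / 2
    inside = (
        (centroid_x >= 0) & (centroid_x < image_width)
        & (centroid_y >= 0) & (centroid_y < image_height)
    )
    return bboxes[inside], labels[inside]


# A 30-degree rotation about the origin, written as a 2x3 affine matrix.
angle = np.deg2rad(30.0)
rotation = [
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
]
new_bboxes = enclosing_bboxes([[214, 223, 10, 11]], rotation)
new_bboxes, new_labels = keep_centroids_inside(
    new_bboxes, np.array([0]), image_height=480, image_width=512
)
print(new_bboxes, new_labels)
```

Note that the enclosing rectangle of a rotated bounding-box is generally larger than the original one, which is one reason why too many transformation steps may degrade the targets.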

TensorFlow Dataset

```python
import tensorflow as tf

from targetran.tf import (
    to_tf_dataset,
    TFCombineAffine,
    TFRandomFlipLeftRight,
    TFRandomFlipUpDown,
    TFRandomRotate,
    TFRandomShear,
    TFRandomTranslate,
    TFRandomCrop,
    TFResize,
)

# Convert the above data sequences into a TensorFlow Dataset.
# Users can have their own way to create the Dataset, as long as for each iteration
# it returns a tuple of tensors for a single example: (image, bboxes, labels).
ds = to_tf_dataset(image_seq, bboxes_seq, labels_seq)

# Alternatively, users can provide a sequence of image paths instead of image tensors/arrays,
# and set `image_seq_is_paths=True`. In that case, the actual image loading will be done during
# the dataset operation (i.e., lazy-loading). This is useful when dealing with huge data.
# Here `image_paths` stands for a user-provided sequence of image paths.
# ds = to_tf_dataset(image_paths, bboxes_seq, labels_seq, image_seq_is_paths=True)

# The affine transformations can be combined into one operation for better performance.
# Note that cropping and resizing are not affine and cannot be combined.
# Option (1):
affine_transform = TFCombineAffine(
    [TFRandomRotate(probability=0.8),  # Probability to include each affine transformation step
     TFRandomShear(probability=0.6),   # can be specified, otherwise the default value is used.
     TFRandomTranslate(),              # Thus, the number of selected steps could vary.
     TFRandomFlipLeftRight(),
     TFRandomFlipUpDown()],
    probability=1.0  # Probability to apply this single combined transformation.
)
# Option (2):
# Alternatively, one can decide the exact number of randomly selected transformations,
# e.g., use only any two of them. This could be a better option because too many
# transformation steps may deform the images too much.
affine_transform = TFCombineAffine(
    [TFRandomRotate(),  # Individual probability has no effect in this approach.
     TFRandomShear(),
     TFRandomTranslate(),
     TFRandomFlipLeftRight(),
     TFRandomFlipUpDown()],
    num_selected_transforms=2,  # Only two steps from the list will be selected.
    selected_probabilities=[0.5, 0.0, 0.3, 0.2, 0.0],  # Must sum up to 1.0, if given.
    keep_order=True,  # If True, the selected steps must be performed in the given order.
    probability=1.0  # Probability to apply this single combined transformation.
)
# Please refer to the API manual for more parameter options.

# Apply transformations.
auto_tune = tf.data.AUTOTUNE
ds = (
    ds
    .map(TFRandomCrop(probability=0.5), num_parallel_calls=auto_tune)
    .map(affine_transform, num_parallel_calls=auto_tune)
    .map(TFResize((256, 256)), num_parallel_calls=auto_tune)
)
# In the Dataset `map` call, the parameter `num_parallel_calls` can be set to,
# e.g., tf.data.AUTOTUNE, for better performance. See docs for TensorFlow Dataset.

# Batching:
# Since the array/tensor shape of each example could be different, the conventional
# way of batching may not work. Users will have to consider their own use cases.
# One possibly useful way is the padded-batch.
ds = ds.padded_batch(batch_size=2, padding_values=-1.0)
```
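After a padded-batch, the padded rows usually have to be dropped again before the targets are used, e.g., in a loss function. The following is a minimal NumPy sketch of that step. It is not part of Targetran: the helper name `unpad_example` is hypothetical, the padded arrays are written out by hand to mimic a batch padded with `-1.0`, and it assumes that real labels are never negative.

```python
import numpy as np

PADDING_VALUE = -1.0  # The same value as passed to `padding_values` above.


def unpad_example(bboxes, labels):
    """
    Hypothetical helper: drop padded rows from one example of a padded batch,
    assuming real labels are never negative.
    """
    valid = labels >= 0
    return bboxes[valid], labels[valid]


# Hand-written stand-in for a padded batch of two examples:
# the first has 2 bboxes, the second has none and is fully padded.
padded_bboxes = np.array([
    [[214, 223, 10, 11], [345, 230, 21, 9]],
    [[PADDING_VALUE] * 4, [PADDING_VALUE] * 4],
], dtype=np.float32)
padded_labels = np.array([
    [0, 1],
    [PADDING_VALUE, PADDING_VALUE],
], dtype=np.float32)

for bboxes, labels in zip(padded_bboxes, padded_labels):
    real_bboxes, real_labels = unpad_example(bboxes, labels)
    print(real_bboxes.shape, real_labels.shape)
```

If the label set does include negative values, a different padding value or an explicit mask would be needed instead.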

Using with KerasCV

The KerasCV API can be confusing in terms of its input data format, because the format required by a preprocessing layer differs from the one required by a model.

Targetran provides easy conversion tools to make the process smoother.

```python
import keras_cv
from targetran.tf import to_keras_cv_dict, to_keras_cv_model_input

# Let's assume `ds` contains Targetran ops as in the above illustration, without batching.
# To map the outputs to a KerasCV preprocessing layer, the following can be done.
ds = to_keras_cv_dict(ds, batch_size=2)
# The resulting dataset yields batches ready to be passed to a KerasCV preprocessing layer.
# Batching in the appropriate format will be included, therefore the `padded_batch` example
# is not relevant here.

# Assume the user would like to add a jittered-resize op.
jittered_resize = keras_cv.layers.JitteredResize(
    target_size=(640, 640),
    scale_factor=(0.8, 1.25),
    bounding_box_format="xywh",
)
ds = ds.map(jittered_resize)  # Other KerasCV preprocessing layers can be added subsequently.

# When the data is about to be passed to a KerasCV `model.fit`, the following can be done.
ds = to_keras_cv_model_input(ds)
```

PyTorch Dataset

```python
from typing import Optional, Sequence, Tuple

import numpy
import numpy.typing
from torch.utils.data import Dataset

from targetran.np import (
    CombineAffine,
    RandomFlipLeftRight,
    RandomFlipUpDown,
    RandomRotate,
    RandomShear,
    RandomTranslate,
    RandomCrop,
    Resize,
)
from targetran.utils import Compose

# `numpy.float64` is used here because the alias `numpy.float_` was removed in NumPy 2.0.
NDFloatArray = numpy.typing.NDArray[numpy.float64]


class PTDataset(Dataset):
    """
    A very simple PyTorch Dataset.
    As per common practice, transforms are done on NumPy arrays.
    """

    def __init__(
            self,
            image_seq: Sequence[NDFloatArray],
            bboxes_seq: Sequence[NDFloatArray],
            labels_seq: Sequence[NDFloatArray],
            transforms: Optional[Compose]
    ) -> None:
        # It is also possible to provide image paths instead of image arrays here,
        # and load the image in __getitem__. The details are skipped in this example.
        self.image_seq = image_seq
        self.bboxes_seq = bboxes_seq
        self.labels_seq = labels_seq
        self.transforms = transforms

    def __len__(self) -> int:
        return len(self.image_seq)

    def __getitem__(
            self,
            idx: int
    ) -> Tuple[NDFloatArray, NDFloatArray, NDFloatArray]:
        if self.transforms:
            return self.transforms(
                self.image_seq[idx],
                self.bboxes_seq[idx],
                self.labels_seq[idx]
            )
        return (
            self.image_seq[idx],
            self.bboxes_seq[idx],
            self.labels_seq[idx]
        )


# The affine transformations can be combined into one operation for better performance.
# Note that cropping and resizing are not affine and cannot be combined.
# Option (1):
affine_transform = CombineAffine(
    [RandomRotate(probability=0.8),  # Probability to include each affine transformation step
     RandomShear(probability=0.6),   # can be specified, otherwise the default value is used.
     RandomTranslate(),              # Thus, the number of selected steps could vary.
     RandomFlipLeftRight(),
     RandomFlipUpDown()],
    probability=1.0  # Probability to apply this single combined transformation.
)
# Option (2):
# Alternatively, one can decide the exact number of randomly selected transformations,
# e.g., use only any two of them. This could be a better option because too many
# transformation steps may deform the images too much.
affine_transform = CombineAffine(
    [RandomRotate(),  # Individual probability has no effect in this approach.
     RandomShear(),
     RandomTranslate(),
     RandomFlipLeftRight(),
     RandomFlipUpDown()],
    num_selected_transforms=2,  # Only two steps from the list will be selected.
    selected_probabilities=[0.5, 0.0, 0.3, 0.2, 0.0],  # Must sum up to 1.0, if given.
    keep_order=True,  # If True, the selected steps must be performed in the given order.
    probability=1.0  # Probability to apply this single combined transformation.
)
# Please refer to the API manual for more parameter options.

# The `Compose` here is similar to that from the torchvision package, except
# that here it also supports callables with multiple inputs and outputs needed
# for object detection tasks, i.e., (image, bboxes, labels).
transforms = Compose([
    RandomCrop(probability=0.5),
    affine_transform,
    Resize((256, 256)),
])


# Convert the above data sequences into a PyTorch Dataset.
# Users can have their own way to create the Dataset, as long as for each iteration
# it returns a tuple of arrays for a single example: (image, bboxes, labels).
ds = PTDataset(image_seq, bboxes_seq, labels_seq, transforms=transforms)

# Batching:
# In PyTorch, it is common to use a Dataset with a DataLoader, which provides
# batching functionality. However, since the array/tensor shape of each example
# could be different, the default batching may not work. Targetran provides
# a `collate_fn` that helps produce batches of (image_seq, bboxes_seq, labels_seq).
from torch.utils.data import DataLoader
from targetran.utils import collate_fn

data_loader = DataLoader(ds, batch_size=2, collate_fn=collate_fn)
```
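To picture what such a collate function does, the following plain-Python sketch regroups a list of `(image, bboxes, labels)` examples into three sequences instead of stacking them into single tensors. The name `collate_sketch` is hypothetical, and this is an illustration of the idea rather than Targetran's own implementation.

```python
from typing import Any, Sequence, Tuple


def collate_sketch(batch: Sequence[Tuple[Any, ...]]) -> Tuple[Tuple[Any, ...], ...]:
    """
    Hypothetical collate function: turn [(image, bboxes, labels), ...] into
    (image_seq, bboxes_seq, labels_seq) without stacking, so that examples
    with different shapes can share a batch.
    """
    return tuple(zip(*batch))


# Toy examples with placeholder strings instead of arrays.
toy_batch = [
    ("image_0", "bboxes_0", "labels_0"),
    ("image_1", "bboxes_1", "labels_1"),
]
image_batch, bboxes_batch, labels_batch = collate_sketch(toy_batch)
print(image_batch)
print(bboxes_batch)
print(labels_batch)
```

Each element of the resulting batch is a tuple with one entry per example, so downstream code has to iterate over the examples rather than expect a single stacked tensor.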

Image classification

While the tools here are primarily designed for object detection tasks, they can also be used for image classification in which only the images are to be transformed, e.g., given a dataset that returns (image, label) examples, or even only image examples. The `image_only` function can be used to convert a transformation class for this purpose.

If the dataset returns a tuple (image, ...) in each iteration, only the image will be transformed. Any other elements that follow, such as (..., label, weight), will be returned untouched.

If the dataset returns image only (not a tuple), then only the transformed image will be returned.

```python
from targetran.utils import image_only

# TensorFlow.
ds = to_tf_dataset(image_seq)

ds = (
    ds
    .map(image_only(TFRandomCrop()))
    .map(image_only(affine_transform))
    .map(image_only(TFResize((256, 256))))
    .batch(32)  # Conventional batching can be used for classification setup.
)

# PyTorch.
transforms = Compose([
    image_only(RandomCrop()),
    image_only(affine_transform),
    image_only(Resize((256, 256))),
])
ds = PTDataset(..., transforms=transforms)
data_loader = DataLoader(ds, batch_size=32)
```
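The behaviour described above can be sketched with a small wrapper. This is only an illustration, not how Targetran implements `image_only`: the names `image_only_sketch` and `toy_flip_left_right` are hypothetical, and the sketch assumes that a transformation is a callable taking and returning `(image, bboxes, labels)`.

```python
import numpy as np


def image_only_sketch(transform):
    """
    Hypothetical wrapper: apply an (image, bboxes, labels) transformation
    to the image alone, passing any other elements through untouched.
    """
    def wrapper(image, *rest):
        # Empty targets are supplied because there are no bboxes to transform.
        empty_bboxes = np.zeros((0, 4), dtype=np.float32)
        empty_labels = np.zeros((0,), dtype=np.float32)
        new_image, _, _ = transform(image, empty_bboxes, empty_labels)
        if rest:
            return (new_image, *rest)
        return new_image
    return wrapper


def toy_flip_left_right(image, bboxes, labels):
    """Toy stand-in for a transformation; the bboxes are ignored here."""
    return image[:, ::-1, :], bboxes, labels


flip_image = image_only_sketch(toy_flip_left_right)
image = np.random.rand(4, 6, 3)

flipped = flip_image(image)                   # Image only.
flipped_too, label = flip_image(image, 1)     # (image, label) example.
print(flipped.shape, label)
```

The two calls mirror the two cases described above: a dataset that returns only images, and a dataset that returns (image, label) tuples.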

Examples

Code examples in this repository
Construct a TensorFlow Dataset with Targetran and object detection data (Kaggle Notebook)
Image classification with TensorFlow and Targetran on TPU (Kaggle Notebook)

API

See here for API details.

Owner

Name: Bosco Yung
Login: bhky
Kind: user

Repositories: 3
Profile: https://github.com/bhky

Machine Learning Engineer

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it using the following metadata."
title: "Targetran"
version: "0.12.1"
url: "https://github.com/bhky/targetran"
license: "MIT"
authors:
  - family-names: "Yung"
    given-names: "Bosco"
    orcid: "https://orcid.org/0000-0002-3776-1589"
date-released: "2023-12-05"

GitHub Events

Total

Release event: 1
Delete event: 1
Push event: 17
Pull request event: 2
Create event: 2

Last Year

Release event: 1
Delete event: 1
Push event: 17
Pull request event: 2
Create event: 2

Committers

Last synced: over 1 year ago

All Time

Total Commits: 614
Total Committers: 1
Avg Commits per committer: 614.0
Development Distribution Score (DDS): 0.0

Past Year

Commits: 13
Committers: 1
Avg Commits per committer: 13.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Bosco Yung	1****y	614

Issues and Pull Requests

Last synced: 9 months ago

All Time

Total issues: 0
Total pull requests: 13
Average time to close issues: N/A
Average time to close pull requests: about 23 hours
Total issue authors: 0
Total pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 13
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 3
Average time to close issues: N/A
Average time to close pull requests: about 6 hours
Issue authors: 0
Pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 3
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

Pull Request Authors

bhky (16)

Top Labels

Issue Labels

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- pypi 130 last-month

Total dependent packages: 0
Total dependent repositories: 1
Total versions: 65
Total maintainers: 1

pypi.org: targetran

Target transformation for data augmentation in object detection

Homepage: https://github.com/bhky/targetran
Documentation: https://targetran.readthedocs.io/
License: MIT License
Latest release: 0.13.2
published over 1 year ago

Versions: 65
Dependent Packages: 0
Dependent Repositories: 1
Downloads: 130 Last month

Rankings

Dependent packages count: 10.1%

Downloads: 13.9%

Stargazers count: 14.0%

Average: 15.7%

Forks count: 19.2%

Dependent repos count: 21.6%

Maintainers (1)

bhky

Last synced: 10 months ago

Science Score: 44.0%
