eotransform

Defines the protocol for transformations, to be used in a generic source-to-sink streaming concept, and provides some generic transformer implementations.

https://github.com/tuw-geo/eotransform

Science Score: 85.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
    1 of 3 committers (33.3%) from academic institutions
  • Institutional organization owner
    Organization tuw-geo has institutional domain (geo.tuwien.ac.at)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.6%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: TUW-GEO
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 284 KB
Statistics
  • Stars: 0
  • Watchers: 3
  • Forks: 0
  • Open Issues: 0
  • Releases: 5
Created over 3 years ago · Last pushed over 2 years ago
Metadata Files
Readme License Citation

README.md

eotransform

Defines the basic transform protocol to be used in the streamed source to sink concept. Also provides some generic transformer implementations such as Compose or Result.

What can I use eotransform for?

The eotransform package defines Source, Transform, and Sink protocols to facilitate the creation of modularised processing pipelines. Adhering to a common contract makes it easier to mix and match processing blocks, allowing for better code reuse and more flexible pipelines. We also provide a streamed_process function, which you can use for I/O hiding when implementing these protocols. The package also provides some common transformations and sinks, such as Compose or Result.

Getting Started

Installation

```bash
pip install eotransform
```

Examples

Transformer protocol

This example shows how to implement the Transformer protocol for a simple multiplication:

```py
class Multiply(Transformer[int, int]):
    def __init__(self, factor: int):
        # Factor applied to every incoming value.
        self.factor = factor

    def __call__(self, x: int) -> int:
        return x * self.factor
```
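The package description mentions a Compose transformer for chaining processing blocks. The following stand-alone sketch illustrates the general idea only; it is not eotransform's implementation. The `compose` helper, the `Add` class, and the plain-callable version of `Multiply` (without the eotransform base class) are hypothetical names introduced for illustration.

```python
from functools import reduce
from typing import Callable, TypeVar

T = TypeVar("T")


class Multiply:
    """Plain-callable stand-in for the Multiply transformer (no eotransform base class)."""

    def __init__(self, factor: int):
        self.factor = factor

    def __call__(self, x: int) -> int:
        return x * self.factor


class Add:
    """Hypothetical second transformer, used only to have something to chain."""

    def __init__(self, offset: int):
        self.offset = offset

    def __call__(self, x: int) -> int:
        return x + self.offset


def compose(*steps: Callable[[T], T]) -> Callable[[T], T]:
    """Chain callables left to right: compose(f, g)(x) == g(f(x))."""

    def run(x: T) -> T:
        return reduce(lambda value, step: step(value), steps, x)

    return run


pipeline = compose(Multiply(2), Add(1))
result = pipeline(5)  # (5 * 2) + 1
```

Because every step shares the same call contract, the composed pipeline is itself a callable and can be used wherever a single transformer is expected.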

Sink protocol

This code snippet illustrates how to implement the Sink protocol, using a simple accumulation example:

```py
class AccumulatingSink(Sink[int]):
    def __init__(self):
        # Running total of all values received so far.
        self.result = 0

    def __call__(self, x: int) -> None:
        self.result += x
```

Streamed pipeline using the "Result" pattern

In the following example we show how to combine ApplyToOkResult and SinkUnwrapped to process data in a streamed fashion with proper error handling across thread boundaries.

```py
from concurrent.futures import ThreadPoolExecutor


def a_data_source():
    # Yield one error among otherwise successful results.
    for i in range(4):
        if i == 1:
            yield Result.error(RuntimeError("A runtime error occurred!"))
        else:
            yield Result.ok(i)


accumulated = AccumulatingSink()
sink = SinkUnwrapped(accumulated, ignore_exceptions={RuntimeError})
with ThreadPoolExecutor(max_workers=3) as ex:
    streamed_process(a_data_source(), ApplyToOkResult(Multiply(2)), sink, ex)

assert accumulated.result == 10  # (0 + 2 + 3) * 2
```
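To make the "Result" pattern more concrete, here is a minimal stand-alone sketch of how wrapping values and errors can carry failures through a pipeline without raising inside a worker. It does not use eotransform: `SimpleResult`, `apply_to_ok`, and `unwrap_into` are hypothetical names, and the choice to re-raise errors that are not explicitly ignored is an assumption of this sketch, not a statement about the library's behaviour.

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, Optional, TypeVar

T = TypeVar("T")
U = TypeVar("U")


@dataclass(frozen=True)
class SimpleResult(Generic[T]):
    """Hypothetical stand-in for a Result type: holds a value or an error."""

    value: Optional[T] = None
    error: Optional[Exception] = None

    @property
    def is_ok(self) -> bool:
        return self.error is None


def apply_to_ok(fn: Callable[[T], U], item: SimpleResult) -> SimpleResult:
    """Apply fn to ok values only; errors are forwarded untouched."""
    if not item.is_ok:
        return item
    try:
        return SimpleResult(value=fn(item.value))
    except Exception as exc:  # capture the failure instead of raising in a worker
        return SimpleResult(error=exc)


def unwrap_into(collected: List[int], item: SimpleResult,
                ignore=(RuntimeError,)) -> None:
    """Sink step: collect ok values, skip ignored errors, re-raise the rest."""
    if item.is_ok:
        collected.append(item.value)
    elif not isinstance(item.error, tuple(ignore)):
        raise item.error


source = [SimpleResult(value=0), SimpleResult(error=RuntimeError("boom")),
          SimpleResult(value=2), SimpleResult(value=3)]
collected: List[int] = []
for item in source:
    unwrap_into(collected, apply_to_ok(lambda x: x * 2, item))
total = sum(collected)
```

The error travels through the pipeline as ordinary data, so the decision to ignore or raise it is made in one place, at the sink.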

Streaming

The following briefly describes the concept of streaming, and how it can be used to hide I/O processes.

The most straightforward way to process data is to first load it and then process it:

[Figure: serial process]

This has the advantage of being simple to implement and maintain, as you don't need to be concerned with issues of parallelism.

In many cases this works well enough. However, the pipeline can stall while it waits for data to be fetched. An easy way to increase throughput is often to interleave the I/O, or data fetching, with the processing of chunks:

[Figure: streamed process]

With this streaming process you can utilise resources more effectively.
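The interleaving described above can be sketched with the standard library alone. This is an illustration of the concept under simple assumptions, not the implementation of streamed_process: `fetch` and `prefetched` are hypothetical names, and the short sleep stands in for real disk or network latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List


def fetch(chunk_id: int) -> List[int]:
    """Hypothetical slow I/O call; the sleep stands in for disk or network latency."""
    time.sleep(0.01)
    return [chunk_id] * 3


def prefetched(ids: Iterable[int], load: Callable[[int], List[int]],
               executor: ThreadPoolExecutor) -> Iterator[List[int]]:
    """Yield loaded chunks in order while the next chunk loads in the background."""
    pending = None
    for chunk_id in ids:
        upcoming = executor.submit(load, chunk_id)  # start loading the next chunk
        if pending is not None:
            # The consumer processes this chunk while `upcoming` is still loading.
            yield pending.result()
        pending = upcoming
    if pending is not None:
        yield pending.result()


with ThreadPoolExecutor(max_workers=1) as ex:
    # Processing (here: a sum) of chunk n overlaps with fetching chunk n + 1.
    sums = [sum(chunk) for chunk in prefetched(range(4), fetch, ex)]
```

Only one chunk is loaded ahead at a time, which keeps memory use bounded while still hiding most of the fetch latency behind processing.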

Support & Documentation

Dependencies

eotransform requires Python 3.8 and has these dependencies:

```cfg
more-itertools
```

Citation

If you find this repository useful, please consider giving it a star or a citation:

```bibtex
@software{raml_bernhard_2023_8002789,
  author    = {Raml, Bernhard},
  title     = {eotransform},
  month     = jun,
  year      = 2023,
  publisher = {Zenodo},
  version   = {1.8.2},
  doi       = {10.5281/zenodo.8002789},
  url       = {https://doi.org/10.5281/zenodo.8002789}
}
```

Owner

  • Name: TU Wien - Department of Geodesy and Geoinformation
  • Login: TUW-GEO
  • Kind: organization
  • Location: Vienna, Austria

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: eotransform
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Bernhard
    family-names: Raml
    email: bernhard.raml@geo.tuwien.ac.at
    affiliation: TU Wien
    orcid: 'https://orcid.org/0000-0002-5357-0344'
identifiers:
  - type: doi
    value: 10.5281/zenodo.8002714
    description: URL of version 1.8.2
repository-code: 'https://github.com/TUW-GEO/eotransform'
url: 'https://eotransform.readthedocs.io/'
abstract: >-
  Defines the basic transform protocol to be used in the
  streamed source to sink concept. Also provides some
  generic  transformer implementations such as Compose or
  Result.
keywords:
  - earth observation
  - streaming
  - data pipeline
license: MIT
commit: d40c130
version: 1.8.2
date-released: '2023-06-04'

Committers

Last synced: about 2 years ago

All Time
  • Total Commits: 83
  • Total Committers: 3
  • Avg Commits per committer: 27.667
  • Development Distribution Score (DDS): 0.084
Past Year
  • Commits: 55
  • Committers: 3
  • Avg Commits per committer: 18.333
  • Development Distribution Score (DDS): 0.127
Top Committers
Name Email Commits
Bernhard Raml b****l@g****t 76
GitHub Action a****n@g****m 6
Bernhard Raml r****d@g****m 1

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 307 last-month
  • Total dependent packages: 2
  • Total dependent repositories: 0
  • Total versions: 4
  • Total maintainers: 1
pypi.org: eotransform

Protocol definition for streamed source/transform/sink process

  • Versions: 4
  • Dependent Packages: 2
  • Dependent Repositories: 0
  • Downloads: 307 Last month
Rankings
Dependent packages count: 2.9%
Downloads: 10.3%
Average: 22.7%
Forks count: 30.5%
Dependent repos count: 30.6%
Stargazers count: 39.1%
Maintainers (1)
Last synced: 7 months ago