SeqTools

SeqTools: A python package for easy transformation, combination and evaluation of large datasets. - Published in JOSS (2018)

https://github.com/nlgranger/seqtools

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in JOSS metadata
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

delayedcall lazy-evaluation library machine-learning mapping on-the-fly pipeline preprocessing python

Scientific Fields

Mathematics Computer Science - 40% confidence

Repository

A python library to manipulate and transform indexable data (lists, arrays, ...)

Statistics
  • Stars: 46
  • Watchers: 1
  • Forks: 4
  • Open Issues: 0
  • Releases: 8
Created over 8 years ago · Last pushed over 1 year ago
Metadata Files
Readme Changelog License

README.rst

.. image:: https://badge.fury.io/py/SeqTools.svg
   :target: https://pypi.org/project/SeqTools
   :alt: PyPi package
.. image:: https://readthedocs.org/projects/seqtools-doc/badge
   :target: http://seqtools-doc.readthedocs.io
   :alt: Documentation

SeqTools
========

SeqTools extends the functionalities of itertools to indexable (list-like)
objects. The provided features include element-wise function mapping,
reordering, reindexing, concatenation, joining, slicing, minibatching, etc.

SeqTools functions implement **on-demand evaluation** under the hood:
operations and transformations are only applied to individual items when they
are actually accessed. A simple but powerful prefetch function is also provided
to eagerly evaluate elements in background threads or processes.
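
To make the idea of on-demand evaluation concrete, here is a minimal
plain-Python sketch of a lazily mapped sequence. It is not the SeqTools
implementation: the ``LazyMap`` class and the ``square`` helper are hypothetical
names introduced only for this illustration.

```python
from collections.abc import Sequence


class LazyMap(Sequence):
    """Hypothetical list-like view that applies `func` only on item access."""

    def __init__(self, func, items):
        self.func = func
        self.items = items

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slicing returns another lazy view; nothing is computed yet.
            return LazyMap(self.func, self.items[index])
        return self.func(self.items[index])


calls = []  # records which inputs were actually processed


def square(x):
    calls.append(x)
    return x * x


squares = LazyMap(square, [1, 2, 3, 4])
print(calls)       # [] because building the view computes nothing
print(squares[2])  # 9, and only the item at index 2 was evaluated
print(calls)       # [3]
```

Because the view keeps the list-like protocol (``len``, indexing, slicing),
it can be passed to any code that expects a sequence.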

SeqTools was originally designed for data science, more precisely for the data
preprocessing stages. Because this kind of work is experimental by nature,
on-demand execution is made as transparent as possible by providing
**fault-tolerant functions and insightful error messages**.

Example
-------

>>> import seqtools
>>>
>>> def count_lines(filename):
...     with open(filename) as f:
...         return len(f.readlines())
>>>
>>> def count_words(filename):
...     with open(filename) as f:
...         return len(f.read().split())
>>>
>>> filenames = ["a.txt", "b.txt", "c.txt", "d.txt"]
>>> lc = seqtools.smap(count_lines, filenames)
>>> wc = seqtools.smap(count_words, filenames)
>>> counts = seqtools.collate([lc, wc])
>>> # no computations so far!
>>> lc[2]  # only evaluates on index 2
3
>>> counts[1]  # same for index 1
(1, 2)

Batteries included!
-------------------

The library comes with a set of functions to manipulate sequences:

.. |concatenate| image:: docs/_static/concatenate.svg

.. _concatenate: https://seqtools-doc.readthedocs.io/en/stable/reference.html#seqtools.concatenate

.. |batch| image:: docs/_static/batch.svg

.. _batch: https://seqtools-doc.readthedocs.io/en/stable/reference.html#seqtools.batch

.. |gather| image:: docs/_static/gather.svg

.. _gather: https://seqtools-doc.readthedocs.io/en/stable/reference.html#seqtools.gather

.. |prefetch| image:: docs/_static/prefetch.svg

.. _prefetch: https://seqtools-doc.readthedocs.io/en/stable/reference.html#seqtools.prefetch

.. |interleave| image:: docs/_static/interleave.svg

.. _interleave: https://seqtools-doc.readthedocs.io/en/stable/reference.html#seqtools.interleave

.. |uniter| image:: docs/_static/uniter.svg

.. _uniter: https://seqtools-doc.readthedocs.io/en/stable/reference.html#seqtools.uniter

+-------------------+---------------+
| `concatenate`_    | |concatenate| |
+-------------------+---------------+
| `batch`_          | |batch|       |
+-------------------+---------------+
| `gather`_         | |gather|      |
+-------------------+---------------+
| `prefetch`_       | |prefetch|    |
+-------------------+---------------+
| `interleave`_     | |interleave|  |
+-------------------+---------------+
| `uniter`_         | |uniter|      |
+-------------------+---------------+

and others (suggestions are also welcome).
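
As a rough illustration of what such sequence views do, the sketch below
re-creates concatenation and batching in plain Python. The ``ConcatView`` and
``BatchView`` classes are hypothetical and only support integer indexing; they
are a simplified model, not the code behind ``seqtools.concatenate`` or
``seqtools.batch``.

```python
from bisect import bisect_right
from collections.abc import Sequence
from itertools import accumulate


class ConcatView(Sequence):
    """Hypothetical view presenting several sequences as a single one."""

    def __init__(self, sequences):
        self.sequences = list(sequences)
        # offsets[i] is the global index one past the end of sequences[i]
        self.offsets = list(accumulate(len(s) for s in self.sequences))

    def __len__(self):
        return self.offsets[-1] if self.offsets else 0

    def __getitem__(self, index):
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("index out of range")
        k = bisect_right(self.offsets, index)  # which sequence holds the item
        start = self.offsets[k - 1] if k > 0 else 0
        return self.sequences[k][index - start]


class BatchView(Sequence):
    """Hypothetical view grouping consecutive items into lists of size `k`."""

    def __init__(self, sequence, k):
        self.sequence = sequence
        self.k = k

    def __len__(self):
        return -(-len(self.sequence) // self.k)  # ceiling division

    def __getitem__(self, index):
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("index out of range")
        start = index * self.k
        stop = min(start + self.k, len(self.sequence))
        return [self.sequence[i] for i in range(start, stop)]


joined = ConcatView([[0, 1, 2], [3, 4]])
batches = BatchView(joined, 2)
print(joined[3], len(joined), list(batches))
```

Neither view copies its inputs: each one only translates an index into a
lookup on the underlying sequences, so views can be stacked cheaply.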

Installation
------------

.. code-block:: bash

   pip install seqtools

Documentation
-------------

The documentation is hosted at `https://seqtools-doc.readthedocs.io
<https://seqtools-doc.readthedocs.io>`_.

Contributing and Support
------------------------

Use the `issue tracker <https://github.com/nlgranger/seqtools/issues>`_
to request features, propose improvements or report issues. For questions
regarding usage, please send an email.

Related libraries
-----------------

Joblib offers low-level functions with many settings to optimize pipelined
transformations. It notably provides advanced caching mechanisms, which are
not the primary concern of SeqTools. SeqTools uses a simpler container-oriented
interface with multiple utility functions to assist fast prototyping.
On-demand evaluation is its default behaviour and applies at all layers of a
transformation pipeline. Eager evaluation of elements in SeqTools does not
break the list-like interface and can be used in the middle of a
transformation pipeline.
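
The following sketch shows one way eager evaluation can coexist with a
list-like interface, using only the standard library. ``EagerView`` and
``slow_square`` are hypothetical names, and the approach (one future per item,
all scheduled upfront) is a deliberate simplification rather than a
description of how SeqTools' prefetch function works.

```python
import time
from collections.abc import Sequence
from concurrent.futures import ThreadPoolExecutor


class EagerView(Sequence):
    """Hypothetical list-like view whose items are computed in worker threads."""

    def __init__(self, func, items, nworkers=2):
        self._pool = ThreadPoolExecutor(max_workers=nworkers)
        # Schedule every item right away; results are collected on access.
        self._futures = [self._pool.submit(func, item) for item in items]
        self._pool.shutdown(wait=False)  # no new tasks; pending ones still run

    def __len__(self):
        return len(self._futures)

    def __getitem__(self, index):
        # Blocks until that item is ready and re-raises any worker exception.
        return self._futures[index].result()


def slow_square(x):
    time.sleep(0.01)  # stands in for an expensive transformation
    return x * x


eager = EagerView(slow_square, range(6))
print(len(eager), eager[4], eager[-1])
```

Since the result still supports ``len`` and integer indexing, further
transformations can be layered on top of it. A practical version would bound
the number of buffered items instead of scheduling the whole dataset at once.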

SeqTools is designed to connect nicely to the data loading pipelines of machine
learning libraries such as PyTorch's ``torch.utils.data`` and
``torchvision.transforms`` or TensorFlow's ``tf.data``. The interfaces of these
libraries focus on iterators to access transformed elements, whereas SeqTools
also provides arbitrary reads via indexing.

Owner

  • Name: Nicolas Granger
  • Login: nlgranger
  • Kind: user

JOSS Publication

SeqTools: A python package for easy transformation, combination and evaluation of large datasets.
Published: October 26, 2018
Volume 3, Issue 30, Page 1006
Authors
  • Nicolas Granger (Télécom SudParis)
  • Mounîm A. El Yacoubi (Télécom SudParis)
Editor
  • Pjotr Prins
Tags
  • pre-processing, pipeline, dataset, machine learning, lazy evaluation, on-demand

GitHub Events

Total
  • Watch event: 2
Last Year
  • Watch event: 2

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 114
  • Total Committers: 1
  • Avg Commits per committer: 114.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Nicolas Granger n****m@g****m 114

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 7
  • Total pull requests: 0
  • Average time to close issues: about 1 month
  • Average time to close pull requests: N/A
  • Total issue authors: 4
  • Total pull request authors: 0
  • Average comments per issue: 2.57
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • Benjamin-Lee (3)
  • stefanik12 (2)
  • nlgranger (1)
  • pluiez (1)

Packages

  • Total packages: 3
  • Total downloads:
    • pypi 146 last-month
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 2
    (may contain duplicates)
  • Total versions: 51
  • Total maintainers: 1
proxy.golang.org: github.com/nlgranger/SeqTools
  • Versions: 19
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 5.4%
Average: 5.6%
Dependent repos count: 5.8%
Last synced: 4 months ago
proxy.golang.org: github.com/nlgranger/seqtools
  • Versions: 19
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 5.4%
Average: 5.6%
Dependent repos count: 5.8%
Last synced: 4 months ago
pypi.org: seqtools

A library for transparent transformation of indexable containers (lists, etc.)

  • Versions: 13
  • Dependent Packages: 0
  • Dependent Repositories: 2
  • Downloads: 146 Last month
Rankings
Downloads: 9.1%
Stargazers count: 9.9%
Dependent packages count: 10.1%
Average: 11.2%
Dependent repos count: 11.5%
Forks count: 15.4%
Maintainers (1)
Last synced: 4 months ago

Dependencies

docs/requirements.txt pypi
  • Pillow *
  • ipykernel *
  • ipython *
  • nbconvert *
  • nbsphinx *
  • numpy *
  • scikit-learn *
  • scipy *
setup.py pypi
  • tblib *
.github/workflows/release.yml actions
  • actions/checkout v3 composite
  • actions/download-artifact v3 composite
  • actions/upload-artifact v3 composite
  • pypa/cibuildwheel v2.14.1 composite
  • pypa/gh-action-pypi-publish release/v1 composite
.github/workflows/tests.yml actions
  • actions/checkout v3 composite
  • actions/download-artifact v3 composite
  • actions/setup-python v4 composite
  • actions/upload-artifact v3 composite
  • codecov/codecov-action v3 composite
  • omnilib/ufmt action-v1 composite
pyproject.toml pypi
  • tblib *