corelay
CoRelAy is a tool to compose small-scale (single-machine) analysis pipelines.
Science Score: 64.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to arxiv.org
- ✓ Committers with academic emails: 3 of 7 committers (42.9%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (19.5%) to scientific vocabulary
Keywords
Repository
CoRelAy is a tool to compose small-scale (single-machine) analysis pipelines.
Basic Info
Statistics
- Stars: 28
- Watchers: 3
- Forks: 2
- Open Issues: 0
- Releases: 4
Topics
Metadata Files
README.md
# Composing Relevance Analysis
[License](https://github.com/virelay/corelay/blob/main/COPYING.LESSER)
[Tests](https://github.com/virelay/corelay/actions/workflows/tests.yml)
[Documentation](https://corelay.readthedocs.io/en/latest)
[Latest Release](https://github.com/virelay/corelay/releases/latest)
[PyPI](https://pypi.org/project/corelay/)
**CoRelAy** is a library designed for composing efficient, single-machine data analysis pipelines. It enables the rapid implementation of pipelines that can be used to analyze and process data. CoRelAy is primarily meant for use in explainable artificial intelligence (XAI), often with the goal of producing output suitable for visualization in tools like [**ViRelAy**](https://github.com/virelay/virelay).
At the core of CoRelAy are pipelines (`Pipeline`), which consist of a series of tasks (`Task`). Each task is a modular unit that can be populated with operations (`Processor`) to perform a specific data processing step. These operations, known as processors, can be customized by assigning new instances or modifying their default configurations.
Tasks in CoRelAy are highly flexible and can be tailored to meet the needs of your analysis pipeline. By leveraging a wide range of configurable processors with their respective parameters (`Param`), you can easily adapt and optimize your data processing workflow.
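To illustrate these concepts, the following is a minimal sketch of a custom `Processor` with a `Param` and a custom `Pipeline` with a single `Task`. The `corelay.pipeline.base` module path and the `Task(processor_type, default, is_output=...)` signature are assumptions inferred from the memoization example further below, not confirmed API.

```python
"""A minimal sketch of a custom pipeline; module paths and signatures are assumptions."""

from typing import Annotated

import numpy

from corelay.base import Param
from corelay.pipeline.base import Pipeline, Task  # assumed module path
from corelay.processor.base import Processor


class Scale(Processor):
    """A processor that multiplies its input by a configurable factor."""

    # A Param declares the accepted type(s) and a default value; the declaration
    # style mirrors the memoization example below
    factor: Annotated[float, Param(float, 2.0)]

    def function(self, data):
        return data * self.factor


class MyPipeline(Pipeline):
    """A pipeline with a single task, populated with a default processor."""

    # A Task names the expected processor type and a default instance; the exact
    # declaration style is an assumption
    scale = Task(Processor, Scale(factor=3.0), is_output=True)


pipeline = MyPipeline()

# The default processor can be swapped out by assigning a new instance, just like
# pipeline.preprocessing is replaced in the memoization example below
pipeline.scale = Scale(factor=0.5)
print(pipeline(numpy.ones((2, 2))))
```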
For more information about CoRelAy, getting started guides, in-depth tutorials, and API documentation, please refer to the [documentation](https://corelay.readthedocs.io/en/latest/).
If you find CoRelAy useful for your research, why not cite our related paper:
```bibtex
@article{anders2021software,
    author = {Anders, Christopher J. and
              Neumann, David and
              Samek, Wojciech and
              Müller, Klaus-Robert and
              Lapuschkin, Sebastian},
    title = {Software for Dataset-wide XAI: From Local Explanations to Global Insights with {Zennit}, {CoRelAy}, and {ViRelAy}},
    year = {2021},
    volume = {abs/2106.13200},
    journal = {CoRR}
}
```
Features
- Pipeline Composition: CoRelAy allows you to compose pipelines of processors, which can be executed in parallel or sequentially (see the sketch after this list).
- Task-based Design: Each step in the pipeline is represented as a task, which can be easily modified or replaced.
- Processor Library: CoRelAy comes with a library of built-in processors for common tasks, such as clustering, embedding, and dimensionality reduction.
- Memoization: CoRelAy supports memoization of intermediate results, allowing you to reuse previously computed results and speed up your analysis.
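The following is a minimal sketch of how the flow processors `Sequential` and `Parallel` compose other processors. It assumes that `FunctionProcessor` exposes its wrapped callable through a `function` parameter and that processors can be invoked directly, as in the memoization example below.

```python
"""A minimal sketch of flow-based composition; the `function` parameter name is an assumption."""

import numpy

from corelay.processor.base import FunctionProcessor
from corelay.processor.flow import Parallel, Sequential

double = FunctionProcessor(function=lambda x: x * 2)
negate = FunctionProcessor(function=lambda x: -x)

# Sequential feeds each processor's output into the next one
sequential_flow = Sequential([double, negate])

# Parallel with broadcast=True copies the single input to every child processor
# and returns one result per child
parallel_flow = Parallel([double, negate], broadcast=True)

data = numpy.arange(4)
print(sequential_flow(data))  # negate(double(data))
print(parallel_flow(data))    # (double(data), negate(data))
```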
Getting Started
Installation
To get started, you first have to install CoRelAy on your system. The recommended and easiest way to install CoRelAy is to use pip, the Python package manager. You can install CoRelAy using the following command:
```shell
$ pip install corelay
```
> [!NOTE]
> CoRelAy depends on the `metrohash-python` library, which requires a C++ compiler to be installed. This may mean that you will have to install extra packages (GCC or Clang) for the installation to succeed. For example, on Fedora, you may have to install the `gcc-c++` package in order to make the `c++` command available, which can be done using the following command:
>
> ```shell
> $ sudo dnf install gcc-c++
> ```
To install CoRelAy with optional HDBSCAN and UMAP support, use
```shell
$ pip install corelay[umap,hdbscan]
```
Usage
Examples highlighting some of CoRelAy's features can be found in `docs/examples`.
We mainly use HDF5 files to store results. If you wish to visualize your analysis results using ViRelAy, please have a look at the ViRelAy documentation to find out more about its database specification. An example of creating HDF5 files that can be used with ViRelAy is shown in `docs/examples/hdf5_structure.py`.
To perform a full SpRAy analysis that can be visualized with ViRelAy, an advanced script can be found in `docs/examples/virelay_analysis.py`.
The following shows the contents of `docs/examples/memoize_spectral_pipeline.py`:
```python """An example script, which uses memoization to store (intermediate) results."""
import time import typing from collections.abc import Sequence from typing import Annotated, SupportsIndex
import h5py import numpy
from corelay.base import Param from corelay.io.storage import HashedHDF5 from corelay.pipeline.spectral import SpectralClustering from corelay.processor.base import Processor from corelay.processor.clustering import KMeans from corelay.processor.embedding import TSNEEmbedding, EigenDecomposition from corelay.processor.flow import Sequential, Parallel
class Flatten(Processor):
"""Represents a :py:class:~corelay.processor.base.Processor, which flattens its input data."""
def function(self, data: typing.Any) -> typing.Any:
"""Applies the flattening to the input data.
Args:
data (typing.Any): The input data that is to be flattened.
Returns:
typing.Any: Returns the flattened data.
"""
input_data: numpy.ndarray[typing.Any, typing.Any] = data
input_data.sum()
return input_data.reshape(input_data.shape[0], numpy.prod(input_data.shape[1:]))
class SumChannel(Processor):
"""Represents a :py:class:~corelay.processor.base.Processor, which sums its input data across channels, i.e., its second axis."""
def function(self, data: typing.Any) -> typing.Any:
"""Applies the summation over the channels to the input data.
Args:
data (typing.Any): The input data that is to be summed over its channels.
Returns:
typing.Any: Returns the data that was summed up over its channels.
"""
input_data: numpy.ndarray[typing.Any, typing.Any] = data
return input_data.sum(axis=1)
class Normalize(Processor):
"""Represents a :py:class:~corelay.processor.base.Processor, which normalizes its input data."""
axes: Annotated[SupportsIndex | Sequence[SupportsIndex], Param((SupportsIndex, Sequence), (1, 2))]
"""A parameter of the :py:class:`~corelay.processor.base.Processor`, which determines the axis over which the data is to be normalized. Defaults
to the second and third axes.
"""
def function(self, data: typing.Any) -> typing.Any:
"""Normalizes the specified input data.
Args:
data (typing.Any): The input data that is to be normalized.
Returns:
typing.Any: Returns the normalized input data.
"""
input_data: numpy.ndarray[typing.Any, typing.Any] = data
return input_data / input_data.sum(self.axes, keepdims=True)
def main() -> None:
"""The entrypoint to the :py:mod:memoize_spectral_pipeline script."""
# Fixes the random seed for reproducibility
numpy.random.seed(0xDEADBEEF)
# Opens an HDF5 file in append mode for the storing the results of the analysis and the memoization of intermediate pipeline results
with h5py.File('test.analysis.h5', 'a') as analysis_file:
# Creates a HashedHDF5 IO object, which is an IO object that stores outputs of processors based on hashes in an HDF5 file
io_object = HashedHDF5(analysis_file.require_group('proc_data'))
# Generates some exemplary data
data = numpy.random.normal(size=(64, 3, 32, 32))
number_of_clusters = range(2, 20)
# Creates a SpectralClustering pipeline, which is one of the pre-defined built-in pipelines
pipeline = SpectralClustering(
# Processors, such as EigenDecomposition, can be assigned to pre-defined tasks
embedding=EigenDecomposition(n_eigval=8, io=io_object),
# Flow-based processors, such as Parallel, can combine multiple processors; broadcast=True copies the input as many times as there are
# processors; broadcast=False instead attempts to match each input to a processor
clustering=Parallel([
Parallel([
KMeans(n_clusters=k, io=io_object) for k in number_of_clusters
], broadcast=True),
# IO objects will be used during computation when supplied to processors, if a corresponding output value (here identified by hashes)
# already exists, the value is not computed again but instead loaded from the IO object
TSNEEmbedding(io=io_object)
], broadcast=True, is_output=True)
)
# Processors (and Params) can be updated by simply assigning corresponding attributes
pipeline.preprocessing = Sequential([
SumChannel(),
Normalize(),
Flatten()
])
# Processors flagged with "is_output=True" will be accumulated in the output; the output will be a tree of tuples, with the same hierarchy as
# the pipeline (i.e., _clusterings here contains a tuple of the k-means outputs)
start_time = time.perf_counter()
_clusterings, _tsne = pipeline(data)
# Since we memoize our results in an HDF5 file, subsequent calls will not compute the values (for the same inputs), but rather load them from
# the HDF5 file; try running the script multiple times
duration = time.perf_counter() - start_time
print(f'Pipeline execution time: {duration:.4f} seconds')
if name == 'main': main() ```
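After running the script, the memoized results can be inspected directly in the HDF5 file; the following is a minimal sketch using plain `h5py` (the `proc_data` group name comes from the script above):

```python
"""Lists the memoized processor outputs stored by the example script."""

import h5py

with h5py.File('test.analysis.h5', 'r') as analysis_file:
    # Each entry under 'proc_data' is keyed by a hash of the processor and its inputs
    analysis_file['proc_data'].visit(print)
```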
Contributing
If you would like to contribute, there are multiple ways you can help out. If you find a bug or have a feature request, please feel free to open an issue on GitHub. If you want to contribute code, please fork the repository and use a feature branch. Pull requests are always welcome. Before forking, please open an issue describing what you want to do. This helps to align your ideas with ours and may prevent you from doing work that we are already planning on doing. If you have contributed to the project, please add yourself to the contributors list.
To help speed up the merging of your pull request, please comment and document your code extensively, try to emulate the coding style of the project, and update the documentation if necessary.
For more information on how to contribute, please refer to our contributor's guide.
License
CoRelAy is dual-licensed under the GNU General Public License Version 3 (GPL-3.0) or later, and the GNU Lesser General Public License Version 3 (LGPL-3.0) or later. For more information, see the GPL-3.0 and LGPL-3.0 license files.
Owner
- Name: virelay
- Login: virelay
- Kind: organization
- Repositories: 2
- Profile: https://github.com/virelay
Citation (CITATION.cff)
cff-version: 1.2.0
title: >-
Software for Dataset-wide XAI: From Local Explanations to
Global Insights with Zennit, CoRelAy, and ViRelAy
message: >-
If you use this software, please cite it using the
metadata from this file.
type: software
authors:
- given-names: Christopher J.
family-names: Anders
orcid: 'https://orcid.org/0000-0003-3295-8486'
- given-names: David
family-names: Neumann
orcid: 'https://orcid.org/0000-0003-1907-8329'
- given-names: Wojciech
family-names: Samek
orcid: 'https://orcid.org/0000-0002-6283-3265'
- given-names: Klaus-Robert
family-names: Müller
orcid: 'https://orcid.org/0000-0002-3861-7685'
- given-names: Sebastian
family-names: Lapuschkin
orcid: 'https://orcid.org/0000-0002-0762-7258'
identifiers:
- type: doi
value: 10.48550/arXiv.2106.13200
description: arXiv Preprint
- type: url
value: 'https://arxiv.org/abs/2106.13200'
description: arXiv Preprint
repository-code: 'https://github.com/virelay/corelay.git'
url: 'https://corelay.readthedocs.io/en/latest/'
abstract: >-
Deep Neural Networks (DNNs) are known to be strong
predictors, but their prediction strategies can rarely be
understood. With recent advances in Explainable Artificial
Intelligence (XAI), approaches are available to explore
the reasoning behind those complex models' predictions.
Among post-hoc attribution methods, Layer-wise Relevance
Propagation (LRP) shows high performance. For deeper
quantitative analysis, manual approaches exist, but
without the right tools they are unnecessarily labor
intensive. In this software paper, we introduce three
software packages targeted at scientists to explore model
reasoning using attribution approaches and beyond: (1)
Zennit - a highly customizable and intuitive attribution
framework implementing LRP and related approaches in
PyTorch, (2) CoRelAy - a framework to easily and quickly
construct quantitative analysis pipelines for dataset-wide
analyses of explanations, and (3) ViRelAy - a
web-application to interactively explore data,
attributions, and analysis results. With this, we provide
a standardized implementation solution for XAI, to
contribute towards more reproducibility in our field.
keywords:
- Explainable Artificial Intelligence
- XAI
- Layer-Wise Relevance Propagation
- LRP
- Spectral Relevance Analysis
- SpRAy
- Zennit
- CoRelAy
- ViRelAy
license: GPL-3.0-or-later AND LGPL-3.0-or-later
GitHub Events
Total
- Create event: 21
- Release event: 1
- Issues event: 21
- Watch event: 1
- Delete event: 19
- Issue comment event: 1
- Member event: 1
- Push event: 82
- Pull request event: 35
Last Year
- Create event: 21
- Release event: 1
- Issues event: 21
- Watch event: 1
- Delete event: 19
- Issue comment event: 1
- Member event: 1
- Push event: 82
- Pull request event: 35
Committers
Last synced: almost 3 years ago
All Time
- Total Commits: 301
- Total Committers: 7
- Average commits per committer: 43.0
- Development Distribution Score (DDS): 0.336
Top Committers
| Name | Email | Commits |
|---|---|---|
| chrstphr | c****r@p****u | 200 |
| David Neumann | d****n@h****e | 43 |
| Talmaj Marinc | t****c@h****e | 39 |
| Sebastian Lapuschkin | s****n@h****e | 11 |
| Sebastian Lapuschkin | s****n@h****m | 6 |
| Pattarawat Chormai | p****i@g****m | 1 |
| David Neumann | d****n@l****e | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 18
- Total pull requests: 30
- Average time to close issues: 10 days
- Average time to close pull requests: about 17 hours
- Total issue authors: 2
- Total pull request authors: 4
- Average comments per issue: 0.0
- Average comments per pull request: 0.03
- Merged pull requests: 30
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 17
- Pull requests: 16
- Average time to close issues: 10 days
- Average time to close pull requests: about 8 hours
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 16
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- lecode-official (18)
- sebastian-lapuschkin-sideprojects (1)
Pull Request Authors
- lecode-official (31)
- chr5tphr (18)
- sebastian-lapuschkin (2)
- p16i (1)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads: 379 last month (PyPI)
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 3
- Total maintainers: 1
pypi.org: corelay
CoRelAy is a tool to compose small-scale (single-machine) analysis pipelines to generate analysis data which can then be visualized using ViRelAy.
- Documentation: https://corelay.readthedocs.io/en/latest/
- License: GNU General Public License v3 or later (GPLv3+), GNU Lesser General Public License v3 or later (LGPLv3+)
- Latest release: 1.0.0 (published 7 months ago)