https://github.com/amedar-asterisk/u-statistics-python

A Python package for efficient computation of U-statistics via tensor contraction.


Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (17.1%) to scientific vocabulary

Keywords

einsum statistics u-statistics
Last synced: 7 months ago

Repository

A Python package for efficient computation of U-statistics via tensor contraction.

Basic Info
  • Host: GitHub
  • Owner: Amedar-Asterisk
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 314 KB
Statistics
  • Stars: 3
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Topics
einsum statistics u-statistics
Created over 1 year ago · Last pushed 8 months ago
Metadata Files
Readme Changelog License

README.md

U-Statistics-python (u-stats)


U-statistics are fundamental tools in statistics, probability theory, theoretical computer science, economics, statistical physics, and machine learning. Named after Wassily Hoeffding, U-statistics provide unbiased estimators for population parameters and form the foundation for many statistical tests and methods. However, computing U-statistics can be computationally demanding, especially for high-order cases, where the number of index tuples grows combinatorially.

This package provides a high-performance, tensor-based implementation for computing U-statistics and V-statistics with significant computational advantages:

  • Leverages the underlying structure of kernel functions to significantly reduce computational complexity in many cases
  • Utilizes optimized einsum engines—numpy.einsum and torch.einsum—to enable efficient computation on both CPU and GPU
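To see why tensor contraction helps, here is a small sketch (an illustration of the idea, not the package's implementation) comparing a brute-force sum over all index triples with a single einsum contraction for a third-order V-statistic built from two kernel matrices. Both produce the same number, but the contraction never enumerates index tuples explicitly:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 8
H1, H2 = rng.random((n, n)), rng.random((n, n))

# Brute force: sum h1(x_i, x_j) * h2(x_j, x_k) over ALL index triples
# (a V-statistic allows repeated indices) -- O(n^3) terms.
brute = sum(H1[i, j] * H2[j, k]
            for i, j, k in itertools.product(range(n), repeat=3))

# The same quantity as one tensor contraction; einsum evaluates it
# as a chain of matrix-vector products, far cheaper than enumeration.
contracted = np.einsum("ij,jk->", H1, H2)

assert np.isclose(brute, contracted)
```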

If you find this library useful in your research, please consider citing our paper:

  • Title: On computing and the complexity of computing higher-order U-statistics, exactly
  • Authors: Xingyu Chen, Ruiqi Zhang, and Lin Liu
  • URL: https://arxiv.org/abs/2508.12627

```bibtex
@article{CZL2025,
  title         = {On computing and the complexity of computing higher-order $U$-statistics, exactly},
  author        = {Xingyu Chen and Ruiqi Zhang and Lin Liu},
  year          = {2025},
  archivePrefix = {arXiv},
  eprint        = {2508.12627},
  primaryClass  = {stat.ML},
  doi           = {10.48550/arXiv.2508.12627},
  url           = {https://arxiv.org/abs/2508.12627},
}
```

Table of Contents

  1. Installation
  2. Requirements
  3. Example Usage
  4. API Reference
  5. Changelog
  6. License
  7. Contributing

1. Installation

Install the package from PyPI:

```bash
pip install u-stats
```

For development installation:

```bash
git clone https://github.com/Amedar-Asterisk/U-Statistics-python.git
cd U-Statistics-python
pip install -e .
```

2. Requirements

Required Dependencies

  • Python 3.11+
  • NumPy >= 1.20.0
  • opt_einsum >= 3.3.0

Optional Dependencies

  • torch >= 1.9.0: GPU acceleration and parallel CPU computation

3. Example Usage

3.1 Selection of Backend

The package supports two computation backends for different performance needs:

```python
from u_stats import set_backend, get_backend

# Set backend to NumPy (default)
set_backend("numpy")  # CPU computation, deterministic results

# Set backend to PyTorch (optional)
set_backend("torch")  # GPU acceleration and parallel CPU computation

# Check current backend
current_backend = get_backend()
print(f"Current backend: {current_backend}")
```

3.2 Computing U-statistics

`ustat` is the main function for computing U-statistics.

Here we take a 7th-order U-statistic with kernel function

$$ h(x_1, x_2, \dots, x_7) = h_1(x_1, x_2)\, h_2(x_2, x_3) \cdots h_6(x_6, x_7) $$

as an example. For samples $X = (X_1, \dots, X_n)$, the U-statistic takes the form

$$ U = \frac{1}{n(n-1)\cdots (n-6)} \sum_{(i_1, \dots, i_7) \in P_7} h_1(X_{i_1}, X_{i_2}) \cdots h_6(X_{i_6}, X_{i_7}) $$

where $P_7$ denotes all 7-tuples of distinct indices.
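The formula above can be written down directly for a small, lower-order case. The sketch below illustrates the definition itself (it is not the package API) by summing a third-order analogue over ordered triples of distinct indices:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n = 10
H1, H2 = rng.random((n, n)), rng.random((n, n))

# Sum h1(X_{i1}, X_{i2}) * h2(X_{i2}, X_{i3}) over all ordered triples
# of DISTINCT indices, then divide by n(n-1)(n-2) -- the number of such
# triples -- to obtain the U-statistic.
total = sum(H1[i1, i2] * H2[i2, i3]
            for i1, i2, i3 in itertools.permutations(range(n), 3))
u = total / (n * (n - 1) * (n - 2))
print(u)
```

With kernel values in [0, 1), the averaged result also lies in [0, 1). This explicit enumeration is exactly what becomes infeasible at higher orders, motivating the contraction-based approach.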

3.2.1 Tensor Assembly

We assume that the kernel function values on the samples $X$ have been precomputed and assembled into matrices $H_1, H_2, \dots, H_6 \in \mathbb{R}^{n \times n}$:

$$ H_k[i,j] = h_k(X_i, X_j) $$
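For instance, with a Gaussian-type kernel $h(x, y) = \exp(-(x - y)^2)$ (a hypothetical choice for illustration; any kernel works), such a matrix can be assembled with NumPy broadcasting:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
X = rng.standard_normal(n)  # samples X_1, ..., X_n

# H[i, j] = h(X_i, X_j) for the assumed kernel h(x, y) = exp(-(x - y)^2)
diff = X[:, None] - X[None, :]   # pairwise differences via broadcasting
H = np.exp(-diff ** 2)

assert H.shape == (n, n)
assert np.allclose(H, H.T)  # this particular kernel happens to be symmetric
```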

3.2.2 Expression Formats

The expression defines how kernel matrices are connected in the computation. We use this U-statistic as an example to explain how to construct an expression.

To express the structure of the kernel function $h(x_1, x_2, \dots, x_7) = h_1(x_1, x_2) \cdot h_2(x_2, x_3) \cdots h_6(x_6, x_7)$, we assign a unique index to each distinct variable $x_1, x_2, \dots, x_7$. For each factor $h_k(x_k, x_{k+1})$, we collect the indices of the variables it depends on into a pair. The pairs are then ordered according to the order of the factors in the product.

We can use the following notation to represent this structure:

Einstein Summation Notation:

```python
expression = "ab,bc,cd,de,ef,fg->"
```

Nested List Notation:

```python
expression = [[1,2],[2,3],[3,4],[4,5],[5,6],[6,7]]
```

Format Explanation:
- In Einstein notation: each letter represents an index, and each string like "ab" represents a factor of the kernel
- In list notation: each sub-list [i,j] represents a factor of the kernel

Both formats are equivalent and specify the same computation pattern in our package.
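The two notations carry the same information. A small helper (hypothetical, not part of the package) that maps the nested-list form to an einsum string makes the correspondence explicit:

```python
import string

def lists_to_einsum(expression):
    """Map nested-list indices to an einsum string,
    e.g. [[1, 2], [2, 3]] -> "ab,bc->"."""
    letters = {}  # distinct index -> letter, in order of first appearance
    terms = []
    for factor in expression:
        term = ""
        for idx in factor:
            if idx not in letters:
                letters[idx] = string.ascii_lowercase[len(letters)]
            term += letters[idx]
        terms.append(term)
    return ",".join(terms) + "->"

print(lists_to_einsum([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7]]))
# -> "ab,bc,cd,de,ef,fg->"
```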

3.2.3 Complete Example

```python
import numpy as np
from u_stats import ustat, set_backend

# Choose computation backend (the default is numpy)
set_backend("torch")  # Use torch if available

# Set number of samples
n = 100

# Create precomputed kernel matrices
# In practice, these would be computed from your actual data
H1 = np.random.rand(n, n)
H2 = np.random.rand(n, n)
H3 = np.random.rand(n, n)
H4 = np.random.rand(n, n)
H5 = np.random.rand(n, n)
H6 = np.random.rand(n, n)

tensors = [H1, H2, H3, H4, H5, H6]
expression = "ab,bc,cd,de,ef,fg->"

# Compute the U-statistic
result = ustat(tensors=tensors, expression=expression, average=True)
print(f"7th-order U-statistic: {result}")

# You can also compute the unaveraged sum
sum_result = ustat(tensors=tensors, expression=expression, average=False)
print(f"Sum (before averaging): {sum_result}")
```

4. API Reference

4.1 Main Functions

ustat(tensors, expression, average=True, optimize="greedy", **kwargs)

Compute U-statistics from input tensors.

Parameters:
- tensors (List[np.ndarray]): List of input tensors (NumPy arrays or torch tensors)
- expression (str | List | Tuple): Tensor contraction expression
  - String format: Einstein summation notation (e.g., "ij,jk->ik")
  - List format: Nested indices (e.g., [[1,2],[2,3]])
- average (bool, default=True): Whether to compute the average (True) or the sum (False)
- optimize (str, default="greedy"): Optimization strategy for tensor contraction
  - "greedy": Fast heuristic optimization
  - "optimal": Exhaustive search for the optimal contraction order
  - "dp": Dynamic programming approach
- **kwargs: Additional keyword arguments passed to opt_einsum.contract

Returns:
- float: Computed U-statistic value

Example:

```python
result = ustat([H1, H2], "ij,jk->", average=True, optimize="optimal")
```

vstat(tensors, expression, average=True, optimize="greedy", **kwargs)

Compute V-statistics from input tensors.

Parameters: Same as `ustat`

Returns: Computed V-statistic value

u_stats_loop(tensors, expression)

Reference implementation using explicit loops (for validation and small computations).

Note: This function is primarily for testing and educational purposes. Use `ustat` for production code.

set_backend(backend_name)

Set the tensor computation backend.

Parameters:
- backend_name (str): Backend identifier
  - "numpy": Use the NumPy backend
  - "torch": Use the PyTorch backend

Example:

```python
set_backend("torch")  # Switch to the PyTorch backend
```

get_backend()

Get the current tensor computation backend.

Returns:
- str: Current backend name ("numpy" or "torch")

4.2 Classes

UStats(expression)

Class-based interface for U-statistics computation with advanced features.

Parameters:
- expression: Tensor contraction expression (same format as the function interface)

Methods:
- compute(tensors, average=True, **kwargs): Compute the U-statistic
- complexity_analysis(): Analyze computational complexity
- get_contraction_path(): Get the optimized contraction path

Example:

```python
ustats_obj = UStats("ij,jk->")
result = ustats_obj.compute([H1, H2], average=True)
complexity = ustats_obj.complexity_analysis()
```

VStats(expression)

Class-based interface for V-statistics computation with advanced features.

Parameters and Methods: Same as UStats

5. Changelog

See CHANGELOG.md for detailed version history and changes.

6. License

This project is licensed under the MIT License – see the LICENSE file for details.

7. Contributing

We welcome contributions! Here's how you can help:

Reporting Issues

  • Use the GitHub issue tracker
  • Include minimal reproducible examples
  • Specify your environment (Python version, OS, backend)

Development Setup

```bash
# Clone the repository
git clone https://github.com/Amedar-Asterisk/U-Statistics-python.git
cd U-Statistics-python

# Install in development mode
pip install -e ".[test]"
```

Pull Requests

  • Fork the repository and create a feature branch
  • Add tests for new functionality
  • Ensure all tests pass and type checking succeeds
  • Update documentation as needed
  • Follow the existing code style

For questions or discussions, feel free to open an issue or reach out to the maintainers.

Owner

  • Name: zrq
  • Login: Amedar-Asterisk
  • Kind: user
  • Location: Shanghai
  • Company: East China Normal University

Master's student at ECNU in applied mathematics

GitHub Events

Total
  • Release event: 2
  • Watch event: 2
  • Delete event: 1
  • Public event: 1
  • Push event: 12
  • Pull request event: 1
  • Create event: 1
Last Year
  • Release event: 2
  • Watch event: 2
  • Delete event: 1
  • Public event: 1
  • Push event: 12
  • Pull request event: 1
  • Create event: 1

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 140
  • Total Committers: 2
  • Avg Commits per committer: 70.0
  • Development Distribution Score (DDS): 0.043
Past Year
  • Commits: 140
  • Committers: 2
  • Avg Commits per committer: 70.0
  • Development Distribution Score (DDS): 0.043
Top Committers
Name Email Commits
ZRQ D****7@o****m 134
cxy0714 1****6@q****m 6
Committer Domains (Top 20 + Academic)
qq.com: 1

Issues and Pull Requests

Last synced: 8 months ago


Dependencies

.github/workflows/forbidden_content_check.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
.github/workflows/python-publish.yml actions
  • actions/checkout v4 composite
  • actions/download-artifact v4 composite
  • actions/setup-python v5 composite
  • actions/upload-artifact v4 composite
  • pypa/gh-action-pypi-publish release/v1 composite
.github/workflows/style_check.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
pyproject.toml pypi
  • numpy >=1.20.0
  • opt_einsum >=3.3.0
requirements.txt pypi
  • black >=22.0.0
  • docformatter >=1.4.0
  • flake8 >=4.0.0
  • isort >=5.10.0
  • mypy >=0.910
  • numpy >=1.20.0
  • opt_einsum >=3.3.0
  • setuptools >=60.0.0
  • torch >=1.9.0
  • tox >=3.24.0
  • twine >=4.0.0
  • wheel >=0.37.0