https://github.com/amedar-asterisk/u-statistics-python
A Python package for efficient computation of U-statistics via tensor contraction.
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
✓DOI references
Found 1 DOI reference(s) in README -
✓Academic publication links
Links to: arxiv.org -
○Committers with academic emails
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (17.1%) to scientific vocabulary
Keywords
Repository
A Python package for efficient computation of U-statistics via tensor contraction.
Basic Info
Statistics
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 1
Topics
Metadata Files
README.md
U-Statistics-python u-stats
U-statistics are fundamental tools in statistics, probability theory, theoretical computer science, economics, statistical physics, and machine learning. Named after Wassily Hoeffding, U-statistics provide unbiased estimators for population parameters and form the foundation for many statistical tests and methods. However, computing U-statistics can be computationally demanding, especially for high-order cases where the number of combinations grows exponentially.
This package provides a high-performance, tensor-based implementation for computing U-statistics and V-statistics with significant computational advantages:
- Leverages the underlying structure of kernel functions to significantly reduce computational complexity in many cases
- Utilizes optimized einsum engines—
numpy.einsumandtorch.einsum—to enable efficient computation on both CPU and GPU
If you find this library useful in your research, please consider citing our paper:
- Title: On computing and the complexity of computing higher-order U-statistics, exactly
- Authors: Xingyu Chen, Ruiqi Zhang And Lin Liu
- Url: https://arxiv.org/abs/2508.12627
bibtex
@article{CZL2025,
title = {On computing and the complexity of computing higher-order $U$-statistics, exactly},
author = {Xingyu Chen and Ruiqi Zhang and Lin Liu},
year = {2025},
archivePrefix= {arXiv},
eprint = {2508.12627},
primaryClass = {stat.ML},
doi = {10.48550/arXiv.2508.12627},
url = {https://arxiv.org/abs/2508.12627},
}
Table of Contents
1. Installation
Install the package from PyPI:
bash
pip install u-stats
For development installation:
bash
git clone https://github.com/Amedar-Asterisk/U-Statistics-python.git
cd U-Statistics-python
pip install -e .
2. Requirements
Required Dependencies
- Python 3.11+
- NumPy >= 1.20.0
- opt_einsum >= 3.3.0
Optional Dependencies
- torch >= 1.9.0: GPU acceleration and parallel CPU computation
3. Example Usage
3.1 Selection of Backend
The package supports two computation backends for different performance needs:
```python from ustats import setbackend, get_backend
Set backend to NumPy (default)
set_backend("numpy") # CPU computation, deterministic results
Set backend to PyTorch (optional)
set_backend("torch") # GPU acceleration and parallel CPU computation
Check current backend
currentbackend = getbackend() print(f"Current backend: {current_backend}") ```
3.2 Computing U-statistics
ustat is the main function for computing U-statistics.
Here we take a 7th-order U-statistic with kernel function
$$ h(x1,x2,\dots,x7) = h1(x1, x2) h2(x2, x3) \dots h6(x6, x7) $$
as an example. For samples $X = (X1, \dots, Xn)$, the U-statistic takes the form
$$ U = \frac{1}{n(n-1)\cdots (n-6)} \sum{(i1,\dots,i7) \in P7}h1(X{i1},X{i2})\cdots h6(X{i6},X{i7}) $$
where $P_7$ denotes all 7-tuples of distinct indices.
3.2.1 Tensor Assembly
We assume that the kernel function values on samples $X$ have been precomputed and assembled into matrices $H1, H2, \dots, H_6 \in \mathbb{R}^{n \times n}$:
$$ Hk[i,j] = hk(Xi, Xj) $$
3.2.2 Expression Formats
The expression defines how kernel matrices are connected in the computation. We take this U-statistic as an example to explain how to construct expression.
To express the structure of the kernel function $h(x1, x2, \dots, x7) = h1(x1, x2) \cdot h2(x2, x3) \cdots h{6}(x{6}, x7)$, we assign a unique index to each distinct variable $x1, x2, \dots, x7$. For each factor $hk(x{k}, x{k+1})$, we collect the indices of the variables it depends on into a pair. The sequence of pairs is then ordered according to the order of the factors in the product.
We can use the following notation to represent this structure:
Einstein Summation Notation:
python
expression = "ab,bc,cd,de,ef,fg->"
Nested List Notation:
python
expression = [[1,2],[2,3],[3,4],[4,5],[5,6],[6,7]]
Format Explanation:
- In Einstein notation: each letter represents an index, each string like "ab" represents a factor of the kernel
- In list notation: each sub-list [i,j] represents a factor of the kernel
Both formats are equivalent and specify the same computation pattern in our package
3.2.3 Complete Example
```python from ustats import ustat, setbackend import numpy as np
Choose computation backend
set_backend("torch") # Use torch if available
The default is numpy
Set number of samples
n = 100
Create precomputed kernel matrices
In practice, these would be computed from your actual data
H1 = np.random.rand(n, n) H2 = np.random.rand(n, n) H3 = np.random.rand(n, n) H4 = np.random.rand(n, n) H5 = np.random.rand(n, n) H6 = np.random.rand(n, n)
tensors = [H1, H2, H3, H4, H5, H6] expression = "ab,bc,cd,de,ef,fg->"
Compute the U-statistic
result = ustat(tensors=tensors, expression=expression, average=True) print(f"7th-order U-statistic: {result}")
You can also compute the unaveraged sum
sumresult = ustat(tensors=tensors, expression=expression, average=False) print(f"Sum (before averaging): {sumresult}") ```
4. API Reference
4.1 Main Functions
ustat(tensors, expression, average=True, optimize="greedy", **kwargs)
Compute U-statistics from input tensors.
Parameters:
- tensors (List[np.ndarray]): List of input tensors (numpy arrays or torch tensors)
- expression (str | List | Tuple): Tensor contraction expression
- String format: Einstein summation notation (e.g., "ij,jk->ik")
- List format: Nested indices (e.g., [[1,2],[2,3]])
- average (bool, default=True): Whether to compute average (True) or sum (False)
- optimize (str, default="greedy"): Optimization strategy for tensor contraction
- "greedy": Fast heuristic optimization
- "optimal": Exhaustive search for optimal contraction order
- "dp": Dynamic programming approach
- **kwargs: Additional keyword arguments passed to opt_einsum.contract
Returns:
- float: Computed U-statistic value
Example:
python
result = ustat([H1, H2], "ij,jk->", average=True, optimize="optimal")
vstat(tensors, expression, average=True, optimize="greedy", **kwargs)
Compute V-statistics from input tensors.
Parameters: Same as ustat
Returns: Computed V-statistic value
u_stats_loop(tensors, expression)
Reference implementation using explicit loops (for validation and small computations).
Note: This function is primarily for testing and educational purposes. Use ustats for production code.
set_backend(backend_name)
Set the tensor computation backend.
Parameters:
- backend_name (str): Backend identifier
- "numpy": Use NumPy backend
- "torch": Use PyTorch backend
Example:
python
set_backend("torch") # Switch to PyTorch backend
get_backend()
Get the current tensor computation backend.
Returns:
- str: Current backend name ("numpy" or "torch")
4.2 Classes
UStats(expression)
Class-based interface for U-statistics computation with advanced features.
Parameters:
- expression: Tensor contraction expression (same format as function interface)
Methods:
- compute(tensors, average=True, **kwargs): Compute U-statistic
- complexity_analysis(): Analyze computational complexity
- get_contraction_path(): Get optimized contraction path
Example:
python
ustats_obj = UStats("ij,jk->")
result = ustats_obj.compute([H1, H2], average=True)
complexity = ustats_obj.complexity_analysis()
VStats(expression)
Class-based interface for V-statistics computation with advanced features.
Parameters and Methods: Same as UStats
5. Changelog
See CHANGELOG.md for detailed version history and changes.
6. License
This project is licensed under the MIT License – see the LICENSE file for details.
7. Contributing
We welcome contributions! Here's how you can help:
Reporting Issues
- Use the GitHub issue tracker
- Include minimal reproducible examples
- Specify your environment (Python version, OS, backend)
Development Setup
```bash
Clone the repository
git clone https://github.com/Amedar-Asterisk/U-Statistics-python.git cd U-Statistics-python
Install in development mode
pip install -e ".[test]" ```
Pull Requests
- Fork the repository and create a feature branch
- Add tests for new functionality
- Ensure all tests pass and type checking succeeds
- Update documentation as needed
- Follow the existing code style
For questions or discussions, feel free to open an issue or reach out to the maintainers.
Owner
- Name: zrq
- Login: Amedar-Asterisk
- Kind: user
- Location: shanghai
- Company: East China Normal University
- Repositories: 1
- Profile: https://github.com/Amedar-Asterisk
Master student at ECNU in applied mathematics
GitHub Events
Total
- Release event: 2
- Watch event: 2
- Delete event: 1
- Public event: 1
- Push event: 12
- Pull request event: 1
- Create event: 1
Last Year
- Release event: 2
- Watch event: 2
- Delete event: 1
- Public event: 1
- Push event: 12
- Pull request event: 1
- Create event: 1
Issues and Pull Requests
Last synced: 8 months ago
Dependencies
- actions/checkout v4 composite
- actions/setup-python v5 composite
- actions/checkout v4 composite
- actions/download-artifact v4 composite
- actions/setup-python v5 composite
- actions/upload-artifact v4 composite
- pypa/gh-action-pypi-publish release/v1 composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- numpy >=1.20.0
- opt_einsum >=3.3.0
- black >=22.0.0
- docformatter >=1.4.0
- flake8 >=4.0.0
- isort >=5.10.0
- mypy >=0.910
- numpy >=1.20.0
- opt_einsum >=3.3.0
- setuptools >=60.0.0
- torch >=1.9.0
- tox >=3.24.0
- twine >=4.0.0
- wheel >=0.37.0