https://github.com/datasig-ac-uk/pyrecombine

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

○
CITATION.cff file
✓
codemeta.json file
Found codemeta.json file
○
.zenodo.json file
○
DOI references
○
Academic publication links
○
Academic email domains
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (7.1%) to scientific vocabulary

Last synced: 9 months ago · JSON representation

Repository

Basic Info

Host: GitHub
Owner: datasig-ac-uk
License: bsd-3-clause
Language: C++
Default Branch: main
Size: 154 KB

Statistics

Stars: 0
Watchers: 2
Forks: 0
Open Issues: 1
Releases: 0

Created over 1 year ago · Last pushed about 1 year ago

Metadata Files

Readme License

PyRecombine

Performs a dynamic Caratheodory process and takes a weighted collection of vectors and identifies by pointers, a subset of minimal cardinality among the vectors and new weights so both empirical measures have the same mean. Software written by Terry Lyons, based on algorithms developed jointly with Christian Litter and then with Maria Tchernychova 2008-2020. Here minimal is a local notion and means it cannot be further reduced. There may be other noncomparable cubature sets with fewer points.

The library has a robust and stable C interface and even old dlls from 2008 with the old algorithm still run. Measures are represented by an array of null pointers (pseudo-points) and an array of equal length of positive doubles adding to 1.; the calling program provides a feature map that converts each null pointer to a vector, and so recombine never knows or needs to know the real types of the points. Recombine returns the indexes of the points that should remain and new positive weights so that the mean of these remaining points with the new weights is the same as the old mean. The new list of survivors is never more than D+1 long where D is the dimension of the vector space. If there are N points in D dimensions the sequential complexity of this algorithm is less than ND + log_2(N/D) D^3. This reflects the order of magnitude improvement in the algorithm developed with Maria Tchernychova; the algorithm with Litterer had complexity ND + log_2(N/D) D^4 although it is quicker for small problems. The interface remains the same. The ND comes from touching the points, and log_2(N/D) D^3 from performing log_2(N/D) complex SVD type calculations on Dx2D matrices. This is a linear programming problem under the surface, but the algorithm here has fixed complexity. In many of the problems we are interested in N is approximately D^2 so the cost is (to logarithms) equated with the cost of touching the points.

The algorithm uses MKL (LAPACK) to do most of the linear algebra, although there is a part that is bespoke. The operations benefit from some parallelism via OMP. Say (export OMPNUMTHREADS=8). The log factor is truly sequential.

Owner

Name: DataSig
Login: datasig-ac-uk
Kind: organization

Website: https://datasig.web.ox.ac.uk/
Repositories: 3
Profile: https://github.com/datasig-ac-uk

A rough path between mathematics and data science

GitHub Events

Total

Issues event: 1
Public event: 1
Push event: 96

Last Year

Issues event: 1
Public event: 1
Push event: 96

Dependencies

.github/workflows/build_wheels.yml actions

actions/checkout v4 composite
actions/setup-python v5 composite
actions/upload-artifact v4 composite
pypa/cibuildwheel v2.21.3 composite

pyproject.toml pypi

intel_openmp (platform_machine == 'AMD64' or platform_machine == 'x86_64') and sys_platform != 'darwin'
numpy >= 2.0

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Open Source Science