distm
A Python library to efficiently calculate two dimensional distance matrices with C++ and OpenMP.
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
✓CITATION.cff file
Found CITATION.cff file -
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
○DOI references
-
○Academic publication links
-
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (10.3%) to scientific vocabulary
Repository
A Python library to efficiently calculate two dimensional distance matrices with C++ and OpenMP.
Basic Info
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
distm 
A python package for quick distance matrix calculation for both two
and three dimensional points.
Description
distm was created to answer a question, namely, is it possible to
improve the time needed to compute distance matrices in Python, the
current platform for many of my academic projects, without adding too
much complexity to the workflow.
The hope is that I accomplished this. Build the project and see if it
might help you. (I find benchmark/benchmark.py helpful for this
purpose.)
On available hardware, for various numbers of two dimensional points, the following performance comparison is obtained:
| Algorithm | 10 point time | 100 point time | 300 point time | 1000 point time | 10000 point time | 45000 point time | |--------------|----------------|----------------|----------------|-----------------|------------------|------------------| | naive python | 0:00:00.000165 | 0:00:00.012241 | 0:00:00.103896 | 0:00:01.124534 | 0:01:58.250470 | untested | | scikit-learn | 0:00:00.000305 | 0:00:00.000364 | 0:00:00.000878 | 0:00:00.011287 | 0:00:00.814089 | 0:00:17.175534 | | C++ | 0:00:00.000031 | 0:00:00.000079 | 0:00:00.000446 | 0:00:00.004367 | 0:00:00.437151 | 0:00:09.178394 | | C++/Parallel | 0:00:00.000244 | 0:00:00.000748 | 0:00:00.000326 | 0:00:00.011626 | 0:00:00.386297 | 0:00:07.315356 |
Note: 45000 two dimensonal points is approximately the limit of my offline hardware. That produces a distance matrix with 2,025,000,000 entries, after all.
And a similar set for three dimensional points:
| Algorithm | 10 point time | 100 point time | 300 point time | 1000 point time | 10000 point time | |--------------|----------------|----------------|----------------|-----------------|------------------| | naive python | 0:00:00.000160 | 0:00:00.011874 | 0:00:00.108764 | 0:00:01.191408 | 0:01:55.601313 | | scikit-learn | 0:00:00.000178 | 0:00:00.000372 | 0:00:00.004161 | 0:00:00.014556 | 0:00:00.851405 | | C++ | 0:00:00.000031 | 0:00:00.000085 | 0:00:00.000511 | 0:00:00.005072 | 0:00:00.490086 | | C++/Parallel | 0:00:00.000248 | 0:00:00.000263 | 0:00:00.036277 | 0:00:00.011716 | 0:00:00.450051 |
Features
- Distance matrix calculation for a list of two dimensional points both sequentially and in parallel.
- Distance matrix calculation for a list of three dimensional points both sequentially and in parallel.
- Distance matrix calcualtion for a list of points that is two or three dimenstions, but you do not neccesarily know beforehand.
Using distm
Precompiled point version releases of a pip install-able package
and shared library are available under
releases.
To use this, simply download the wheel and install it in the python environment of your choice with
python
pip install distm-[version information].whl
If you wish to build the package for yourself, read on.
Prerequisites
The following assumes that you are on a GNU\Linux machine with
CMakegccpython(of course)
Install
To build the package, simply clone the repository, cd into the
directory and run
shell
make
Assuming the correct dependencies are present, this should build the
shared c++ library alongside a python package in dist/.
Once this build is complete, you may use pip install on the created
.whl.
For development-style use, one may pip install -e . from the
repository root for an editable install.
Usage
To call upon the functions in this library, simply
python
import numpy
import distm
and you may call the wrapped functions distm.distm and
distm.distm_parallel, et al. For some examples of use, you can look
under examples/. If you wish to use the optimized 2D or 3D specific
functions, they are available under the names distm.calcm,
distm.calcm3d, with the parallel versions denoted with an appended
_parallel.
Testing
You may use pytest to test some of the functionality of the
package. To do so, run pytest in the repository root.
License
This project is licensed under the MIT License.
Note
If this code helps you in your academic work let me know by sending me a message. If you wish to cite this repository, see citing info.
Owner
- Name: Jake Clark
- Login: jnclark
- Kind: user
- Website: https://jnclark.org
- Repositories: 2
- Profile: https://github.com/jnclark
Origami Hobbyist, Professional Mathematician, Computer Vision, Data Scientist, Ph.D.
Citation (CITATION.cff)
cff-version: 1.2.0 message: "If you use this software in your academic work, it would be appreciated if you cite it as seen below." authors: - family-names: "Clark" given-names: "Jacob" orcid: "https://orcid.org/0000-0002-6142-0602" title: "distm: A Python package for parallel distance matrix calculations." version: 0.1.0 date-released: 2022-09-17 url: "https://github.com/jnclark/distm"
GitHub Events
Total
Last Year
Dependencies
- actions/checkout v4 composite
- actions/setup-python v5 composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- ncipollo/release-action v1 composite