https://github.com/jcmgray/hptt
High-Performance Tensor Transpose library
Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 2 DOI reference(s) in README
- ✓ Academic publication links: links to arxiv.org, acm.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.6%) to scientific vocabulary
Last synced: 6 months ago
Repository
High-Performance Tensor Transpose library
Basic Info
- Host: GitHub
- Owner: jcmgray
- License: bsd-3-clause
- Language: C++
- Default Branch: master
- Size: 812 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Fork of springer13/hptt
Created over 7 years ago
· Last pushed over 7 years ago
https://github.com/jcmgray/hptt/blob/master/
# High-Performance Tensor Transpose library #

HPTT is a high-performance C++ library for out-of-place tensor transpositions of the general form:

$$B_{\pi(i_0, i_1, \ldots, i_{d-1})} \gets \alpha \cdot A_{i_0, i_1, \ldots, i_{d-1}} + \beta \cdot B_{\pi(i_0, i_1, \ldots, i_{d-1})}$$

where A and B respectively denote the input and output tensor; $\pi$ represents the user-specified transposition, and $\alpha$ and $\beta$ are scalars (i.e., setting $\beta \neq 0$ enables the user to update the output tensor B).

# Key Features

* Multi-threading support
* Explicit vectorization
* Auto-tuning (akin to FFTW)
    * Loop order
    * Parallelization
* Multi-architecture support
    * Explicitly vectorized kernels for AVX and ARM
* Supports float, double, complex and double complex data types
* Supports both column-major and row-major data layouts

HPTT now also offers C- and Python-interfaces (see below).

# Requirements

You must have a working C++ compiler with C++11 support. I have tested HPTT with:

* Intel's ICPC 15.0.3, 16.0.3, 17.0.2
* GNU g++ 5.4, 6.2, 6.3
* clang++ 3.8, 3.9

# Install

Clone the repository into a desired directory and change to that location:

```
git clone https://github.com/springer13/hptt.git
cd hptt
export CXX=<desired compiler>
```
Now you have several options to build the desired version of the library:

```
make avx
make arm
make scalar
```

This should create 'libhptt.so' inside the ./lib folder.

# Getting Started

Please have a look at the provided benchmark.cpp.

In general HPTT is used as follows:

```
#include <hptt.h>

// allocate tensors
float *A = ...
float *B = ...

// specify permutation and size
const int dim = 6;
int perm[dim] = {5,2,0,4,1,3};
// one extent per dimension; the sixth value is an illustrative assumption
int size[dim] = {48,28,48,28,28,28};

// alpha, beta and numThreads are assumed to be defined by the caller

// create a plan (shared_ptr)
auto plan = hptt::create_plan( perm, dim,
                               alpha, A, size, NULL,
                               beta,  B, NULL,
                               hptt::ESTIMATE, numThreads);

// execute the transposition
plan->execute();
```

The example above does not use any auto-tuning, but solely relies on HPTT's performance model. To activate auto-tuning, please use hptt::MEASURE or hptt::PATIENT instead of hptt::ESTIMATE.

## C-Interface

HPTT also offers a C-interface. This interface is less expressive than its C++ counterpart since it does not expose control over the plan.

```
void sTensorTranspose( const int *perm, const int dim,
                       const float alpha, const float *A, const int *sizeA, const int *outerSizeA,
                       const float beta,        float *B,                   const int *outerSizeB,
                       const int numThreads, const int useRowMajor);

void dTensorTranspose( const int *perm, const int dim,
                       const double alpha, const double *A, const int *sizeA, const int *outerSizeA,
                       const double beta,        double *B,                   const int *outerSizeB,
                       const int numThreads, const int useRowMajor);
...
```

## Python-Interface

HPTT now also offers a Python interface. The functionality offered by HPTT is comparable to [numpy.transpose](https://docs.scipy.org/doc/numpy/reference/generated/numpy.transpose.html), with the difference being that HPTT can also update the output tensor.

```
tensorTransposeAndUpdate( perm, alpha, A, beta, B, numThreads=-1)

tensorTranspose( perm, alpha, A, numThreads=-1)
```

See docstring for additional information.
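The roles of `perm`, `alpha` and `beta` in the interfaces above can be illustrated with a naive reference loop. The following C++ sketch is not HPTT code: `reference_transpose` is a hypothetical helper written for this illustration. It assumes densely packed column-major tensors and the `numpy.transpose` convention that dimension `j` of B corresponds to dimension `perm[j]` of A. It has none of HPTT's blocking, vectorization or threading.

```cpp
#include <cstddef>
#include <vector>

// Naive reference for B_{pi(i)} <- alpha * A_i + beta * B_{pi(i)}.
// Assumptions (not taken from the HPTT sources): column-major layout,
// densely packed tensors, and dimension j of B maps to dimension perm[j] of A.
void reference_transpose(const std::vector<int>& perm, float alpha,
                         const std::vector<float>& A,
                         const std::vector<int>& sizeA, float beta,
                         std::vector<float>& B) {
    const std::size_t dim = perm.size();

    // Column-major strides of A: the first index varies fastest.
    std::vector<std::size_t> strideA(dim, 1);
    for (std::size_t d = 1; d < dim; ++d)
        strideA[d] = strideA[d - 1] * static_cast<std::size_t>(sizeA[d - 1]);

    // Extent of each B dimension and the stride in A of the matching A dimension.
    std::vector<std::size_t> sizeB(dim), strideAinB(dim);
    for (std::size_t j = 0; j < dim; ++j) {
        sizeB[j] = static_cast<std::size_t>(sizeA[perm[j]]);
        strideAinB[j] = strideA[perm[j]];
    }

    // Walk B in memory order while tracking the matching offset into A.
    std::vector<std::size_t> idx(dim, 0);
    std::size_t offA = 0;
    for (std::size_t offB = 0; offB < B.size(); ++offB) {
        B[offB] = alpha * A[offA] + beta * B[offB];
        // Advance the multi-index of B like an odometer.
        for (std::size_t j = 0; j < dim; ++j) {
            ++idx[j];
            offA += strideAinB[j];
            if (idx[j] < sizeB[j]) break;
            offA -= strideAinB[j] * sizeB[j];  // wrap this dimension back to 0
            idx[j] = 0;
        }
    }
}
```

For a 2x3 column-major tensor `A = {1, 2, 3, 4, 5, 6}` with `perm = {1, 0}`, `alpha = 2`, `beta = 1` and every entry of `B` initially 10, this sketch leaves `B = {12, 16, 20, 14, 18, 22}`: each entry is twice the transposed value of A plus the previous content of B.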
Based on `tensorTransposeAndUpdate` and `tensorTranspose`, there are also the following drop-in replacements for ``numpy`` functions:

```
hptt.transpose(A, axes)
hptt.ascontiguousarray(A)
hptt.asfortranarray(A)
```

Installation should be straightforward via:

```
cd ./pythonAPI
python setup.py install
```

or

```
pip install -U .
```

if you want a ``pip`` managed install. At this point you should be able to import the 'hptt' package within your Python scripts.

The Python interface also offers support for:

* Single and double precision
* Column-major and row-major data layouts
* Multi-threading (HPTT by default utilizes all cores of a system)

### Python Benchmark

You can find an elaborate example under ./pythonAPI/benchmark/benchmark.py --help

* Multi-threaded 2x Intel Haswell-EP E5-2680 v3 (24 threads)
* Comparison against [numpy.transpose](https://docs.scipy.org/doc/numpy/reference/generated/numpy.transpose.html)

# Documentation

You can generate the doxygen documentation via

```
make doc
```

# Benchmark

The benchmark is the same as the original TTC benchmark: [benchmark for tensor transpositions](https://github.com/HPAC/TTC/blob/master/benchmark).

You can compile the benchmark via:

```
cd benchmark
make
```

Before running the benchmark, please modify the number of threads and the thread affinity within the benchmark.sh file. To run the benchmark just use:

```
./benchmark.sh
```

This will create a hptt_benchmark.dat file containing all the runtime information of HPTT and the reference implementation.

# Performance Results

See [(pdf)](https://arxiv.org/abs/1704.04374) for details.
# TODOs

* Add explicit vectorization for IBM power
* Add explicit vectorization for complex types

# Related Projects

* Shared-Memory Tensor Contractions:
    * [TCL](https://github.com/springer13/tcl)
    * [TBLIS](https://github.com/devinamatthews/tblis)
* Distributed-Memory Tensor Contractions:
    * [CTF](https://github.com/cyclops-community/ctf)
    * [libtensor](https://github.com/epifanovsky/libtensor)
* Tensor network codes:
    * [ITensor](http://itensor.org/)
    * [Uni10](http://yingjerkao.github.io/uni10/)

# Citation

In case you want to refer to HPTT as part of a research paper, please cite the following article [(pdf)](https://arxiv.org/abs/1704.04374):

```
@inproceedings{hptt2017,
  author = {Springer, Paul and Su, Tong and Bientinesi, Paolo},
  title = {{HPTT}: {A} {H}igh-{P}erformance {T}ensor {T}ransposition {C}++ {L}ibrary},
  booktitle = {Proceedings of the 4th ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming},
  series = {ARRAY 2017},
  year = {2017},
  isbn = {978-1-4503-5069-3},
  location = {Barcelona, Spain},
  pages = {56--62},
  numpages = {7},
  url = {http://doi.acm.org/10.1145/3091966.3091968},
  doi = {10.1145/3091966.3091968},
  acmid = {3091968},
  publisher = {ACM},
  address = {New York, NY, USA},
  keywords = {High-Performance Computing, autotuning, multidimensional transposition, tensor transposition, tensors, vectorization},
}
```
Owner
- Name: Johnnie Gray
- Login: jcmgray
- Kind: user
- Company: Caltech
- Repositories: 39
- Profile: https://github.com/jcmgray