Recent Releases of https://github.com/jianqoq/hpt
https://github.com/jianqoq/hpt - 0.1.2
New Methods
- `from_raw`: allows the user to pass a raw pointer to create a new Tensor.
- `forget`: checks the reference count and forgets the memory; you can use it to construct another library's Tensor.
- `forget_copy`: clones the data and returns the cloned memory; this method doesn't need to check the reference count.
- cpu `matmul_post`: allows the user to do a post calculation after matrix multiplication.
- cuda `conv2d`: convolution, uses `cudnn` as the backend.
- cuda `dw_conv2d`: depth-wise convolution, uses `cudnn` as the backend.
- cuda `conv2d_group`: group convolution, uses `cudnn` as the backend.
- cuda `batchnorm_conv2d`: convolution with batch normalization, uses `cudnn` as the backend.
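To illustrate what a post calculation after matrix multiplication can look like, here is a minimal, self-contained sketch. It does not use hpt: the function name `matmul_with_post`, its signature, and the row-major slice layout are assumptions made for this example, and the real `matmul_post` API may differ.

```rust
/// Naive row-major matrix multiplication with a per-element post-operation.
/// `matmul_with_post` is a hypothetical name for illustration, not hpt's API.
fn matmul_with_post<F>(a: &[f32], b: &[f32], m: usize, k: usize, n: usize, post: F) -> Vec<f32>
where
    F: Fn(f32) -> f32,
{
    assert_eq!(a.len(), m * k, "a must hold m * k elements");
    assert_eq!(b.len(), k * n, "b must hold k * n elements");
    let mut out = vec![0.0f32; m * n];
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0f32;
            for p in 0..k {
                acc += a[i * k + p] * b[p * n + j];
            }
            // Apply the post calculation right after accumulation,
            // so no second pass over the output buffer is needed.
            out[i * n + j] = post(acc);
        }
    }
    out
}

fn main() {
    // A is 2x2, B is the 2x2 identity; the post-op is ReLU.
    let a: [f32; 4] = [1.0, -2.0, 3.0, 4.0];
    let b: [f32; 4] = [1.0, 0.0, 0.0, 1.0];
    let c = matmul_with_post(&a, &b, 2, 2, 2, |x| x.max(0.0));
    println!("{:?}", c);
}
```

Fusing the post-op into the kernel avoids writing the raw product to memory and reading it back, which is the usual motivation for this kind of method.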
Bug fixes
- batch matmul for CPU `matmul`
- wrong `max_nr` and `max_mr` for bf16/f16 mixed_precision matmul kernel
- wrong conversion from `CPU` to `CUDA` Tensor when `CPU` Tensor is not contiguous
- wrong usage of cublas in `matmul` for `CUDA`
Internal Change
- added layout validation for `scatter` in `CPU`
- use fp16 instructions to convert f32 to f16 for Neon. This speeds up all calculations related to f16 for Neon.
- let f16 convert to i16/u16 by using `fp16`
- refactored simd files to make them more maintainable and extendable
- re-exports cudarc
- Rust
Published by Jianqoq about 1 year ago
https://github.com/jianqoq/hpt -
- Fixed wrong index calculation for matmul when input is transposed
- Rust
Published by Jianqoq about 1 year ago
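The transposed-input fix above concerns index calculation. As a rough illustration of why transposition matters, the sketch below multiplies two matrices through strided views, so a transposed operand is read by swapping its strides rather than by copying data. This is not hpt's kernel; `at`, `matmul_strided`, and the element-based stride layout are assumptions made for the example.

```rust
/// Element (row, col) of a 2-D view described by strides measured in elements.
fn at(data: &[f32], strides: [usize; 2], row: usize, col: usize) -> f32 {
    data[row * strides[0] + col * strides[1]]
}

/// Naive matmul over strided views: `a` is viewed as m x k, `b` as k x n.
/// Hypothetical helper for illustration only.
fn matmul_strided(
    a: &[f32],
    a_strides: [usize; 2],
    b: &[f32],
    b_strides: [usize; 2],
    m: usize,
    k: usize,
    n: usize,
) -> Vec<f32> {
    let mut out = vec![0.0f32; m * n];
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0f32;
            for p in 0..k {
                acc += at(a, a_strides, i, p) * at(b, b_strides, p, j);
            }
            out[i * n + j] = acc;
        }
    }
    out
}

fn main() {
    // `a` is stored as a 3x2 row-major matrix with strides [2, 1].
    // Its transpose is a 2x3 view over the same buffer with strides [1, 2].
    let a: [f32; 6] = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    // `b` is a contiguous 3x2 row-major matrix with strides [2, 1].
    let b: [f32; 6] = [1.0, 0.0, 0.0, 1.0, 1.0, 1.0];
    let c = matmul_strided(&a, [1, 2], &b, [2, 1], 2, 3, 2);
    println!("{:?}", c);
}
```

Indexing the transposed view with the original row-major strides would silently read the wrong elements, which is the general class of bug a transposed-input fix addresses.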
https://github.com/jianqoq/hpt - v0.1.0
- fixed some docs issues
- implemented Matmul for CPU, allows all primitive data types
- exposed `FFT` methods
- use `fp16` instructions for `f16` for Neon
- fixed wrong `fma` calculation for f32, f64 in Neon
- added Matmul, FFT benchmarks
- update `LRU_cache_size` after resize lru cache
- Rust
Published by Jianqoq about 1 year ago
https://github.com/jianqoq/hpt -
- refactored files
- fixed wrong calculation for reduction in 1-dimension case
- fixed save/load issue for Cuda.
- simplified save/load API
- added tests for save/load for cpu and cuda.
- changed the API of some methods, like `selu`
- fixed lots of docs issues in github page
- made rust docs consistent for tensor operators
- Rust
Published by Jianqoq about 1 year ago
https://github.com/jianqoq/hpt -
- reexport `half::f16` and `half::bf16`
- added docs for `conv`
- simplified some trait bounds
- added custom allocator support for some methods that I left out in the last release.
- changed `Debug` behavior, now debug will show tensor metadata info instead of printing the data
- Rust
Published by Jianqoq about 1 year ago
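The `Debug` change in this release can be pictured with a small sketch: a hand-written `Debug` implementation that prints metadata only and never walks the data buffer. `DemoTensor` and its fields are hypothetical and exist only for this example; hpt's actual tensor type and the exact fields it prints are not shown in these notes.

```rust
use std::fmt;

/// Hypothetical tensor-like struct, for illustration only.
struct DemoTensor {
    data: Vec<f32>,
    shape: Vec<usize>,
    strides: Vec<usize>,
}

impl fmt::Debug for DemoTensor {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Print metadata only; the element values are intentionally skipped.
        f.debug_struct("DemoTensor")
            .field("shape", &self.shape)
            .field("strides", &self.strides)
            .field("len", &self.data.len())
            .finish()
    }
}

fn main() {
    let t = DemoTensor {
        data: vec![0.0; 6],
        shape: vec![2, 3],
        strides: vec![3, 1],
    };
    println!("{:?}", t);
}
```

Keeping `Debug` cheap like this makes it safe to log large tensors, while a separate display path can still print the elements.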
https://github.com/jianqoq/hpt -
- redesigned slice, changed `match_selection` to `select`, now supports syntax like `select![1:2:3, .., 2:]`, similar to `Numpy`.
- added support for custom allocator, user can now use their custom memory allocator
- `concat`, `vstack`, `hstack`, `dstack` are now moved to `Concat` trait.
- updated `concat`, `vstack`, `hstack`, `dstack` docs, fixed `resize_cuda_lru_cache` doc.
- Rust
Published by Jianqoq about 1 year ago
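For readers unfamiliar with NumPy-style slices, the sketch below shows how one `start:end:step` entry can be resolved into concrete indices along a single dimension. It assumes the NumPy convention (half-open range, negative bounds counted from the end, out-of-range bounds clamped) and handles positive steps only. `resolve_slice` and `normalize` are hypothetical helpers, not part of hpt, and whether `select!` accepts negative bounds is not stated in these notes.

```rust
/// Map a possibly negative bound onto 0..=len, NumPy style.
fn normalize(v: isize, len: usize) -> usize {
    let len_i = len as isize;
    let v = if v < 0 { v + len_i } else { v };
    v.clamp(0, len_i) as usize
}

/// Resolve `start:end:step` on a dimension of size `len` into indices.
/// `None` stands for an omitted bound, as in `2:` or `..`.
/// Hypothetical helper; only positive steps are handled in this sketch.
fn resolve_slice(len: usize, start: Option<isize>, end: Option<isize>, step: usize) -> Vec<usize> {
    assert!(step > 0, "this sketch only handles positive steps");
    let start = start.map(|v| normalize(v, len)).unwrap_or(0);
    let end = end.map(|v| normalize(v, len)).unwrap_or(len);
    // An empty range (start >= end) simply yields no indices.
    (start..end).step_by(step).collect()
}

fn main() {
    // The three entries of `1:2:3, .., 2:` applied to dimensions of size 5.
    println!("{:?}", resolve_slice(5, Some(1), Some(2), 3));
    println!("{:?}", resolve_slice(5, None, None, 1));
    println!("{:?}", resolve_slice(5, Some(2), None, 1));
}
```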
https://github.com/jianqoq/hpt - v0.0.16
- added cuda kernel launch configuration checking function
- added single/list cuda tensor saving/loading support
- added incremental compilation support for hpt-cudakernels, speeding up development
- added parallel nvcc compilation
- reimplemented reduce kernels, optimized and implemented reduce for CUDA for all reduction operators CPU supported
- added `resize_lru_cache`, allowed user to control lru cache size.
- Renamed `set_lr_display_elements` to `set_display_elements`
- Renamed `set_cuda_seed`, and now it accepts backend generic type
- added docs for `get_num_threads`, `set_num_threads`, `resize_lru_cache`, `set_display_elements`, `set_display_precision`, `set_seed`
- fixed wrong `cuda` tensor to `cpu` tensor conversion issue when tensor is sliced.
- Simplified display method implementation for cuda, now directly call `to_cpu`
- added reduce benchmark for `cuda` at github page
- Rust
Published by Jianqoq over 1 year ago
https://github.com/jianqoq/hpt -
- added Save/Load derive macro support for Cuda Backend
- added uncontiguous support for Cuda reduce
- refactored hpt-allocator, simplified implementation, improved maintainability
- updated tensor display method documentation
- added unary and reduce tests for cuda
- fixed cuda scalar sinh, tanh, cosh method code gen issue.
- Added CudaType trait to allow cross-platform type name mapping between Rust primitive types and C primitive types.
- refactored hpt file organization for cuda
- added backend support status in hpt docs
- added resnet example in hpt-examples
- added lstm, resnet benchmarks for hpt CPU backend in docs
- changed the signature of the with-out methods: all methods named `*_` now require a mutable out.
- fixed docs for binary methods.
- changed some crate's method visibility so the user won't see them
- Rust
Published by Jianqoq over 1 year ago