CUTENSOR-TUBAL: OPTIMIZED GPU LIBRARY FOR LOW-TUBAL-RANK TENSORS

Citation Author(s):
Tao Zhang, Xiao-Yang Liu
Submitted by:
Tao Zhang
Last updated:
8 May 2019 - 8:21am
Document Type:
Poster
Document Year:
2019
Event:
Presenters:
Hai Li
Paper Code:
2071
In this paper, we optimize the computation of third-order low-tubal-rank tensor operations on many-core GPUs. Tensor operations are compute-intensive, and existing studies optimize them in a case-by-case manner, which can be inefficient and error-prone. We develop and optimize a BLAS-like library for the low-tubal-rank tensor model, called cuTensor-tubal, which provides efficient GPU primitives for tensor operations and their key processes. We compute tensor operations in the frequency domain and fully exploit tube-wise and slice-wise parallelism. We design, implement, and optimize four key tensor operations, namely t-FFT, inverse t-FFT, t-product, and t-SVD. For t-product and t-SVD, cuTensor-tubal achieves maximum speedups of 29.16X and 6.72X over the non-optimized GPU counterparts, and maximum speedups of 16.91X and 27.03X over CPU implementations running on dual 10-core Xeon CPUs.
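To illustrate the frequency-domain scheme the abstract describes, here is a minimal NumPy sketch of the t-product: an FFT along the tube (third) dimension, independent per-slice matrix multiplications in the frequency domain (these slices are what the library parallelizes on the GPU), and an inverse FFT. This is an assumption-laden reference sketch of the standard t-product definition, not the cuTensor-tubal CUDA implementation itself; the function name `t_product` is ours.

```python
import numpy as np

def t_product(A, B):
    """t-product of third-order tensors A (n1 x n2 x n3) and B (n2 x n4 x n3).

    Illustrative CPU sketch of the frequency-domain algorithm; not the
    cuTensor-tubal API.
    """
    # t-FFT: FFT along the tube (third) dimension
    Af = np.fft.fft(A, axis=2)
    Bf = np.fft.fft(B, axis=2)
    # Slice-wise matrix multiplications in the frequency domain:
    # Cf[:, :, k] = Af[:, :, k] @ Bf[:, :, k] for each frontal slice k.
    # These per-slice products are independent, which is the slice-wise
    # parallelism exploited on the GPU.
    Cf = np.einsum('ijk,jlk->ilk', Af, Bf)
    # Inverse t-FFT back to the original domain; result is real for real inputs
    return np.fft.ifft(Cf, axis=2).real

# Usage: multiply a 4x3x5 tensor by a 3x2x5 tensor
A = np.random.rand(4, 3, 5)
B = np.random.rand(3, 2, 5)
C = t_product(A, B)
print(C.shape)  # (4, 2, 5)
```

A sanity check on this definition: the identity tensor (identity matrix in the first frontal slice, zeros elsewhere) acts as a multiplicative identity under the t-product.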

Paper link on IEEE Xplore: https://ieeexplore.ieee.org/document/8682323
