CUTENSOR-TUBAL: OPTIMIZED GPU LIBRARY FOR LOW-TUBAL-RANK TENSORS

In this paper, we optimize third-order low-tubal-rank tensor operations on many-core GPUs. Tensor operations are compute-intensive, and existing studies optimize them case by case, which can be inefficient and error-prone. We develop and optimize cuTensor-tubal, a BLAS-like library for the low-tubal-rank tensor model that includes efficient GPU primitives for tensor operations and key processes. We compute tensor operations in the frequency domain and fully exploit tube-wise and slice-wise parallelism. We design, implement, and optimize four key tensor operations: t-FFT, inverse t-FFT, t-product, and t-SVD. For t-product and t-SVD, cuTensor-tubal achieves maximum speedups of 29.16X and 6.72X, respectively, over the non-optimized GPU counterparts, and maximum speedups of 16.91X and 27.03X, respectively, over CPU implementations running on dual 10-core Xeon CPUs.
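To make the frequency-domain formulation concrete, the following is a minimal CPU reference sketch of the t-product in plain Python. It is not the cuTensor-tubal API: the function names (`dft`, `t_fft`, `t_product`, `t_product_direct`), the nested-list tensor layout `A[i][j][s]`, and the naive O(n^2) DFT standing in for an FFT are all assumptions made for illustration. The sketch transforms every tube along the third dimension, multiplies matching frontal slices, and transforms back; each tube and each slice is independent, which is the tube-wise and slice-wise parallelism a GPU implementation can exploit.

```python
import cmath


def dft(tube, inverse=False):
    """Naive O(n^2) DFT of one tube (list of numbers); stands in for an FFT."""
    n = len(tube)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(tube[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def t_fft(A, inverse=False):
    """Transform every tube A[i][j][:] along the third dimension (tube-wise)."""
    return [[dft(tube, inverse) for tube in row] for row in A]


def t_product(A, B):
    """t-product of A (n1 x n2 x n3) and B (n2 x n4 x n3), real-valued inputs."""
    n1, n2, n4, n3 = len(A), len(B), len(B[0]), len(A[0][0])
    Af, Bf = t_fft(A), t_fft(B)
    # Slice-wise step: an ordinary matrix product for each frequency index s.
    Cf = [[[sum(Af[i][k][s] * Bf[k][j][s] for k in range(n2))
            for s in range(n3)]
           for j in range(n4)]
          for i in range(n1)]
    C = t_fft(Cf, inverse=True)
    # Real inputs give a real result; drop the rounding-level imaginary part.
    return [[[c.real for c in tube] for tube in row] for row in C]


def t_product_direct(A, B):
    """Reference definition: sum over k of circular convolutions of tubes."""
    n1, n2, n4, n3 = len(A), len(B), len(B[0]), len(A[0][0])
    C = [[[0.0] * n3 for _ in range(n4)] for _ in range(n1)]
    for i in range(n1):
        for j in range(n4):
            for k in range(n2):
                for s in range(n3):
                    for t in range(n3):
                        C[i][j][s] += A[i][k][t] * B[k][j][(s - t) % n3]
    return C


if __name__ == "__main__":
    A = [[[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]],
         [[2.0, 0.0, 1.0], [1.0, 1.0, 0.0]]]
    B = [[[1.0, 0.0, 2.0]],
         [[0.0, 3.0, 1.0]]]
    fast, ref = t_product(A, B), t_product_direct(A, B)
    err = max(abs(fast[i][j][s] - ref[i][j][s])
              for i in range(2) for j in range(1) for s in range(3))
    print("max deviation from the direct definition:", err)
```

The direct routine is included only as a cross-check: the two results should agree up to floating-point rounding, since the DFT turns the circular convolution of tubes into element-wise multiplication.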

Paper link on IEEE Xplore: https://ieeexplore.ieee.org/document/8682323

ICASSP_poster_taozhang V1 final.pdf

The poster for the paper entitled "CUTENSOR-TUBAL: OPTIMIZED GPU LIBRARY FOR LOW-TUBAL-RANK TENSORS" (493)
