PB-BLAS: A Set of Parallel Block Basic Linear Algebra Subprograms

J. Choi
School of Computing
Soongsil University
1-1 Sangdo-Dong, Dongjak-Ku
Seoul 156-743
South Korea

J. J. Dongarra
Department of Computer Science
University of Tennessee
Knoxville, TN 37996-1301
U. S. A.

D. W. Walker
Mathematical Sciences Section
Oak Ridge National Laboratory
P. O. Box 2008
Oak Ridge, TN 37831-6367
U. S. A.

Abstract

We propose a new software package for implementing dense linear algebra algorithms on block-partitioned matrices. The routines are referred to as the Block Basic Linear Algebra Subprograms (block BLAS). Their use is restricted to computations in which one or more of the matrices involved consists of a single row or column of blocks, and in which no more than one of the matrices consists of an unrestricted two-dimensional array of blocks. The functionality of the block BLAS routines can also be provided by Level 2 and 3 BLAS routines. However, on Non-Uniform Memory Access machines the block BLAS permit certain optimizations in memory access. This is particularly true for distributed memory machines, for which the block BLAS are referred to as the Parallel Block Basic Linear Algebra Subprograms (PB-BLAS). The PB-BLAS are the main focus of this paper. For a block-cyclic data distribution, a single row or column of blocks lies in a single row or column of the processor template. The PB-BLAS consist of calls to the sequential BLAS for local computations, and calls to the BLACS for communication. The PB-BLAS are the building blocks for implementing ScaLAPACK, the distributed-memory version of LAPACK, and provide the same ease of use and portability for ScaLAPACK that the BLAS provide for LAPACK. The PB-BLAS consist of all Level 2 and 3 BLAS routines for dense matrix computations (banded matrices are not covered) and four auxiliary routines for transposing and copying a vector and/or a block vector. The PB-BLAS are currently available for all numeric data types, i.e., single and double precision real and complex.
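
The claim that a single row of blocks lies in a single row of the processor template can be checked with a short sketch. The code below is illustrative only and is not part of the PB-BLAS: the function names, the zero starting offset, and the 2 x 3 template are assumptions made for the example. It maps block (i, j) to template position (i mod P, j mod Q), which is the usual two-dimensional block-cyclic rule.

```python
def block_owner(i, j, P, Q):
    """Template coordinates (p, q) owning block (i, j).

    Assumes a 2D block-cyclic distribution over a P x Q processor
    template with the first block placed at template position (0, 0).
    """
    return (i % P, j % Q)


def owners_of_block_row(i, num_block_cols, P, Q):
    """Set of template positions that hold some block of block row i."""
    return {block_owner(i, j, P, Q) for j in range(num_block_cols)}


# Hypothetical configuration: 2 x 3 template, 7 block columns.
P, Q = 2, 3
owners = owners_of_block_row(4, 7, P, Q)

# Every block of block row 4 sits in one row of the template.
template_rows = {p for p, _ in owners}
print(sorted(owners))
print(template_rows)
```

The same argument applies to a column of blocks, which falls in a single column of the template because the column index j is fixed.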

J. Choi, J. J. Dongarra, and D. W. Walker, PB-BLAS: A Set of Parallel Block Basic Linear Algebra Subprograms, Concurrency: Practice and Experience, Vol. 8, No. 7, pages 517-535, September 1996.