Oak Ridge, TN 37831-6367
U. S. A.
Abstract
We propose a new software package for implementing dense linear algebra
algorithms on block-partitioned matrices. The routines are referred to as
the Block Basic Linear Algebra Subprograms, and their use is restricted to
computations in which one or more of the matrices involved consists of a single
row or column of blocks, and in which no more than one of the matrices consists
of an unrestricted two-dimensional array of blocks. The functionality of the
block BLAS routines can also be provided by Level 2 and 3 BLAS routines.
However, on Non-Uniform Memory Access machines the block BLAS
permit certain memory-access optimizations to be exploited. This
is particularly true for distributed memory machines, for which the block
BLAS are referred to as the Parallel Block Basic
Linear Algebra Subprograms (PB-BLAS). The PB-BLAS are the main focus of this
paper. For a block-cyclic data distribution, a single row or column
of blocks lies in a single row or column of the processor template.
The PB-BLAS consist of calls to the sequential BLAS
for local computations,
and calls to the BLACS for communication.
The PB-BLAS are the building blocks for implementing ScaLAPACK,
the distributed-memory version of LAPACK,
and provide the same ease-of-use and portability for ScaLAPACK
that the BLAS provide for LAPACK.
The PB-BLAS consist of all Level 2 and 3 BLAS routines for dense matrix
computations (but not those for banded matrices), plus four auxiliary routines
for transposing and copying a vector and/or a block vector.
The PB-BLAS are currently available for all numeric data types,
i.e., single and double precision real and complex.
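To illustrate the restriction described above, consider multiplying a general two-dimensional array of blocks A by a matrix B that consists of a single column of blocks: each block row of the result is C_i = sum_k A[i][k] * B[k], so the whole product decomposes into independent block-level matrix multiplies (each of which a sequential Level 3 BLAS call such as GEMM would perform locally). The sketch below is purely illustrative and uses plain Python lists as a stand-in for GEMM; it is not the PB-BLAS API, and the function and variable names are hypothetical.

```python
# Hypothetical sketch: C = A * B where A is a 2-D grid of blocks and
# B is a single column of blocks. Each block product matmul(A[i][k], B[k])
# corresponds to one local (Level 3 BLAS-style) matrix multiply.

def matmul(X, Y):
    """Plain dense matrix multiply on lists of lists (stand-in for GEMM)."""
    n, m, p = len(X), len(Y), len(Y[0])
    assert len(X[0]) == m, "inner dimensions must agree"
    return [[sum(X[i][k] * Y[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def madd(X, Y):
    """Elementwise matrix addition."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def block_col_matmul(A_blocks, B_blocks):
    """Compute C_i = sum_k A_blocks[i][k] * B_blocks[k].

    A_blocks is a 2-D array of blocks; B_blocks (and the result C)
    are a single column of blocks, matching the PB-BLAS restriction
    that at most one operand is an unrestricted 2-D array of blocks.
    """
    C = []
    for block_row in A_blocks:
        acc = matmul(block_row[0], B_blocks[0])
        for Aik, Bk in zip(block_row[1:], B_blocks[1:]):
            acc = madd(acc, matmul(Aik, Bk))
        C.append(acc)
    return C
```

In a distributed block-cyclic layout, each block row C_i above would be accumulated within a single row of the processor template, with the BLACS supplying the communication; this locality is what the PB-BLAS exploit.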
J. Choi, J. J. Dongarra, and D. W. Walker, "PB-BLAS: A Set of Parallel
Block Basic Linear Algebra Subprograms," Concurrency: Practice and
Experience, Vol. 8, No. 7, pp. 517-535, September 1996.