BLAS (Basic Linear Algebra Subprograms) is a set of functions with a standard API for basic matrix/vector operations in linear algebra. It works mostly with dense vectors and matrices. This library is used by many other libraries in high performance computing, especially when many matrix/vector operations are required.
The reference implementation is available on Netlib. However, it is rarely used, since it was not developed with performance in mind. Optimized implementations are therefore provided on our supercomputers. These implementations are tuned for specific processors and usually perform much better, while remaining compatible with the reference implementation and preserving the precision of the computation.
The original version is written in Fortran 77, but a C interface called CBLAS is usually provided in the same package. The functions are grouped into three categories: levels 1, 2 and 3. Level 1 functions perform vector-vector operations, level 2 functions perform matrix-vector operations, and level 3 functions perform matrix-matrix operations.
A typical BLAS function name is of the form Toper, where T corresponds to the type of data and oper describes the operation to be performed. The possible data types are s, d, c and z, which respectively mean single precision (float), double precision, complex single precision, and complex double precision.
For example, the function ddot will compute the dot product of two vectors of doubles, the function zgemm will compute the matrix-matrix product of two general matrices of double complex numbers, and sgemv will compute the matrix-vector product with single precision floating point numbers.
Memory data representation
All BLAS functions assume that matrix and vector data are stored contiguously in memory. This implies, for example, that a matrix cannot be represented as a vector of vectors; it must be stored as a single block of N x M contiguous elements. Moreover, for vectors and matrices of complex numbers, the real and imaginary parts of each element must be adjacent in memory.
BLAS is a standard API. While the reference implementation is the one from Netlib, its use is generally not recommended, because far more optimized implementations exist.
ATLAS (Automatically Tuned Linear Algebra Software) is a BLAS implementation written in C and developed by the open source scientific community. At compile time, ATLAS attempts to optimize the code for the target architecture.
Even though it performs better than the reference implementation, ATLAS is generally not as fast as the MKL and GotoBLAS implementations.
Intel Math Kernel Library (MKL)
The Math Kernel Library is a commercial mathematical library that includes a BLAS implementation optimized for Intel processors. It is the recommended implementation in most cases on our servers.
GotoBLAS2 is a BLAS implementation developed by Kazushige Goto. It is notable for its many optimizations written in assembly code to reduce cache misses and TLB misses. In some cases, it is the fastest implementation available.