CLBlast: The tuned OpenCL BLAS library

CLBlast is a modern, lightweight, performant and tunable OpenCL BLAS library written in C++11. It is designed to leverage the full performance potential of a wide variety of OpenCL devices from different vendors, including desktop and laptop GPUs, embedded GPUs, and other accelerators. CLBlast implements BLAS routines: basic linear algebra subprograms operating on vectors and matrices.
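
As an illustration of what such a routine computes, below is a plain, unoptimized CPU reference for single-precision GEMM (general matrix-matrix multiplication), C = alpha * A * B + beta * C, assuming row-major storage and no transposition. It is a sketch only: it does not use CLBlast or OpenCL, and the function name reference_gemm is hypothetical. A helper like this can be useful as a correctness check when comparing against results read back from a device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (not CLBlast API) for single-precision GEMM:
//   C = alpha * A * B + beta * C
// Row-major storage, no transposition:
//   A: m x k, row stride a_ld (a_ld >= k)
//   B: k x n, row stride b_ld (b_ld >= n)
//   C: m x n, row stride c_ld (c_ld >= n)
void reference_gemm(std::size_t m, std::size_t n, std::size_t k, float alpha,
                    const std::vector<float>& a, std::size_t a_ld,
                    const std::vector<float>& b, std::size_t b_ld, float beta,
                    std::vector<float>& c, std::size_t c_ld) {
  assert(a_ld >= k && b_ld >= n && c_ld >= n);
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      // Dot product of row i of A with column j of B.
      float acc = 0.0f;
      for (std::size_t p = 0; p < k; ++p) {
        acc += a[i * a_ld + p] * b[p * b_ld + j];
      }
      // Scale the product by alpha and the existing value of C by beta.
      c[i * c_ld + j] = alpha * acc + beta * c[i * c_ld + j];
    }
  }
}
```

The row-stride (leading-dimension) arguments follow the usual BLAS convention, which lets a routine operate on a sub-matrix of a larger buffer.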

The library is not tuned for all possible OpenCL devices: if out-of-the-box performance is poor, please run the tuners first. See the README on GitHub for a list of already-tuned devices and for instructions on how to tune CLBlast yourself and contribute the results to future releases of the library.
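
The idea behind tuning can be illustrated with a toy CPU analogue. The C++ sketch below is not CLBlast's tuner: it assumes a single tunable parameter (the tile size of a blocked matrix multiplication), times each candidate value on the problem size of interest, and keeps the fastest one. The names tiled_matmul and tune_tile_size are hypothetical; CLBlast's own tuners search over OpenCL kernel parameters on the target device.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical blocked C = A * B for square n x n row-major matrices.
// `tile` is the tunable parameter: the result is identical for every tile
// size (same summation order), only the memory-access pattern and speed differ.
void tiled_matmul(std::size_t n, std::size_t tile,
                  const std::vector<float>& a, const std::vector<float>& b,
                  std::vector<float>& c) {
  assert(tile > 0 && a.size() >= n * n && b.size() >= n * n && c.size() >= n * n);
  std::fill(c.begin(), c.end(), 0.0f);
  for (std::size_t ii = 0; ii < n; ii += tile)
    for (std::size_t pp = 0; pp < n; pp += tile)
      for (std::size_t jj = 0; jj < n; jj += tile) {
        const std::size_t i_end = std::min(ii + tile, n);
        const std::size_t p_end = std::min(pp + tile, n);
        const std::size_t j_end = std::min(jj + tile, n);
        for (std::size_t i = ii; i < i_end; ++i)
          for (std::size_t p = pp; p < p_end; ++p) {
            const float a_ip = a[i * n + p];
            for (std::size_t j = jj; j < j_end; ++j)
              c[i * n + j] += a_ip * b[p * n + j];
          }
      }
}

// Empirical tuning: run each candidate once on an n x n problem, measure the
// wall-clock time, and return the candidate with the shortest time.
std::size_t tune_tile_size(std::size_t n, const std::vector<std::size_t>& candidates) {
  assert(!candidates.empty());
  std::vector<float> a(n * n, 1.0f), b(n * n, 2.0f), c(n * n);
  std::size_t best_tile = candidates.front();
  double best_seconds = std::numeric_limits<double>::infinity();
  for (std::size_t tile : candidates) {
    const auto start = std::chrono::steady_clock::now();
    tiled_matmul(n, tile, a, b, c);
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    if (elapsed.count() < best_seconds) {
      best_seconds = elapsed.count();
      best_tile = tile;
    }
  }
  return best_tile;
}
```

The winning tile size can differ between machines and between matrix sizes, which is why tuning for a specific device and configuration can pay off. A practical tuner would also repeat each measurement and verify every candidate's output against a reference.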

View CLBlast on GitHub.

Why CLBlast and not clBLAS or cuBLAS?

Use CLBlast instead of clBLAS:
  1. When you care about achieving maximum performance.
  2. When you want to be able to inspect the BLAS kernels or easily customize them to your needs.
  3. When you run on exotic OpenCL devices that you need to tune yourself.
  4. When you are still running on OpenCL 1.1 hardware.
  5. When you prefer a C++ API over a C API (C API also available in CLBlast).
  6. When you value an organized and modern C++ codebase.
  7. When you target Intel CPUs and GPUs or embedded devices.
  8. When you can benefit from the increased performance of half-precision (fp16) data types.
Use CLBlast instead of cuBLAS:
  1. When you want your code to run on devices other than NVIDIA CUDA-enabled GPUs.
  2. When you want to tune for a specific configuration (e.g. rectangular matrix sizes).
  3. When you sleep better knowing that the library you use is open source.
  4. When you are using OpenCL rather than CUDA.
When not to use CLBlast:
  1. When you run on NVIDIA's CUDA-enabled GPUs only and can benefit from cuBLAS's assembly-level tuned kernels.

Benchmark results

Several benchmarks have been performed using CLBlast's clients and benchmarking script. Results are available for the following devices:
  1. NVIDIA GeForce GTX750Ti
  2. NVIDIA Titan X (Pascal)
  3. AMD Radeon M370X
  4. AMD Radeon HD7970
  5. Intel Iris Pro 5100
  6. Intel Skylake ULT GT2
  7. Intel Core i5-6200U
  8. ARM Mali T628

News

June 3, 2018: CLBlast 1.4.0 released

A new version of CLBlast has been released! The changelog and download links are published on GitHub.

May 16, 2018: Presentation at IWOCL and new CLBlast paper

CLBlast was presented at the IWOCL workshop; handouts of the slides are available here. At the same time, a completely revised version of the CLBlast paper was published on arXiv.

January 29, 2018: CLBlast 1.3.0 released

A new version of CLBlast has been released, including bug fixes, a new integrated auto-tuner, API additions for advanced users, improved performance on Mali GPUs, and a new strided-batched routine. The changelog and download links are published on GitHub.

November 8, 2017: CLBlast 1.2.0 released

A new version of CLBlast has been released, including bug fixes, a CUDA back-end and better GEMM performance. The changelog and download links are published on GitHub.

September 30, 2017: CLBlast 1.1.0 released

A new version of CLBlast has been released, including bug fixes, a per-architecture tuning database, and the new im2col routine. The changelog and download links are published on GitHub.

July 30, 2017: CLBlast 1.0.0 released

CLBlast is mature enough to be released as version 1.0! The changelog and download links are published on GitHub.

May 21, 2017: CLBlast paper published on arXiv.org

A technical article on CLBlast was published on the open-access arXiv.org platform. The article is titled "CLBlast: A Tuned OpenCL BLAS Library" and discusses the library design and several of the most interesting performance results. The full 8-page PDF is freely available on arXiv.org.

May 3, 2017: Presenting CLBlast at GTC '17

CLBlast will be presented one week from now at the GPU Technology Conference in San Jose. For an abstract and more details about the 25-minute talk, see the GTC website. Slides and video will be made available after the conference.

May 2, 2017: CLBlast 0.11.0 released

Preview version 0.11.0 of CLBlast was released today. The changelog and download links are published on GitHub.

April 23, 2017: Added benchmark results

An initial set of benchmark results for 6 devices was uploaded; see the Benchmark results section.

April 20, 2017: Launch of this page

The first version of this page was created. In the future, it will host performance benchmark results for the CLBlast library on various devices.

Contact:
www.cedricnugteren.nl