WMMA API

E790552

The WMMA (Warp Matrix Multiply-Accumulate) API is NVIDIA’s CUDA C++ programming interface for performing warp-level matrix multiply-accumulate operations, letting developers efficiently leverage Tensor Cores for mixed-precision linear algebra.

Statements (48)

Predicate Object
instanceOf CUDA API feature
programming interface
warp-level matrix multiply-accumulate API
abbreviationFor Warp Matrix Multiply-Accumulate API
developedBy NVIDIA
documentationPublisher NVIDIA
documentedIn CUDA C++ Programming Guide
CUDA Toolkit documentation
executionModel SIMT warp execution
exposedVia CUDA C++ headers
granularity warp-level
introducedFor Volta architecture Tensor Cores
levelOfAbstraction low-level Tensor Core access
namespace nvcuda::wmma
optimizationGoal efficient Tensor Core utilization
high throughput matrix operations
partOf CUDA Toolkit
primaryLanguage C++
programmingModelLevel device-level API
providesFunction fill_fragment
load_matrix_sync
mma_sync
store_matrix_sync
providesType fragment
relatedTo CUDA core matrix operations
CUTLASS
Tensor Core programming
cuBLAS
requires CUDA-capable GPU with Tensor Cores
requiresConcept CUDA warps
shared memory tiling
thread blocks
supportsDataType half precision floating point
mixed precision
single precision floating point accumulation
supportsFeature layout specification for matrices
row-major and column-major layouts
tile-based matrix operations
supportsOperation matrix multiply-accumulate
mixed-precision linear algebra
targetHardware NVIDIA GPUs
targetHardwareFeature Tensor Cores
typicalDomain GPU-accelerated linear algebra
neural network inference
neural network training
useCase GEMM acceleration
deep learning workloads
high-performance computing
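The types and functions listed above (fragment, fill_fragment, load_matrix_sync, mma_sync, store_matrix_sync in the nvcuda::wmma namespace) combine into the standard WMMA pattern documented in the CUDA C++ Programming Guide: declare fragments, zero the accumulator, load tiles, multiply-accumulate, store. A minimal sketch for a single 16x16x16 tile with half-precision inputs and single-precision accumulation (host-side allocation and error checking omitted; assumes a Tensor Core-capable GPU, sm_70 or newer):

```cuda
#include <mma.h>
using namespace nvcuda;

// One warp computes C = A * B for a single 16x16x16 tile.
__global__ void wmma_tile_gemm(const half *a, const half *b, float *c) {
    // Fragments: distributed per-thread storage for matrix tiles.
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);            // zero the accumulator
    wmma::load_matrix_sync(a_frag, a, 16);        // load A tile, leading dim 16
    wmma::load_matrix_sync(b_frag, b, 16);        // load B tile, leading dim 16
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // C += A * B on Tensor Cores
    wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
}

// Launch with exactly one warp, e.g.: wmma_tile_gemm<<<1, 32>>>(d_a, d_b, d_c);
```

All 32 threads of the warp must execute each `*_sync` call collectively; larger GEMMs tile the problem across warps and typically stage tiles through shared memory, as the requiresConcept statements above indicate.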

Referenced by (1)

Full triples — surface form annotated when it differs from this entity's canonical label.

Tensor Cores exposedThrough WMMA API