WMMA API

E790552

The WMMA (Warp Matrix Multiply-Accumulate) API is NVIDIA’s CUDA C++ programming interface for performing warp-level matrix multiply-accumulate operations, letting developers efficiently leverage Tensor Cores for mixed-precision linear algebra.

Statements (48)

Predicate Object
instanceOf CUDA API feature
programming interface
warp-level matrix multiply-accumulate API
abbreviationFor Warp Matrix Multiply-Accumulate API
developedBy NVIDIA
documentationPublisher NVIDIA
documentedIn CUDA C++ Programming Guide
CUDA Toolkit documentation
executionModel SIMT warp execution
exposedVia CUDA C++ headers
granularity warp-level
introducedFor Volta architecture Tensor Cores
levelOfAbstraction low-level Tensor Core access
namespace nvcuda::wmma
optimizationGoal efficient Tensor Core utilization
high throughput matrix operations
partOf CUDA Toolkit
primaryLanguage C++
programmingModelLevel device-level API
providesFunction fill_fragment
load_matrix_sync
mma_sync
store_matrix_sync
providesType fragment
relatedTo CUDA core matrix operations
CUTLASS
Tensor Core programming
cuBLAS
requires CUDA-capable GPU with Tensor Cores
requiresConcept CUDA warps
shared memory tiling
thread blocks
supportsDataType half precision floating point
mixed precision
single precision floating point accumulation
supportsFeature layout specification for matrices
row-major and column-major layouts
tile-based matrix operations
supportsOperation matrix multiply-accumulate
mixed-precision linear algebra
targetHardware NVIDIA GPUs
targetHardwareFeature Tensor Cores
typicalDomain GPU-accelerated linear algebra
neural network inference
neural network training
useCase GEMM acceleration
deep learning workloads
high-performance computing
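The types and functions listed above (fragment, fill_fragment, load_matrix_sync, mma_sync, store_matrix_sync in the nvcuda::wmma namespace) combine into the standard WMMA pattern documented in the CUDA C++ Programming Guide: declare fragments, zero the accumulator, load tiles, multiply-accumulate, store. A minimal sketch for a single 16x16x16 tile with half-precision inputs and single-precision accumulation (host-side allocation and error checking omitted; assumes a Tensor Core-capable GPU, sm_70 or newer):

```cuda
#include <mma.h>
using namespace nvcuda;

// One warp computes C = A * B for a single 16x16x16 tile.
__global__ void wmma_tile_gemm(const half *a, const half *b, float *c) {
    // Fragments: distributed per-thread storage for matrix tiles.
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);            // zero the accumulator
    wmma::load_matrix_sync(a_frag, a, 16);        // load A tile, leading dim 16
    wmma::load_matrix_sync(b_frag, b, 16);        // load B tile, leading dim 16
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // C += A * B on Tensor Cores
    wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
}

// Launch with exactly one warp, e.g.: wmma_tile_gemm<<<1, 32>>>(d_a, d_b, d_c);
```

All 32 threads of the warp must execute each `*_sync` call collectively; larger GEMMs tile the problem across warps and typically stage tiles through shared memory, as the requiresConcept statements above indicate.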

Referenced by (1)

Full triples — surface form annotated when it differs from this entity's canonical label.

Tensor Cores exposedThrough WMMA API