WMMA API
E790552
The WMMA API is NVIDIA’s CUDA C++ programming interface for warp-level matrix multiply-accumulate operations, which lets developers use Tensor Cores for mixed-precision linear algebra.
Statements (48)
| Predicate | Object |
|---|---|
| instanceOf | CUDA API feature; programming interface; warp-level matrix multiply-accumulate API |
| abbreviationFor | Warp Matrix Multiply-Accumulate API |
| developedBy | NVIDIA |
| documentationPublisher | NVIDIA |
| documentedIn | CUDA C++ Programming Guide; CUDA Toolkit documentation |
| executionModel | SIMT warp execution |
| exposedVia | CUDA C++ headers |
| granularity | warp-level |
| introducedFor | Volta architecture Tensor Cores |
| levelOfAbstraction | low-level Tensor Core access |
| namespace | nvcuda::wmma |
| optimizationGoal | efficient Tensor Core utilization; high throughput matrix operations |
| partOf | CUDA Toolkit |
| primaryLanguage | C++ |
| programmingModelLevel | device-level API |
| providesFunction | fill_fragment; load_matrix_sync; mma_sync; store_matrix_sync |
| providesType | fragment |
| relatedTo | CUDA core matrix operations; CUTLASS; Tensor Core programming; cuBLAS |
| requires | CUDA-capable GPU with Tensor Cores |
| requiresConcept | CUDA warps; shared memory tiling; thread blocks |
| supportsDataType | half precision floating point; mixed precision; single precision floating point accumulation |
| supportsFeature | layout specification for matrices; row-major and column-major layouts; tile-based matrix operations |
| supportsOperation | matrix multiply-accumulate; mixed-precision linear algebra |
| targetHardware | NVIDIA GPUs |
| targetHardwareFeature | Tensor Cores |
| typicalDomain | GPU-accelerated linear algebra; neural network inference; neural network training |
| useCase | GEMM acceleration; deep learning workloads; high-performance computing |
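
The statements above name the `fragment` type and the four functions `fill_fragment`, `load_matrix_sync`, `mma_sync`, and `store_matrix_sync` in the `nvcuda::wmma` namespace. The following is a minimal sketch of how they might fit together for a single 16x16x16 tile with half-precision inputs and single-precision accumulation. The kernel name `wmma_tile_kernel`, the host helper `wmma_tile_demo_max_abs_error`, and the test values are hypothetical and chosen only for illustration. The sketch assumes a GPU of compute capability 7.0 or later, a build such as `nvcc -arch=sm_70`, and a toolkit recent enough that `__float2half` is callable from host code.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <mma.h>
#include <cmath>
#include <vector>

using namespace nvcuda;

// One warp cooperatively computes a single 16x16 tile: D = A * B.
// A is row-major half, B is column-major half, D is row-major float.
__global__ void wmma_tile_kernel(const half* a, const half* b, float* d) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag;

    wmma::fill_fragment(acc_frag, 0.0f);                 // start from C = 0
    wmma::load_matrix_sync(a_frag, a, 16);               // leading dimension 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);  // acc = A * B + acc
    wmma::store_matrix_sync(d, acc_frag, 16, wmma::mem_row_major);
}

// Hypothetical test data: small integers that are exact in half precision.
static float a_val(int i, int k) { return static_cast<float>((i + k) % 3); }
static float b_val(int k, int j) { return static_cast<float>((2 * k + j) % 5) - 2.0f; }

// Hypothetical host helper: runs the kernel on one tile and returns the
// largest absolute difference from a CPU reference, or -1.0f on a CUDA error.
float wmma_tile_demo_max_abs_error() {
    const int N = 16;
    std::vector<half> h_a(N * N), h_b(N * N);
    std::vector<float> h_d(N * N, 0.0f), ref(N * N, 0.0f);

    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            h_a[i * N + j] = __float2half(a_val(i, j));  // A(i, j), row-major
            h_b[j * N + i] = __float2half(b_val(i, j));  // B(i, j), column-major
            for (int k = 0; k < N; ++k) {
                ref[i * N + j] += a_val(i, k) * b_val(k, j);
            }
        }
    }

    const size_t half_bytes = N * N * sizeof(half);
    const size_t float_bytes = N * N * sizeof(float);
    half* d_a = nullptr;
    half* d_b = nullptr;
    float* d_d = nullptr;

    bool ok = cudaMalloc(&d_a, half_bytes) == cudaSuccess &&
              cudaMalloc(&d_b, half_bytes) == cudaSuccess &&
              cudaMalloc(&d_d, float_bytes) == cudaSuccess;
    ok = ok && cudaMemcpy(d_a, h_a.data(), half_bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    ok = ok && cudaMemcpy(d_b, h_b.data(), half_bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        wmma_tile_kernel<<<1, 32>>>(d_a, d_b, d_d);      // exactly one full warp
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(h_d.data(), d_d, float_bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_d);
    if (!ok) return -1.0f;

    float max_err = 0.0f;
    for (int idx = 0; idx < N * N; ++idx) {
        max_err = std::fmax(max_err, std::fabs(h_d[idx] - ref[idx]));
    }
    return max_err;
}
```

A caller could invoke `wmma_tile_demo_max_abs_error()` from `main` and treat a small non-negative result as agreement with the CPU reference. The kernel is launched with 32 threads because the WMMA functions are warp-level: all threads of the warp must execute each call together. A full GEMM would tile larger matrices across many warps and thread blocks, which this sketch does not attempt.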
Referenced by (1)
Full triples — surface form annotated when it differs from this entity's canonical label.