NVIDIA inference platform
E892043
The NVIDIA inference platform is a comprehensive suite of hardware and software tools designed to accelerate and optimize AI model deployment and real-time inference across data center, edge, and embedded environments.
Statements (69)
| Predicate | Object |
|---|---|
| instanceOf | AI inference platform; software and hardware platform |
| developer | NVIDIA |
| includesComponent | NVIDIA AI Enterprise; NVIDIA AI Workbench integration; NVIDIA Base Command Manager; NVIDIA BlueField DPUs; NVIDIA CUDA; NVIDIA DGX systems; NVIDIA EGX platform; NVIDIA GPU operator; NVIDIA GPUs; NVIDIA Jetson platform; NVIDIA NIM microservices; NVIDIA NeMo microservices; NVIDIA TensorRT; NVIDIA TensorRT-LLM; NVIDIA Triton Inference Server; NVIDIA cuDNN; NVIDIA networking |
| optimizationFeature | FP16 mixed precision; INT8 quantization; dynamic batching; layer fusion; model ensemble execution; precision calibration |
| providesCapability | autoscaling of inference workloads; model optimization; model serving; multi-GPU inference; multi-node inference; observability and metrics for inference |
| purpose | accelerate AI inference; optimize AI model deployment |
| relatedTo | NVIDIA AI platform; NVIDIA training platform |
| softwareStack | CUDA; NVIDIA AI Enterprise; TensorRT; Triton Inference Server; cuDNN |
| supportsDeployment | Kubernetes; bare-metal servers; cloud environments; edge devices; embedded modules; on-premises data centers; virtual machines |
| supportsEnvironment | data center; edge; embedded systems |
| supportsFramework | ONNX Runtime; PyTorch; TensorFlow; XGBoost |
| supportsModelFormat | ONNX; TensorFlow SavedModel; TensorRT engine; TorchScript |
| supportsUseCase | batch inference; computer vision inference; large language model inference; online prediction services; real-time inference; recommender systems; speech AI inference |
| targetUser | AI developers; IT operations teams; MLOps engineers |
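The model-serving rows above (Triton Inference Server, ONNX model format, dynamic batching) can be illustrated with a minimal Triton model configuration (`config.pbtxt`). This is a sketch only; the model name, tensor names, and shapes are hypothetical placeholders, not part of this entity's statements.

```protobuf
name: "image_classifier"        # hypothetical model name
platform: "onnxruntime_onnx"    # serve an ONNX model via the ONNX Runtime backend
max_batch_size: 8
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}
```

Placed in a model repository, a file like this tells Triton which backend to load the model with and lets the server combine individual requests into larger batches transparently.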
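The INT8 quantization and precision calibration features listed above rest on a simple idea: map floats to 8-bit integers via a scale factor chosen from the observed value range. The sketch below shows the generic symmetric scheme in plain Python; it is a textbook illustration, not NVIDIA's actual calibrator.

```python
def quantize_int8(values):
    """Symmetric INT8 quantization: scale chosen so max |x| maps to 127."""
    scale = (max(abs(v) for v in values) / 127.0) or 1.0  # avoid scale == 0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the INT8 codes."""
    return [qi * scale for qi in q]

vals = [0.5, -1.25, 3.0, -3.0]
codes, scale = quantize_int8(vals)   # 3.0 maps exactly to 127
approx = dequantize(codes, scale)    # each value is within scale/2 of the original
```

The rounding error per element is bounded by half the scale, which is why calibration (choosing a good range from representative data) matters so much for accuracy.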
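The dynamic batching feature can also be sketched as a toy scheduler: queued requests are released as one batch when either a preferred batch size is reached or the oldest request has waited past a delay budget. This is a simplified illustration of the concept, not Triton's real scheduler.

```python
from collections import deque

def batch_requests(arrivals, preferred_size, max_delay):
    """arrivals: list of (timestamp, request_id) in time order.
    Returns the batches in the order they would be dispatched."""
    batches, queue = [], deque()
    for t, rid in arrivals:
        # Dispatch the pending batch if its oldest request has waited too long.
        if queue and t - queue[0][0] > max_delay:
            batches.append([r for _, r in queue])
            queue.clear()
        queue.append((t, rid))
        # Dispatch as soon as the preferred batch size is reached.
        if len(queue) == preferred_size:
            batches.append([r for _, r in queue])
            queue.clear()
    if queue:  # flush whatever remains at the end
        batches.append([r for _, r in queue])
    return batches
```

For example, with `preferred_size=4` and `max_delay=5`, arrivals at times 0, 1, 2, and 10 dispatch as one batch of three (the delay budget expires) followed by a batch of one.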