Key Features

Versatile Entry-Level Inference

The NVIDIA A2 Tensor Core GPU provides entry-level inference with low power, a small footprint, and high performance for NVIDIA AI at the edge. Featuring a low-profile PCIe Gen4 card and a low 40-60 watt (W) configurable thermal design power (TDP) capability, the A2 brings adaptable inference acceleration to any server.
 
A2’s versatility, compact size, and low power meet the demands of edge deployments at scale, instantly upgrading existing entry-level CPU servers to handle inference. Servers accelerated with A2 GPUs deliver higher inference performance than CPUs and more efficient intelligent video analytics (IVA) deployments than previous GPU generations—all at an entry-level price point.
 
NVIDIA-Certified Systems™ featuring A2 GPUs and NVIDIA AI, including the NVIDIA Triton™ Inference Server, deliver breakthrough inference performance across edge, data center, and cloud. They ensure that AI-enabled applications deploy with fewer servers and less power, resulting in easier deployments, faster insights, and significantly lower costs.
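As a concrete illustration of how a model is served through Triton, the sketch below shows a minimal `config.pbtxt` model configuration. The model name, backend, and tensor shapes are illustrative assumptions for a generic ONNX image classifier, not details from this datasheet:

```
name: "image_classifier_onnx"       # hypothetical model name
platform: "onnxruntime_onnx"        # Triton's ONNX Runtime backend
max_batch_size: 8
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]           # CHW image tensor (assumed shape)
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]                  # class scores (assumed)
  }
]
```

Placed in Triton's model repository, a configuration like this lets the server batch incoming requests up to the stated batch size before dispatching them to the GPU.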

Up to 20X More Inference Performance

AI inference is deployed to make consumers’ lives more convenient through real-time experiences and to extract insights from trillions of end-point sensors and cameras. Compared to CPU-only servers, servers built with the NVIDIA A2 Tensor Core GPU offer up to 20X more inference performance, instantly upgrading any server to handle modern AI.

Powered by NVIDIA Ampere Architecture

NVIDIA Ampere Architecture-Based CUDA Cores

Accelerate graphics workflows with the latest CUDA® cores for up to 2.5X single-precision floating-point (FP32) performance compared to the previous generation.

Second-Generation RT Cores

Produce more visually accurate renders faster with hardware-accelerated motion blur and up to 2X faster ray-tracing performance than the previous generation.

Third-Generation Tensor Cores

Boost AI and data science model training with up to 10X faster training performance compared to the previous generation, with hardware support for structural sparsity.
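The structural sparsity the Tensor Cores accelerate is the 2:4 pattern: in every group of four weights, at most two are nonzero. As a rough, framework-free sketch (NumPy only; the function name and pruning-by-magnitude heuristic are illustrative assumptions, not NVIDIA's tooling), enforcing that pattern looks like this:

```python
import numpy as np

def prune_2_of_4(weights):
    """Zero the 2 smallest-magnitude values in every group of 4 weights,
    producing the 2:4 structured-sparsity pattern Ampere Tensor Cores
    can exploit for up to 2X math throughput."""
    w = np.asarray(weights, dtype=float).copy()
    groups = w.reshape(-1, 4)                 # assumes length divisible by 4
    # Indices of the 2 smallest |values| in each group of 4.
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]
    np.put_along_axis(groups, drop, 0.0, axis=1)
    return groups.reshape(w.shape)

w = np.array([0.9, -0.1, 0.05, 1.2, 0.3, 0.2, -0.8, 0.01])
print(prune_2_of_4(w))  # → [ 0.9  0.   0.   1.2  0.3  0.  -0.8  0. ]
```

In practice the pruned model is fine-tuned to recover accuracy; the hardware then skips the zeroed weights, which is where the doubled throughput in the footnoted figures below comes from.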

Virtualization-Ready

Repurpose your personal workstation into multiple high-performance virtual workstations with support for NVIDIA RTX Virtual Workstation (vWS) software.

Third-Generation NVIDIA NVLink

Scale memory and performance across multiple GPUs with NVIDIA® NVLink™ to tackle larger datasets, models, and scenes.

PCI Express Gen 4

Improve data-transfer speeds from CPU memory for data-intensive tasks with support for PCI Express Gen 4.

Power Efficiency

Leverage a single-slot, low-profile, power-efficient design that’s 2.5X more power efficient than the previous generation and crafted to fit a wide range of servers.

Specifications

Peak FP32: 4.5 TF
TF32 Tensor Core: 9 TF | 18 TF¹
BFLOAT16 Tensor Core: 18 TF | 36 TF¹
Peak FP16 Tensor Core: 18 TF | 36 TF¹
Peak INT8 Tensor Core: 36 TOPS | 72 TOPS¹
Peak INT4 Tensor Core: 72 TOPS | 144 TOPS¹
RT Cores: 10
Media engines: 1 video encoder, 2 video decoders (includes AV1 decode)
GPU memory: 16GB GDDR6
GPU memory bandwidth: 200GB/s
Interconnect: PCIe Gen4 x8
Form factor: 1-slot, low-profile PCIe
Max thermal design power (TDP): 40-60W (configurable)
vGPU software support²: NVIDIA Virtual PC (vPC), NVIDIA Virtual Applications (vApps), NVIDIA RTX Virtual Workstation (vWS), NVIDIA AI Enterprise, NVIDIA Virtual Compute Server (vCS)

¹ With sparsity
² Supported in a future vGPU release
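The "with sparsity" footnote means each Tensor Core figure doubles when the 2:4 structured-sparsity pattern applies. A quick sanity check of the paired numbers in the table (dense values taken from the rows above):

```python
# Dense peak Tensor Core throughput from the specification table.
dense = {
    "TF32 (TF)": 9,
    "BFLOAT16 (TF)": 18,
    "FP16 (TF)": 18,
    "INT8 (TOPS)": 36,
    "INT4 (TOPS)": 72,
}

# Structural sparsity doubles effective throughput (footnote 1):
# the hardware skips half the operands, so math rate is 2x the dense peak.
sparse = {fmt: 2 * v for fmt, v in dense.items()}

for fmt in dense:
    print(f"{fmt}: {dense[fmt]} dense | {sparse[fmt]} with sparsity")
```

Each computed sparse value matches the second number in the corresponding table row (e.g. INT8: 36 TOPS dense, 72 TOPS with sparsity).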
