Key Features

Powerful Data Center GPU for Visual Computing

The NVIDIA A40 accelerates the most demanding visual computing workloads from the data center, combining the latest NVIDIA Ampere architecture RT Cores, Tensor Cores, and CUDA® Cores with 48 GB of graphics memory. From powerful virtual workstations accessible from anywhere to dedicated render nodes, the NVIDIA A40 brings next-generation NVIDIA RTX™ technology to the data center for the most advanced professional visualization workloads.

Powered by NVIDIA Ampere Architecture

NVIDIA Ampere Architecture-Based CUDA Cores

Accelerate graphics workflows with the latest CUDA® cores for up to 2.5X single-precision floating-point (FP32) performance compared to the previous generation.

Second-Generation RT Cores

Produce more visually accurate renders faster with hardware-accelerated motion blur and up to 2X faster ray-tracing performance than the previous generation.

Third-Generation Tensor Cores

Train AI and data science models up to 10X faster than the previous generation, with hardware support for structural sparsity.
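
To make "structural sparsity" concrete: NVIDIA's public descriptions of Ampere Tensor Cores refer to a 2:4 pattern, in which at most two of every four consecutive weights are nonzero. The sketch below illustrates that pattern in plain Python. Keeping the two largest-magnitude values per group is an assumed pruning rule chosen for illustration, and `prune_2_of_4` is a hypothetical helper, not an NVIDIA API.

```python
def prune_2_of_4(weights):
    """Zero the two smallest-magnitude values in each consecutive group of four.

    Illustrative only: this mimics the 2:4 sparsity pattern on a flat list
    of floats and is not how any particular framework implements pruning.
    """
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
sparse = prune_2_of_4(weights)
print(sparse)  # [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
```

Half of the values in every group are zero after pruning, which is the regularity that lets hardware skip them.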

Virtualization-Ready

Repurpose your personal workstation into multiple high-performance virtual workstations with support for NVIDIA RTX Virtual Workstation (vWS) software.

Third-Generation NVIDIA NVLink

Scale memory and performance across multiple GPUs with NVIDIA® NVLink™ to tackle larger datasets, models, and scenes.

PCI Express Gen 4

Improve data-transfer speeds from CPU memory for data-intensive tasks with support for PCI Express Gen 4.
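
As a rough check on what PCI Express Gen 4 offers, the sketch below works out link bandwidth from the per-lane transfer rate. The x16 link width is an assumption (this page does not state it); the 16 GT/s rate and 128b/130b encoding are the standard PCIe 4.0 figures. The function and variable names are illustrative.

```python
def pcie_bandwidth_gb_s(transfers_per_s, lanes, encoding_efficiency=1.0):
    """Return one-direction link bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bits_per_s = transfers_per_s * lanes * encoding_efficiency
    return bits_per_s / 8 / 1e9


GEN4_RATE = 16e9        # 16 GT/s per lane (PCIe 4.0)
LANES = 16              # assumed x16 link
ENCODING = 128 / 130    # 128b/130b line encoding

raw_one_way = pcie_bandwidth_gb_s(GEN4_RATE, LANES)
effective_one_way = pcie_bandwidth_gb_s(GEN4_RATE, LANES, ENCODING)
raw_bidirectional = 2 * raw_one_way

print(f"raw, one direction:       {raw_one_way:.1f} GB/s")
print(f"after encoding overhead:  {effective_one_way:.1f} GB/s")
print(f"raw, both directions:     {raw_bidirectional:.1f} GB/s")
```

Under these assumptions the raw bidirectional figure comes to 64 GB/s, which matches the PCIe Gen4 number quoted in the specifications; the usable rate per direction is slightly lower once encoding overhead is counted.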

Power Efficiency

Leverage a dual-slot design that's 2.5X more power efficient than the previous generation and crafted to fit a wide range of workstations.

Specifications

GPU architecture: NVIDIA Ampere architecture
GPU memory: 48 GB GDDR6 with ECC
Memory bandwidth: 696 GB/s
Interconnect interface: NVIDIA® NVLink® 112.5 GB/s (bidirectional); PCIe Gen4 64 GB/s
NVIDIA Ampere architecture-based CUDA Cores: 10,752
NVIDIA second-generation RT Cores: 84
NVIDIA third-generation Tensor Cores: 336
Peak FP32 TFLOPS (non-Tensor): 37.4
Peak FP16 Tensor TFLOPS with FP16 accumulate: 149.7 | 299.4*
Peak TF32 Tensor TFLOPS: 74.8 | 149.6*
RT Core performance TFLOPS: 73.1
Peak BF16 Tensor TFLOPS with FP32 accumulate: 149.7 | 299.4*
Peak INT8 Tensor TOPS: 299.3 | 598.6*
Peak INT4 Tensor TOPS: 598.7 | 1,197.4*
Form factor: 4.4″ (H) x 10.5″ (L), dual slot
Display ports: 3x DisplayPort 1.4**; supports NVIDIA Mosaic and Quadro® Sync
Max power consumption: 300 W
Power connector: 8-pin CPU
Thermal solution: Passive
Virtual GPU (vGPU) software support: Yes; NVIDIA RTX Virtual Workstation (vWS)
vGPU profiles supported: See the Virtual GPU Licensing Guide
NVENC | NVDEC: 1x | 2x (includes AV1 decode)
Secure and measured boot with hardware root of trust: Yes (optional)
NEBS ready: Level 3
Compute APIs: CUDA, DirectCompute, OpenCL™, OpenACC®
Graphics APIs: DirectX 12.0, Shader Model 5.1, OpenGL 4.6, Vulkan 1.1
MIG support: No
* Structural sparsity enabled
** A40 is configured for virtualization by default with physical display connectors disabled. The display outputs can be enabled via management software tools.
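
The listed figures can be cross-checked with a little arithmetic. The sketch below uses only numbers from the specifications above and makes one assumption that is not stated on this page: that each CUDA core completes one fused multiply-add (2 floating-point operations) per clock at peak. The "implied clock" is therefore a back-of-the-envelope estimate, not a published value.

```python
CUDA_CORES = 10_752
RT_CORES = 84
TENSOR_CORES = 336
PEAK_FP32_TFLOPS = 37.4

# (dense, with structural sparsity) pairs as listed in the specifications.
TENSOR_PEAKS = {
    "FP16": (149.7, 299.4),
    "TF32": (74.8, 149.6),
    "BF16": (149.7, 299.4),
    "INT8": (299.3, 598.6),
    "INT4": (598.7, 1197.4),
}

# Ratio of sparse to dense peak throughput for each format.
sparsity_speedup = {
    name: sparse / dense for name, (dense, sparse) in TENSOR_PEAKS.items()
}

# Assumption: 2 FLOPs per CUDA core per clock (one fused multiply-add).
implied_clock_ghz = PEAK_FP32_TFLOPS * 1e12 / (2 * CUDA_CORES) / 1e9

# Unit counts relative to the number of RT Cores.
cuda_per_rt_core = CUDA_CORES // RT_CORES
tensor_per_rt_core = TENSOR_CORES // RT_CORES

for name, ratio in sparsity_speedup.items():
    print(f"{name}: sparse/dense = {ratio:.2f}")
print(f"implied clock: {implied_clock_ghz:.2f} GHz")
print(f"CUDA cores per RT Core: {cuda_per_rt_core}")
print(f"Tensor Cores per RT Core: {tensor_per_rt_core}")
```

Every starred figure works out to about twice its dense counterpart, so structural sparsity doubles the listed peak Tensor throughput in each format.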
