NVIDIA A40

Key Features

Powerful Data Center GPU for Visual Computing

The NVIDIA A40 accelerates the most demanding visual computing workloads from the data center, combining the latest NVIDIA Ampere architecture RT Cores, Tensor Cores, and CUDA® Cores with 48 GB of graphics memory. From powerful virtual workstations accessible from anywhere to dedicated render nodes, the NVIDIA A40 brings next-generation NVIDIA RTX™ technology to the data center for the most advanced professional visualization workloads.

Powered by NVIDIA Ampere Architecture

NVIDIA Ampere Architecture-Based CUDA Cores

Accelerate graphics workflows with the latest CUDA® cores for up to 2.5X single-precision floating-point (FP32) performance compared to the previous generation.

Second-Generation RT Cores

Produce more visually accurate renders faster with hardware-accelerated motion blur and up to 2X faster ray-tracing performance than the previous generation.

Third-Generation Tensor Cores

Boost AI and data science model training with up to 10X faster training performance compared to the previous generation, with hardware support for structural sparsity.
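The page does not describe the sparsity pattern itself. As a rough illustration, the sketch below assumes the 2:4 fine-grained pattern that NVIDIA documents for Ampere Tensor Cores, in which at most two of every four consecutive weights are non-zero. The function name `prune_2_of_4` and the magnitude-based selection rule are illustrative choices, not part of any NVIDIA API.

```python
def prune_2_of_4(weights):
    """Zero the two smallest-magnitude values in every group of four.

    Assumption: magnitude-based selection, a common heuristic; real
    pruning workflows usually fine-tune the model afterwards.
    """
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.0]
sparse_row = prune_2_of_4(row)
print(sparse_row)  # [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
```

Because half of each group is known to be zero, the hardware can skip those multiplications, which is where the doubled "structural sparsity enabled" figures in the specification table come from.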

Virtualization-Ready

Repurpose your personal workstation into multiple high-performance virtual workstations with support for NVIDIA RTX Virtual Workstation (vWS) software.

Third-Generation NVIDIA NVLink

Scale memory and performance across multiple GPUs with NVIDIA® NVLink™ to tackle larger datasets, models, and scenes.
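As a back-of-the-envelope check on the memory-scaling claim, the sketch below tests whether a workload would fit in the combined memory of linked GPUs. It assumes a two-GPU NVLink pairing and an application that can actually pool memory across the link; both are assumptions rather than statements from this page, and `fits_in_pooled_memory` is a hypothetical helper.

```python
GPU_MEMORY_GB = 48  # per-GPU memory, from the specification table


def fits_in_pooled_memory(workload_gb, num_gpus=2, gpu_memory_gb=GPU_MEMORY_GB):
    """Return True if the workload fits in the combined GPU memory.

    Assumption: memory pools perfectly across GPUs, which in practice
    depends on application and driver support.
    """
    if workload_gb < 0 or num_gpus < 1:
        raise ValueError("workload_gb must be >= 0 and num_gpus >= 1")
    return workload_gb <= num_gpus * gpu_memory_gb


print(fits_in_pooled_memory(80))              # True: 80 GB <= 96 GB
print(fits_in_pooled_memory(80, num_gpus=1))  # False: 80 GB > 48 GB
```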

PCI Express Gen 4

Improve data-transfer speeds from CPU memory for data-intensive tasks with support for PCI Express Gen 4.
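To put the interconnect figures in perspective, the sketch below estimates the best-case time to move a full 48 GB of data over each link, using the bandwidth values from the specification table. It assumes the 64 GB/s PCIe figure is bidirectional (like the NVLink figure) and that one direction gets half of it; real transfers will be slower because of protocol and software overhead.

```python
# Figures taken from the specification table on this page.
PCIE_GEN4_BIDIR_GB_S = 64.0
NVLINK_BIDIR_GB_S = 112.5
GPU_MEMORY_GB = 48.0


def transfer_seconds(size_gb, bidir_gb_s):
    """Best-case one-way transfer time in seconds.

    Assumption: a one-way transfer gets half the bidirectional bandwidth.
    """
    one_way_gb_s = bidir_gb_s / 2
    return size_gb / one_way_gb_s


pcie_s = transfer_seconds(GPU_MEMORY_GB, PCIE_GEN4_BIDIR_GB_S)
nvlink_s = transfer_seconds(GPU_MEMORY_GB, NVLINK_BIDIR_GB_S)
print(f"PCIe Gen4: {pcie_s:.2f} s")   # PCIe Gen4: 1.50 s
print(f"NVLink:    {nvlink_s:.2f} s")  # NVLink:    0.85 s
```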

Power Efficiency

Leverage a dual-slot, power-efficient design that is 2.5X more power efficient than the previous generation and crafted to fit a wide range of workstations.

GPU Architecture	NVIDIA Ampere architecture
GPU Memory	48 GB GDDR6 with ECC
Memory bandwidth	696 GB/s
Interconnect Interface	NVIDIA® NVLink®: 112.5 GB/s (bidirectional); PCIe Gen4: 64 GB/s
NVIDIA Ampere architecture based CUDA Cores	10,752
NVIDIA second-generation RT Cores	84
NVIDIA third-generation Tensor Cores	336
Peak FP32 TFLOPS (non-Tensor)	37.4
Peak FP16 Tensor TFLOPS with FP16 Accumulate	149.7 | 299.4*
Peak TF32 Tensor TFLOPS	74.8 | 149.6*
RT Core performance TFLOPS	73.1
Peak BF16 Tensor TFLOPS with FP32 Accumulate	149.7 | 299.4*
Peak INT8 Tensor TOPS	299.3 | 598.6*
Peak INT4 Tensor TOPS	598.7 | 1,197.4*
Form factor	4.4″ (H) x 10.5″ (L) dual slot
Display ports	3x DisplayPort 1.4**; Supports NVIDIA Mosaic and Quadro® Sync
Max power consumption	300 W
Power connector	8-pin CPU
Thermal solution	Passive
Virtual GPU (vGPU) software support	NVIDIA vPC/vApps, NVIDIA RTX Virtual Workstation, NVIDIA Virtual Compute Server
vGPU profiles supported	See the Virtual GPU Licensing Guide
NVENC | NVDEC	1x | 2x (includes AV1 decode)
Secure and measured boot with hardware root of trust	Yes (optional)
NEBS ready	Level 3
Compute APIs	CUDA, DirectCompute, OpenCL™, OpenACC®
Graphics APIs	DirectX 12.0, Shader Model 5.1, OpenGL 4.6, Vulkan 1.1
MIG support	No

* Structural sparsity enabled

** A40 is configured for virtualization by default with physical display connectors disabled. The display outputs can be enabled via management software tools.
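The figures in the specification table can be cross-checked with a little arithmetic. The sketch below is not from NVIDIA. It assumes each CUDA core retires one fused multiply-add (2 FLOPs) per clock and that each streaming multiprocessor holds exactly one RT Core, then derives the implied boost clock, the per-multiprocessor core counts, the FP32 throughput per watt, and the sparsity speedup.

```python
# Values copied from the specification table.
CUDA_CORES = 10_752
RT_CORES = 84
TENSOR_CORES = 336
PEAK_FP32_TFLOPS = 37.4
MAX_POWER_W = 300

# (dense, sparse) peak Tensor throughput in TFLOPS or TOPS.
TENSOR_PEAKS = {
    "FP16": (149.7, 299.4),
    "TF32": (74.8, 149.6),
    "BF16": (149.7, 299.4),
    "INT8": (299.3, 598.6),
    "INT4": (598.7, 1197.4),
}

# Assumption: 2 FLOPs (one fused multiply-add) per CUDA core per clock.
implied_clock_ghz = PEAK_FP32_TFLOPS * 1e12 / (CUDA_CORES * 2) / 1e9

# Assumption: one RT Core per streaming multiprocessor (SM).
cuda_cores_per_sm = CUDA_CORES // RT_CORES
tensor_cores_per_sm = TENSOR_CORES // RT_CORES

fp32_gflops_per_watt = PEAK_FP32_TFLOPS * 1000 / MAX_POWER_W

# Speedup from structural sparsity for each precision.
sparsity_speedup = {name: sparse / dense for name, (dense, sparse) in TENSOR_PEAKS.items()}

print(f"Implied boost clock: {implied_clock_ghz:.2f} GHz")     # 1.74 GHz
print(f"CUDA cores per SM:   {cuda_cores_per_sm}")             # 128
print(f"Tensor Cores per SM: {tensor_cores_per_sm}")           # 4
print(f"FP32 GFLOPS per W:   {fp32_gflops_per_watt:.1f}")      # 124.7
print({name: round(s, 2) for name, s in sparsity_speedup.items()})
```

Every precision shows a 2.0x ratio between the dense and sparse columns, consistent with the footnote that the starred values assume structural sparsity.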
