NVIDIA H100 Tensor Core GPU

Key Features

NVIDIA Hopper™ Architecture: Delivers exceptional performance, scalability, and security for diverse workloads.

Transformer Engine: Optimized for training and inference of large language models, supporting models with trillions of parameters.

NVIDIA NVLink® Switch System: Connects up to 256 H100 GPUs to accelerate exascale workloads.

Multi-Instance GPU (MIG) Technology: Partitions the GPU into up to seven isolated instances for versatile workload management.

High Bandwidth Memory (HBM3): Provides up to 80 GB of memory with 3 TB/s bandwidth for handling large datasets.

Take an Order-of-Magnitude Leap for Accelerated Computing

The NVIDIA H100 Tensor Core GPU delivers unprecedented performance, scalability, and security for every workload. With the NVIDIA® NVLink® Switch System, up to 256 H100 GPUs can be connected to accelerate exascale workloads, while the dedicated Transformer Engine supports trillion-parameter language models. H100 uses breakthrough innovations in the NVIDIA Hopper™ architecture to deliver industry-leading conversational AI, speeding up large language models by 30X over the previous generation.

Ready for Enterprise AI?

NVIDIA H100 GPUs for mainstream servers come with a five-year subscription, including enterprise support, to the NVIDIA AI Enterprise software suite, simplifying AI adoption with the highest performance. This gives organizations access to the AI frameworks and tools they need to build H100-accelerated AI workflows such as AI chatbots, recommendation engines, vision AI, and more. Access the NVIDIA AI Enterprise software subscription and related support benefits for the NVIDIA H100.

Securely Accelerate Workloads From Enterprise to Exascale

NVIDIA H100 GPUs feature fourth-generation Tensor Cores and the Transformer Engine with FP8 precision, extending NVIDIA’s AI leadership with up to 4X faster training and up to 30X faster inference on large language models. For high-performance computing (HPC) applications, H100 triples the floating-point operations per second (FLOPS) of FP64 and adds dynamic programming (DPX) instructions to deliver up to 7X higher performance. With second-generation Multi-Instance GPU (MIG), built-in NVIDIA confidential computing, and NVIDIA NVLink Switch System, H100 securely accelerates all workloads for every data center from enterprise to exascale.

NVIDIA Hopper Architecture

The engine for the world’s AI infrastructure makes an order-of-magnitude performance leap.

The Accelerated Computing Platform for Next-Generation Workloads

Learn about the next massive leap in accelerated computing with the NVIDIA Hopper™ architecture. Hopper securely scales diverse workloads in every data center, from small enterprise to exascale high-performance computing (HPC) and trillion-parameter AI—so brilliant innovators can fulfill their life’s work at the fastest pace in human history.

Explore the Technology Breakthroughs

Built with over 80 billion transistors using a cutting-edge TSMC 4N process, Hopper features five groundbreaking innovations that fuel the NVIDIA H200 and H100 Tensor Core GPUs and combine to deliver incredible speedups over the prior generation on generative AI training and inference.

Transformer Engine

The NVIDIA Hopper architecture advances Tensor Core technology with the Transformer Engine, designed to accelerate the training of AI models. Hopper Tensor Cores can apply mixed FP8 and FP16 precisions to dramatically accelerate AI calculations for transformers. Hopper also triples the floating-point operations per second (FLOPS) for TF32, FP64, FP16, and INT8 precisions over the prior generation. Combined with Transformer Engine and fourth-generation NVIDIA® NVLink®, Hopper Tensor Cores power an order-of-magnitude speedup on HPC and AI workloads.
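
To give a rough sense of why lower precision matters at trillion-parameter scale, the sketch below estimates how many 80 GB GPUs are needed just to hold a model's weights. It is a back-of-the-envelope illustration, not a description of how the Transformer Engine manages memory: the function name `gpus_to_hold_weights` is hypothetical, and the calculation assumes every weight is stored at the stated precision and ignores activations, optimizer state, and any higher-precision copies.

```python
import math

def gpus_to_hold_weights(num_params: float, bytes_per_param: int,
                         gpu_mem_gb: float = 80.0) -> int:
    """Minimum number of GPUs whose combined memory holds the weights alone.

    Assumption: weights only, stored entirely at the given precision.
    """
    weight_gb = num_params * bytes_per_param / 1e9  # decimal gigabytes
    return math.ceil(weight_gb / gpu_mem_gb)

TRILLION = 1e12
fp16_gpus = gpus_to_hold_weights(TRILLION, 2)  # FP16: 2 bytes per parameter
fp8_gpus = gpus_to_hold_weights(TRILLION, 1)   # FP8: 1 byte per parameter
print(f"FP16 weights: {fp16_gpus} GPUs, FP8 weights: {fp8_gpus} GPUs")
```

Under these assumptions, halving the bytes per parameter roughly halves the number of GPUs required for weight storage.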

NVLink, NVSwitch, and NVLink Switch System

To move at the speed of business, exascale HPC and trillion-parameter AI models need high-speed, seamless communication between every GPU in a server cluster to accelerate at scale.

Fourth-generation NVLink can scale multi-GPU input and output (IO) with NVIDIA DGX™ and HGX™ servers at 900 gigabytes per second (GB/s) bidirectional per GPU, over 7X the bandwidth of PCIe Gen5.

Third-generation NVIDIA NVSwitch™ supports Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)™ in-network computing, previously only available on InfiniBand, and provides a 2X increase in all-reduce throughput within eight-GPU H200 or H100 servers compared to previous-generation A100 Tensor Core GPU systems.

DGX GH200 systems with NVLink Switch System support clusters of up to 256 connected H200s and deliver 57.6 terabytes per second (TB/s) of all-to-all bandwidth.
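
The bandwidth figures quoted in this section can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in this document (900 GB/s NVLink per GPU, 128 GB/s PCIe Gen5 from the specification table, and 57.6 TB/s across 256 GPUs); the variable names are illustrative, and dividing the all-to-all figure evenly across GPUs is an assumption made for the sake of the comparison.

```python
# Figures taken from this document; variable names are illustrative.
NVLINK_GB_S = 900.0             # fourth-generation NVLink, bidirectional per GPU
PCIE_GEN5_GB_S = 128.0          # PCIe Gen5, from the specification table
SWITCH_ALL_TO_ALL_TB_S = 57.6   # NVLink Switch System all-to-all bandwidth
CONNECTED_GPUS = 256            # maximum cluster size quoted above

# Ratio behind the "over 7X the bandwidth of PCIe Gen5" claim.
nvlink_vs_pcie = NVLINK_GB_S / PCIE_GEN5_GB_S

# Assumption: the all-to-all bandwidth is shared evenly by all GPUs.
per_gpu_share_gb_s = SWITCH_ALL_TO_ALL_TB_S * 1000 / CONNECTED_GPUS

print(f"NVLink vs. PCIe Gen5: {nvlink_vs_pcie:.2f}x")
print(f"All-to-all share per GPU: {per_gpu_share_gb_s:.0f} GB/s")
```

The ratio comes out slightly above 7, consistent with the claim, and the even per-GPU share of the all-to-all figure works out to about a quarter of the 900 GB/s per-GPU NVLink rate.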

NVIDIA Confidential Computing

While data is encrypted at rest in storage and in transit across the network, it’s unprotected while it’s being processed. NVIDIA Confidential Computing addresses this gap by protecting data and applications in use. The NVIDIA Hopper architecture introduces the world’s first accelerated computing platform with confidential computing capabilities.

With strong hardware-based security, users can run applications on-premises, in the cloud, or at the edge and be confident that unauthorized entities can’t view or modify the application code and data when it’s in use. This protects confidentiality and integrity of data and applications while accessing the unprecedented acceleration of H100 GPUs for AI training, AI inference, and HPC workloads.

Second-Generation MIG

With Multi-Instance GPU (MIG), a GPU can be partitioned into several smaller, fully isolated instances with their own memory, cache, and compute cores. The Hopper architecture further enhances MIG by supporting multi-tenant, multi-user configurations in virtualized environments across up to seven GPU instances, securely isolating each instance with confidential computing at the hardware and hypervisor level. Dedicated video decoders for each MIG instance deliver secure, high-throughput intelligent video analytics (IVA) on shared infrastructure. And with Hopper’s concurrent MIG profiling, administrators can monitor right-sized GPU acceleration and optimize resource allocation for users.

Rather than renting a full cloud service provider (CSP) instance, researchers with smaller workloads can use MIG to securely isolate a portion of a GPU, with the assurance that their data is secure at rest, in transit, and during compute.
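
As a rough capacity-planning illustration of the partitioning described above, the sketch below counts how many GPUs a batch of small jobs would need if each job ran in its own MIG instance. It is a simplified model rather than a real scheduler: the helper `gpus_needed` is hypothetical, and it assumes every job uses the smallest slice listed in the specification table (seven instances of 10 GB each on H100 PCIe), ignoring larger instance sizes.

```python
import math

MAX_INSTANCES_PER_GPU = 7   # "up to seven GPU instances" (from the text)
INSTANCE_MEM_GB = 10        # per-instance memory on H100 PCIe (from the table)

def gpus_needed(job_mem_gb: list[float]) -> int:
    """GPUs required if every job gets its own smallest-size MIG instance.

    Assumption: one job per instance, and no job may exceed one slice.
    """
    too_big = [m for m in job_mem_gb if m > INSTANCE_MEM_GB]
    if too_big:
        raise ValueError(f"jobs exceed one {INSTANCE_MEM_GB} GB instance: {too_big}")
    return math.ceil(len(job_mem_gb) / MAX_INSTANCES_PER_GPU)

# Eight small jobs: seven share the first GPU, the eighth needs a second one.
print(gpus_needed([4, 8, 6, 2, 9, 3, 5, 7]))
```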

DPX Instructions

Dynamic programming is an algorithmic technique for solving a complex recursive problem by breaking it down into simpler subproblems. By storing the results of subproblems so that they don’t have to be recomputed later, it reduces the time needed to solve problems that would otherwise take exponential time. Dynamic programming is used in a broad range of applications. For example, Floyd-Warshall is a route optimization algorithm that can be used to map the shortest routes for shipping and delivery fleets. The Smith-Waterman algorithm is used for DNA sequence alignment and protein folding applications.
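
The Floyd-Warshall algorithm mentioned above is a compact example of the technique: the shortest distance between two nodes is built up from stored shortest distances through intermediate nodes. The following is a minimal plain-Python sketch for illustration only; it runs on the CPU, and the small four-node graph is made up for the example.

```python
import math

def floyd_warshall(weights):
    """All-pairs shortest paths on a dense matrix of edge weights.

    weights[i][j] is the cost of the edge i -> j, or math.inf if absent.
    """
    n = len(weights)
    dist = [row[:] for row in weights]  # copy so the input is not modified
    for k in range(n):                  # allow node k as an intermediate stop
        for i in range(n):
            for j in range(n):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:  # reuse stored subproblem results
                    dist[i][j] = via_k
    return dist

INF = math.inf
graph = [
    [0,   4,   INF, 10],
    [INF, 0,   3,   INF],
    [INF, INF, 0,   2],
    [INF, INF, INF, 0],
]
shortest = floyd_warshall(graph)
print(shortest[0][3])  # 9: the route 0 -> 1 -> 2 -> 3 beats the direct edge of 10
```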

Hopper’s DPX instructions accelerate dynamic programming algorithms by 40X compared to traditional dual-socket CPU-only servers and by 7X compared to NVIDIA Ampere architecture GPUs. This leads to dramatically faster times in disease diagnosis, routing optimizations, and even graph analytics.
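
To show the kind of computation involved, here is a minimal plain-Python sketch of Smith-Waterman local alignment scoring. It does not use DPX instructions and is not NVIDIA's implementation; the scoring parameters are arbitrary illustrative defaults. The inner loop is dominated by taking the maximum of several sums, and the assumption here is that this add-then-compare pattern is what hardware support for dynamic programming targets.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score between sequences a and b (linear gap cost)."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]  # table of stored subproblem scores
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Each cell is a max over sums of previously computed cells.
            h[i][j] = max(
                0,                      # start a new local alignment
                h[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                h[i - 1][j] + gap,      # gap in b
                h[i][j - 1] + gap,      # gap in a
            )
            best = max(best, h[i][j])
    return best

print(smith_waterman_score("GATTACA", "ATTAC"))  # 10: "ATTAC" matches exactly
```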

Preliminary specifications; subject to change.
DPX instructions comparison: HGX H100 4-GPU vs. dual-socket 32-core Ice Lake.

Specification	H100 PCIe	H100 NVL
FP64	26 teraFLOPS	68 teraFLOPS
FP64 Tensor Core	51 teraFLOPS	134 teraFLOPS
FP32	51 teraFLOPS	134 teraFLOPS
TF32 Tensor Core	756 teraFLOPS	1,979 teraFLOPS
BFLOAT16 Tensor Core	1,513 teraFLOPS	3,958 teraFLOPS
FP16 Tensor Core	1,513 teraFLOPS	3,958 teraFLOPS
FP8 Tensor Core	3,026 teraFLOPS	7,916 teraFLOPS
INT8 Tensor Core	3,026 TOPS	7,916 TOPS
GPU Memory	80GB	80GB
Memory bandwidth	2TB/s	7.8TB/s
Decoders	7 NVDEC; 7 JPEG	7 NVDEC; 7 JPEG
Max thermal design power (TDP)	300-350W (configurable)	2x 350-400W (configurable)
Multi-Instance GPUs	Up to 7 MIGs @ 10GB each	Up to 14 MIGs @ 12GB each
Form factor	PCIe, dual-slot, air-cooled	2x PCIe, dual-slot, air-cooled
Interconnect	NVLink: 600GB/s; PCIe Gen5: 128GB/s	NVLink: 600GB/s; PCIe Gen5: 128GB/s
Server options	Partner and NVIDIA-Certified Systems with 1–8 GPUs	Partner and NVIDIA-Certified Systems with 2–4 pairs
NVIDIA AI Enterprise	Included	Included
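
One pattern worth noting in the table is how peak Tensor Core throughput scales with precision. The short sketch below, using only values copied from the table, checks that each step down in precision (TF32 to FP16, then FP16 to FP8) roughly doubles the quoted throughput. The dictionary layout and the helper `step_ratios` are illustrative, not part of any NVIDIA tool.

```python
# Peak Tensor Core throughput in teraFLOPS, copied from the table above.
TENSOR_TFLOPS = {
    "H100 PCIe": {"TF32": 756, "FP16": 1513, "FP8": 3026},
    "H100 NVL": {"TF32": 1979, "FP16": 3958, "FP8": 7916},
}

def step_ratios(tflops: dict) -> tuple:
    """Throughput gain for TF32 -> FP16 and for FP16 -> FP8."""
    return (tflops["FP16"] / tflops["TF32"], tflops["FP8"] / tflops["FP16"])

for name, tflops in TENSOR_TFLOPS.items():
    tf32_to_fp16, fp16_to_fp8 = step_ratios(tflops)
    print(f"{name}: TF32->FP16 {tf32_to_fp16:.2f}x, FP16->FP8 {fp16_to_fp8:.2f}x")
```

Both columns show a factor of about two at each step, which is the throughput side of the FP8 benefit described in the Transformer Engine section.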