We recently came across a situation where a customer had a large number of NVIDIA H100 GPUs. The requirement was either to combine them into a big cluster or farm to work through massive datasets, or to slice them so that many small GPU jobs could run on them.
Unfortunately, it was not obvious from the beginning which of those scenarios would be needed, and most likely both would be.
So I needed to come up with a solution that could cater to both use cases. This theoretical article is the result, based on the official NVIDIA documentation and on the efforts of others before me.
First, a brief look at the technology that will help us a lot:
The Multi-Instance GPU (MIG) feature is a critical innovation introduced by NVIDIA on its A100 GPUs and continued with the more advanced H100 GPUs. This feature allows a single GPU to be partitioned into multiple, smaller instances, each acting as a fully isolated GPU with its own dedicated compute, memory, and bandwidth resources. Here’s a breakdown of how the MIG feature works and why it’s essential, especially in high-performance computing environments.
What is MIG (Multi-Instance GPU)?
MIG enables a single physical GPU to be divided into several hardware-isolated instances. On the NVIDIA H100 GPU, which is part of the Hopper architecture, up to seven GPU instances can run concurrently. Each instance functions like an independent GPU, with its own dedicated memory and compute resources. The number and size of instances can be reconfigured as workload demands change, which offers great flexibility.
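To make the "up to seven" figure concrete, here is a minimal Python sketch. It assumes the H100 80GB profile names and slice counts from the NVIDIA MIG User Guide (seven compute slices and eight memory slices per GPU); other H100 variants expose different profiles, so check the real list with `nvidia-smi mig -lgip`. The table and the `max_instances` helper are illustrative only, not an NVIDIA API.

```python
# Assumed H100 80GB MIG profiles (from the NVIDIA MIG User Guide; verify on
# real hardware with `nvidia-smi mig -lgip`).
# Value: (compute slices used, memory slices used).
H100_80GB_PROFILES = {
    "1g.10gb": (1, 1),
    "1g.20gb": (1, 2),
    "2g.20gb": (2, 2),
    "3g.40gb": (3, 4),
    "4g.40gb": (4, 4),
    "7g.80gb": (7, 8),
}
COMPUTE_SLICES = 7  # total compute slices on one GPU
MEMORY_SLICES = 8   # total memory slices on one GPU


def max_instances(profile: str) -> int:
    """Upper bound on identical instances of one profile, from slice counts alone."""
    compute, memory = H100_80GB_PROFILES[profile]
    return min(COMPUTE_SLICES // compute, MEMORY_SLICES // memory)


if __name__ == "__main__":
    for name in H100_80GB_PROFILES:
        print(f"{name}: up to {max_instances(name)} per GPU")
```

Run as a script, it prints 7 for `1g.10gb`, 4 for `1g.20gb`, 3 for `2g.20gb`, 2 for `3g.40gb` and 1 for the two largest profiles. The driver additionally enforces placement rules that this simple arithmetic ignores.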
Why MIG on H100 is a Game-Changer
The H100 GPU is designed for extreme performance in AI, high-performance computing (HPC), and data analytics tasks. However, not every application or user requires the full power of an entire H100 GPU. With MIG, you can optimize the utilization of these powerful GPUs by running smaller tasks that don’t require the full GPU power, thereby maximizing efficiency.
This flexibility is particularly useful for shared data centers, cloud services, and enterprises that want to serve multiple clients or run a variety of workloads without dedicating an entire high-end GPU to just one task. It helps organizations save on hardware costs by reducing the need to deploy multiple physical GPUs.
Key Features of MIG on H100
- Resource Isolation: Each MIG instance operates independently with isolated compute, memory, and cache resources. This ensures that the performance of one instance doesn’t impact others, which is crucial for tasks that require reliability and predictability.
- Customizable Instance Sizes: You can create multiple MIG instances of varying sizes, depending on your computational needs. For example, you could have one large instance running a heavy AI training model and several smaller instances for inference tasks or lighter workloads.
- Efficiency and Performance: By enabling multiple users or applications to share a single GPU, the overall utilization rate of the hardware is greatly improved. This leads to significant cost savings and better energy efficiency.
- Workload Flexibility: MIG lets you reconfigure the partition layout as needs change, so that different applications can share the same GPU, each with its own dedicated resources. Note that an instance can only be reconfigured while no processes are running on it. This is highly beneficial for mixed workloads, such as simultaneous training and inference tasks.
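A mixed layout like the one described above (one large instance plus several small ones) can be sanity-checked by adding up slices. This is a simplified sketch under the same assumed H100 80GB slice counts: passing the check is necessary but not sufficient, because the driver only allows certain placements of each profile on the GPU. The `layout_fits` function is hypothetical.

```python
# Assumed H100 80GB slice costs per profile: (compute slices, memory slices).
# The GPU is assumed to have 7 compute slices and 8 memory slices in total.
SLICES = {
    "1g.10gb": (1, 1),
    "1g.20gb": (1, 2),
    "2g.20gb": (2, 2),
    "3g.40gb": (3, 4),
    "4g.40gb": (4, 4),
    "7g.80gb": (7, 8),
}


def layout_fits(layout: list[str], compute_total: int = 7, memory_total: int = 8) -> bool:
    """Necessary (not sufficient) check that a mix of profiles fits on one GPU.

    Only slice totals are compared; NVIDIA's placement rules are not modelled.
    """
    compute = sum(SLICES[p][0] for p in layout)
    memory = sum(SLICES[p][1] for p in layout)
    return compute <= compute_total and memory <= memory_total


if __name__ == "__main__":
    # One large training instance plus three small inference instances.
    print(layout_fits(["4g.40gb", "1g.10gb", "1g.10gb", "1g.10gb"]))  # True
    # Two 4g.40gb instances would need 8 compute slices.
    print(layout_fits(["4g.40gb", "4g.40gb"]))  # False
```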
Use Cases for MIG on H100
- Cloud Services: Cloud providers like AWS, Azure, and Google Cloud can benefit by using MIG to offer smaller GPU instances to customers who don’t need the full power of an H100. This makes cloud GPU services more accessible to a broader range of users, including startups and small businesses.
- Enterprise AI & Data Analytics: Organizations that have several teams working on different machine learning or data analytics tasks can use MIG to allocate smaller GPU instances to each team. This boosts efficiency and allows for more concurrent workloads.
- Multi-Tenant Environments: In environments where multiple users or applications share a single server, MIG ensures that each tenant has isolated and predictable performance. This is crucial for service providers who need to guarantee Quality of Service (QoS).
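In multi-tenant setups, a common first step is right-sizing: giving each tenant the smallest instance that covers its needs. The sketch below assumes the H100 80GB profiles with their nominal memory sizes (usable memory is slightly lower) and picks the first profile that meets a tenant's memory and compute-slice requirements. The `smallest_profile` helper and the tenant names and figures in the usage section are made up for illustration.

```python
from typing import Optional

# Assumed H100 80GB MIG profiles: (name, compute slices, nominal memory in GB),
# sorted from the smallest to the largest footprint.
PROFILES = [
    ("1g.10gb", 1, 10),
    ("1g.20gb", 1, 20),
    ("2g.20gb", 2, 20),
    ("3g.40gb", 3, 40),
    ("4g.40gb", 4, 40),
    ("7g.80gb", 7, 80),
]


def smallest_profile(memory_gb: float, compute_slices: int = 1) -> Optional[str]:
    """Return the first (smallest) profile meeting both requirements, or None."""
    for name, compute, memory in PROFILES:
        if memory >= memory_gb and compute >= compute_slices:
            return name
    return None  # the request does not fit in a single MIG instance


if __name__ == "__main__":
    # Hypothetical tenants: (memory needed in GB, compute slices needed).
    tenants = {"team-a": (8, 1), "team-b": (15, 1), "team-c": (35, 3)}
    for tenant, (mem, slices) in tenants.items():
        print(tenant, "->", smallest_profile(mem, slices))
```

Sorting the table by memory and then by compute keeps the "first match is the smallest" logic trivial; a tenant needing more than 80GB gets `None` and would have to be served by a full GPU or a multi-GPU setup instead.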
How MIG Works on NVIDIA H100
When you enable MIG on an H100 GPU, the system allows you to define different GPU partitions. For instance, a single H100 GPU with 80GB of memory can be split into several smaller instances, each with a fraction of the total memory and compute capability. The partitioning is done at the hardware level, ensuring true isolation between instances.
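On a real system the partitioning is done with `nvidia-smi`. The following Python sketch only builds the command lines and prints them rather than executing them, so it is safe to run anywhere. It assumes the target GPU is idle and that the commands would be run with root privileges; depending on the GPU and driver, enabling MIG mode may also require a GPU reset. The `mig_setup_commands` helper is hypothetical; the `-mig`, `-cgi`, `-C` and `-lgi` options are the ones described in NVIDIA's MIG User Guide.

```python
import shlex


def mig_setup_commands(gpu_index: int, profiles: list[str]) -> list[list[str]]:
    """Build (but do not run) nvidia-smi commands that partition one GPU."""
    if not profiles:
        raise ValueError("at least one MIG profile is required")
    gpu = str(gpu_index)
    return [
        # Enable MIG mode on the selected GPU.
        ["nvidia-smi", "-i", gpu, "-mig", "1"],
        # Create one GPU instance per listed profile; -C also creates a
        # default compute instance inside each GPU instance.
        ["nvidia-smi", "mig", "-i", gpu, "-cgi", ",".join(profiles), "-C"],
        # List the resulting GPU instances for verification.
        ["nvidia-smi", "mig", "-i", gpu, "-lgi"],
    ]


if __name__ == "__main__":
    # Hypothetical layout: 3 + 2 + 1 + 1 = 7 compute slices.
    for cmd in mig_setup_commands(0, ["3g.40gb", "2g.20gb", "1g.10gb", "1g.10gb"]):
        print(shlex.join(cmd))
```

Listing the larger profiles first is a deliberate choice in this sketch; whether a given order and mix is accepted is decided by the driver's placement rules.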
MIG allows up to seven instances on an H100, and each instance has dedicated hardware resources like:
- Compute units (SMs)
- Memory bandwidth
- L2 cache
- Memory capacity
This guarantees that even the smallest instances can deliver consistent and predictable performance.
Benefits of MIG on H100 GPUs
- Maximizing Utilization: By creating multiple instances from a single GPU, you can put far more of its resources to work. Without MIG, a small task running alone on a GPU can leave most of its capacity idle.
- Improved ROI: MIG allows organizations to make better use of their GPU investments by running multiple workloads on the same hardware. This lowers the cost per workload.
- Predictable Performance: Each instance is fully isolated, ensuring that other tasks running on the same GPU won’t affect your workload. This predictability is key for applications like AI inference, where latency and consistency are crucial.
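To put rough numbers on the utilization and ROI points, here is a back-of-the-envelope Python calculation. The scenario is hypothetical: 20 small jobs that each fit in a `1g.10gb` slice, costs normalized to 1.0 per GPU-hour, and no performance penalty from running on a one-seventh slice. Real savings depend on how well each job actually performs on a smaller instance.

```python
import math


def gpus_needed(num_jobs: int, jobs_per_gpu: int) -> int:
    """GPUs required if each GPU hosts at most jobs_per_gpu concurrent jobs."""
    return math.ceil(num_jobs / jobs_per_gpu)


def cost_per_job(num_jobs: int, jobs_per_gpu: int, gpu_hour_cost: float = 1.0) -> float:
    """Hourly cost per job; cost is normalized to 1.0 per GPU-hour by default."""
    return gpus_needed(num_jobs, jobs_per_gpu) * gpu_hour_cost / num_jobs


if __name__ == "__main__":
    jobs = 20  # hypothetical number of small jobs, each fitting in 1g.10gb
    whole = cost_per_job(jobs, jobs_per_gpu=1)  # one job per full GPU
    mig = cost_per_job(jobs, jobs_per_gpu=7)    # seven 1g.10gb slices per GPU
    print(f"whole GPUs: {gpus_needed(jobs, 1)} GPUs, cost/job {whole:.2f}")
    print(f"MIG slices: {gpus_needed(jobs, 7)} GPUs, cost/job {mig:.2f}")
```

Under these assumptions the 20 jobs need 20 whole GPUs but only 3 partitioned ones, so the normalized cost per job drops from 1.00 to 0.15.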
Next, we will look at different strategies for either combining the power of multiple GPUs to handle large AI training jobs, or slicing a single H100 GPU into smaller units to create more cost-efficient GPU capacity.