Nvidia Offers Enterprises DGX SuperPOD Subscriptions and DPU Servers

Vivek, July 14, 2021



Nvidia is stepping up its enterprise sales efforts with a new subscription-based DGX SuperPOD offering meant to accelerate commercial AI deployments, as well as a new batch of certified GPU servers featuring the company’s BlueField DPUs and Arm-based CPUs.

The Santa Clara, Calif.-based company announced at Computex 2021 the Nvidia Base Command Platform, a new cloud-hosted development hub offered in collaboration with storage vendor NetApp. The platform will include access to Nvidia’s DGX SuperPOD AI supercomputers and NetApp’s data management solution, all hosted in an Equinix data centre, for a monthly subscription fee starting at $90,000.

According to Manuvir Das, Nvidia’s head of enterprise computing, Base Command transforms Nvidia’s DGX AI systems into an “internal shareable environment” that enables multiple researchers and data scientists to work on AI projects concurrently using the same GPU resources. This will simplify the management of AI workloads while increasing customer access to high-performance GPU compute, he added.

“What we’re doing here is significantly lowering the entry barrier to experiencing this best-in-class system and equipment,” he explained during a pre-briefing with journalists and analysts.

Base Command software will provide access to a broad range of AI and data science tools, including the Nvidia NGC software catalogue, and will appear as a single pane of glass, making resource sharing simple via a graphical user interface and command line APIs. Additionally, the software will include dashboards for monitoring and reporting.

Das explained that Base Command is intended for customers who do not have their own DGX SuperPOD, which can cost between $7 million and $60 million depending on the size of the deployment, and he anticipates that those customers will eventually purchase their own SuperPOD or migrate to the cloud.
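To put the quoted figures side by side, the short Python sketch below counts how many months of the starting subscription fee it would take to add up to the quoted SuperPOD purchase prices. It is only a rough illustration: the function name is hypothetical, and it assumes the $90,000 starting fee stays flat and ignores the fact that the entry-level subscription likely covers a smaller configuration than a purchased SuperPOD, as well as any operating costs.

```python
import math

# Figures quoted in the article (USD).
MONTHLY_FEE = 90_000          # starting monthly subscription price
SUPERPOD_LOW = 7_000_000      # low end of the quoted purchase range
SUPERPOD_HIGH = 60_000_000    # high end of the quoted purchase range


def months_to_match(purchase_price, monthly_fee=MONTHLY_FEE):
    """Whole months of subscription fees needed to reach a purchase price.

    Hypothetical helper: assumes a flat fee and ignores configuration
    differences and operating costs.
    """
    return math.ceil(purchase_price / monthly_fee)


if __name__ == "__main__":
    print(months_to_match(SUPERPOD_LOW))   # 78 months, about 6.5 years
    print(months_to_match(SUPERPOD_HIGH))  # 667 months
```

Even against the low end of the purchase range, the starting fee takes more than six years to add up to the sticker price, which fits Das’s framing of the subscription as a lower-cost way in.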

The ability to operate in a hybrid cloud environment is a critical feature of Base Command. Nvidia said Google Cloud intends to add Base Command support to its marketplace later this year, and Base Command customers will also be able to deploy their workloads to Amazon Web Services’ SageMaker service.

“With flexible access to multiple Nvidia A100 Tensor Core GPUs, this hybrid AI offering enables enterprises that leverage on-demand accelerated computing to accelerate AI development,” Manish Sainani, director of product management for machine learning infrastructure at Google Cloud, said in a statement.

Das explained that adding a subscription pricing model for Nvidia’s high-end DGX SuperPOD clusters is about providing enterprise customers with consumption options that are familiar and comfortable.

“What we’re saying is that we now view ourselves as a mainstream provider of hardware and software to enterprise customers, and as such, we’re willing to embrace any of these models,” he explained.

Das added that Nvidia’s channel partners will sell Base Command.

“We have a strong belief in the channel and have worked extremely well with it in the past for the offerings we have,” he explained. “You can be certain that the same philosophy will be applied to Base Command as well. It is simply another product in our portfolio.”

According to the company, Base Command is currently in early access and the subscription programme will launch in the summer. Customers will be able to begin with three-node DGX SuperPOD deployments and scale up to a total of 20 nodes.

New Nvidia-Certified Systems Coming With DPUs, Arm CPUs

As part of Nvidia’s enterprise push, the company said it is expanding its Nvidia-Certified Systems program to include servers from OEMs that will incorporate its BlueField data processing units and eventually Arm-based CPUs.

New Nvidia-certified servers with BlueField-2 DPUs are expected to arrive from ASUS, Dell Technologies, Gigabyte, QCT and Supermicro later this year, adding to the more than 50 GPU servers from OEMs that have already been certified to run the Nvidia AI Enterprise software suite for AI and data analytics workloads as well as Nvidia Omniverse Enterprise for design collaboration and simulation.

Nvidia introduced the BlueField-2 DPU last year as a component that can replace a standard network interface card and offload critical networking, storage and security workloads from the CPU while enabling new security and hypervisor capabilities in data centers.

The company said a single BlueField-2 DPU “can provide the same data center services that could require up to 125 CPU cores, freeing up server CPU cycles to run a broad range of business-critical applications.” The DPUs are supported by Red Hat, VMware and other software infrastructure vendors.

In a recent interview with CRN, Das said he expects BlueField-2’s security features, which include real-time network visibility, detection and response capabilities, will compel enterprises to refresh their servers with DPUs installed.

“It’s actually the NIC on the server, where all the packets are flowing through anyway, so it’s very natural and efficient to inspect the packet right there while you’re already processing it,” he said.

But Das said he also expects the total-cost-of-ownership benefits of the DPU’s CPU-offload capabilities will compel customers to adopt the component when they are getting ready for a server refresh.

“I think in that case, the entire [CPU] offloading capability will be what drives the choice of DPU in the refresh, because my servers can do 30 percent more work if I put a pretty cost-effective DPU in there, so, of course, that would be attractive,” he said.
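Das’s 30 percent figure can be turned into a rough fleet-size estimate. The Python sketch below is a back-of-the-envelope illustration, not an Nvidia sizing tool: the function name is hypothetical, and it assumes the uplift applies uniformly to every server and that work divides evenly across machines.

```python
def servers_needed(baseline_servers, uplift_percent=30):
    """Servers required for the same workload if each does uplift_percent more work.

    Hypothetical helper. Uses integer arithmetic (ceiling division) so that
    exact cases such as 130 -> 100 are not thrown off by floating-point rounding.
    """
    numerator = baseline_servers * 100
    denominator = 100 + uplift_percent
    return -(-numerator // denominator)  # ceiling of numerator / denominator


if __name__ == "__main__":
    print(servers_needed(100))  # 77
    print(servers_needed(130))  # 100
```

Under these assumptions, a workload that occupies 100 servers today would fit on 77 DPU-equipped servers, which is the kind of saving Das expects to weigh on refresh decisions.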

Next year, the Nvidia-Certified Systems program will introduce another new component type to GPU servers — Arm-based CPUs — which will mark an important milestone for the company as it plans to embrace the alternative chip architecture and acquire Arm for $40 billion.

The first batch of Nvidia-certified servers with Arm CPUs will come from Gigabyte and Wiwynn, and they will use CPUs based on Arm’s Neoverse CPU designs.

To help developers take advantage of Arm CPUs, the company has collaborated with Gigabyte to introduce an Arm HPC Developer Kit, which includes hardware and software for high-performance computing, AI and scientific computing application development. The server kit will use Arm-based Altra CPUs from Ampere Computing, an up-and-coming server chipmaker, and it will feature two Nvidia A100 GPUs, two BlueField-2 DPUs and the Nvidia HPC software development kit.

Das said Nvidia is working with multiple Arm CPU vendors for future Nvidia-certified systems, and he expects OEM systems will eventually include the company’s own Arm-based data center CPU, Grace, after it has been introduced in Nvidia’s own DGX systems in 2023.
