The annual Hot Chips event is taking place this week at Stanford University. Hot Chips is the leading gathering of semiconductor design professionals and is about as technical as conferences get. Not only is this conference a place where some of the brightest minds in chips come together, but it is also an opportunity for the big chip players to announce new products or tease out ideas of what the future may look like.

This year is no different, and Arm Holdings has been front and center with significant news about its Neoverse processor technology. Coming just a week after Arm’s IPO filing, at Hot Chips the company announced its Neoverse Compute Subsystems (CSS) and Neoverse V2 platform. I’ll detail these announcements below and provide some insight as to what they mean for the market.

A Neoverse refresher

For those who aren’t familiar, Neoverse is Arm’s family of processor technology that addresses the datacenter market, primarily focusing on cloud computing. Neoverse has three distinct platforms that target different areas of need. First, the Neoverse V-Series Platform targets demanding workloads in areas such as high-performance computing (HPC) and machine learning.

Second, the Neoverse N-Series is a platform built for the mainstream needs of cloud datacenters. When considering the general-purpose workloads that enterprise customers routinely migrate to the cloud, the N-Series is what cloud providers like AWS, Azure and Oracle deploy to stand up cloud instances in their respective datacenters.

Note that Neoverse is not a chip, but rather a design that chip makers or others can use to build their own CPUs. With the N-Series, we see two contrasting approaches: AWS designed its own chip (called Graviton), while CPU company Ampere designed the Arm-based Altra, which is used by cloud providers and other hyperscalers.

Finally, at the bottom end of the range, the Neoverse E-Series is the platform focused on efficient throughput for networking. When you think about lower-powered networking or 5G equipment running on Arm, the E-Series is the platform being deployed.

Each platform has had success in its respective market. The V-Series powers many HPC clusters and is on the Top500 Supercomputing list. Additionally, Nvidia’s Grace CPU is designed on the V-Series architecture. The N-Series may be the most well-known, as it is widely deployed across all major U.S. cloud providers. In fact, in its recent F-1 filing, Arm estimates that it now commands 10% of the cloud server CPU market globally.

Neoverse Compute Subsystems: custom silicon, faster by design

The pace of tech innovation has increased dramatically, and one key place we see this is in the new workloads being deployed across the datacenter. Unfortunately, these workloads often run on systems that are not optimized to handle their unique computing requirements.

The traditional answer to this has been to wait a few years while a chip manufacturer develops a specialized CPU. Alternately, you could overpay for a combination of hardware and software that kind of solves the problem.

Enter Arm’s Neoverse Compute Subsystems (CSS), a pre-integrated, pre-validated platform based on the N2 design that partners can extend with customizations around memory, I/O, acceleration and other areas. Arm positions CSS as enabling partners to get to market faster with customized silicon. In other words, faster time to market at a lower cost of development.

Contrast this with the approach required to get the most out of processors from a manufacturer like Intel or AMD. The vendor’s embedded engineering organization would take a standard product (e.g., an EPYC server processor from AMD) and tailor it for a specific customer with the full support of the product engineering team. By opening up CSS to partners, Arm is enabling a much faster and cheaper path to value.

There’s real potential for Arm with this move, and not only because of the expanded business opportunities. CSS can also position Arm’s architecture as a first mover in many emerging, high-growth markets. And the estimated 80 engineering years of savings that Arm highlights? That doesn’t just mean shortening time to market; it also means significant cost savings in the whole process of developing tailored silicon.

I can also foresee hyperscalers utilizing CSS to develop in-house silicon to perform specialized functions. This is a common practice for virtually every major cloud provider. Designing on CSS can enable these providers to deliver deeper levels of differentiation faster.

Neoverse V2 Platform: performance lifts for the cloud, HPC and ML

The other news coming out of Arm is tied to its higher-performing V-Series platform. As mentioned earlier, Arm has found success appealing to the higher end of the server market, powering workloads with higher performance needs and scaling into HPC and AI/ML.

The company’s V-Series has done quite well with high-performing workloads that span HPC vertical segments thanks to its fast core performance combined with architectural advancements such as the Scalable Vector Extension (SVE) and CMN-700, a high-speed interconnect that connects memory, storage and workload accelerators over a highly scalable mesh.

V2 is the company’s next-generation V-Series product. Announced last year, V2 is coming to market with the strongest endorsement Arm could hope for: Nvidia’s Grace supercomputing chip is designed on the V2 architecture.

The performance numbers reported for V2 show a marked increase over V1 across all the workloads one would care about.

  • SPEC CPU and SPECrate (speed and throughput) show 13% and 17% increases, respectively.
  • Testing on Memcached, a popular distributed memory caching system, shows a performance increase of up to 15%.
  • The NGINX web server sees up to a 32% increase (reverse proxy, secure) on V2.
  • The Percona distribution for MySQL sees up to a 104% increase in performance (measured as transactions per second) due to V2 improvements in branch prediction, fetch and hardware prefetching.
  • Finally, ML testing using XGBoost sees on average about a doubling of performance on V2 relative to V1.
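Percentage gains like these are easy to misread, so it is worth translating them into throughput multipliers: a 104% increase means 2.04x the V1 baseline, not 1.04x. A small illustrative script (using the "up to" figures quoted above; XGBoost's "about a doubling" is treated as roughly 100%):

```python
# Convert the reported V2-vs-V1 percent increases into speedup multipliers.
# Figures are the "up to" / approximate values quoted in the text above.
gains = {
    "SPEC CPU (speed)": 13,
    "SPECrate (throughput)": 17,
    "Memcached": 15,
    "NGINX (reverse proxy, secure)": 32,
    "MySQL (Percona, TPS)": 104,
    "XGBoost (approx. average)": 100,
}

for workload, pct in gains.items():
    multiplier = 1 + pct / 100  # e.g. a 104% increase -> 2.04x baseline
    print(f"{workload}: +{pct}% -> {multiplier:.2f}x V1 throughput")
```

The point of the exercise: the MySQL and XGBoost entries represent roughly a doubling of work per core generation, while the SPEC numbers are the more conservative, general-purpose view.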

These numbers comparing V2 to V1 are impressive, but I’m more interested in how V2 compares against Arm’s biggest competitors. Luckily, Nvidia did exactly this by comparing the performance of its Grace CPU to Intel’s Sapphire Rapids and AMD’s Genoa CPUs, as shown in the following charts.

Before digging into the numbers, it’s important to note that Grace utilizes V2 cores supported by an Nvidia-designed coherency fabric and LPDDR5X memory. That said, there are two ways to look at the performance. The first is at the individual server level, as noted in the chart on the left. In that comparison, you can see that the Grace CPU performs competitively with Genoa, and both show clear advantages over Sapphire Rapids. The one area where the Nvidia chip really shines is graph analytics.

The second way to look at performance is through the lens of a real-world power budget to find out how much work servers running on the different processors can do. In that case, Grace blows away the competition, as shown in the chart on the right. In a datacenter with a power budget of 5 MW, you can see Grace nearly doubling the performance of the competition across every measure, with graph analytics again outperforming by the most significant margin.

This performance delta is due to the power efficiency of the Grace CPU (and, by extension, the V2 design). The point is that Arm squeezed considerable performance per watt out of its V2 design. This is a big benefit for customers with sustainability goals, plus it is significant simply from an economic perspective. Companies deploying on Intel or AMD chips would have to pay considerably more in terms of power usage to achieve the same performance levels as the Arm-based CPUs.
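The iso-power argument can be made concrete with back-of-the-envelope arithmetic: in a fixed power envelope, a lower-power server lets you deploy more of them, so aggregate throughput scales with performance per watt rather than per-socket speed. The per-server wattage and performance figures below are purely illustrative assumptions for the sake of the math, not vendor-published numbers:

```python
# Back-of-the-envelope: servers deployable in a fixed datacenter power
# budget, and the aggregate work they do. All inputs are illustrative
# assumptions, not measured or vendor-published figures.
POWER_BUDGET_W = 5_000_000  # the 5 MW budget used in the comparison above

servers = {
    # name: (assumed watts per server, assumed relative perf per server)
    "lower-power Arm-based": (500, 1.0),
    "higher-power x86-based": (800, 1.0),  # equal per-server performance
}

for name, (watts, perf) in servers.items():
    count = POWER_BUDGET_W // watts  # servers the budget can power
    total = count * perf             # aggregate throughput (relative units)
    print(f"{name}: {count:,} servers, aggregate performance {total:,.0f}")
```

With equal per-server performance, the 500 W design fields 10,000 servers to the 800 W design’s 6,250, i.e. 1.6x the aggregate work from the same 5 MW. That ratio, driven entirely by performance per watt, is what the datacenter-level comparison captures.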

My take

Arm’s progress in the cloud datacenter, from virtually zero to 10% market share in about four years, has been impressive. This is especially notable considering some of the early missteps of the Arm ecosystem back in the early 2010s. Does anybody remember the Arm-based processors developed by Calxeda, Cavium, Applied Micro and even AMD? (I worked on AMD’s “Seattle” project and have the emotional scars to show for it.)

Likewise, Arm’s V-Series platform has found a strong position in the high-performance space. And again, the company’s ascent has been impressive as it initially had to overcome the challenge of being perceived as “the smartphone chip company.” This perception has disappeared from the discussion, to the point that it’s no longer even in the rearview mirror.

CSS’s adoption is going to be fun to watch in the marketplace. There is so much potential for this platform and so many possible opportunities. How many killer apps will be built using CSS, and which market segments using it will accelerate the quickest?

V2 is already a huge success—case closed. Nvidia’s Grace CPU and Grace Hopper superchip have solidified Arm’s position in the marketplace before it’s even generally available. And I believe this is the first of many successes to come for the platform.

Will the server market launch more Arm-based servers to address the growing needs of its customers? HPE and Lenovo have each announced Arm-based server platforms. Is there a Dell server being designed, too?

What about a non-Nvidia Grace alternative? Given the performance-per-watt advantage Nvidia demonstrated over x86 CPUs, is another chip company looking to build a supercomputing platform?

Finally, what is Arm’s market share potential? Clearly, 10% is outstanding. How much more of the pie can the company take thanks to CSS and V2? There’s a lot of opportunity. The next few quarters should start to tell the story.
