Ampere Computing is a commercial data center SoC company that provides high performance and efficient computing to the cloud data center market. Over the past few years, I have written many articles about the company, following their progress in creating a new cloud-native processor category.

I believe the move to a cloud-native software approach requires a new computing engine that delivers sustained performance and doesn’t spend transistors on legacy software support. Ampere is at the forefront of meeting this demand, developing more efficient processors that deliver performance across all segments of the cloud data center.

Ampere recently announced a new family of Arm-compatible server processors called ‘AmpereOne’. AmpereOne features a custom CPU core developed in-house and scales up to 192 cores, the highest core count currently available in the industry. This article explores Ampere’s new milestone, which is unique and genuinely disruptive for all data center CPU and SoC players.

Ampere brings a new twist to 2nd generation design

Ampere has established itself with leading cloud service providers such as Azure, Google Cloud, Oracle Cloud, Alibaba, and Baidu, as well as system manufacturers such as Hewlett Packard Enterprise and Supermicro.

That said, in the enterprise space, few are willing to buy version one of anything; the more risk-averse buyers usually wait until the second version of a product is released (if not later) before adopting it. The 2nd generation design embodied in AmpereOne makes adoption easier for enterprises, especially since it delivers a very compelling value proposition: the most VMs and containers per rack, with low power and the highest efficiency.

AmpereOne picks up where Ampere Altra left off

Ampere’s first two processors, introduced in 2020, were built using cores licensed from Arm Ltd.: the 80-core Altra and the 128-core Altra Max, both built on a 7-nanometer manufacturing process. The Altra and Altra Max processor families range from 32 to 128 cores. The new AmpereOne family expands the portfolio with 136 to 192 single-threaded cores and more I/O, memory, performance, and cloud features, with no overlap between the product families.

We anticipate use cases for AmpereOne will include AI inference, web servers, databases, caching services, media encoding, and video streaming. A key advantage of this CPU is that it scales almost linearly with application workloads.

The AmpereOne family of chips is different from the previous Altra family. Ampere custom designed the core to tailor the product to the needs of hyperscalers. The CPU is built using the latest 5nm manufacturing process. The custom-designed CPU uses the Arm Instruction Set Architecture (ISA) to ensure compatibility with applications developed on Altra processors.

Inside the new AmpereOne family

As mentioned earlier, AmpereOne is available with 136 to 192 single-threaded cores. It also features 8 channels of DDR5 memory, 128 lanes of PCIe Gen5 IO, and 2MB of private cache per core (twice that of Altra).

AmpereOne also retains all the features found in the Altra family. In addition to cloud-optimized single-threaded cores, each core features two 128-bit vector units for greater efficiency when operating on large amounts of data, and supports FP16 and INT16 data types for memory efficiency.

Custom-designing the core allowed Ampere to implement several other notable features. The CPU supports bfloat16, a format useful for AI and deep learning applications where large neural networks are trained and deployed. Bfloat16 has attracted attention for its ability to improve memory efficiency while maintaining reasonable numerical accuracy for gradient computation and weight updates, which is important when training deep learning models.
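To see where bfloat16’s memory savings come from, note that the format is simply the top 16 bits of an IEEE-754 float32: the same 8-bit exponent (so the same dynamic range), but only 7 mantissa bits. A minimal Python sketch of the truncation, using only the standard library:

```python
import struct

def float32_to_bfloat16_bits(x: float) -> int:
    """Truncate an IEEE-754 float32 to its top 16 bits (bfloat16)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 16  # keep sign bit, 8 exponent bits, 7 mantissa bits

def bfloat16_bits_to_float32(b: int) -> float:
    """Widen a bfloat16 bit pattern back to float32 (zero-fill low bits)."""
    (x,) = struct.unpack(">f", struct.pack(">I", b << 16))
    return x

value = 3.14159
approx = bfloat16_bits_to_float32(float32_to_bfloat16_bits(value))
# bfloat16 keeps float32's full exponent range but only ~2-3 decimal
# digits of precision, at half the storage cost.
print(value, "->", approx)  # prints 3.14159 -> 3.140625
```

Real hardware rounds rather than truncates, but the storage trade-off is the same: half the bytes per weight, with range preserved, which is why it suits neural-network training and inference.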

Performance consistency

Having a large number of cores is one thing; having the right architecture to feed those cores is another. Ampere implemented mesh congestion management to handle 192 cores. Mesh congestion management is a form of intelligent traffic management that uses techniques such as adaptive routing, virtual channels, and load balancing to avoid bottlenecks. This optimizes the performance and efficiency of the inter-core mesh interconnect and minimizes the impact of congestion on overall system performance.

Memory and system-level cache (SLC) quality-of-service enforcement are mechanisms that prevent a single tenant from consuming an unfair share of memory bandwidth or SLC capacity, ensuring consistent performance for all users. Nested virtualization, meanwhile, extends virtualization so that a VM can act as a virtual host inside which further VMs can be created and run, allowing cloud providers to offer additional services to their users. This enables scenarios such as running a hypervisor inside a VM that itself hosts additional VMs.

Scalable management

Granular power management gives Ampere users fine-grained OS-based control and visibility into what is happening with the processor from a power perspective. Advanced droop detection is a mechanism used to monitor and detect changes in power supply voltage levels, especially droops and brownouts, to ensure stable and reliable processor operation. A voltage “sag” can occur when the supply voltage to the CPU is temporarily reduced due to high current demand, sudden load changes, or power supply throttling.

Over time, all processors age, and aging can affect performance. To address this, process aging monitoring allows Ampere’s customers to track how their processors age under conditions of high utilization and low idle time. This ensures reliability goals are met and end users do not experience the effects of processor aging.

New security features

Ampere has introduced several security measures into its new generation of chips. Secure virtualization provides security across multi-tenant environments and supports single-key memory encryption for deploying machines in untrusted locations where physical access is risky.

Memory tagging is a long-requested customer feature that has no analogue in the x86 space. This security and data-integrity feature protects against buffer-overflow attacks and guards data integrity for applications, such as large databases, that can corrupt memory over time. Memory tagging validates memory accesses by associating tags with memory regions or individual memory addresses. These tags track and enforce memory access permissions, detect unauthorized access, and mitigate the impact of certain security vulnerabilities.
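Purely as an illustration (this is a conceptual model, not Ampere’s hardware implementation), the lock-and-key idea behind memory tagging can be sketched as follows: every allocation gets a small tag, the pointer carries a copy of that tag in its unused top bits, and every access checks that the two still match.

```python
import random

class TaggedHeap:
    """Toy model of memory tagging, loosely inspired by Arm MTE,
    which uses a 4-bit tag per 16-byte granule of memory."""
    TAG_BITS = 4

    def __init__(self):
        self._memory = {}  # address -> (tag, value)

    def allocate(self, address, value=0):
        tag = random.randrange(1 << self.TAG_BITS)
        self._memory[address] = (tag, value)
        return (tag << 60) | address  # tag rides in the pointer's top bits

    def load(self, pointer):
        tag = pointer >> 60
        address = pointer & ((1 << 60) - 1)
        stored_tag, value = self._memory[address]
        if tag != stored_tag:  # lock-and-key mismatch -> fault
            raise MemoryError("tag check failed: stale or corrupted pointer")
        return value

heap = TaggedHeap()
p = heap.allocate(0x1000, value=42)
print(heap.load(p))            # tags match: prints 42
stale = p ^ (1 << 60)          # simulate a pointer whose tag is wrong
try:
    heap.load(stale)
except MemoryError as e:
    print("caught:", e)
```

In hardware the tag check happens on every load and store with no software bookkeeping, which is what makes it practical to run continuously against buffer overflows and use-after-free bugs in production.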

Performance metrics people can relate to

Ampere considered two performance benchmarks to measure the relative performance of AmpereOne. The first is the number of virtual machines per rack.

AmpereOne’s 192 cores delivered 7,296 VMs per rack, 2.9 times the 2,496 VMs per rack of the 96-core AMD EPYC 9654 ‘Genoa’, and 4.3 times the 1,680 VMs per rack of the 60-core Intel Xeon 8480+ ‘Sapphire Rapids’.
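The quoted multiples follow directly from the vendor-published per-rack counts; a quick check:

```python
# Per-rack VM counts as published by Ampere for this comparison.
vms_per_rack = {
    "AmpereOne (192 cores)": 7296,
    "AMD EPYC 9654 'Genoa' (96 cores)": 2496,
    "Intel Xeon 8480+ 'Sapphire Rapids' (60 cores)": 1680,
}

baseline = vms_per_rack["AmpereOne (192 cores)"]
for name, vms in vms_per_rack.items():
    # 7296/2496 = 2.9x vs Genoa, 7296/1680 = 4.3x vs Sapphire Rapids
    print(f"{name}: {vms} VMs/rack ({baseline / vms:.1f}x)")
```

Note these are rack-level density figures, so power and cooling budgets per rack are baked into the comparison, not just per-socket core counts.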

In the second performance benchmark, Ampere used two different AI inference workloads to measure AmpereOne’s performance. For the generative AI Stable Diffusion workload, AmpereOne delivered 2.3x more frames (images) per second per rack than the Genoa AMD EPYC 9654.

The second AI workload was an AI recommender, specifically a deep learning recommendation model (DLRM), a machine learning model designed to provide personalized recommendations to users. DLRM is a deep neural network architecture specialized for recommendation tasks such as suggesting products, movies, and content based on user preferences and past behavior. This use case involves large amounts of data and is very latency-sensitive. At the rack level, AmpereOne delivered nearly double the number of recommendations per second compared to Genoa. If you are wondering why Intel Sapphire Rapids was not included in the comparison: against it, the performance difference was even more pronounced.


Cloud providers and enterprises will always need to purchase x86 processors from Intel or AMD for x86 applications that cannot be ported to Arm or RISC-V architectures.

More than a decade on, there is now a lot of cloud-optimized code that is also Arm-optimized, and AWS has proven this with Graviton. That means Arm is gaining momentum in a data center market currently dominated by x86-based processors from Intel and AMD.

Ampere now leads in Arm general-purpose merchant silicon for the data center, and the addition of the AmpereOne family lets it address nearly any cloud-native computing need, from the lowest-power, most constrained applications to large-scale deployments.

The simple message is more cores, I/O, memory, performance, and cloud capabilities. It is a message that resonates in many ways.
