Numenta Achieves 123x Inference Performance Improvement for BERT Transformers on 4th Gen Intel Xeon Scalable Processors | 2023 #1

  • Wed, Feb 01, 2023

In this special edition of the Numenta Newsletter, I’m pleased to share details on our recent announcement with Intel, in which we demonstrated groundbreaking results on the new 4th Gen Intel Xeon Scalable processors.


Numenta Achieves 123x Inference Performance Improvement for BERT Transformers on 4th Gen Intel Xeon Scalable Processors


Numenta’s 2023 started with a major announcement as part of Intel’s 4th Generation Xeon launch (codenamed Sapphire Rapids). As we shared in our January 10 press release, in collaboration with Intel, we achieved groundbreaking performance gains for BERT-Large Transformers on two new Intel Xeon Scalable Processors. These performance results enable transformative possibilities for many NLP and real-time AI applications.

Delivering two orders of magnitude throughput speedup for BERT-Large Transformers

Leveraging Intel’s new Advanced Matrix Extensions (Intel AMX), we demonstrated a 123x throughput improvement over current-generation AMD Milan CPU implementations for BERT inference on short text sequences. Numenta’s neuroscience-based technology turns out to be a perfect fit for AMX instructions, which are designed for AI workloads. In fact, as shown in the chart above, our implementation is 19x faster than Intel’s own AMX BERT-Large implementation.
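The 123x figure is a throughput ratio: queries per second of the optimized implementation divided by queries per second of the baseline. A minimal sketch of how such a ratio is typically measured is below; the harness, function names, and numbers are illustrative assumptions, not Numenta’s benchmark code.

```python
import time

def measure_throughput(infer, batch, n_iters=100):
    """Return sustained queries/second for `infer` applied to `batch`."""
    for _ in range(5):              # warm-up: exclude one-time costs (caches, JIT)
        infer(batch)
    start = time.perf_counter()
    for _ in range(n_iters):
        infer(batch)
    elapsed = time.perf_counter() - start
    return (n_iters * len(batch)) / elapsed

def speedup(baseline_qps, optimized_qps):
    """Speedup is reported as the ratio of measured throughputs."""
    return optimized_qps / baseline_qps
```

Warm-up iterations matter in practice: the first few inferences pay one-time costs that would otherwise understate steady-state throughput.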

Smashing latency barriers

In addition to throughput acceleration, our solution processed each query in under 10 ms, an often-cited threshold for real-time applications like Conversational AI. In the chart above, Numenta’s was the only solution that could meet this threshold; the other processors, which ran standard BERT models, could not. These results elevate CPUs for real-time inference applications, making them a compelling alternative to the costly and complex GPUs that have long dominated the space.
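For interactive workloads, what must stay under a budget like 10 ms is the tail of the per-query latency distribution, not its mean. A minimal sketch of checking a p99 latency against such a budget follows; the function names and threshold constant are illustrative assumptions.

```python
import time

REALTIME_BUDGET_MS = 10.0  # the often-cited per-query budget for conversational AI

def p99_latency_ms(infer, query, n_samples=200):
    """Sample per-query latency and return the 99th percentile in milliseconds."""
    samples = []
    for _ in range(n_samples):
        t0 = time.perf_counter()
        infer(query)
        samples.append((time.perf_counter() - t0) * 1000.0)
    samples.sort()
    return samples[min(n_samples - 1, int(n_samples * 0.99))]

def meets_realtime_budget(infer, query):
    """True if the tail latency fits the real-time budget."""
    return p99_latency_ms(infer, query) < REALTIME_BUDGET_MS
```

A system whose average latency is 8 ms but whose p99 is 30 ms would fail this check, which is why tail percentiles are the usual yardstick for real-time claims.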


Accelerating high volume document processing

We also demonstrated dramatic acceleration on longer text sequences, which are essential for NLP applications that analyze large collections of documents and must understand their full context. Longer sequences typically run into memory-bandwidth limitations, but on the new Intel Xeon CPU Max Series, our optimized BERT-Large model achieved a 20x throughput speedup at a sequence length of 512.
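A 512-token window covers only part of a long document, so long-document pipelines commonly slide overlapping windows across the token stream so context carries over window boundaries. A minimal sketch of that chunking step is below; the function name and stride choice are illustrative assumptions, not Numenta’s pipeline.

```python
def window_tokens(token_ids, max_len=512, stride=128):
    """Split a long token sequence into overlapping windows of at most
    max_len tokens, advancing by (max_len - stride) each step so that
    `stride` tokens of context are shared between adjacent windows."""
    if stride >= max_len:
        raise ValueError("stride must be smaller than max_len")
    step = max_len - stride
    windows = []
    for start in range(0, len(token_ids), step):
        windows.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break                    # last window already reaches the end
    return windows
```

Each window is then run through the model independently, which is why per-window throughput at length 512 directly bounds how fast a large document collection can be processed.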


“Numenta and Intel are collaborating to deliver substantial performance gains to Numenta’s AI solutions through the Intel Xeon CPU Max Series and 4th Gen Intel Xeon Scalable processors. We’re excited to work together to unlock significant throughput performance accelerations for previously bandwidth-bound or latency-bound AI applications such as Conversational AI and large document processing.”

Scott Clark, Vice President and General Manager of AI and HPC Application Level Engineering, Intel (from Numenta’s press release, January 10, 2023)

We’re excited to demonstrate these initial examples of how we’re applying our brain-based AI technology to deep learning networks, and we look forward to working with Intel to uncover even more opportunities.


Interested in getting results like these? Apply to our Private Beta Program

While our products are in beta, we are engaging with a handful of customers to create high-performance, cost-effective deep learning networks. If your company is looking to accelerate throughput, improve accuracy, reduce latency, or optimize price-performance of deep learning models in NLP and Computer Vision applications, we want to hear from you. Get early access to Numenta’s AI technology today.


Apply to our Private Beta Program >>


Learn more

There are several additional resources on the Numenta and Intel websites if you’d like to learn more about our results:

  • Press Release: Numenta Achieves 123x Inference Performance Improvement for BERT Transformers on Intel Xeon Processor Family
  • Blog: A New Performance Standard for BERT Transformers with Numenta + Intel
  • Case Study: Numenta + Intel achieve 123x inference performance improvement for BERT Transformers
  • Intel Developer page: Intel AI Platform Overview
  • Xeon Series Product Brief: Intel® Xeon® CPU Max Series Product Brief

Thank you for your continued interest in Numenta. Follow us on LinkedIn to make sure you don’t miss any updates.

Christy Maver
VP of Marketing

