With our neuroscience-based optimization techniques, we shift the model-accuracy scaling laws so that, at a fixed cost or a given performance level, our models achieve higher accuracy than their standard counterparts.
Salad is a global cloud platform that harnesses latent compute resources from idle, high-end consumer hardware to power and distribute computing applications more affordably than traditional data centers. Salad and Numenta partnered to improve price-performance for AI inference.
Scaling up deep learning models without breaking the bank
Deploying deep learning systems today can be costly and complex. Models are becoming larger, as we see with large language models (LLMs) moving from millions to billions of parameters. Additionally, the reliance on highly available processing resources pushes many teams to deploy their networks to the public cloud, bringing restrictive technical requirements, expensive model development resources, and rising cloud spend. New methods are needed to optimize and scale these models on specialized hardware.
Deploying Numenta’s AI Inference Server on Salad Container Engine
Using hardware-aware optimizations and neuroscience-based acceleration techniques, we created an optimized BERT-Base model and deployed it on the Salad Container Engine (SCE), a fully managed orchestration platform built to facilitate container deployments on Salad’s distributed cloud. To assess the price-performance benefits of Numenta technology on SCE, we benchmarked our optimized BERT-Base model against a standard BERT-Base on four different Amazon Web Services (AWS) configurations and SCE.
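A throughput benchmark like the one described above can be sketched with a simple timing harness. This is an illustrative sketch only, not Numenta's actual benchmarking code; the `fake_model` stand-in is hypothetical and would be replaced by a real BERT-Base forward pass on the target instance.

```python
import time

def benchmark_throughput(infer_fn, batch, n_warmup=3, n_iters=10):
    """Measure inference throughput (inferences/second) for a model call.

    infer_fn: callable that runs one forward pass over `batch`.
    batch: the list of inputs processed per call.
    """
    for _ in range(n_warmup):           # warm-up passes: stabilize caches/JIT
        infer_fn(batch)
    start = time.perf_counter()
    for _ in range(n_iters):            # timed passes
        infer_fn(batch)
    elapsed = time.perf_counter() - start
    return (n_iters * len(batch)) / elapsed

# Stand-in "model" for illustration; swap in a real BERT-Base inference call.
fake_model = lambda batch: [len(x) for x in batch]
throughput = benchmark_throughput(fake_model, ["example text"] * 32)
```

Running the same harness with the optimized and the standard model on each configuration yields directly comparable inferences-per-second numbers.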
10x more inferences per dollar
Our optimized BERT-Base model delivered higher inference throughput than a standard BERT-Base on every AWS instance we tested. Deployed on Salad's infrastructure, it achieved a 10x price-performance improvement over a standard BERT-Base running on AWS.
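Price-performance here reduces to a simple ratio: inferences per dollar = (inferences per second × 3600 s/h) ÷ (cost per hour). The numbers below are purely hypothetical stand-ins chosen to illustrate how a 10x gap arises from both higher throughput and a lower hourly rate; they are not the measured benchmark figures.

```python
def inferences_per_dollar(throughput_per_sec, hourly_cost_usd):
    """Inferences per dollar = (inferences/sec * 3600 s/h) / ($/h)."""
    return throughput_per_sec * 3600 / hourly_cost_usd

# Hypothetical illustration only (not the benchmark's actual numbers):
baseline  = inferences_per_dollar(100, 1.00)   # standard model on a pricier instance
optimized = inferences_per_dollar(250, 0.25)   # faster model on cheaper compute
ratio = optimized / baseline                   # 2.5x throughput * 4x cheaper = 10x
```

The multiplicative structure is the point: a modest speedup combined with lower on-demand pricing compounds into a large inferences-per-dollar gain.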
Cost savings and performance speed-ups that enable AI deployment at scale
The combination of Numenta + SCE allows users interested in deploying deep learning models to benefit from the best of both worlds: performance improvements from Numenta’s optimized models and more affordable on-demand cloud service provider pricing from Salad.
- Do more with your existing budget by running 10x more inferences per dollar
- Run existing workloads at 10% of your current cost
- Enable users who previously couldn’t afford to run deep learning models to do so
Interested in working with us?