With our neuroscience-based optimization techniques, we shift the accuracy scaling laws of deep learning models: at a fixed cost, or at a given performance level, our models achieve higher accuracy than their standard counterparts.
Xilinx™, an AMD company, is a technology and semiconductor company and a leading supplier of programmable logic devices. Known for inventing the field-programmable gate array (FPGA), Xilinx provides adaptable, accelerated computing that can be deployed at global scale and respond to dynamic needs.
Overcoming performance problems in deep learning without increasing energy consumption
Deep learning networks have accomplished a great deal, but they are hitting bottlenecks as they scale to more complex tasks and larger models. Attempts to break through the performance bottlenecks of today's machine learning techniques typically rely on adding more compute power and more data. The result is enormous models that consume vast amounts of power, limiting scalability and causing environmental harm.
We need a new approach to achieve significant breakthroughs in performance and scalability while reducing power consumption on today’s hardware.
Brain-inspired, optimized networks on FPGAs yield multiplicative throughput improvements
In contrast to the standard dense representations used in most deep learning networks, we created networks that borrow several aspects of the brain’s efficient structure. These brain-inspired, optimized networks not only deliver equivalent accuracy to their standard counterparts, they drastically reduce computational requirements and can run on today’s hardware.
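To see why borrowing the brain's sparse structure cuts computational requirements, consider a single layer's matrix-vector product. The following NumPy sketch uses an illustrative 90% weight sparsity (a hypothetical figure, not Numenta's published configuration) to show how the number of useful multiply-accumulate (MAC) operations shrinks when most weights are zero, a saving that hardware such as FPGAs can exploit directly by skipping zeros:

```python
import numpy as np

rng = np.random.default_rng(0)

# Dense layer: every weight participates in every multiply-accumulate.
dense_w = rng.standard_normal((256, 256))

# Sparse layer: zero out ~90% of the weights (illustrative sparsity level).
mask = rng.random((256, 256)) < 0.1
sparse_w = dense_w * mask

# The layer computes the same kind of matrix-vector product either way;
# the saving comes from never performing the multiplies by zero.
x = rng.standard_normal(256)
y = sparse_w @ x  # same output shape as the dense layer

dense_macs = dense_w.size
sparse_macs = int(np.count_nonzero(sparse_w))

print(f"dense MACs:  {dense_macs}")
print(f"sparse MACs: {sparse_macs}  (~{dense_macs / sparse_macs:.0f}x fewer)")
```

On a CPU this saving is only realized with sparse kernels, but on an FPGA the zero weights can simply never be wired in, which is what makes the platform a natural fit for these networks.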
We demonstrated these performance improvements on inference tasks using the Google Speech Commands (GSC) dataset. We created optimized networks on two off-the-shelf Xilinx products:
- Alveo™ U250 – a powerful platform designed for datacenters
- Zynq™ UltraScale+ ZU3EG – a smaller platform designed for embedded applications
100x throughput speedup and power improvement, and new possibilities for deep learning at the edge
Our optimized networks delivered over 100x improvements in throughput and power over their traditional counterparts on the large FPGA platform. Additionally, our optimized network ran efficiently even on the smaller embedded platform, where the standard network could not fit at all, opening new possibilities for edge AI.
Better resource utilization, untapped edge opportunities and critical energy savings
This dramatic speed improvement enables:
- Implementing much larger networks on the same resources
- Running more copies of a network on the same resources
- Deploying networks on edge platforms where traditional networks don't fit
- Massive energy savings and lower costs through scaling efficiencies