Over the last few years, there’s been an increasing amount of talk around the neural processing unit, or NPU. While NPUs have shipped in smartphones for a few years, Intel, AMD, and most recently Microsoft have fielded AI-enabled consumer laptops and PCs featuring NPUs.
NPUs are closely tied to the related concept of an AI PC and are found on-die inside a growing number of chips made by major hardware manufacturers such as AMD, Apple, Intel, and Qualcomm. They’ve begun to show up more often in laptops, especially since Microsoft launched its Copilot+ AI PC products earlier this year.
What Does an NPU Do?
The job of an NPU is to act as a hardware accelerator for artificial intelligence. Hardware acceleration is the use of dedicated silicon to manage a specific task, like a head chef delegating different tasks to sous chefs as they all work together to prepare a meal on time. NPUs won’t replace your CPU or GPU; instead, NPUs are designed to complement the strengths of CPUs and GPUs, handling workloads like edge AI so that the CPU and GPU can reserve processing time for the tasks at which they excel.
GPUs are specialized hardware accelerators designed for rendering graphics but with enough underlying flexibility to also be great for AI or certain types of scientific calculations. For the longest time, if you had an AI workload you wanted to process, you’d expect to do the actual number crunching with one or more high-powered GPUs, most likely from Nvidia. Some companies are working on building specialized hardware accelerators specifically for AI, like Google’s TPU, because the additional graphics capabilities that put the “G” in “GPU” aren’t useful in a card purely intended for AI processing.
It’s About the Workload
Hardware acceleration is most useful in repetitive tasks that don’t involve a whole lot of conditional branching, especially when there’s a large amount of data. For example, rendering 3D graphics requires a computer to manage an ongoing stream of a zillion particles and polygons. It’s a bandwidth-heavy task, but the actual computation is (mostly) trigonometry. Computer graphics, physics and astronomy calculations, and large language models (LLMs) like the ones that power modern AI chatbots are a few examples of ideal workloads for hardware acceleration.
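To make that concrete, here is a rough, framework-agnostic sketch in Python/NumPy (the layer sizes are made up for illustration): a single dense layer of a neural network boils down to one enormous matrix multiply plus a bias, with no conditional branching anywhere in the hot path.

```python
# Illustrative sketch only, not tied to any particular NPU or framework.
# A dense neural-network layer is just one large matrix multiply plus a
# bias add: the same multiply-accumulate repeated hundreds of millions of
# times, which is exactly the kind of work dedicated accelerators target.
import numpy as np

batch = 32  # process 32 inputs at once
inputs = np.random.rand(batch, 4096).astype(np.float32)   # activations
weights = np.random.rand(4096, 4096).astype(np.float32)   # layer weights
bias = np.random.rand(4096).astype(np.float32)

# Roughly 32 * 4096 * 4096 ≈ 537 million multiply-accumulates in one call.
outputs = np.maximum(inputs @ weights + bias, 0.0)  # matmul + bias + ReLU
print(outputs.shape)  # (32, 4096)
```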
There are two types of AI workloads: training and inference. Training is done almost exclusively on GPUs. Nvidia has leveraged its nearly two-decade investment in CUDA and its leadership position in discrete GPUs to dominate both markets, though AMD has emerged as a distant second. Large-scale training happens in data centers, as do the inference workloads that run when you communicate with a cloud-based service like ChatGPT.
NPUs (and the AI PCs they are attached to) operate at a vastly smaller scale. They complement the integrated GPU inside microprocessors from your favorite flavor of CPU vendor, offering additional flexibility for future AI workloads and potentially better performance than waiting on a round trip to the cloud.
How Do NPUs Work?
In general, NPUs rely on a highly parallel design to do repetitive tasks very quickly. By comparison, CPUs are generalists. This difference is reflected in an NPU’s logical and physical architecture. Where a CPU has one or more cores with access to a handful of shared memory caches, an NPU features multiple subunits that each have their own tiny cache. NPUs are good for high-throughput and highly parallel workloads like neural nets and machine learning.
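As a loose mental model only (not a description of any vendor’s actual microarchitecture), you can picture the work being carved into independent tiles, each small enough for a subunit to hold in its own local buffer:

```python
# Conceptual sketch: real NPU designs differ by vendor. The output matrix
# is split into tiles, and each tile is an independent unit of work that a
# subunit could compute from operands held in a small local buffer, with no
# coordination needed between subunits.
import numpy as np

TILE = 64
A = np.random.rand(256, 256).astype(np.float32)
B = np.random.rand(256, 256).astype(np.float32)
C = np.zeros((256, 256), dtype=np.float32)

for i in range(0, 256, TILE):
    for j in range(0, 256, TILE):
        # Copy just the slices this tile needs (stand-ins for a subunit's
        # tiny cache), then multiply-accumulate them independently.
        a_local = A[i:i+TILE, :]
        b_local = B[:, j:j+TILE]
        C[i:i+TILE, j:j+TILE] = a_local @ b_local

assert np.allclose(C, A @ B)  # the tiled result matches the full matmul
```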
NPUs, neural nets, and neuromorphic systems like Intel’s Loihi platform all share a common design goal: to emulate some aspect of the brain’s information processing.
Each device manufacturer bringing an NPU to market has its own microarchitecture specific to its products, and most have also released software development tools to go with their NPUs. For example, AMD offers the Ryzen AI Software stack, and Intel maintains OpenVINO, its open-source deep learning toolkit.
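As a rough illustration of what targeting an NPU through one of these stacks can look like, here is a hedged OpenVINO-style sketch in Python. The model path is a placeholder, and the “NPU” device name assumes a recent OpenVINO release on a machine with the NPU plugin installed.

```python
# Hedged sketch, not a drop-in recipe: assumes OpenVINO 2023.1 or newer,
# an available NPU plugin, and a hypothetical pre-converted IR model
# ("model.xml" plus its "model.bin" weights) sitting next to the script.
import numpy as np
import openvino as ov

core = ov.Core()
print(core.available_devices)  # e.g. ['CPU', 'GPU', 'NPU'] on a Core Ultra laptop

# Prefer the NPU when it's present, otherwise fall back to the CPU.
device = "NPU" if "NPU" in core.available_devices else "CPU"

model = core.read_model("model.xml")  # placeholder model file
compiled = core.compile_model(model, device_name=device)

# Feed a zero tensor shaped like the model's first input, just to show the flow.
request = compiled.create_infer_request()
dummy = np.zeros(list(compiled.input(0).shape), dtype=np.float32)
results = request.infer({0: dummy})
```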
NPUs and Edge Intelligence
Most NPUs are in consumer-facing devices like phones, laptops, and PCs. For example, Qualcomm’s Hexagon DSP adds NPU acceleration to its Snapdragon processors, which are used in smartphones, tablets, wearables, advanced driver-assistance systems, and the Internet of Things. Apple uses its Neural Engine NPU in the A-series and M-series chips that power iPhones, iPads, and Macs. In addition, some PCs and laptops are designated Copilot+, meaning they can run Microsoft’s Copilot AI on an onboard NPU. However, some server-side or cloud-based systems also make use of NPUs: Google’s Tensor Processing Units are accelerators designed for high-performance machine learning in data centers.
One reason for the ascent of the NPU is the growing importance of edge intelligence. Between sensor networks, mobile devices (like phones and laptops), and the Internet of Things, the demand for on-device data wrangling keeps growing. At the same time, cloud-based services are beholden to infrastructure latency. Local processing, by contrast, doesn’t have to send anything to the cloud at all, which can be an advantage in both speed and security.
The question of whether you need an NPU is almost a red herring. Silicon Valley juggernauts like Intel, AMD, and Apple have already invested in the technology. Whether or not you have a specific use for an NPU, chances are good that the next time you build or buy a PC, the chip you choose will have one on it. Analysts expect that by the end of 2026, 100% of US enterprise PC purchases will have one or more NPUs baked right into the silicon. In other words, don’t worry about running out to buy a system with an NPU. They’ll come to you.