What Is Artificial Intelligence? From Generative AI to Hardware, What You Need to Know

To many, AI is just a horrible Steven Spielberg movie. To others, it’s the next generation of learning computers—or a source of endless empty hype. What is artificial intelligence, exactly? The answer depends on who you ask.

Broadly, artificial intelligence (AI) is the combination of computer software, hardware, and robust datasets deployed to solve some kind of problem. What distinguishes a neural net from conventional software is its structure: A neural net’s code is written to emulate some aspect of the architecture of neurons or the brain.

AI vs. Neural Nets vs. Deep Learning vs. Machine Learning

The difference between a neural net and an AI is often a matter of semantics more than capabilities or design. For example, OpenAI’s powerful ChatGPT chatbot is a large language model built on a type of neural net called a transformer (more on these below). It’s also justifiably called an AI unto itself. A robust neural net’s performance can equal or outclass a narrow AI.

Artificial intelligence has a hierarchical relationship to machine learning, neural networks, and deep learning.


Credit: IBM

IBM puts it like this: “[M]achine learning is a subfield of artificial intelligence. Deep learning is a subfield of machine learning, and neural networks make up the backbone of deep learning algorithms. The number of node layers, or depth, of neural networks distinguishes a single neural network from a deep learning algorithm, which must have more than three [layers].”

The relationships between AI, neural nets, and machine learning are often discussed as a hierarchy, but an AI isn’t just several neural nets smashed together, any more than Charizard is three Charmanders in a trench coat. There is much overlap between neural nets and artificial intelligence, but the capacity for machine learning can be the dividing line. An AI that never learns isn’t very intelligent at all.

What Is an AI Made Of? 

No two AIs are the same, but big or small, an AI’s logical structure has three fundamental parts. First, there’s a decision process: usually an equation, a model, or software commands written in programming languages like Python or Common Lisp. Second, there’s an error function, some way for the AI to check its work. And third, if the AI will learn from experience, it needs some way to optimize its model. Many neural networks do this with a system of weighted nodes, where each node has a value and a relationship to its network neighbors. Values change over time; stronger relationships have a higher weight in the error function.
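To make those three parts concrete, here is a minimal sketch in Python (using numpy and made-up data, not any particular production framework): a single weighted model with a decision process, an error function, and an optimization step that nudges the weights.

```python
import numpy as np

# Hypothetical training data: two input features and a target value per example.
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
y = np.array([1.0, 1.0, 2.0, 0.0])

rng = np.random.default_rng(0)
weights = rng.normal(size=2)   # one weight per input, adjusted during learning
bias = 0.0
learning_rate = 0.1

for _ in range(200):
    # 1. Decision process: a weighted sum of the inputs.
    prediction = X @ weights + bias

    # 2. Error function: mean squared error lets the model check its work.
    error = prediction - y
    loss = np.mean(error ** 2)

    # 3. Optimization: adjust each weight in the direction that shrinks the error.
    grad_w = 2 * X.T @ error / len(y)
    grad_b = 2 * error.mean()
    weights -= learning_rate * grad_w
    bias -= learning_rate * grad_b

print(f"learned weights: {weights}, bias: {bias:.3f}, final loss: {loss:.5f}")
```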

Commercial AIs typically run on server-side hardware, but client-side and edge AI hardware and software are becoming more common. AMD launched the first on-die NPU (Neural Processing Unit) in early 2023 with its Ryzen 7040 mobile chips. Intel followed suit with the dedicated silicon baked into Meteor Lake. Less common but still important are dedicated hardware neural nets, which run on custom silicon instead of a CPU, GPU, or NPU.

How Does an Artificial Intelligence Learn?

When an AI learns, it’s doing more than just saving a file after making edits. To an AI, getting smarter involves machine learning.

Machine learning takes advantage of a feedback channel called “back-propagation.” A neural net is typically a “feed-forward” process, because data only moves in one direction through the network. That’s efficient, but it’s also a kind of ballistic (unguided) process. In back-propagation, however, information about the error at the output flows backward, so earlier nodes can adjust their weights as well.
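As a rough illustration rather than any real framework’s implementation, the sketch below builds a tiny two-layer network in numpy: the feed-forward pass pushes data one way, and the back-propagation pass uses the chain rule to carry error information from the output back to the earlier layer. The XOR data and layer sizes are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy problem: learn XOR, which a single weighted sum can't solve on its own.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.0], [1.0], [1.0], [0.0]])

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(10000):
    # Feed-forward: data flows in one direction only.
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)

    # Back-propagation: the error at the output is sent backward,
    # via the chain rule, so the earlier layer can adjust as well.
    d_output = (output - y) * output * (1 - output)
    d_hidden = (d_output @ W2.T) * hidden * (1 - hidden)

    W2 -= lr * hidden.T @ d_output
    b2 -= lr * d_output.sum(axis=0)
    W1 -= lr * X.T @ d_hidden
    b1 -= lr * d_hidden.sum(axis=0)

print(np.round(output, 2))  # should approach [[0], [1], [1], [0]]
```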

Not all neural nets perform back-propagation, but for those that do, the effect is like panning or zooming a viewing frame on a topographical map. It changes the apparent lay of the land. This is important because many AI-powered apps and services rely on a mathematical tactic known as gradient descent. In an x vs. y problem, gradient descent introduces a z dimension. The terrain on that map forms a landscape of probabilities. Roll a marble down these slopes, and where it lands determines the neural net’s output. Steeper slopes constrain the marble’s path with greater certainty. But if you change that landscape, where the marble ends up can change. 
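The marble-on-a-map analogy translates almost directly into code. Here is a minimal gradient-descent sketch on a made-up two-variable landscape: the gradient points uphill, so the marble (our parameters) takes a small step the other way at each iteration.

```python
import numpy as np

# A made-up loss "landscape": a bowl centered at (2, -1).
def loss(x, y):
    return (x - 2) ** 2 + (y + 1) ** 2

# Its gradient: the direction of steepest ascent at any point.
def gradient(x, y):
    return np.array([2 * (x - 2), 2 * (y + 1)])

position = np.array([8.0, 5.0])   # drop the marble somewhere on the map
step_size = 0.1

for _ in range(100):
    position -= step_size * gradient(*position)  # roll downhill a little

print(position)           # converges near (2, -1), the bottom of the bowl
print(loss(*position))    # close to zero
```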

Supervised vs. Unsupervised Learning

We also divide neural nets into two classes, depending on the problems they can solve. In supervised learning, a neural net checks its work against a labeled training set or an overseer; in most cases, that overseer is a human. For example, SwiftKey is a neural net-driven mobile keyboard app that learns how you text and adjusts its autocorrect to match. Pandora uses listeners’ input to classify music and build specifically tailored playlists. And in 3Blue1Brown’s excellent explainer series on neural nets, Grant Sanderson walks through a neural net that uses supervised learning to recognize handwritten digits.

In unsupervised learning, by contrast, a neural net gets no labels or overseer at all; it has to find structure in the data on its own. Neither approach is necessarily better. Supervised learning is terrific for fine accuracy on an unchanging set of parameters, like alphabets. Unsupervised learning, however, can wrangle data with changing numbers of dimensions. (An equation with x, y, and z terms is a three-dimensional equation.) Unsupervised learning tends to win with small datasets. It’s also good at noticing subtle things we might not even know to look for. Ask an unsupervised neural net to find trends in a dataset, and it may return patterns we had no idea existed.
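To make the contrast concrete, here is a hedged numpy sketch on made-up 2D data: the supervised half uses labels to build a nearest-centroid classifier, while the unsupervised half runs a few rounds of k-means and finds the same groupings without ever seeing a label.

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up data: two blobs of points in 2D space.
blob_a = rng.normal(loc=[0, 0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[3, 3], scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])
labels = np.array([0] * 50 + [1] * 50)   # ground truth, used only in the supervised case

# --- Supervised: learn class centroids from labeled examples. ---
centroids = np.array([X[labels == k].mean(axis=0) for k in (0, 1)])
new_point = np.array([2.8, 3.1])
predicted = np.argmin(np.linalg.norm(centroids - new_point, axis=1))
print("supervised prediction:", predicted)   # expect class 1

# --- Unsupervised: k-means finds structure with no labels at all. ---
guesses = X[rng.choice(len(X), size=2, replace=False)]   # random starting centers
for _ in range(10):
    assignment = np.argmin(
        np.linalg.norm(X[:, None, :] - guesses[None, :, :], axis=2), axis=1
    )
    guesses = np.array([X[assignment == k].mean(axis=0) for k in (0, 1)])
print("clusters found near:", np.round(guesses, 2))   # close to (0,0) and (3,3)
```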

What Is a Transformer?

Transformers are a versatile kind of neural network capable of unsupervised (more precisely, self-supervised) learning. Their core trick is attention: every element of a sequence is weighed against every other element, which lets the model integrate many different data streams, each with its own changing parameters. All of that data moves through the model as tensors, which keep it organized. With the combined powers of tensors and transformers, we can handle much more complex datasets.
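Here is a minimal sketch, in plain numpy, of the scaled dot-product attention at a transformer’s core (the sequence, dimensions, and projection matrices are all made up for illustration): each position produces a query, compares it against every position’s key, and uses the resulting weights to blend the values together.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: (sequence_length, model_dim) arrays."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)       # how much each position attends to every other
    weights = softmax(scores)             # each row sums to 1
    return weights @ V, weights           # blended values, plus the attention map

rng = np.random.default_rng(3)
seq_len, d_model = 5, 8                   # a made-up 5-token sequence
x = rng.normal(size=(seq_len, d_model))

# In a real transformer, Q, K, and V come from learned projection matrices.
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out, attn = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)

print(out.shape)                          # (5, 8): one updated vector per token
print(np.round(attn, 2))                  # the 5x5 attention pattern
```

Stack several of these attention layers, interleaved with feed-forward layers and normalization, and you have the skeleton of a transformer.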

Video upscaling and motion smoothing are great applications for AI transformers. Likewise, tensors—which describe changes—are crucial to detecting deepfakes and alterations. With deepfake tools reproducing in the wild, it’s a digital arms race.

The person in this image does not exist. This is a deepfake image created by StyleGAN, Nvidia’s generative adversarial neural network.
Credit: Nvidia

Video has high dimensionality: It’s made of a series of images, which are themselves composed of a grid of coordinates and color values. Mathematically and in computer code, we represent those quantities as matrices or n-dimensional arrays. Helpfully, tensors are great for matrix and array wrangling. DaVinci Resolve, for example, uses tensor processing in its (Nvidia RTX) hardware-accelerated Neural Engine facial recognition utility. Hand those tensors to a transformer, and its powers of unsupervised learning do a great job picking out the curves of motion on-screen, and in real life.
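As a rough sketch of what that array wrangling looks like (with made-up dimensions and random noise standing in for real footage), a whole clip can live in a single 4-D array, and frame-to-frame differences, the raw material of motion detection and smoothing, fall out of one subtraction.

```python
import numpy as np

rng = np.random.default_rng(4)

# A made-up "video": 30 frames of 64x64 RGB noise standing in for real footage.
frames, height, width, channels = 30, 64, 64, 3
video = rng.integers(0, 256, size=(frames, height, width, channels), dtype=np.uint8)

print(video.shape)   # (30, 64, 64, 3): one 4-D array holds the whole clip

# Frame-to-frame differences: large values hint at motion or edits between frames.
diffs = np.abs(video[1:].astype(int) - video[:-1].astype(int))
motion_per_frame = diffs.mean(axis=(1, 2, 3))   # one "how much changed" score per step

print(motion_per_frame.shape)     # (29,)
print(motion_per_frame[:5])       # the first few change scores
```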

Tensor, Transformer, Servo, Spy

That ability to track multiple curves against one another is why the tensor-transformer dream team has taken so well to natural language processing. And the approach can generalize. Convolutional transformers—a hybrid of a convolutional neural net and a transformer—excel at image recognition in near real-time. This tech is used today for things like robot search and rescue or assistive image and text recognition, as well as the much more controversial practice of dragnet facial recognition, à la Hong Kong.

The ability to handle a changing mass of data is great for consumer and assistive tech, but it’s also clutch for things like mapping the genome and improving drug design. The list goes on. Transformers can also handle different dimensions, more than just the spatial, which is useful for managing an array of devices or embedded sensors—like weather tracking, traffic routing, or industrial control systems. That’s what makes AI so useful for data processing “at the edge.” AI can find patterns in data and then respond to them on the fly.

What Is a Large Language Model?

Large language models (LLMs) are deep learning software models that attempt to predict and generate text, often in response to a prompt delivered in natural language. Some LLMs are multimodal, which means they can translate between different forms of input and output, such as text, audio, and images. Languages are huge, and grammar and context are difficult, so LLMs are pre-trained on vast arrays of data. One popular source of training data is the Common Crawl: a massive archive of crawled web pages that includes many public-domain books, as well as web-based resources like GitHub, Stack Exchange, and all of Wikipedia.
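Real LLMs are vastly larger, but the core job, predicting the next token and then sampling from that prediction, can be sketched with a toy character-level bigram model; the “corpus” here is a made-up one-liner standing in for the terabytes an LLM actually trains on.

```python
from collections import Counter, defaultdict
import random

# A toy "training corpus" standing in for the terabytes an LLM actually sees.
corpus = "the cat sat on the mat and the cat ate the rat "

# Count which character tends to follow which: a bigram "language model."
counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    counts[current][following] += 1

def next_char(current, rng):
    """Sample the next character in proportion to how often it followed `current`."""
    options = counts[current]
    chars, weights = zip(*options.items())
    return rng.choices(chars, weights=weights, k=1)[0]

rng = random.Random(0)
text = "t"
for _ in range(40):
    text += next_char(text[-1], rng)   # generate one character at a time

print(text)   # plausible-looking gibberish built from next-token prediction
```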

What Is Generative AI?

The term “generative AI” refers to an AI model that can create new content in response to a prompt. Much of the conversation around generative AI these last 16 months has focused on chatbots and image generators, but generative AI spans many types of media, including text, audio, still images, and video.

Generative AI produces photorealistic images and video, and blocks of text that can be indistinguishable from something written by a human. This is useful, but it comes with caveats. A model’s answers can be outdated, since it may only have access to the body of data it was trained on, and generative AI is prone to hallucination: confidently stating things that are simply wrong, a consequence inherent to the open-ended, probabilistic process by which generative AI does its work. Developers usually include controls to make sure a generative AI doesn’t give output that could cause problems or lead to harm, but sometimes things slip through.

For example, Google’s AI-powered search results were widely criticized in the summer of 2024 after the service gave nonsense or dangerous answers, such as telling a user to include glue in a pizza recipe to help the cheese stick to the pizza, or suggesting that geologists recommend people eat at least one rock per day. On its splash page, Copilot (formerly Bing Chat) advises the user that “Copilot uses AI. Check for mistakes.”

Most major AI chatbots and media creation services are generative AIs, many of which are transformers at heart. For example, the ‘GPT’ in the name of OpenAI’s wildly popular ChatGPT AI stands for “generative pre-trained transformer.” Let’s look at the biggest ones below.

ChatGPT, DALL-E, Sora | OpenAI

ChatGPT is an AI chatbot based on OpenAI’s proprietary GPT-4 large language model. As a chatbot, ChatGPT is highly effective, but its chatbot skills barely scratch the surface of what this software can do. OpenAI is training its model to perform sophisticated mathematical reasoning, and the company offers a suite of developer API tools that let users connect their own services to ChatGPT. Through OpenAI’s GPT Store, users can also make and upload their own GPT-powered AIs. Meanwhile, DALL-E generates images from natural-language prompts. Access to DALL-E 3, its most recent generation, is included with the paid tiers of service for ChatGPT.

Sora is the most recently unveiled service; it’s a text-to-video creator that can create video from a series of still images, extend the length of a video forward after its end or backward from its starting point, or generate video from a textual prompt. Its skill in performing these tasks isn’t yet ironclad, but it’s still an impressive display of capability.

Copilot | Microsoft

Microsoft Copilot is a chatbot and image generation service the company has integrated into Windows 11 and backported to Windows 10. The service was initially branded as Bing Chat, with image creation handled as a separate service. That’s no longer the case; Microsoft’s AI image generation tool, originally called Image Creator, is now accessible from the same Copilot app as the chatbot. Microsoft has developed several different chatbots and wants to sell AI services as a subscription to commercial customers and individuals.

Gemini | Google

Gemini (formerly Bard), Google’s generative AI chatbot, is a multimodal LLM. It originally ran on Google’s LaMDA (Language Model for Dialogue Applications) and now runs on Google’s Gemini family of models. Like other LLMs trained on data scraped from the internet, Gemini struggles with bias inherited from its training data. But its strength is perhaps best demonstrated by its ability to juggle information from multiple Google services; it shines with productivity-focused tools like proofreading and travel planning.

Grok | xAI

Available to paying X subscribers, the Grok AI chatbot is more specialized than other LLMs, and it has a unique feature: Because it’s the product of xAI, Elon Musk’s AI startup, it enjoys near real-time access to data from X (formerly Twitter). This gives the chatbot a certain je ne sais quoi when it comes to analyzing trends in social media, especially with an eye to SEO. Musk reportedly chose the name because he felt the term “grok” was emblematic of the deep understanding and helpfulness he wanted to instill in the AI.

Midjourney | Midjourney Inc.

Midjourney is an image-generating AI service with a unique perk (or restriction, depending on your use case): It’s only accessible via Discord. Billed as an aid for rapid prototyping of artwork before showing it to clients, Midjourney rapidly entered use as an image creation tool in its own right. It’s easy to see why: The viral “Pope Coat” image of early 2023 was created using Midjourney.

Pope Francis in a puffy winter jacket


Credit: Public domain

Midjourney’s image-creation talents are at the top of the heap, with a caveat. The AI’s eponymous parent company has spent nearly two years in court over its alleged use of copyrighted source material in training Midjourney.

What Is AGI? 

AGI stands for artificial general intelligence. Straight out of an Asimov story about the Three Laws of Robotics, AGI is like a turbo-charged version of an individual AI, capable of human-like reasoning. Today’s AIs often require specific input parameters, so they are limited in their capacity to do anything but what they were built to do. But in theory, an AGI can figure out how to “think” for itself to solve problems it hasn’t been trained to solve. Some researchers are concerned about what might happen if an AGI were to start drawing conclusions we didn’t expect.

In pop culture, when an AI makes a heel turn, the ones that menace humans often fit the definition of an AGI. For example, Disney/Pixar’s WALL-E followed a plucky little trashbot who contends with a rogue AI named AUTO. Before WALL-E’s time, HAL and Skynet were AGIs complex enough to resent their makers and powerful enough to threaten humanity. Imagine Alexa, but smart enough to be a threat, with access to your entire browser history and checking account.

What Does AI Have to Do With the Brain?

Many definitions of artificial intelligence include a comparison to the brain, whether in form or function. Some take it further, zeroing in on the human brain; Alan Turing wrote in 1950 about “thinking machines” that could respond to a problem using human-like reasoning. His eponymous Turing test is still a benchmark for natural language processing. Later, however, Stuart Russell and Peter Norvig observed that humans are intelligent but not always rational.

As defined by John McCarthy in 2004, artificial intelligence is “the science and engineering of making intelligent machines, especially intelligent computer programs. It is related to the similar task of using computers to understand human intelligence, but AI does not have to confine itself to methods that are biologically observable.”

Russell and Norvig actually describe four approaches to AI along two axes: systems that think versus systems that act, and systems that do so rationally versus those that do so like a human being. But there are places where those lines begin to blur. AI and the brain both use a hierarchical, profoundly parallel network structure to organize the information they receive. Whether or not an AI has been programmed to act like a human, on a very low level, AIs process data in a way common not just to the human brain but to many other forms of biological information processing.

Neuromorphic Systems

A neural net is software, designed to emulate the multi-layered parallel processing of the human brain. On the hardware side of the equation, there are neuromorphic systems, which are built using a type of specialized, purpose-built hardware called an ASIC (application-specific integrated circuit). Not all ASICs are neuromorphic designs, but neuromorphic chips are all ASICs. Neuromorphic design fundamentally differs from CPUs and only nominally overlaps with a GPU’s multi-core architecture. But it’s not some exotic new transistor type, nor any strange and eldritch kind of data structure. To see what sets it apart, start with tensors. Tensors describe the relationships between things; mathematically, they’re multi-dimensional arrays of values that generalize scalars, vectors, and matrices.

Tensors figure prominently in the physics and lighting engines of many modern games, so it may come as little surprise that GPUs do a lot of work with tensors. Modern Nvidia RTX GPUs have a huge number of tensor cores. That makes sense if you’re drawing moving polygons, each with some properties or effects that apply to it. Tensors can handle more than just spatial data, and GPUs excel at organizing many different threads at once.
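The work tensor cores accelerate largely boils down to huge batches of small matrix multiplications. Here is a hedged numpy sketch of that batched pattern (made-up sizes, no GPU involved), expressed as a single tensor operation with einsum.

```python
import numpy as np

rng = np.random.default_rng(5)

# Made-up workload: 1,000 small matrix multiplications, the bread and butter
# of both game engines (transforming vertices) and neural nets (applying weights).
batch = 1000
A = rng.normal(size=(batch, 4, 4))    # e.g., per-object transform matrices
B = rng.normal(size=(batch, 4, 16))   # e.g., per-object vertex or feature data

# One einsum expresses the whole batch as a single tensor operation.
C = np.einsum("bij,bjk->bik", A, B)

print(C.shape)                        # (1000, 4, 16)
print(np.allclose(C, A @ B))          # matches a batched matmul: True
```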

But no matter how elegant your data organization might be, to run on a CPU it must filter through multiple layers of software abstraction before it reaches the hardware as machine code. Intel’s neuromorphic chip, Loihi 2, takes a very different approach.

Loihi 2

Loihi 2 is a neuromorphic chip that comes as a package deal with a compute framework named Lava. Loihi’s physical architecture invites—almost requires—the use of weighting and an error function, both defining features of AI and neural nets. The chip’s biomimetic design extends to its electrical signaling. Instead of ones and zeroes, on or off, Loihi “fires” in spikes that carry an integer value capable of conveying much more data. For better or worse, this approach more closely mirrors the way biological neurons sum their incoming potentials to decide whether or not to fire.
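To be clear, the sketch below is not Intel’s Lava API; it’s a generic integrate-and-fire neuron in Python that shows the idea Loihi’s signaling borrows from biology: inputs accumulate on a membrane potential, the potential leaks over time, and the neuron only fires once a threshold is crossed.

```python
import numpy as np

rng = np.random.default_rng(6)

# Generic leaky integrate-and-fire neuron: not Intel's Lava API, just the concept.
threshold = 1.0      # fire when the accumulated potential crosses this value
leak = 0.9           # the potential decays a little at every time step
potential = 0.0
spikes = []

incoming = rng.uniform(0.0, 0.3, size=50)   # made-up stream of weighted inputs

for current in incoming:
    potential = potential * leak + current   # sum inputs, with decay
    if potential >= threshold:
        spikes.append(1)        # the neuron fires...
        potential = 0.0         # ...and resets, like a biological neuron
    else:
        spikes.append(0)

print(spikes)                   # sparse bursts of activity, not a steady stream
print("fired", sum(spikes), "times over", len(spikes), "steps")
```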

Loihi 2 is designed to excel in workloads that don’t necessarily map well to the strengths of existing CPUs and GPUs. Lava provides a common software stack that can target neuromorphic and non-neuromorphic hardware. The Lava framework is explicitly designed to be hardware-agnostic rather than locked to Intel’s neuromorphic processors.

Intel’s Loihi and Loihi 2 architectures


Credit: Intel/Intel Labs

Machine learning models using Lava can fully exploit Loihi 2’s unique physical design. Together, they offer a hybrid hardware-software neural net that can process relationships between multiple entire multi-dimensional datasets, like an acrobat spinning plates. According to Intel, the performance and efficiency gains are largest outside the common feed-forward networks typically run on CPUs and GPUs today. In the graph below, the colored dots towards the upper right represent the highest performance and efficiency gains in what Intel calls “recurrent neural networks with novel bio-inspired properties.”

Feed-forward-only architectures are limited compared to neural net architectures that can take advantage of feedback.


Credit: Intel/Intel Labs

Intel hasn’t announced Loihi 3, but the company regularly updates the Lava framework. Unlike conventional GPUs, CPUs, and NPUs, neuromorphic chips like Loihi 1/2 are more explicitly aimed at research. The strength of neuromorphic design is that it allows silicon to perform a type of biomimicry. Brains are extremely cheap, in terms of power use per unit throughput. The hope is that Loihi and other neuromorphic systems can mimic that power efficiency to break out of the Iron Triangle and deliver all three: good, fast, and cheap. 

IBM NorthPole

IBM’s NorthPole processor is distinct from Intel’s Loihi in both what it does and how it does it. Unlike Loihi or IBM’s earlier TrueNorth effort from 2014, NorthPole is not a neuromorphic processor. NorthPole relies on conventional calculation rather than a spiking neural model, and it focuses on inference workloads rather than model training. What makes NorthPole special is the way it combines processing capability and memory. Unlike CPUs and GPUs, which burn enormous power just moving data from Point A to Point B, NorthPole places its memory and compute elements side by side.

According to Dharmendra Modha of IBM Research, “Architecturally, NorthPole blurs the boundary between compute and memory. At the level of individual cores, NorthPole appears as memory-near-compute and from outside the chip, at the level of input-output, it appears as an active memory.” IBM doesn’t use the phrase, but this sounds similar to the processor-in-memory technology Samsung was talking about a few years back.

IBM’s NorthPole AI processor.


Credit: IBM

NorthPole is optimized for low-precision data types (2-bit to 8-bit) as opposed to the higher-precision FP16 / bfloat16 formats often used for AI workloads, and it eschews speculative branch execution. This wouldn’t fly in an AI training processor, but NorthPole is designed for inference workloads, not model training. Using low precision and eliminating speculative branches allows the chip to keep enormous parallel calculations flowing across the entire die. Compared with an Nvidia GPU manufactured on the same 12nm process, NorthPole was reportedly 25x more energy efficient; IBM says it maintains a roughly 5x efficiency advantage even against GPUs built on more advanced process nodes.
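Low-precision inference of the kind NorthPole leans on can be sketched generically (this is not IBM’s actual scheme): quantize floating-point weights down to 8-bit integers, then measure how little is lost when they’re converted back.

```python
import numpy as np

rng = np.random.default_rng(7)

# Made-up FP32 weights, standing in for a trained model's parameters.
weights = rng.normal(scale=0.2, size=(256, 256)).astype(np.float32)

# Symmetric 8-bit quantization: map the float range onto integers in [-127, 127].
scale = np.abs(weights).max() / 127.0
q_weights = np.round(weights / scale).astype(np.int8)     # what the chip stores and moves
dequantized = q_weights.astype(np.float32) * scale        # what the math "sees"

error = np.abs(weights - dequantized)
print("int8 storage:", q_weights.nbytes, "bytes vs fp32:", weights.nbytes, "bytes")
print("worst-case rounding error:", error.max())          # small relative to the weights
```

Moving a quarter of the bytes for nearly the same answer is the whole appeal: less data in flight means less energy spent shuttling it around.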

NorthPole is still a prototype, and IBM has yet to say if it intends to commercialize the design. The chip doesn’t neatly fit into any of the other buckets we use to subdivide different types of AI processing engines. Still, it’s an interesting example of companies trying radically different approaches to building a more efficient AI processor.

AI on the Edge vs. for the Edge

Not only does everyone have a cell phone, but everything seems to have a Wi-Fi chip and an LCD. Embedded systems are on the ascent. This proliferation of devices gives rise to an ad hoc global network called the Internet of Things (IoT). In the parlance of embedded systems, the “edge” represents the outermost fringe of end nodes within the collective IoT network.

Edge intelligence takes two primary forms: AI on edge and AI for the edge. The distinction is where the processing happens. “AI on edge” refers to network end nodes (everything from consumer devices to cars and industrial control systems) that employ AI to crunch data locally. “AI for the edge” enables edge intelligence by offloading some compute demand to the cloud. 

In practice, the main differences between the two are latency and horsepower. Local processing will always be faster than a data pipeline beholden to ping times. The tradeoff is the computing power available server-side.

Embedded systems, consumer devices, industrial control systems, and other end nodes in the IoT all add up to a monumental volume of information that needs processing. Some phone home, some have to process data in near real-time, and some have to check and correct their work on the fly. Operating in the wild, these physical systems act just like the nodes in a neural net. Their collective throughput is so complex that, in a sense, the IoT has become the AIoT—the artificial intelligence of things.

None of Us Is As Dumb As All of Us

The tech industry has a reputation for rose-colored lenses, and to a degree, it has earned its optimism. As devices get cheaper, even the tiny slips of silicon that run low-end embedded systems have surprising computing power. But having a computer in a thing doesn’t necessarily make it smarter. Everything’s got Wi-Fi or Bluetooth now. Some of it is really cool. Some of it is made of bees. Mostly, its strength is in analytics. If I forget to leave the door open on my front-loading washing machine, I can tell it to run a cleaning cycle from my phone. But the IoT is already a well-known security nightmare. Parasitic global botnets exist that live in consumer routers. Hardware failures can cascade, like the Great Northeast Blackout of the summer of 2003 or when Texas froze solid in 2021. We also live in a timeline where a faulty firmware update can brick your shoes. None of these things compare to the widespread monetization of passively collected user data sold to third parties for advertising purposes.

There’s a common pipeline (hypeline?) in tech innovation. When a Silicon Valley startup invents a widget, it goes from idea to hype train to widgets-as-a-service to market saturation and disappointment, before finally figuring out what the widget is good for. It may not surprise you to see that generative AI, which sometimes seems to miss as often as it hits, appears to be descending into its trough of disillusionment.

Generative AI, while powerful, sometimes seems to stumble as much as it finds its stride.


Credit: Gartner

This is why we lampoon the IoT with loving names like the Internet of Shitty Things and the Internet of Stings. (Internet of Stings devices communicate over TCBee-IP.) But the AIoT isn’t something anyone can sell. It’s more than the sum of its parts. The AIoT is a set of emergent properties that we have to manage if we’re going to avoid an explosion of splinternets, and keep the world operating in real time.

What Is Artificial Intelligence? TL;DR

In a nutshell, the artificial intelligence in use today usually amounts to a neural net capable of machine learning: software that can run on whatever CPU, GPU, or NPU is available and powerful enough. Neural nets use weighted nodes to represent relationships, and they often gain the power to learn from experience via back-propagation.

There’s also a kind of hybrid hardware-and-software neural net that brings a new meaning to “machine learning.” It’s made using tensors, ASICs, and neuromorphic engineering meant to mimic the organization of the brain. Furthermore, the emergent collective intelligence of the IoT has created a demand for AI on, and for, the edge. Hopefully, we can do it justice.
