ExtremeTech Explains: How Does ChatGPT Work?

Products You May Like

How do you use ChatGPT? What can it do? We explain what ChatGPT is, how it works, and how to make it work for you. Let’s start with the basics and go deeper from there.

What Is ChatGPT?

ChatGPT is an artificial intelligence chatbot based on OpenAI’s foundational GPT-4 large language model. It parses natural-language prompts from the user and generates a relevant response based on the data used to train the AI. The ChatGPT backend software is written in Python, and it runs on the Microsoft Azure supercomputer powered by thousands of Nvidia A100 and H100 GPUs.

What Does the ‘GPT’ Stand For?

The GPT in ChatGPT’s name stands for generative pre-trained transformer. A generative AI is a type of multimodal AI system that generates text, images, or other media in response to prompts from the user. In machine learning, a transformer is a deep learning model that can begin to discern what parts of its input are essential. Transformers take a given data set, perform an operation on it, and then return the result.

OpenAI used a semi-supervised approach to pre-train the GPT models that power ChatGPT. In the first unsupervised stage, ChatGPT’s programmers loosed the model on their training data sets, prompting it to form its own assumptions about the structure of languages. (Unsupervised learning is critical to ChatGPT’s flexibility because it lets the model assess the data it’s received.) Then, the programmers fine-tuned the model, correcting its broad-brush assumptions using filtered, structured, and labeled data that was content-moderated by humans.

What Can ChatGPT Do?

As a generative, multimodal AI, ChatGPT specializes in generating content in response to user prompts. ChatGPT and other models of this type can produce blocks of text, create images and video, and even write music. Mobile users can take advantage of the microphone in their phone or tablet, using verbal queries or speech-to-text. ChatGPT Plus subscribers can upload images to the service to inform its process and output. Unlimited access to OpenAI’s DALL-E image creator is also included in the paid ChatGPT service tiers.

Once you learn its internal rules, the service is straightforward, and there’s a surprising wealth of different ways to use it that are “off the beaten path.” We’ve seen success stories from folks who used ChatGPT to help them plan meals and vacations, host trivia games, write code and Excel formulas, and even beat depression. And it looks like the GPT model family’s powers are only increasing. OpenAI has been using their GPT 4.0 model to do mathematical reasoning with natural language inputs, training the model on questions like “Simplify ‘tan 100 + 4(sin 100).'”

Credit: OpenAI

OpenAI offers a fair number of third-party bells and whistles for (paid) ChatGPT Plus users. There are now more than a hundred extensions for ChatGPT from providers like Instacart, OpenTable, and Zillow. You can install as many as you like, but you can only have three enabled at a time.

Still more powerful are the APIs by which developers can interface their projects with OpenAI’s GPT3/3.5/4+ backend. These tools aren’t free; developers pay per query, although it’s a fraction of a cent per thousand input tokens. However, OpenAI’s various ChatGPT APIs can interface with external APIs, allowing external services to perform sophisticated queries and function calls.

Finally, OpenAI’s GPT Store hosts millions of custom GPTs, which free and paid users can browse and use. These special-purpose mini models allow users to apply the full power of GPT-3+ to their own specific tasks. If you’ve ever wanted to build your own AI, you can use ChatGPT to do it.

How Does ChatGPT Work?

ChatGPT’s unique powers come from its roots as a transformer. Before tools like ChatGPT began to use natural language, they were already terrific at algorithmically transforming and upscaling images and video. Images can be represented as an array of pixels, where each pixel has its associated values within the colorspace—and videos are a series of images, sometimes with audio waveforms attached. You can give a transformer an image and tell it, “Do X operation to every pixel,” and what it returns can be a drastic improvement on the source material. Transformers can even analyze and change the motion of elements in a video, sussing out the movement vectors based on which pixels change between frames.

Before we go any further, you’ll want to know what a token is if you didn’t already: A token, in machine learning, is a subordinate element of a sentence, phrase, clause, paragraph, or other input form (like elements of an image, or motifs within a piece of music). According to OpenAI, there are about 1,000 tokens in 750 words. The path (sequence of words or tokens) a transformer takes to get to its output is informed by how much attention its model thinks it should pay to various tokens and areas of the map.

Tokens, in this example text, represented as a multidimensional vector

Credit: OpenAI

Under the hood, these models use vectors (which—remember?—have both magnitude and direction) to navigate language as a kind of conceptual landscape, where attributes of language represent the Cartesian coordinate system, and the relative importance of each token informs the topographical height of features in this semantic landscape. A sentence, or a group of tokens, has a net vector. The model chooses its path through that landscape, working in terms of vectors, using an algorithm called gradient descent.

Suppose you put a marble on an uneven surface; it would roll along the path of least resistance to find its lowest-energy resting place. Gradient descent picks a place on that 3.5-billion-dimensional landscape and puts down a metaphorical marble. The AI’s output, be it text or image or multimedia, is analogous to the marble’s path.

To produce its fluent answers, ChatGPT has to ask itself, “What comes next in this text string?” To solve the problem with gradient descent, ChatGPT picks a starting point on the semantic landscape, a point that corresponds to the language attributes of the starting query. The model decides on the desirability of its options based on which place on the semantic map has the greatest probability of coming next in the model’s process. Then, it takes the highest-value path to where it thinks the user wants to go.

Transformers can manage what they pay attention to—in effect, they can cordon off parts of the map, depending on what they are programmed to do. More important tokens get preferential attention, and the model decides which tokens are important by looking at a comparator set of natural-language exchanges between humans. For ChatGPT, one important set is the Common Crawl, which indexes a vast number of websites, including Wikipedia, Reddit, StackExchange, and GitHub. Another is the official Ubuntu help forums, which contain more than a million language exchanges between human beings.

How to Access ChatGPT

You can find the vanilla ChatGPT tool at chat.openai.com. Access is free, but a paid subscription does exist. Both require you to create an account. The free tier offers users “unlimited messages, interactions, and history” with its GPT-3.5 model. Right now, the paid personal subscription is $20 per month, and for that outlay, you get access to ChatGPT-4, with additional tools for browsing, media generation, and data analysis. ChatGPT Plus subscribers can also use OpenAI’s suite of tools to create their own purpose-built GPTs, although per OpenAI’s terms of use, commercial use of those GPTs requires a commercial or enterprise-level subscription.

OpenAI has released an official ChatGPT app for Android and iOS, available to both free users and ChatGPT Plus subscribers. It offers the same features as the website, with an added perk: OpenAI’s open-source Whisper speech recognition tool allows mobile users to speak their queries rather than typing them out.

Credit: Emiliano Vittoriosi | Unsplash

Microsoft has poured billions of dollars into ChatGPT and integrated it into Bing Chat and SwiftKey, a recently acquired third-party keyboard app. Several “front-end” services are powered by the GPT-3+ models, including DALL-E and DALL-E2, the latter behind Bing Image Creator.

One caveat: ChatGPT can only handle so many users simultaneously. More than 100 million people have signed up for the service, most of whom are in the US. This means that, like with game servers, peak use periods correspond to the times people are most likely to use the service. If too many try to engage with ChatGPT simultaneously, queue times go up, and its servers can overload.

How to Use ChatGPT

Once you’ve found your chosen flavor of ChatGPT, it’s time to start. To engage with ChatGPT, you need to give the AI a prompt. You don’t need to know how to code; you can write your prompt in regular conversational language. Some examples:

Draw a picture of a red fox sitting happily in a pile of autumn leaves.

In three paragraphs, summarize the reasons for the Federal Reserve’s last three interest rate adjustments.

Give me an outline for an 800-word blog post that gives an overview of n-type versus p-type semiconductors.

Credit: ExtremeTech

You can refine your results by changing the prompt or telling the AI to refer to a given resource, such as .gov domains, StackExchange, or Wikipedia. More detail in your prompt helps the AI provide a better response. Trial and error is your friend here.

For those who want to delve more deeply into artistic experimentation with GPT-type neural nets, OpenAI also built DALL-E 3, an ‘AI art generator’ based on ChatGPT. DALL-E 3 can parse natural-language descriptions and use them to create art of many styles, from abstract to clip art to photorealistic.

What Are the Drawbacks of Using ChatGPT?

OpenAI bills its GPT-4 model as capable of humanlike performance, able to “see, hear and speak,” but clothes don’t make a man. AI has no sense of context and can only do what it’s programmed to do, which starts to show when people test its limits.

One of the biggest issues with ChatGPT and all similar programs is called hallucination. The ability to hallucinate is key to ChatGPT’s abilities but is also a critical weakness. When an AI hallucinates, it produces output with a known form but mismatched content. Generative AIs, like ChatGPT, use this ability to respond to prompts, creating novel content that’s essentially a sophisticated, blenderized remix of what the AI has seen as it trains and learns.

The problem is that ChatGPT isn’t always as wise as it is powerful. AI hallucinating detail can change what characters look like or distort videos and images in unusual ways. While ChatGPT can tell jokes, it can’t make up its own; research has shown it tends to return to just a couple dozen, only making simple variations on the theme. More seriously, a recipe chatbot based on GPT 3.5 recently made headlines by suggesting delights such as a “Poison Bread Sandwich,” “Thermite Salad,” “Bleach-Infused Rice Surprise” and an “Aromatic water mix” containing bleach, water and ammonia. ChatGPT-based tools such as GitHub Copilot can write a functional computer program—even one that includes backdoors or hidden malicious code.

ChatGPT also struggles with accuracy. Any service or tool that scrapes the Web is vulnerable to misinformation, and ChatGPT is no exception, although GPT-4 is less likely to confabulate than GPT-3. For example, in September 2023, a deprecated model of GPT-3 confidently told Quora that eggs can melt. (They can’t.) AIs powered by large language models, like ChatGPT and its kin, have an annoying tendency to create textual output that is grammatically coherent but factually wrong.

And then there are the legal problems: Concerns related to copyrighted material being ingested into image and video tools have created headaches across the legal landscape. “Fair use” isn’t necessarily the same for commercial use as it is for personal or academic purposes. Getty (of Getty Images) and the New York Times have filed suit against OpenAI for copyright infringement, alleging that OpenAI trained ChatGPT on their work and is profiting from their respective intellectual property. Major scientific journals, such as Science and Nature, have banned or sharply restricted AI-generated content.

In practice, this all means that when it comes to the heavy-duty stuff, ChatGPT isn’t ready to have the training wheels off quite yet. But the number of services powered by ChatGPT in its various incarnations will only grow over the next few years. Microsoft intends to build AI into Windows through multiple services and applications, including products like CoPilot and its Bing web browser. As AI tools gain traction and popularity, we’ll move along the hype curve, and the technology will find its place. Until then—we’ll be here to demystify ChatGPT and other new technologies, one how-to at a time.

How did we do with this explainer? Did we miss a question you wanted to ask? We do read the comments, so please leave your feedback below. Thanks for reading!

View original source here.