
It seems like every day a new ‘revolutionary’ AI term emerges. In fact, it happens so often that DALL-E invented a new term when I asked it to create the above image (I mean… emborming?!).
Quick post-publishing edit!
In the couple of weeks before publishing this post, the buzz has already shifted to the Model Context Protocol (MCP), which I don't cover here since it wasn't popular when I started writing a few weeks ago!
Even without AI hallucinations, keeping up with the latest AI buzz terms is challenging. More importantly, it’s difficult to determine where the actual substance lies. When you want to start building your first AI application, it’s hard to know which concepts will provide practical knowledge.
That’s where I’d like to help. After taking several online courses, reading hundreds of Reddit threads, and having late-night discussions with my good friends ChatGPT and Claude, I’ve created my own mental model for the world[^1] of AI.
We’ll start by breaking down the most common terms used today. Then, we’ll conclude with recommended learning pathways based on your personal goals and appetite for deeper understanding.
Let’s Speak the Same Language
Speaking the same language is crucial in software engineering, where we’re inundated with overloaded technical terms and the challenges of Naming Things. I’ve seen people argue over designs when they’re actually describing the same approach using different terms, making them think they’re engaged in architectural warfare.
This can also hinder learning, so I’ve created a view of key terms to help beginners understand AI and create focused learning paths. I’ve made some necessary oversimplifications and generalizations to quickly establish fundamentals. My apologies to any experts reading this.
Note that no tier or subset in this image is complete. It covers the key terms I’ve encountered most often related to GenAI (specifically text-to-text generation). Even while writing this initial version, I had to add new key terms as they emerged!
The terms are divided into three categories, with hierarchical relationships showing related topics. While I can’t provide full details for each term here, I’ll offer a brief overview and resources for deeper exploration.
Fields: The Problem Spaces
These are areas where we’re trying to solve difficult problems. They don’t offer solutions but drive what’s being created. How can we create artificial intelligence? How can we extract information from text? How can machines understand language? How can machines identify objects in images?
The concepts in this area are the most widespread, since they don’t involve technical details. These terms are often used so generally that you can’t be entirely sure what someone specifically means. For instance, if someone says “I’d like to do artificial intelligence,” that could mean many different things.
The picture becomes clearer when we focus on generative AI (GenAI). This field generates the most buzz due to regular major breakthroughs. As Wikipedia defines it, GenAI is “a subset of artificial intelligence that uses generative models to produce text, images, videos, or other forms of data.” The ability to generate human-like language or media with minimal human input has made GenAI nearly synonymous with artificial intelligence.
Implementation Methods: The Tools Available to Solve Problems
As a builder, implementation methods are the things you actually write into your code to address problems in the fields. While some of these tools may only be used to solve a specific problem in a field, it is very common for them to be used creatively to address areas outside their original use. Just because a flat-head screwdriver is made for screws doesn’t mean you can’t use it to open a paint can.
The buzziest term you see in this section is definitely Large Language Models (LLMs). Let’s break down the hierarchy above and below this level, so we can understand how LLMs came to be and what is actually used by state-of-the-art LLMs today.
The worlds of Machine Learning (ML) and AI are closely intertwined. At the simplest level, machine learning is a process by which input data is passed through some sort of (usually iterative) algorithm or model. The results of that model may then go through some sort of interpretation process to make them usable in some field. The iterative nature of this process is what gave rise to the term “learning”: typically, the results of the model get better with each iteration, which makes it seem like the machine is learning.
In GenAI, Neural Networks are the primary family of models used. These models mirror how neurons in the brain communicate, hence the name. A neural network consists of an input layer, one or more hidden layers, and an output layer. During training, mathematics applied at each layer adjusts a set of numeric weights representing what the model “learned.” These weights are the model’s “parameters,” which we’ll discuss more later.
Visually, Deep Learning is the simplest step we take down this hierarchy. A neural network can be made deeper by increasing the number of hidden layers - voilà - deep learning! Why don’t all neural networks use deep learning? The aforementioned mathematics between these layers is expensive to compute, especially with datasets containing billions of data points. Each hidden layer helps the model learn more and more, but this is why training GenAI models is associated with high costs.
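To make the “layers doing math” idea concrete, here’s a minimal sketch of a forward pass through a tiny network with two hidden layers. The weights here are completely made up for illustration; a real model learns its weights during training.

```python
def layer(inputs, weights, biases):
    """One fully connected layer with a ReLU activation."""
    return [
        max(0.0, sum(w * x for w, x in zip(ws, inputs)) + b)
        for ws, b in zip(weights, biases)
    ]

# Toy 2-input network: two hidden layers of 3 neurons each,
# then a single output neuron. Adding more hidden layers is
# what makes a network "deep" -- and more expensive to run.
w1 = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
b1 = [0.0, 0.1, -0.1]
w2 = [[0.2, 0.7, -0.5], [0.6, -0.1, 0.3], [0.4, 0.4, 0.1]]
b2 = [0.05, 0.0, 0.2]
w3 = [[0.3, -0.6, 0.9]]
b3 = [0.0]

def forward(x):
    h1 = layer(x, w1, b1)        # hidden layer 1
    h2 = layer(h1, w2, b2)       # hidden layer 2 -- "deeper"
    return layer(h2, w3, b3)[0]  # output layer

print(forward([1.0, 2.0]))
```

Every extra hidden layer multiplies the amount of arithmetic per input, which is exactly why depth drives up training costs.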
Go ahead and pass a large amount of language data into a deep learning model, and you’ve got something that starts to resemble a Large Language Model (LLM)! Note that this text data is actually a numerical representation of words created using embeddings. The LLM uses these numerical representations to understand language.
Once trained, when the LLM receives user input, it uses these numbers to calculate its response. For each word in the reply, the LLM calculates which words have the highest probability of being the “right” word to use next given the context. This repeats, word by word, until a final response is created.
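Here’s a toy sketch of that word-by-word loop. The probability table is entirely invented for illustration; a real LLM computes probabilities over tens of thousands of tokens using its parameters, not a lookup table.

```python
# Made-up next-word probabilities standing in for a trained model.
next_word_probs = {
    "i":     {"love": 0.6, "am": 0.4},
    "love":  {"my": 0.6, "dogs": 0.4},
    "my":    {"dog": 0.7, "cat": 0.3},
    "dog":   {"<end>": 1.0},
    "cat":   {"<end>": 1.0},
    "dogs":  {"<end>": 1.0},
    "am":    {"happy": 1.0},
    "happy": {"<end>": 1.0},
}

def generate(start, max_words=10):
    """Greedily pick the highest-probability next word until <end>."""
    words = [start]
    for _ in range(max_words):
        candidates = next_word_probs[words[-1]]
        best = max(candidates, key=candidates.get)
        if best == "<end>":
            break
        words.append(best)
    return " ".join(words)

print(generate("i"))  # prints "i love my dog"
```

This sketch always picks the single most likely word (greedy decoding); we’ll see later how “temperature” lets the model take more chances.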
Up to this point, we’ve only depicted the general form (or architecture) for a neural network model. The real catalyst in the GenAI revolution has been the implementation of…
Transformer Models! While these are only depicted as a few boxes here, this is just an abstraction - there is deep learning going on inside those boxes. We can’t cover the specifics in detail here (that’s why we have a Learning Path section at the end of this post). At a high-level, though, this model architecture is designed to enable a few key characteristics depending on if the model uses an encoder, decoder, or both:
- Encoder Characteristics:
- Bi-directional - considers the words before and after a word in a sentence
- Self-attention - an attention mechanism used to consider the context of a word in the text. In other words, ‘dog’ would have a different numerical representation in “I love dogs.” than in “I wish he stopped barking at me like a dog.”
- Decoder Characteristics:
- Uni-directional - only considers the words before a word in a sentence
- Masked Self-attention - the numerical representation of a word does not consider words to the ‘right’ (i.e., later in the sentence); those subsequent words are ‘masked’. The direction of masking can also be reversed.
- Auto-regressive - the word it generates can then act as an input back into the decoder
You can see how these attributes would be useful in understanding language. Various LLMs use these to solve different NLP problems.
| Architecture | Use Cases | LLM Examples |
|---|---|---|
| Encoder-only Transformer Models | Tasks that require understanding of the input, such as sentence classification and named entity recognition | BERT, RoBERTa |
| Decoder-only Transformer Models | Tasks such as text generation | GPT, CTRL |
| Encoder-Decoder Transformer Models | Tasks that require an input, such as translation or summarization | T5, mBART, Marian |
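To make the bi-directional vs. uni-directional distinction concrete, here’s a tiny sketch of the attention masks involved. A real transformer applies these masks inside its attention math; the `attention_mask` helper here is purely illustrative.

```python
def attention_mask(n_tokens, causal):
    """Return an n x n mask where mask[i][j] is True when token i may
    attend to token j. Encoders see everything (bi-directional);
    decoders mask out tokens to the 'right' (uni-directional)."""
    return [
        [j <= i if causal else True for j in range(n_tokens)]
        for i in range(n_tokens)
    ]

tokens = ["i", "love", "my", "dog"]
encoder_mask = attention_mask(len(tokens), causal=False)
decoder_mask = attention_mask(len(tokens), causal=True)

# "my" (index 2) may attend to "dog" (index 3) only in the encoder:
print(encoder_mask[2][3])  # True
print(decoder_mask[2][3])  # False
```

The causal mask is what makes decoder-only models natural text generators: each word can only be influenced by the words already written.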
Enhancement Methods: The Approaches to Make Tools more Effective
Now you may wield the tools to start working with GenAI, but as George R.R. Martin once said: “A sword is only as good as the man who wields it”. An out-of-the-box model can accomplish some really cool things, but the ability to make seemingly small tweaks is what will set apart anything you build.
There are many different strategies you can use. We’ll discuss some of the buzziest at a high-level, so you can see which may fit your building use case.
Alter Model Input or Input Interpretation
One of the pillars of machine learning is “garbage in, garbage out,” and GenAI is no different. Altering the input data can have a huge impact on the model’s performance.
- Tokenization: the process of splitting text data into tokens (e.g., a sentence into individual words). Individual words may be split even further. Punctuation also often receives special tokens to help the model know when sentences start or end, or when a question is being asked. There are different strategies for tokenization, including Word-based, Character-based, and Subword Tokenizers.
- Attention: many experts believe that the discovery of the impact of attention on language models is one of the foundational moments that helped us enter this era of GenAI. An attention mechanism is what impacts how the context of each word in your input sentence will be interpreted. In the above example, the tokens “like” and “ing” have more impact on “train” than words earlier in the sentence.
- Prompt Engineering: a prompt is the user input provided to the model that then results in the model generating a response. The process of Prompt Engineering is refining that prompt to get better and better results. This is typically the “lowest hanging fruit” when it comes to improving model outputs. The world is constantly discovering new and interesting ways to interact with the models.
- Embedding Models and Retrieval Augmented Generation (RAG): a RAG architecture helps improve responses and protect against hallucinations (i.e., when a model states something as fact that is wrong or completely made up). With RAG, the model has an additional set of reference data (e.g., company documents, a textbook). This data is turned into a numerical representation via embedding models and stored as vectors in vector databases. Through mathematical computations, the prompt and embeddings can be compared to determine the documents most similar and relevant to the prompt. When generating a response, the model can be told to cite directly from or double-check against the references. This helps ground each response in truth and relevance.
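Here’s a minimal sketch of the retrieval step in RAG, using made-up three-dimensional “embeddings” and cosine similarity. A real embedding model produces vectors with hundreds of dimensions, and production systems use a vector database rather than a plain dict.

```python
import math

def cosine_similarity(a, b):
    """How similar two embedding vectors are (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for three reference documents.
documents = {
    "refund policy":   [0.9, 0.1, 0.0],
    "shipping times":  [0.1, 0.8, 0.2],
    "store locations": [0.0, 0.2, 0.9],
}

def retrieve(query_embedding, top_k=1):
    """Rank reference documents by similarity to the query embedding."""
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_embedding, documents[doc]),
        reverse=True,
    )
    return ranked[:top_k]

# A query like "how do I get my money back?" might embed near the
# refund document:
print(retrieve([0.8, 0.2, 0.1]))  # prints ['refund policy']
```

The retrieved documents are then pasted into the prompt as context, which is what grounds the model’s response.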
Alter the Model
Beyond the foundational architecture of the model, there are many slight changes you can make to tweak a fully-trained model, or change how it may behave. These enhancements can be used to improve the model, or to make it more practical to use (i.e., make it smaller or less compute intensive).
- Parameters: you may have noticed that the full name of many LLMs includes a term like `80B` or `7B`. This term quantifies the number of parameters in the model. The parameters are the single most important aspect of what a model “is”: they are the final numeric values produced by training (i.e., the result of millions of dollars of compute spent on iterative math calculations). These values are used to produce all future outputs based on user input. Generally, the larger the model, the better the model. But this comes with a tradeoff in compute costs and memory usage.
- Fine-tuning: while these parameters are amazing for general purposes, if you’re building a specialized application with GenAI you’ll want to make sure the model is tuned for your needs. This is done via fine-tuning. You start with a general-purpose model, which provides an amazing starting point and saves you millions of dollars in initial training. Then you feed it data specific to your use case, run additional training, and you have a slightly updated model suited to your needs! Strategies for fine-tuning include LoRA, QLoRA, PEFT, and DPO.
- Quantization: all of these parameters make models very large from a memory standpoint, which is why many model providers offer alternatives with fewer parameters that may be more practical for something like a mobile application. One way to further reduce a model’s size is Quantization: the process of mapping a set of data that takes up a large amount of memory to a set that takes up less. For example, the model’s parameters may be represented using `float32` values that take up 32 bits each. Quantization would map these values into a `float16` or `int8` representation, which takes up less space. Precision, and therefore accuracy, is lost, but the model takes up significantly less space.
- Temperature: finally, you can alter the results of the model by tweaking the model temperature. Earlier, we mentioned how the next word of a response is selected based on which word has the highest probability of being the correct one to use next. The temperature of the model influences it to select words with lower probabilities. This can result in more “creativity” in the model, but it can also cause the model to have hallucinations.
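Here’s a small sketch of how temperature changes the probabilities used to pick the next word. The logits (raw model scores) are made up; the softmax-with-temperature math is the standard approach.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw model scores (logits) into probabilities.
    Higher temperature flattens the distribution, giving lower-scoring
    words a better chance of being sampled."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three candidate next words.
logits = [2.0, 1.0, 0.1]

low = softmax_with_temperature(logits, temperature=0.5)
high = softmax_with_temperature(logits, temperature=2.0)

print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
# At low temperature the top word dominates; at high temperature the
# probabilities are much closer together, so sampling gets "creative".
```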
Alter Model Interactions
Lastly, you can use your refined models and interact with them creatively to get different results. One example picking up buzz in the GenAI community is using the models as AI Agents.
The models themselves aren’t heavily altered in this case, but they are given some “special” capabilities using carefully crafted prompts.
- Prompts for agents should not prescribe an approach to apply. Rather, the prompts should give the models some general goals and a persona (e.g., teacher, student), but the approach should be determined by the model using reasoning and reflection. You can provide examples of good results, or test cases, though.
- Agents should be given access to tools to allow them to complete their goals. A tool is a function the model can call. This may be an API to check the weather, or access to writing an updated order to a database.
- Optionally, agents can interact with other model agents. This division of responsibility can improve performance by having specialized agents work on certain tasks or having some agents “check” the work of other agents.
In the example depicted above, we could have an AI agent with a principal persona help lead a group of teachers in creating curriculum that will lead to the highest overall school grade. The teachers then create individual curriculums for their subjects based on tools like converting words to a relevant SAT synonym. Each curriculum can then be tested against students that may have different needs (e.g., a learning disability) to ensure it meets the needs of a general population. All the models in this example could be specially trained and fine-tuned for their purpose.
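To make the tool-use idea concrete, here’s a drastically simplified sketch of an agent loop. Everything in it is hypothetical: a keyword lookup stands in for the model’s reasoning step, and the “tools” are stub functions where a real agent would call a weather API or write to a database.

```python
def check_weather(city):
    return f"Sunny in {city}"  # stand-in for a real weather API call

def save_order(item):
    return f"Order for {item} saved"  # stand-in for a database write

# Tools are just functions the "model" is allowed to call.
TOOLS = {"weather": check_weather, "order": save_order}

def agent(goal):
    """Pick a tool based on the goal and act by calling it.
    A real agent would ask an LLM which tool to use and with
    what arguments; here both are hard-coded for the sketch."""
    for keyword, tool in TOOLS.items():
        if keyword in goal.lower():
            return tool(goal.split()[-1])
    return "No tool available for this goal"

print(agent("Check the weather in Paris"))  # prints "Sunny in Paris"
print(agent("Place an order for pizza"))    # prints "Order for pizza saved"
```

The key idea is the separation: the model decides *what* to do, and the tools actually *do* it.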
Okay, but how do I start “doing AI”?
Well, if you haven’t already learned that you shouldn’t be saying “doing AI”, I’ve failed at the first part of this blog… anyways, from here you have the knowledge to understand the GenAI APIs available to you. If you want to dive deeper into any of these areas to see how they work under the hood, consider any of the learning paths below.
Learning Paths
For more on the technical stacks used to build an AI application, check out part 2 of this series: From Buzz to Building - Introduction to GenAI for Developers - Part 2 - The Technical Stack.
- FastAI’s Practical Deep Learning for Coders
- For Who: Developers who want a coding-first path to learning about machine learning, deep learning, and different model architectures (e.g., transformers)
- Hugging Face NLP Course
- For Who: Developers who want more of a targeted focus on models used specifically for Natural Language Processing (NLP)
- OpenAI Quickstart
- For Who: Under the hood, under the shmood! After reading this post, you should have enough basic knowledge to click around the OpenAI documentation and start building!
- Deep Dives:
- Sampling for Text Generation
- For Who: Developers who want to better understand how GenAI picks the “next word” and the impacts of model temperature
- Attention Is All You Need
- For Who: Developers who love foundational white papers
Is this post not quite what you need? Check out AI Canon by a16z. I discovered this resource after I completed my article, and I must admit it does a better job :)
#blog-post #technical-deep-dive
Footnotes

[^1]: The world of AI is massive. Here, we’ll focus on where we see the most buzz: LLMs (specifically, their text-to-text capabilities)