AI Engineering Notes: Foundation Models and AI Engineering (Ch.1)

These are my personal summaries and observations from reading AI Engineering by Chip Huyen. They are not a comprehensive review — just the notes and takeaways that stood out to me. I highly recommend picking up the book for the full picture.


This chapter covers the foundations of building AI applications with foundation models — key concepts around language models, tokenization, and the emerging discipline of AI engineering.

Foundation Models and Language Models

Language models have come a long way. They started out with simple probabilistic approaches using unigram and n-gram models. For example, a bigram model approximates P(next word | all previous words) with P(next word | previous word), estimated from counts in a training corpus.
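
The count-based idea can be sketched in a few lines of Python. The toy corpus below is made up for illustration:

```python
from collections import Counter

# Toy corpus; a bigram model estimates P(next | previous) from counts.
corpus = "the cat sat on the mat the cat ran".split()

bigrams = Counter(zip(corpus, corpus[1:]))   # counts of adjacent word pairs
unigrams = Counter(corpus[:-1])              # counts of each "previous" word

def bigram_prob(prev, word):
    """P(word | prev) = count(prev, word) / count(prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

print(bigram_prob("the", "cat"))  # "the" is followed by "cat" in 2 of its 3 occurrences
```

Real n-gram models add smoothing to handle unseen pairs; this sketch omits that.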

Over time, these models became much more sophisticated with the use of deep learning.

Tokenization

For any language model, there needs to be a step to break up text into tokens — this process is called tokenization. Here is how GPT-4 tokenizes text:

[Screenshot: GPT-4's tokenizer splitting "tokenization" and "I enjoy playing table tennis" into tokens (8 tokens, 41 characters)]

A vocabulary contains the set of tokens a given model can recognize. Vocabulary sizes vary widely across models (for instance, Llama 2 uses a 32K-token vocabulary, while GPT-4's is roughly 100K).

There is usually an <oov> (out-of-vocabulary) token that stands in for any token the model cannot recognize; you can think of it as a dummy value.

Why tokenize into subword tokens instead of whole words or characters? Tokens strike a balance: the vocabulary stays far smaller than a word-level vocabulary, sequences stay far shorter than character-level ones, and unseen words can still be composed from known subwords.
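
Subword vocabularies are often built with byte-pair encoding (BPE). The toy sketch below merges the most frequent adjacent pair within a single word; it is an illustration of the idea, not the actual GPT-4 tokenizer:

```python
from collections import Counter

def bpe_merges(word, num_merges):
    """Toy byte-pair encoding: repeatedly merge the most frequent
    adjacent symbol pair. Real BPE learns merges over a whole corpus."""
    symbols = list(word)
    for _ in range(num_merges):
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                merged.append(a + b)  # replace the pair with one symbol
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols

print(bpe_merges("tokenization", 3))
```

Each merge shortens the sequence while keeping the concatenation of symbols equal to the original word.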

Types of Language Models

The two main types are masked language models (e.g., BERT), which predict missing tokens using context from both sides, and autoregressive language models (e.g., GPT), which predict the next token from the preceding context.

How to Train Your Model

Most language models are now trained in a self-supervised way using next token prediction. The labels are inferred from the training data itself.

Imagine you have Harry Potter as the training text. Given the tokens for “Harry”, the model will be trained to predict “Potter”:

Input:  <BOS> Harry
Output: Potter
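
Because the labels come from the text itself, constructing training pairs is just a matter of shifting the sequence by one position. A minimal sketch (the token strings here stand in for token IDs):

```python
# Self-supervised next-token prediction: inputs and labels come from
# the same text, shifted by one position.
tokens = ["<BOS>", "Harry", "Potter", "and", "the", "Philosopher's", "Stone"]

# Each training pair: (context so far, next token to predict).
pairs = [(tokens[: i + 1], tokens[i + 1]) for i in range(len(tokens) - 1)]

for context, target in pairs[:2]:
    print(context, "->", target)
```

No human labeling is needed; every position in the corpus yields one training example.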

How much data do you need for pretraining? The tl;dr: a lot. The Chinchilla scaling law suggests roughly 20 training tokens per model parameter for compute-optimal training.
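
The arithmetic is simple; for example, under the 20-tokens-per-parameter rule a 70B-parameter model would want about 1.4 trillion tokens:

```python
# Chinchilla rule of thumb: ~20 training tokens per model parameter.
def chinchilla_tokens(num_params, tokens_per_param=20):
    return num_params * tokens_per_param

print(f"{chinchilla_tokens(70e9):.1e}")  # 70B params -> ~1.4e12 tokens
```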

What is a Foundation Model?

The author proposes that to be considered a foundation model, the model needs to satisfy two criteria:

  1. Can be used in various AI applications for different needs
  2. Has good general capabilities

Multimodal models with image understanding also count as foundation models. CLIP, for example, is an embedding model that is a common building block of vision-language models (VLMs).
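
CLIP maps images and text into a shared embedding space, so matching an image to a caption reduces to comparing vectors. A toy sketch with made-up embeddings (real CLIP vectors have hundreds of dimensions):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings: one image and two candidate captions.
image_emb = [0.9, 0.1, 0.3]
captions = {
    "a cat on a mat": [0.8, 0.2, 0.3],
    "a stock chart": [0.1, 0.9, 0.4],
}

# Pick the caption whose embedding is closest to the image embedding.
best = max(captions, key=lambda c: cosine(image_emb, captions[c]))
print(best)
```

This nearest-embedding lookup is the core of how CLIP-style models do zero-shot image classification.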

How to work with foundation models: the main adaptation techniques are prompt engineering, retrieval-augmented generation (RAG), and finetuning.

AI Engineering

What makes AI engineering different from ML engineering / MLOps?

As pretraining foundation models is expensive, few companies have the capability to do it. But foundation models are increasingly accessible via APIs: a single API call returns model predictions, with no need to host your own models. This fundamentally changes the workflow.
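
The shape of that workflow change can be sketched as follows. The model name below is a placeholder, and the payload follows the common chat-completion style many providers use; this builds the request without sending it:

```python
import json

def build_chat_request(model, prompt):
    """Build a chat-completion-style request payload.
    Sending it to a provider's HTTPS endpoint (with an API key)
    replaces the train/host/serve pipeline of classic MLOps."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# "some-foundation-model" is a placeholder, not a real model name.
payload = build_chat_request("some-foundation-model", "Summarize this document.")
print(json.dumps(payload, indent=2))
```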

Use Cases

The author tabulates the use cases found in 205 open-source GitHub projects:

Use Case                    Percentage
Coding                      30.4%
Chatbots                    26.5%
Information Aggregation     12.7%
Image / Video Production    12.7%
Workflow Automation         11.3%
Writing                      3.4%
Education                    1.5%
Data Organization            1.5%

Another observation is that companies are faster to deploy internal-facing applications (internal knowledge management) than external-facing ones.

Detailed breakdown of key use cases:

Planning AI Applications

The author proposes evaluating whether to build an AI application in decreasing order of urgency:

  1. If you don’t build it, competitors will, and their AI could make you obsolete (e.g., website builders).
  2. If you don’t build it, you miss opportunities to boost profits and productivity (e.g., a better chatbot to improve user experience).
  3. You haven’t figured out how to incorporate AI into the business yet; an R&D department can prototype use cases.

Once the use case is established, the team can consider whether to build in-house or buy an industry solution.

Types of AI Integration into Applications

There are several dimensions to consider:

How to create a deep moat for your AI products?

ML Development Cycle

AI Engineering Tech Stack

How is AI Engineering Evolving?

The author proposes a shift in workflow paradigm:

Key trends:
