Thursday, June 12, 2025

Machine Learning Model Errors

In this post, I describe different machine learning model errors and conduct a simulation to illustrate their behavior as model complexity, or flexibility, increases. Note: Because of my use of \(\LaTeX\) typesetting, this post is best viewed in "web" mode (as opposed to "mobile").

1. Setup and Motivating Problem

Consider two random variables \(X\) and \(Y\) with the following joint probability distribution \(F_{X, Y}\):

\begin{align} X &\sim Uniform(0, 10) \\ Y | X &\sim Normal(f(X), \sigma^2), \end{align}

where \(f(x) = x^2 - 8x + 20\) and \(\sigma^2 = 25\). The quadratic polynomial \(f(x)\) represents the "signal" in the relationship between \(X\) and \(Y\), whereas the variance parameter \(\sigma^2\) quantifies the "noise." Let \(\mathcal{T} = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}\) be a random sample, or training set, of size \(n = 25\) drawn from \(F_{X, Y}\).

The motivating problem is to use \(\mathcal{T}\) and least squares polynomial regression to build a model that predicts \(Y\) based on \(X\). The figure below shows a scatter plot of one possible training set. Also plotted are \(f(x)\) and regression fits \(\hat{f}(x)\) for six different polynomial models: constant, linear, quadratic, cubic, quartic, and quintic. As the degree of the polynomial increases, so does the number of model parameters (i.e., regression coefficients). Consequently, the model becomes more flexible and fits the training set better.
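To make this concrete, here is a minimal sketch in Python (using NumPy) of the data-generating process and the six least squares fits. The helper names, such as draw_training_set and fit_polynomials, are my own choices for illustration:

import numpy as np

rng = np.random.default_rng(12345)  # fixed seed, chosen only for reproducibility

def f(x):
    # True signal: quadratic polynomial
    return x**2 - 8*x + 20

def draw_training_set(n=25, sigma=5.0):
    # Draw a training set of size n from the joint distribution F_{X,Y}
    x = rng.uniform(0, 10, size=n)
    y = f(x) + rng.normal(0, sigma, size=n)
    return x, y

def fit_polynomials(x, y, degrees=range(6)):
    # Least squares fits of degree 0 (constant) through 5 (quintic);
    # np.polyfit returns coefficients from highest power to lowest
    return {d: np.polyfit(x, y, deg=d) for d in degrees}

x_train, y_train = draw_training_set()
fits = fit_polynomials(x_train, y_train)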

2. Training Error

The difference between the observed and predicted \(Y\) values on \(\mathcal{T}\) can be summarized by the model's training error. Training error is defined by

\begin{equation} Err_{\mathcal{T}}^{Train} = \frac{1}{n} \sum_{i=1}^{n} L \left(y_i, \hat{f}(x_i)\right),\tag{1} \end{equation}

where \(L\) is a loss function. Throughout this post, I use squared error loss:

\[L \left(y_i, \hat{f}(x_i)\right) = \left(y_i - \hat{f}(x_i)\right)^2.\]

Training error is an optimistic measure of the predictive performance of \(\hat{f}\) because the model is fit and evaluated on the same dataset.
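Under squared error loss, training error is simply the average squared residual on \(\mathcal{T}\). Continuing the Python sketch above:

def training_error(coefs, x, y):
    # Equation (1) with squared error loss: average squared residual on the training set
    y_hat = np.polyval(coefs, x)
    return np.mean((y - y_hat) ** 2)

train_errors = {d: training_error(c, x_train, y_train) for d, c in fits.items()}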

3. In-Sample Error

Another type of model error is in-sample error, which quantifies how well \(\hat{f}\) predicts new \(Y\) values for the same \(X\) values observed in \(\mathcal{T}\). This provides insight into the optimism of training error described above. In-sample error is defined by

\begin{equation} Err_{\mathcal{T}}^{In} = \frac{1}{n} \sum_{i=1}^{n} E_{Y_i^{New} \mid \mathcal{T}} \left[ L \left(Y_i^{New}, \hat{f}(x_i)\right) \mid \mathcal{T} \right],\tag{2} \end{equation}

where \(Y_i^{New} \overset{ind}{\sim} F_{Y \mid X=x_i}\) \((i=1, \ldots, n)\). To emphasize, in-sample error is conditional on the training set \(\mathcal{T}\) and the fitted model \(\hat{f}\). In my setup, \(F_{X, Y}\) is both known and simple, so I can calculate \(Err_{\mathcal{T}}^{In}\) exactly.
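For squared error loss, each term in \((2)\) has a closed form: because \(Y_i^{New} \sim Normal(f(x_i), \sigma^2)\), we have \(E\left[\left(Y_i^{New} - \hat{f}(x_i)\right)^2 \mid \mathcal{T}\right] = \sigma^2 + \left(f(x_i) - \hat{f}(x_i)\right)^2\). A sketch continuing the Python snippets above:

def in_sample_error(coefs, x, sigma2=25.0):
    # Equation (2) computed exactly for squared error loss:
    # E[(Y_new - fhat(x_i))^2 | T] = sigma^2 + (f(x_i) - fhat(x_i))^2, averaged over the x_i
    y_hat = np.polyval(coefs, x)
    return sigma2 + np.mean((f(x) - y_hat) ** 2)

in_errors = {d: in_sample_error(c, x_train) for d, c in fits.items()}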

4. Test Error

A superior measure of predictive performance comes from applying \(\hat{f}\) to new, previously unseen observations from \(F_{X, Y}\). Test error, also known as generalization error or out-of-sample error, is defined by

\begin{equation} Err_{\mathcal{T}}^{Test} = E_{X^{New}, Y^{New} \mid \mathcal{T}} \left[ L \left(Y^{New}, \hat{f}(X^{New})\right) \mid \mathcal{T} \right],\tag{3} \end{equation}

where \((X^{New}, Y^{New}) \sim F_{X, Y}\). As with in-sample error, test error is conditional on the training set \(\mathcal{T}\) and the fitted model \(\hat{f}\). In my setup, I can calculate \(Err_{\mathcal{T}}^{Test}\) exactly. In practice, however, test error is typically estimated by the average loss on an independent test set.
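Here, too, squared error loss gives an exact expression: because \(X^{New} \sim Uniform(0, 10)\), \(Err_{\mathcal{T}}^{Test} = \sigma^2 + \frac{1}{10} \int_0^{10} \left(f(x) - \hat{f}(x)\right)^2 \, dx\), and the integrand is itself a polynomial that can be integrated in closed form. A sketch continuing the snippets above:

def test_error(coefs, sigma2=25.0):
    # Equation (3) computed exactly for squared error loss:
    # sigma^2 + E_X[(f(X) - fhat(X))^2] with X ~ Uniform(0, 10)
    true_coefs = np.array([1.0, -8.0, 20.0])   # coefficients of f(x) = x^2 - 8x + 20
    diff = np.polysub(true_coefs, coefs)       # coefficients of f - fhat
    sq_diff = np.polymul(diff, diff)           # coefficients of (f - fhat)^2
    antideriv = np.polyint(sq_diff)            # antiderivative of (f - fhat)^2
    mean_sq_diff = (np.polyval(antideriv, 10) - np.polyval(antideriv, 0)) / 10
    return sigma2 + mean_sq_diff

test_errors = {d: test_error(c) for d, c in fits.items()}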

5. Expected Errors

Training, in-sample, and test errors in machine learning are defined in terms of a particular training set \(\mathcal{T}\) and fitted model \(\hat{f}\). To understand the expected behavior of these errors, one can consider averaging over all possible training sets of size \(n\) from \(F_{X, Y}\). The theoretical expected training, in-sample, and test errors are defined by the following:

\begin{equation} Err^{Train} = E_{\mathcal{T}} \left[Err_{\mathcal{T}}^{Train} \right]\tag{4} \end{equation} \begin{equation} Err^{In} = E_{\mathcal{T}} \left[Err_{\mathcal{T}}^{In} \right]\tag{5} \end{equation} \begin{equation} Err^{Test} = E_{\mathcal{T}} \left[Err_{\mathcal{T}}^{Test} \right]\tag{6} \end{equation}

In practice, these expected errors are estimated via resampling methods such as cross-validation and the bootstrap.
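As an illustration of the cross-validation idea, here is a naive \(k\)-fold estimate based on a single training set, continuing the Python snippets above (the function cv_error is my own illustrative name, not a library routine):

def cv_error(x, y, degree, k=5):
    # Naive k-fold cross-validation estimate of expected test error
    # for a polynomial model of the given degree
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, k)
    losses = []
    for fold in folds:
        train_idx = np.setdiff1d(idx, fold)            # everything outside the held-out fold
        coefs = np.polyfit(x[train_idx], y[train_idx], deg=degree)
        y_hat = np.polyval(coefs, x[fold])
        losses.append(np.mean((y[fold] - y_hat) ** 2))  # loss on the held-out fold
    return np.mean(losses)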

6. Simulation

To study all of these conditional and expected model errors, I conduct a simulation using my setup. I begin by randomly drawing \(B = 5{,}000\) training sets of size \(n = 25\) from \(F_{X, Y}\). For each training set \(\mathcal{T}_b\) \((b = 1, \ldots, B)\), I fit the six polynomial regression models (constant, linear, quadratic, cubic, quartic, and quintic) and calculate their training, in-sample, and test errors (\(Err_{\mathcal{T}_b}^{Train}\), \(Err_{\mathcal{T}_b}^{In}\), and \(Err_{\mathcal{T}_b}^{Test}\)). To estimate the expected errors (\(Err^{Train}\), \(Err^{In}\), and \(Err^{Test}\)), I average over the \(B\) corresponding conditional values. The table below displays these estimates of expected error.
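In code, the simulation amounts to a loop over the \(B\) training sets. Here is a sketch that reuses the helper functions from the earlier sections:

B, n = 5000, 25
degrees = range(6)
results = {d: {"train": [], "in": [], "test": []} for d in degrees}

for b in range(B):
    x, y = draw_training_set(n)
    for d in degrees:
        coefs = np.polyfit(x, y, deg=d)
        results[d]["train"].append(training_error(coefs, x, y))
        results[d]["in"].append(in_sample_error(coefs, x))
        results[d]["test"].append(test_error(coefs))

# Estimates of the expected errors: average the conditional errors over the B training sets
expected = {d: {name: np.mean(vals) for name, vals in errs.items()}
            for d, errs in results.items()}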


Model Flexibility        Estimate of Expected Error
                     Training      In-Sample     Test
Constant             \(109.93\)    \(111.51\)    \(118.27\)
Linear                \(73.24\)     \(76.70\)     \(89.35\)
Quadratic             \(22.07\)     \(27.99\)     \(28.50\)
Cubic                 \(21.03\)     \(29.03\)     \(30.79\)
Quartic               \(20.04\)     \(30.03\)     \(35.28\)
Quintic               \(19.04\)     \(31.02\)     \(53.71\)

The figure below plots training and test error as a function of model flexibility for the first \(100\) training sets. Curves for the estimated expected errors are plotted with extra thickness. For reference, I also include a horizontal line at \(\sigma^2 = 25\), the noise variance. Observe that test error (in orange) always lies above this line, as it must: test error equals \(\sigma^2\) plus the average squared difference between \(f\) and \(\hat{f}\).

My simulation illustrates the following points:

  • Training error decreases as model flexibility increases.
  • The constant and linear models are simplistic and suffer from underfitting.
  • The minimum expected test error is achieved by the quadratic model. This was anticipated, since the true relationship between \(X\) and \(Y\) is quadratic \(\left(f(x) = x^2 - 8x + 20\right)\).
  • With increased flexibility, the cubic model starts to fit the noise in the data. However, it does not perform too badly in terms of test error.
  • The most flexible models, quartic and quintic, fit the noise even more closely and exhibit overfitting. They do not generalize well when predicting \(Y\) for new values of \(X\).

Sunday, July 3, 2022

Hobbies

Besides reading, I enjoy bike riding, going on long walks, and watching movies. Another one of my hobbies is assembling models from kits that come with sheets of wood or metal pieces. The models represent world landmarks, animals, and various objects such as windmills, Ferris wheels, and airships. My last major project was an intricate mechanical clock. It was a lot of fun and took me about two weeks of working in the evenings to complete. I'm currently on a model-building moratorium, however, because I've run out of bookshelf space to display my work.

I also enjoy making my own computer games in my free time. My favorite languages to code in are Python, JavaScript, and HTML. I'm proud of my mini role-playing game called "Treasure Hunt!" in which you explore a small world for buried treasure. Below is a screenshot. It's satisfying returning to these gaming projects every so often to add features and polish. To download and play all of my games, visit my GitHub page at https://github.com/brian-dumbacher.

A former hobby of mine was playing the computer game Magic: The Gathering Arena (MTGA). MTGA is the electronic version of the popular collectible card game Magic: The Gathering. When I was younger, I had a decent collection of Magic cards but never really had the opportunity to play with others. I got into MTGA during the COVID-19 pandemic and played daily for four years. I really liked drafting, building decks around cards with interesting mechanics and flavor, and completing collections of entire sets. Click on the image below to see me cast the winning spell in a training match using my favorite deck.

Saturday, March 23, 2019

Books

The following is a list of my favorite books. I enjoy science fiction, adventures, and mysteries the most but also like to alternate between reading fiction and nonfiction.


Fiction: Contemporary


  • Horus Rising (The Horus Heresy) by Dan Abnett
  • The Water Knife by Paolo Bacigalupi
  • The Windup Girl by Paolo Bacigalupi
  • Ender's Game by Orson Scott Card
  • Magpie Murders by Anthony Horowitz
  • False Gods (The Horus Heresy) by Graham McNeill
  • Harry Potter and the Sorcerer's Stone by J.K. Rowling
  • The No. 1 Ladies' Detective Agency by Alexander McCall Smith

Fiction: Classic


  • The Good Earth by Pearl S. Buck
  • And Then There Were None by Agatha Christie
  • Robinson Crusoe by Daniel Defoe
  • The Adventure of the Reigate Puzzle (Sherlock Holmes) by Sir Arthur Conan Doyle
  • A Study in Scarlet (Sherlock Holmes) by Sir Arthur Conan Doyle
  • The Count of Monte Cristo by Alexandre Dumas
  • King Solomon's Mines by H. Rider Haggard
  • Catch-22 by Joseph Heller
  • Dune by Frank Herbert
  • Emil und die Detektive by Erich Kästner
  • The Sea-Wolf by Jack London
  • Der kleine Prinz by Antoine de Saint-Exupéry
  • Treasure Island by Robert Louis Stevenson
  • Der Schimmelreiter by Dominik Wexenberger, after Theodor Storm
  • Some Buried Caesar (Nero Wolfe) by Rex Stout
  • The Hobbit by J.R.R. Tolkien
  • Die Reise zum Mittelpunkt der Erde by Dominik Wexenberger, after Jules Verne
  • Warum es wichtig ist, ehrlich zu sein by Dominik Wexenberger, after Oscar Wilde
  • The Honjin Murders by Seishi Yokomizo, translated by Louise Heal Kawai

Nonfiction


  • Geography of the World by Simon Adams et al.
  • The Greatest Show on Earth: The Evidence for Evolution by Richard Dawkins
  • Team of Rivals: The Political Genius of Abraham Lincoln by Doris Kearns Goodwin
  • Word Power Made Easy by Norman Lewis

Friday, November 23, 2018

About

Welcome to my website! My name is Brian Dumbacher, and I like mathematics, statistics, computer programming, and geography. I enjoy working on projects that combine all of these interests. I live in Washington, DC and make the most of being so close to fun spots such as the Smithsonian Museums and the National Mall. Some of my hobbies include reading, traveling, and exercising.