Revisiting ‘All models are wrong, some are useful’
Last week Ezra Klein reprised his 4 June 2021 podcast with Brian Christian, “If All Models Are Wrong, Why Do We Give Them So Much Power?”
Christian used George Box’s aphorism, “All models are wrong, some are useful,” in his discussion of machine learning models. He outlined one egregious example: the self-driving Uber car that killed a pedestrian in 2018.
Box rooted his insight in experience with many projects and examples that involved mathematical expressions a human could write down and study.
Machine learning models, for example neural nets, typically have no simple expression for people to view and study. Instead, the models transform inputs into outputs through many layers of calculations stacked on top of one another. Here’s a simple introduction to neural nets in the context of machine learning.
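To make the “layered calculations” idea concrete, here is a minimal sketch of a tiny two-layer network in Python. The weights are arbitrary illustrative numbers, not a trained model; real networks stack many more layers and millions of such numbers, which is why no simple expression emerges for a person to study.

```python
# A minimal sketch of layered calculations in a tiny neural network.
# The weights below are arbitrary illustrative numbers, not a trained model.
import numpy as np

def relu(z):
    # A common nonlinearity applied between layers
    return np.maximum(0, z)

x = np.array([0.5, -1.2, 3.0])            # input signals
W1 = np.array([[0.2, -0.5, 0.1],
               [0.7,  0.3, -0.4]])        # first layer of weights
b1 = np.array([0.1, -0.2])
W2 = np.array([[1.5, -0.8]])              # second layer of weights
b2 = np.array([0.05])

hidden = relu(W1 @ x + b1)                # first layer of calculations
output = W2 @ hidden + b2                 # second layer, stacked on the first
print(output)
```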
Just as with 20th-century statistical models, machine learning models are shaped by the data used to construct them. As described by Christian, the data informing the model in the self-driving Uber accident had included pedestrians and bicyclists. The data set did not include the situation that set up the accident: a woman walking a bicycle. The model failed to perform in a novel situation, with a deadly consequence.
Christian goes on to say:
I think part of the danger that we have at the current moment is that our [machine learning] models are wrong in the way that all models are wrong, but we have given them the power to enforce the limits of their understanding on the world.
(from the transcript of the podcast).
I think the same danger has always held for people.
Humans take signals from the world and predict what will happen next.
We form internal models of the world “out there” from mental shortcuts, subject to all kinds of perceptual biases. We don’t see all there is to see, as demonstrated by the “invisible gorilla” experiment.
People surely act to enforce the limits of their understanding on the world. Those people with more power can enforce their limits more widely, potentially causing great damage in the world.
As I consider both mental models and machine learning models, I want to hold George’s advice dearly and clearly. All our models, while sometimes useful, are necessarily wrong, and we and the world will be better off when we use them with curiosity and humility.
Historical Notes
George offered his ‘all models are wrong’ guidance in the context of response surface modeling, which he and fellow investigators developed seventy years ago to provide solutions to chemical engineering problems. Look at the opening pages of Chapter 13, “Design Aspects of Variance, Bias, and Lack of Fit,” in George E.P. Box and Norman Draper (1987), Empirical Model-Building and Response Surfaces, John Wiley & Sons, New York, for an elegant summary of George’s perspective.
How might you approximate a surface by a relatively simple function to get a sense of ups, downs, and local flatness? Box used polynomial functions, which many people first encounter in high school math. Polynomials that include linear and quadratic terms in predictor variables like pressure and temperature could, with careful construction, approximate a response variable like yield.
A small number of parameters (an overall average level and the coefficients of linear and quadratic terms in pressure and temperature) might offer an approximation good enough to help operators improve the yield of a chemical reaction. With the computing methods of the 1950s, the parameters in the polynomial function could be estimated without too much effort; statistical inference about the uncertainty of the estimates follows from the theory of linear models built in the first half of the twentieth century.
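For concreteness, here is a minimal sketch of such a fit in Python. The data are made up purely for illustration and are not from Box’s work; the point is only that a second-order polynomial in temperature and pressure has a handful of coefficients that ordinary least squares can estimate.

```python
# A minimal sketch of fitting a second-order response surface:
# yield approximated by a quadratic polynomial in temperature and pressure.
# All numbers below are hypothetical, for illustration only.
import numpy as np

rng = np.random.default_rng(0)
temperature = rng.uniform(150, 200, size=30)   # hypothetical settings, deg C
pressure = rng.uniform(1, 5, size=30)          # hypothetical settings, atm
true_yield = 60 - 0.02 * (temperature - 180)**2 - 2 * (pressure - 3)**2
observed_yield = true_yield + rng.normal(0, 1, size=30)   # noisy observations

# Design matrix: intercept, linear, quadratic, and interaction terms
X = np.column_stack([
    np.ones_like(temperature),
    temperature, pressure,
    temperature**2, pressure**2,
    temperature * pressure,
])

# Least-squares estimates of the polynomial coefficients
coef, *_ = np.linalg.lstsq(X, observed_yield, rcond=None)
predicted = X @ coef
print("coefficients:", np.round(coef, 4))
```

The fitted coefficients describe a simple surface whose slopes and curvature suggest which direction to move temperature and pressure to improve yield.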
In the chemical engineering example, a polynomial model is certainly wrong in ignoring chemical kinetics that can be described by more advanced mathematical functions. A polynomial model will surely fail to capture the relationship between yield, pressure, and temperature over a wide range of these variables. Nonetheless, Box showed that polynomial models could be useful: the simple models could guide operators to revise settings of temperature and pressure to improve yield. Better still, a sequence of polynomial models, iteratively developed through efficient experimental designs, could generate insights that enabled step-wise improvement.
Advice in lyrics
George Box wrote many lyrics to popular show tunes to entertain friends and students. Here’s an excerpt on holding models to account, sung to the tune of “It Ain’t Necessarily So”:
Now models describe what we know
Yes, models describe what we know
But when individuals
Look at residuals*
It ain’t necessarily so.
(From my memory, a verse not included in the 1994 collection of songs “George & Bill in Concert”, Berthouex’s Basement Oct 28, 1983)
*Residuals are the differences between what a model predicts and actual values. Analysis of residuals drives assessment of model adequacy. Residual analysis starts with pictures and plots, for example using the R package auditor. Look for patterns and hints that the model is failing to track important features of the system of interest. The big challenge in residual analysis is to keep up the practice: as you and your model make new predictions, continue to compare actual values to those predictions. Does the model continue to be useful? Learning by doing, dynamic residual analysis is just the Plan-Do-Study-Act cycle: (1) plan = model; (2) do = make a prediction and observe the actual system state; (3) study = compare prediction to actual and reflect on the difference; (4) act = continue to use the model, revise it, or abandon it.
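Here is a minimal sketch of that first step, comparing predictions to actuals, in plain Python rather than with the auditor package. The numbers are hypothetical and stand in for predictions and observations you would collect as you keep using the model.

```python
# A minimal sketch of ongoing residual analysis (not the R auditor package).
# `predicted` and `actual` stand in for values collected as the model is used.
import numpy as np
import matplotlib.pyplot as plt

predicted = np.array([52.1, 54.3, 55.0, 53.8, 56.2])   # hypothetical predictions
actual    = np.array([51.8, 55.1, 54.2, 55.9, 58.4])   # hypothetical observations

residuals = actual - predicted

# Plot residuals against predictions: trends, funnels, or runs hint that
# the model is failing to track important features of the system.
plt.scatter(predicted, residuals)
plt.axhline(0, linestyle="--")
plt.xlabel("predicted value")
plt.ylabel("residual (actual - predicted)")
plt.title("Do the residuals show a pattern?")
plt.show()
```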
All links accessed 29 December 2022.