Why Stephen Wolfram's research program is a dead end
Recently, Stephen Wolfram (of Mathematica and Wolfram Alpha fame) made waves on social media by posting “Finally We May Have a Path to the Fundamental Theory of Physics… and It’s Beautiful”. This is accompanied by two scientific papers by an employee, and a textbook-length introduction, all under the umbrella of “The Wolfram Physics Project”.
Today, I’ll explain why I think this entire research programme is a dead end and largely a waste of time. Since Wolfram’s latest work builds on his previous work, some background on Wolfram and his research is necessary first.
A New Kind of Science
Wolfram detached from the scientific community in the late 1980s. He re-emerged in 2002 with a 1200-page book called “A New Kind of Science” (ANKS). The book argues that all of science [footnote]Yes, all of it, from particle physics to macroeconomics[/footnote] should be modeled with cellular automata (CA), the kind of model made famous by John Conway’s “Game of Life”.
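For readers who have never played with a cellular automaton, here is a minimal sketch of a Game of Life update step (my own illustration in Python, not code from ANKS):

```python
import numpy as np

def life_step(grid: np.ndarray) -> np.ndarray:
    """One update of Conway's Game of Life on a wrapping (toroidal) grid."""
    # Count the 8 neighbors of every cell by summing shifted copies of the grid.
    neighbors = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # A cell is alive next step if it has exactly 3 neighbors,
    # or if it is alive now and has exactly 2 neighbors.
    return ((neighbors == 3) | ((grid == 1) & (neighbors == 2))).astype(int)

# A "glider", the canonical Game of Life pattern, travels across the grid.
grid = np.zeros((10, 10), dtype=int)
grid[1, 2] = grid[2, 3] = grid[3, 1] = grid[3, 2] = grid[3, 3] = 1
for _ in range(4):
    grid = life_step(grid)
```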
If Wolfram’s idea sounds a little out there, it’s because it is. ANKS was panned by academic reviewers. Cosma Shalizi doesn’t mince words:
It is my considered, professional opinion that A New Kind of Science shows that Wolfram has become a crank in the classic mold, which is a shame, since he’s a really bright man, and once upon a time did some good math, even if he has always been arrogant.
[…] I am going to keep my copy of A New Kind of Science, sitting on the same shelf as Atlantis in Wisconsin, The Cosmic Forces of Mu, Of Grammatology[footnote]Note the diss on Jacques Derrida in passing[/footnote], and the people who think the golden ratio explains the universe.
Melanie Mitchell echoes the feeling. Scott Aaronson also reviews the book negatively and even formally disproves some of its key ideas. Even the most charitable academic review I could find, by Lawrence Gray, ends this way:
In ANKS Wolfram says that “…the core of this book can be viewed as introducing a major generalization of mathematics”. In this he is entirely mistaken, but there are at least two ways in which he has benefited mathematics: he has helped to popularize a relatively little-known mathematical area (CA theory), and he has unwittingly provided several highly instructive examples of the pitfalls of trying to dispense with mathematical rigor.
The only notable scientific result from ANKS is that a particular CA rule, rule 110, is Turing Complete [footnote]i.e. you can make a computer out of it; given enough time and space it can theoretically compute any function[/footnote]. Of course, the author of that proof was eventually sued by Wolfram for making it public while it was Wolfram’s “trade secret”.
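Rule 110 itself is tiny. The following sketch (my own, not taken from Matthew Cook’s proof) simulates it, so you can see the structured triangles that make it interesting:

```python
def rule110_step(cells: list[int]) -> list[int]:
    """One step of elementary CA rule 110 on a row of 0/1 cells (wrapping edges)."""
    n = len(cells)
    new = []
    for i in range(n):
        left, center, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        # Rule 110's lookup table: the new cell is bit (left, center, right) of 110.
        pattern = (left << 2) | (center << 1) | right
        new.append((110 >> pattern) & 1)
    return new

# Start from a single live cell and watch the characteristic triangles emerge.
row = [0] * 31
row[-1] = 1
for _ in range(15):
    print("".join("#" if c else " " for c in row))
    row = rule110_step(row)
```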
Wolfram’s view on science
The main reason[footnote]Besides his ego[/footnote] Wolfram rejects the research of all other scientific fields is an idea he calls “Computational Irreducibility”. The idea is that if extremely simple systems like Conway’s Game of Life or Rule 110 can be Turing Complete, then we’re wasting our time trying to model the world, because no simplified model can reproduce a Turing Complete system’s behavior without omitting something.
This leads to Wolfram’s core research idea: if simple systems like rule 110 can compute anything, then:
[…] perhaps, lying somewhere out there in the computational universe, is the rule for our physical universe
Wolfram doesn’t just want to model physics with CA; he wants to model biology, economics, and everything else. The perceptive reader will notice that this idea conflicts with Wolfram’s other idea of computational irreducibility. Hold on to that thought, we’ll come back to it. First, a tangent on scientific modeling.
What makes a good scientific model?
Without diving deep into the philosophy of science, a scientific model should have these qualities to be useful: it should be precisely stated, it should be predictive of reality, and it should be descriptive.
- A model should be precisely stated. We need to state a model clearly in a set of mathematical equations to make it clear what is being described and measured. Paul Krugman’s classic essay Two Cheers for Formalism discusses why this is necessary in economics, where results are often politicized. But this is true for all scientific models. You can’t statistically measure what you didn’t write into an equation.
On this front, Wolfram is doing well: he wants to find a precise formula that reproduces the universe. It might be misguided, but it’s mathematically precise.
- A model should be predictive of reality, to respect the scientific method. It’s fine for some researchers to be focused on pure theory, but a research programme for all of science needs to eventually be put to the test against reality.
Wolfram’s research is far from being at the point where we can test it against real-world data, but let’s charitably assume he plans to do so if he ever gets there.
- A model should be descriptive. It’s not enough to predict reality; a good model should do so with as few free parameters as possible, and the values of those parameters should explain something about the world.
For instance, it’s well known that neural networks are universal function approximators and even Turing Complete. This predictive power is unsurprising, since neural network models have thousands to billions of free parameters.
Fitting a neural network against data makes a good predictive model, but we don’t understand what a particular neural network is doing (without reverse engineering it), so it makes a terrible scientific model.[footnote]Even in my day job as a data scientist, I generally start a project with a TC model like a neural network and then try to scale it back to an interpretable model like a generalized linear model wherever possible[/footnote]
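To make the free-parameter point concrete, here is a toy comparison (my own sketch; a plain linear regression stands in for the GLM):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

# Toy data: 3 inputs, 1 output, with a simple linear relationship plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(scale=0.1, size=500)

# Linear model: one coefficient per input plus an intercept, each one interpretable.
glm = LinearRegression().fit(X, y)
print("linear model parameters:", glm.coef_.size + 1)  # 4

# A small neural network fit to the same data: thousands of free parameters,
# none of which individually says anything about the world.
mlp = MLPRegressor(hidden_layer_sizes=(100, 100), max_iter=2000).fit(X, y)
n_params = sum(w.size for w in mlp.coefs_) + sum(b.size for b in mlp.intercepts_)
print("neural network parameters:", n_params)  # 10,601
```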
Wolfram’s models are not descriptive
There is no inherent structure in CA models that predicts real-world phenomena. They’re only capable of predicting them as a byproduct of being Turing Complete.
If we went ahead searching for the CA “rule of the universe”, we would end up with the rule that minimizes the error between the CA output and observed data. The problem is that our search space is the set of all possible CA rules and initial conditions. Once this is done [footnote]probably decades later[/footnote], the resulting model would be made up entirely of free parameters [footnote]the functional form itself is a free parameter[/footnote], because CA models have no inherently relevant structure before we fit them.
So given any particular set of observations of the world, we’d do no better than a neural network model doing the same task.
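Here is roughly what that search amounts to in the simplest possible setting, the 256 elementary CA rules (a toy sketch of the procedure, not anything from the Wolfram Physics Project):

```python
def ca_step(cells: list[int], rule: int) -> list[int]:
    """One step of an elementary CA with the given rule number (0-255)."""
    n = len(cells)
    return [
        (rule >> ((cells[(i - 1) % n] << 2) | (cells[i] << 1) | cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

def run(rule: int, initial: list[int], steps: int) -> list[list[int]]:
    history = [initial]
    for _ in range(steps):
        history.append(ca_step(history[-1], rule))
    return history

def fit_rule(observed: list[list[int]]) -> int:
    """Pick the rule whose run minimizes disagreement with the observed history."""
    initial, steps = observed[0], len(observed) - 1
    def error(rule: int) -> int:
        return sum(
            a != b
            for sim_row, obs_row in zip(run(rule, initial, steps), observed)
            for a, b in zip(sim_row, obs_row)
        )
    return min(range(256), key=error)

# "Observations" secretly generated by rule 110; the search recovers it (or an
# equivalent rule), but the number 110 by itself explains nothing about the data.
observed = run(110, [0] * 20 + [1] + [0] * 20, 30)
print(fit_rule(observed))
```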
Wolfram’s latest research is fundamentally the same
Wolfram’s recent “Path to the Fundamental Theory of Physics” is based on what he did in ANKS. In fact, it was already sketched out in chapter 9.
Instead of a CA model updating cells on a grid, he describes rules which update nodes and edges in a hypergraph. He then defines the number of spatial dimensions as the average connectivity of the graph[footnote]I have problems with this idea being foundational when average connectivity is a local measure[/footnote]. On top of this, a Wolfram Research employee proves some cool results showing that the model can exhibit relativity, which echoes what Matthew Cook did in the 1990s[footnote]Let’s hope Jonathan Gorard doesn’t get sued[/footnote].
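To give a flavor of what such a model looks like, here is a toy hypergraph rewriting step (the rule below is invented for illustration and is not one of the rules from the project’s papers; average node degree stands in for the connectivity measure mentioned above):

```python
import itertools

# A hypergraph as a list of hyperedges, each hyperedge a tuple of node ids.
hypergraph = [(0, 1), (1, 2)]
next_node = itertools.count(3)

def rewrite_step(edges: list[tuple[int, ...]]) -> list[tuple[int, ...]]:
    """Apply a toy rewrite rule to every binary edge: (x, y) -> (x, z), (z, y),
    where z is a freshly created node. (Invented for illustration; Wolfram's
    actual rules differ and are applied with particular update orders.)"""
    new_edges = []
    for edge in edges:
        if len(edge) == 2:
            x, y = edge
            z = next(next_node)
            new_edges += [(x, z), (z, y)]
        else:
            new_edges.append(edge)
    return new_edges

def average_connectivity(edges: list[tuple[int, ...]]) -> float:
    """Average number of hyperedges each node takes part in."""
    nodes = {n for e in edges for n in e}
    return sum(len(e) for e in edges) / len(nodes)

for step in range(4):
    print(step, len(hypergraph), average_connectivity(hypergraph))
    hypergraph = rewrite_step(hypergraph)
```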
Aaronson’s 2002 result still stands.
That the kind of graphs Wolfram is studying can exhibit this physical behavior isn’t new to physicists in the domain (1, 2), and this was noted by ANKS reviewers. More importantly, Scott Aaronson’s review (section 3.2) proved an important negative result about the kind of model Wolfram is still studying: roughly, a deterministic rule of this kind that respects relativistic invariance cannot reproduce the violations of Bell’s inequality that quantum mechanics predicts and experiments confirm.
This is ignored in all of Wolfram and Gorard’s recent research I can find [footnote]And I read entirely too much of it in the last 2 days[/footnote].
Why Wolfram’s whole research programme is a dead end
Wolfram gets one thing right: the universe is complicated[footnote]citation needed[/footnote]. A lot of its subsystems are complex enough to be Turing Complete themselves, so predicting all of it is not possible.
Scientists’ answer to that problem is to find fundamental relationships which explain aspects of the universe, and to unify them where possible. Wolfram’s answer is to try to find a “universal rule”, which is a doomed effort.
More is different
P.W. Anderson’s classic essay “More is different” argues that it’s impossible to “scale science up”, because emergent phenomena invalidate models that are valid at smaller scales. In other words, there isn’t a general way to link models from one science to another like in this XKCD joke.
Anderson gives examples where even a simple ammonia molecule needs a whole different class of model than the model of the particles it’s made of, in order to stay coherent with symmetry.
Generally, modeling the interactions between many objects requires a different class of model than just adding together many models of the individual objects, because something needs to capture the emergent properties, which invalidate the individual-level behavior.
One model per level
What this means is that if we want to model the world around us, we need to focus on specific niches. The only fully accurate model of the world is the world itself.
The bad news for Wolfram’s research programme is that he’ll never find the “one rule for the universe” in his lifetime, or in our solar system’s lifetime. And because his models have no particular structure that describes the problem[footnote]they’re a fully general class of models[/footnote], the best he can do is model focused problems like we do in machine learning, minimizing the difference between observations and model output.
Discretizing models of the universe is going the wrong direction
The reason a neural network can fit such a huge number of free parameters efficiently is that it exploits continuous math through gradients. An overarching trend in mathematical optimization is that optimizing discrete quantities is difficult (e.g. NP-Complete or harder), while continuous optimization is often easier.
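A toy illustration of that asymmetry (my own example, with a deliberately trivial objective): the continuous problem follows its gradient to the optimum in a few dozen steps, while the discrete problem gets no help from gradients and falls back on enumerating an exponentially large space:

```python
import itertools

# Continuous: minimize (x - 3.7)^2. The gradient tells us which way to move,
# so a few dozen steps of gradient descent get arbitrarily close to the optimum.
x, lr = 0.0, 0.1
for _ in range(50):
    grad = 2 * (x - 3.7)
    x -= lr * grad
print(round(x, 3))  # ~3.7

# Discrete: find the bit string minimizing a black-box mismatch count against a
# target. No gradient to follow, so without extra structure we are stuck
# enumerating all 2^n candidates (here n=16, already 65,536 of them).
target = (0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
best = min(
    itertools.product((0, 1), repeat=len(target)),
    key=lambda bits: sum(a != b for a, b in zip(bits, target)),
)
print(best == target)  # True, but only after checking every candidate
```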
Moreover, for neural networks we fix the structure of the model (number of layers, etc.) before fitting the parameters. For a CA model or a “Wolfram model” (a hypergraph growth rule model) we don’t even know the structure we’re looking for, which blows up the search space even more.
So overall, Wolfram’s research programme makes me think of the project where someone realized the x86 MOV instruction is Turing Complete, and so compiled the game Doom into a program that uses a single CPU instruction.
It renders one frame every 7 hours[footnote]Given that the execution speed of Mathematica makes even Python look fast, I get the inkling this isn’t a concern for Wolfram[/footnote].