# A Nobel worthy game theory problem

In light of the excellent 2021 Economics Nobel choice [footnote]Proud member of the microeconometrics gang[/footnote], I want to explain a game theory problem which I think anyone giving a good solution to is getting the economics Nobel in a few decades. Don’t hold your breath on actually solving it, though.

Back when I used to play poker professionally, I spent a significant amount of time researching the game. This entailed working with the output CFRM-based algorithms [footnote]kind of similar to what serious chess players do with chess engines[/footnote] and building strategies that adapt from “perfect” algorithmic play to exploit imperfect opponents[footnote]Something you don’t see in chess, because chess is a perfect information game, so the best play is the same regardless of the opponent. In imperfect information games like poker, the best play is opponent dependent[/footnote]. Our problem today is related to that latter task.

# The Game Theory of Poker

All games have at least one Nash Equilibrium (NE), a pair of strategies where each player can only worsen their outcome by deviating from it. Poker specifically has a mixed strategy Nash equilibrium - in a particular situation, the Nash equilibrium strategy with two Aces might be to raise a bet 60% of the time (chosen at random) and call it the remaining 40% of the time. Doing the same thing every time in poker is easy to exploit for a smart opponent.

The NE strategy has great properties in poker: by using it, **you can only win in the long run**. This is derived by two properties:

The NE guarantees that a player can only lose by deviating from it

Poker is a

*Zero-Sum Game*- what you win is exactly what your opponent loses, and vice versa. The sum of the gains of the two players is zero.

If your opponent deviates from the NE strategy in poker, and you play the NE strategy, he can only do equal or worse than zero. Thus, you can only do greater than or equal to zero - you can only win in the long run.

This is powerful! Poker algorithms try to converge to the NE strategies, because you can blindly play them and expect to win.

**Problems with Nash Strategies**

A core problem with the Nash Equilibrium concept is that in general games, there are often plenty of Nash equilibrium points in a game, many of which nonsensical. For instance, take this game [footnote]This is the standard notation for a normal form game; rows and columns are actions by players 1 and 2, and the payoff for each player is in the parentheses[/footnote]:

A | B | |
---|---|---|

A | (1,1) | (0,0) |

B | (0,0) | (0,0) |

The pair of actions *(A, A)* is a Nash Equilibrium, but so is *(B, B)*! You can’t do better than 0 if your opponent always plays B, so *(B, B)* is a NE point even if it’s a completeely nonsensical one.

This nonsense motivates the literature on solution concepts in game theory: finding equilibrium refinements which root out unrealistic solutions. For instance, the concept of a Trembling Hand Equilibrium roots out *(B, B)*.

# Equilibrium stability and the exploitation problem

Another problem with Nash Equilibria, even realistic ones, is that they don’t even necessarily make sense against non-equilibrium strategies.

Take the game Rock-Paper-Scissors (RPS) as an example. Clearly the one mixed strategy Nash equilibrium is *(0.33, 0.33, 0.33)* - at any point you play rock, paper and scissor with 33% probability. Since RPS is also a zero-sum game, this guarantees both players net zero long run.

But what if your opponent played *(0.25, 0.37, 0.37)*, throwing rock only 25% of the time? The NE strategy then *still nets zero long run*. By playing an even mix, the NE strategy doesn’t exploit the opponent’s weakness. You need to throw paper more often than scissors to profit against a rock player in RPS.

This concept remains similar for poker - if someone bluffs too often, or not enough, the NE strategy doesn’t exploit this. In poker, the NE strategy might be profitable against unbalanced strategies (unlike in RPS) but it won’t properly exploit the bad strategies.

# Problem Statement

Now we’re ready to informally state the namesake problem of this post:

Define a metric which, given a two player game and an associated Nash equilibrium, quantifies the stability of the Nash equilibrium point to off-equilibrium strategies

For instance, in Rock-Paper-Scissors, the Nash equilibrium is somewhat unstable - if your opponent throws rock 25% of the time, you’re incentivized to deviate from the Nash equilibrium strategy and throw paper 100% of the time. Of course, deviating to a 100% paper strategy would incentivize further adjustment from the opponent, so maybe you just want to play something like *(0.27, 0.45, 0.27)* to avoid getting noticed, etc.

This explains why algorithmic contests exist for RPS - despite the fact that everyone knows the equilibrium strategy, the incentive is continuously there to deviate from it, and adjust to expected deviations. This leads real RPS games to rarely be played at the equilibrium point. They’re played *around it* instead. The RPS Nash Equilibrium is not particularly stable.

**Why it matters**

Getting a good solution to this problem help us understand where behavior differs from Nash Equilibria (or which equilibrium gets picked).

For instance, if you were writing a perfect poker player algorithm, you’d want to code when to deviate from the Nash Equilibrium strategy based on the opponent’s strategy. The Nash Equilibrium stability of the situation would help dictate *how far you can adjust*.

Should you play paper 40% or 80% of the time against an opponent playing rock 25% of the time in Rock-Paper-Scissors?

How does this ratio differ from game to game or from situation to situation within a game?

# Why this is difficult

Take the classic prisonner’s dilemma game:

A/B | Cooperate | Betray |
---|---|---|

Cooperate | (-1, -1) | (-3, 0) |

Betray | (0, -3) | (-2, -2) |

The only Nash Equilibrium here is *(Betray, Betray)*. But when played in real life, or played repeatedly between players, all sorts of things happen! We are smart enough to recognize that the unstable *(Cooperate, Cooperate)* outcome is preferable to the stable *(Betray, Betray)*, but have a hard time coordinating [footnote]There are a few valid Nash Equilibrium strategies in the repeated game depending on patience, see link[/footnote] so the actual outcome of the prisonner’s dilemma is unstable.

**A solution should take game payoffs into account**

Take this modified prisonner’s dilemma game:

A/B | Cooperate | Betray |
---|---|---|

Cooperate | (-1, -1) | (-300, 0) |

Betray | (0, -300) | (-200, -200) |

Intuitively, people should cooperate more often in this game than in the game where the relative punishment of betrayal was milder. If you are quantifying stability of equilibria, your metric should fluctuate based on such parameters.

**A solution should take game structure into account**

Coming back to nonsense game from earlier:

A | B | |
---|---|---|

A | (1,1) | (0,0) |

B | (0,0) | (0,0) |

Clearly *(B, B)* should somehow be scored as extremely unstable and *(A, A)* as extremely stable.

# Actually, you probably won’t solve it

Not to be discouraging, but I doubt anyone in my readership will find an elegant solution to the general form problem. Having spent weeks on the problem over the years, I don’t even think an elegant general solution even exists.

A notable paper on the topic is “On the Strategic Stability of Equilibria” by Kohlberg & Mertens (1986) which attempts an easier variation of our problem, defining which equilibria are stable and which aren’t (rather than quantifying it). Now, Kohlberg & Mertens arent Nobel winners, but…

**Many Nobel prize winners have tried**

A lot of work in game theory derives from the Nash Equilibrium being unsatisfactory. Here’s a quick list of work on equilibrium selection in game theory from people who won a Nobel prize in economics:

Roger Myerson proposed the proper equilibrium

Richard Selten introduced the Trembling Hand Equilibrium

Paul Milgrom was working on refinements to the Proper Equilibrium this year. He’s also been busy working on equilibrium selection in auctions recently.

John Nash, of course, found the Nash Equilibrium. John Harsanyi also worked on equilibrium selection.

To put it lightly, you’d be in good company to make a significant advance on the problem of equilibrium selection in game theory.

Now, to be clear, the problem I’m proposing isn’t exactly *equilibrium selection*, but rather some form of *quantification of equilibrium stability*. They’re related problems, and making an inroad on equilibrium stability quantification would (presumably) make a large advance in equilibrium selection.

But the stability quantification problem is within continuous mathematics rather than discrete equilibrium selection, and the approaches are different.

**What a solution might look like**

Again, I don’t think there is an elegant general solution to the quantification problem we proposed earlier. Here’s couple of things that are likely necessary in your solution:

- A
**theory of observability**. Go back to Rock-Paper-Scissors: unlike in prisonner’s dilemma, where the opponent’s action is obvious, in RPS you only observe the*strategic frequency*. So you would have to keep a Bayesian update to your opponent’s frequencieswhen off-equilibrium.

This is more complex than it seems! For instance, in poker, *observability is partial*. If you decide to bluff more often against someone who is weak, they will only observe the frequencies at which you bet, until they call you and observe the hand you bluffed with.

- A
**theory of learning**. You will also need a model of how people evolve their strategies. This could be some form of reinforcement learning like counterfactual regret minimization (this is the algorithm used to solve poker, by the way), but should generally provide update rules given current game information and observed outcomes, and converge towards a Nash Equilibrium asymptotically.

As said, I’ve worked the equivalent of several full time weeks on this problem over the years and have never been able to come up with a workable algorithm or closed form solution, even when restricting myself only to the game of Poker (let alone the general form problem).

Good luck!