MIT 6.S890 — Topics in Multiagent Learning Tue, Sep 10th 2024
Lecture 2
Setting and equilibria: the Nash equilibrium
Instructor: Prof. Gabriele Farina (gfarina@mit.edu)★
Normal-form games model interactions in which all players move once, simultaneously (think of rock-paper-scissors). Despite their simplicity, normal-form games provide a natural ground for looking into important concepts in multiagent settings, such as notions of equilibria (Nash, maxmin, correlated, ...) and learning from repeated play. In the second part of the course, we will move on to notions of games that explicitly capture more complex phenomena, such as sequential moves and imperfect information.
1 Normal-form games and the Nash equilibrium
When introducing a (finite) normal-form game, we need to specify the following quantities:
• The set of players [𝑛] = {1, ..., 𝑛}.
• For each player 𝑖 ∈ [𝑛], a finite set of actions 𝐴𝑖.
• For each player 𝑖 ∈ [𝑛], the payoff function 𝑢𝑖 : 𝐴1 × ⋯ × 𝐴𝑛 → ℝ.
To represent a normal-form game, it is customary to use payoff matrices.
Example 1.1. For instance, in the 2 × 2 game below, called the “prisoner’s dilemma”, the rows correspond to the actions of Player 1, and the columns correspond to the actions of Player 2. The entry at row 𝑖 and column 𝑗 lists the payoffs of the two players when Player 1 plays action 𝑖 and Player 2 plays action 𝑗.

            Deny      Confess
Deny       −1, −1     −3, 0
Confess     0, −3     −2, −2
■ Strategies. A randomized strategy (also known as mixed strategy) for a generic player 𝑖 ∈ [𝑛] is a
distribution over the set of actions. We can represent such an object as a vector 𝑥𝑖 ∈ Δ(𝐴𝑖), that is, such that 𝑥𝑖 ≥ 0 and ∑𝑎𝑖∈𝐴𝑖 𝑥𝑖,𝑎𝑖 = 1. To lighten the notational burden, we will write the expected utility when all players play according to strategies 𝑥1, ..., 𝑥𝑛 reusing the same letter 𝑢𝑖 as the payoff, that is,

$$u_i(x_1, \ldots, x_n) := \mathbb{E}_{a_1 \sim x_1, \ldots, a_n \sim x_n}\bigl[u_i(a_1, \ldots, a_n)\bigr] = \sum_{a_1 \in A_1} \cdots \sum_{a_n \in A_n} x_{1,a_1} \cdots x_{n,a_n} \cdot u_i(a_1, \ldots, a_n).$$
We will sometimes intersperse deterministic actions and mixed strategies freely and write expressions
such as 𝑢𝑖(𝑎1, 𝑥2, ..., 𝑥𝑛) to mean the expected utility when player 1 plays action 𝑎1 and the other
players play according to the strategies 𝑥2, ..., 𝑥𝑛.
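To make the definition concrete, here is a minimal computational sketch (not part of the original notes) of the expected utility in the two-player case, where it reduces to the bilinear form $x_1^\top U_1 x_2$. The payoff matrix and strategies below are illustrative; numpy is assumed available.

```python
# Expected utility u_1(x1, x2) in a two-player normal-form game.
# U1[a1, a2] = u_1(a1, a2); here we reuse the prisoner's dilemma payoffs.
import numpy as np

U1 = np.array([[-1, -3],
               [ 0, -2]])  # Player 1's payoffs (rows: Deny, Confess)
x1 = np.array([0.5, 0.5])  # a mixed strategy for Player 1
x2 = np.array([0.2, 0.8])  # a mixed strategy for Player 2

# E[u_1] = sum over (a1, a2) of x1[a1] * x2[a2] * U1[a1, a2] = x1^T U1 x2.
print(x1 @ U1 @ x2)  # -2.1
```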
1.1 Dominant-strategy equilibrium
The question of what constitutes rational play for players can get complicated depending on the
game. But, in some lucky cases, like the prisoner’s dilemma game above, it turns out that some
actions are just better than others, no matter what the other players do. In such cases, we say that
a player has a dominant strategy. In the prisoner’s dilemma above, both Player 1 and Player 2 have a dominant strategy: confess. In such a case, we expect the players to play their dominant strategies, and the resulting profile is called a dominant-strategy equilibrium (a short computational check is sketched below).
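As a minimal sketch (assuming numpy; the helper function is illustrative, not from the notes), one can verify mechanically that confessing strictly dominates in the prisoner’s dilemma:

```python
import numpy as np

U1 = np.array([[-1, -3],
               [ 0, -2]])  # row player's payoffs
U2 = np.array([[-1,  0],
               [-3, -2]])  # column player's payoffs

def strictly_dominant_rows(U):
    """Rows whose payoff strictly exceeds every other row's, entrywise."""
    n = U.shape[0]
    return [a for a in range(n)
            if all((U[a] > U[b]).all() for b in range(n) if b != a)]

print(strictly_dominant_rows(U1))    # [1]: Confess dominates for Player 1
print(strictly_dominant_rows(U2.T))  # [1]: Confess dominates for Player 2
```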
1.2 Maxmin strategies
The benefit of dominant-strategy equilibria is that they require no counterspeculation: some strategies are simply better no matter what anyone else does. However, in many games, no player has a dominant strategy. Consider, for example, rock-paper-scissors: all actions are symmetric, and no action is strictly better than the others. How can we find a good strategy in such a game?
One way to think about this is to consider the worst-case scenario: what is the best strategy for a
player if they assume the other players are trying to minimize their payoff? This is the idea behind
maxmin strategies. A maxmin strategy for Player 𝑖 is a strategy 𝑥𝑖 that maximizes the minimum
payoff that Player 𝑖 can get, that is,
$$x_i \in \mathop{\mathrm{arg\,max}}_{x_i' \in \Delta(A_i)} \;\; \min_{\substack{x_j \in \Delta(A_j) \\ \text{for all } j \neq i}} \; u_i(x_i', x_{-i}),$$
where the notation $x_{-i}$ is popular syntactic sugar to denote the tuple $(x_j)_{j \neq i}$.¹ Thinking back to rock-paper-scissors, it is clear that the maxmin strategy is to play uniformly at random: the opponent could exploit anything else by playing the counter-action more often.
The above idea has some merits, especially in two-player zero-sum games, that is, those two-player
games where 𝑢1(𝑎1, 𝑎2) + 𝑢2(𝑎1, 𝑎2) = 0 for all combinations of actions. In those games, players are
in direct competition, so it makes sense to assume that players will be “out to get us”. But in more
general games, the maxmin strategy can be too conservative, since it assumes that all other players
have nothing better going on than to minimize our payoff, even if that hurts them.
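In two-player games, a maxmin strategy can be computed with a linear program. The following is a minimal sketch (not from the notes) for rock-paper-scissors; it assumes scipy’s linprog is available, and the variable names are illustrative.

```python
# Maxmin strategy via the LP: maximize v subject to
#   (x^T U)[a2] >= v for every opponent action a2,  sum(x) = 1,  x >= 0,
# where U[a1, a2] is Player 1's payoff matrix.
import numpy as np
from scipy.optimize import linprog

U = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]])  # rock-paper-scissors (Player 1's payoffs)
m = U.shape[0]

c = np.concatenate([np.zeros(m), [-1.0]])            # minimize -v
A_ub = np.hstack([-U.T, np.ones((m, 1))])            # v - (x^T U)[a2] <= 0
b_ub = np.zeros(m)
A_eq = np.concatenate([np.ones(m), [0.0]])[None, :]  # probabilities sum to 1
b_eq = [1.0]
bounds = [(0, None)] * m + [(None, None)]            # x >= 0, v free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(res.x[:m], res.x[m])  # approximately [1/3, 1/3, 1/3] with value 0
```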
1.3 The Nash equilibrium
In general, defining what constitutes “optimal play” is tricky. But we can start from what is
convincingly not optimal play: if we predict that the players should play according to some strategies
𝑥1, ..., 𝑥𝑛, then the prediction cannot be optimal if it turns out that some player would be better off by switching to something else. This is the idea behind the Nash equilibrium.
Definition 1.1 (Nash equilibrium). A strategy profile (𝑥1, ..., 𝑥𝑛) ∈ Δ(𝐴1) × ⋯ × Δ(𝐴𝑛) is a
Nash equilibrium if no player benefits from unilaterally deviating from their strategy. In symbols,
$$\forall i \in [n],\ \forall x_i' \in \Delta(A_i): \quad u_i(x_i', x_{-i}) \leq u_i(x_1, \ldots, x_n).$$
Remark 1.1. Without loss of generality, when verifying whether a profile $(x_1, \ldots, x_n)$ is a Nash equilibrium, it is sufficient to consider only deterministic deviations $x_i' \in A_i$. Indeed, if a player has a profitable randomized deviation, then at least one of the actions they randomize over must itself be a profitable deviation (a small sketch of this check follows).
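Following Remark 1.1, checking a Nash equilibrium in a two-player game reduces to comparing each player’s current utility against their best deterministic deviation. A minimal sketch, assuming numpy (the function name is illustrative):

```python
import numpy as np

def is_nash(U1, U2, x1, x2, tol=1e-9):
    """Nash check for a two-player game with payoff matrices U1, U2
    (rows: Player 1's actions, columns: Player 2's actions)."""
    u1, u2 = x1 @ U1 @ x2, x1 @ U2 @ x2
    best_dev1 = (U1 @ x2).max()  # best deterministic deviation, Player 1
    best_dev2 = (x1 @ U2).max()  # best deterministic deviation, Player 2
    return best_dev1 <= u1 + tol and best_dev2 <= u2 + tol

# Uniform play in rock-paper-scissors (zero-sum, so U2 = -U1) is a Nash eq.
U = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])
x = np.ones(3) / 3
print(is_nash(U, -U, x, x))  # True
```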
It is clear that a dominant-strategy equilibrium $(x_1, \ldots, x_n)$ is a special case of a Nash equilibrium, since by definition it satisfies

$$\forall i \in [n],\ \forall x_i' \in \Delta(A_i),\ \forall x_{-i}' \in \prod_{j \neq i} \Delta(A_j): \quad u_i(x_i', x_{-i}') \leq u_i(x_i, x_{-i}'). \tag*{(Dominant-strategy eq.)}$$

(Note the stronger quantifiers.) As we will discuss more in depth shortly, in two-player zero-sum games it turns out that Nash equilibrium strategies and maxmin strategies are equivalent.
Before continuing, we consider a few examples that illustrate some important properties of the Nash equilibrium.
Example 1.2. The only Nash equilibrium in the game of rock-paper-scissors is for all players to
play the uniform strategy. This shows that in some games, no Nash equilibrium exists in pure (i.e., non-randomizing) strategies.
Example 1.3 (Theater or football). Consider the following small game. Player 1 can either insist on going to the theater or accept attending the football match; Player 2 can either insist on going to the football match or accept attending the theater. The payoffs are as follows:

           insist   accept
insist      0, 0     5, 1
accept      1, 5     0, 0

This game has two obvious Nash equilibria: Player 1 insisting and Player 2 accepting, or vice versa (top-right and bottom-left corners). However, there is a third equilibrium as well: both players accept with probability 1/6 and insist with probability 5/6 (a numerical check appears after this example).
This is not a coincidence: nondegenerate two-player games always have an odd number of Nash equilibria. This fact stems from deeper connections with combinatorial objects that we will uncover toward the end of the course.
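As referenced in Example 1.3, here is a small numerical check (not from the notes; numpy assumed) that the mixed profile makes each player indifferent between their two actions, so that no deterministic deviation is profitable:

```python
import numpy as np

U1 = np.array([[0, 5],
               [1, 0]])  # Player 1's payoffs (rows/cols: insist, accept)
U2 = np.array([[0, 1],
               [5, 0]])  # Player 2's payoffs
x1 = x2 = np.array([5/6, 1/6])  # insist w.p. 5/6, accept w.p. 1/6

print(U1 @ x2)  # [5/6, 5/6]: Player 1 is indifferent between actions
print(x1 @ U2)  # [5/6, 5/6]: Player 2 is indifferent as well
```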
2 Existence of mixed-strategy Nash equilibrium
In 1950, John Nash established one of the most celebrated results in game theory:² Nash equilibria in mixed strategies exist in all games, no matter the number of players or actions. Nash’s proof is nonconstructive, and fundamentally boils down to showing that one can think of Nash equilibria as fixed points. Two remarks are in order:
• The idea that Nash equilibria can be thought of as fixed points should feel extremely natural.
By definition, an equilibrium is a situation where nobody has an incentive to deviate—that is,
no player has a more profitable strategy given the strategies of the other players. Hence, if we
are able to introduce a function that maps a strategy profile to the profile of “most profitable
strategies for each player keeping everyone else fixed”, then a fixed point of such a function
would be a Nash equilibrium. We could then use one of the many theorems in analysis that
guarantee existence of fixed points.
Of course, the devil is in the details: how to handle multiple profitable responses? how to ensure
continuity of the deviation function? These are the questions that Nash had to answer.
• The idea of viewing Nash equilibria as fixed points is not just natural, but in a precise sense the only one possible.
We will make this formal towards the end of this course, when we discuss the computational
complexity of Nash equilibria. In particular, we will show that the computation of fixed points
of continuous functions and computation of Nash equilibria are computationally equivalent, in
the sense that each problem can be reduced to the other in polynomial time.
In the remainder of the section, we will give a proof of the existence of Nash equilibria. While Nash’s first proof [Nas50] invokes Kakutani’s fixed-point theorem, a year later Nash noticed that a much more elementary proof can be given [Nas51]. We present a variation of the latter today.
2.1 The Nash improvement function
As mentioned above, one can think about Nash equilibria as fixed points of a “profitable response” function from the set of mixed strategy profiles to itself. Intuitively, this function must compute a profitable response for each player. Furthermore, to invoke fixed-point theorems, this function must be continuous. The key insight of Nash was to find a simple continuous function that, given a strategy profile, calculates a “profitable response” for each player. For lack of a better term, I will refer to this function as the “Nash improvement function”.
■ Regret. To formally define the Nash improvement function, we first introduce a simple quantity
called regret, which will be a staple of this course. The regret that Player 𝑖 experiences with respect
to action 𝑎𝑖 ∈ 𝐴𝑖 is the difference between the payoff that Player 𝑖 would have obtained by playing
𝑎𝑖, and the payoff that Player 𝑖 actually obtained:
$$r_{i,a_i}(x_1, \ldots, x_n) := u_i(a_i, x_{-i}) - u_i(x_1, \ldots, x_n).$$
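In the two-player case, the whole regret vector of Player 1 can be computed in one line. A minimal sketch (illustrative names; numpy assumed):

```python
import numpy as np

def regrets_p1(U1, x1, x2):
    """r_{1,a} = u_1(a, x2) - u_1(x1, x2) for every action a of Player 1."""
    return U1 @ x2 - x1 @ U1 @ x2

U1 = np.array([[-1, -3],
               [ 0, -2]])        # prisoner's dilemma, Player 1's payoffs
x1 = x2 = np.array([0.5, 0.5])
print(regrets_p1(U1, x1, x2))    # [-0.5, 0.5]: positive regret on Confess
```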
■ The Nash improvement function. The idea now is simple: if an action 𝑎𝑖 has very large regret, then
the current strategy profile cannot be an equilibrium, because Player 𝑖 would want to increase the
probability of playing 𝑎𝑖. Thus, an “improved” strategy for Player 𝑖 should move more probability
mass to 𝑎𝑖. We need to handle two complications: (1) if multiple actions have positive regret, how
should we prioritize adding mass to those? and (2) for those actions whose regret is negative (that
is, “bad” actions), should we forcefully decrease the mass?
Nash’s answers to the above questions are as follows: (1) add mass to all actions with positive regret,
and the amount of mass added should be proportional to the regret; (2) do not decrease the mass for
actions with negative regret. To retain the fact that the output of the improvement function must
be a valid strategy, the step is renormalized so that the sum of the mass across all actions of any
player is 1. We can formalize this process by using the following definition.
Definition 2.1 (Nash improvement function [Nas51]). Let 𝑥1 ∈ Δ(𝐴1), ..., 𝑥𝑛 ∈ Δ(𝐴𝑛) be
arbitrary strategies. The Nash improvement function 𝜑 : Δ(𝐴1) × ⋯ × Δ(𝐴𝑛) → Δ(𝐴1) × ⋯ ×
Δ(𝐴𝑛) is the map
$$\varphi_{i,a_i}(x_1, \ldots, x_n) := \frac{x_{i,a_i} + [r_{i,a_i}(x_1, \ldots, x_n)]^+}{1 + \sum_{a_i' \in A_i} [r_{i,a_i'}(x_1, \ldots, x_n)]^+} \tag{1}$$

for all players $i \in [n]$ and actions $a_i \in A_i$. Here, $[r]^+ := \max\{0, r\}$ denotes the positive part of $r$.
It is straightforward to verify that 𝜑 is well-defined and maps strategy profiles into strategy profiles.
Indeed, the numerator in (1) is always nonnegative, and the denominator is always at least 1, implying that $\varphi_{i,a_i} \geq 0$ for all $a_i \in A_i$ and players $i \in [n]$. Furthermore,

$$\forall i \in [n]: \quad \sum_{a_i \in A_i} \varphi_{i,a_i} = \frac{\sum_{a_i \in A_i} \bigl( x_{i,a_i} + [r_{i,a_i}(x_1, \ldots, x_n)]^+ \bigr)}{1 + \sum_{a_i' \in A_i} [r_{i,a_i'}(x_1, \ldots, x_n)]^+} = 1,$$

where we used the fact that $\sum_{a_i \in A_i} x_{i,a_i} = 1$ since $x_i$ is a valid strategy. Finally, observe that $\varphi$ is a continuous function (a computational sketch is given below). The following example then visualizes the Nash improvement function in the small games we have seen so far.
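The improvement step of Definition 2.1 is only a few lines of code. Here is a minimal two-player sketch (not from the notes; function names illustrative, numpy assumed):

```python
import numpy as np

def nash_improvement(U1, U2, x1, x2):
    """One application of phi from Definition 2.1 to the profile (x1, x2)."""
    def improve(x, regrets):
        pos = np.maximum(regrets, 0.0)        # [r]^+ componentwise
        return (x + pos) / (1.0 + pos.sum())  # Eq. (1)
    r1 = U1 @ x2 - x1 @ U1 @ x2               # regret vector of Player 1
    r2 = x1 @ U2 - x1 @ U2 @ x2               # regret vector of Player 2
    return improve(x1, r1), improve(x2, r2)

# Prisoner's dilemma (symmetric, so U2 = U1.T): phi moves mass to Confess.
U1 = np.array([[-1, -3], [0, -2]])
x1 = x2 = np.array([0.5, 0.5])
print(nash_improvement(U1, U1.T, x1, x2))  # ([1/3, 2/3], [1/3, 2/3])
```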
Example 2.1. The plots below visualize the displacement 𝜑(𝑥1, 𝑥2) − (𝑥1, 𝑥2) induced by the
Nash improvement function for four games, whose payoff matrices are noted below each plot,
after projecting away the probability of the first action of each player (and keeping around only
the probability of the second action, which is sufficient to uniquely recover the strategy of the
player since each player only has two actions). The black dots denote the fixed points of the
Nash improvement function. These correspond exactly to the Nash equilibria of the game, as we
make formal below.
[Figure: four displacement plots, one per game, with both axes of each plot ranging from 0 to 1. The payoff matrices of the four games are reproduced below.]

Penalty shot game (axes: ℙ[kick right] vs. ℙ[dive right]):

           dive L   dive R
kick L     −1, 1     1, −1
kick R      1, −1   −1, 1

Prisoner's dilemma (axes: ℙ[confess] vs. ℙ[confess]):

           deny     confess
deny       −1, −1   −3, 0
confess     0, −3   −2, −2

Theater or football (axes: ℙ[accept] vs. ℙ[accept]):

           insist   accept
insist      0, 0     5, 1
accept      1, 5     0, 0

Hawk-dove game (axes: ℙ[hawk] vs. ℙ[dove]):

           hawk     dove
dove        0, 4     2, 2
hawk       −2, −2    4, 0
The background of the plots highlights the angle of displacement induced by the Nash improvement function, according to the gradient wheel shown on the right. [If the color scheme seems arbitrary, as a small spoiler it will play a fundamental role in the proof of the computational complexity of Nash equilibria, which we will discuss later on in this course. In particular, the three regions of the coloring scheme will be key in defining an important combinatorial construction called Sperner coloring.]

[Figure: gradient color wheel with three regions: yellow, blue, and red.]
2.2 Quantifying the increase in utility of the improvement step
We validate our intuition that the Nash improvement function is a “profitable response” function.
The following result shows that if a player has positive regret for any action, then the Nash
improvement function unilaterally increases that player’s utility. This is a key property that will
allow us to show that the fixed points of the Nash improvement function must be Nash equilibria.
Theorem 2.1. For any strategy profile (𝑥1, ..., 𝑥𝑛), and any player 𝑖 ∈ [𝑛], the Nash improvement
function 𝜑 satisfies
$$u_i(\varphi_i(x_1, \ldots, x_n), x_{-i}) - u_i(x_1, \ldots, x_n) = \frac{\sum_{a_i \in A_i} \bigl([r_{i,a_i}(x_1, \ldots, x_n)]^+\bigr)^2}{1 + \sum_{a_i \in A_i} [r_{i,a_i}(x_1, \ldots, x_n)]^+}.$$
So, if even one action of a player 𝑖 has positive regret, then the Nash improvement function
unilaterally strictly increases the utility of that player.
Proof . Since we are focusing on a generic player 𝑖 and keeping all the other ones fixed (and
playing strategies 𝑥−𝑖), we will reduce the notational burden by using the following shorthands:
$$\begin{aligned}
r_{i,a_i} &:= r_{i,a_i}(x_1, \ldots, x_n) && \text{(regret of action $a_i$ of Player $i$ fixing $x_{-i}$)} \\
u_{i,a_i} &:= u_i(a_i, x_{-i}) && \text{(utility of action $a_i$ of Player $i$ fixing $x_{-i}$)} \\
x'_{i,a_i} &:= \varphi_{i,a_i}(x_1, \ldots, x_n) = \frac{x_{i,a_i} + r^+_{i,a_i}}{1 + \sum_{a'_i \in A_i} r^+_{i,a'_i}} && \text{(prob. of action $a_i$ in the improved strategy)}
\end{aligned}$$

where the notation $z^+$ is a shorthand for the positive part of $z$, that is, $z^+ := [z]^+ := \max\{0, z\}$.
The increase in utility is then computed as
$$\begin{aligned}
\sum_{a_i \in A_i} u_{i,a_i} \cdot (x'_{i,a_i} - x_{i,a_i})
&= \sum_{a_i \in A_i} u_{i,a_i} \cdot \left( \frac{x_{i,a_i} + r^+_{i,a_i}}{1 + \sum_{a'_i \in A_i} r^+_{i,a'_i}} - x_{i,a_i} \right) \\
&= \sum_{a_i \in A_i} u_{i,a_i} \cdot \frac{r^+_{i,a_i} - \bigl(\sum_{a'_i \in A_i} r^+_{i,a'_i}\bigr) \cdot x_{i,a_i}}{1 + \sum_{a'_i \in A_i} r^+_{i,a'_i}} \\
&= \frac{1}{1 + \sum_{a'_i \in A_i} r^+_{i,a'_i}} \left( \sum_{a_i \in A_i} r^+_{i,a_i} \cdot u_{i,a_i} - \sum_{a'_i \in A_i} \Bigl( r^+_{i,a'_i} \sum_{a_i \in A_i} x_{i,a_i} \cdot u_{i,a_i} \Bigr) \right) \\
&= \frac{1}{1 + \sum_{a'_i \in A_i} r^+_{i,a'_i}} \left( \sum_{a_i \in A_i} r^+_{i,a_i} \cdot \Bigl( u_{i,a_i} - \sum_{a'_i \in A_i} u_{i,a'_i} \cdot x_{i,a'_i} \Bigr) \right) \\
&= \frac{1}{1 + \sum_{a'_i \in A_i} r^+_{i,a'_i}} \left( \sum_{a_i \in A_i} r^+_{i,a_i} \cdot r_{i,a_i} \right).
\end{aligned}$$

Using the fact that $z^+ \cdot z = (z^+)^2$ for all $z \in \mathbb{R}$, we obtain the statement. □
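Before moving on, here is a quick numerical sanity check of the identity in Theorem 2.1 (not from the notes): for a random two-player game we compute both sides for Player 1 and verify that they agree. numpy assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
U1 = rng.standard_normal((4, 3))   # random payoff matrix for Player 1
x1 = rng.dirichlet(np.ones(4))     # random mixed strategies
x2 = rng.dirichlet(np.ones(3))

r = U1 @ x2 - x1 @ U1 @ x2         # regret vector of Player 1
pos = np.maximum(r, 0.0)
phi1 = (x1 + pos) / (1.0 + pos.sum())

lhs = phi1 @ U1 @ x2 - x1 @ U1 @ x2         # utility gain of the step
rhs = (pos ** 2).sum() / (1.0 + pos.sum())  # closed form from Theorem 2.1
print(np.isclose(lhs, rhs))  # True
```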
At this point, the following is a simple corollary.
Theorem 2.2. A strategy profile (𝑥1, ..., 𝑥𝑛) is a Nash equilibrium if and only if it is a fixed
point of the Nash improvement function 𝜑.
Proof . (⟹) If (𝑥1, ..., 𝑥𝑛) is a Nash equilibrium, then by definition for all 𝑖 ∈ [𝑛] and 𝑎𝑖 ∈ 𝐴𝑖,
we have 𝑟𝑖,𝑎𝑖 (𝑥1, ..., 𝑥𝑛) ≤ 0. Hence, for all 𝑖 ∈ [𝑛] and 𝑎𝑖 ∈ 𝐴𝑖, we have
$$\varphi_{i,a_i}(x_1, \ldots, x_n) = \frac{x_{i,a_i} + [r_{i,a_i}(x_1, \ldots, x_n)]^+}{1 + \sum_{a'_i \in A_i} [r_{i,a'_i}(x_1, \ldots, x_n)]^+} = x_{i,a_i},$$
that is, (𝑥1, ..., 𝑥𝑛) is a fixed point of 𝜑.
(⟸) Conversely, suppose that (𝑥1, ..., 𝑥𝑛) is a fixed point of 𝜑. Then, for all 𝑖 ∈ [𝑛], from
Theorem 2.1 we have
$$\frac{\sum_{a_i \in A_i} \bigl([r_{i,a_i}(x_1, \ldots, x_n)]^+\bigr)^2}{1 + \sum_{a_i \in A_i} [r_{i,a_i}(x_1, \ldots, x_n)]^+} = u_i(\varphi_i(x_1, \ldots, x_n), x_{-i}) - u_i(x_1, \ldots, x_n) = 0.$$
Hence, it must be 𝑟𝑖,𝑎𝑖 (𝑥1, ..., 𝑥𝑛) ≤ 0 for all 𝑖 ∈ [𝑛] and 𝑎𝑖 ∈ 𝐴𝑖 (or the left-hand side would be
strictly positive), and therefore (𝑥1, ..., 𝑥𝑛) is a Nash equilibrium. □
By invoking Brouwer’s fixed-point theorem, we recover the central result of this lecture: Nash
equilibria always exist.
Corollary 2.1. Since 𝜑 is continuous and maps the nonempty compact convex set Δ(𝐴1) × ⋯ ×
Δ(𝐴𝑛) into itself, by Brouwer’s fixed point theorem, it has a fixed point. By Theorem 2.2, this
implies that every game has (at least) one Nash equilibrium in mixed strategies.
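To close the loop computationally, one can check Theorem 2.2 at a known equilibrium: applying the improvement step at the uniform equilibrium of the penalty shot game from Example 2.1 leaves the strategy unchanged. A minimal sketch (numpy assumed):

```python
import numpy as np

U1 = np.array([[-1,  1],
               [ 1, -1]])  # penalty shot game (zero-sum, so U2 = -U1)
x = np.array([0.5, 0.5])   # the unique Nash equilibrium: uniform play

r1 = U1 @ x - x @ U1 @ x              # every regret is 0 at equilibrium
pos = np.maximum(r1, 0.0)
phi1 = (x + pos) / (1.0 + pos.sum())  # one improvement step for Player 1
print(np.allclose(phi1, x))  # True: the equilibrium is a fixed point
```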
Bibliography
[Nas50] J. F. Nash Jr., “Equilibrium Points in N-Person Games,” Proc. Natl. Acad. Sci. U.S.A.,
vol. 36, no. 1, p. 48, Jan. 1950, doi: 10.1073/pnas.36.1.48.
[Nas51] J. Nash, “Non-Cooperative Games,” Annals of Mathematics, vol. 54, no. 2, pp. 286–295,
1951, [Online]. Available: http://www.jstor.org/stable/1969529
Changelog
• Sep 10: fixed location of NE dots in the rightmost plot of Example 2.1.
★These notes are class material that has not undergone formal peer review. The TAs and I are grateful for any
reports of typos.
¹This notation appears often in game theory, since we are often interested in studying the effect of changing a single player 𝑖's strategy, while keeping all “the other” strategies 𝑥−𝑖 fixed.
²John Nash went on to win the Nobel prize in economics for his fundamental contributions to game theory.