



MIT 6.S890 — Topics in Multiagent Learning Nov 5th−7th 2024
Lecture 14
Combinatorial games and Kernelized MWU
Instructor: Prof. Gabriele Farina (gfarina@mit.edu)
Extensive-form games belong to a larger class of games called combinatorial games. Combinatorial
games are characterized by the following properties:
• For every player 𝑖, the set of deterministic strategies can be represented as a set 𝒱𝑖 of bit
strings, that is, elements in {0, 1}𝑑𝑖 for some appropriate dimension 𝑑𝑖; and
• The utility of every player is a multilinear function of the strategies.
1 Examples of Combinatorial Games
Both normal-form games and extensive-form games, the classes of games we have been studying so
far, are examples of combinatorial games. In this section, we also show that several other important
classes of games are combinatorial games.
1.1 Normal-form games
In normal-form games, we already know that the utility of each player is multilinear in the distri-
butions over actions used by the players. The deterministic strategies correspond to deterministic
distributions, which have the form
(1, 0, 0, ..., 0)ᵀ,  (0, 1, 0, ..., 0)ᵀ,  (0, 0, 1, ..., 0)ᵀ,  ...,  (0, 0, 0, ..., 1)ᵀ.
Hence, normal-form games are combinatorial games.
Remark 1.1. The number of deterministic strategies in a normal-form game is |𝐴𝑖|, where 𝐴𝑖 is
the set of actions of the player.
1.2 Extensive-form games
Perhaps slightly less obvious is that extensive-form games are combinatorial games. There, we already
know that the utility of each player is multilinear when using the sequence-form representation of
strategies. Deterministic strategies in the sequence-form representation have all entries in {0, 1}.
As a reminder, these strategies correspond to the set of all deterministic contingency plans
for the entire tree. We illustrate this with an example.
Example 1.1. For example, consider the following two-player game, where black nodes belong
to Player 1 and white nodes belong to Player 2.
[Figure: game tree. The root A belongs to Player 1, with sequences 1 and 2. Sequence 1 leads to Player 2's node P, whose actions lead to Player 1's information sets B (sequences 3, 4) and C (sequences 5, 6). Sequence 2 leads to Player 2's node Q, whose two actions lead to the two nodes of Player 1's information set D (sequences 7, 8, 9).]
The information set D for Player 1 encodes the fact that Player 1 does not observe Player 2's
action at Q. In this small game, the following seven strategies 𝑣1, ..., 𝑣7 form the vertices of the
sequence-form polytope.
       1  2  3  4  5  6  7  8  9
𝑣1  (  1  0  1  0  1  0  0  0  0  )
𝑣2  (  1  0  1  0  0  1  0  0  0  )
𝑣3  (  1  0  0  1  1  0  0  0  0  )
𝑣4  (  1  0  0  1  0  1  0  0  0  )
𝑣5  (  0  1  0  0  0  0  1  0  0  )
𝑣6  (  0  1  0  0  0  0  0  1  0  )
𝑣7  (  0  1  0  0  0  0  0  0  1  )
Remark 1.2. We remark that the number of deterministic strategies in an extensive-form game is
in general exponential in the size of the game tree. In particular, we can easily bound the number
of deterministic strategies of a generic player 𝑖 as |𝒱𝑖| ≤ 𝐴𝑖^{|𝐼𝑖|}, where 𝐼𝑖 is the set of information
sets of the player and 𝐴𝑖 is the maximum number of actions at any of those information sets.
In contrast, the dimension 𝑑𝑖 of the bit strings is just the sum of the number of actions across
all information sets of the player.
1.3 Resource allocation games
Another class of combinatorial games goes under the name of resource allocation games, of which the
game Colonel Blotto is the most famous example. In these games, the set of deterministic strategies
is the set of all possible ways to allocate a fixed amount of resources among a set of activities. In
Colonel Blotto, the resources are soldiers, and the activities are the battlefields. We can represent
these allocations using bit strings arranged as a table whose rows are activities and whose columns
are amounts of allocated resources. We can then one-hot encode how many resources we would like
to allocate to each activity, making sure that the total number of resources allocated matches the
number of resources the player has. A small encoding sketch is given below.
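As a concrete illustration (our own sketch, not part of the lecture; the function name and the specific numbers are made up), the following encodes a Colonel Blotto allocation as the one-hot table described above:

```python
import numpy as np

def encode_allocation(allocation, max_resources):
    """One-hot encode an allocation as a 0/1 table flattened into a bit string.

    Row k corresponds to activity (battlefield) k; column c equals 1 exactly
    when c resources are assigned to that activity.
    """
    table = np.zeros((len(allocation), max_resources + 1), dtype=int)
    for k, amount in enumerate(allocation):
        table[k, amount] = 1
    return table.flatten()

# Example: 5 soldiers split across 3 battlefields as (3, 1, 1).
bits = encode_allocation([3, 1, 1], max_resources=5)
print(bits.reshape(3, 6))
```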
1.4 Games on graphs
Another example of combinatorial games are games on graphs. In these games, the set of determin-
istic strategies is the set of all possible paths in the graph. Here, we can use bit strings of length
equal to the number of edges in the graph to represent the paths. We can then one-hot encode which
edges are traversed in the path.
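For concreteness, here is a minimal sketch (our own, with a made-up toy graph) of the one-hot edge encoding of a path:

```python
# Edges of a small graph in a fixed order; the bit string has one entry per edge.
edges = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t")]

def encode_path(path_edges):
    """Return the 0/1 vector marking which edges the path traverses."""
    used = set(path_edges)
    return [1 if e in used else 0 for e in edges]

print(encode_path([("s", "a"), ("a", "t")]))  # [1, 0, 1, 0]
```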
1.5 Games on fixed-size subsets (aka. 𝑚-sets)
Finally, there exist games in which the actions correspond to picking a subset of a given size. In
these games, the set of deterministic strategies is encoded as binary strings that one-hot encode
which of the elements are selected, that is,
𝒱𝑖 = {𝑥 ∈ {0, 1}𝑑 : ‖𝑥‖1 = 𝑚}.
The setting above goes under the name of m-sets in the machine learning literature.
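As a quick sanity check (our own snippet), one can enumerate 𝒱 for small 𝑑 and 𝑚 directly from the definition:

```python
from itertools import combinations

def m_set_vertices(d, m):
    """All bit strings of length d with exactly m ones."""
    vertices = []
    for ones in combinations(range(d), m):
        v = [0] * d
        for k in ones:
            v[k] = 1
        vertices.append(v)
    return vertices

print(len(m_set_vertices(d=4, m=2)))  # 6 vertices, one per 2-element subset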
2 Learning in combinatorial games and kernelization
How can we learn in a combinatorial game? To start, if we had a way to compute projections onto the
convex hull of 𝒱𝑖 (i.e., the polytope spanned by 𝒱𝑖), we could perform projected gradient descent.
However, computing projections onto the convex hull of 𝒱𝑖 might be hard. Furthermore, the regret
bounds (and hence equilibrium approximation rates) are typically far from optimal. For example, in
normal-form games we have observed that projected gradient descent achieves a regret of 𝑂(√(𝑑𝑖 𝑇)),
while the optimal regret is 𝑂(√(log(𝑑𝑖) 𝑇)), an exponential improvement in terms of the dimension of
the set. We were able to achieve the latter, optimal bound using the multiplicative weights update
(MWU) algorithm.
The reasons to like the MWU algorithm extend even beyond the logarithmic dependence on the
number of actions. As we saw in Lecture 6, the optimistic variant of MWU (OMWU) achieves a very
strong dependence on the number of repetitions 𝑇 , going from a √𝑇 dependence to a 𝑂(polylog(𝑇 ))
dependence.
Alas, multiplicative weights is an algorithm that is designed for probability simplices, and it is not
clear how to apply it to combinatorial games. In this lecture, we show how to kernelize the MWU
algorithm to combinatorial games, and how to implement it efficiently in several important classes
of combinatorial games.
2.1 Multiplicative weights on the set of deterministic strategies
We can use MWU (or OMWU) in combinatorial games by letting it pick distributions over the
set of deterministic strategies 𝒱𝑖. In other words, we can run MWU to keep tallies on how each
deterministic strategy is performing, and then use these tallies to compute the next distribution over
the strategies (of course, skewing the distribution towards better-performing strategies). We call this
approach “Vertex MWU”.
Algorithm 1: Vertex MWU/OMWU (Player 𝑖)
1  𝜆(1) ≔ uniform distribution over 𝒱𝑖
2  𝑔(0) ≔ 0 ∈ ℝ^{𝑑𝑖}
3  function NextStrategy()
4      Play mixed strategy 𝑥(𝑡) ≔ ∑_{𝑣∈𝒱𝑖} (𝜆(𝑡)[𝑣] ⋅ 𝑣)    // How to do efficiently?
5  function ObserveUtility(𝑔(𝑡))
6      (If optimistic) Perform optimistic correction 𝑔̃(𝑡) ≔ 2𝑔(𝑡) − 𝑔(𝑡−1)
7      (If not optimistic) Let 𝑔̃(𝑡) ≔ 𝑔(𝑡)
8      Update 𝜆(𝑡+1)[𝑣] ∝ 𝜆(𝑡)[𝑣] ⋅ exp{𝜂 ⋅ ⟨𝑔̃(𝑡), 𝑣⟩} for all 𝑣 ∈ 𝒱𝑖    // How to do efficiently?
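The following is a minimal sketch (our own code) of Algorithm 1 over an explicit list of vertices; it runs in time linear in |𝒱𝑖| per iteration, which is exactly the cost the kernelization below avoids.

```python
import numpy as np

class VertexMWU:
    """Sketch of Vertex MWU/OMWU over an explicitly enumerated vertex set."""

    def __init__(self, vertices, eta, optimistic=True):
        self.V = np.array(vertices, dtype=float)    # |V| x d matrix of 0/1 vertices
        self.eta = eta
        self.optimistic = optimistic
        self.log_weights = np.zeros(len(vertices))  # log of lambda, up to a constant
        self.prev_g = np.zeros(self.V.shape[1])

    def next_strategy(self):
        lam = np.exp(self.log_weights - self.log_weights.max())
        lam /= lam.sum()
        return lam @ self.V                          # expected vertex x^(t)

    def observe_utility(self, g):
        g = np.asarray(g, dtype=float)
        g_tilde = 2 * g - self.prev_g if self.optimistic else g
        self.log_weights += self.eta * (self.V @ g_tilde)
        self.prev_g = g
```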
Remark 2.1. In the game setting (i.e., what we called the “canonical learning setup” in Lecture
4), the inputs to ObserveUtility are the gradients of the utility. These can always be computed
in polynomial time in 𝑑 and 𝑛.
2.2 Main result
It is not obvious that we can implement the Vertex MWU/OMWU algorithm (Algorithm 1) effi-
ciently, that is, without paying linear time in |𝒱𝑖| when the latter set is exponential in the dimension
𝑑𝑖. Yet, as pointed out by Farina, G., Lee, C.-W., Luo, H., & Kroer, C. [Far+22], that is possible in many
cases, including all settings mentioned above. The key insight is that we can use a kernel function to
compute the expectations and the gradients in the algorithm efficiently. Specifically, in this lecture
we will prove the following result. For simplicity, we will assume a player has been fixed and drop
the subscript 𝑖 from the notation.
Theorem 2.1 (Main theorem). There exists a kernel function 𝐾𝒱 : ℝ𝑑 × ℝ𝑑 → ℝ, which depends
on the combinatorial structure of the set 𝒱 of the player, such that Algorithm 1 can be
implemented using 𝑑 + 1 evaluations of 𝐾𝒱 at each iteration.
We call this kernelized implementation of Algorithm 1 the Kernelized MWU/OMWU algorithm,
abbreviated as KMWU or KOMWU depending on whether optimism is used.
Note that Theorem 2.1 is crucially independent of the number of strategies |𝒱|, and only depends on
the dimension 𝑑 of the strategy set, which is polynomial in the size of the input game (for example,
in extensive-form games, 𝑑 is upper bounded by the number of edges in the game tree)! The main
takeaway is the following.
Corollary 2.1. As long as the kernel function 𝐾𝒱 can be evaluated efficiently, Algorithm 1
can be simulated efficiently too.
2.3 The 0/1-polyhedral kernel
In order to introduce the notion of 0/1-polyhedral kernel, we need to first introduce the notion of
0/1-polyhedral feature map.
Definition 2.1 (0/1-polyhedral feature map). The 0/1-polyhedral feature map 𝜙𝒱 is defined as
𝜙𝒱 : ℝ^𝑑 → ℝ^𝒱,    𝜙𝒱(𝑥)[𝑣] ≔ ∏_{𝑘 : 𝑣[𝑘]=1} 𝑥[𝑘]    ∀𝑣 ∈ 𝒱.
As is common with kernels, we define the 0/1-polyhedral kernel as the inner product of the feature
maps.
Definition 2.2 (0/1-polyhedral kernel).
𝐾𝒱 : ℝ^𝑑 × ℝ^𝑑 → ℝ,    𝐾𝒱(𝑥, 𝑦) ≔ ⟨𝜙𝒱(𝑥), 𝜙𝒱(𝑦)⟩ = ∑_{𝑣∈𝒱} ∏_{𝑘 : 𝑣[𝑘]=1} 𝑥[𝑘] 𝑦[𝑘].
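For intuition, here is a direct brute-force implementation of Definitions 2.1 and 2.2 (our own snippet, enumerating 𝒱 explicitly); it is useful only as a reference against which the efficient kernels of Section 3 can be checked.

```python
import numpy as np

def feature_map(x, vertices):
    """phi_V(x)[v] = product of x[k] over the coordinates k with v[k] = 1."""
    x = np.asarray(x, dtype=float)
    return np.array([np.prod(x[np.asarray(v).astype(bool)]) for v in vertices])

def kernel_bruteforce(x, y, vertices):
    """K_V(x, y) = <phi_V(x), phi_V(y)>, by explicit enumeration of V."""
    return float(feature_map(x, vertices) @ feature_map(y, vertices))
```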
2.4 Keeping track of the distribution over vertices
We start by showing how the 0/1-polyhedral kernel can help implement Line 8 of Algorithm 1
efficiently. The key insight is that the distributions 𝜆(𝑡) are fully captured by the feature map of a
low-dimensional vector at all iterations.
Theorem 2.2. At all times 𝑡, the distribution 𝜆(𝑡) over strategies 𝒱 computed by Algorithm 1 is
proportional to the feature map of the vector
𝑏(𝑡) ≔ exp{𝜂 ∑_{𝜏=1}^{𝑡−1} 𝑔̃(𝜏)},
where the exponential is applied componentwise.
Proof. We prove the result by induction over the time 𝑡.
• Base case. At time 𝑡 = 1, we have 𝑏(1) = 1, and therefore
  𝜙𝒱(𝑏(1)) = 1 ∝ (1/|𝒱|) 1 = 𝜆(1).
• Inductive step. At time 𝑡 + 1, the probability of the strategy 𝑣 ∈ 𝒱 computed by Algorithm 1 is
  𝜆(𝑡+1)[𝑣] ∝ 𝜆(𝑡)[𝑣] ⋅ exp{𝜂 ⟨𝑔̃(𝑡), 𝑣⟩}.
  The key insight now is that since 𝑣 ∈ {0, 1}^𝑑, then
  exp{𝜂 ⟨𝑔̃(𝑡), 𝑣⟩} = exp{𝜂 ∑_{𝑘 : 𝑣[𝑘]=1} 𝑔̃(𝑡)[𝑘]} = ∏_{𝑘 : 𝑣[𝑘]=1} exp{𝜂 𝑔̃(𝑡)[𝑘]}.
  Substituting the inductive hypothesis,
  𝜆(𝑡+1)[𝑣] ∝ 𝜙𝒱(𝑏(𝑡))[𝑣] ⋅ ∏_{𝑘 : 𝑣[𝑘]=1} exp{𝜂 𝑔̃(𝑡)[𝑘]}
           = (∏_{𝑘 : 𝑣[𝑘]=1} 𝑏(𝑡)[𝑘]) ⋅ (∏_{𝑘 : 𝑣[𝑘]=1} exp{𝜂 𝑔̃(𝑡)[𝑘]})
           = ∏_{𝑘 : 𝑣[𝑘]=1} 𝑏(𝑡+1)[𝑘]
           = 𝜙𝒱(𝑏(𝑡+1))[𝑣],
completing the proof.
In fact, we can even slightly refine the previous result by quantifying exactly the proportionality
constant. We do so in the next corollary.
Corollary 2.2. At all times 𝑡, one has
𝜆(𝑡) = 𝜙𝒱(𝑏(𝑡)) / 𝐾𝒱(𝑏(𝑡), 1).
Proof. Since we know from Theorem 2.2 that 𝜆(𝑡)[𝑣] ∝ 𝜙𝒱(𝑏(𝑡))[𝑣], and ∑_{𝑣∈𝒱} 𝜆(𝑡)[𝑣] = 1,
the proportionality constant must be the inverse of
∑_{𝑣∈𝒱} 𝜙𝒱(𝑏(𝑡))[𝑣] = ∑_{𝑣∈𝒱} 𝜙𝒱(𝑏(𝑡))[𝑣] ⋅ 1 = ∑_{𝑣∈𝒱} 𝜙𝒱(𝑏(𝑡))[𝑣] ⋅ 𝜙𝒱(1)[𝑣] = 𝐾𝒱(𝑏(𝑡), 1),
completing the proof.
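In code, the update of Theorem 2.2 is just a componentwise running product; here is a minimal sketch (our own), which never materializes 𝜆(𝑡):

```python
import numpy as np

class KernelizedMWU:
    """Maintain the vector b^(t) of Theorem 2.2 in place of the distribution lambda^(t)."""

    def __init__(self, d, eta, optimistic=True):
        self.b = np.ones(d)              # b^(1) = exp(eta * empty sum) = 1
        self.eta = eta
        self.optimistic = optimistic
        self.prev_g = np.zeros(d)

    def observe_utility(self, g):
        g = np.asarray(g, dtype=float)
        g_tilde = 2 * g - self.prev_g if self.optimistic else g
        self.b *= np.exp(self.eta * g_tilde)   # b^(t+1)[k] = b^(t)[k] * exp(eta * g_tilde[k])
        self.prev_g = g
```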
2.5 Reconstructing the expectation
We now show how one can reconstruct the expectation
∑_{𝑣∈𝒱} (𝜆(𝑡)[𝑣] ⋅ 𝑣),
which is needed in NextStrategy. The key is provided in the next theorem, which extends a nice
insight by Takimoto, E., & Warmuth, M. K. [TW03] .
Theorem 2.3. At all times 𝑡, the expectation ∑_{𝑣∈𝒱} 𝜆(𝑡)[𝑣] ⋅ 𝑣 can be computed via 𝑑 + 1 kernel
evaluations according to the formula
∑_{𝑣∈𝒱} (𝜆(𝑡)[𝑣] ⋅ 𝑣) = (1 − 𝐾𝒱(𝑏(𝑡), ē1)/𝐾𝒱(𝑏(𝑡), 1), ..., 1 − 𝐾𝒱(𝑏(𝑡), ē𝑑)/𝐾𝒱(𝑏(𝑡), 1)),
where
ē𝑘 ≔ (1, ..., 1, 0, 1, ..., 1) = 1 − 𝑒𝑘 ∈ ℝ^𝑑
is the vector that is equal to 1 in all coordinates except the 𝑘-th one, where it is zero (here 𝑒𝑘
denotes the 𝑘-th canonical basis vector).
Proof. The key insight is that, for all 𝑘 ∈ [𝑑],
𝜙𝒱(ē𝑘)[𝑣] = ∏_{𝑗 : 𝑣[𝑗]=1} ē𝑘[𝑗] = {0 if 𝑣[𝑘] = 1;  1 otherwise} = 1 − 𝑣[𝑘].
So,
(𝜙𝒱(1) − 𝜙𝒱(ē𝑘))[𝑣] = 𝜙𝒱(1)[𝑣] − 𝜙𝒱(ē𝑘)[𝑣] = 1 − (1 − 𝑣[𝑘]) = 𝑣[𝑘],
and we can write, for all 𝑘 ∈ [𝑑],
(∑_{𝑣∈𝒱} 𝜆(𝑡)[𝑣] ⋅ 𝑣)[𝑘] = ∑_{𝑣∈𝒱} 𝜆(𝑡)[𝑣] ⋅ (𝜙𝒱(1) − 𝜙𝒱(ē𝑘))[𝑣]
  = (1/𝐾𝒱(𝑏(𝑡), 1)) ∑_{𝑣∈𝒱} 𝜙𝒱(𝑏(𝑡))[𝑣] ⋅ (𝜙𝒱(1)[𝑣] − 𝜙𝒱(ē𝑘)[𝑣])
  = (1/𝐾𝒱(𝑏(𝑡), 1)) ⋅ (𝐾𝒱(𝑏(𝑡), 1) − 𝐾𝒱(𝑏(𝑡), ē𝑘)) = 1 − 𝐾𝒱(𝑏(𝑡), ē𝑘)/𝐾𝒱(𝑏(𝑡), 1),
which is the statement.
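Putting Corollary 2.2 and Theorem 2.3 together, NextStrategy needs only the 𝑑 + 1 kernel values 𝐾𝒱(𝑏(𝑡), 1) and 𝐾𝒱(𝑏(𝑡), ē𝑘). A hedged sketch follows (our own code; the `kernel(x, y)` oracle is an assumed interface, which can be the brute-force kernel above or any of the efficient kernels of Section 3).

```python
import numpy as np

def next_strategy(b, kernel):
    """x^(t)[k] = 1 - K(b, e_bar_k) / K(b, 1), per Theorem 2.3."""
    d = len(b)
    denom = kernel(b, np.ones(d))
    x = np.empty(d)
    for k in range(d):
        e_bar = np.ones(d)
        e_bar[k] = 0.0          # e_bar_k: all ones except a zero in coordinate k
        x[k] = 1.0 - kernel(b, e_bar) / denom
    return x
```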
3 Examples of efficiently-computable kernels
The previous theorems show that, as long as the kernel can be computed efficiently, the OMWU
algorithm can be simulated efficiently even if the cardinality of every 𝒱𝑖 is exponential in the size of
the input game.
3.1 Hypercube
When 𝒱 = {0, 1}𝑑, then
𝐾𝒱(𝑥, 𝑦) = (1 + 𝑥[1] 𝑦[1])(1 + 𝑥[2] 𝑦[2]) ⋅ ... ⋅ (1 + 𝑥[𝑑] 𝑦[𝑑])
can be evaluated in linear time in 𝑑.
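A one-line check of this formula (our own snippet):

```python
import numpy as np

def hypercube_kernel(x, y):
    """K_V for V = {0,1}^d: each coordinate contributes a factor (1 + x[k]*y[k])."""
    return float(np.prod(1.0 + np.asarray(x, dtype=float) * np.asarray(y, dtype=float)))
```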
3.2 Multiple choices: m-sets
When 𝒱 = {𝑥 ∈ {0, 1}^𝑑 : ‖𝑥‖1 = 𝑚}, the kernel can be computed efficiently using dynamic
programming or, equivalently, by considering the polynomial in 𝛾 given by
(1 + 𝛾 ⋅ 𝑥[1]𝑦[1])(1 + 𝛾 ⋅ 𝑥[2]𝑦[2]) ⋯ (1 + 𝛾 ⋅ 𝑥[𝑑]𝑦[𝑑]).
When expanding the product (which can be done in polynomial time in 𝑑), the coefficient of the
term 𝛾^𝑚 is the kernel value, since it is precisely the sum of the products of 𝑥, 𝑦 over all choices
of subsets of 𝑚 coordinates. A short dynamic-programming sketch is given below.
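Concretely, one can maintain the coefficients of the polynomial while multiplying in the factors one at a time; the following sketch (our own) returns the coefficient of 𝛾^𝑚.

```python
def m_set_kernel(x, y, m):
    """Coefficient of gamma^m in prod_k (1 + gamma * x[k]*y[k]), via dynamic programming."""
    coeffs = [0.0] * (m + 1)
    coeffs[0] = 1.0                      # the empty product is 1 * gamma^0
    for xk, yk in zip(x, y):
        t = xk * yk
        # Multiply by (1 + gamma * t); iterate downward so each old coefficient is used once.
        for j in range(m, 0, -1):
            coeffs[j] += t * coeffs[j - 1]
    return coeffs[m]
```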
3.3 Set of flows in a DAG
In this case, the kernel can be computed in linear time in the number of DAG edges by performing
dynamic programming along a topological order of the DAG. This case was already covered by
Takimoto, E., & Warmuth, M. K. [TW03], though we remark that in kernelized OMWU the concept
of weight pushing is not needed, so the final algorithms are slightly different.
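A hedged sketch of the path-kernel dynamic program (our own code; it assumes 𝒱 is the set of source-to-sink paths with one bit per edge, and the graph representation below is ours):

```python
from collections import defaultdict

def dag_path_kernel(edges, x, y, source, sink):
    """K_V(x, y) = sum over source->sink paths of the product of x[e]*y[e] over path edges.

    `edges` is a list of (u, v) pairs of a DAG; x[i], y[i] are indexed by edge position i.
    """
    out = defaultdict(list)
    for i, (u, v) in enumerate(edges):
        out[u].append((i, v))

    memo = {}
    def suffix_kernel(node):
        # Sum over node->sink paths of the product of x[e]*y[e] (memoized DP).
        if node == sink:
            return 1.0
        if node not in memo:
            memo[node] = sum(x[i] * y[i] * suffix_kernel(v) for i, v in out[node])
        return memo[node]

    return suffix_kernel(source)
```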
3.4 Extensive-form games
As we will see below, in extensive-form games KOMWU can be implemented with per-iteration
complexity linear in the number of sequences 𝑑𝑖.
Worst-case linear complexity for a single evaluation. We start by verifying that the sequence-
form kernel can be evaluated in linear time for any pair of points 𝑥, 𝑦 ∈ ℝ𝑑𝑖 . To do so, we introduce
a partial kernel function for every information set 𝐼 ∈ ℐ,
𝐾𝐼 : ℝ^{𝑑𝑖} × ℝ^{𝑑𝑖} → ℝ,    𝐾𝐼(𝑥, 𝑦) ≔ ∑_{𝑣∈𝒱⪰𝐼} ∏_{𝑘 : 𝑣[𝑘]=1} 𝑥[𝑘] 𝑦[𝑘],
where 𝒱⪰𝐼 denotes the projection of the strategy set 𝒱 onto only those actions at 𝐼 or below (removing
duplicates). We have the following.
Theorem 3.1. For any vectors 𝑥, 𝑦 ∈ ℝ^{𝑑𝑖}, the following two recursive relationships hold:
𝐾𝒱(𝑥, 𝑦) = ∏_{𝐼∈ℐ_top} 𝐾𝐼(𝑥, 𝑦),    (1)
where ℐ_top denotes the “top-level” information sets (those information sets that contain nodes
where the player is acting for the first time), and, for all information sets 𝐼 ∈ ℐ,
𝐾𝐼(𝑥, 𝑦) = ∑_{𝑎∈𝒜(𝐼)} (𝑥[𝐼𝑎] 𝑦[𝐼𝑎] ∏_{𝐼′∈𝒞(𝐼𝑎)} 𝐾_{𝐼′}(𝑥, 𝑦)),    (2)
where 𝒞(𝐼𝑎) denotes those information sets that are immediate successors of the sequence 𝐼𝑎
(that is, the information sets immediately reachable after playing action 𝑎 at 𝐼). In particular,
(1) and (2) give a recursive algorithm to evaluate the polyhedral kernel 𝐾𝒱 associated with the
strategy space of any player 𝑖 in an imperfect-information extensive-form game in time linear in
𝑑𝑖.
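A sketch of this recursion in code (our own; the dictionary-based tree representation and names such as `children` are assumptions, not notation from the lecture):

```python
def sequence_form_kernel(x, y, top_infosets, actions, children):
    """Evaluate K_V(x, y) via the recursion of Theorem 3.1.

    x, y             : dicts mapping each sequence (I, a) to a real number
    top_infosets     : list of top-level information sets
    actions[I]       : actions available at information set I
    children[(I, a)] : information sets whose parent sequence is (I, a)
    """
    def K_I(I):
        total = 0.0
        for a in actions[I]:
            term = x[(I, a)] * y[(I, a)]
            for J in children.get((I, a), []):
                term *= K_I(J)            # recursion (2)
            total += term
        return total

    result = 1.0
    for I in top_infosets:
        result *= K_I(I)                  # recursion (1)
    return result
```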
Corollary 3.1. For each player 𝑖, KOMWU can be implemented in polynomial time in the size
of the game tree. As a consequence, all the regret guarantees of OMWU can be extended to the
setting of imperfect-information extensive-form games as a black box.
We also make the following tangential remarks.
Remark 3.1. The feasibility of the kernelization approach in extensive-form games counters
the common belief that learning in the normal-form equivalent of an extensive-form game is
computationally infeasible.
Remark 3.2. Surprisingly, even the non-optimistic version of the kernelized MWU algorithm
achieves better regret bounds than all prior algorithms for learning in extensive-form games.
Amortized constant-time kernel computation. We can actually refine the result above and obtain
an implementation of KOMWU whose per-iteration complexity is linear in the size of the game
tree, by exploiting the structure of the particular set of kernel evaluations needed at every iteration
and amortizing the computation of the 𝑑 + 1 kernels required by KOMWU at each iteration.
Bibliography
[Far+22] G. Farina, C.-W. Lee, H. Luo, and C. Kroer, “Kernelized Multiplicative Weights for 0/1-
Polyhedral Games: Bridging the Gap Between Learning in Extensive-Form and Normal-
Form Games,” in International Conference on Machine Learning, 2022.
[TW03] E. Takimoto and M. K. Warmuth, “Path kernels and multiplicative updates,” Journal of
Machine Learning Research, vol. 4, pp. 773–818, 2003.
These notes are class material that has not undergone formal peer review. The TA and I are grateful for any
reports of typos.
