Faster Rates for No-Regret Learning in General Games via Cautious Optimism
Ashkan Soleymani, Georgios Piliouras, Gabriele Farina
Abstract
We establish the first uncoupled learning algorithm that attains $O(n \log^2 d \log T )$ per-player regret in multi-player general-sum games, where n is the number of players, d is the number of actions available to the player, and $T$ is the number of repetitions of the game. Our results improve exponentially the dependence on d in $O(n d \log T )$ regret attainable by Log-Regularized Lifted Optimistic FTRL [Far+22c] and also reduce the dependence on the number of iterations T from $\log^4 T$ to $\log T$ compared to the Optimistic Hedge, the previously well-studied algorithm with $O(n \log d \log^4 T )$ regret [DFG21]. Our algorithm is obtained by combining the classic optimistic multiplicative weights update (OMWU) with an adaptive, non-monotonic learning rate that paces the learning process of the players such that they become more cautious when their regret becomes too negative.