Learning by playing
AI agents often have to make decisions in highly non-stationary environments. Non-stationarity can arise, for example, when other agents are making decisions at the same time, or when an adversary strategically manipulates the data. Developing machine learning algorithms that can learn and make good decisions in such dynamic and reactive environments is therefore crucial.
In scenarios where an AI agent interacts with other agents in the same environment, the learning algorithm must account for the interactions between the agents' decisions and objectives, and for the impact of those decisions on the environment. In these multi-agent learning scenarios, simple algorithms based on gradient descent often perform poorly and cannot guarantee good solutions. Consider, for instance, two AI agents that are playing a game and trying to learn a winning strategy. If both agents use a standard optimization algorithm such as gradient descent, they can become trapped in cyclic behavior and fail to converge to good strategies. This highlights the importance of developing algorithms tailored to multi-agent learning tasks.
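This cycling behavior already shows up in the simplest possible zero-sum game, in which one agent controls a number x, the other controls a number y, and the payoff is their product. The following sketch is a toy illustration (not taken from any of the systems discussed here): the unique equilibrium is x = y = 0, yet simultaneous gradient steps orbit around it instead of converging to it.

```python
import math

# Toy zero-sum game f(x, y) = x * y: agent 1 minimizes over x,
# agent 2 maximizes over y.  The unique equilibrium is (0, 0).
eta = 0.1          # step size (illustrative choice)
x, y = 1.0, 1.0    # initial strategies

for t in range(201):
    if t % 50 == 0:
        print(f"t={t:3d}  x={x:+.3f}  y={y:+.3f}  distance={math.hypot(x, y):.3f}")
    grad_x = y     # df/dx
    grad_y = x     # df/dy
    # Simultaneous updates: each agent moves against the opponent's
    # *current* strategy, which is exactly what causes the cycling.
    x, y = x - eta * grad_x, y + eta * grad_y

# The printed distance to the equilibrium never decreases: the iterates
# spiral around (0, 0) rather than converging to it.
```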
A crucial step towards designing successful multi-agent learning algorithms is incorporating principles from game theory into the design of the algorithm. In particular, such algorithms should implicitly take into account the incentives of the other entities involved in the interaction. A recent example of the successful application of techniques from computational game theory is the superhuman AI for two-player and multiplayer poker developed by Noam Brown and Tuomas Sandholm at Carnegie Mellon University. These games are especially complex because the players lack complete information about the environment (e.g., they do not know the cards in their opponents' hands). The AI agent developed for this task consists of three key components: an algorithm that learns an approximation of an equilibrium strategy by repeatedly playing against itself (i.e., through "self-play"), without any input from human or prior AI play; a subgame improvement algorithm that refines the coarse equilibrium strategy for the specific subgames reached during play; and a self-improver algorithm that patches potential weaknesses that opponents identify in the approximate equilibrium strategy.
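The self-play idea behind the first component can be illustrated on a much smaller game. The sketch below is a minimal example under assumed choices (rock-paper-scissors as the game and regret matching as the learning rule; the poker agents above operate on vastly larger games with counterfactual-regret-style methods): two copies of the same learner play each other repeatedly, and their average strategies approach the equilibrium of the game.

```python
import numpy as np

# Payoff to player 1 in rock-paper-scissors (rows/cols = R, P, S).
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]], dtype=float)

def regret_matching(cum_regret):
    """Turn accumulated positive regrets into a mixed strategy."""
    pos = np.maximum(cum_regret, 0.0)
    return pos / pos.sum() if pos.sum() > 0 else np.full(3, 1 / 3)

regret1, regret2 = np.zeros(3), np.zeros(3)
avg1, avg2 = np.zeros(3), np.zeros(3)

for _ in range(20000):
    s1, s2 = regret_matching(regret1), regret_matching(regret2)
    avg1 += s1
    avg2 += s2
    # Expected payoff of each pure action against the opponent's mix,
    # minus the payoff of the current mix = instantaneous regret.
    u1 = A @ s2
    u2 = -A.T @ s1
    regret1 += u1 - s1 @ u1
    regret2 += u2 - s2 @ u2

print("average strategy of player 1:", avg1 / avg1.sum())
print("average strategy of player 2:", avg2 / avg2.sum())
# Both averages approach the uniform (1/3, 1/3, 1/3) equilibrium strategy.
```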
Equilibrium computation algorithms typically combine machine learning and game theory in order to compute equilibrium strategies for challenging problems that involve many strategic agents. Beyond playing games like poker, they are used extensively to tackle real-world problems. For example, equilibrium computation techniques are used to allocate defensive resources that protect vulnerable infrastructure such as airports or ports. Other applications range from financial markets to the management of road congestion and energy grids. Such algorithms are also crucial for building machine learning systems that are robust to adversarial attacks and cannot be exploited by malicious agents. Finally, the equilibrium computation framework provides valuable tools for studying and better understanding other, more general machine learning problems. For instance, modern bidding algorithms for online ad allocation on Internet advertising platforms must satisfy a variety of constraints, such as budgets and return on investment (ROI). These constrained optimization problems can be modeled as two-player games in which two AI agents compete against each other, and they can be solved with multi-agent learning algorithms.
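To make the last example concrete, the sketch below shows one common way to cast budget-constrained bidding as a two-player game (all numbers and modeling choices here are illustrative assumptions, not a description of any particular platform): the bidder best-responds to a "dual" player, who in turn adjusts a pacing multiplier that prices the budget constraint.

```python
import numpy as np

# Budget-constrained bidding in repeated second-price auctions, viewed as a
# min-max game between the bidder and a "dual" player (illustrative numbers).
rng = np.random.default_rng(0)
T, budget = 1000, 100.0
values = rng.uniform(0, 1, T)        # bidder's value for each impression
competing = rng.uniform(0, 1, T)     # highest competing bid each round

mu = 0.0                             # dual variable / pacing multiplier
eta = 0.01                           # dual step size
target_per_round = budget / T
spend, value_won = 0.0, 0.0

for v, p in zip(values, competing):
    bid = v / (1.0 + mu)             # bidder's best response to the dual player
    cost = p if bid >= p else 0.0    # second-price payment if the auction is won
    if bid >= p:
        spend += cost
        value_won += v
    # Dual player's gradient step: raise mu when spending above the per-round
    # budget target, lower it otherwise.  (This simple sketch paces spending
    # on average; it does not hard-stop exactly at the budget.)
    mu = max(0.0, mu + eta * (cost - target_per_round))

print(f"value won {value_won:.1f}, spend {spend:.1f} (budget {budget})")
```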
These are only a few of the applications in which equilibrium computation and multi-agent learning play a significant role. As we move towards an increasingly interconnected world, in which more and more tasks and decisions are entrusted to AI algorithms, such techniques are expected to play an even greater role.