13 July 2019

AI Beats Human Poker Champions

Carnegie Mellon University, in collaboration with Facebook, has developed an AI application called Pluribus that reliably beat five professional poker players in the same game, or one pro pitted against five independent copies of itself. It's a major leap forward in capability for the machines, and amazingly is also far more efficient than previous agents. One-on-one poker is a weird game, but its zero-sum nature makes it susceptible to certain strategies in which a computer able to calculate far enough ahead can put itself at an advantage. But add four more players into the mix and things get real complex, real fast. With six players, the possibilities for hands, bets and outcomes are so numerous that it is effectively impossible to account for all of them, especially in a minute or less. It'd be like trying to exhaustively document every grain of sand on a beach between waves. Yet over 10,000 hands played against champions, Pluribus managed to win money at a steady rate, exposing no weaknesses or habits that its opponents could take advantage of. The secret is consistent randomness.

Pluribus was trained, like many game-playing AI agents these days, not by studying how humans play but by playing against itself. The training program used something called Monte Carlo counterfactual regret minimization. Sounds like when you have whiskey for breakfast after losing your shirt at the casino, and in a way that's what it is, machine learning-style. Regret minimization just means that when the system finished a hand, it would play that hand out again in different ways, exploring what might have happened had it checked here instead of raising, folded instead of calling, and so on. A Monte Carlo tree is a way of organizing and evaluating lots of possibilities, akin to climbing a tree of them branch by branch, noting the quality of each leaf you find, then picking the best one once you think you've climbed enough. If you do it ahead of time (as is done in chess, for instance), you're looking for the best move to play next. But combine it with the regret function and you're looking through a catalog of ways the game could have gone, observing which would have had the best outcome. Two toy sketches of those ideas follow below.
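To make the regret part concrete, here's a minimal Python sketch, emphatically not Pluribus's actual training code: two regret-matching players face off at rock-paper-scissors, and after every round each one "replays the hand," asking how much better each alternative action would have done. Names like strategy_from_regrets and train are just illustrative.

import random

ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors
# PAYOFF[mine][theirs]: +1 win, 0 tie, -1 loss, from "my" point of view
PAYOFF = [
    [0, -1, 1],   # rock:     ties rock, loses to paper, beats scissors
    [1, 0, -1],   # paper:    beats rock, ties paper, loses to scissors
    [-1, 1, 0],   # scissors: loses to rock, beats paper, ties scissors
]

def strategy_from_regrets(regrets):
    """Regret matching: play each action in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / ACTIONS] * ACTIONS  # no positive regret yet: play uniformly

def sample(strategy):
    """Draw one action from a probability distribution over actions."""
    r, cumulative = random.random(), 0.0
    for action, p in enumerate(strategy):
        cumulative += p
        if r < cumulative:
            return action
    return ACTIONS - 1

def train(iterations=100_000):
    regrets = [[0.0] * ACTIONS for _ in range(2)]
    strategy_sums = [[0.0] * ACTIONS for _ in range(2)]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        moves = [sample(s) for s in strategies]
        for me, them in ((0, 1), (1, 0)):
            actual = PAYOFF[moves[me]][moves[them]]
            for a in range(ACTIONS):
                # "Replay the hand": how much better would action a have done?
                regrets[me][a] += PAYOFF[a][moves[them]] - actual
                strategy_sums[me][a] += strategies[me][a]
    # The average strategy over all iterations is what converges toward equilibrium
    return [[s / iterations for s in sums] for sums in strategy_sums]

if __name__ == "__main__":
    print(train())  # both players drift toward [1/3, 1/3, 1/3], the RPS equilibrium

Note that it's the average strategy over all that self-play, not whatever the system happened to be doing on the last iteration, that settles into the unexploitable mix of moves.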
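And here's the tree-climbing idea in miniature, again just a sketch: plain Monte Carlo sampling of branches, without the counterfactual machinery Pluribus layers on top. The helpers legal_moves, apply_move and random_playout are hypothetical stand-ins for a real game engine.

def best_move(state, legal_moves, apply_move, random_playout, samples=200):
    """Evaluate each branch with random playouts and keep the best average leaf."""
    scores = {}
    for move in legal_moves(state):      # one branch per legal move
        total = 0.0
        for _ in range(samples):
            # climb down a random path and note the quality of the leaf we reach
            total += random_playout(apply_move(state, move))
        scores[move] = total / samples   # average leaf quality of this branch
    return max(scores, key=scores.get)   # pick the best once we've climbed enough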
