Hey
The journey from the first simple bots to superhuman AI like Libratus and Pluribus is a fascinating 27-year story of academic rivalries, commercial scandals, and massive algorithmic breakthroughs. It’s a perfect case study in AI for imperfect information games.
I’ve written up a little history, from the first academic attempts in 1998 to the modern “GTO Solvers.” Grab a coffee, this is a long one.
The History of Online Poker Bots: From Loki’s Rules to CFR’s Divinity
How we went from hand-crafted heuristics to mathematical perfection in 20 years.
Intro: Why is Poker the “Holy Grail” of AI?
For decades, poker was one of the most difficult problems for artificial intelligence. Why? Because unlike chess or Go, where everything is visible on the board, poker is a game of imperfect information. You don’t know what your opponent has. You have to bluff. You have to read patterns. You have to manage risk under conditions of fundamental uncertainty.
For a programmer, it’s a fascinating challenge: how do you write an algorithm that “guesses” what the opponent has without seeing their cards? How do you teach a machine to bluff? And how do you make it do that better than a human?
This story is a journey through 27 years (1998-2025) – from simple, manually written rules, through high-profile scandals involving commercial bots, to algorithms achieving mathematical perfection. A history in which academia, criminals, and corporations fought a battle over the future of a game worth billions of dollars.
Chapter 1: The Academic Pioneers (1998-2002) – Loki and Poki
Computer Poker Research Group (University of Alberta)
It all began in 1996 at the University of Alberta in Canada. A research group, the Computer Poker Research Group (CPRG), decided to systematically tackle poker as an AI problem. Their goal: create a program that could play Limit Texas Hold’em at the level of human professionals.
Loki (1998) – The First Real Bot
In 1998, Denis Papp defended his master’s thesis describing Loki – the first serious poker bot in history. What was so groundbreaking about it?
Loki’s Key Innovations:
- Hand Strength (HS) – a mathematical calculation of the hand’s current strength.
- Hand Potential (PPOT/NPOT) – the potential to win (Positive) or lose (Negative) on future streets.
- Effective Hand Strength (EHS) – combining the above into a single metric:
EHS = HS + (1 - HS) * PPOT
- Opponent Modeling – the real revolution: Loki maintained “weight tables” for each opponent and updated them dynamically.
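To make these metrics concrete, here’s a minimal sketch of computing HS by enumeration and combining it into EHS (the evaluate and enumerate_opponents helpers are hypothetical stand-ins, not Loki’s actual code):

# Sketch: Hand Strength by enumerating all opponent holdings
# (evaluate and enumerate_opponents are hypothetical helpers)
def hand_strength(our_cards, board):
    wins = ties = total = 0
    for opp_cards in enumerate_opponents(our_cards, board):
        ours = evaluate(our_cards, board)
        theirs = evaluate(opp_cards, board)
        if ours > theirs:
            wins += 1
        elif ours == theirs:
            ties += 1
        total += 1
    return (wins + 0.5 * ties) / total

def effective_hand_strength(hs, ppot):
    # EHS = HS + (1 - HS) * PPOT: current strength, plus the chance
    # that a currently-behind hand improves on a future street
    return hs + (1 - hs) * ppot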
How did Opponent Modeling work?
// Pseudocode: a simplified opponent model in Loki
enum Action { FOLD, CALL, RAISE };

struct OpponentModel {
    float handStrengthWeights[169]; // 169 unique starting-hand types

    // Note: the model only ever sees the opponent's actions, never their cards
    void update(Action action) {
        if (action == RAISE) {
            // A raise makes strong hands more likely
            for (int i = 0; i < 20; i++) { // top 20 hands
                handStrengthWeights[i] *= 1.05f;
            }
        } else if (action == FOLD) {
            // A fold makes strong hands less likely
            for (int i = 0; i < 50; i++) {
                handStrengthWeights[i] *= 0.95f;
            }
        }
        normalize(); // renormalize to a probability distribution
    }

    void normalize(); // defined elsewhere
};
Sources:
- Denis Papp’s Thesis on Loki (1998) - PDF
- Opponent Modeling in Poker (AAAI 1998)
- Poker as Testbed for AI Research (1998)
Poki (2002) – Loki’s Successor
Poki refined its predecessor’s methods:
- It used selective sampling – on-the-fly simulations to make decisions.
- It was one of the early attempts to apply simple neural networks to poker, using them to predict opponent actions based on betting patterns.
- It was already good enough to consistently win at low stakes online.
Fun fact: Poki was later licensed for the video game “Stacked” (2006), featuring Daniel Negreanu.
Chapter 2: The “Wild West” (2005) – The WinHoldEm Scandal
The Online Poker Boom and its Dark Side
In the mid-2000s, online poker exploded. Millions of players online. Billions of dollars at stake. And where there’s money, there’s crime.
Ray E. Bornert II and WinHoldEm – The “Democratization” of Cheating
In 2005, Jeff Atwood (of Coding Horror and later Stack Overflow fame) wrote a famous article: “The Rise of the PokerBots” which exposed the activities of Ray E. Bornert II – the creator of WinHoldEm.
WinHoldEm’s Offer:
- $25: Basic version – hand analysis.
- $200: “Team Edition” – FULL BOT + COLLUSION MODULE.
What does “collusion” mean?
It allowed multiple bots at the same table to exchange information about their cards. The result? A single, multi-headed player against individual humans. A crushing advantage.
Bornert’s Justification (quote from an interview):
“Online poker is full of cheaters anyway. I’m just ‘democratizing’ access to the weapons. It’s deliberate civil disobedience. I’d rather be unethical than be a victim.”
Programming Trivia
1. Screen-Scraping (The amateur’s method)
WinHoldEm used literal screen-scraping – the bot “looked” at the screen, reading pixels:
// Pseudocode: how WinHoldEm read cards from the screen
COLORREF pixelColor = GetPixel(hdc, cardX, cardY);
if (pixelColor == RGB(255, 0, 0)) {
    card = ACE_OF_HEARTS;
} else if (pixelColor == RGB(0, 0, 0)) {
    card = ACE_OF_SPADES;
}
// ... and so on for every card
Requirement: The user had to lock their Windows graphics settings to specific values. Changing DPI or a theme = the bot stopped working.
2. Better Methods (For serious bots)
In the comments of Atwood’s article, it was revealed that advanced bots DID NOT use screen-scraping:
A) Reading Log Files:
// PartyPoker wrote hand history to a file in real time
#include <fstream>
#include <string>

int main() {
    std::ifstream logFile("C:\\PartyPoker\\handhistory.txt");
    std::string line;
    while (std::getline(logFile, line)) {
        if (line.find("dealt") != std::string::npos) {
            // Parse the hole cards out of the log line
        }
    }
}
B) Windows Messages:
// Empire Poker used standard Windows controls
#include <windows.h>
#include <cstdlib>

HWND buttonHandle = FindWindow(NULL, "Call");
SendMessage(buttonHandle, BM_CLICK, 0, 0); // "click" the button

// Read the pot size from a label control (the window name is illustrative)
HWND potSizeLabel = FindWindow(NULL, "Pot");
char buffer[256];
GetWindowText(potSizeLabel, buffer, 256);
int potSize = atoi(buffer);
3. Ultrafast Equity Libraries (2005)
Around the same time, the first open-source C++ libraries for fast equity calculations appeared, like poker-eval:
// Example usage of poker-eval (a sketch: calculate_equity stands in for a
// Monte Carlo loop built on the library's StdDeck evaluation macros)
#include "poker_defs.h"

StdDeck_CardMask hand, board;
// ... initialize cards
double equity = calculate_equity(hand, board, 1000000); // 1M simulations
Calculating millions of hands per second became accessible to everyone.
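For a modern taste of the same idea, here’s a hedged Python sketch of Monte Carlo equity estimation using the open-source treys evaluator (pip install treys; any hand evaluator would do):

# Monte Carlo equity estimation with the treys evaluator (lower score = stronger)
from treys import Card, Deck, Evaluator

def equity(hero, villain, trials=20_000):
    ev = Evaluator()
    wins = ties = 0
    for _ in range(trials):
        deck = Deck()  # fresh shuffled deck each trial
        deck.cards = [c for c in deck.cards if c not in hero + villain]
        board = deck.draw(5)
        h = ev.evaluate(board, hero)
        v = ev.evaluate(board, villain)
        if h < v:
            wins += 1
        elif h == v:
            ties += 1
    return (wins + 0.5 * ties) / trials

hero = [Card.new('As'), Card.new('Ks')]
villain = [Card.new('Qh'), Card.new('Qd')]
print(f"AKs vs QQ equity ~ {equity(hero, villain):.2%}")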
Source:
Consequences of the Scandal
Atwood’s article caused a panic. Poker rooms invested millions in security systems. WinHoldEm was banned, but the cat was already out of the bag.
Sources:
- The Rise of the PokerBots (Jeff Atwood, 2005)
- NBC News: Are poker ‘bots’ raking online pots? (2004)
- Boing Boing: Video-poker bots (2005)
Chapter 3: The Academic Breakthrough (2007-2015) – The CFR Era
Counterfactual Regret Minimization – The Most Important Algorithm in Poker AI History
An arms race ensued for the next decade. The real breakthrough came in 2007 when Martin Zinkevich (now at Google) published a paper on Counterfactual Regret Minimization (CFR).
What is CFR? (An explanation for programmers)
CFR is a self-play, regret-minimization algorithm – related in spirit to Reinforcement Learning, but completely different from Deep Q-Learning or Policy Gradients.
The Key Idea:
- The bot plays billions of hands against itself.
- In each iteration, it analyzes its “regret” – a mathematical measure of how much it regrets not playing a different action.
- It updates its strategy to more often play the actions it “regretted” not playing.
- As iterations accumulate, the average regret approaches zero, and the average strategy converges to a Nash Equilibrium.
Nash Equilibrium = a strategy that cannot be improved upon if the opponent is also playing optimally.
Pseudocode for CFR (simplified):
def cfr(gameState, player, reachProb):
    if gameState.isTerminal():
        return gameState.utility(player)

    infoSet = gameState.getInfoSet(player)
    strategy = regretMatching(infoSet.regrets)

    utilities = {}
    nodeUtil = 0
    for action in gameState.legalActions():
        # Recursively calculate the value of each action
        nextState = gameState.apply(action)
        utilities[action] = cfr(nextState, player,
                                reachProb * strategy[action])
        nodeUtil += strategy[action] * utilities[action]

    # Update regrets (simplified – full CFR weights this update by the
    # opponent's reach probability, not the player's own)
    for action in gameState.legalActions():
        regret = utilities[action] - nodeUtil
        infoSet.regrets[action] += reachProb * regret

    return nodeUtil

def regretMatching(regrets):
    # Strategy is proportional to positive regrets
    posRegrets = {a: max(0, r) for a, r in regrets.items()}
    sumRegrets = sum(posRegrets.values())
    if sumRegrets > 0:
        return {a: r / sumRegrets for a, r in posRegrets.items()}
    # Otherwise play uniformly
    return {a: 1.0 / len(regrets) for a in regrets}
Why is CFR brilliant?
- No combinatorial explosion in memory – it only needs to store regrets per information set, not per full game state.
- Mathematical guarantee of convergence to a Nash Equilibrium.
- It can be sampled (Monte Carlo CFR) – you don’t need to process the whole tree.
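One detail worth adding, since the pseudocode above glosses over it: it’s the average strategy across all iterations that converges to equilibrium, not the last iteration’s strategy. A typical outer training loop looks roughly like this (a sketch reusing the names above; average_strategies is a hypothetical helper):

# Sketch of a CFR training loop (names follow the pseudocode above)
def train(game, iterations):
    for t in range(iterations):
        for player in (0, 1):
            cfr(game.initialState(), player, reachProb=1.0)
    # Play the accumulated *average* strategy – that's what converges
    # to the Nash Equilibrium, not the final iteration's strategy
    return average_strategies(game.infoSets)  # hypothetical helper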
Sources:
- Regret Minimization in Games with Incomplete Information (Zinkevich et al., 2007) - PDF
- Counterfactual Regret Minimization - the core of Poker AI (int8.io)
- An Introduction to CFR (Neller) - PDF
- Monte Carlo CFR (Lanctot et al., 2009) - PDF
Cepheus (2015) – Limit Hold’em is Solved
Michael Bowling and the team from the University of Alberta present Cepheus – a bot that essentially solved 1-on-1 Limit Hold’em.
The Numbers:
- 70 billion CFR iterations
- Strategy occupies 11 TB of data
- Plays so close to perfection that a human would have to play 12 hours a day for 70 years, without errors, to statistically prove they were better.
Exploitability: 0.000986 big blinds/game
What does “essentially solved” mean?
A theoretically perfect strategy (Nash Equilibrium) has an exploitability of 0. Cepheus has 0.000986, which in practice means it is impossible to beat in any reasonable timeframe.
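For intuition, exploitability is how much a perfect “best response” opponent would win on average against your fixed strategy. A sketch (best_response_value is a hypothetical helper):

# Exploitability of a fixed strategy sigma in a two-player zero-sum game,
# averaged over both seats (best_response_value is a hypothetical helper)
def exploitability(sigma):
    return (best_response_value(sigma, seat=0) +
            best_response_value(sigma, seat=1)) / 2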
Sources:
- Heads-up limit hold’em poker is solved (Science, 2015)
- Cepheus Poker Project
- Wikipedia: Cepheus (poker bot)
- Washington Post: Meet Cepheus (2015)
Chapter 4: Conquering No-Limit (2017) – DeepStack and Libratus
DeepStack (2017) – Continual Re-solving + Deep Learning
Limit Hold’em was solved, but No-Limit is a completely different league – unrestricted bet sizes blow the game up to roughly 10^161 decision points.
DeepStack’s Innovations:
- Continual Re-solving – instead of storing the strategy offline, the bot solves ONLY the current part of the game “on the fly.”
- Deep Neural Networks as “intuition” – estimating the value of future moves without a full simulation.
Architecture:
# Pseudocode: how continual re-solving works
def deepstack_play(gameState):
    # 1. Use a precomputed strategy for the early rounds
    if gameState.street <= 1:
        return blueprint_strategy(gameState)

    # 2. For later rounds: extract and solve only the current subgame
    subgame = extract_subgame(gameState)

    # 3. Use a neural network to estimate values at the subgame's leaves
    leaf_values = neural_net.predict(subgame.terminal_states)

    # 4. Solve the subgame with CFR, using leaf_values as "intuition"
    strategy = cfr_solve(subgame, leaf_values, iterations=1000)
    return strategy.get_action()
Results:
- November-December 2016: DeepStack played against 33 professional players from 17 countries.
- 44,852 hands played.
- Of the 11 players who completed the requested 3,000 hands, it beat 10 by a statistically significant margin.
Sources:
- DeepStack: Expert-level AI in heads-up no-limit poker (Science, 2017) - PDF
- DeepStack Official Website
- ScienceDaily: Poker-playing AI (2017)
Libratus (2017) – Crushing the Professionals
Tuomas Sandholm and Noam Brown from Carnegie Mellon created Libratus – the most advanced bot in history (at the time).
“Brains vs. AI” – The Tournament of the Century:
- Jan 11-30, 2017, Rivers Casino, Pittsburgh
- 120,000 hands against 4 top professionals: Jason Les, Dong Kim, Daniel McAulay, and Jimmy Chou
The Result:
- Libratus won $1,766,250 in chips.
- 14.72 big blinds / 100 hands – an astronomically high win rate.
- 99.98% statistical significance.
Libratus’s Three Modules:
1. Blueprint Strategy (calculated offline):
// Simplified game abstraction
// Instead of 10^161 states -> 10^12 states
AbstractGame abstraction = create_abstraction(TexasHoldem);
Strategy blueprint = mccfr(abstraction, iterations=1e12);
2. Nested Subgame Solving (during play):
// The bot calculates the strategy for the current situation on the fly
Subgame current = extract_subgame(gameState);
Strategy refined = solve_subgame(current, blueprint, real_time=true);
3. Self-Improvement (nightly analysis):
What REALLY terrified the pros:
“Every night after play, Libratus analyzed our unusual plays. The next day, the ‘holes’ in its strategy were gone. It was like fighting an opponent that adapted.” – Jimmy Chou
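Based on the published descriptions, the nightly loop worked roughly like this (a hedged sketch – every name here is illustrative, not Libratus’s actual code):

# Sketch: nightly self-improvement (all names are illustrative)
def nightly_self_improvement(blueprint, todays_hands):
    # Find the off-tree bet sizes opponents used most often today
    off_tree_bets = most_frequent_off_tree_actions(todays_hands)
    for bet in off_tree_bets[:3]:  # patch a few branches per night
        branch = add_branch_to_abstraction(blueprint, bet)
        solve_branch(branch)  # compute a strategy for the new branch overnight
    return blueprint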
Technical Details:
- Bridges supercomputer (Pittsburgh Supercomputing Center)
- ~600 of 846 compute nodes
- 1.35 petaflops of computing power
- ~25 million CPU hours
Sources:
- Superhuman AI for heads-up no-limit poker (Science, 2017)
- Libratus: The Superhuman AI (IJCAI 2017) - PDF
- Carnegie Mellon: AI Beats Top Poker Pros (2017)
- Carnegie Mellon: Inner Workings of Libratus (2017)
- TIME: How This Poker-Playing Computer Beat the Best (2017)
Chapter 5: Multiplayer (2019) – Pluribus Changes Everything
The Multiplayer Problem
1-on-1 was “cracked.” But 6-player poker is an exponentially harder problem:
- It’s no longer a two-player zero-sum game, so Nash-equilibrium play loses its worst-case theoretical guarantees.
- A Nash Equilibrium for >2 players is astronomically difficult to compute.
- You have to model coalitions, dynamic alliances, and multi-way bluffs.
Pluribus (2019) – The First 6-Max Bot
Noam Brown and Tuomas Sandholm (the same guys as Libratus) + Facebook AI Research present Pluribus.
The Breakthrough:
- June-July 2019: Pluribus defeated 15 top professionals in 6-player No-Limit Hold’em.
- 10,000+ hands
- Played against pros like Darren Elias and Chris Ferguson.
But how?
1. Blueprint Strategy (training):
- 8 days of training on a 64-core server.
- <512GB RAM, ZERO GPUs.
- $150 total cost.
This is insane – AlphaGo needed 1920 CPUs + 280 GPUs. Pluribus: a regular server.
2. Real-time Search:
# Pluribus doesn't solve the whole game – just a "lookahead" of a few moves
def pluribus_search(gameState):
    # 1. Use the blueprint for the first betting round
    if gameState.street == 0:
        return blueprint_strategy(gameState)

    # 2. For later rounds: limited-lookahead search
    # Key innovation: this works even for games with more than 2 players
    subgame = limited_lookahead(gameState, depth=4)

    # 3. At the "leaves" of the subgame, assume each player can choose among
    #    4 continuation strategies: the blueprint, plus versions biased
    #    toward folding, calling, or raising
    for terminal_node in subgame.leaves:
        terminal_node.continuations = [
            blueprint, fold_biased, call_biased, raise_biased
        ]

    # 4. Solve the subgame against those continuations
    return solve_subgame(subgame)
Pluribus’s Unusual Strategies:
The pros noticed that Pluribus did things humans just don’t do:
- Donk betting (leading into the previous street’s aggressor) with unusual sizes.
- Massive overbets in spots where humans would play far more cautiously.
- Limping (just calling the big blind preflop) from positions where pros almost always raise.
“There were plays that people just don’t make, especially regarding bet sizing.” – Michael Gagliano ($2M tournament earnings)
Sources:
- Superhuman AI for multiplayer poker (Science, 2019)
- Carnegie Mellon & Facebook AI: Pluribus (2019)
- PokerNews: Pluribus First AI to Beat Humans (2019)
- HPCwire: Pluribus Humbles the Pros (2019)
Chapter 6: Democratization of Knowledge (2019-2025)
The Problem is Still Open
Despite Pluribus’s success, 6-player poker is NOT “solved” in the mathematical sense:
- Pluribus beat humans, but its strategy is not a perfect Nash Equilibrium.
- There is still room for improvement.
- The problem remains an open challenge for AI.
The New Generation of Creators
1. Alexandre Marangoni Costa (Brazil, 2019)
Thesis: “A Study on Neural Networks for Poker Playing Agents”
Created Pucker – a framework for building poker bots in Python. Fun fact: to get the neural networks to work, the author had to add... the exact same concepts as Loki from 1998:
- Hand Strength
- Hand Potential
- Opponent Modeling
21 years later, ’90s-era “feature engineering” is still necessary.
2. Kylie Ying (USA, 2021)
A popular AI YouTuber. In 2021, she published the series “How to Build a Superhuman Poker AI using CFR” – a step-by-step implementation of CFR in Python.
Sample code from her tutorials:
class KuhnPoker:
    def __init__(self):
        self.nodeMap = {}  # Map information sets -> nodes

    def cfr(self, cards, history, p0, p1):
        plays = len(history)
        player = plays % 2
        opponent = 1 - player

        # Terminal states
        if plays > 1:
            terminalPass = history[-1] == 'p'
            doubleBet = history[-2:] == 'bb'
            if terminalPass:
                if history == 'pp':  # check-check: showdown for 1
                    return 1 if cards[player] > cards[opponent] else -1
                return 1  # the opponent folded to a bet
            elif doubleBet:  # bet-call: showdown for 2
                return 2 if cards[player] > cards[opponent] else -2

        infoSet = str(cards[player]) + history

        # Get or create node (the Node class is defined earlier in the tutorial)
        if infoSet not in self.nodeMap:
            self.nodeMap[infoSet] = Node(['p', 'b'])
        node = self.nodeMap[infoSet]

        strategy = node.getStrategy(p0 if player == 0 else p1)

        # Recursive CFR (actions: 'p' = pass/check, 'b' = bet)
        util = {}
        nodeUtil = 0
        for action in ['p', 'b']:
            nextHistory = history + action
            if player == 0:
                util[action] = -self.cfr(cards, nextHistory,
                                         p0 * strategy[action], p1)
            else:
                util[action] = -self.cfr(cards, nextHistory,
                                         p0, p1 * strategy[action])
            nodeUtil += strategy[action] * util[action]

        # Update regrets, weighted by the opponent's reach probability
        for action in ['p', 'b']:
            regret = util[action] - nodeUtil
            node.regretSum[action] += (p1 if player == 0 else p0) * regret

        return nodeUtil
Sources:
- A Study on Neural Networks for Poker (Costa, 2019)
- Kylie Ying YouTube: How to Build a Poker AI (2021)
- StackWild YouTube channel (2024)
- CFR-Explained GitHub (2024)
Chapter 7: Implications & A Question for r/programming

The history shows a clear trajectory: from simple academic bots (Loki), to superhuman GTO solvers (Libratus), and now, reportedly, sophisticated commercial bots dominating online games.
The “MCP Moment” – When Building Bots Became Accessible

From a programmer’s perspective, the poker bot story is fascinating because it represents a massive multi-component project (MCP?) that suddenly became achievable for individual developers around 2005. Take this programmer’s story from 2006 as an example. He documented building a working poker bot – a titanic undertaking that required:
- Computer vision (card recognition)
- Game state parsing
- Opponent modeling
- Decision algorithms
- Real-time execution
As he notes in his article:
“First of all, there’s a very easy way to detect hole cards via a screen-scraping ‘poor-man’s OCR’ approach” or DLL injection.
That sentence captures a pivotal moment: the technical barriers had fallen. What was once impossible became a weekend project. If you browse the Internet Archive from 2005-2006, you’ll see an explosion of forums, tutorials, and communities dedicated to building poker bots (“how-i-built-a-working-poker-bot”). By 2006, these bots had evolved rapidly – from simple rule-based systems to sophisticated probability calculators.

Why This History Matters for Modern Developers

I’ll be honest: I’ve never built a bot based on a trained model, and I don’t intend to. But as someone learning to train specialized models, poker card recognition is an ideal learning example. It’s a constrained problem:
- A limited set of classes (52 cards + a few UI elements)
- Clear success metrics (accuracy, inference speed)
- Real-world challenges (lighting, angles, different skins)
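Purely as an illustration of how approachable this is today (every path and hyperparameter below is made up, and it assumes a folder-per-class dataset with torchvision installed), a minimal card classifier might look like this:

# Hedged sketch: a small card classifier with PyTorch/torchvision.
# Assumes an ImageFolder-style dataset at cards/train (one folder per class).
import torch
from torch import nn
from torchvision import datasets, models, transforms

tfm = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("cards/train", transform=tfm)
loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

model = models.resnet18(num_classes=len(train_set.classes))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    for images, labels in loader:
        opt.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        opt.step()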
The 2005 “MCP moment” teaches us something important: when the right tools become accessible (fast equity calculators, OCR libraries, cheap compute), individuals can suddenly tackle problems that once required research labs. We’re at a similar inflection point now with on-device AI, mobile ML frameworks, and pre-trained models. The question isn’t whether these tools will be used – it’s how we as a community shape their development and use.

A Question for the Community

Can you share examples of projects that use trained AI models in mobile phone apps to assist with video games? I’m curious about the technical approaches – not necessarily for poker, but for any competitive game:
- Real-time computer vision pipelines on mobile devices
- On-device inference for game state recognition
- Strategies for handling different screen resolutions, lighting conditions, and UI variations
I’m interested in both the ethical implications and the technical challenges of building “AI co-pilots” for games. As bots become ubiquitous, is there a place for defensive AI assistants, or does this just escalate the arms race? Would love to hear your thoughts on the whole history and where we go from here.