Module 1 · Section 3 of 3

What Deep Blue Actually Did

Deep Blue evaluated 200 million chess positions per second.

Let that number land. Kasparov, playing at the top of his game, might seriously consider a few dozen positions while deciding on a single move. Deep Blue was working through 200 million every second. Not 200 million across the whole game — 200 million per second, sustained, for the entire match.

This is brute-force computation. Deep Blue didn’t understand chess. It had no model of the game as a contest between two minds. It generated legal moves, evaluated the resulting positions using a scoring function, and selected the move that led to the highest-scoring position several steps down the tree. Repeat. Repeat. Repeat.

The scoring function itself was sophisticated — built by grandmasters who encoded decades of chess knowledge into numerical weights for things like pawn structure, king safety, piece mobility, and control of the centre. But the process of applying that function was purely mechanical. Generate positions. Score them. Pick the best branch. Move.
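To see how mechanical that is, here is a toy scoring function in Python. The feature names follow the paragraph above, but the weights and values are invented for this sketch; they bear no relation to Deep Blue’s actual numbers, and the real function had thousands of hand-tuned terms.

```python
# A toy evaluation function: chess knowledge reduced to numerical weights.
# Weights and features are invented for illustration only.
WEIGHTS = {
    "material_balance": 1.0,  # material difference, measured in pawns
    "pawn_structure":   0.3,  # weak-pawn penalties, passed-pawn bonuses
    "king_safety":      0.5,  # how exposed each king is
    "piece_mobility":   0.1,  # how many squares the pieces can reach
    "centre_control":   0.2,  # influence over the central squares
}

def evaluate(features: dict) -> float:
    """Score a position from White's point of view (positive favours White).
    Purely mechanical: multiply each measured feature by its weight and sum."""
    return sum(WEIGHTS[name] * value for name, value in features.items())

# A position where White is a pawn up but has a slightly exposed king:
score = evaluate({
    "material_balance": +1.0,
    "pawn_structure":   -0.5,
    "king_safety":      -1.0,
    "piece_mobility":   +2.0,
    "centre_control":   +1.0,
})
print(score)  # roughly 0.75: one number, with no understanding attached to it
```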

The minimax algorithm

The core of Deep Blue’s decision-making was an algorithm called minimax. The idea is straightforward: you build a tree of possible futures. At the end of each branch, you score the position. Then you work backwards — assuming your opponent will always choose the move that’s worst for you, and you’ll always choose the move that’s best for you. The optimal move is the one that gives you the best outcome under those assumptions.

Humans use this logic too when they calculate variations. The difference is scale. A strong human player might calculate four or five moves deep on critical lines, ten moves deep in highly tactical positions. Deep Blue searched to depths of six to eight moves routinely and went much deeper in sharp positions where the tree narrowed. At 200 million positions per second, it could afford to.
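Here is minimax in miniature, as a Python sketch. To keep it self-contained, the game tree is hard-coded as nested lists with invented scores at the leaves; a real engine generates the tree from the rules of the game as it searches, and Deep Blue layered alpha-beta pruning and custom hardware on top of the basic idea.

```python
def minimax(node, maximising: bool) -> float:
    """Work backwards from the leaves: the opponent always picks the branch
    worst for us, and we always pick the branch best for us."""
    if isinstance(node, (int, float)):  # a leaf: the position is already scored
        return node
    scores = [minimax(child, not maximising) for child in node]
    return max(scores) if maximising else min(scores)

# A tiny two-ply tree: three candidate moves, each met by two replies.
tree = [
    [3, 5],   # move A: the opponent will choose the reply scoring 3
    [-2, 9],  # move B: the 9 is bait; the opponent will choose -2
    [0, 1],   # move C: the opponent will choose 0
]
scores = [minimax(replies, maximising=False) for replies in tree]
print(scores)  # [3, -2, 0], so move A is optimal under minimax
```

Depth-limiting falls out naturally: stop recursing at a fixed depth and call the scoring function on whatever position you have reached, exactly as described above.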

This is not thinking. It is thorough search.

What the machine could not do

Deep Blue had no understanding of why the moves it chose were good. It couldn’t explain its reasoning. It couldn’t transfer anything it “learned” from one game to the next: each game started fresh from the same evaluation function, and any tuning between games was done by its human engineers, not by the machine. It couldn’t adapt its style to the situation. It couldn’t decide to play more aggressively because it was in trouble, or conservatively because it was ahead. It executed an algorithm. The algorithm happened to produce excellent chess.

This matters because of a persistent confusion about what AI systems actually do. When Deep Blue made a brilliant-looking positional sacrifice in Game 2 — the move that rattled Kasparov — it wasn’t being creative. It wasn’t sensing an opportunity the way a human player might. It was doing what it always did: searching the tree, scoring the leaves, picking the best branch. The sacrifice simply scored higher than the alternatives when evaluated several moves out.

The appearance of insight was a byproduct of thorough computation, not evidence of understanding.

Why this model still applies

Deep Blue is old technology. The AI systems you encounter at work today — large language models, recommendation engines, fraud detection systems, image classifiers — are built on different architectures and operate at scales Deep Blue’s engineers couldn’t have imagined. But the core dynamic is the same.

Modern AI systems are, at their heart, pattern-matching engines. They don’t understand the content they process. A language model doesn’t know what a contract means — it knows which words tend to follow which other words in contexts that look like contracts, and it generates text accordingly. An image classifier doesn’t know what a tumour is — it knows that certain pixel patterns in training data were labelled as tumours by radiologists, and it applies that pattern to new images.
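A toy version makes the mechanism visible. The sketch below learns which word follows which from a scrap of invented, contract-flavoured text and then “completes” a prompt. Real language models are incomparably more sophisticated, but the underlying move, continuation by statistics rather than comprehension, is the same.

```python
from collections import Counter, defaultdict

# Invented training text, vaguely contract-shaped.
corpus = ("the party of the first part shall indemnify the party of the "
          "second part the party of the first part shall notify the party "
          "of the second part").split()

# Count, for every word, which words followed it in training.
followers = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    followers[word][nxt] += 1

def next_word(word: str) -> str:
    """Return the continuation seen most often in the training text."""
    return followers[word].most_common(1)[0][0]

# Generate by repeatedly asking "what usually comes next?"
word, output = "the", ["the"]
for _ in range(6):
    word = next_word(word)
    output.append(word)
print(" ".join(output))  # "the party of the party of the": fluent, meaningless
```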

This produces outputs that can look like understanding. Sometimes the outputs are better than what a human would produce in the same time. But the mechanism is pattern matching, not cognition.

The practical implication

Understanding this distinction changes how you evaluate AI outputs.

If an AI system gives you a confident answer, that confidence is a product of how well the input matches patterns in its training data — not a measure of whether the answer is correct. The system has no way to flag when it’s operating outside its reliable range. It will generate an answer that looks like its other answers even when the question is something it has never seen before.
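The sketch below shows this dynamic with a deliberately crude classifier: it labels inputs by their similarity to a handful of invented training examples and converts match quality into a “confidence” score. Everything here, the data, the labels, the similarity measure, is made up for illustration; the point is that the mechanism produces a confident-looking number no matter how alien the input is.

```python
import math

# Invented training examples: 2-D feature vectors with labels.
TRAINING = [
    ((1.0, 0.1), "invoice"),
    ((0.9, 0.2), "invoice"),
    ((0.1, 1.0), "contract"),
    ((0.2, 0.9), "contract"),
]
SHARPNESS = 10.0  # arbitrary constant controlling how peaked the scores look

def cosine(a, b):
    """Similarity of direction between two vectors, ignoring magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def classify(x):
    """Turn best-match similarity per label into a 'confidence' via softmax.
    Note there is no branch that can ever say 'I don't know'."""
    best = {}
    for vec, label in TRAINING:
        best[label] = max(best.get(label, -1.0), cosine(x, vec))
    expd = {lbl: math.exp(SHARPNESS * s) for lbl, s in best.items()}
    total = sum(expd.values())
    return {lbl: round(v / total, 3) for lbl, v in expd.items()}

print(classify((0.95, 0.15)))   # in range: about 0.998 'invoice', deservedly
print(classify((500.0, 60.0)))  # nothing like the training data, yet still
                                # about 0.999 'invoice': nothing flags novelty
```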

This is why human judgment remains essential, not as a fallback for when AI fails obviously, but as a constant check on AI outputs that look fine but may be quietly wrong in ways the system itself cannot detect.

Kasparov lost to Deep Blue because he underestimated how powerful thorough search could be. The organisations that run into trouble with AI tend to make the opposite mistake: they overestimate what thorough pattern matching can do, and fail to check the outputs at the points where the pattern matching breaks down.

Deep Blue couldn’t sense the difference between a position it understood well and one it was navigating by extrapolation. Neither can most AI systems deployed today. That’s not a flaw to be patched — it’s a property of the architecture. Working with AI effectively means knowing where that line sits and staying alert at the boundary.