From Deep Blue to AlphaZero
Module 2 · Section 2 of 3
To understand modern AI, it helps to start with a machine that wasn’t very modern at all.
Path one: brute force
Deep Blue, the IBM chess computer that defeated world champion Garry Kasparov in 1997, worked through a method called brute-force search. Every time it had to move, it evaluated roughly 200 million board positions per second, looking six to twelve moves ahead. It explored every possible response, then every counter-response, then every counter-counter-response, building a tree of possibilities and picking the branch that led to the best outcome.
Its intelligence came from two things: raw computational speed, and rules written by human chess experts. Grandmasters had encoded decades of chess knowledge directly into the system — which positions were strong, which sacrifices were worth making, how to evaluate material against position. Deep Blue didn’t discover any of this. It ran calculations on top of it.
This approach worked brilliantly within its domain. Outside that domain, it was useless. Deep Blue could not play draughts, let alone hold a conversation or recognise a face in a photograph. It was a specialist built entirely from pre-specified knowledge.
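The tree-building idea can be shown with a toy minimax search. This is an illustrative sketch, not Deep Blue's actual code: a small hand-built tree stands in for chess positions, and a lookup table of scores stands in for the expert-written evaluation rules.

```python
# Toy brute-force game-tree search (hypothetical example).
# A hand-built tree stands in for chess positions; SCORES stands in
# for the evaluation rules that human experts encoded.

TREE = {          # position -> positions reachable in one move
    "start": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1", "b2"],
}
SCORES = {"a1": 3, "a2": 5, "b1": 2, "b2": 9}  # expert scores for leaf positions

def minimax(pos, depth, maximizing):
    """Score a position by exploring every reply down to `depth` moves."""
    moves = TREE.get(pos, [])
    if depth == 0 or not moves:
        return SCORES.get(pos, 0)     # fall back to the hand-written evaluation
    scores = [minimax(m, depth - 1, not maximizing) for m in moves]
    return max(scores) if maximizing else min(scores)

def best_move(pos, depth):
    """Pick the move whose subtree guarantees the best worst-case score."""
    return max(TREE[pos], key=lambda m: minimax(m, depth - 1, False))

# The search assumes the opponent replies as well as possible:
# move "a" guarantees at least 3, while "b" could be held to 2.
print(best_move("start", 2))  # -> a
```

Deep Blue did essentially this, but over billions of real chess positions per move, with evaluation rules far richer than a score table.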
Path two: self-teaching
Jump to 2017. DeepMind’s AlphaZero learns chess from scratch — no human-encoded strategies, no grandmaster-curated rules. It receives only the basic rules of the game: how pieces move, what constitutes a win. Then it plays against itself, millions of times, learning through trial and error.
Within 24 hours of training, it defeats the world’s strongest chess program. Running the same algorithm from scratch, separate instances learn Go and Japanese chess (shogi) to superhuman level.
Nobody told AlphaZero that controlling the centre of the board matters, or that rook endgames require different thinking than middlegame tactics. It discovered these principles by playing enough games to find patterns that worked. Its “knowledge” isn’t a list of rules someone wrote — it’s compressed into numerical weights across billions of parameters, shaped by everything it learned through self-play.
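The flavour of self-play learning can be sketched on a far simpler game. This is an illustration in the same spirit, not AlphaZero's actual algorithm: two copies of the same agent play "take 1 or 2 stones; whoever takes the last stone wins", sharing one table of position values. Nobody tells it the winning strategy (leave your opponent a multiple of three); it discovers that pattern from wins and losses alone.

```python
# Sketch of learning by self-play on a tiny stone-taking game
# (illustrative, not AlphaZero's actual algorithm).
import random

random.seed(0)
PILE = 10
V = {n: 0.5 for n in range(1, PILE + 1)}   # estimated win chance for the player to move

def moves(n):
    return [a for a in (1, 2) if a <= n]

def choose(n, eps):
    if random.random() < eps:               # explore: try a random move
        return random.choice(moves(n))
    # exploit: leaving the opponent a bad position is good for us
    return max(moves(n), key=lambda a: 1.0 - V.get(n - a, 0.0))

def play_and_learn(eps=0.2, alpha=0.1):
    n, player, history = PILE, 0, []
    while n > 0:
        history.append((player, n))
        n -= choose(n, eps)
        winner = player if n == 0 else None
        player = 1 - player
    for p, s in history:                     # nudge each visited value toward the outcome
        target = 1.0 if p == winner else 0.0
        V[s] += alpha * (target - V[s])

for _ in range(30_000):
    play_and_learn()

# After training, the values encode the discovered rule: piles of
# 3, 6 and 9 are losing for the player to move, so the greedy agent
# always leaves the opponent one of them.
print(choose(4, eps=0.0), choose(5, eps=0.0))
```

The "knowledge" here lives in the numbers in `V`, shaped entirely by self-play, which is a miniature of how AlphaZero's knowledge lives in its network weights rather than in rules anyone wrote down.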
The result was a playing style that surprised even grandmasters. AlphaZero made moves that looked strange to human experts but turned out to be correct several moves later. It wasn’t following chess theory. It had developed something closer to its own.
What the difference actually means
The contrast isn’t just technical. It changes what each system can do when it meets something unexpected.
A hardcoded system handles situations its designers anticipated. Put it in a situation they didn’t account for, and it has no good answer. It’s fast, predictable, and transparent — you can trace exactly why it made any given decision. But it can only be as good as the knowledge that went into it, and only within the scope that knowledge covers.
A learning system handles situations that resemble patterns it has encountered, even if it hasn’t seen that exact situation before. It’s adaptable and can generalise across contexts. It can discover things its designers didn’t anticipate. But it’s harder to explain, sometimes unpredictable, and its failures can be strange: it may behave confidently in situations where it’s actually extrapolating badly.
Both paths are still in use
Modern AI tools typically combine both approaches. The conversational AI you use for work probably uses a learned model at its core, but the company has added hardcoded constraints around it — things it will always do, things it will never do, formats it always follows. Self-driving systems use learning algorithms to recognise pedestrians and read road conditions, but hardcoded rules govern safety-critical decisions like responding to a red light.
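The wrapping pattern can be sketched in a few lines. All names here are hypothetical, and the "learned" part is a stub; the point is only the shape: hardcoded rules run first and can override whatever the learned component suggests.

```python
# Illustrative sketch (hypothetical names) of hardcoded constraints
# wrapped around a learned component, in the spirit of the
# self-driving example above.

def learned_policy(scene):
    """Stand-in for a trained model's suggestion (here just a fixed stub)."""
    return "proceed"

def decide(scene):
    # Hardcoded, auditable safety rules take priority.
    if scene.get("traffic_light") == "red":
        return "stop"
    if scene.get("pedestrian_ahead"):
        return "brake"
    # Otherwise defer to the flexible, learned component.
    return learned_policy(scene)

print(decide({"traffic_light": "red"}))    # -> stop
print(decide({"traffic_light": "green"}))  # -> proceed
```

The rule-based layer behaves the same way every time and can be inspected line by line; the learned layer inside it is the part that can surprise you.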
Understanding which part of a system is learned versus hardcoded helps you predict where it will be reliable and where it might surprise you. Hardcoded elements are consistent and auditable. Learned elements are flexible but may produce unexpected outputs when input patterns drift far from training data.
The tools you use every day live somewhere on this spectrum. Knowing that the spectrum exists is the first step to working with them more deliberately.