For about three weeks this spring, Cell Division’s AI stack went all-in on the same method that beat human chess and Go world champions — AlphaZero, tree search guided by a neural net, learning purely from self-play. Then we didn’t ship it. Elite on your phone today is something different: a small network that plays instantly, offline, with no tree search at all. Here’s the short version of why we tried, why we changed course, and why the time wasn’t wasted.
The experiment
AlphaZero is the system DeepMind used to teach a neural network to beat the best human chess players without showing it a single human game. The recipe is elegant: have a neural net play itself, use Monte Carlo Tree Search to pick strong moves, record who won, and train the net to predict those moves next time. Repeat a few million times. The net gets stronger by playing stronger versions of itself.
We built that whole loop for Cell Division. It worked — the trained network started beating our Hard opponent, then started beating our previous Elite. The math cleared, the training curves pointed in the right direction, the self-play games looked sharp.
Why we changed course
The problem was what happens at inference time. AlphaZero at the table doesn’t just ask the network for a move — it searches. Every move, the net gets called dozens of times as the search explores hypothetical futures. On a training machine with a GPU, that’s fine. On a phone, running through Python, it’s hundreds of milliseconds per turn. Enough that you’d watch the AI think, notice the delay, and wonder if the app had frozen.
We could have trimmed: smaller network, less search, a custom C++ tree engine. But that’s six to eight weeks of work for a result that’s still doing tree search on a phone. The cost curve didn’t bend the right way.
What we shipped instead
We kept the AlphaZero model — as a teacher. The full AlphaZero network, with all the search, lives on the training machine. It plays millions of positions and records which move it picked in each one. Then a much smaller, faster network — the student — learns to imitate those choices. The student doesn’t need to search; it just needs to reproduce the teacher’s taste. That student is what ships as Elite.
On your phone, when you play against Elite, there’s no tree search. Elite looks at the board and plays a move. That’s it — one quick pass through the student network. All the heavy lifting happened months before you installed the app, on a computer much more powerful than your phone, with the teacher that Elite learned from.
The weights didn’t ship. The engineering did. Nearly every hard problem we had to solve during the AlphaZero experiment — batching the search efficiently, picking the right value target, a painful MCTS sign bug, how to pad boards of different sizes without confusing the network — shows up somewhere in the production pipeline that trains today’s Elite. A month of deep work on something that gets deleted is only a waste if the lessons delete with it.
For developers
The full engineering writeup — batched MCTS with virtual loss, three value-target formulations we shipped in six hours, the one-character PUCT sign bug that wasted two days, and why centered board padding mattered — lives on the Island & Pine studio blog: The AlphaZero detour: batched MCTS, three value targets, and a sign bug that hid in plain sight.