Cell Division ships with four AI tiers that all run instantly on a phone — no server, no cloud inference, no GPU. Three of them are built from fourteen hand-crafted features and linear models you could write on an index card. The fourth, Elite, is a tiny CNN distilled from an AlphaZero teacher. This is how they fit together.
Two families of model, one game
Under the hood, Cell Division runs two completely different kinds of AI. The first three tiers — Easy, Medium, Hard — are linear models on top of fourteen hand-crafted features. They are small enough to read, fast enough to run inside a render frame, and structured enough that each tier can have its own personality without retraining. The top tier, Elite, is a small ResNet-style policy network distilled from a much larger AlphaZero teacher that searches with MCTS. At inference time we don’t run MCTS at all — the student has already absorbed the teacher’s action distribution into its weights.
It took us a while to get here. The original plan was to train everything end-to-end with PPO self-play on the same 13-weight linear model; Elite started life as a switching linear model with 26 weights, one set for the opening and one for the endgame. That version still ships as the web fallback and it plays a credible game, but the distilled CNN is noticeably stronger — it learned patterns the linear features simply cannot represent. More on that below.
The decision each turn
On every turn the AI looks at every empty square and picks one. The scoring rule is simple: a cell scores 1 point if it has no same-color neighbors, otherwise it scores 2 points per active connection axis, with four possible axes (vertical, horizontal, and the two diagonals). Eight points is the ceiling — fully connected on all four axes.
The engine ships a function called fastScoreDelta in src/engine/scoring/calculateScore.ts that answers the question “if I placed a cell here, how many points would the board gain?” in O(neighbors) rather than rescanning the whole board. That cheap lookahead is the hinge the rest of the AI hangs on: we can ask it once per candidate move without breaking a sweat.
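To make the scoring rule and the delta trick concrete, here is a minimal Python sketch of the same idea. The board representation (a dict of coordinates to colors) and the function names are illustrative assumptions, not the engine's actual fastScoreDelta API.

```python
# Illustrative sketch of the scoring rule and the O(neighbors) delta trick.
# Board is a dict mapping (row, col) -> color; these names are assumptions,
# not the engine's real TypeScript API in src/engine/scoring/calculateScore.ts.
AXES = [(0, 1), (1, 0), (1, 1), (1, -1)]  # horizontal, vertical, two diagonals

def cell_score(board, r, c):
    """Score one cell: 1 if isolated, else 2 per axis with a same-color neighbor."""
    color = board.get((r, c))
    if color is None:
        return 0
    active = sum(
        1 for dr, dc in AXES
        if board.get((r + dr, c + dc)) == color or board.get((r - dr, c - dc)) == color
    )
    return 2 * active if active else 1  # ceiling: 2 points x 4 axes = 8

def fast_score_delta(board, r, c, color):
    """Points the board would gain if `color` played (r, c).

    Only the 3x3 neighborhood can change, so we rescore just those cells
    before and after a trial placement instead of rescanning the board.
    """
    touched = [(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
    before = sum(cell_score(board, rr, cc) for rr, cc in touched)
    board[(r, c)] = color
    after = sum(cell_score(board, rr, cc) for rr, cc in touched)
    del board[(r, c)]  # undo the trial placement
    return after - before
```

Placing a first stone gains 1 point (isolated); placing a second stone next to it gains 3, because the new cell scores 2 and the old cell's score rises from 1 to 2.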
Fourteen features, one vocabulary
For the three linear tiers, every candidate cell is summarized by the same fourteen features. They are defined in src/engine/ai/features.ts and mirrored in Python at ai/src/ai/features.py so training and inference agree on the numbers to the last bit. Every feature is normalized to roughly [0, 1] — which means the same trained weights work on any board size from 4×4 to 8×8 with no retraining.
- immediate_ai: points the AI would score by playing this cell next — normalized by 8 (max per cell).
- immediate_opp: points the opponent would score by playing here next — the same lookahead, flipped.
- openness: fraction of the 8 surrounding cells that are empty. High openness = room to grow.
- ai_neighbors: friendly cells in the 8 surrounding spaces, normalized by 8. Captures local cluster density.
- opp_neighbors: enemy cells in the 8 surrounding spaces. Tells the model how contested the space is.
- ai_connectivity: friendly cells among the 4 orthogonal neighbors. Orthogonal adjacency drives two scoring axes.
- opp_connectivity: enemy cells among the 4 orthogonal neighbors. A mirror of ai_connectivity for the other player.
- ai_underlap: number of axes where the AI already has cells on both sides — the bridge-potential signal.
- opp_underlap: same idea for the opponent — axes where they already flank this square.
- boundary_neighbors: count of neighbors that are off-board or blocked. High values mean the cell sits against an edge.
- ai_half_axis: axes with friendly cells on exactly one side — asymmetric extension opportunities.
- opp_half_axis: axes where the opponent is extending asymmetrically into the square.
- second_order_openness: empty cells at Chebyshev distance exactly 2 — the 16-cell ring one step beyond the immediate neighborhood.
- game_progress: fraction of the playable board already filled. The only global feature — used only by Elite to switch between early- and late-game weights.
The features fall into four groups. Two ask what’s the immediate payoff? — one from the AI’s perspective, one from the opponent’s. Six describe local density (how crowded is the neighborhood, and who owns it). Four describe shape: bridge potential, asymmetric extension, and whether the cell is being flanked. One looks one ring further out. And one — game_progress — is a global feature used by the legacy switching-linear Elite model (described later) to tell early game from endgame.
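As a flavor of how these features are computed, here is a hedged Python sketch of two of them. The real definitions live in src/engine/ai/features.ts and ai/src/ai/features.py; the board representation and function names below are illustrative assumptions.

```python
# Sketch of two of the fourteen features; names and board representation
# are assumptions, not the project's actual implementation.
def openness(board, size, r, c):
    """Fraction of the 8 surrounding cells that are on-board and empty."""
    empty = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == dc == 0:
                continue
            rr, cc = r + dr, c + dc
            if 0 <= rr < size and 0 <= cc < size and (rr, cc) not in board:
                empty += 1
    return empty / 8.0  # normalized to [0, 1] so weights transfer across board sizes

def boundary_neighbors(board, size, r, c, blocked=frozenset()):
    """Of the 8 neighbor slots, how many are off-board or blocked (normalized)."""
    count = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == dc == 0:
                continue
            rr, cc = r + dr, c + dc
            if not (0 <= rr < size and 0 <= cc < size) or (rr, cc) in blocked:
                count += 1
    return count / 8.0
```

The normalization by 8 is what the post means by "the same trained weights work on any board size": a corner cell on a 4×4 and a corner cell on an 8×8 produce the same feature values.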
The CNN Elite doesn’t use any of this. It works directly from a (3, 8, 8) board tensor and learns its own features inside the convolutions. The fourteen hand-crafted features exist to make the linear tiers tractable, not because they’re fundamental.
Four tiers, four strategies
Easy, Medium, and Hard all share the 14-feature pipeline. What differs is how they weight the numbers and whether they pick the best move or sample from a distribution. Elite lives on its own code path. The move-selection dispatch lives in src/engine/ai/engine.ts.
Easy — open and airy
Easy uses two features and nothing else: openness - 0.3 × boundary_neighbors. It then runs a softmax with a high temperature (~1.5) so even the best move isn’t certain. The net effect is a bot that plays into space, avoids walls, and occasionally surprises you with a weird choice — which is exactly what you want from an Easy opponent.
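A high-temperature softmax over move scores can be sketched in a few lines. The temperature value comes from the post; the function shape and the injectable `rng` parameter are my own illustration, not the engine's code.

```python
import math, random

# Sketch of temperature-softmax move selection as Easy might do it.
# The ~1.5 temperature is from the post; the function is illustrative.
def softmax_sample(scores, temperature=1.5, rng=random.random):
    """Sample a move index; higher temperature flattens the distribution."""
    logits = [s / temperature for s in scores]
    m = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    r, acc = rng(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1
```

At temperature ~1.5 the best move is only somewhat more likely than its rivals, which is exactly the "occasionally surprises you" behavior described above.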
Medium — greedy for points
Medium adds the immediate-score features: immediate_ai + immediate_opp + 0.3 × openness, picked deterministically. It always takes the highest-scoring move and uses openness as a tie-breaker. No training, no surprises — just a reliable “make the scoring play” opponent.
Hard — 13 trained weights, blended
Hard is where training starts earning its keep. It uses a 13-weight linear model, defined as LinearNet in ai/src/ai/linear_torch.py, trained with PPO self-play. At inference time we blend 75% Medium heuristic with 25% trained weights:
blended = 0.75 × MEDIUM_WEIGHTS + 0.25 × HARD_WEIGHTS
score = features · blended  // argmax over candidates

The blend is deliberate. Pure trained policies can make moves that look bizarre to humans — correct, but inscrutable. Mixing the greedy heuristic back in keeps the obvious scoring moves visible while layering learned positional judgment on top. It’s a dial between “predictable” and “sharp.”
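The blend-then-argmax step amounts to a few lines of Python. The weight values below are placeholders for illustration, not the shipped MEDIUM_WEIGHTS and HARD_WEIGHTS vectors.

```python
# Sketch of the Hard-tier blend: 75% Medium heuristic weights, 25% trained
# weights, then a plain argmax over candidate moves. Weights are placeholders.
def pick_move(candidates_features, medium_w, hard_w, alpha=0.75):
    blended = [alpha * m + (1 - alpha) * h for m, h in zip(medium_w, hard_w)]
    def score(feats):
        return sum(f * w for f, w in zip(feats, blended))
    return max(range(len(candidates_features)),
               key=lambda i: score(candidates_features[i]))
```

Turning `alpha` down would make the bot sharper but stranger; turning it up collapses Hard back into Medium.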
Elite — a distilled AlphaZero CNN
Elite is the one tier where we gave up on linear-with-features and let a proper neural network do the work. The runtime model, defined as SmallPolicyNet in ai/scripts/train_student.py, is a 32-channel ResNet with 3 residual blocks and a single policy head — around 60k parameters. It takes a (3, 8, 8) observation (friendly stones, opponent stones, a valid-cell mask centered inside the 8×8 tensor) and outputs 64 logits over grid positions. We mask illegal moves and take the argmax. No search, no MCTS at inference.
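The "mask illegal moves and take the argmax" step is simple enough to show directly. This pure-Python sketch assumes the 64 logits and a boolean legality mask; the real runtime does the equivalent over the ONNX output in src/engine/ai/cnn-elite.ts.

```python
# Sketch of masked argmax over the CNN's 64 position logits: illegal cells
# are skipped entirely, so the bot can never pick an occupied or blocked square.
NEG_INF = float("-inf")

def masked_argmax(logits, legal_mask):
    """logits: one float per grid cell; legal_mask: matching bools."""
    best_i, best_v = -1, NEG_INF
    for i, (v, legal) in enumerate(zip(logits, legal_mask)):
        if legal and v > best_v:
            best_i, best_v = i, v
    return best_i
```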
The model ships as an ONNX file at assets/models/elite_cnn.onnx and runs on-device via onnxruntime-react-native. Inference is under a millisecond on a modern phone, and the session is cached after the first load, so the AI responds instantly between taps. The runtime glue lives in src/engine/ai/cnn-elite.ts.
The interesting part is how it was trained. We don’t train the shipping model directly — we train a much larger teacher first (64 channels, 6 blocks, policy + value heads) with full AlphaZero-style self-play in ai/scripts/train_cnn_teacher.py: MCTS rollouts guided by the current network, policy targets taken from the visit counts at the root, value targets from the game outcome. Once the teacher stops improving, ai/scripts/generate_teacher_data.py replays it with 256-simulation MCTS to generate positions and their MCTS action distributions. Then ai/scripts/train_student.py trains the tiny ResNet to imitate those distributions — plain cross-entropy distillation, no search in the loop. The student ends up with a meaningful fraction of the teacher’s strength at a tiny fraction of the size.
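The distillation objective is just cross-entropy against a soft target: the teacher's MCTS visit distribution instead of a one-hot label. In the real pipeline this is a PyTorch loss inside ai/scripts/train_student.py; this pure-Python version is only meant to show the math.

```python
import math

# Sketch of the distillation loss: cross-entropy between the teacher's
# MCTS action distribution and the student's softmax output.
def soft_cross_entropy(student_logits, teacher_probs):
    """-sum_a p_teacher(a) * log p_student(a), with a stable log-softmax."""
    m = max(student_logits)
    z = sum(math.exp(x - m) for x in student_logits)
    log_probs = [(x - m) - math.log(z) for x in student_logits]
    return -sum(p * lp for p, lp in zip(teacher_probs, log_probs))
```

Minimizing this over a fixed dataset of teacher positions is plain supervised learning, which is why the student stage needs no search and finishes in minutes.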
Elite is not unbeatable, and we like it that way. On small boards a careful human can absolutely win; even on 8×8 it makes the occasional move that doesn’t quite max out a tactical exchange. It’s very good, not perfect — a worthy opponent, not a brick wall. If you ever find yourself thinking “I could have played that better than it did,” you’re probably right, and that’s the game we wanted to ship.
The legacy Elite — switching linear
Before the CNN, Elite was a 26-weight SwitchingLinearNet trained with PPO self-play. It stores two separate weight vectors — one tuned for the opening, one for the endgame — and interpolates between them on the fly using game_progress:
gp = game_progress  // 0 at move 1, 1 at the last move
score = Σ feat[j] × (w_early[j] × (1 - gp) + w_late[j] × gp)  // j = 0..12

The idea captures something real about the game: what makes a good opening move (claim space, stay away from edges) is not what makes a good endgame move (squeeze every last connection axis out of tight territory). One set of weights has to compromise. Two sets can specialize and blend smoothly between them. That was enough to beat Hard consistently, and for a long time it was the strongest thing we shipped. But it’s still a linear model over 14 features — whole classes of positional pattern (distant threats, multi-move tactics, shape recognition) are literally outside what it can represent. The CNN closed that gap by a lot.
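The interpolation above is a one-liner in practice. This sketch uses toy weight vectors; the shipped model has 13 weights per phase.

```python
# Sketch of the switching-linear evaluation: two weight vectors interpolated
# by game_progress (gp). Weight values here are illustrative, not the 26
# shipped weights.
def switching_score(feats, w_early, w_late, gp):
    return sum(f * (we * (1 - gp) + wl * gp)
               for f, we, wl in zip(feats, w_early, w_late))
```

At gp = 0 the early weights decide alone, at gp = 1 the late weights do, and every position in between is a smooth mix rather than a hard switch.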
We kept the switching linear model around rather than deleting it. It still runs as the fallback whenever the ONNX native module isn’t linked — most notably on the web build of the game, where we don’t ship a native ML runtime at all. So if you’re playing Cell Division in a browser, the “Elite” you’re facing is the 26-weight model, not the CNN. On a phone, it’s the distilled network.
Training the linear tiers: PPO self-play
Hard and the legacy switching-linear Elite are trained with the same loop — only the model changes. The scripts are ai/scripts/train_linear.py and ai/scripts/train_switching_linear.py. The loop:
- self-play: Each iteration plays 64 games on board sizes 4 through 8, with randomized starting positions — some pre-placed stones, some blocked cells. Variety in starts keeps the model from overfitting a single opening.
- behavior policy: Moves are sampled from a softmax for roughly the first 75% of each game (minimum temperature 0.05), then greedy for the tail. PPO needs a behavior policy with non-zero probability on alternative actions to learn anything.
- reward: When a game ends, each move in the trajectory gets the same reward: tanh(margin / scale), where margin is the AI’s final score minus the opponent’s. Squashing with tanh keeps blowouts from drowning out close games.
- ppo update: Learning rate 0.01, clip coefficient 0.2, 4 epochs per batch, entropy bonus 0.05, target KL 0.03 for early stopping. Standard PPO, tiny model — it converges in minutes on a laptop CPU.
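The reward squashing in the loop above is a single expression. The `scale` constant below is a placeholder; the post does not state the shipped value.

```python
import math

# Sketch of the terminal reward: every move in a finished game gets
# tanh(margin / scale). `scale` is an illustrative placeholder.
def game_reward(ai_score, opp_score, scale=10.0):
    return math.tanh((ai_score - opp_score) / scale)
```

A dead-even game yields 0, a close win yields a small positive reward, and a blowout saturates near 1 instead of dominating the gradient.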
Because the model is so small, we can train it against slightly older versions of itself, push the learning rate higher than you ever would with a deep net, and still end up with stable weights. The whole pipeline fits in a single Python file and exports a JSON weight vector that gets checked into src/engine/ai/weights.ts.
What the learned weights say
Here is the actual 13-weight vector Hard ships with today, pulled straight from src/engine/ai/weights.ts:
| feature | weight |
| --- | --- |
| immediate_ai | +4.09 |
| immediate_opp | +4.34 |
| openness | +14.49 |
| ai_neighbors | -1.73 |
| opp_neighbors | -1.65 |
| ai_connectivity | +1.90 |
| opp_connectivity | +1.82 |
| ai_underlap | -1.84 |
| opp_underlap | -1.52 |
| boundary_neighbors | -3.89 |
| ai_half_axis | -0.77 |
| opp_half_axis | -0.37 |
| second_order_openness | -3.51 |
A few things jump out. openness dominates everything else at +14.49 — a huge signal that training converged on “play into empty space.” The two immediate-score features land around +4 each, confirming the greedy baseline was right about what matters most at the tactical level. Then the penalties: boundary_neighbors at −3.89 punishes hugging edges, and second_order_openness at −3.51 — the interesting one — says that once you already have local openness, grabbing even more distant emptiness is mildly bad. Intuitively: the model learned to prefer cells that are open but bordered by structure, not floating in the middle of nothing.
This is the quiet superpower of a 13-parameter model: you can actually read it. A 10-million-parameter network would play as well, maybe slightly better, and you would have no idea why.
Training the CNN: teacher, data, student
The Elite CNN pipeline looks nothing like the linear loop. Three stages, run offline on a GPU box:
- train the teacher: train_cnn_teacher.py runs AlphaZero-style self-play on a 64-channel, 6-block ResNet with both policy and value heads. Each move comes from a 100–400 simulation MCTS guided by the current network; policy targets are the search visit counts, value targets are the game outcome. This is the expensive part.
- generate teacher data: generate_teacher_data.py plays thousands of games with the frozen teacher at high simulation count and records each position plus its MCTS action distribution to a single .npz file.
- distill the student: train_student.py trains the shipping 32-channel, 3-block SmallPolicyNet to match the teacher’s action distribution via cross-entropy. No MCTS, no self-play — just supervised learning on a fixed dataset. This part runs in minutes.
Finally export_onnx.py exports the student to elite_cnn.onnx, which gets bundled into the app. The upshot: the expensive, search-heavy intelligence happens once, in the training pipeline, and what ships to the phone is a lean feedforward network that inherits the teacher’s taste without paying the search cost at runtime.
Why this combination is the right shape
The thing we care about is that the game feels different at each difficulty. Easy should be playful, Medium should be stubborn, Hard should punish mistakes, and Elite should make you work for it. Tiny linear models give us the first three almost for free — you can tune the personality of each tier by hand, and inference costs essentially nothing. For the top tier, where “play genuinely strong moves” is the only requirement, a distilled CNN earns its keep.
The whole AI stack — the 13-weight Hard model, the 26-weight switching linear fallback, and the 60k-parameter distilled ResNet — fits comfortably in an app bundle, runs on-device with no network calls, and retrains on a single GPU box over a weekend. That’s the combination we wanted: something you can read, something you can beat, and something that still surprises you.