The in-game hint — the button that tells you what to play when you’re stuck — has a surprisingly messy history. Today it just calls the Elite CNN. For a while it had its own trained models, one per (board size, difficulty, side) slot, optimized with CMA-ES to beat a specific opponent from a specific distribution of positions. This post is about why we built that, why we threw it out, and why we deliberately never shipped the version that could have beaten Elite for you.
The problem hints are actually trying to solve
Hints feel like a small feature, but the underlying question is subtle. A hint isn’t “what’s the objectively best move in this position.” It’s “what move gives the player the highest chance of winning this game, against this specific opponent, from here.” Those two are not the same thing. The best move against a tactically sharp Elite AI might be a quiet positional play that denies it the opening it wants. Against Medium, the best move is usually the biggest immediate score, because Medium will happily give you the whole interior if you show up to take it. A policy trained to be strong in general can sit in a local optimum that an opponent-specific one would beat.
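To make that concrete, here is a toy sketch of what an opponent-aware hint computes: the move with the highest estimated win rate against this particular opponent, not against perfect play. The move names and win probabilities below are invented for illustration; in the real system the estimate would come from playing rollouts in the engine.

```python
import random

# Toy stand-in for the real engine, with invented numbers: the same
# candidate move has a different win probability depending on who is
# sitting on the other side of the board.
WIN_PROB = {
    "medium": {"grab_corner": 0.80, "quiet_block": 0.55},
    "elite":  {"grab_corner": 0.35, "quiet_block": 0.60},
}

def estimate_win_rate(move, opponent, rollouts=2000, rng=random.Random(0)):
    # In the real system this would play `rollouts` games from the
    # current position; here we just sample the toy probability.
    wins = sum(rng.random() < WIN_PROB[opponent][move] for _ in range(rollouts))
    return wins / rollouts

def best_hint(opponent, moves=("grab_corner", "quiet_block")):
    # The hint is the argmax of estimated win rate against THIS
    # opponent, not the move a strongest-play oracle would pick.
    return max(moves, key=lambda m: estimate_win_rate(m, opponent))
```

With these numbers the same position yields different hints for different opponents: the greedy grab is right against Medium, the quiet block is right against Elite.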
That observation — that the right hint depends on the opponent — is what got us into training opponent-specific hint models in the first place.
The CMA-ES era
The original hint trainer lives at ai/src/training/train_hint.py, and it uses CMA-ES instead of PPO. PPO is the right tool when you want to learn a policy that plays a full game well. But the hint problem is shaped differently: we want to find a weight vector that beats a specific opponent from a specific distribution of starting positions. That collapses into black-box fitness maximization — pick a weight vector, play 100 games against the target opponent, count wins. No gradients required.
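That black-box objective collapses to a function from a weight vector to a win rate in [0, 1]. A minimal sketch of the shape, with a made-up toy game standing in for the real engine (the real fitness in train_hint.py plays actual games):

```python
import random

def evaluate(weights, n_games=100, seed=0):
    """Black-box fitness for one candidate weight vector: play n_games
    against the target opponent from randomized openings and return the
    win rate. No gradients anywhere, just play and count."""
    rng = random.Random(seed)

    def play_one_game(w, opening):
        # Toy stand-in for a real game: the candidate wins when its
        # weighted evaluation of the opening beats a fixed opponent
        # threshold plus noise.
        features = (1.0, opening, opening ** 2)
        score = sum(wi * fi for wi, fi in zip(w, features))
        return score + rng.gauss(0.0, 0.25) > 1.0

    wins = 0
    for _ in range(n_games):
        opening = rng.random()  # randomized opening position
        wins += play_one_game(weights, opening)
    return wins / n_games
```

The randomized opening per game is what makes the fitness reflect a distribution of positions rather than one fixed start.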
CMA-ES (covariance matrix adaptation evolution strategy) is the classical answer to exactly that problem. The configuration we shipped with:
- population size 30: Each generation evaluates 30 candidate weight vectors. Enough to get a decent covariance estimate, small enough to fit 80 generations in an overnight run.
- 80 generations: 2,400 total evaluations per (board, difficulty, side) slot. Fitness stabilizes well before then, but the extra generations tighten up the search around the mode.
- initial sigma 0.5: The starting spread over the weight space. Too small and you never leave the initial basin; too large and early generations are pure noise.
- 100+ games per candidate: Played with randomized opening moves so the fitness score reflects performance from a distribution of positions, not a single fixed start. Fewer games and the fitness signal gets drowned in variance.
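Putting those numbers together, the outer loop looks roughly like this. One caveat: real CMA-ES adapts a full covariance matrix from the population (that is the “CMA” part); the sketch below keeps a single isotropic sigma so the generation structure is visible without the matrix bookkeeping, so it is an illustration of the shape, not the shipped trainer.

```python
import random
import statistics

def es_search(fitness, dim, popsize=30, generations=80, sigma0=0.5, seed=0):
    # Generation loop shaped like the config above: population 30,
    # 80 generations, initial sigma 0.5. Simplified: isotropic sigma
    # with a fixed decay instead of full covariance adaptation.
    rng = random.Random(seed)
    mean, sigma = [0.0] * dim, sigma0
    for _ in range(generations):
        # Sample popsize candidates around the current mean.
        pop = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean]
               for _ in range(popsize)]
        pop.sort(key=fitness, reverse=True)   # maximize win rate
        elite = pop[: popsize // 4]           # top quartile recombines
        mean = [statistics.fmean(ws) for ws in zip(*elite)]
        sigma *= 0.97                         # slowly tighten the search
    return mean
```

Plugged into a fitness like the game-playing evaluator, this returns the weight vector the search settled on for that one (board, difficulty, side) slot.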
Each trained hint model inherited the same 14-feature architecture as the gameplay AI, so the optimizer was searching over the same ~13-dimensional weight space — just with a different objective. The result was a small dictionary of opponent-specific weight vectors, keyed by board size, difficulty, and which side the player was on.
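The registry that resulted looked roughly like this (made-up numbers, vectors shortened for display): one independently trained vector per slot, nothing shared between slots.

```python
# Illustrative shape of the old hint-weights registry. The keys are
# (board_size, difficulty, player_side); the values are weight vectors,
# invented here and truncated to three numbers for readability.
HINT_WEIGHTS = {
    (6, "medium", "first"):  [0.12, -0.40, 0.77],
    (6, "medium", "second"): [0.09, -0.35, 0.81],
    (8, "elite",  "first"):  [0.31,  0.05, 0.44],
    # ...one entry per slot, each its own overnight CMA-ES run
}

def hint_weights(board_size, difficulty, side):
    # A miss means that slot was never trained. There is no vector that
    # generalizes across slots, which is exactly the maintenance problem.
    return HINT_WEIGHTS.get((board_size, difficulty, side))
```

Every change to the feature set or scoring rule invalidated every value in that dictionary at once.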
Why we threw it out
Two reasons, roughly equal weight.
One: it doesn’t generalize. A weight vector trained to beat Medium on a 6×6 board is nothing more than that. Change the board size, change the difficulty, or let the player start second, and you need a different vector. When we finished counting, we were looking at around twenty separate slots, each with its own overnight CMA-ES run, each of which needed to be retrained from scratch any time we tweaked the feature set or the scoring rule. That’s a lot of machinery for a button.
Two: Elite got strong enough. When the gameplay Elite was a 26-weight switching linear model, there was a real gap between “best move a linear model can find” and “best hint we could train.” The opponent-specific models genuinely helped. Once we replaced gameplay Elite with the distilled AlphaZero CNN, that gap mostly closed. The CNN is not an opponent-specific policy — it’s just stronger in general, and “play the objectively strongest move” turned out to be nearly indistinguishable, as a hint, from “play the move optimized against this specific opponent” for the vast majority of positions real players actually ask for help in.
So the current hint implementation is almost embarrassingly simple: call the Elite CNN from the player’s perspective and return its move. The code is a few lines in src/engine/ai/engine.ts, and the weight dictionary that used to hold trained hint models is now an empty object. One model, zero retraining, fewer moving parts.
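The shipped version is TypeScript in src/engine/ai/engine.ts; this Python sketch mirrors the logic, and the `best_move` method and `perspective` argument are hypothetical names, not the real API.

```python
class StubEliteCNN:
    # Stand-in for the real distilled AlphaZero CNN, so the sketch runs.
    def best_move(self, position, perspective):
        return "d4"

def hint(position, model):
    # The entire current hint: ask the Elite model for its move from the
    # player's perspective and return it. No registry lookup, no
    # per-opponent weights, no retraining when the rules change.
    return model.best_move(position, perspective="player")
```

That is the whole subsystem now: one call into the model that already exists.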
Replacing a whole trained subsystem with “just use the main model” is almost always the right call once the main model gets strong enough. The savings compound: less training, less code, less to explain, less to break when the rules change.
A hint that beats Elite is a hint we don’t want
There’s another reason we stopped chasing opponent-specific hint models, and it’s more a design principle than an engineering one. Elite is meant to be a challenge. That’s the whole point of the top difficulty. If we shipped a hint model that was specifically trained to exploit Elite’s weaknesses, a player could mash the hint button every turn and grind out a win they never really earned. The game becomes a hint-dispenser wearing a chessboard.
We absolutely could build that. A CMA-ES hint vector trained against Elite from the human side, with enough games per fitness eval, would find the seams. We chose not to. Hints should help you notice a strong move you missed — a tactic that’s on the board but invisible to you — not hand you a script that beats an opponent you can’t otherwise beat. Victories against Elite should feel like you dragged them out of the game with your own hands. Anything else cheapens the tier.
So the rule we ended up with is: the hint system is never stronger than the gameplay AI it’s helping you fight. On Easy, Medium, and Hard the CNN is overkill, which is fine — those tiers are designed to be beatable and the hint just accelerates you past a mistake. On Elite, the hint is exactly as strong as your opponent, which means a hint tells you what Elite would play in your seat. Useful, but not a shortcut.
The lesson from all of this
A lot of early AI engineering is load-bearing only until the core model gets better. We spent real effort building an opponent-aware hint pipeline, and then the right answer became “delete it and call Elite.” That’s not a failure — the earlier work tells us when opponent-specific hints matter, which is what we’ll rely on if we ever come back to it. For now, the simplest thing that works is the best thing that works.