We trained a 65K-parameter CNN in PyTorch and we wanted it to run on a phone — offline, inside a React Native app, fast enough that the AI’s move feels instant. That turned out to be three problems stacked on top of each other: PyTorch doesn’t run on phones (so, ONNX), the obvious React Native binding doesn’t auto-link on Android, and we wanted real inference in our regression tests instead of mocks. This is how we solved all three.
## Why ONNX at all
The Elite CNN is tiny: a 32-channel ResNet, three residual blocks, about 65K parameters. The final .onnx file on disk is ~262 KB. At that size, the temptation is to just export the weights as JSON and reimplement the forward pass in TypeScript. Don’t. You lose batched convolutions, you lose vectorization, you reinvent a runtime that’s already been written five times, and you get to hand-write an argmax every time the model architecture changes.
The alternative — CoreML for iOS, TFLite for Android — means two exports, two inference paths, and two subtly different sources of floating-point drift. ONNX Runtime ships a first-party onnxruntime-react-native package, which means one .onnx artifact runs on both platforms through the same runtime, from the same TypeScript. One model, two platforms, no per-platform retraining. That’s the deal.
## Exporting from PyTorch
The exporter lives at ai/scripts/export_onnx.py. It’s short, but every line earns its place:
```python
torch.onnx.export(
    net,
    dummy_input,            # (1, 3, 8, 8)
    args.output,
    input_names=["obs"],
    output_names=["logits"],
    dynamic_axes={"obs": {0: "batch"},
                  "logits": {0: "batch"}},
    opset_version=13,
)

import onnx
onnx.checker.check_model(onnx.load(args.output))
```

- **Dynamic batch axis.** We mark dimension 0 of both `obs` and `logits` as dynamic so the same file handles single-move inference on the phone and batched inference in our Node-side regression tests. One artifact, both call sites.
- **Opset version 13.** Opset is the cross-platform gotcha nobody warns you about. Each ONNX Runtime version supports a window of opset versions, and iOS and Android can drift when the wheels don't match. 13 is the highest version that the pinned `onnxruntime-react-native` supports on both platforms, so that's the ceiling.
- **Checkpoint key fallback.** Our distillation checkpoints write the state dict under `best_model`, but PPO checkpoints use `model`. The loader tries both: `ckpt.get("best_model", ckpt.get("model"))`. A one-line fallback kept the exporter usable for every training pipeline we've shipped.
- **`onnx.checker.check_model`.** Dirt cheap, and it catches a whole class of silent corruption bugs before the model touches a device. Always run it; skip it, and your Android users find the corruption instead of you.
## Loading the model in React Native
On the app side, the whole inference wrapper lives in `src/engine/ai/cnn-elite.ts`. Three responsibilities: load the `.onnx` file out of the Expo asset bundle, build a session once and cache it, and convert the engine's board state into the tensor shape the CNN expects.
The session cache uses a two-variable trick so concurrent first calls collapse into a single load:
```typescript
let session: any = null;
let sessionLoading: Promise<any> | null = null;

async function getSession() {
  if (session) return session;
  if (sessionLoading) return sessionLoading;
  sessionLoading = (async () => {
    const { InferenceSession } = require('onnxruntime-react-native');
    const [asset] = await Asset.loadAsync(
      require('../../../assets/models/elite_cnn.onnx')
    );
    const uri = asset.localUri ?? asset.uri;
    session = await InferenceSession.create(uri);
    return session;
  })();
  return sessionLoading;
}
```

The `sessionLoading` promise is the important bit. If the app dispatches two AI moves during the first second of gameplay — say, the player takes their first turn and the AI needs to respond immediately — both calls await the same promise instead of racing to create two sessions. We also export `preloadCNNElite()` so the app can warm the session during a loading screen and the first real AI move doesn't eat a 200 ms cold start.
## Shaping the board into a (1, 3, 8, 8) tensor
Cell Division supports 5×5 through 8×8 boards, but the CNN only knows about one input shape. Rather than train four models, we trained one 8×8 model with a third input channel that masks off the cells that don’t exist in smaller games. Channel 0 is the current player’s stones, channel 1 is the opponent’s stones, channel 2 is a valid-cell mask, and the NxN board is centered inside the 8×8 grid:
```typescript
function boardOffset(N: number): number {
  return Math.floor((MAX_BOARD - N) / 2);
}

const o = boardOffset(N);

// Channel 2: valid mask
for (let r = 0; r < N; r++) {
  for (let c = 0; c < N; c++) {
    obs[2 * 64 + (o + r) * MAX_BOARD + (o + c)] = 1.0;
  }
}
```

The training data was generated with the same offset + mask convention, so the CNN learned the mask as a first-class signal rather than pad-and-pray. After inference, we mask the logits against the set of legal moves and take the argmax — no softmax, no sampling; Elite is deterministic by design.
## The Android linking landmine
This is where a weekend became two weekends. The iOS build worked on the first try: CocoaPods linked the `onnxruntime-react-native` pod, the session loaded, the CNN played. The Android build compiled, launched, and then crashed on the first AI move with a very unhelpful `cannot read property 'create' of undefined`.
The root cause: `onnxruntime-react-native` does not ship a `react-native.config.js`, so Expo's auto-linking never registers its `OnnxruntimePackage` with Android's `MainApplication`. On iOS this is fine because the pod links at the native project level. On Android, the JavaScript side can `require()` the module without error, but `NativeModules.Onnxruntime` is null at runtime, and `InferenceSession.create(...)` crashes trying to dispatch into a native module that was never registered.
Short-term fix: guard the call so we fail gracefully instead of taking down the game, and defer the require so even importing the file doesn’t blow up on an unlinked build:
```typescript
function isOnnxAvailable(): boolean {
  try {
    const { NativeModules } = require('react-native');
    return NativeModules.Onnxruntime != null;
  } catch {
    return false;
  }
}

async function getSession(): Promise<any> {
  if (session) return session;
  if (sessionLoading) return sessionLoading;
  if (!isOnnxAvailable()) {
    throw new Error('ONNX native module not available on this platform');
  }
  sessionLoading = (async () => {
    const { InferenceSession } = require('onnxruntime-react-native');
    // ...
  })();
  return sessionLoading;
}
```

The deferred require is load-bearing. A top-level import would try to resolve the native module at module-load time, which means the engine file would throw during initial JavaScript bundle evaluation on web and on any misconfigured Android build. Pushing the require inside `getSession` means the cost — and the risk — only lands when we actually ask for an Elite move.
But the guard alone doesn’t fix Android; it just stops the crash. To make Elite actually work, we still had to manually add the package to `MainApplication.kt` — one import, one `add(OnnxruntimePackage())` inside `getPackages()`. That worked, until we ran `npx expo prebuild --clean` the next day and watched it get wiped out, because prebuild regenerates the native project from scratch from the Expo config.
The right answer is an Expo config plugin — a small JavaScript function that mutates the generated native source during prebuild, so the edit survives. The whole thing is 30 lines:
```javascript
const { withMainApplication } = require('@expo/config-plugins');

const IMPORT_LINE =
  'import ai.onnxruntime.reactnative.OnnxruntimePackage';
const PACKAGE_LINE = '    add(OnnxruntimePackage())';

module.exports = function withOnnxruntimePackage(config) {
  return withMainApplication(config, (config) => {
    let contents = config.modResults.contents;
    if (!contents.includes(IMPORT_LINE)) {
      contents = contents.replace(
        /^(package .+\n)/m,
        `$1\n${IMPORT_LINE}\n`
      );
    }
    if (!contents.includes('OnnxruntimePackage()')) {
      contents = contents.replace(
        /(override fun getPackages\(\).*\n.*\.apply \{)\n/,
        `$1\n${PACKAGE_LINE}\n`
      );
    }
    config.modResults.contents = contents;
    return config;
  });
};
```

Register it in `app.config.js` by listing `'./plugins/withOnnxruntimePackage'` in the plugins array right after `'onnxruntime-react-native'`. Both `includes()` checks make the plugin idempotent, so running prebuild twice is harmless. One file, one line in the config, zero manual intervention after every clean prebuild.
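For reference, the registration might look roughly like this. This is a sketch: the surrounding `expo` config fields are placeholders, and only the `plugins` entries come from the text.

```javascript
// app.config.js (sketch; only the plugins entries are prescribed above)
module.exports = {
  expo: {
    // ...name, slug, and the rest of your existing Expo config
    plugins: [
      'onnxruntime-react-native',
      './plugins/withOnnxruntimePackage',
    ],
  },
};
```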
When a React Native package fails to auto-link, the fix is almost never to fork the package. Write a ten-line Expo config plugin that patches the generated native source. It’s smaller than a patch file, it survives prebuilds, and it keeps the upstream dependency completely untouched — so when the package finally ships proper auto-linking, you delete the plugin and move on.
## Testing real ONNX inference in Jest
We wanted regression tests like *hint suggestions never lose to Elite given the same position* and *the CNN’s move matches the Python argmax on a fixed fixture board*. Both require the actual CNN — not a mock, not a stub, the real model running on the real bytes. Mocking would defeat the point: a bad export, a wrong opset, a transposed channel, a tensor-shape mismatch — none of those are visible to a mock.
The obstacle: `onnxruntime-react-native`’s native module doesn’t load inside jest-expo, and Jest’s default worker can’t dlopen random native libraries anyway. The fix is to run ONNX inference in a separate Node process that uses `onnxruntime-node` — the Node.js binding — pointed at the exact same `.onnx` file the app ships:
```javascript
// jest.onnx-worker.js
const ort = require('onnxruntime-node');
const path = require('path');

let session;

async function init() {
  session = await ort.InferenceSession.create(
    path.resolve(__dirname, 'assets/models/elite_cnn.onnx')
  );
  process.send({ type: 'ready' });
}

process.on('message', async (msg) => {
  if (msg.type === 'inference') {
    const tensor = new ort.Tensor(
      'float32', new Float32Array(msg.obs), msg.dims
    );
    const results = await session.run({ obs: tensor });
    process.send({
      type: 'result',
      id: msg.id,
      logits: Array.from(results.logits.data),
    });
  }
});

init();
```

The Jest setup forks this worker once at suite start and talks to it via IPC (`process.send` / `process.on('message')`). Two reasons for the separate process. First, `InferenceSession` is expensive to build but cheap to call, so a persistent worker amortizes setup across every test that needs inference. Second, keeping `onnxruntime-node` out of Jest’s own process sidesteps the module-resolution war between jest-expo, `onnxruntime-react-native`, and `onnxruntime-node` — which all want to claim different onnxruntime submodules under the same require graph.
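The test-side half of that IPC can be sketched as a small client that matches responses to requests by `id`. This is an illustrative sketch, not the actual setup file: a plain object stands in for the forked process (the real code would pass `child_process.fork('jest.onnx-worker.js')` and use its `.on('message')` / `.send`, with the same message shapes as the worker above).

```javascript
// Promise-based request/response over a message channel (stubbed transport).
// worker must expose onMessage(handler) and send(msg) — hypothetical interface.
function makeClient(worker) {
  const pending = new Map(); // request id -> resolve callback
  let nextId = 0;

  worker.onMessage((msg) => {
    if (msg.type === 'result' && pending.has(msg.id)) {
      pending.get(msg.id)(msg.logits);
      pending.delete(msg.id);
    }
  });

  return function infer(obs, dims) {
    const id = nextId++;
    return new Promise((resolve) => {
      pending.set(id, resolve);
      worker.send({ type: 'inference', id, obs, dims });
    });
  };
}

// Usage with a fake worker that "infers" by echoing the observation back.
let handler;
const fakeWorker = {
  onMessage: (fn) => { handler = fn; },
  send: (msg) => {
    setImmediate(() => handler({ type: 'result', id: msg.id, logits: msg.obs }));
  },
};
const infer = makeClient(fakeWorker);
```

Because every message carries an `id`, in-flight requests can overlap freely; responses resolve the right promise no matter the order they come back in.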
The payoff is significant. Every time a PR touches the CNN wrapper, the tests run real inference on the real .onnx artifact, and any export bug, opset drift, channel-order mistake, or shape mismatch fails in CI instead of in a player’s hand three days after release.
## The pipeline end to end
- **Train.** Distill a 32-channel student from a larger AlphaZero teacher via `ai/scripts/train_student.py`.
- **Export.** `ai/scripts/export_onnx.py` serializes to ONNX opset 13 with a dynamic batch axis and runs `onnx.checker` on the result.
- **Bundle.** The `.onnx` file lands in `assets/models/elite_cnn.onnx` and ships as a regular Expo asset — no separate CDN, no OTA model fetch.
- **Load.** `src/engine/ai/cnn-elite.ts` loads the file via `expo-asset`, caches the session with a coalesced-load guard, and defers the `onnxruntime-react-native` require until it’s actually needed.
- **Link.** `plugins/withOnnxruntimePackage.js` registers `OnnxruntimePackage` in Android’s `MainApplication` on every prebuild so linking never rots.
- **Test.** `jest.onnx-worker.js` runs the same `.onnx` file through `onnxruntime-node` in a forked child process, so hint and Elite regression tests exercise the real model.
Shipping a trained model to a mobile app is three problems in a trench coat: export, link, and test. ONNX handles the first, an Expo config plugin handles the second, a child-process worker handles the third. None of them are hard on their own — the thing that would have been hard was trying to ship without any of them.