How a full translator runs on your phone
No server. No API. After a one-time model download, the entire speech-to-speech pipeline — recognition, translation, synthesis, even voice cloning — executes on the phone in your pocket. Here's the actual engineering, with measured numbers.
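The pipeline
Speech recognition (on the GPU) → translation (MADLAD-400 3B, on the CPU) → speech synthesis, optionally in a cloned voice.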
War story #1: the GPU that lied
The obvious plan was to run everything on the phone's GPU. Speech recognition loved it. Translation didn't: on mobile OpenCL, long autoregressive decodes accumulated numerical error until the output was silently corrupted, yielding plausible-looking text that drifted wrong. The fix wasn't more engineering heroics on the GPU path; it was accepting that a well-quantized 3B model on a modern phone CPU is both correct and fast (~1.2 s for a sentence). Speech recognition stays on the GPU, translation runs on the CPU, and the stages are pipelined so they overlap.
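A minimal sketch of that overlap, assuming two worker threads joined by a bounded queue. `recognize` and `translate` are hypothetical stand-ins for the GPU-backed and CPU-backed stages, not the app's actual API. The point is only that while sentence n is being translated, clip n+1 can already be recognized.

```python
import queue
import threading
from typing import Callable, Iterable, List

_DONE = object()  # sentinel telling the translation thread the stream has ended


def run_pipeline(
    clips: Iterable[str],
    recognize: Callable[[str], str],  # hypothetical GPU-backed speech-recognition stage
    translate: Callable[[str], str],  # hypothetical CPU-backed translation stage
) -> List[str]:
    """Run recognition and translation on separate threads so the stages overlap."""
    handoff: queue.Queue = queue.Queue(maxsize=2)  # bounded: applies back-pressure
    results: List[str] = []
    errors: List[BaseException] = []

    def asr_worker() -> None:
        try:
            for clip in clips:
                if errors:
                    break  # translation failed; stop producing work
                handoff.put(recognize(clip))  # transcript for the next sentence
        except Exception as exc:
            errors.append(exc)
        finally:
            handoff.put(_DONE)  # always unblock the consumer, even on error

    def mt_worker() -> None:
        while True:
            transcript = handoff.get()
            if transcript is _DONE:
                break
            if errors:
                continue  # keep draining so the producer never blocks forever
            try:
                results.append(translate(transcript))  # FIFO queue keeps sentence order
            except Exception as exc:
                errors.append(exc)

    workers = [threading.Thread(target=asr_worker), threading.Thread(target=mt_worker)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    if errors:
        raise errors[0]
    return results


if __name__ == "__main__":
    print(run_pipeline(["clip1", "clip2"], recognize=str.upper, translate=lambda s: s[::-1]))
    # ['1PILC', '2PILC']
```

The bounded queue is a choice made for the sketch: if translation falls behind, recognition blocks instead of piling up transcripts in memory.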
War story #2: the licensing minefield
Shipping open models commercially eliminates most of the leaderboard. Meta's NLLB-200 translates beautifully — and is CC-BY-NC, non-commercial. Popular open TTS engines are GPLv3 (viral for an app) or "research only." Every model in this app was chosen twice: once for quality, once for a license that legally ships — which is how the translation stage ended up on Apache-2.0 MADLAD-400 and the voice-clone stack on permissively licensed models.
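The second, license-only pass can be sketched as an allowlist filter over SPDX identifiers. The allowlist contents and the `Candidate` type are illustrative assumptions, not the app's code. The NLLB-200 and MADLAD-400 licenses are the ones named above, and the TTS entry is a hypothetical placeholder for a GPLv3 engine.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Assumption: the SPDX license IDs this app is willing to ship commercially.
SHIPPABLE = {"Apache-2.0", "MIT", "BSD-3-Clause"}


@dataclass(frozen=True)
class Candidate:
    name: str
    spdx_license: str


def license_pass(candidates: Iterable[Candidate]) -> List[Candidate]:
    """Keep only models whose license is on the allowlist; quality is judged separately."""
    return [c for c in candidates if c.spdx_license in SHIPPABLE]


if __name__ == "__main__":
    pool = [
        Candidate("NLLB-200", "CC-BY-NC-4.0"),     # non-commercial: rejected
        Candidate("MADLAD-400", "Apache-2.0"),     # permissive: kept
        Candidate("example-tts", "GPL-3.0-only"),  # hypothetical GPLv3 TTS: rejected
    ]
    print([c.name for c in license_pass(pool)])  # ['MADLAD-400']
```

Using an allowlist rather than a blocklist means a model with an unfamiliar or custom "research only" license is rejected by default until someone reviews it.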
Measured on real hardware
Samsung Galaxy S24 Ultra (Snapdragon 8 Gen 3); timings from in-app instrumentation. GPU stages vary ±10% run to run.
| Stage | Time | Tier / notes |
|---|---|---|
| Speech recognition (5.9 s clip, base model) | ~1.1 s | free tier |
| Speech recognition (5.9 s clip, large-v3-turbo) | ~9 s | Pro max-accuracy tier |
| Translation (short sentence, MADLAD-3B warm) | ~1.2 s | all tiers; pre-loaded at app start |
| Translation model load from disk | ~0.7 s | happens in the background at launch |
| Voice-clone synthesis engine start (warm cache) | ~4.3 s | first use only; stays resident afterwards |
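One way to read the recognition rows is as a real-time factor (processing time divided by audio duration), where a value below 1 means the stage finishes before the speaker would have. This small calculation uses only the figures in the table above, including the stated ±10% run-to-run variation for GPU stages.

```python
def real_time_factor(processing_s: float, audio_s: float) -> float:
    """Processing time / audio duration; below 1.0 means faster than real time."""
    return processing_s / audio_s


CLIP_S = 5.9       # clip length used in the table
GPU_SPREAD = 0.10  # stated ±10% run-to-run variation for GPU stages

if __name__ == "__main__":
    for label, secs in [("base", 1.1), ("large-v3-turbo", 9.0)]:
        lo = real_time_factor(secs * (1 - GPU_SPREAD), CLIP_S)
        mid = real_time_factor(secs, CLIP_S)
        hi = real_time_factor(secs * (1 + GPU_SPREAD), CLIP_S)
        print(f"{label}: RTF {mid:.2f} (range {lo:.2f}-{hi:.2f})")
    # base: RTF 0.19 (range 0.17-0.21)
    # large-v3-turbo: RTF 1.53 (range 1.37-1.68)
```

So the base model keeps well ahead of speech, while large-v3-turbo takes roughly one and a half times the clip's length, which fits its role as an opt-in max-accuracy tier.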
Models aren't bundled in the APK — they download once (encrypted, ~30 MB/s including decryption) and live on the device. After that, airplane mode is a supported configuration: that's the whole point. The privacy property isn't a policy promise — there is simply no server for your conversations to reach.
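The download-once behavior can be sketched as a check-then-fetch helper. `fetch` is a hypothetical callback standing in for the app's downloader (which, per the text, also decrypts). The write-to-temp-then-rename step is an assumption of this sketch, chosen so an interrupted download is never mistaken for a complete model. Once the file exists, the function returns without any network access, which is what lets airplane mode work.

```python
import tempfile
from pathlib import Path
from typing import Callable

Fetcher = Callable[[Path], None]  # hypothetical: downloads (and decrypts) into the given path


def ensure_model(path: Path, fetch: Fetcher) -> Path:
    """Return the local model file, fetching it only if it is not on disk yet."""
    if path.exists():
        return path  # already installed: no network access at all
    path.parent.mkdir(parents=True, exist_ok=True)
    partial = path.with_name(path.name + ".part")
    fetch(partial)         # write to a temporary name first...
    partial.replace(path)  # ...then rename, so a half-finished file is never loaded
    return path


if __name__ == "__main__":
    calls = []

    def fake_fetch(dest: Path) -> None:
        calls.append(dest)
        dest.write_bytes(b"model weights")

    with tempfile.TemporaryDirectory() as root:
        model = Path(root) / "models" / "translation.bin"
        ensure_model(model, fake_fetch)
        ensure_model(model, fake_fetch)  # second call is served from disk
        print(len(calls))  # 1
```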
Try it free — unlimited text translation + 10 spoken translations — then a one-time unlock from $2.99. No subscription.