Table of Contents
- Why an Agent, Not a Translation API
- The 23-Tool Architecture
- Reading Tools: Full Autonomy
- Writing Tools: Human-in-the-Loop Approval
- Human-in-the-Loop: Engineering the Approval Flow
- Why This Matters for AI Translation Quality
- Progressive Rendering: Streaming Translation Tables
- Context Management: Staying Grounded
- 30-Second Project Context Cache
- Context Stripping (slimToolResults)
- 50-Step Conversation Limit
- Chat History: Dual-Storage Architecture
- Real-World Workflow: Adding Korean to a Production App
- What We Chose Not to Build
- Try It
Most AI translation tools follow the same pattern: paste text in, get translation out. That works for a quick email, but it falls apart when you are managing thousands of translation keys across a production application with multiple languages, a glossary of brand terms, and a team that needs to review changes before they ship.
We built Better i18n's AI system differently. Instead of a simple translation API, we created a conversational agent with 23 specialized tools, human-in-the-loop approval for every write operation, and an architecture designed for real-world translation management workflows.
This post walks through the engineering behind it.
Why an Agent, Not a Translation API
The typical machine translation workflow looks like this: export strings, send them to a translation service, get results back, import them, manually review, fix issues, re-export. It is slow and error-prone, and it disconnects the translation process from the context that makes translations accurate.
An agent-based approach changes the model fundamentally. Instead of operating on files, the AI operates on your project directly — reading your keys, understanding your glossary, checking your sync configuration, and proposing changes that you approve in real time.
The key insight is that translation management is not a single task. It is a workflow that involves reading project state, making decisions, executing changes, and verifying results. An agent with multiple tools handles this naturally. A translation API does not.
The 23-Tool Architecture
The agent has access to 23 purpose-built tools, split into two categories with very different permission models.
Reading Tools: Full Autonomy
Ten tools give the agent read access to your project. These execute automatically without requiring approval, because they cannot modify data:
- getTranslations — fetches translations with filtering by key, namespace, language, and status
- getKeyDetails — retrieves metadata for individual keys including context notes, tags, and per-language status
- getLanguages — lists configured languages with completion percentages
- getProjectStats — returns project-wide metrics: total keys, languages, translation coverage
- getDoctorReport — runs diagnostics to identify missing translations, unused keys, plural form issues, and terminology inconsistencies
- getSyncs and getSyncDetails — inspects GitHub/GitLab sync integrations and their recent activity
- getContentModels and getContentEntries — browses CMS content structure and entries
- createPlan — generates an execution plan when the agent needs to coordinate multiple steps
The agent uses these tools to build context before proposing any changes. When you ask "translate all missing keys to French," the agent first calls getTranslations to identify exactly which keys are missing, then calls getProjectStats to understand the scope, before generating a single targeted proposal.
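The split between auto-executing read tools and approval-gated write tools can be sketched as a small tool registry. This is an illustrative sketch, not Better i18n's actual implementation — the `ToolDef` shape, `registerTool`, and `canAutoExecute` names are assumptions:

```typescript
// Hypothetical tool registry: read-only tools are marked auto-executable,
// write tools require human approval. Names are illustrative.
type ToolPermission = "auto" | "requires-approval";

interface ToolDef {
  name: string;
  permission: ToolPermission;
  run: (args: Record<string, unknown>) => Promise<unknown>;
}

const registry = new Map<string, ToolDef>();

function registerTool(def: ToolDef): void {
  registry.set(def.name, def);
}

// Read tools execute without a human in the loop.
registerTool({
  name: "getTranslations",
  permission: "auto",
  run: async (args) => ({ keys: [], filteredBy: args }),
});

// Write tools are gated: calling them yields a proposal, not a mutation.
registerTool({
  name: "proposeTranslations",
  permission: "requires-approval",
  run: async (args) => ({ proposal: args }),
});

function canAutoExecute(toolName: string): boolean {
  return registry.get(toolName)?.permission === "auto";
}
```

The useful property of this shape is that the permission model lives in the registry, not in each tool's logic, so adding a new tool cannot accidentally bypass the approval gate.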
Writing Tools: Human-in-the-Loop Approval
Eleven tools can modify your project data. Every single one of them requires explicit human approval before execution. This is the core of our approach to AI translation quality — the AI proposes, the human decides.
Translation tools:
- proposeTranslations — generates new translations for keys that are missing in target languages
- proposeTranslationEdits — suggests improvements to existing translations based on context, glossary, or your feedback
- translateBatch — processes multiple keys across multiple languages in a single operation
Key management tools:
- proposeKeys — suggests new translation keys based on codebase analysis
- proposeDeleteKeys — identifies unused or duplicate keys and proposes cleanup
Language management tools:
- proposeLanguages — recommends new languages to add based on project needs
- proposeLanguageEdits — modifies language display names, fallback chains, or configuration
Publishing tools:
- publishChanges — pushes approved translations to the CDN or triggers a GitHub PR
Content management tools:
- proposeContentEntries — creates or updates CMS content entries
- proposeContentModel — suggests schema changes to content models
- proposePublishEntries — queues content entries for publishing
Human-in-the-Loop: Engineering the Approval Flow
The term "human-in-the-loop" gets thrown around a lot in AI marketing. Here is how it actually works in our system.
When a writing tool is called, the agent does not execute it directly. Instead, it generates a proposal — a structured diff showing exactly what will change. The proposal appears in the chat interface as a reviewable artifact.
For translation proposals, this means you see:
- The source string in your base language
- The proposed translation in the target language
- Any glossary terms that were applied
- The confidence context (is this a simple UI label or a complex marketing sentence?)
You then have three options:
- Approve all — accept every proposed change in one click
- Selective approval — accept some translations and reject others
- Request changes — tell the agent what to fix and it generates a revised proposal
Only after approval does the write operation execute. This is not a "confirm/cancel" dialog — it is a genuine review step where you can inspect, edit, and iterate.
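The three review outcomes can be modeled as a small reducer over a proposal. A minimal sketch, assuming illustrative field names rather than the real API:

```typescript
// Sketch of the review step: a proposal plus a reviewer action yields the
// set of translations to apply (and optional feedback for a revision).
interface ProposedTranslation {
  key: string;
  language: string;
  text: string;
}

type ReviewAction =
  | { kind: "approve-all" }
  | { kind: "selective"; approvedKeys: string[] }
  | { kind: "request-changes"; feedback: string };

interface ReviewResult {
  toApply: ProposedTranslation[];
  feedback?: string; // sent back to the agent for a revised proposal
}

function review(
  proposal: ProposedTranslation[],
  action: ReviewAction,
): ReviewResult {
  switch (action.kind) {
    case "approve-all":
      return { toApply: proposal };
    case "selective":
      return {
        toApply: proposal.filter((p) => action.approvedKeys.includes(p.key)),
      };
    case "request-changes":
      // Nothing is written; the feedback drives a new proposal.
      return { toApply: [], feedback: action.feedback };
  }
}
```

Note that "request changes" applies nothing: the write operation only ever runs over the `toApply` set, so an unreviewed proposal can never reach the project.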
Why This Matters for AI Translation Quality
Machine translation review is the bottleneck in most localization workflows. Teams either skip review (and ship errors) or review everything manually (and move slowly). Our human-in-the-loop approach occupies the middle ground:
- The AI handles the 80% of translations that are straightforward
- Humans focus their review effort on the 20% that require judgment
- Every translation has a clear provenance: AI-generated, human-reviewed, or human-edited
- The audit trail records who approved what, making compliance straightforward
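A provenance record like the one described above might look like the following sketch; the field names are assumptions for illustration:

```typescript
// Illustrative provenance model: every translation carries how it was
// produced and who approved it.
type Provenance = "ai-generated" | "human-reviewed" | "human-edited";

interface AuditRecord {
  key: string;
  language: string;
  provenance: Provenance;
  approvedBy?: string;
  approvedAt?: string; // ISO 8601 timestamp
}

// Approval upgrades provenance and stamps the reviewer, without mutating
// the original record (the audit trail stays append-only).
function markReviewed(
  rec: AuditRecord,
  userId: string,
  when: string,
): AuditRecord {
  return { ...rec, provenance: "human-reviewed", approvedBy: userId, approvedAt: when };
}
```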
Progressive Rendering: Streaming Translation Tables
When the agent generates translations for a batch of keys, the results do not appear all at once. The translation table streams into the chat interface progressively — each row renders as its translation completes.
This is an engineering choice driven by user experience. When you are translating 150 keys across 6 languages, that is 900 individual translations. Waiting for all 900 to complete before showing anything would mean staring at a loading spinner for minutes. Progressive rendering lets you start reviewing the first results immediately.
The implementation uses server-sent events to stream tool results back to the chat interface. The frontend maintains a mutable translation table component that appends rows as they arrive.
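In essence, the frontend parses each server-sent event as it arrives and appends a row. A minimal sketch of that idea, assuming one JSON-encoded row per `data:` event (the real event shape is not public):

```typescript
// Progressive rendering sketch: parse SSE text into rows and append each
// one to a mutable table as it arrives.
interface Row {
  key: string;
  language: string;
  text: string;
}

function parseSseChunk(chunk: string): Row[] {
  // Each SSE event is "data: <json>" terminated by a blank line.
  return chunk
    .split("\n\n")
    .filter((event) => event.startsWith("data: "))
    .map((event) => JSON.parse(event.slice("data: ".length)) as Row);
}

class TranslationTable {
  rows: Row[] = [];
  append(rows: Row[]): void {
    this.rows.push(...rows); // each row renders as soon as it is appended
  }
}
```

With 900 pending translations, the first rows become reviewable after the first event rather than after the last one.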
Context Management: Staying Grounded
Large language models have a tendency to lose context in long conversations. We address this with three mechanisms:
30-Second Project Context Cache
When the agent reads your project data, the results are cached for 30 seconds. If the agent needs to reference your project state multiple times within a multi-step operation, it hits the cache instead of making redundant API calls. This reduces latency and prevents the agent from seeing inconsistent state during a complex workflow.
Context Stripping (slimToolResults)
Tool responses from the Better i18n API can be large — a project with 2,000 keys and 12 languages generates substantial payloads. The slimToolResults system automatically strips non-essential data from tool responses before they enter the conversation context.
For example, when the agent calls getTranslations, the full response includes metadata like creation timestamps, version IDs, and user attribution. The slimToolResults pass retains only the data the agent needs: key names, source strings, and translations. This reduces token usage significantly and prevents context window overflow.
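Conceptually, the pass is an allow-list projection over each tool result. A sketch under that assumption — the exact field names kept and dropped are illustrative:

```typescript
// Context-stripping sketch: keep only the fields the agent needs before a
// tool result enters the conversation context.
interface FullTranslation {
  key: string;
  source: string;
  translation: string;
  createdAt?: string; // stripped: not needed for translation decisions
  versionId?: string; // stripped
  updatedBy?: string; // stripped
}

type SlimTranslation = Pick<FullTranslation, "key" | "source" | "translation">;

function slimToolResults(results: FullTranslation[]): SlimTranslation[] {
  // Destructuring acts as the allow-list: anything not named is dropped.
  return results.map(({ key, source, translation }) => ({ key, source, translation }));
}
```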
50-Step Conversation Limit
Each conversation supports up to 50 agent steps (tool calls). This is enough for complex workflows — translating an entire namespace, reviewing the results, making edits, and publishing — while preventing runaway loops. The step counter is visible in the UI so you always know how much capacity remains.
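The guard itself is simple: a counter checked before each tool call. A hypothetical sketch of that loop shape (the surrounding agent code is an assumption; only the limit of 50 comes from the system):

```typescript
// Step-budget sketch: each tool call consumes one step, and the loop stops
// cleanly when the budget is exhausted instead of running away.
const MAX_STEPS = 50;

function runAgentLoop(
  steps: Array<() => void>,
): { executed: number; truncated: boolean } {
  let executed = 0;
  for (const step of steps) {
    if (executed >= MAX_STEPS) return { executed, truncated: true };
    step();
    executed++;
  }
  return { executed, truncated: false };
}
```

Surfacing `executed` in the UI is what makes the remaining capacity visible to the user.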
Chat History: Dual-Storage Architecture
Agent conversations are stored in two places simultaneously:
- IndexedDB (browser-local) — provides instant conversation loading with zero network latency when you return to the dashboard
- Postgres (server-side) — maintains a persistent, searchable audit trail of every agent interaction
The dual-storage approach solves two competing requirements. Developers want instant access to recent conversations (IndexedDB delivers sub-millisecond reads). Teams need audit trails for compliance and knowledge sharing (Postgres provides durable, queryable storage).
When you open the AI chat, conversations load from IndexedDB immediately. The Postgres copy syncs in the background and serves as the source of truth if local storage is cleared.
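The read path above can be sketched as "local first, reconcile in the background." The `Store` interface is an illustrative stand-in for IndexedDB on one side and the Postgres-backed API on the other; none of these names are the real implementation:

```typescript
// Dual-storage read sketch: return the local copy immediately, then sync
// against the server copy (the source of truth) in the background.
interface Conversation {
  id: string;
  updatedAt: number;
  messages: string[];
}

interface Store {
  load(id: string): Promise<Conversation | undefined>;
  save(convo: Conversation): Promise<void>;
}

async function loadConversation(
  id: string,
  local: Store,  // IndexedDB in the browser
  remote: Store, // Postgres behind the API
): Promise<Conversation | undefined> {
  const cached = await local.load(id);
  if (cached) {
    // Background reconciliation: a newer server copy replaces the local one.
    void remote.load(id).then((fresh) => {
      if (fresh && fresh.updatedAt > cached.updatedAt) return local.save(fresh);
    });
    return cached;
  }
  // Cold start: local storage was cleared, fall back to the server.
  return remote.load(id);
}
```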
Real-World Workflow: Adding Korean to a Production App
Here is a concrete example of how the agent handles a real task.
Step 1: You ask — "I need to add Korean to the project. Translate everything in the common and settings namespaces."
Step 2: Agent reads — It calls getLanguages (sees Korean is not configured), getTranslations for the common namespace (finds 89 keys), and getTranslations for the settings namespace (finds 34 keys). Total: 123 keys to translate.
Step 3: Agent proposes language addition — It calls proposeLanguages to add Korean (ko) to the project. You see the proposal and approve it.
Step 4: Agent translates in batches — It calls translateBatch for the common namespace, then the settings namespace. Translations stream into the chat progressively. You see the Korean translations appearing alongside the English source strings.
Step 5: You review — You scan the translations, flag two that use overly formal register for a casual app UI, and tell the agent to adjust them.
Step 6: Agent revises — It calls proposeTranslationEdits with your feedback and generates revised translations for the two flagged strings. You approve.
Step 7: You publish — You tell the agent to publish, it calls publishChanges, and the Korean translations are live on the CDN.
Total time: about 10 minutes for 123 translations, reviewed and published. Without the agent, this workflow typically takes hours of export-translate-import-review cycles.
What We Chose Not to Build
Transparency about limitations matters as much as feature documentation.
- No proprietary translation engine — we use Google Gemini as the underlying model. We do not claim a custom "neural translation engine" or proprietary AI.
- No automated A/B testing of translations — you pick the model; there is no framework comparing outputs from multiple models.
- No Translation Memory — we use glossary-based term consistency, not TM fuzzy matching. If you need TM, Better i18n is not the right tool today.
- No guaranteed accuracy metrics — AI translation quality varies by language pair and content type. We recommend human review for all customer-facing content, which is exactly why HITL is built into every write operation.
Try It
The AI agent is available on all Better i18n plans. Open the dashboard, click the chat icon, and start with something simple: "Show me the translation status for this project."
From there, try a real task. Ask it to find missing translations, generate them, and walk you through the approval process. The agent is designed to be explored conversationally — you do not need to memorize tool names or API endpoints.