Table of Contents
- Why an Agent, Not a Translation API
- The 23-Tool Architecture
- Reading Tools: Full Autonomy
- Writing Tools: Human-in-the-Loop Approval
- Human-in-the-Loop: Engineering the Approval Flow
- Why This Matters for AI Translation Quality
- Progressive Rendering: Streaming Translation Tables
- Context Management: Staying Grounded
- 30-Second Project Context Cache
- Context Stripping (slimToolResults)
- 50-Step Conversation Limit
- Chat History: Dual-Storage Architecture
- Real-World Workflow: Adding Korean to a Production App
- What We Chose Not to Build
- Try It
Most AI translation tools follow the same pattern: paste text in, get translation out. That works for a quick email, but it falls apart when you are managing thousands of translation keys across a production application with multiple languages, a glossary of brand terms, and a team that needs to review changes before they ship.
We built Better i18n's AI system differently. Instead of a simple translation API, we created a conversational agent with 23 specialized tools, human-in-the-loop approval for every write operation, and an architecture designed for real-world translation management workflows.
This post walks through the engineering behind it.
Why an Agent, Not a Translation API
The typical machine translation workflow looks like this: export strings, send them to a translation service, get results back, import them, manually review, fix issues, re-export. It is slow and error-prone, and it disconnects the translation process from the context that makes translations accurate.
An agent-based approach changes the model fundamentally. Instead of operating on files, the AI operates on your project directly — reading your keys, understanding your glossary, checking your sync configuration, and proposing changes that you approve in real time.
The key insight is that translation management is not a single task. It is a workflow that involves reading project state, making decisions, executing changes, and verifying results. An agent with multiple tools handles this naturally. A translation API does not.
The 23-Tool Architecture
The agent has access to 23 purpose-built tools, split into two categories with very different permission models.
Reading Tools: Full Autonomy
Ten tools give the agent read access to your project. These execute automatically without requiring approval, because they cannot modify data:
- getTranslations — fetches translations with filtering by key, namespace, language, and status
- getKeyDetails — retrieves metadata for individual keys including context notes, tags, and per-language status
- getLanguages — lists configured languages with completion percentages
- getProjectStats — returns project-wide metrics: total keys, languages, translation coverage
- getDoctorReport — runs diagnostics to identify missing translations, unused keys, plural form issues, and terminology inconsistencies
- getSyncs and getSyncDetails — inspects GitHub/GitLab sync integrations and their recent activity
- getContentModels and getContentEntries — browses CMS content structure and entries
- createPlan — generates an execution plan when the agent needs to coordinate multiple steps
The agent uses these tools to build context before proposing any changes. When you ask "translate all missing keys to French," the agent first calls getTranslations to identify exactly which keys are missing, then calls getProjectStats to understand the scope, before generating a single targeted proposal.
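The split between auto-executing read tools and approval-gated write tools can be sketched as a small tool registry. This is an illustrative sketch, not Better i18n's actual implementation — the `ToolDef` shape, `registerTool`, and `canAutoExecute` names are assumptions:

```typescript
// Hypothetical tool registry: read-only tools are marked auto-executable,
// write tools require human approval. Names are illustrative.
type ToolPermission = "auto" | "requires-approval";

interface ToolDef {
  name: string;
  permission: ToolPermission;
  run: (args: Record<string, unknown>) => Promise<unknown>;
}

const registry = new Map<string, ToolDef>();

function registerTool(def: ToolDef): void {
  registry.set(def.name, def);
}

// Read tools execute without a human in the loop.
registerTool({
  name: "getTranslations",
  permission: "auto",
  run: async (args) => ({ keys: [], filteredBy: args }),
});

// Write tools are gated: calling them yields a proposal, not a mutation.
registerTool({
  name: "proposeTranslations",
  permission: "requires-approval",
  run: async (args) => ({ proposal: args }),
});

function canAutoExecute(toolName: string): boolean {
  return registry.get(toolName)?.permission === "auto";
}
```

The useful property of this shape is that the permission model lives in the registry, not in each tool's logic, so adding a new tool cannot accidentally bypass the approval gate.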
Writing Tools: Human-in-the-Loop Approval
Eleven tools can modify your project data. Every single one of them requires explicit human approval before execution. This is the core of our approach to AI translation quality — the AI proposes, the human decides.
Translation tools:
- proposeTranslations — generates new translations for keys that are missing in target languages
- proposeTranslationEdits — suggests improvements to existing translations based on context, glossary, or your feedback
- translateBatch — processes multiple keys across multiple languages in a single operation
Key management tools:
- proposeKeys — suggests new translation keys based on codebase analysis
- proposeDeleteKeys — identifies unused or duplicate keys and proposes cleanup
Language management tools:
- proposeLanguages — recommends new languages to add based on project needs
- proposeLanguageEdits — modifies language display names, fallback chains, or configuration
Publishing tools:
- publishChanges — pushes approved translations to the CDN or triggers a GitHub PR
Content management tools:
- proposeContentEntries — creates or updates CMS content entries
- proposeContentModel — suggests schema changes to content models
- proposePublishEntries — queues content entries for publishing
Human-in-the-Loop: Engineering the Approval Flow
The term "human-in-the-loop" gets thrown around a lot in AI marketing. Here is how it actually works in our system.
When a writing tool is called, the agent does not execute it directly. Instead, it generates a proposal — a structured diff showing exactly what will change. The proposal appears in the chat interface as a reviewable artifact.
For translation proposals, this means you see:
- The source string in your base language
- The proposed translation in the target language
- Any glossary terms that were applied
- The confidence context (is this a simple UI label or a complex marketing sentence?)
You then have three options:
- Approve all — accept every proposed change in one click
- Selective approval — accept some translations and reject others
- Request changes — tell the agent what to fix and it generates a revised proposal
Only after approval does the write operation execute. This is not a "confirm/cancel" dialog — it is a genuine review step where you can inspect, edit, and iterate.
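The three review outcomes can be modeled as a small reducer over a proposal. A minimal sketch, assuming illustrative field names rather than the real API:

```typescript
// Sketch of the review step: a proposal plus a reviewer action yields the
// set of translations to apply (and optional feedback for a revision).
interface ProposedTranslation {
  key: string;
  language: string;
  text: string;
}

type ReviewAction =
  | { kind: "approve-all" }
  | { kind: "selective"; approvedKeys: string[] }
  | { kind: "request-changes"; feedback: string };

interface ReviewResult {
  toApply: ProposedTranslation[];
  feedback?: string; // sent back to the agent for a revised proposal
}

function review(
  proposal: ProposedTranslation[],
  action: ReviewAction,
): ReviewResult {
  switch (action.kind) {
    case "approve-all":
      return { toApply: proposal };
    case "selective":
      return {
        toApply: proposal.filter((p) => action.approvedKeys.includes(p.key)),
      };
    case "request-changes":
      // Nothing is written; the feedback drives a new proposal.
      return { toApply: [], feedback: action.feedback };
  }
}
```

Note that "request changes" applies nothing: the write operation only ever runs over the `toApply` set, so an unreviewed proposal can never reach the project.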
Why This Matters for AI Translation Quality
Machine translation review is the bottleneck in most localization workflows. Teams either skip review (and ship errors) or review everything manually (and move slowly). Our human-in-the-loop approach occupies the middle ground:
- The AI handles the 80% of translations that are straightforward
- Humans focus their review effort on the 20% that require judgment
- Every translation has a clear provenance: AI-generated, human-reviewed, or human-edited
- The audit trail records who approved what, making compliance straightforward
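A provenance record like the one described above might look like the following sketch; the field names are assumptions for illustration:

```typescript
// Illustrative provenance model: every translation carries how it was
// produced and who approved it.
type Provenance = "ai-generated" | "human-reviewed" | "human-edited";

interface AuditRecord {
  key: string;
  language: string;
  provenance: Provenance;
  approvedBy?: string;
  approvedAt?: string; // ISO 8601 timestamp
}

// Approval upgrades provenance and stamps the reviewer, without mutating
// the original record (the audit trail stays append-only).
function markReviewed(
  rec: AuditRecord,
  userId: string,
  when: string,
): AuditRecord {
  return { ...rec, provenance: "human-reviewed", approvedBy: userId, approvedAt: when };
}
```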
Progressive Rendering: Streaming Translation Tables
When the agent generates translations for a batch of keys, the results do not appear all at once. The translation table streams into the chat interface progressively — each row renders as its translation completes.
This is an engineering choice driven by user experience. When you are translating 150 keys across 6 languages, that is 900 individual translations. Waiting for all 900 to complete before showing anything would mean staring at a loading spinner for minutes. Progressive rendering lets you start reviewing the first results immediately.
The implementation uses server-sent events to stream tool results back to the chat interface. The frontend maintains a mutable translation table component that appends rows as they arrive.
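In essence, the frontend parses each server-sent event as it arrives and appends a row. A minimal sketch of that idea, assuming one JSON-encoded row per `data:` event (the real event shape is not public):

```typescript
// Progressive rendering sketch: parse SSE text into rows and append each
// one to a mutable table as it arrives.
interface Row {
  key: string;
  language: string;
  text: string;
}

function parseSseChunk(chunk: string): Row[] {
  // Each SSE event is "data: <json>" terminated by a blank line.
  return chunk
    .split("\n\n")
    .filter((event) => event.startsWith("data: "))
    .map((event) => JSON.parse(event.slice("data: ".length)) as Row);
}

class TranslationTable {
  rows: Row[] = [];
  append(rows: Row[]): void {
    this.rows.push(...rows); // each row renders as soon as it is appended
  }
}
```

With 900 pending translations, the first rows become reviewable after the first event rather than after the last one.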
Context Management: Staying Grounded
Large language models have a tendency to lose context in long conversations. We address this with three mechanisms:
30-Second Project Context Cache
When the agent reads your project data, the results are cached for 30 seconds. If the agent needs to reference your project state multiple times within a multi-step operation, it hits the cache instead of making redundant API calls. This reduces latency and prevents the agent from seeing inconsistent state during a complex workflow.
Context Stripping (slimToolResults)
Tool responses from the Better i18n API can be large — a project with 2,000 keys and 12 languages generates substantial payloads. The slimToolResults system automatically strips non-essential data from tool responses before they enter the conversation context.
For example, when the agent calls getTranslations, the full response includes metadata like creation timestamps, version IDs, and user attribution. The slimToolResults pass retains only the data the agent needs: key names, source strings, and translations. This reduces token usage significantly and prevents context window overflow.
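Conceptually, the pass is an allow-list projection over each tool result. A sketch under that assumption — the exact field names kept and dropped are illustrative:

```typescript
// Context-stripping sketch: keep only the fields the agent needs before a
// tool result enters the conversation context.
interface FullTranslation {
  key: string;
  source: string;
  translation: string;
  createdAt?: string; // stripped: not needed for translation decisions
  versionId?: string; // stripped
  updatedBy?: string; // stripped
}

type SlimTranslation = Pick<FullTranslation, "key" | "source" | "translation">;

function slimToolResults(results: FullTranslation[]): SlimTranslation[] {
  // Destructuring acts as the allow-list: anything not named is dropped.
  return results.map(({ key, source, translation }) => ({ key, source, translation }));
}
```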
50-Step Conversation Limit
Each conversation supports up to 50 agent steps (tool calls). This is enough for complex workflows — translating an entire namespace, reviewing the results, making edits, and publishing — while preventing runaway loops. The step counter is visible in the UI so you always know how much capacity remains.
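The guard itself is simple: a counter checked before each tool call. A hypothetical sketch of that loop shape (the surrounding agent code is an assumption; only the limit of 50 comes from the system):

```typescript
// Step-budget sketch: each tool call consumes one step, and the loop stops
// cleanly when the budget is exhausted instead of running away.
const MAX_STEPS = 50;

function runAgentLoop(
  steps: Array<() => void>,
): { executed: number; truncated: boolean } {
  let executed = 0;
  for (const step of steps) {
    if (executed >= MAX_STEPS) return { executed, truncated: true };
    step();
    executed++;
  }
  return { executed, truncated: false };
}
```

Surfacing `executed` in the UI is what makes the remaining capacity visible to the user.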
Chat History: Dual-Storage Architecture
Agent conversations are stored in two places simultaneously:
- IndexedDB (browser-local) — provides instant conversation loading with zero network latency when you return to the dashboard
- Postgres (server-side) — maintains a persistent, searchable audit trail of every agent interaction
The dual-storage approach solves two competing requirements. Developers want instant access to recent conversations (IndexedDB delivers sub-millisecond reads). Teams need audit trails for compliance and knowledge sharing (Postgres provides durable, queryable storage).
When you open the AI chat, conversations load from IndexedDB immediately. The Postgres copy syncs in the background and serves as the source of truth if local storage is cleared.
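The read path above can be sketched as "local first, reconcile in the background." The `Store` interface is an illustrative stand-in for IndexedDB on one side and the Postgres-backed API on the other; none of these names are the real implementation:

```typescript
// Dual-storage read sketch: return the local copy immediately, then sync
// against the server copy (the source of truth) in the background.
interface Conversation {
  id: string;
  updatedAt: number;
  messages: string[];
}

interface Store {
  load(id: string): Promise<Conversation | undefined>;
  save(convo: Conversation): Promise<void>;
}

async function loadConversation(
  id: string,
  local: Store,  // IndexedDB in the browser
  remote: Store, // Postgres behind the API
): Promise<Conversation | undefined> {
  const cached = await local.load(id);
  if (cached) {
    // Background reconciliation: a newer server copy replaces the local one.
    void remote.load(id).then((fresh) => {
      if (fresh && fresh.updatedAt > cached.updatedAt) return local.save(fresh);
    });
    return cached;
  }
  // Cold start: local storage was cleared, fall back to the server.
  return remote.load(id);
}
```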
Real-World Workflow: Adding Korean to a Production App
Here is a concrete example of how the agent handles a real task.
Step 1: You ask — "I need to add Korean to the project. Translate everything in the common and settings namespaces."
Step 2: Agent reads — It calls getLanguages (sees Korean is not configured), getTranslations for the common namespace (finds 89 keys), and getTranslations for the settings namespace (finds 34 keys). Total: 123 keys to translate.
Step 3: Agent proposes language addition — It calls proposeLanguages to add Korean (ko) to the project. You see the proposal and approve it.
Step 4: Agent translates in batches — It calls translateBatch for the common namespace, then the settings namespace. Translations stream into the chat progressively. You see the Korean translations appearing alongside the English source strings.
Step 5: You review — You scan the translations, flag two that use overly formal register for a casual app UI, and tell the agent to adjust them.
Step 6: Agent revises — It calls proposeTranslationEdits with your feedback and generates revised translations for the two flagged strings. You approve.
Step 7: You publish — You tell the agent to publish, it calls publishChanges, and the Korean translations are live on the CDN.
Total time: about 10 minutes for 123 translations, reviewed and published. Without the agent, this workflow typically takes hours of export-translate-import-review cycles.
What We Chose Not to Build
Transparency about limitations matters as much as feature documentation.
- No proprietary translation engine — we use Google Gemini as the underlying model. We do not claim a custom "neural translation engine" or proprietary AI.
- No automated A/B testing of translations — you pick the model; there is no framework comparing outputs from multiple models.
- No Translation Memory — we use glossary-based term consistency, not TM fuzzy matching. If you need TM, Better i18n is not the right tool today.
- No guaranteed accuracy metrics — AI translation quality varies by language pair and content type. We recommend human review for all customer-facing content, which is exactly why HITL is built into every write operation.
Try It
The AI agent is available on all Better i18n plans. Open the dashboard, click the chat icon, and start with something simple: "Show me the translation status for this project."
From there, try a real task. Ask it to find missing translations, generate them, and walk you through the approval process. The agent is designed to be explored conversationally — you do not need to memorize tool names or API endpoints.