Machine Translation Post-Editing (MTPE): Best Practices for Quality

Eray Gündoğmuş · 13 min read

Machine translation post-editing (MTPE) has moved from an experimental workflow to mainstream practice in the localization industry. As machine translation quality has improved—particularly with neural MT systems from DeepL, Google, Microsoft, and specialized engines—the question is no longer "should we use MT?" but "how do we use it effectively?"

MTPE is the practice of using machine-generated translation as a starting point, then having human translators review and correct the output to reach the required quality level. Done well, it can reduce translation costs by 30-50% and cut turnaround times significantly. Done poorly, it produces inconsistent, error-prone content that damages your brand and user trust.

This guide covers the full MTPE workflow—from MT engine selection through quality measurement and process optimization.

The MTPE Quality Spectrum

Not all MTPE is the same. The industry distinguishes between two main approaches:

Light Post-Editing (LPE)

Light post-editing aims for "good enough" quality—output that is accurate and comprehensible, but not necessarily polished prose. The goal is correctness, not style.

LPE guidelines typically instruct editors to:

  • Fix errors that change meaning or cause misunderstanding
  • Correct grammar errors that impede comprehension
  • NOT improve style, restructure sentences, or apply brand voice
  • NOT fix minor awkwardness if meaning is clear

LPE is appropriate for: internal documentation, large-scale content with short shelf life, gist translation for information access.

Full Post-Editing (FPE)

Full post-editing aims for translation quality equivalent to conventional human translation. The output should be indistinguishable from a translation that was done from scratch.

FPE guidelines instruct editors to:

  • Fix all errors (accuracy, grammar, style, terminology)
  • Apply brand voice and tone guidelines
  • Ensure terminology matches the approved glossary
  • Restructure sentences where the MT output is awkward or unnatural

FPE is appropriate for: customer-facing content, marketing, legal documents, product UI, high-visibility content.

Selecting the Right MT Engine

MT engine selection significantly impacts post-editing effort and overall quality. Key considerations:

General-Purpose vs. Domain-Specific Engines

General-purpose engines (Google Translate, DeepL, Microsoft Translator) perform well across a broad range of content types. They're a reasonable starting point for most localization programs.

Domain-specific or custom-trained engines can outperform general-purpose engines significantly for specialized content:

  • Legal: Legal-specific MT engines trained on case law and legal documents
  • Medical: Medical MT engines trained on clinical literature and drug information
  • Technical: Engines trained on product documentation and technical manuals

Custom engine training typically requires 500,000+ high-quality sentence pairs in the domain.

Language Pair Quality Varies Dramatically

MT quality is not uniform across language pairs. "High-resource" language pairs with abundant training data (English↔French, English↔German, English↔Spanish, English↔Chinese) perform far better than "low-resource" pairs (English↔Nepali, English↔Zulu, English↔Icelandic).

Before committing to an MT+MTPE workflow for a language pair, run a pilot:

  1. Translate 1,000-2,000 words of representative content
  2. Have a qualified translator evaluate the raw MT output
  3. Measure estimated post-editing effort compared to translating from scratch
  4. Calculate cost-effectiveness based on actual post-editing time (a minimal cost-model sketch follows this list)
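
Here is a minimal cost-model sketch for step 4. The function name, rates, and times are illustrative placeholders, not benchmarks; substitute your own pilot measurements.

```python
def pilot_cost_model(words: int, mtpe_hours: float, ht_hours: float,
                     editor_rate_per_hour: float, mt_cost_per_word: float) -> dict:
    """Compare MT + post-editing against translation from scratch for a pilot batch."""
    mtpe_cost = mtpe_hours * editor_rate_per_hour + words * mt_cost_per_word
    ht_cost = ht_hours * editor_rate_per_hour
    return {
        "mtpe_words_per_hour": round(words / mtpe_hours),
        "ht_words_per_hour": round(words / ht_hours),
        "time_saved": (ht_hours - mtpe_hours) / ht_hours,   # fraction of hours saved
        "cost_saved": (ht_cost - mtpe_cost) / ht_cost,      # fraction of spend saved
    }

# Example: a 2,000-word pilot, 1.4 h of post-editing vs. 3.3 h from scratch.
print(pilot_cost_model(2000, mtpe_hours=1.4, ht_hours=3.3,
                       editor_rate_per_hour=45.0, mt_cost_per_word=0.00002))
```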

MT Evaluation Metrics

When evaluating MT engines, use a combination of:

BLEU score: Automated metric comparing MT output to human reference translations. Useful for comparing engines but not reliable as a standalone quality indicator.

COMET: A neural evaluation metric that correlates better with human judgments than BLEU. Increasingly preferred in the industry.

Human evaluation: Have a qualified linguist score a sample of MT output on a 1-5 scale for adequacy (is the meaning preserved?) and fluency (does it read naturally?).

Post-editing effort: The most operationally useful metric—measure the time it takes post-editors to bring MT output to required quality, compared to translating from scratch.
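
A small evaluation-harness sketch combining an automated metric with averaged human scores. It assumes the sacrebleu package (`pip install sacrebleu`); the segments and 1-5 scores are invented placeholders.

```python
import sacrebleu

mt_output = ["The cat sit on the mat.", "He go to school yesterday."]
references = ["The cat is sitting on the mat.", "He went to school yesterday."]

# Corpus-level BLEU: fine for ranking engines, weak as an absolute quality bar.
bleu = sacrebleu.corpus_bleu(mt_output, [references])
print(f"BLEU: {bleu.score:.1f}")

# COMET scoring works similarly via the separate unbabel-comet package.

# Human evaluation: average a linguist's 1-5 scores over the same sample.
adequacy = [4, 3]
fluency = [3, 2]
print(f"Adequacy: {sum(adequacy) / len(adequacy):.2f}")
print(f"Fluency:  {sum(fluency) / len(fluency):.2f}")
```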

Building the MTPE Workflow

Pre-Translation Preparation

Before MT is applied to your content:

Terminology integration: Feed your approved glossary into the MT engine (most engines support terminology glossaries). This reduces terminology errors significantly and speeds up post-editing.
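
As a concrete example, here is a sketch of glossary-aware translation using DeepL's Python client (`pip install deepl`). The key, glossary name, and entries are placeholders, and the client API can change between versions, so treat this as a shape rather than a reference.

```python
import deepl

translator = deepl.Translator("YOUR_AUTH_KEY")  # placeholder key

# Create (or reuse) a glossary built from your approved term base.
glossary = translator.create_glossary(
    "product-terms-en-de",
    source_lang="EN",
    target_lang="DE",
    entries={"dashboard": "Dashboard", "workspace": "Arbeitsbereich"},
)

result = translator.translate_text(
    "Open the dashboard from your workspace.",
    source_lang="EN",
    target_lang="DE",
    glossary=glossary,
)
print(result.text)
```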

Translation memory leverage: Apply your existing TM before MT. Exact matches and high-fuzzy matches from TM are cheaper and higher quality than MT output. Only send the remaining segments (new or low-match content) to the MT engine.
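
A routing sketch of this TM-first logic. The fuzzy score uses difflib as a stand-in for a real TM engine's match algorithm, and the 85% threshold and tiny in-memory TM are illustrative.

```python
from difflib import SequenceMatcher

tm = {  # tiny in-memory "translation memory": source -> target
    "Click Save to apply your changes.":
        "Klicken Sie auf Speichern, um Ihre Änderungen zu übernehmen.",
}

def best_tm_match(segment: str) -> tuple[float, str | None]:
    """Return the highest match score (0-100) and its stored translation."""
    best_score, best_target = 0.0, None
    for source, target in tm.items():
        score = SequenceMatcher(None, segment, source).ratio() * 100
        if score > best_score:
            best_score, best_target = score, target
    return best_score, best_target

def route(segment: str) -> str:
    score, target = best_tm_match(segment)
    if score == 100:
        return f"TM exact match: {target}"
    if score >= 85:                       # high fuzzy: cheaper to edit than MT
        return f"TM fuzzy ({score:.0f}%), edit: {target}"
    return "send to MT engine"            # new or low-match content only

print(route("Click Save to apply your changes."))
print(route("Click Cancel to discard your changes."))
```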

Source quality review: Clean up source content before MT processing. Typos, inconsistent terminology, and complex sentence structures all degrade MT output quality. See our guide on translation context and source quality.

Content filtering: Some content types produce poor MT output regardless of engine quality: highly creative content, puns, cultural references, idiomatic expressions. Flag these for human translation rather than MTPE.

Post-Editing Guidelines

Develop clear, written post-editing guidelines for each content type and quality tier (a machine-readable sketch follows this list). Guidelines should specify:

  • Error types to fix: Required fixes (accuracy errors, critical terminology) vs. optional improvements (style, flow)
  • What NOT to do: Common over-editing behaviors that slow down the workflow without improving quality
  • Terminology requirements: How to handle unknown terms, brand names, product names
  • Formatting rules: When to preserve MT formatting vs. adjust it
  • Escalation criteria: When to abandon MT output and translate from scratch
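
One way to keep those guidelines machine-readable, so a TMS or CAT plugin can surface the right tier to editors. The schema and all field values below are illustrative, not a standard.

```python
# Per-content-type post-editing guidelines as data; values are examples only.
POST_EDITING_GUIDELINES = {
    "support-articles": {
        "tier": "light",
        "fix": ["accuracy", "critical-terminology", "blocking-grammar"],
        "skip": ["style", "sentence-restructuring", "brand-voice"],
        "escalate_if": "segment needs a full rewrite to be accurate",
    },
    "marketing-pages": {
        "tier": "full",
        "fix": ["accuracy", "grammar", "style", "terminology", "brand-voice"],
        "skip": [],
        "escalate_if": "creative copy, slogans, or wordplay",
    },
}

print(POST_EDITING_GUIDELINES["support-articles"]["tier"])
```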

Quality Assurance Integration

MTPE requires systematic QA to catch patterns of MT errors:

Linguistic Quality Assurance (LQA): Sample-based review of post-edited content by a senior linguist. Identify categories of errors (terminology, grammar, style) and trace them back to MT engine weaknesses or post-editor gaps.

Automated QA tools: Tools like Xbench, Verifika, or built-in TMS QA check for the following (a minimal sketch of such checks appears after the list):

  • Terminology inconsistency against glossary
  • Untranslated segments
  • Formatting errors
  • Number and date format issues
  • Punctuation errors
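
A minimal sketch of three of those checks; dedicated tools cover far more. The glossary and segment pairs are placeholders.

```python
import re

GLOSSARY = {"workspace": "Arbeitsbereich"}  # source term -> required target term

def qa_check(source: str, target: str) -> list[str]:
    issues = []
    # Untranslated segment: target identical to source.
    if source.strip() == target.strip():
        issues.append("possibly untranslated segment")
    # Glossary consistency: approved target term must appear.
    for term, required in GLOSSARY.items():
        if term.lower() in source.lower() and required.lower() not in target.lower():
            issues.append(f"glossary: expected '{required}' for '{term}'")
    # Digit sequences should match between source and target.
    if sorted(re.findall(r"\d+", source)) != sorted(re.findall(r"\d+", target)):
        issues.append("number mismatch between source and target")
    return issues

print(qa_check("Delete 3 files from your workspace.",
               "Löschen Sie 3 Dateien aus Ihrem Arbeitsbereich."))  # -> []
print(qa_check("Delete 3 files from your workspace.",
               "Löschen Sie 5 Dateien aus Ihrem Workspace."))
```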

Post-editing effort tracking: Track time per segment type to identify content categories where MT is not providing productivity gains.

Training Post-Editors

Post-editing is a distinct skill from translation. Good translators don't automatically become good post-editors. The common failure mode is "over-editing"—applying the same effort as full human translation, negating the cost savings.

Effective post-editor training covers:

Understanding MT strengths and weaknesses: What types of errors are common for this engine/language pair? What types of errors does the engine handle well?

Working with the post-editing mindset: The goal is to fix errors efficiently, not to improve the output beyond the required quality level. Resist the urge to rewrite sentences that are adequate but not ideal.

Keyboard shortcuts and CAT tool efficiency: Post-editors work faster when they're fluent in their CAT tool. Shortcut keys for accepting segments, common replacements, and QA functions matter.

Time tracking and productivity targets: Post-editors should understand their productivity benchmarks and work toward them.

For a comparison of different translation approaches, see AI translation vs. human translation.

Measuring MTPE Productivity and Quality

Key Productivity Metrics

Words per hour (WPH): Average post-editing speed. Industry benchmarks:

  • Light post-editing: 2,000–3,000+ words/hour
  • Full post-editing: 1,200–2,000 words/hour
  • Human translation from scratch: 500–800 words/hour

Track WPH by content type, language pair, and MT engine to identify where MTPE is most effective.

Post-editing effort (PEE): The ratio of characters changed during post-editing to total characters, typically derived from the character-level edit distance between the raw MT output and the final post-edited segment. Calculated automatically by most CAT tools. Lower PEE = less editing = more effective MT.

Productivity gain: Compare WPH for MTPE vs. human translation for the same content type. If productivity gain is <30%, MT may not be providing sufficient value for that content type.
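
A sketch of all three metrics. The PEE here approximates character edit distance with difflib similarity, which is adequate for trend tracking; all numbers in the example are invented.

```python
from difflib import SequenceMatcher

def words_per_hour(words: int, minutes: float) -> float:
    return words / (minutes / 60)

def post_editing_effort(mt_output: str, post_edited: str) -> float:
    """Approximate fraction of characters changed during post-editing."""
    return 1 - SequenceMatcher(None, mt_output, post_edited).ratio()

def productivity_gain(mtpe_wph: float, ht_wph: float) -> float:
    """Relative speedup of MTPE over translation from scratch."""
    return (mtpe_wph - ht_wph) / ht_wph

mt = "He go to school every day."
pe = "He goes to school every day."
print(f"PEE:  {post_editing_effort(mt, pe):.2%}")
print(f"Gain: {productivity_gain(words_per_hour(1500, 60), 650):.0%}")
```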

Quality Metrics

Apply translation quality metrics systematically to MTPE output:

MQM (Multidimensional Quality Metrics) or LISA QA Model: Structured error taxonomies for classifying translation errors by type and severity. Apply to LQA samples from post-edited content.

Customer-facing feedback: Monitor user feedback, support tickets, and reviews mentioning translation quality. These are lagging indicators but reflect real-world quality perception.

A/B testing: For high-volume content, A/B test MT+MTPE output against human translation to measure conversion rate, engagement, or support ticket rates.

Common MTPE Mistakes to Avoid

Skipping pilot testing: Never roll out MTPE for a language pair or content type without a pilot that measures actual post-editing effort.

Using the same guidelines for all content types: Light PE guidelines for marketing content will produce inadequate results. Different content types need different workflows.

Not providing glossaries to the MT engine: Terminology errors are the most common and most damaging MT errors. Glossary integration is non-negotiable.

Ignoring post-editor feedback: Post-editors surface patterns of MT errors. Collect, analyze, and act on their feedback to improve the workflow.

Applying MTPE where it doesn't make economic sense: If post-editing effort is equivalent to human translation, the MT step adds cost without benefit. Identify these cases and route them to human translation directly.

Forgetting to update the translation memory: Post-edited segments must be added back to the TM. If they're not, you lose the learning from the post-editing effort.

Integrating MTPE Into Your Localization Pipeline

Modern MTPE workflows integrate with translation management systems and CI/CD pipelines:

  1. Source content pushed to TMS (manually or via i18n CI/CD automation)
  2. TMS applies TM leverage (exact and fuzzy matches from previous translations)
  3. Remaining segments sent to MT engine via API
  4. MT output delivered to post-editors in CAT tool interface within TMS
  5. Post-editors review and correct segments
  6. LQA review applied to a sample of post-edited segments (a sampling sketch follows this list)
  7. Approved translations exported and deployed
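
A sketch of the sampling in step 6: pull a reproducible random sample from the post-edited batch for senior-linguist review. The 10% rate and the segment list are illustrative.

```python
import random

def lqa_sample(segments: list[str], rate: float = 0.10, seed: int = 42) -> list[str]:
    """Select roughly `rate` of the batch for senior-linguist review."""
    rng = random.Random(seed)                    # seeded for reproducible audits
    k = max(1, round(len(segments) * rate))
    return rng.sample(segments, k)

batch = [f"segment-{i}" for i in range(200)]
print(len(lqa_sample(batch)), "segments queued for LQA")
```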

Look for TMS platforms that support:

  • Direct MT engine API integrations (not manual copy-paste)
  • Post-editing effort tracking per segment
  • Productivity reporting by editor and content type
  • Automated QA with glossary checking

Take your app global with better-i18n

better-i18n combines AI-powered translations, git-native workflows, and global CDN delivery into one developer-first platform. Stop managing spreadsheets and start shipping in every language.

Get started free → · Explore features · Read the docs