Table of Contents
- Database Localization: Strategies for Storing Multilingual Data
- The Problem
- Strategy 1: Column Per Locale
- Querying
- Advantages
- Disadvantages
- Best for
- Strategy 2: Separate Translation Table
- Querying
- Advantages
- Disadvantages
- Best for
- Strategy 3: JSON Column
- Querying (PostgreSQL)
- Advantages
- Disadvantages
- Best for
- Comparison Summary
- Handling Locale Fallbacks
- Frequently Asked Questions
- Should translatable and non-translatable columns be in the same table?
- How do I handle full-text search across languages?
- What about NoSQL databases?
- How does this relate to UI string localization?
Database Localization: Strategies for Storing Multilingual Data
When building a multilingual application, one of the earliest architectural decisions is how to store translated content in the database. The approach you choose affects query performance, schema complexity, and how easily you can add new languages.
There is no single "correct" design — the best choice depends on the number of languages, volume of translatable content, query patterns, and team preferences. This guide covers the three most common strategies with practical examples.
The Problem
Consider a product catalog with names, descriptions, and categories. In a single-language application, the schema is straightforward:
```sql
CREATE TABLE products (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  description TEXT,
  price DECIMAL(10,2),
  created_at TIMESTAMP DEFAULT NOW()
);
```
Once you need multiple languages, every text column becomes a localization challenge. The `price` and `created_at` columns are language-independent, but `name` and `description` need separate values for each locale.
Strategy 1: Column Per Locale
The simplest approach adds a column for each language:
```sql
CREATE TABLE products (
  id SERIAL PRIMARY KEY,
  name_en VARCHAR(255) NOT NULL,
  name_de VARCHAR(255),
  name_fr VARCHAR(255),
  name_ja VARCHAR(255),
  description_en TEXT,
  description_de TEXT,
  description_fr TEXT,
  description_ja TEXT,
  price DECIMAL(10,2),
  created_at TIMESTAMP DEFAULT NOW()
);
```
Querying
```sql
-- Fetch product with German translations, falling back to English
SELECT
  id,
  COALESCE(name_de, name_en) AS name,
  COALESCE(description_de, description_en) AS description,
  price
FROM products
WHERE id = 42;
```
Advantages
- Simple queries — No joins required. All data is in a single row.
- Easy to understand — The schema is self-documenting.
- Good read performance — Single table scan with no joins.
Disadvantages
- Schema changes for new languages — Adding a language requires an `ALTER TABLE` for every translatable column. For a table with 5 translatable fields and 10 languages, that's 50 columns.
- Sparse data — Most rows will have NULL values for languages without translations.
- Hard to scale — Query complexity grows linearly with the number of languages.
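The migration cost is concrete: adding one more language means one `ALTER TABLE` per translatable column. A sketch, assuming the `products` schema from this strategy and Spanish (`es`) as the new language:

```sql
-- Adding Spanish means one new column per translatable field
ALTER TABLE products ADD COLUMN name_es VARCHAR(255);
ALTER TABLE products ADD COLUMN description_es TEXT;
-- ...and so on for every other translatable column,
-- plus updating every query that selects translated text
```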
Best for
Applications with a small, fixed set of languages (2-4) that rarely changes. Simple content models with few translatable fields.
Strategy 2: Separate Translation Table
This approach normalizes translations into a dedicated table:
```sql
CREATE TABLE products (
  id SERIAL PRIMARY KEY,
  price DECIMAL(10,2),
  created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE product_translations (
  id SERIAL PRIMARY KEY,
  product_id INTEGER NOT NULL REFERENCES products(id) ON DELETE CASCADE,
  locale VARCHAR(10) NOT NULL,
  name VARCHAR(255) NOT NULL,
  description TEXT,
  UNIQUE (product_id, locale)
);

CREATE INDEX idx_product_translations_lookup
  ON product_translations (product_id, locale);
```
Querying
```sql
-- Fetch product with French translations, falling back to English
SELECT
  p.id,
  COALESCE(t.name, fallback.name) AS name,
  COALESCE(t.description, fallback.description) AS description,
  p.price
FROM products p
LEFT JOIN product_translations t
  ON t.product_id = p.id AND t.locale = 'fr'
LEFT JOIN product_translations fallback
  ON fallback.product_id = p.id AND fallback.locale = 'en'
WHERE p.id = 42;
```
Advantages
- No schema changes for new languages — Adding a language is just inserting new rows.
- Clean separation — Language-independent data stays in the main table.
- Flexible — Easy to query which languages are available for any entity.
- Standard pattern — Well-understood by ORMs and frameworks. Rails, Django, and Laravel all have libraries supporting this pattern.
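As a sketch of that flexibility, listing which locales exist for an entity is a single query against the translation table (assuming the `product_translations` schema above):

```sql
-- Which locales have a translation for product 42?
SELECT locale
FROM product_translations
WHERE product_id = 42
ORDER BY locale;
```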
Disadvantages
- Join required — Every query for translated data needs at least one JOIN.
- Fallback complexity — Language fallback chains (e.g., `pt-BR` → `pt` → `en`) require multiple JOINs or application logic.
- More tables — Each translatable entity needs its own translation table.
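A longer fallback chain can still be expressed in SQL by stacking one LEFT JOIN per step in the chain. A sketch for Brazilian Portuguese falling back to generic Portuguese, then English, using the schema above:

```sql
-- First non-NULL name along the chain pt-BR → pt → en
SELECT
  p.id,
  COALESCE(t1.name, t2.name, t3.name) AS name,
  p.price
FROM products p
LEFT JOIN product_translations t1
  ON t1.product_id = p.id AND t1.locale = 'pt-BR'
LEFT JOIN product_translations t2
  ON t2.product_id = p.id AND t2.locale = 'pt'
LEFT JOIN product_translations t3
  ON t3.product_id = p.id AND t3.locale = 'en'
WHERE p.id = 42;
```

Each extra fallback step adds another JOIN, which is why longer chains are often resolved in application code instead.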
Best for
Applications with many languages, content models that grow over time, and teams that value clean database normalization.
Strategy 3: JSON Column
Modern databases (PostgreSQL 9.4+, MySQL 5.7+, SQLite 3.38+) support JSON columns:
```sql
CREATE TABLE products (
  id SERIAL PRIMARY KEY,
  name JSONB NOT NULL DEFAULT '{}',
  description JSONB DEFAULT '{}',
  price DECIMAL(10,2),
  created_at TIMESTAMP DEFAULT NOW()
);

-- Insert with translations
INSERT INTO products (name, description, price)
VALUES (
  '{"en": "Wireless Headphones", "de": "Kabellose Kopfhörer", "fr": "Écouteurs sans fil"}',
  '{"en": "Premium wireless headphones with noise cancellation.", "de": "Premium kabellose Kopfhörer mit Geräuschunterdrückung."}',
  79.99
);
```
Querying (PostgreSQL)
```sql
-- Fetch product with French translations, falling back to English
SELECT
  id,
  COALESCE(name->>'fr', name->>'en') AS name,
  COALESCE(description->>'fr', description->>'en') AS description,
  price
FROM products
WHERE id = 42;
```
Advantages
- No schema changes — New languages are just new JSON keys.
- No joins — All translations are in the same row.
- Flexible — Different rows can have different sets of languages.
- GIN indexing — PostgreSQL JSONB columns support GIN indexes for efficient querying.
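For example, a single GIN index over the whole JSONB column can answer key-existence queries such as "which rows have a German translation" (a PostgreSQL sketch; the index name is illustrative, the `?` operator is the standard `jsonb` key-existence test):

```sql
-- One GIN index covers all language keys in the name column
CREATE INDEX idx_products_name_gin ON products USING GIN (name);

-- Find products that have a German name
SELECT id FROM products WHERE name ? 'de';
```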
Disadvantages
- Weaker type safety — The database cannot enforce that every JSON object has the required keys or value types.
- No referential integrity — Unlike the translation table approach, there's no foreign key to validate locale codes.
- ORM support varies — Some ORMs don't handle JSON columns natively, requiring custom serialization.
- Full-text search complexity — Searching across all languages in a JSON column requires specialized indexing.
Best for
Applications using PostgreSQL or MySQL with moderate translation volumes, teams that prefer minimal schema, and projects where translation completeness varies significantly across languages.
Comparison Summary
| Criterion | Column Per Locale | Translation Table | JSON Column |
|---|---|---|---|
| New language cost | Schema migration | Insert rows | No change |
| Query complexity | Simple | JOIN required | Medium |
| Read performance | Excellent | Good (with index) | Good |
| Type safety | Full | Full | Partial |
| ORM support | Built-in | Widely supported | Varies |
| Best for | Few fixed languages | Many languages | Flexible models |
Handling Locale Fallbacks
Regardless of which strategy you choose, your application needs a fallback chain for missing translations. A common pattern:
- Exact match (`fr-CA` for Canadian French)
- Language match (`fr` for generic French)
- Default language (`en` for English)
This fallback logic is typically implemented in the application layer rather than in SQL, as it may involve user preferences and locale negotiation.
Frequently Asked Questions
Should translatable and non-translatable columns be in the same table?
It depends on your query patterns. If you frequently query translated content, keeping it close to the entity reduces joins. If you frequently query only non-translatable data (pricing, inventory), separating translations avoids loading unnecessary data.
How do I handle full-text search across languages?
PostgreSQL supports language-specific tsvector configurations. Create separate text search indexes for each language, or use the simple configuration for language-agnostic searching. With the translation table approach, you can create a tsvector column per row since each row already represents one language.
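A sketch of the per-language approach on the translation table (PostgreSQL; the index name is illustrative). A partial expression index restricts each full-text index to rows in one locale, so each language uses its own stemming configuration:

```sql
-- Partial expression index: German full-text search over German rows only
CREATE INDEX idx_product_translations_fts_de
  ON product_translations
  USING GIN (to_tsvector('german',
       coalesce(name, '') || ' ' || coalesce(description, '')))
  WHERE locale = 'de';

-- Search German translations for "Kopfhörer"
SELECT product_id
FROM product_translations
WHERE locale = 'de'
  AND to_tsvector('german', coalesce(name, '') || ' ' || coalesce(description, ''))
      @@ plainto_tsquery('german', 'Kopfhörer');
```

The query repeats the indexed expression and the `WHERE locale = 'de'` predicate so the planner can match the partial index.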
What about NoSQL databases?
Document databases like MongoDB naturally support the JSON approach — each document embeds translations as nested objects. This works well for content-heavy applications but lacks the referential integrity guarantees of relational databases.
How does this relate to UI string localization?
Database localization handles user-generated or admin-managed content (product names, page titles, CMS content). UI string localization — button labels, error messages, navigation text — is typically handled by i18n libraries (react-intl, vue-i18n, etc.) using JSON or other file-based formats. Most applications need both.