Build a Simple Diacritics Remover in JavaScript (Step-by-Step)

Removing diacritics (accents, cedillas, tildes, etc.) from text is a common task when normalizing input for search, matching, sorting, URL slugs, or simple ASCII-only storage. This tutorial walks through several practical approaches in JavaScript: built-in Unicode normalization, a mapping table, and a small npm-friendly utility. Each approach includes code, trade-offs, and usage suggestions so you can pick what fits your needs.


Why remove diacritics?

  • Improves search and matching by making “résumé” match “resume”.
  • Simplifies generation of slugs and filenames.
  • Helps systems that only support ASCII characters.

Approach 1 — Unicode normalization + regex (simplest)

JavaScript’s Unicode normalization can decompose characters into base letters plus combining marks. Removing the combining marks then leaves the base, unaccented characters.

Example:

function removeDiacriticsNormalize(input) {
  // NFD decomposes combined letters into letter + diacritic marks
  return input.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

// Usage
console.log(removeDiacriticsNormalize('résumé — São Paulo — Voilà'));
// "resume — Sao Paulo — Voila"

Pros:

  • Very short and fast for most Latin-script use-cases.
  • No external dependencies.

Cons:

  • Doesn’t convert letters that Unicode treats as distinct letters rather than letter + combining mark: Polish ł stays ł and German ß stays ß, because neither carries a combining accent, so they need special handling (see the quick check after this list).
  • For full ASCII-only conversion you may want additional substitutions (e.g., “œ” → “oe”, “ß” → “ss”).
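
For example, a quick check (a minimal sketch) shows what normalization alone leaves untouched:

// Normalization-only conversion, as in removeDiacriticsNormalize above.
const nfdOnly = (s) => s.normalize('NFD').replace(/[\u0300-\u036f]/g, '');

console.log(nfdOnly('straße')); // "straße" (ß is not a combining accent)
console.log(nfdOnly('œuvre'));  // "œuvre" (the ligature is untouched)
console.log(nfdOnly('Łódź'));   // "Łodz"  (ó and ź lose accents, Ł does not)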

Approach 2 — Normalize + small post-processing map (balanced coverage)

Combine normalization with a small mapping table for characters that normalization doesn’t split into base + combining marks (ligatures, special letters).

Example:

const EXTRA_MAP = {
  'ß': 'ss',
  'Æ': 'AE', 'æ': 'ae',
  'Œ': 'OE', 'œ': 'oe',
  'Ø': 'O', 'ø': 'o',
  'Ł': 'L', 'ł': 'l'
  // add other special cases you need
};

function removeDiacriticsWithMap(input) {
  const normalized = input.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
  // Map remaining Latin-1 Supplement / Latin Extended letters (U+00C0 to U+024F)
  return normalized.replace(/[\u00C0-\u024F]/g, (ch) => EXTRA_MAP[ch] || ch);
}

// Usage
console.log(removeDiacriticsWithMap('straße, Œuvre, Łódź'));
// "strasse, OEuvre, Lodz"

Pros:

  • Handles common special-cases while keeping code small.
  • Gives predictable ASCII outputs for commonly problematic characters.

Cons:

  • You must maintain the map for any additional characters you want to convert.
  • Map-based replacements may miss rare characters.

Approach 3 — Full mapping table (highest control)

If you need exact conversion across many languages, build or use a comprehensive mapping table covering the Latin Extended ranges. This method is deterministic and does not depend on the environment's Unicode normalization support.

Example (truncated):

const FULL_MAP = {
  'À':'A','Á':'A','Â':'A','Ã':'A','Ä':'A','Å':'A','Ā':'A','Ă':'A','Ą':'A',
  'à':'a','á':'a','â':'a','ã':'a','ä':'a','å':'a','ā':'a','ă':'a','ą':'a',
  'Ç':'C','ç':'c','Ć':'C','ć':'c','Č':'C','č':'c',
  // ... many more entries
};

function removeDiacriticsFullMap(input) {
  return input.split('').map(ch => FULL_MAP[ch] || ch).join('');
}

Pros:

  • Total control over every mapped character.
  • Useful for critical systems where deterministic mapping is required.

Cons:

  • Large data structure (increases bundle size).
  • Time-consuming to build and maintain.

Approach 4 — Use a tiny library (quickest for production)

If you prefer not to write and maintain mapping data, use a small, well-tested library like diacritics or remove-accents on npm. Example (pseudo):

npm install remove-accents

import removeAccents from 'remove-accents';

console.log(removeAccents('résumé — São Paulo'));
// "resume — Sao Paulo"

Pros:

  • Saves development time.
  • Libraries usually cover many edge cases.

Cons:

  • Adds a dependency and slightly increases bundle size.
  • Verify maintenance and licensing before using.

Performance notes

  • normalize('NFD').replace(...) is very fast in modern engines for typical strings.
  • Full mapping via split/map/join is slightly slower but predictable.
  • For large-scale processing (millions of strings), benchmark the options against your own data and environment, and consider server-side batch normalization; a minimal harness sketch follows this list.
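
One way to compare the approaches is a rough micro-benchmark like the sketch below; the sample data and sizes are illustrative only, and the three functions from the earlier examples are assumed to be in scope:

// Rough micro-benchmark sketch (Node.js). Results depend entirely on your
// data and environment, so run it against a representative sample.
import { performance } from 'node:perf_hooks';

const SAMPLE = Array.from({ length: 100000 }, (_, i) => `résumé São Paulo Łódź straße ${i}`);

function bench(label, fn) {
  const start = performance.now();
  for (const s of SAMPLE) fn(s);
  console.log(`${label}: ${(performance.now() - start).toFixed(1)} ms for ${SAMPLE.length} strings`);
}

// Assumes removeDiacriticsNormalize, removeDiacriticsWithMap and
// removeDiacriticsFullMap from the earlier examples are defined.
bench('normalize + regex', removeDiacriticsNormalize);
bench('normalize + small map', removeDiacriticsWithMap);
bench('full mapping table', removeDiacriticsFullMap);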

Tests and edge cases to consider

  • Ligatures: œ → oe, æ → ae.
  • Language-specific letters: ß → ss, ł → l.
  • Characters outside Latin script: Cyrillic, Greek, Arabic should generally be left unchanged unless you intentionally transliterate them.
  • Combining marks beyond U+036F (rare) — consider extending the regex if you encounter them.
  • Unicode normalization availability: modern browsers and Node.js support it; very old environments might lack it.
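
A few quick assertions can lock these cases in; the sketch below uses Node's built-in assert module and assumes removeDiacriticsWithMap from Approach 2 is in scope:

// Minimal regression checks with Node's built-in assert module.
import assert from 'node:assert';

assert.strictEqual(removeDiacriticsWithMap('œuvre'), 'oeuvre');   // ligature
assert.strictEqual(removeDiacriticsWithMap('straße'), 'strasse'); // ß -> ss
assert.strictEqual(removeDiacriticsWithMap('Łódź'), 'Lodz');      // ł -> l
assert.strictEqual(removeDiacriticsWithMap('Привет'), 'Привет');  // non-Latin text left unchanged
console.log('all diacritics checks passed');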

Putting it together — a practical utility

A compact utility that uses normalization plus a small extras map, suitable for most web apps:

const EXTRA_MAP = {
  'ß': 'ss',
  'Æ': 'AE', 'æ': 'ae',
  'Œ': 'OE', 'œ': 'oe',
  'Ø': 'O', 'ø': 'o',
  'Ł': 'L', 'ł': 'l'
};

export function removeDiacritics(input) {
  if (!input) return input;
  const normalized = input.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
  return normalized.replace(/[\u00C0-\u024F]/g, ch => EXTRA_MAP[ch] || ch);
}

Use this in forms, slug generators, search normalization, or anywhere you need consistent ASCII-like text; a small slug-generation example follows below.
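
For instance, a slug generator built on top of this utility might look like the following sketch (slugify is a hypothetical helper name, not part of the utility above):

// Hypothetical slug helper built on the removeDiacritics utility above.
export function slugify(title) {
  return removeDiacritics(title)
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse runs of non-alphanumerics into "-"
    .replace(/^-+|-+$/g, '');    // trim leading/trailing dashes
}

// slugify('Déjà Vu: São Paulo Café Guide') -> "deja-vu-sao-paulo-cafe-guide"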


Final recommendations

  • For most cases: use normalize('NFD') + regex and add a tiny map for special characters.
  • If you need broad, maintained coverage and don’t mind a dependency: use a lightweight npm package.
  • If you must control every mapping (legal/localization constraints): build a full mapping table and include tests.
