Levenshtein Distance — Compare Two Strings
Input & Options
Tips: Ctrl/Cmd + K focuses “A”. Ctrl/Cmd + Enter runs Compare.
Results
Deletion (A only)
Insertion (B only)
Substitution
What is Levenshtein Distance?
Levenshtein distance (also called edit distance) is the smallest number of single-character edits needed to turn one string into another. The allowed edits are insertion, deletion, and substitution. For example, turning “kitten” into “sitting” takes three edits (k→s, e→i, insert g), so the distance is 3.
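To make the definition concrete, here is a minimal Python sketch of the distance calculation. The function name levenshtein is our own choice for illustration, and this is one common way to write it rather than the exact code behind this page.

```python
def levenshtein(a, b):
    """Return the Levenshtein distance between two sequences."""
    # prev[j] is the distance between the part of `a` processed so far
    # and the first j items of `b`. It starts as the empty-prefix row.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into an empty string takes i deletions
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or a free match)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows are kept in memory at a time, which is enough when you need the number but not the list of edits.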
Why people use it
- Spell checking & fuzzy search: find words that are “close enough” to a query.
- Data cleaning: match near-duplicate names, addresses, or product titles.
- Natural language & bioinformatics: compare tokens, lines, or genetic sequences.
- UX matching: tolerate typos in forms and search boxes.
Character vs. Word mode
- Character mode: great for short strings, usernames, variable names, and typos.
- Word mode: splits on whitespace and compares tokens, which is clearer for sentences and paragraphs. It answers “how many word-level changes are needed?”
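As a rough illustration of word mode, the sketch below splits both inputs on whitespace and runs the same distance calculation over the token lists. The names levenshtein and word_distance are hypothetical, and the tool's real tokenizer may treat punctuation or repeated spaces differently.

```python
def levenshtein(a, b):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def word_distance(a, b):
    """Compare whitespace-separated tokens instead of characters."""
    return levenshtein(a.split(), b.split())


# One substitution (brown -> red) plus one insertion (jumps).
print(word_distance("the quick brown fox", "the quick red fox jumps"))  # 2
```

Because the algorithm only needs equality between items, the same function works on lists of words as well as on strings.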
Similarity score
We show a simple similarity percentage: 1 − (distance / max(length(A), length(B))). In word mode, “length” means word count.
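The formula can be sketched in a few lines of Python. The function name similarity is hypothetical, and returning 1.0 when both inputs are empty is our assumption, since the formula itself would divide by zero in that case.

```python
def similarity(distance, len_a, len_b):
    """Return 1 - distance / max(len_a, len_b) as a fraction from 0 to 1."""
    longest = max(len_a, len_b)
    if longest == 0:
        # Assumption: two empty inputs are treated as identical.
        return 1.0
    return 1 - distance / longest


# "kitten" (6 chars) vs "sitting" (7 chars) with distance 3.
print(round(similarity(3, 6, 7) * 100, 1))  # 57.1
```

Multiply by 100 to display the result as a percentage. In word mode, pass word counts instead of character counts.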
Case, Unicode, and language notes
- Case sensitivity: toggle on/off. For case-insensitive matching, we compare lowercase forms but keep your original text for display.
- Unicode: the comparison operates on code points, so a grapheme cluster made of several code points (e.g., emoji + modifiers) may count as multiple edits.
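The two notes above can be illustrated with a short Python sketch. Python strings are sequences of code points, so len() shows how many units a comparison would see. The helper name normalize is hypothetical.

```python
def normalize(text, case_sensitive):
    """Return the form used for comparison; the original text is untouched."""
    return text if case_sensitive else text.lower()


original = "Hello"
print(normalize(original, False) == normalize("HELLO", False))  # True
print(original)  # Hello (still available for display)

thumbs = "\U0001F44D"                 # thumbs-up emoji
thumbs_tone = "\U0001F44D\U0001F3FD"  # same emoji + skin tone modifier
print(len(thumbs), len(thumbs_tone))  # 1 2
```

The second emoji looks like one character on screen but is two code points, so deleting it would count as two edits.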
How it’s computed (light overview)
We use dynamic programming over a (len(A)+1) × (len(B)+1) grid. The bottom-right cell is the distance; backtracking yields the sequence of edits we highlight.
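The grid and the backtracking step might look like the following Python sketch. The function name edit_script and the tuple format of the operations are assumptions made for illustration. When several edit sequences are equally short, this version prefers the diagonal step, which may differ from what the tool shows.

```python
def edit_script(a, b):
    """Return (distance, ops); each op is (kind, item_from_a, item_from_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    grid = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        grid[i][0] = i  # delete everything in a[:i]
    for j in range(cols):
        grid[0][j] = j  # insert everything in b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            grid[i][j] = min(
                grid[i - 1][j] + 1,         # deletion
                grid[i][j - 1] + 1,         # insertion
                grid[i - 1][j - 1] + cost,  # substitution or match
            )

    # Walk back from the bottom-right cell to the top-left cell.
    ops = []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        same = i > 0 and j > 0 and a[i - 1] == b[j - 1]
        if i > 0 and j > 0 and grid[i][j] == grid[i - 1][j - 1] + (0 if same else 1):
            ops.append(("match" if same else "substitute", a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and grid[i][j] == grid[i - 1][j] + 1:
            ops.append(("delete", a[i - 1], None))
            i -= 1
        else:
            ops.append(("insert", None, b[j - 1]))
            j -= 1
    ops.reverse()
    return grid[-1][-1], ops


distance, ops = edit_script("kitten", "sitting")
print(distance)  # 3
for op in ops:
    print(op)
```

The number of non-match operations always equals the distance, because each backward step accounts for exactly the cost that was added when the cell was filled.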
Related metrics
- Damerau–Levenshtein: counts adjacent transpositions as one edit.
- Jaro / Jaro–Winkler: emphasize matching characters and order.
- Cosine / Jaccard on tokens: compare word sets or vectors for document similarity.
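For comparison, here is a hedged Python sketch of two of these metrics. The first function implements optimal string alignment, a restricted variant of Damerau–Levenshtein, not the full algorithm. The second computes Jaccard similarity on whitespace-separated word sets. Both function names are our own, and neither is part of this tool.

```python
def osa_distance(a, b):
    """Optimal string alignment: Levenshtein plus adjacent transpositions."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # Swapping two adjacent items counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def jaccard(a, b):
    """Size of the shared word set divided by the size of the combined set."""
    set_a, set_b = set(a.split()), set(b.split())
    if not set_a and not set_b:
        return 1.0  # assumption: two empty inputs count as identical
    return len(set_a & set_b) / len(set_a | set_b)


print(osa_distance("form", "from"))   # 1 (plain Levenshtein gives 2)
print(jaccard("red fox", "fox red"))  # 1.0
```

Jaccard ignores word order entirely, which is why reordered sentences can score as identical even though their edit distance is greater than zero.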