Revision 2 of 2 (typo fix), edited by Reinderien

Calculate Levenshtein distance between two strings in Python

I need a function that measures how different two strings are. I chose the Levenshtein distance as a quick approach and implemented this function:

```python
from difflib import ndiff


def calculate_levenshtein_distance(str_1, str_2):
    """
    The Levenshtein distance is a string metric for measuring the difference between two sequences.
    It is calculated as the minimum number of single-character edits necessary to transform one string into another
    """
    distance = 0
    buffer_removed = buffer_added = 0
    for x in ndiff(str_1, str_2):
        code = x[0]
        # Code ? is ignored as it does not translate to any modification
        if code == ' ':
            distance += max(buffer_removed, buffer_added)
            buffer_removed = buffer_added = 0
        elif code == '-':
            buffer_removed += 1
        elif code == '+':
            buffer_added += 1
    distance += max(buffer_removed, buffer_added)
    return distance
```
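For comparison, here is a minimal sketch of the textbook Wagner-Fischer dynamic-programming algorithm, which computes the exact Levenshtein distance by definition. This is a different technique from the `ndiff`-based approach above: `difflib`'s matcher is documented as aiming for diffs that "look right" to people rather than minimal edit sequences, so counting its `+`/`-` runs is not guaranteed to give the minimum. The function name `levenshtein_distance` is hypothetical.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Exact Levenshtein distance via Wagner-Fischer, keeping only two DP rows.

    Runs in O(len(a) * len(b)) time and O(min(len(a), len(b))) extra space.
    """
    # The distance is symmetric, so make `b` the shorter string to keep rows small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the previous prefix of `a` and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(deletion, insertion, substitution))
        previous = current
    return previous[-1]
```

For example, `levenshtein_distance("kitten", "sitting")` returns 3 (two substitutions and one insertion). Running both functions over a set of test pairs is a cheap way to find inputs where the `ndiff` heuristic disagrees with the exact value.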

I then call it as follows:

```python
similarity = 1 - calculate_levenshtein_distance(str_1, str_2) / max(len(str_1), len(str_2))
```
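One edge case in this normalization: when both strings are empty, `max(len(str_1), len(str_2))` is 0 and the division raises `ZeroDivisionError`. A minimal sketch of a guarded helper follows; the name `similarity`, the `distance_fn` parameter, and the choice to treat two empty strings as identical (similarity 1.0) are all assumptions for illustration.

```python
from typing import Callable


def similarity(str_1: str, str_2: str, distance_fn: Callable[[str, str], int]) -> float:
    """Normalized similarity, where 1.0 means identical.

    `distance_fn` is any edit-distance function taking two strings, e.g.
    calculate_levenshtein_distance. Treating two empty strings as identical
    is an assumption; it avoids dividing by max(0, 0) == 0.
    """
    longest = max(len(str_1), len(str_2))
    if longest == 0:
        return 1.0
    return 1 - distance_fn(str_1, str_2) / longest
```

With an exact Levenshtein distance the result always lies in [0, 1], because transforming one string into the other never needs more than `max(len(str_1), len(str_2))` edits (substitute the overlapping characters, then insert or delete the rest).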

How sloppy or error-prone is this code, and how can it be improved?

Kyra_W