Here is my development series (in parentheses: approx fast ratio compared with the previous, tested using strings cca 100 characters in size, see the comparison table below):
Get-LevenshteinDistanceOrig (= 1×, for comparison): the original function body (Copy&Paste); Get-LevenshteinDistanceOrig+ (> 3×, still insufficient): removed all time-expensive Write-Verbose output; Get-LevenshteinDistanceTry1 (> 50×, a substantial improvement): time-expensive pipeline to Measure-Object replaced by [math]::Min(n,m) static method (see .NET system.math class). The following are only minor advancement steps: Get-LevenshteinDistanceTry2 (~ 1×..2×): slight algorithm renovation: - instead calculating the cost (
$cost variable): if compared characters are equal then computing and comparing values $cellAbove, $cellLeft and $cellUpperLeft is useless as we already know the necessary value; - added: if strings are equal the LD is known (zero) and we can stop;
Get-LevenshteinDistance (~ 2×): PoSH array implementation: instead of two-dimensional comparison matrix used a rectangular jagged array.
Comparison table: .\CR\164518test.ps1 | Format-Table -AutoSize
cmdlet similarity runtime (ms) lengths LD ------ ---------- ------------ ------- -- Get-LevenshteinDistanceOrig equal 6829.1053 106 106 0 Get-LevenshteinDistanceOrig+ equal 2010.4538 106 106 0 Get-LevenshteinDistanceTry1 equal 35.8835 106 106 0 Get-LevenshteinDistanceTry2 equal .1795 106 106 0 Get-LevenshteinDistance equal .1539 106 106 0 Get-LevenshteinDistanceOrig stochastic 6556.4117 106 102 79 Get-LevenshteinDistanceOrig+ stochastic 1932.6051 106 102 79 Get-LevenshteinDistanceTry1 stochastic 33.9023 106 102 79 Get-LevenshteinDistanceTry2 stochastic 28.3165 106 102 79 Get-LevenshteinDistance stochastic 13.4852 106 102 79 Get-LevenshteinDistanceOrig similar 6640.5884 106 102 4 Get-LevenshteinDistanceOrig+ similar 2023.8179 106 102 4 Get-LevenshteinDistanceTry1 similar 31.5843 106 102 4 Get-LevenshteinDistanceTry2 similar 14.9307 106 102 4 Get-LevenshteinDistance similar 8.4234 106 102 4 Get-LevenshteinDistanceOrig different 6613.5267 106 102 106 Get-LevenshteinDistanceOrig+ different 1943.7630 106 102 106 Get-LevenshteinDistanceTry1 different 35.1824 106 102 106 Get-LevenshteinDistanceTry2 different 27.1371 106 102 106 Get-LevenshteinDistance different 13.8924 106 102 106
Column explanation:
cmdlet - function name similarity - in brief "like" similarity of input strings runtime (ms) - TotalMilliseconds lengths - lengths of input strings, space delimited LD - the Levenshtein Distance of input strings
Comparison script 164518test.ps1:
Function TestLD { param ([string]$Similarity = '') $aux = 0 Write-Progress -Activity "LD info ($Similarity strings)" ` -CurrentOperation 'Start' -PercentComplete $aux $cmdletTails = 'Orig','Orig+','Try1','Try2','' foreach ( $cmdletTail in $cmdletTails ) { $cmdlet = "Get-LevenshteinDistance$cmdletTail" $scriptBlock = { $LevenshteinDistance = & $cmdlet ` -CompareString $strC -DifferenceString $strD } $TimeSpan = Measure-Command -Expression $scriptBlock $aux += [int](100 / $cmdletTails.Count) Write-Progress -Activity "LD info ($Similarity strings)" ` -CurrentOperation $cmdlet -PercentComplete $aux [PSCustomObject]@{ 'cmdlet' = $cmdlet 'similarity' = $Similarity 'runtime (ms)' = $($TimeSpan.TotalMilliseconds. ToString('#####.###0',$cultureinfo). PadLeft(10)) 'lengths' = "$($strC.Length) $($strD.Length)" 'LD' = $LevenshteinDistance } } } . D:\PShell\CR\164518.ps1 # reload the functions $cultureinfo = [cultureinfo]::InvariantCulture # my one is different $strC='Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum placerat leo ut turpis viverra lacinia' $strD=$strC TestLD -Similarity 'equal' $strD='Cras efficitur nec orci et posuere. Suspendisse potenti. Quisque blandit auctor purus id facilisis est' TestLD -Similarity 'stochastic' $strC='a' * $strC.Length $strD='a' * $strD.Length TestLD -Similarity 'similar' $strD='x' * $strD.Length TestLD -Similarity 'different'
The main script 164518.ps1 (functions are defined here in the reverse order):
function Get-LevenshteinDistance{ param( [string]$CompareString, [string]$DifferenceString ) # Collect the string lengths $compareStringLength = $CompareString.Length $differenceStringLength = $DifferenceString.Length # If either of the strings are empty the LD is known and we can stop if($compareStringLength -eq 0){return $differenceStringLength} if($differenceStringLength -eq 0){return $compareStringLength} # If strings are equal the LD is known and we can stop if ($CompareString -ceq $DifferenceString) { return 0 } # Build the comparison matrix as a (rectangular) jagged array # $comparisonMatrix = New-Object 'object[]' ($differenceStringLength+1) $comparisonMatrix = [System.Array]::CreateInstance([System.Object], ($differenceStringLength+1)) for($index=0; $index -le $differenceStringLength; $index++){ # Create row # $comparisonMatrix[$index]=New-Object 'object[]' ($compareStringLength+1) $comparisonMatrix[$index]=[System.Array]::CreateInstance([System.Object], ($compareStringLength+1)) # Populate the first column $comparisonMatrix[$index][0]=$index } # Populate the first row for($index=0; $index -le $compareStringLength; $index++){ $comparisonMatrix[0][$index]=$index } # Calculate the Levenshtein distance by working through each position in the matrix. # Working the columns for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++){ # Cycle each character in the list $compareCharacter = $compareString[$columnIndex-1] # multiple use => to a variable # Working the rows for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++){ # Cycle each character in the list (a variable for the only use) # $differenceCharacter = $differenceString[$rowIndex-1] # the only use # Calculate the cost if($compareCharacter -ceq $differenceString[$rowIndex-1]){ $comparisonMatrix[$rowIndex][$columnIndex] = $comparisonMatrix[($rowIndex -1)][($columnIndex -1)] } else { # $cost = 1 # unneeded variable for the only use # The cell immediately above plus 1 $cellAbove = $comparisonMatrix[($rowIndex-1)][$columnIndex] + 1 # The cell immediately to the left plus 1 $cellLeft = $comparisonMatrix[$rowIndex][($columnIndex-1)] + 1 # # The cell diagonally above and to the left plus the cost ↓ $cellUpperLeft = $comparisonMatrix[($rowIndex-1)][($columnIndex-1)] + 1 # Select minumum of the of the last 3 cells calculations and assign it to the current matrix position. $comparisonMatrix[$rowIndex][$columnIndex] = [math]::Min([math]::Min($cellAbove,$cellLeft),$cellUpperLeft) } } } # The cacluated LD will now be in the bottom right of the matrix. return $comparisonMatrix[$differenceStringLength][$compareStringLength] } function Get-LevenshteinDistanceTry2{ param( [string]$CompareString, [string]$DifferenceString ) # Collect the string lengths $compareStringLength = $CompareString.Length $differenceStringLength = $DifferenceString.Length # If either of the strings are empty the LD is known and we can stop if($compareStringLength -eq 0){return $differenceStringLength} if($differenceStringLength -eq 0){return $compareStringLength} # If strings are equal the LD is known and we can stop if ($CompareString -ceq $DifferenceString) { return 0 } # Build the comparison matrix and populate the first column and first row. $comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1) # $comparisonMatrix = [System.Array]::CreateInstance([System.Object], ($differenceStringLength+1),($compareStringLength+1)) # Populate the first row for($index=0; $index -le $compareStringLength; $index++){ $comparisonMatrix[0,$index]=$index } # Populate the first column for($index=0; $index -le $differenceStringLength; $index++){ $comparisonMatrix[$index,0]=$index } # Calculate the Levenshtein distance by working through each position in the matrix. # Working the columns for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++){ # Cycle each character in the list $compareCharacter = $compareString[$columnIndex-1] # Working the rows for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++){ # Cycle each character in the list $differenceCharacter = $differenceString[$rowIndex-1] # Calculate the cost if($compareCharacter -ceq $differenceCharacter){ $comparisonMatrix[$rowIndex,$columnIndex] = $comparisonMatrix[($rowIndex -1),($columnIndex -1)] } else { $cost = 1 # a variable for the only use? Useless! # The cell immediately above plus 1 $cellAbove = $comparisonMatrix[($rowIndex-1), $columnIndex] + 1 # The cell immediately to the left plus 1 $cellLeft = $comparisonMatrix[$rowIndex,($columnIndex-1)] + 1 # The cell diagonally above and to the left plus the cost $cellUpperLeft = $comparisonMatrix[($rowIndex-1),($columnIndex-1)] + $cost # Select minumum of the of the last 3 cells calculations and assign it to the current matrix position. # $comparisonMatrix[$rowIndex,$columnIndex] = $cellAbove,$cellLeft,$cellUpperLeft | Measure-Object -Minimum | select -ExpandProperty Minimum # $comparisonMatrix[$rowIndex,$columnIndex] = ($cellAbove,$cellLeft,$cellUpperLeft | Measure-Object -Minimum).Minimum $comparisonMatrix[$rowIndex,$columnIndex] = [math]::Min([math]::Min($cellAbove,$cellLeft),$cellUpperLeft) } } } # The cacluated LD will now be in the bottom right of the matrix. return $comparisonMatrix[$differenceStringLength,$compareStringLength] } function Get-LevenshteinDistanceTry1{ #[cmdletbinding()] param( [string]$CompareString, [string]$DifferenceString ) # Collect the string lengths $compareStringLength = $CompareString.Length $differenceStringLength = $DifferenceString.Length ##Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'" ##Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'" # If either of the strings are empty the LD is known and we can stop if($compareStringLength -eq 0){return $differenceStringLength} if($differenceStringLength -eq 0){return $compareStringLength} # Build the comparison matrix and populate the first column and first row. $comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1) # Populate the first row for($index=0; $index -le $compareStringLength; $index++){ $comparisonMatrix[0,$index]=$index } # Populate the first column for($index=0; $index -le $differenceStringLength; $index++){ $comparisonMatrix[$index,0]=$index } # Calculate the Levenshtein distance by working through each position in the matrix. # Working the columns for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++){ # Cycle each character in the list $compareCharacter = $compareString[$columnIndex-1] # Working the rows for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++){ # Cycle each character in the list $differenceCharacter = $differenceString[$rowIndex-1] ##Write-Verbose "Matrix location: [$rowIndex, $columnIndex]" ##Write-Verbose "Compare character: $compareCharacter - Difference character: $differenceCharacter" # Calculate the cost $cost=if($compareCharacter -ceq $differenceCharacter){0}else{1} ##Write-Verbose "Cost: $cost" # The cell immediately above plus 1 $cellAbove = $comparisonMatrix[($rowIndex-1), $columnIndex] + 1 ##Write-Verbose "Cell Above: [$($rowIndex-1), $columnIndex] + 1 = $cellAbove" # The cell immediately to the left plus 1 $cellLeft = $comparisonMatrix[$rowIndex,($columnIndex-1)] + 1 ##Write-Verbose "Cell Left: [$rowIndex,$($columnIndex-1)] + 1 = $cellLeft" # The cell diagonally above and to the left plus the cost $cellUpperLeft = $comparisonMatrix[($rowIndex-1),($columnIndex-1)] + $cost ##Write-Verbose "Cell Upper Left: [$($rowIndex-1),$($columnIndex-1)] + cost($cost) = $cellUpperLeft" # Select minumum of the of the last 3 cells calculations and assign it to the current matrix position. # $comparisonMatrix[$rowIndex,$columnIndex] = $cellAbove,$cellLeft,$cellUpperLeft | Measure-Object -Minimum | select -ExpandProperty Minimum $comparisonMatrix[$rowIndex,$columnIndex] = [math]::Min([math]::Min($cellAbove,$cellLeft),$cellUpperLeft) } } # The cacluated LD will now be in the bottom right of the matrix. return $comparisonMatrix[$differenceStringLength,$compareStringLength] } function Get-LevenshteinDistanceOrig+{ [cmdletbinding()] param( [string]$CompareString, [string]$DifferenceString ) # Collect the string lengths $compareStringLength = $CompareString.Length $differenceStringLength = $DifferenceString.Length ##Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'" ##Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'" # If either of the strings are empty the LD is known and we can stop if($compareStringLength -eq 0){return $differenceStringLength} if($differenceStringLength -eq 0){return $compareStringLength} # Build the comparison matrix and populate the first column and first row. $comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1) # Populate the first row for($index=0; $index -le $compareStringLength; $index++){ $comparisonMatrix[0,$index]=$index } # Populate the first column for($index=0; $index -le $differenceStringLength; $index++){ $comparisonMatrix[$index,0]=$index } # Calculate the Levenshtein distance by working through each position in the matrix. # Working the columns for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++){ # Cycle each character in the list $compareCharacter = $compareString[$columnIndex-1] # Working the rows for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++){ # Cycle each character in the list $differenceCharacter = $differenceString[$rowIndex-1] ##Write-Verbose "Matrix location: [$rowIndex, $columnIndex]" ##Write-Verbose "Compare character: $compareCharacter - Difference character: $differenceCharacter" # Calculate the cost $cost=if($compareCharacter -ceq $differenceCharacter){0}else{1} ##Write-Verbose "Cost: $cost" # The cell immediately above plus 1 $cellAbove = $comparisonMatrix[($rowIndex-1), $columnIndex] + 1 ##Write-Verbose "Cell Above: [$($rowIndex-1), $columnIndex] + 1 = $cellAbove" # The cell immediately to the left plus 1 $cellLeft = $comparisonMatrix[$rowIndex,($columnIndex-1)] + 1 ##Write-Verbose "Cell Left: [$rowIndex,$($columnIndex-1)] + 1 = $cellLeft" # The cell diagonally above and to the left plus the cost $cellUpperLeft = $comparisonMatrix[($rowIndex-1),($columnIndex-1)] + $cost ##Write-Verbose "Cell Upper Left: [$($rowIndex-1),$($columnIndex-1)] + cost($cost) = $cellUpperLeft" # Select minumum of the of the last 3 cells calculations and assign it to the current matrix position. $comparisonMatrix[$rowIndex,$columnIndex] = $cellAbove,$cellLeft,$cellUpperLeft | Measure-Object -Minimum | select -ExpandProperty Minimum } } # The cacluated LD will now be in the bottom right of the matrix. return $comparisonMatrix[$differenceStringLength,$compareStringLength] } function Get-LevenshteinDistanceOrig{ [cmdletbinding()] param( [string]$CompareString, [string]$DifferenceString ) # Collect the string lengths $compareStringLength = $CompareString.Length $differenceStringLength = $DifferenceString.Length Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'" Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'" # If either of the strings are empty the LD is known and we can stop if($compareStringLength -eq 0){return $differenceStringLength} if($differenceStringLength -eq 0){return $compareStringLength} # Build the comparison matrix and populate the first column and first row. $comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1) # Populate the first row for($index=0; $index -le $compareStringLength; $index++){ $comparisonMatrix[0,$index]=$index } # Populate the first column for($index=0; $index -le $differenceStringLength; $index++){ $comparisonMatrix[$index,0]=$index } # Calculate the Levenshtein distance by working through each position in the matrix. # Working the columns for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++){ # Cycle each character in the list $compareCharacter = $compareString[$columnIndex-1] # Working the rows for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++){ # Cycle each character in the list $differenceCharacter = $differenceString[$rowIndex-1] Write-Verbose "Matrix location: [$rowIndex, $columnIndex]" Write-Verbose "Compare character: $compareCharacter - Difference character: $differenceCharacter" # Calculate the cost $cost=if($compareCharacter -ceq $differenceCharacter){0}else{1} Write-Verbose "Cost: $cost" # The cell immediately above plus 1 $cellAbove = $comparisonMatrix[($rowIndex-1), $columnIndex] + 1 Write-Verbose "Cell Above: [$($rowIndex-1), $columnIndex] + 1 = $cellAbove" # The cell immediately to the left plus 1 $cellLeft = $comparisonMatrix[$rowIndex,($columnIndex-1)] + 1 Write-Verbose "Cell Left: [$rowIndex,$($columnIndex-1)] + 1 = $cellLeft" # The cell diagonally above and to the left plus the cost $cellUpperLeft = $comparisonMatrix[($rowIndex-1),($columnIndex-1)] + $cost Write-Verbose "Cell Upper Left: [$($rowIndex-1),$($columnIndex-1)] + cost($cost) = $cellUpperLeft" # Select minumum of the of the last 3 cells calculations and assign it to the current matrix position. $comparisonMatrix[$rowIndex,$columnIndex] = $cellAbove,$cellLeft,$cellUpperLeft | Measure-Object -Minimum | select -ExpandProperty Minimum } } # The cacluated LD will now be in the bottom right of the matrix. return $comparisonMatrix[$differenceStringLength,$compareStringLength] }
Please note case sensitivity of the above functions. For case insensitive Levenshtein Distance:
- permanently: change
-ceq to -eq in their two occurrences, or - ad hoc, in a particular case: use
.ToUpper() or .ToLower() functions, e.g. as
Get-LevenshteinDistance -CompareString $strC.ToUpper() -DifferenceString $strD.ToUpper() # or Get-LevenshteinDistance -CompareString $strC.ToLower() -DifferenceString $strD.ToLower()
itteratingshould probably beiterating\$\endgroup\$