Mole is a simple solution that demonstrates how to off-load the collecting of file system information, such as the lengths and attributes of files, by using a separate process.

Continue reading

# All posts by m1chal1s

# Calculating a hash code for a binary stream

Some time ago, as I was organizing my digital photographs, I found out I had a lot of duplicates.

One of the ways to check for perfect duplicates of files is to compare their hash codes.

Continue reading

# Locks

Here’s a simple example of double-checked locking in C#.

Continue reading

# HardRandom

The C# Random class is not thread-safe.

Continue reading

# Permutations

Have you ever wanted to enumerate through all the permutations of a set in C#?

Continue reading

# A fast implementation of the Levenshtein algorithm in C#

Given two strings, Levenshtein gives us the distance between these strings expressed as the minimum count of character operations (deletions, insertions, or substitutions) that are necessary to convert one string to the other.

To give a few examples, here are all the Levenshtein distances of the strings “Jack”, “Jake”, “Moe”, “Jo”, “Joe”, “Stew”, and “Stewart”:

First string | Second string | Levenshtein |
---|---|---|

Jack | Jake | 2 |

Jack | Moe | 4 |

Jack | Jo | 3 |

Jack | Joe | 3 |

Jack | Stew | 4 |

Jack | Stewart | 6 |

Jake | Moe | 3 |

Jake | Jo | 3 |

Jake | Joe | 2 |

Jake | Stew | 4 |

Jake | Stewart | 6 |

Moe | Jo | 2 |

Moe | Joe | 1 |

Moe | Stew | 3 |

Moe | Stewart | 6 |

Jo | Joe | 1 |

Jo | Stew | 4 |

Jo | Stewart | 7 |

Joe | Stew | 3 |

Joe | Stewart | 6 |

Stew | Stewart | 3 |

The Levenshtein value of two identical strings is always 0. That is why I have omitted all the rows where the first string is the same as the second string. Also, the Levenshtein function is commutative. That is, Levenshtein(s, t) equals Levenshtein(t, s).

Levenshtein is a rather simple algorithm – but it’s a useful one. It allows us to easily compare anything that can be expressed as a string.

There’s a rather good implementation of the Levenshtein algorithm on the Wikipedia page, but I’ve made a few changes of my own. Why? Well, even though the changes I’ve made are minor, my implementation is faster. In fact, depending on the hardware, my implementation is 5% to 20% faster.

To give an example, I’ve run both implementations (the one on Wikipedia, and mine) on a modern laptop with a 4th generation i7 processor 1073741824 (2^30) times – that is a bit more than a billion calls. This many calls take 00:09:10.1372578 (9 minutes and ≈10 seconds) on the Wikipedia algorithm, and 00:07:32.4470743 (7 minutes and ≈32 seconds) on mine. That’s 17,76% faster – an improvement of about 424 extra calls per millisecond.

Calls | Original (Wikipedia) | Improved | % |
---|---|---|---|

67108864 | 00:00:33.8682674 | 00:00:27.3321221 | 19,30% |

134217728 | 00:01:07.7492047 | 00:00:55.2858276 | 18,40% |

268435456 | 00:02:17.7560969 | 00:01:52.9447724 | 18,01% |

536870912 | 00:04:35.9671041 | 00:03:46.1908114 | 18,04% |

1073741824 | 00:09:10.1372578 | 00:07:32.4470743 | 17,76% |

So, without further ado, here’s the Levenshtein code:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
public static int Levenshtein(string s, string t) { if (s == t) return 0; if (s.Length == 0) return t.Length; if (t.Length == 0) return s.Length; var tLength = t.Length; var columns = tLength + 1; var v0 = new int[columns]; var v1 = new int[columns]; for (var i = 0; i < columns; i++) v0[i] = i; for (var i = 0; i < s.Length; i++) { v1[0] = i + 1; for (var j = 0; j < tLength; j++) { var cost = (s[i] == t[j]) ? 0 : 1; v1[j + 1] = Math.Min(Math.Min(v1[+j] + 1, v0[j + 1] + 1), v0[j] + cost); v0[j] = v1[j]; } v0[tLength] = v1[tLength]; } return v1[tLength]; } |

# Average

I am obsessed with code performance, and calculating the performance of long-running algorithms entails – a lot of the times – calculating averages. Continue reading