New York City provides tens of gigabytes of taxi trip data covering routes all over the city. What I'd like to do is use this data (or some other method) to come up with an algorithm that can take a person's GPS data over a short span of time (say, an hour) and answer the question: is this person driving a cab?
The algorithm should work in any location, not just NYC. The idea is to identify patterns that signal that a route being driven by a person is the kind of route a cab driver would take.
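To make "patterns" more concrete, below is a rough Ruby sketch of the kind of per-trace features I have in mind. It assumes a trace is an array of hashes with :t (seconds), :lat and :lon (decimal degrees). The helper names (haversine_m, trace_features) and the stop thresholds are hypothetical guesses, not values derived from the NYC data. The working assumption is that a cab makes frequent short stops for pickups and drop-offs, so stops per hour might be one signal among several.

```ruby
# Hypothetical feature extractor for a GPS trace. Each point is a hash with
# :t (seconds), :lat and :lon (decimal degrees).
EARTH_RADIUS_M = 6_371_000.0

# Great-circle distance in meters between two points (haversine formula).
def haversine_m(a, b)
  rad = Math::PI / 180
  dlat = (b[:lat] - a[:lat]) * rad
  dlon = (b[:lon] - a[:lon]) * rad
  h = Math.sin(dlat / 2)**2 +
      Math.cos(a[:lat] * rad) * Math.cos(b[:lat] * rad) * Math.sin(dlon / 2)**2
  2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h))
end

# Summarize a trace into simple features. A "stop" is a run of segments with
# speed below stop_speed_mps lasting at least min_stop_s seconds.
# The 1 m/s and 60 s defaults are assumptions, not tuned values.
def trace_features(points, stop_speed_mps: 1.0, min_stop_s: 60)
  raise ArgumentError, "need at least two points" if points.size < 2

  distance = 0.0
  stops = 0
  stop_started = nil # timestamp where the current slow run began

  points.each_cons(2) do |a, b|
    d = haversine_m(a, b)
    dt = b[:t] - a[:t]
    distance += d
    next if dt <= 0 # skip duplicate or out-of-order timestamps

    if d / dt < stop_speed_mps
      stop_started ||= a[:t]
    else
      # The slow run ended at point a; count it if it lasted long enough.
      stops += 1 if stop_started && a[:t] - stop_started >= min_stop_s
      stop_started = nil
    end
  end

  # Count a stop that is still in progress at the end of the trace.
  stops += 1 if stop_started && points.last[:t] - stop_started >= min_stop_s

  duration = points.last[:t] - points.first[:t]
  {
    distance_m: distance,
    stops: stops,
    duration_s: duration,
    stops_per_hour: duration > 0 ? stops * 3600.0 / duration : 0.0
  }
end
```

Features like these could then be computed for known cab traces and for ordinary driving and fed to a classifier; whether they actually separate the two is untested.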
Ideally, I'd like to write this in Ruby, but I am open to other suggestions, approaches, and implementations. Links to projects I should research, suggestions on languages to use, approaches to take, etc. are all appreciated.