I need to group people who are driving together, based on GPS data. The data are collected by mobile phones. We receive each user's data in batches every 10 seconds. Each batch has a list of GPS samples (location, speed, direction) taken every 2 seconds.
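Here is a minimal sketch of the data model described above. The field names and units (seconds, degrees, m/s, heading clockwise from north) are assumptions. Batches can arrive in any order and may be resent, so the hypothetical helper `build_tracks` merges them into one time-sorted, de-duplicated track per user, which is the input a post-processing step would work from.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GpsSample:
    t: float        # device timestamp in seconds (assumed epoch seconds)
    lat: float      # degrees
    lon: float      # degrees
    speed: float    # m/s (assumed unit)
    heading: float  # degrees clockwise from north (assumed convention)


@dataclass
class Batch:
    user_id: str
    samples: list[GpsSample]  # ~5 samples: one every 2 s within a 10 s batch


def build_tracks(batches):
    """Merge batches (in any arrival order) into one time-sorted track per user.

    A sample with the same user and timestamp (e.g. from a resent batch) is kept once.
    """
    by_user = defaultdict(dict)
    for batch in batches:
        for s in batch.samples:
            by_user[batch.user_id][s.t] = s  # a duplicate timestamp overwrites the earlier copy
    return {u: sorted(d.values(), key=lambda s: s.t) for u, d in by_user.items()}
```

For example, passing a later batch before an earlier one (and the earlier one twice) still yields a single track ordered by timestamp.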
The ideal solution would be to process this data in real time and identify/update groups of people driving together. But we may receive data from users out of order (e.g., due to connectivity loss). We should eventually get all entries, but this makes real-time processing much more complicated.
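For the real-time variant, a common way to cope with out-of-order arrival is an event-time watermark with an allowed lateness. Samples are buffered and released in timestamp order only once they are older than (largest timestamp seen) minus (allowed lateness). Anything arriving behind the watermark is flagged as late instead of being reordered. The class below is a hypothetical, minimal sketch meant to be used as one instance per user stream. The lateness bound is an assumption you would tune to typical connectivity gaps, and late samples could be left to the post-processing path.

```python
import heapq
import itertools


class ReorderBuffer:
    """Release samples in timestamp order once they fall behind the watermark.

    watermark = largest timestamp seen so far - allowed_lateness.
    Released samples always have t < watermark, and new samples are only
    accepted if t >= watermark, so the released stream is never out of order.
    """

    def __init__(self, allowed_lateness):
        self.allowed_lateness = allowed_lateness
        self.max_seen = float("-inf")
        self._heap = []                # entries: (timestamp, seq, payload)
        self._seq = itertools.count()  # tie-breaker so payloads are never compared
        self.late = []                 # samples that arrived behind the watermark

    def watermark(self):
        return self.max_seen - self.allowed_lateness

    def add(self, t, payload):
        if t < self.watermark():
            self.late.append((t, payload))  # too late for the real-time path
            return
        self.max_seen = max(self.max_seen, t)
        heapq.heappush(self._heap, (t, next(self._seq), payload))

    def pop_ready(self):
        """Return all buffered samples with timestamp < watermark, oldest first."""
        ready = []
        wm = self.watermark()
        while self._heap and self._heap[0][0] < wm:
            t, _, payload = heapq.heappop(self._heap)
            ready.append((t, payload))
        return ready
```

The trade-off is latency versus completeness. A larger `allowed_lateness` means fewer samples are dropped to the late list, but groups are updated further behind real time.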
Instead, I want to start with post-processing. I plan to normalize each user's data over a given period using linear regression, so that all users have locations at the same timestamps, and then group users together with some clustering algorithm. Would this be a good approach? If so, which clustering algorithm would you recommend? Or are there better ways to solve this?
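Here is a rough sketch of that pipeline under stated assumptions. Each track is a time-sorted list of `(t, lat, lon)` tuples. For the normalization step it uses piecewise linear interpolation between neighbouring samples, rather than a single regression line over the whole period, assuming the goal is simply positions at common timestamps. Two users are then linked when the median great-circle distance between them on the shared grid is small, and groups are the connected components of those links (single linkage via union-find). The function names and the thresholds `max_dist_m=50.0` and `min_overlap=0.8` are hypothetical. Speed and heading are ignored here.

```python
import bisect
import math
from itertools import combinations

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def resample(track, grid):
    """Linearly interpolate a time-sorted track [(t, lat, lon), ...] onto grid times.

    Grid times outside the track's span become None (no extrapolation).
    Naive lat/lon interpolation is fine for 2 s gaps but not across the antimeridian.
    """
    times = [p[0] for p in track]
    out = []
    for t in grid:
        if not times or t < times[0] or t > times[-1]:
            out.append(None)
            continue
        i = bisect.bisect_left(times, t)
        if times[i] == t:  # exact sample at this grid time
            out.append((track[i][1], track[i][2]))
            continue
        (t0, lat0, lon0), (t1, lat1, lon1) = track[i - 1], track[i]
        w = (t - t0) / (t1 - t0)
        out.append((lat0 + w * (lat1 - lat0), lon0 + w * (lon1 - lon0)))
    return out


def group_users(tracks, grid, max_dist_m=50.0, min_overlap=0.8):
    """Group users whose resampled positions stay close over the grid.

    Two users are linked when both have positions on at least min_overlap of the
    grid and the median distance between them there is <= max_dist_m. Groups
    are the connected components of these links (single linkage, via union-find).
    """
    users = sorted(tracks)
    pos = {u: resample(tracks[u], grid) for u in users}
    parent = {u: u for u in users}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for a, b in combinations(users, 2):
        dists = sorted(
            haversine_m(pa[0], pa[1], pb[0], pb[1])
            for pa, pb in zip(pos[a], pos[b])
            if pa is not None and pb is not None
        )
        if not dists or len(dists) < min_overlap * len(grid):
            continue  # too little common time to judge
        if dists[len(dists) // 2] <= max_dist_m:  # (upper) median distance
            parent[find(a)] = find(b)

    groups = {}
    for u in users:
        groups.setdefault(find(u), []).append(u)
    return sorted(groups.values())
```

Using the median rather than the mean tolerates a few bad GPS fixes. Single linkage can chain groups together (A near B, B near C, A far from C), so checking group diameter may be worthwhile. An alternative is to feed the same pairwise distance matrix to scikit-learn's `DBSCAN` with `metric="precomputed"`.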