Where can I find a software library for pairwise matching (ideally, Python, R, Java)?

Question

I am looking for a library that implements a pairwise ranking algorithm. For example, if I have 200 writing samples from 100 people (two samples from each individual) and I want to identify which samples belong together (i.e., were written by the same person), what library could I use?

Do you have details about the number of samples written by a single person? Is it 200 in total, or 200 from each person?
– Hima Varsha, Commented Jul 13, 2016 at 8:51
Do you just want to match each person to their writing? Or a ranking that gives the highest priority to the strongest matches?
– Hima Varsha, Commented Jul 13, 2016 at 12:51
Just a match. E.g., if I have person_1_writing_sample_1, person_1_writing_sample_2, person_2_writing_sample_1, and person_2_writing_sample_2, I want to match the first two with each other and the last two with each other.
– You_got_it, Commented Jul 13, 2016 at 13:04
Try k-means with 100 clusters. You should be able to find a library for it in every language.
– Emre, Commented Jul 13, 2016 at 18:50

K3---rnc · Accepted Answer · 2016-07-14 20:11:51Z

If you can transform the writing samples into numeric vectors (e.g. a bag-of-words or tf-idf representation), I guess you could use the k-means or hierarchical clustering functionality in Orange, a GUI and machine learning library written in Python.

Orange also has an add-on specifically for text mining, but I can't vouch for it as I haven't tried it yet.
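
For illustration, here is a minimal pure-Python sketch of the vectorize-then-cluster idea. It does not use Orange's API: it builds tf-idf vectors by hand and merges samples with average-linkage hierarchical clustering on cosine similarity. The function names (`tfidf_vectors`, `cosine`, `agglomerate`), the whitespace tokenizer, the smoothed idf formula, and the four sample texts are all assumptions made up for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn each document into a sparse tf-idf vector (dict: term -> weight)."""
    # Assumption: naive lowercase/whitespace tokenization is good enough here.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed idf, so terms found in many documents keep a small weight.
        vectors.append(
            {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def agglomerate(vectors, n_clusters):
    """Average-linkage hierarchical clustering; returns lists of sample indices."""
    sim = [[cosine(u, v) for v in vectors] for u in vectors]
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        # Mean pairwise similarity between the members of two clusters.
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the two most similar clusters (x < y) and merge y into x.
        x, y = max(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters.pop(y)
        clusters[x].extend(merged)
    return clusters


# Hypothetical toy data: samples 0 and 1 share vocabulary, as do samples 2 and 3.
samples = [
    "the cat sat on the mat with the cat",
    "a cat and a mat and the cat",
    "stock markets fell sharply on monday",
    "markets and stock prices fell again",
]
clusters = agglomerate(tfidf_vectors(samples), n_clusters=2)
print(clusters)
```

Keep in mind that neither k-means nor hierarchical clustering forces clusters of exactly two, so with 200 real samples and 100 clusters some clusters may come out with one or three members and need re-pairing afterwards.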

Thanks. Ultimately, I decided to go with difference metrics (Jaccard, etc.).
– You_got_it, Commented Jul 26, 2016 at 19:44
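
The comment above doesn't say how the metric was turned into matches, so the following is only one possible sketch: compute the Jaccard similarity between every pair of samples and then pair them greedily, most similar first. The character-trigram features, the greedy strategy, the function names (`jaccard`, `char_ngrams`, `greedy_pairs`), and the sample texts are all assumptions for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    union = a | b
    # Assumption: two empty sets give no evidence of similarity, so return 0.
    return len(a & b) / len(union) if union else 0.0


def char_ngrams(text, n=3):
    """Set of character n-grams after lowercasing and collapsing whitespace."""
    text = " ".join(text.lower().split())
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def greedy_pairs(samples):
    """Pair samples greedily by descending Jaccard similarity.

    `samples` maps a sample name to its text. Greedy pairing is a heuristic:
    it is not guaranteed to find the globally best set of pairs.
    """
    feats = {name: char_ngrams(text) for name, text in samples.items()}
    # Score every unordered pair of samples, most similar first.
    scored = sorted(
        combinations(feats, 2),
        key=lambda p: jaccard(feats[p[0]], feats[p[1]]),
        reverse=True,
    )
    paired, pairs = set(), []
    for a, b in scored:
        # Accept a pair only if neither sample has been matched already.
        if a not in paired and b not in paired:
            pairs.append((a, b))
            paired.update((a, b))
    return pairs


# Hypothetical texts; the names follow the example given in the comments.
samples = {
    "person_1_writing_sample_1": "I reckon the weather will turn by evening, I reckon.",
    "person_1_writing_sample_2": "I reckon the train will be late by evening.",
    "person_2_writing_sample_1": "Per my last email, kindly revert at the earliest.",
    "person_2_writing_sample_2": "Kindly revert per my last email at the earliest convenience.",
}
pairs = greedy_pairs(samples)
for a, b in pairs:
    print(a, "<->", b)
```

Unlike clustering, this always yields pairs of exactly two. If the greedy choice turns out too crude, the same similarity scores could be fed to a proper minimum-cost matching algorithm instead.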
