0
$\begingroup$

I am looking for a library that implements a pairwise ranking algorithm. For example, if I have 200 writing samples from 100 people (two samples from each individual) and I want to identify which samples belong together (i.e., were written by the same person), what library could I use?

$\endgroup$
6
  • $\begingroup$ Do you have details about the number of samples written by a single person? Is it 200 in total, or 200 from each person? $\endgroup$ Commented Jul 13, 2016 at 8:51
  • $\begingroup$ It is 200 in total (i.e., two samples per person). $\endgroup$ Commented Jul 13, 2016 at 12:16
  • $\begingroup$ Do you just want to match each person to their handwriting? Or do you want a ranking that gives the highest priority to the closest matches? $\endgroup$ Commented Jul 13, 2016 at 12:51
  • $\begingroup$ Just a match. E.g., if I have person_1_writing_sample_1, person_1_writing_sample_2, person_2_writing_sample_1, and person_2_writing_sample_2, I want to match the first two with each other and the last two with each other. $\endgroup$ Commented Jul 13, 2016 at 13:04
  • $\begingroup$ Try k-means with 100 clusters. You should be able to find a library for it in every language. $\endgroup$ Commented Jul 13, 2016 at 18:50

1 Answer

0
$\begingroup$

If you can transform those writing samples into numeric vectors (e.g. a bag-of-words or tf-idf representation), you could use the k-means or hierarchical clustering functionality from Orange, a machine learning library with a GUI, written in Python.

It also has an add-on specifically for text mining, but I can't vouch for it as I haven't tried it yet.
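To make the vectorization step concrete, here is a minimal sketch using only the Python standard library. It is not Orange's API: the helper names (`tfidf_vectors`, `cosine`, `nearest`), the whitespace tokenizer, and the four toy samples are all made up for illustration. Because each person contributes exactly two samples, it looks up each sample's nearest neighbour by cosine similarity as a simple stand-in for a full clustering run.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using plain tf * idf."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy data: samples 0 and 2 share a topic, as do 1 and 3.
samples = [
    "the rain kept falling on the old harbour",
    "stock prices rose sharply after the announcement",
    "rain was falling over the harbour all night",
    "the announcement sent stock prices higher",
]
vectors = tfidf_vectors(samples)


def nearest(i):
    """Index of the sample most similar to sample i (excluding itself)."""
    others = (j for j in range(len(vectors)) if j != i)
    return max(others, key=lambda j: cosine(vectors[i], vectors[j]))


for i in range(len(samples)):
    print(i, "->", nearest(i))
```

With real writing samples, topic words alone may not identify an author, so features such as character n-grams or function-word counts might work better than whole words; that choice is outside what this sketch shows.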

$\endgroup$
1
  • $\begingroup$ Thanks. Ultimately, I decided to go with difference metrics (Jaccard, etc.). $\endgroup$ Commented Jul 26, 2016 at 19:44
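For readers wondering what the Jaccard route might look like: the comment above does not say how the metric was applied, so the following is only a sketch under assumptions. It treats each sample as a set of lowercased words, scores every pair with Jaccard similarity, and greedily pairs the most similar unpaired samples. The function names and the sample texts are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_samples(samples):
    """Greedily pair samples, most similar pair first.

    `samples` maps a sample name to its text. Each sample ends up in at
    most one pair; greedy pairing is not guaranteed to be globally optimal.
    """
    token_sets = {name: set(text.lower().split()) for name, text in samples.items()}
    scored = sorted(
        ((jaccard(token_sets[a], token_sets[b]), a, b)
         for a, b in combinations(token_sets, 2)),
        reverse=True,
    )
    paired, pairs = set(), []
    for _score, a, b in scored:
        if a not in paired and b not in paired:
            pairs.append((a, b))
            paired.update((a, b))
    return pairs


# Hypothetical toy data: a1/a2 and b1/b2 are meant to belong together.
samples = {
    "a1": "we sailed north along the coast at dawn",
    "b1": "the recipe calls for butter flour and sugar",
    "a2": "at dawn we sailed along the rocky coast",
    "b2": "mix the butter and sugar then add flour",
}
pairs = pair_samples(samples)
print(pairs)
```

With 200 samples there are only 19,900 pairs to score, so the brute-force comparison is cheap. If the greedy choice causes mismatches, an optimal matching algorithm could replace the loop.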
