I'm trying to locally replicate the pair classification task of MMTEB/MTEB. However, I didn't find train/dev sets for all datasets in this task.
Table 2 in the original MTEB paper (Muennighoff et al., 2023) shows that none of the three pair classification datasets has a training set and that only SprintDuplicateQuestions has a dev set.
However, the original MTEB paper also states on page 3 that an optimal binary threshold is determined:
"A pair of text inputs is provided and a label needs to be assigned. Labels are typically binary variables denoting duplicate or paraphrase pairs. The two texts are embedded and their distance is computed with various metrics (cosine similarity, dot product, euclidean distance, manhattan distance). Using the best binary threshold accuracy, average precision, f1, precision and recall are computed. The average precision score based on cosine similarity is the main metric."
So which data is used to determine that threshold, given that most of these datasets have neither a train nor a dev split?
Also, on page 3, Muennighoff et al. (2023) say that several distance metrics and several performance metrics are used (see the quote above). Given that multiple metrics are applied, what is the exact algorithm for selecting the threshold?
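
To make the question concrete, here is a minimal sketch of how I currently read the quoted procedure for a single metric (cosine similarity). It is not taken from the MTEB code. Two things in it are my own assumptions rather than anything the paper confirms: the threshold is tuned on the very same pairs that are then scored, and each distance metric would get its own independent threshold sweep. The function names (`cosine_similarity`, `best_accuracy_threshold`, `average_precision`) and the toy vectors are hypothetical.

```python
import numpy as np


def cosine_similarity(a, b):
    """Row-wise cosine similarity between two (n, d) embedding matrices."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a * b, axis=1)


def best_accuracy_threshold(scores, labels):
    """Sweep cutoffs halfway between consecutive sorted scores.

    Pairs scoring above the cutoff are predicted positive (label 1).
    ASSUMPTION: the sweep runs on the same pairs that are evaluated.
    Returns (best_accuracy, threshold).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    order = np.argsort(-scores)          # highest similarity first
    s, y = scores[order], labels[order]
    n = len(y)
    positives_above = 0                  # true positives above the cutoff
    negatives_below = int(np.sum(y == 0))  # true negatives below the cutoff
    best_acc, best_thr = -1.0, None
    for i in range(n - 1):
        if y[i] == 1:
            positives_above += 1
        else:
            negatives_below -= 1
        acc = (positives_above + negatives_below) / n
        if acc > best_acc:
            best_acc = acc
            best_thr = (s[i] + s[i + 1]) / 2
    return best_acc, best_thr


def average_precision(scores, labels):
    """Threshold-free average precision over the similarity ranking."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    y = labels[np.argsort(-scores)]
    hits = np.cumsum(y)
    ranks = np.arange(1, len(y) + 1)
    return float(np.sum((hits / ranks) * y) / np.sum(y))


# Toy data: four pairs of 2-d "embeddings", the first two are duplicates.
left = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
right = np.array([[1.0, 0.1], [1.0, 0.2], [0.0, 1.0], [-1.0, 0.0]])
labels = np.array([1, 1, 0, 0])

scores = cosine_similarity(left, right)
acc, thr = best_accuracy_threshold(scores, labels)
ap = average_precision(scores, labels)
print(f"accuracy={acc:.2f} threshold={thr:.3f} average_precision={ap:.2f}")
```

In this reading, average precision never needs a threshold at all, since it only depends on the ranking of the pairs, so the threshold would only matter for accuracy, F1, precision and recall. For distance metrics such as Euclidean or Manhattan, I would expect the same sweep with the sign of the score flipped. Is that what MTEB actually does?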