Using whatever definition of dissimilarity between sets you like, which two digits in MNIST are the most dissimilar? I was thinking that a reasonable way to answer this would be to encode the images of each digit with some state-of-the-art VAE and then compare the resulting clusters in the latent space. The dissimilarity metric could be the distance between the cluster centroids (easiest), the distances from every point in one cluster to every point in the other ($O(n^2)$), or some other measure of difference between sets (e.g. the Hausdorff distance).
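
A minimal sketch of the two simplest metrics, assuming you already have latent codes `z` (one row per image) and their digit labels from some trained encoder. No particular VAE is assumed, the function names are hypothetical, and the data at the bottom are synthetic placeholders rather than real MNIST encodings, so they say nothing about the actual answer.

```python
import numpy as np

def centroid_distances(z, labels):
    """Euclidean distance between class centroids in latent space.

    z: (n, d) latent codes; labels: (n,) integer class labels.
    Returns the sorted class list and a (k, k) distance matrix.
    """
    classes = np.unique(labels)
    centroids = np.stack([z[labels == c].mean(axis=0) for c in classes])
    diff = centroids[:, None, :] - centroids[None, :, :]
    return classes, np.linalg.norm(diff, axis=-1)

def mean_pairwise_distances(z, labels):
    """Average distance from every point of class a to every point of class b.

    This is the O(n^2) all-pairs variant. It expands
    ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y to avoid building an
    (|A|, |B|, d) intermediate array. The diagonal holds the mean
    within-class distance, not zero.
    """
    classes = np.unique(labels)
    k = len(classes)
    out = np.zeros((k, k))
    for i, a in enumerate(classes):
        za = z[labels == a]
        for j, b in enumerate(classes):
            zb = z[labels == b]
            sq = (za ** 2).sum(1)[:, None] + (zb ** 2).sum(1)[None, :] - 2 * za @ zb.T
            # Clip tiny negative values caused by floating-point round-off.
            out[i, j] = np.sqrt(np.clip(sq, 0.0, None)).mean()
    return classes, out

def most_dissimilar_pair(classes, dist):
    """Return the two distinct classes with the largest distance, and that distance."""
    masked = dist.copy()
    np.fill_diagonal(masked, -np.inf)  # ignore self-comparisons
    i, j = np.unravel_index(np.argmax(masked), masked.shape)
    return classes[i], classes[j], dist[i, j]

# Synthetic stand-in for latent codes: 10 classes in 2-D, class c centered at (c, 0),
# so classes 0 and 9 are farthest apart by construction.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(10), 50)
z = np.stack([labels.astype(float), np.zeros(len(labels))], axis=1)
z += rng.normal(scale=0.1, size=z.shape)

print(most_dissimilar_pair(*centroid_distances(z, labels)))
print(most_dissimilar_pair(*mean_pairwise_distances(z, labels)))
```

On real data the two metrics can disagree: the centroid version ignores each cluster's spread, while the all-pairs average folds it in. For full MNIST classes (roughly 6,000 images each), subsampling each class keeps the all-pairs matrix small.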
Has any research been done on this (to the extent that you even need research to answer the question)? Does anyone have an idea what the answer might be? I've seen t-SNE plots of the latent space for some VAEs, but they're not exactly rigorous measures of distance or dissimilarity.
