What are the Most Dissimilar MNIST Digits?

Question

Using whatever definition of dissimilarity over sets you'd like, what are the two most dissimilar digits in MNIST? I was thinking that a reasonable approach would be to pass the two sets through some state-of-the-art VAE and then compare distances in the latent space. The dissimilarity metric could be the distance between the cluster centroids (easiest), the distance from every point in one cluster to every point in the other ($O(n^2)$), or some other measure of difference between sets.
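A minimal sketch of the two simplest options above, assuming the latent codes are already computed: `embeddings` is a hypothetical (n, d) NumPy array of encoder outputs (from a VAE or any other model) and `labels` holds the n digit labels. The function names are illustrative, not from any library.

```python
import numpy as np


def centroid_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance between the centroids (means) of two point sets."""
    return float(np.linalg.norm(a.mean(axis=0) - b.mean(axis=0)))


def mean_pairwise_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Average Euclidean distance over all len(a) * len(b) cross-set pairs (the O(n^2) option)."""
    # Broadcasting builds a (len(a), len(b), d) array of differences.
    diffs = a[:, None, :] - b[None, :, :]
    return float(np.linalg.norm(diffs, axis=-1).mean())


def most_dissimilar_pair(embeddings: np.ndarray, labels: np.ndarray, metric=centroid_distance):
    """Return (digit_i, digit_j, score) for the pair of classes with the largest metric value."""
    classes = np.unique(labels)
    best = None
    for i, ci in enumerate(classes):
        for cj in classes[i + 1:]:
            score = metric(embeddings[labels == ci], embeddings[labels == cj])
            if best is None or score > best[2]:
                best = (ci, cj, score)
    return best
```

Calling `most_dissimilar_pair(embeddings, labels, metric=mean_pairwise_distance)` switches to the all-pairs measure. Note that `mean_pairwise_distance` materializes an array of size len(a) × len(b) × d; for full MNIST classes (roughly 6,000 training images each) you would probably want to process `a` in chunks.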

Has there been any research done on this (to the extent you even need research to answer the question)? Does anyone have an idea what the answer might be? I've seen t-SNE plots of the latent space for some VAEs, but they're not exactly rigorous measures of distance or dissimilarity.

Oliver Foster · Accepted Answer · 2020-10-21 18:44:23Z

Not sure if this constitutes a "study", but I have used PCA to project the MNIST dataset into 2D for visualization:

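    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.decomposition import PCA

    # ziptrain: column 0 is the digit label, the remaining columns are pixels (loading not shown)
    pca = PCA(n_components=2)
    pca.fit(ziptrain[:, 1:])
    Z_train = pca.transform(ziptrain[:, 1:])  # project each image onto 2 components

    fig, ax = plt.subplots()
    for digit in np.unique(ziptrain[:, 0]):
        mask = ziptrain[:, 0] == digit
        # Draw each class using its own numeral as the marker
        ax.scatter(Z_train[mask, 0], Z_train[mask, 1], label=str(int(digit)),
                   alpha=1.0, edgecolors='none', marker='${}$'.format(int(digit)))
    ax.legend()
    plt.show()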

This yielded the following plot:

Visually, I would say that 1s and 0s are very different. You could continue the study by clustering the points and measuring the Euclidean distance between the centroids of the respective clusters.
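A rough sketch of that next step, under two stated assumptions: the true digit labels stand in for the clusters (no clustering algorithm is actually run), and PCA is computed directly with NumPy's SVD instead of scikit-learn. The projection should match scikit-learn's `PCA(n_components=2)` up to the sign of each component, which does not change any distances. `x` and `labels` are hypothetical inputs (pixel matrix and digit labels).

```python
import numpy as np


def pca_2d(x: np.ndarray) -> np.ndarray:
    """Project the rows of x onto their top two principal components (PCA via SVD)."""
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


def centroid_distances(z: np.ndarray, labels: np.ndarray):
    """Per-digit centroids in the projected space and their pairwise Euclidean distances."""
    digits = np.unique(labels)
    centroids = np.stack([z[labels == d].mean(axis=0) for d in digits])
    dist = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    return digits, dist


def farthest_digits(x: np.ndarray, labels: np.ndarray):
    """Return (digit_i, digit_j, distance) for the two most distant class centroids."""
    z = pca_2d(x)
    digits, dist = centroid_distances(z, labels)
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    return digits[i], digits[j], float(dist[i, j])
```

Distances measured in only two components discard most of the variance, so it may be worth repeating the comparison with more components (change `vt[:2]`) before drawing conclusions.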

It is important to note, though, that your definition of "different" will change the outcome of such an investigation, and that this is only one approach.
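To illustrate how much the definition matters, here is a small sketch comparing centroid distance with the energy distance statistic 2E|X-Y| - E|X-X'| - E|Y-Y'| (some libraries report its square root). Energy distance is a swapped-in example of a set-level measure, not something used in the answer above. Two sets with identical centroids get a centroid distance of 0 but a positive energy distance.

```python
import numpy as np


def _mean_dist(a: np.ndarray, b: np.ndarray) -> float:
    """Mean Euclidean distance over all (a_i, b_j) pairs, including i == j when a is b."""
    return float(np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1).mean())


def energy_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Energy distance statistic (V-statistic form): 2E|X-Y| - E|X-X'| - E|Y-Y'|."""
    return 2 * _mean_dist(a, b) - _mean_dist(a, a) - _mean_dist(b, b)


def centroid_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance between the means of the two sets."""
    return float(np.linalg.norm(a.mean(axis=0) - b.mean(axis=0)))


# A horizontal pair and a vertical pair of points, both centred on the origin.
horizontal = np.array([[-1.0, 0.0], [1.0, 0.0]])
vertical = np.array([[0.0, -1.0], [0.0, 1.0]])
print(centroid_distance(horizontal, vertical))  # 0.0: centroids coincide
print(energy_distance(horizontal, vertical))    # about 0.83, i.e. 2*sqrt(2) - 2
```

A pair of digits that looks far apart by one measure can therefore rank quite differently under another, which is exactly why the choice of definition has to come first.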
