$\begingroup$

Using whatever definition of dissimilarity over sets you like, which two digits in MNIST are the most dissimilar? I was thinking that a reasonable approach would be to pass the two sets through some state-of-the-art VAE and then compare distances in the latent space. The dissimilarity metric could be the distance between the centroids of the clusters (easiest), all pairwise distances between points in different clusters ($O(n^2)$), or one of the other set-based measures of difference.

Has there been any research on this (to the extent you even need research to answer the question)? Does anyone have an idea of what the answer might be? I've seen t-SNE plots of the latent space for some VAEs, but they're not exactly rigorous measures of distance or dissimilarity.

$\endgroup$

1 Answer

$\begingroup$

I'm not sure this counts as a "study", but I have used PCA to project the MNIST dataset down to two dimensions for visualization:

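    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.decomposition import PCA

    # ziptrain: array with the digit label in column 0 and pixel values in the
    # remaining columns (as implied by the indexing below)
    pca = PCA(n_components=2)
    pca.fit(ziptrain[:, 1:])
    Z_train = pca.transform(ziptrain[:, 1:])

    fig, ax = plt.subplots()
    for digit in np.unique(ziptrain[:, 0]):
        # Plot each sample using its own digit as the marker glyph
        x = Z_train[ziptrain[:, 0] == digit, 0]
        y = Z_train[ziptrain[:, 0] == digit, 1]
        ax.scatter(x, y, label=int(digit), alpha=1.0, edgecolors='none',
                   marker='${}$'.format(int(digit)))
    ax.legend()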

This yielded the following plot:

[Figure: MNIST data decomposed using PCA]

Visually, I would say that the 1s and 0s are very different. You could continue the study by treating each digit's points as a cluster and measuring the Euclidean distance between the cluster centroids.
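The centroid comparison could be sketched roughly as follows. This is only an illustration, not something I have run on the real data: the helper names `centroid_distances` and `most_distant_pair` are made up here, and the toy arrays stand in for the projected points and their labels.

```python
import numpy as np

def centroid_distances(Z, labels):
    """Return the class labels and the matrix of Euclidean distances
    between the class centroids of the points in Z."""
    classes = np.unique(labels)
    # One centroid (mean point) per class
    centroids = np.stack([Z[labels == c].mean(axis=0) for c in classes])
    # Broadcast to get every centroid-to-centroid difference vector
    diffs = centroids[:, None, :] - centroids[None, :, :]
    return classes, np.linalg.norm(diffs, axis=-1)

def most_distant_pair(classes, D):
    """Return the two classes whose centroids are farthest apart."""
    i, j = np.unravel_index(np.argmax(D), D.shape)
    return classes[i], classes[j], D[i, j]

# Toy stand-in data (hypothetical): three classes in a 2D embedding
Z_toy = np.array([[0, 0], [0, 2], [10, 0], [10, 2], [3, 4], [3, 6]], dtype=float)
labels_toy = np.array([0, 0, 1, 1, 2, 2])

classes, D = centroid_distances(Z_toy, labels_toy)
a, b, d = most_distant_pair(classes, D)
print(a, b, d)
```

With the variables from the snippet above, the call would presumably be `centroid_distances(Z_train, ziptrain[:, 0])`. Note that distances in a 2D PCA projection discard most of the variance, so the same function could also be applied to the raw pixels or to a learned latent space.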

It is important to note, though, that your definition of "different" will change the outcome of such an investigation, and that this is simply one approach.
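To make that concrete, here is a small sketch on made-up toy data where two set-level measures disagree about which pair of classes is farther apart. The function names and the points are hypothetical; the second measure is the all-pairs ($O(n^2)$) option mentioned in the question, summarized here by its mean, which is one choice among several.

```python
import numpy as np

def centroid_distance(A, B):
    """Euclidean distance between the mean points of two sets."""
    return float(np.linalg.norm(A.mean(axis=0) - B.mean(axis=0)))

def mean_pairwise_distance(A, B):
    """Mean Euclidean distance over every (a, b) pair with a in A, b in B.
    Builds a |A| x |B| x d array, so subsample first for large sets."""
    diffs = A[:, None, :] - B[None, :, :]
    return float(np.linalg.norm(diffs, axis=-1).mean())

# Toy sets (hypothetical): B is compact and far away, C is closer on
# average position but very spread out
A = np.array([[0.0, 1.0], [0.0, -1.0]])
B = np.array([[4.0, 1.0], [4.0, -1.0]])
C = np.array([[3.0, 10.0], [3.0, -10.0]])

for name, X in [("B", B), ("C", C)]:
    print(name, centroid_distance(A, X), mean_pairwise_distance(A, X))
```

By centroid distance, A is farther from B than from C; by mean pairwise distance, the ordering flips, because the centroid ignores how spread out each set is.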

$\endgroup$
