I noticed that the MNIST dataset for digit recognition is just a CSV file.
They don't provide the images.
https://www.kaggle.com/c/digit-recognizer/data
Is it possible to get the corresponding images for the dataset?
1 Answer
Those CSV files contain the actual data that would normally be in image format, with each pixel column holding the value of a single pixel in the image (a 28 x 28 image gives 784 pixels). See also the description of the linked dataset:
Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels in total. Each pixel has a single pixel-value associated with it, indicating the lightness or darkness of that pixel, with higher numbers meaning darker. This pixel-value is an integer between 0 and 255, inclusive.
If you need the data in image form, you can simply reshape each row of 784 pixel values to a shape of (28, 28) using np.reshape.
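As a minimal sketch of that reshape, assuming the label column has already been dropped so that a row holds exactly 784 pixel values in row-major order: the helper name `row_to_image` and the synthetic `fake_row` are made up for illustration and stand in for a real row read from the CSV.

```python
import numpy as np

def row_to_image(pixels, height=28, width=28):
    """Turn a flat sequence of pixel values into a 2-D uint8 image array."""
    flat = np.asarray(pixels, dtype=np.uint8)
    if flat.size != height * width:
        raise ValueError(f"expected {height * width} pixel values, got {flat.size}")
    # Default (row-major) order: flat index i * width + j lands at image[i, j].
    return flat.reshape(height, width)

# Synthetic stand-in for one CSV row (hypothetical data, label already removed).
fake_row = np.arange(784) % 256
image = row_to_image(fake_row)
print(image.shape)  # (28, 28)
```

The resulting array can then be displayed or saved with whatever imaging library you prefer, for example `matplotlib.pyplot.imshow(image, cmap="gray")`.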
- Thanks. I can use the training dataset for training my model, but what about testing? It seems the Kaggle website provides a test_data.csv which only contains the values of the independent variables and not the values of the dependent variable, so we don't know the output for those rows. One way could be to use 75% of the training dataset for training and the rest for testing. (Juan, Aug 4, 2021 at 10:04)
- That would indeed be the correct way: split the training dataset into a training and a validation/test dataset so you can optimize your model hyperparameters, and then predict on the actual test dataset provided by Kaggle to get your score. (Oxbowerce, Aug 4, 2021 at 10:18)
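The hold-out split discussed in these comments could look roughly like the sketch below. It uses only NumPy; the function name `holdout_split`, the 75% default, and the random arrays `X` and `y` are assumptions for illustration, standing in for the pixel columns and labels of the labelled training CSV.

```python
import numpy as np

def holdout_split(features, labels, train_fraction=0.75, seed=0):
    """Shuffle the rows and split them into training and validation parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))
    cut = int(len(features) * train_fraction)
    train_idx, val_idx = order[:cut], order[cut:]
    return features[train_idx], labels[train_idx], features[val_idx], labels[val_idx]

# Synthetic stand-in for the labelled training data: 100 rows of 784 pixels.
rng = np.random.default_rng(42)
X = rng.integers(0, 256, size=(100, 784))
y = rng.integers(0, 10, size=100)

X_train, y_train, X_val, y_val = holdout_split(X, y)
print(X_train.shape, X_val.shape)  # (75, 784) (25, 784)
```

Shuffling before cutting keeps the validation part from inheriting any ordering in the file; a stratified split that preserves the per-digit proportions would be a reasonable refinement.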