Because I am new to data analysis with Python, I want to improve my skills by working through tutorials and adapting working code from others.
At the moment I am working with the fruit_data_with_colors data set and want to understand the Python code, available at:
One of the first examples shows a scatter matrix of the numeric input variables (height, width, mass, color). With the code below, the colors in the plot are purple, brown, yellow and black. I would like to change them to more appealing colors (e.g. red, blue, green and black).
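As a sketch of where the default colors likely come from: with `c = y` and `cmap = gnuplot`, `scatter` normalizes the label values to the range 0 to 1 and looks each one up in the colormap. Assuming the four fruit labels are the integers 1 to 4, sampling `gnuplot` the same way should give roughly black, purple, brown and yellow:

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

# Assumption: fruit_label takes the integer values 1 to 4.
labels = [1, 2, 3, 4]
cmap = plt.get_cmap("gnuplot")
norm = Normalize(vmin=min(labels), vmax=max(labels))

# Map each label to [0, 1], then look it up in the colormap (RGBA tuple).
colors = [cmap(norm(label)) for label in labels]
for label, rgba in zip(labels, colors):
    print(label, tuple(round(float(v), 2) for v in rgba))
```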
I looked at the matplotlib documentation for scatter and think that I need to adjust the "c = y" part of my code: https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.scatter.html
Trying "c = ['blue']" worked well, but if I add another color, as in "c = ['blue', 'red']", the following error occurs:
ValueError: 'c' argument has 2 elements, which is not acceptable for use with 'x' with size 59, 'y' with size 59.
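The error indicates that `c` must be either a single color or one entry per point (59 here), so a two-element list is rejected. A minimal sketch of the per-point approach, using a hypothetical six-row toy DataFrame and an assumed label-to-color mapping, builds one color per row with `Series.map` and passes that to `scatter`:

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data standing in for two numeric columns of the fruit data.
df = pd.DataFrame({
    "width": [7.1, 6.0, 7.3, 5.8, 8.4, 6.2],
    "height": [7.3, 4.6, 7.1, 4.0, 9.2, 7.5],
    "fruit_label": [1, 2, 1, 2, 3, 4],
})

# Assumed mapping: one color per class label.
label_colors = {1: "red", 2: "blue", 3: "green", 4: "black"}

# c needs one color per point, so translate every row's label into a color.
point_colors = df["fruit_label"].map(label_colors)

fig, ax = plt.subplots()
ax.scatter(df["width"], df["height"], c=point_colors, marker="o", s=40)
ax.set_xlabel("width")
ax.set_ylabel("height")
```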
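```python
import pandas as pd
import matplotlib.pyplot as plt

# fruits (a DataFrame) and feature_names are assumed to be defined earlier, as in the tutorial
X = fruits[feature_names]
y = fruits['fruit_label']

# gnuplot colormap; plt.get_cmap replaces cm.get_cmap, which newer Matplotlib removed
cmap = plt.get_cmap('gnuplot')

# pd.scatter_matrix now lives at pd.plotting.scatter_matrix
scatter = pd.plotting.scatter_matrix(X, c=y, marker='o', s=40, hist_kwds={'bins': 15},
                                     figsize=(9, 9), cmap=cmap)
plt.suptitle('Scatter-matrix for each input variable')
plt.savefig('fruits_scatter_matrix')
```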
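To keep `c = y` in `scatter_matrix`, one option is to swap the `gnuplot` colormap for a `ListedColormap` holding exactly the four desired colors. The sketch below assumes the labels are 1 to 4 and uses a small hypothetical DataFrame in place of `fruits[feature_names]`:

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.colors import ListedColormap

# Hypothetical toy frame; in the question this would be fruits[feature_names].
X = pd.DataFrame({
    "mass": [192, 180, 86, 84, 80, 362],
    "width": [8.4, 8.0, 6.2, 6.0, 5.8, 9.6],
    "height": [7.3, 6.8, 4.7, 4.6, 4.3, 9.2],
})
y = pd.Series([1, 1, 2, 2, 3, 4], name="fruit_label")

# One color per class: labels 1..4 normalize to 0, 1/3, 2/3, 1,
# which select the four listed colors in order.
cmap = ListedColormap(["red", "blue", "green", "black"])

axes = pd.plotting.scatter_matrix(
    X, c=y, marker="o", s=40, hist_kwds={"bins": 15}, figsize=(9, 9), cmap=cmap
)
plt.suptitle("Scatter-matrix for each input variable")
```

With the real data, replacing the `gnuplot` colormap line with the `ListedColormap` line should be the main change needed; the order of the list decides which label gets which color.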