I'm performing Principal Component Analysis (PCA) on a dataset containing samples from four materials (a, b, c, d) with varying impurity levels (10%, 20%, 30%, and 40%). Additionally, there's a pure sample (p).
I'm aiming to create a score plot using R to visualize the separation of these samples based on their principal components. I'd like to effectively represent both the sample type (a, b, c, d, p) and the impurity level (10%, 20%, 30%, 40%) within the plot.
I'm looking for advice on how to best encode this dual information (sample type and impurity level) visually in the score plot. Are there established best practices or recommendations for representing multiple categorical variables in a PCA score plot?
Additional details
• The data has 949 rows and 17 columns: one column per sample, i.e. materials a, b, c, and d at each of the four impurity levels (16 columns), plus the pure sample p.
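
A minimal sketch of one possible encoding, in base R: colour for the material and point shape for the impurity level. Everything here is an assumption for illustration. The data are simulated stand-ins, the column names (`a_10`, ..., `d_40`, `p`) are hypothetical, the 949 rows are assumed to be the measured variables (so the matrix is transposed to put samples in rows), and the pure sample is treated as 0% impurity.

```r
set.seed(1)

# Hypothetical sample metadata: 4 materials x 4 impurity levels + 1 pure sample
meta <- data.frame(
  material = c(rep(c("a", "b", "c", "d"), each = 4), "p"),
  impurity = c(rep(c(10, 20, 30, 40), times = 4), 0)  # pure sample assumed 0%
)
sample_names <- ifelse(meta$material == "p", "p",
                       paste0(meta$material, "_", meta$impurity))

# Simulated stand-in for the real data: 949 rows (variables) x 17 columns (samples)
X <- matrix(rnorm(949 * 17), nrow = 949,
            dimnames = list(NULL, sample_names))

# prcomp expects observations in rows, so transpose to get one row per sample
pca <- prcomp(t(X), center = TRUE, scale. = FALSE)
scores <- as.data.frame(pca$x[, 1:2])

# Fraction of variance explained by each component, used in the axis labels
var_expl <- pca$sdev^2 / sum(pca$sdev^2)

# Colour encodes material; point shape encodes impurity level
cols <- c(a = "#1b9e77", b = "#d95f02", c = "#7570b3", d = "#e7298a", p = "black")
pchs <- c("0" = 8, "10" = 15, "20" = 16, "30" = 17, "40" = 18)

plot(scores$PC1, scores$PC2,
     col = cols[as.character(meta$material)],
     pch = pchs[as.character(meta$impurity)],
     cex = 1.5,
     xlab = sprintf("PC1 (%.1f%%)", 100 * var_expl[1]),
     ylab = sprintf("PC2 (%.1f%%)", 100 * var_expl[2]),
     main = "PCA score plot")

# Two separate legends, one per encoded variable
legend("topright", legend = names(cols), col = cols, pch = 16, title = "Material")
legend("bottomright", legend = paste0(names(pchs), "%"), pch = pchs, title = "Impurity")
```

With only 17 points, colour plus shape should stay readable. Since impurity is ordered, mapping it to point size or to a light-to-dark shade within each material's colour would be an alternative worth comparing.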
Any insights or suggestions would be greatly appreciated.