$\begingroup$

I'm performing Principal Component Analysis (PCA) on a dataset containing samples from four materials (a, b, c, d) with varying impurity levels (10%, 20%, 30%, and 40%). Additionally, there's a pure sample (p).

I'm aiming to create a score plot using R to visualize the separation of these samples based on their principal components. I'd like to effectively represent both the sample type (a, b, c, d, p) and the impurity level (10%, 20%, 30%, 40%) within the plot.

I'm looking for advice on how to best encode this dual information (sample type and impurity level) visually in the score plot. Are there established best practices or recommendations for representing multiple categorical variables in a PCA score plot?

Additional details

• The data has 17 columns (materials a, b, c, and d at each of the four impurity levels, plus the pure sample p) and 949 rows.

Any insights or suggestions would be greatly appreciated.

$\endgroup$
  • $\begingroup$ Do you have multiple samples for each material and each impurity level? $\endgroup$ Commented Dec 24, 2024 at 10:53
  • $\begingroup$ Thank you. The data comes from an electronic nose equipped with eight different sensors. The data I'm discussing here is specifically from a single sensor. So, in that context, yes, I have multiple samples for each material and impurity level. Actually, I am also uncertain whether to analyze all samples together or separately. $\endgroup$ Commented Dec 24, 2024 at 11:40

1 Answer

$\begingroup$

You can use colour and shape to represent the two variables in your plot, for example colour for sample type and shape for impurity level. If this is too dense on a single plot, you could also split it into rows and/or columns of panels, each showing one level of one of the variables. If you're using ggplot2, facet_grid() or facet_wrap() will do this.
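As a rough illustration of the colour-plus-shape idea, here is a minimal sketch. It assumes the 17 columns are one per material/impurity combination plus the pure sample, so the matrix is transposed to make each sample one observation. The column names (`a10`, ..., `d40`, `p`), the simulated values, and the choice not to scale are all placeholders for illustration, not your actual data or a recommended preprocessing.

```r
library(ggplot2)

set.seed(1)

# Hypothetical column names: material letter + impurity level, plus pure sample "p"
sample_ids <- c(paste0(rep(c("a", "b", "c", "d"), each = 4),
                       rep(c(10, 20, 30, 40), times = 4)),
                "p")

# Simulated stand-in for the real data: 949 rows x 17 columns
X <- matrix(rnorm(949 * 17), nrow = 949,
            dimnames = list(NULL, sample_ids))

# Transpose so each sample (column) becomes one observation in the PCA.
# Whether to scale is a modelling choice; it is left off here.
pca <- prcomp(t(X), center = TRUE, scale. = FALSE)

# Percentage of variance explained by each component, for the axis labels
var_pct <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)

# Score table with the two categorical variables recovered from the names
scores <- data.frame(
  sample   = sample_ids,
  material = substr(sample_ids, 1, 1),
  impurity = ifelse(sample_ids == "p", "pure", sub("^[a-d]", "", sample_ids)),
  PC1      = pca$x[, 1],
  PC2      = pca$x[, 2]
)
scores$impurity <- factor(scores$impurity,
                          levels = c("pure", "10", "20", "30", "40"))

# Colour encodes sample type, shape encodes impurity level
score_plot <- ggplot(scores, aes(PC1, PC2, colour = material, shape = impurity)) +
  geom_point(size = 3) +
  labs(x = paste0("PC1 (", var_pct[1], "%)"),
       y = paste0("PC2 (", var_pct[2], "%)"))
print(score_plot)

# Faceted alternative: one panel per sample type
facet_plot <- score_plot + facet_wrap(~ material)
print(facet_plot)
```

With five impurity categories (including the pure sample), the default ggplot2 shape palette is still sufficient; it only handles up to six discrete values. In the faceted version the pure sample gets its own panel, which may or may not be what you want.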

$\endgroup$
