I'm working with a dataset of multiple CSV files, each containing time series of accelerations (x, y, z) captured during vibration events. For each event, a sensor records for the entire duration and then stops, so each file contains one complete vibration event.
I standardized the data (z-scoring) and applied PCA to reduce dimensionality. Then I ran k-means clustering on the principal components and obtained a meaningful partition into 6 clusters. Since I don't have labeled anomaly data, I analyzed the distribution of samples across clusters: Cluster 0 contains the large majority of samples, Cluster 1 significantly fewer, and the remaining clusters only a handful each.
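For context, my pipeline looks roughly like the following sketch (scikit-learn; the array `X` stands in for the feature vectors I extract per event file, and the shapes and component counts here are placeholders, not my real settings):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical stand-in: one feature vector per event file
X = rng.normal(size=(200, 30))

X_std = StandardScaler().fit_transform(X)           # z-scoring
X_pca = PCA(n_components=5).fit_transform(X_std)    # dimensionality reduction

# Cluster in the reduced space and inspect cluster sizes
labels = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(X_pca)
counts = np.bincount(labels, minlength=6)           # samples per cluster
print(counts)
```

In my actual data the resulting `counts` are heavily skewed toward one cluster, which is what prompted the assumption below.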
Based on this, I assumed that Cluster 0 likely represents normal behavior. I trained an autoencoder only on the Cluster 0 data and then evaluated data from the other clusters with it, flagging anomalies by reconstruction error.
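The detection step is essentially this (a minimal sketch, not my real model: here a shallow `MLPRegressor` trained to reproduce its input plays the role of the autoencoder, the synthetic "normal" and "other" arrays are placeholders, and the 99th-percentile threshold is just one common choice):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
# Hypothetical data: in-distribution "Cluster 0" samples vs. shifted samples
X_normal = rng.normal(0, 1, size=(300, 10))
X_other = rng.normal(4, 1, size=(20, 10))

# A regressor trained to reconstruct its own input acts as a simple
# autoencoder; the narrow hidden layer is the bottleneck.
ae = MLPRegressor(hidden_layer_sizes=(3,), max_iter=2000, random_state=0)
ae.fit(X_normal, X_normal)

def recon_error(model, X):
    # Per-sample mean squared reconstruction error
    return np.mean((model.predict(X) - X) ** 2, axis=1)

# Threshold chosen from the training (normal) error distribution
threshold = np.quantile(recon_error(ae, X_normal), 0.99)
flags = recon_error(ae, X_other) > threshold
print(flags.sum(), "of", len(flags), "flagged")
```

The idea is that the model only learns to reconstruct normal-cluster data well, so out-of-distribution events should show elevated reconstruction error.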
Do you think this is a valid approach in my case? Would you suggest any improvements or alternative methods for anomaly detection in this kind of dataset?
Thanks in advance!