I am working with a compositional dataset:
- A convenient way of dealing with compositional data is to apply the clr (centered log-ratio) transform, or a similar log-ratio transform, which effectively maps the data into Euclidean space and makes many standard methods (PCA, PLS, etc.) applicable.
- The dataset contains many zeros, which need to be replaced by small positive values prior to the clr transform, since the logarithm of zero is undefined.
- The dataset contains many features (a few times the number of samples) and could benefit from preliminary filtering, e.g., discarding all features that occur in fewer than 10% of samples.
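To make the setup concrete, here is a minimal sketch of the three steps in NumPy. The function names, the toy matrix, the value of `delta`, and the choice of a simple multiplicative zero replacement are all illustrative assumptions, not a description of a specific package. The order used here (filter, then impute, then clr) is just one of the options being asked about.

```python
import numpy as np

def prevalence_mask(X, min_prevalence=0.1):
    """Boolean mask of features that are non-zero in at least min_prevalence of samples."""
    X = np.asarray(X, dtype=float)
    return (X > 0).mean(axis=0) >= min_prevalence

def multiplicative_replacement(X, delta=1e-3):
    """Close each row to sum 1, set zeros to delta, shrink non-zeros so rows still sum to 1.

    delta is a hypothetical imputation value; it must satisfy delta * (zeros per row) < 1.
    """
    X = np.asarray(X, dtype=float)
    P = X / X.sum(axis=1, keepdims=True)
    zeros = P == 0
    n_zeros = zeros.sum(axis=1, keepdims=True)
    return np.where(zeros, delta, P * (1.0 - delta * n_zeros))

def clr(P):
    """Centered log-ratio: log of each part minus the row mean of the logs."""
    log_p = np.log(P)
    return log_p - log_p.mean(axis=1, keepdims=True)

# Toy data: 3 samples x 4 features; the last feature is absent everywhere.
X = np.array([[10, 0, 5, 0],
              [3, 2, 0, 0],
              [1, 1, 1, 0]], dtype=float)

keep = prevalence_mask(X, min_prevalence=0.1)  # filter
P = multiplicative_replacement(X[:, keep])     # impute zeros
Z = clr(P)                                     # transform
```

Each row of `Z` sums to zero (up to rounding), which is the property that is at stake when features are removed after the transform.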
The question is at which point to perform the filtering: before or after the clr transform. In the former case, the data stop being compositional in the strict sense, and one may have to renormalize them (or not?). In the latter case, dropping features might have other undesirable effects or might need corrections, such as re-centering the clr-transformed data. In addition, the imputed values for the zeros might depend on the number of features.
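The two orderings can be compared numerically. The sketch below is a toy illustration under assumed choices (a simple multiplicative zero replacement with a hypothetical `delta`, made-up counts). It shows that dropping columns after the clr transform leaves rows that no longer sum to zero, that re-centering those rows reproduces the clr of the imputed subcomposition, and that imputing after filtering gives slightly different values than imputing before filtering.

```python
import numpy as np

def clr(P):
    """Centered log-ratio transform, applied row by row."""
    log_p = np.log(P)
    return log_p - log_p.mean(axis=1, keepdims=True)

def replace_zeros(X, delta=1e-3):
    """Close rows to 1, set zeros to delta (an assumed value), rescale the non-zeros."""
    P = X / X.sum(axis=1, keepdims=True)
    zeros = P == 0
    return np.where(zeros, delta, P * (1.0 - delta * zeros.sum(axis=1, keepdims=True)))

# Made-up counts: 4 samples x 5 features; feature index 3 is never observed.
counts = np.array([[10., 0., 5., 0., 85.],
                   [3., 2., 0., 0., 95.],
                   [1., 1., 1., 0., 97.],
                   [4., 0., 6., 0., 90.]])
keep = (counts > 0).mean(axis=0) >= 0.1  # prevalence filter

# Route A: impute and transform all features, then drop columns.
P_full = replace_zeros(counts)
Z_after = clr(P_full)[:, keep]
row_sums_after = Z_after.sum(axis=1)  # no longer zero

# Re-centering the retained columns restores the zero-sum property ...
Z_recentered = Z_after - Z_after.mean(axis=1, keepdims=True)
# ... and equals the clr of the imputed subcomposition (clr is scale-invariant).
Z_sub = clr(P_full[:, keep])

# Route B: drop columns first, then impute and transform.
Z_before = clr(replace_zeros(counts[:, keep]))

# Largest discrepancy between the two routes, caused only by the imputation step.
max_diff = np.abs(Z_before - Z_recentered).max()
```

In this toy case the difference between the routes comes entirely from the zero replacement: the rescaling of the non-zero parts depends on how many zeros each row contains, which changes once features are removed.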
Are there known solutions/recommendations for this situation?