$\begingroup$

I am working with a compositional dataset:

  • A very efficient way of dealing with compositional data is to apply the clr transform (or a similar log-ratio transform), which effectively maps the data into Euclidean space and makes many well-known methods applicable (PCA, PLS, etc.).
  • The dataset contains many zeros, which need to be replaced by finite positive values prior to the clr transform, since the transform takes logarithms.
  • The dataset contains many features (a few times the number of samples) and could benefit from preliminary filtering, e.g., discarding all features that occur in fewer than 10% of samples.
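For concreteness, the three steps above can be sketched as follows. This is only a minimal illustration, not a recommended pipeline: the function names, the toy count matrix, and the replacement value `delta` are assumptions, and multiplicative replacement is just one of several possible ways to handle zeros.

```python
import numpy as np

def closure(x):
    """Rescale each row (sample) so that its parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=1, keepdims=True)

def replace_zeros(x, delta=1e-3):
    """Multiplicative zero replacement; delta is an arbitrary assumed value."""
    x = closure(x)
    zeros = x == 0
    # Each zero becomes delta; non-zero parts are shrunk so rows still sum to 1.
    shrink = 1.0 - delta * zeros.sum(axis=1, keepdims=True)
    return np.where(zeros, delta, x * shrink)

def clr(x):
    """Centered log-ratio: log of each part minus the row mean of the logs."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=1, keepdims=True)

def prevalence_mask(counts, min_prevalence=0.10):
    """True for features that are non-zero in at least min_prevalence of samples."""
    counts = np.asarray(counts)
    return (counts > 0).mean(axis=0) >= min_prevalence

# Toy data (hypothetical): 3 samples x 4 features, the second feature is always zero.
counts = np.array([[10, 0, 5, 85],
                   [20, 0, 0, 80],
                   [30, 0, 10, 60]])

replaced = replace_zeros(counts)
transformed = clr(replaced)
keep = prevalence_mask(counts)
```

Here `replaced` has rows summing to 1, `transformed` has rows summing to 0, and `keep` flags the features that survive the 10% prevalence filter.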

The question is at which point to perform the filtering: before or after the clr transform. In the former case, the remaining parts no longer sum to a constant, so one may have to renormalize (re-close) the data, or perhaps not. In the latter case, the filtering might have other undesirable effects or might need corrections (such as re-centering the clr-transformed values, since the retained clr coordinates of a sample no longer sum to zero). In addition, the values imputed for the zeros might depend on the number of features.
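The clr part of this comparison can be checked numerically. The sketch below uses a small made-up, strictly positive matrix (so zero replacement is deliberately left out) and a hypothetical `keep` mask. It compares filtering before the transform, with and without renormalization, against filtering after the transform, with and without re-centering.

```python
import numpy as np

def clr(x):
    """Centered log-ratio: log of each part minus the row mean of the logs."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=1, keepdims=True)

# Made-up, strictly positive data: 2 samples x 4 features, closed to sum 1.
x = np.array([[1.0, 2.0, 4.0, 8.0],
              [2.0, 2.0, 2.0, 10.0]])
x = x / x.sum(axis=1, keepdims=True)

# Hypothetical filter: drop the last feature.
keep = np.array([True, True, True, False])

# Filter first, then clr, without and with renormalization of the rows.
sub = x[:, keep]
clr_filtered = clr(sub)
clr_filtered_closed = clr(sub / sub.sum(axis=1, keepdims=True))

# clr first, then filter, without and with re-centering of the rows.
clr_then_filter = clr(x)[:, keep]
clr_then_filter_recentered = (
    clr_then_filter - clr_then_filter.mean(axis=1, keepdims=True)
)

closure_is_irrelevant = np.allclose(clr_filtered, clr_filtered_closed)
rows_sum_to_zero_after_filter = np.allclose(clr_then_filter.sum(axis=1), 0.0)
recentering_matches = np.allclose(clr_filtered, clr_then_filter_recentered)
```

On this example, renormalizing before the transform makes no difference (clr is scale-invariant within a sample), the filtered clr rows no longer sum to zero, and re-centering them reproduces the clr of the filtered data. This only covers data without zeros; once zeros are imputed, the replacement values may still depend on whether filtering happened first.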

Are there known solutions/recommendations for this situation?

$\endgroup$
  • 2
    $\begingroup$ Re "needs to be replaced:" not necessarily so. An alternative approach, extending the CLR strategy, is discussed at stats.stackexchange.com/a/259223/919. Because that's really an application of Box-Cox transformations in EDA, the methods to determine finite "start values" apply; these are discussed at stats.stackexchange.com/a/6177/919 (generally) and stats.stackexchange.com/a/60455/919 (with a specific example). Many will question your "filtering" approach, but to opine about that, we would need more information about its basis and your objectives for it. $\endgroup$ Commented Nov 1 at 16:27
