I want to use the R package MatchIt to perform propensity score analysis. However, my dataset has 100 variables, so to reduce the dimensionality I'm running principal components analysis (PCA) and keeping around 60 of the components. My question is: can matching be done on the PCs so obtained, or can only the original variables be used?
- Why would you want to do matching on the PC scores? What's wrong with estimating propensity scores directly from the 100 variables you have on hand? – StatsStudent, Sep 2, 2017 at 15:08
- Wouldn't reducing the dimensionality of the dataset help with the matching process? – Martand Aditya, Sep 3, 2017 at 18:16
1 Answer
Both principal components and propensity scores are lower-dimensional summaries of a vector of covariates. Matching on PCs may yield adequate balance, but there is no theoretical reason to expect it to. There is a theoretical reason to expect matching on the propensity score to achieve balance: the propensity score has been shown to be a balancing score, which means that conditioning on it (e.g., by matching) yields covariate balance. Of course, in practice we work with the estimated propensity score rather than the true one, so there is no guarantee that conditioning on it will yield balance. No matter what you match on, you must assess balance. If you are using MatchIt, you can use the cobalt package to do so.
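As a minimal sketch of that match-then-check workflow: the `lalonde` data shipped with MatchIt and the six covariates chosen here are stand-ins for illustration only, not the asker's data. The propensity score is left at MatchIt's default estimation method.

```r
library(MatchIt)
library(cobalt)

# Example data bundled with MatchIt (an assumption for illustration)
data("lalonde", package = "MatchIt")

# 1:1 nearest-neighbor matching on a propensity score estimated by
# MatchIt's default method
m_ps <- matchit(treat ~ age + educ + married + nodegree + re74 + re75,
                data = lalonde, method = "nearest")

# Balance statistics (standardized mean differences for continuous
# covariates) before and after matching
bal.tab(m_ps, un = TRUE)
```

If balance after matching is poor, the usual next step is to respecify the model or the matching method and check again.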
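To match on a PC instead of a propensity score, pass it to the `distance =` argument of `matchit()` as a numeric vector with one value per unit, such as the scores on a single estimated PC. Because a vector holds only one score per unit, matching on several PCs at once requires a multivariate distance (e.g., Mahalanobis distance computed on the PCs) rather than a single vector. The same argument accepts any score computed outside MatchIt, such as propensity scores you have estimated yourself (e.g., using GBM from the twang package).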
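A rough sketch of both options follows. It assumes the `lalonde` example data from MatchIt and an arbitrary set of six numeric covariates; the choice of the first PC and of three PCs for the Mahalanobis version is purely illustrative.

```r
library(MatchIt)
library(cobalt)

data("lalonde", package = "MatchIt")
covs <- lalonde[, c("age", "educ", "married", "nodegree", "re74", "re75")]

# PCA on the standardized covariates (treatment and outcome are excluded)
pca <- prcomp(covs, center = TRUE, scale. = TRUE)

# Option 1: match on a single PC, supplied as a user-defined distance
pc1 <- as.numeric(pca$x[, 1])
m_pc1 <- matchit(treat ~ age + educ + married + nodegree + re74 + re75,
                 data = lalonde, method = "nearest", distance = pc1)

# Option 2: match on several PCs using Mahalanobis distance
d <- cbind(lalonde, pca$x[, 1:3])  # adds columns PC1, PC2, PC3
m_mah <- matchit(treat ~ PC1 + PC2 + PC3, data = d,
                 method = "nearest", distance = "mahalanobis")

# Balance on the variables named in the first matching formula
bal.tab(m_pc1, un = TRUE)
```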
Perhaps you are instead asking about using the PCs as covariates in the propensity score model in place of all the covariates you have available. Again, you can try this, see whether matching yields balance, and, if not, respecify. There are no rules for generating propensity scores except that the model cannot depend on variables affected by treatment. Your goal is to achieve covariate balance, not to have a theoretically valid propensity score model. If your approach works (i.e., it achieves balance), then use it!
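One way this could look, again using MatchIt's `lalonde` data and three PCs as illustrative assumptions. The propensity score model uses only the PCs, while balance is checked on the original covariates via the `addl` argument of `bal.tab()`, on the assumption that those are the variables whose balance ultimately matters.

```r
library(MatchIt)
library(cobalt)

data("lalonde", package = "MatchIt")
covs <- lalonde[, c("age", "educ", "married", "nodegree", "re74", "re75")]

# PCA on the standardized covariates
pca <- prcomp(covs, center = TRUE, scale. = TRUE)
d <- cbind(lalonde, pca$x[, 1:3])  # adds columns PC1, PC2, PC3

# Propensity score model that uses only the PCs as predictors
m_pcs <- matchit(treat ~ PC1 + PC2 + PC3, data = d, method = "nearest")

# Assess balance on the PCs and, via addl, on the original covariates
bal.tab(m_pcs,
        addl = ~ age + educ + married + nodegree + re74 + re75,
        data = d, un = TRUE)
```

If the original covariates remain imbalanced, the model can be respecified, for example by adding more PCs or by adding some of the original covariates back in.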