In my university statistics book it says "Dichotomous categorical variables are easily handled in MRA. This is because they are by definition, an interval (continuous) measure.". However I thought that categorical variables are by definition not continuous measures - would someone maybe be able to explain this further? Thank you very much!
- That sounds wrong. What book was it? Can you give us more of the quote? Maybe there is some context that makes it sensible. (Also, dichotomous variables are easily handled in multiple regression, but not because they are continuous.) (The quote may be a typo ... I've edited a couple textbooks and typos are quite common.) – Peter Flom, Jun 11, 2024 at 11:07
- When I was younger, MRA had the primary meaning of Moral Re-Armament. It presumably means multiple regression analysis here, but in my view it's not an abbreviation to be encouraged except between consenting adults. – Nick Cox, Jun 11, 2024 at 12:15
- Thanks very much for your response and help with this! :) It is a statistics book from my psychology research and statistics university course, but unfortunately I believe it's only been made available within the course, as it's written by the course coordinator. The quote is followed by "Since there is only one interval, all intervals are equal. However, polytomous categorical variables (i.e., those with more than two levels) are clearly not continuous." – izzi3880, Jun 16, 2024 at 7:52
- The NOIR terminology for scales of variables was introduced by a psychologist, S.S. Stevens, and it seems no accident that it remains much mentioned by some psychologists. It is of some use as terminology, but little as a scheme for prescribing or proscribing which methods to use. Whether a scale has a true zero has some bearing on explaining, e.g., why the coefficient of variation may be useless or misleading. As a more common issue, the idea that you shouldn't take means of ordinal variables is exaggerated; just take those means if they help analysis. – Nick Cox, Jun 16, 2024 at 9:04
1 Answer
I disagree with that (currently unsourced) statement, and yet in a limited sense I also agree with it.
Binary (indicator, dichotomous, Boolean, logical, one-hot, quantal) variables coded as 0 and 1 are arguably not by definition interval and certainly not by definition continuous; they are surely discrete.
(At the same time binary indicators are special cases of count variables, and count variables surely qualify as ratio scale, because ratios make sense and the zero point is not arbitrary.)
But binary indicators are special in an elementary but fundamental sense. It makes perfect sense to take the means of a sample of 0s and 1s and to focus analysis on mean indicators as proportions, as if the binary variable arises from an underlying variable of interest, which can be treated as (approximately) continuous. It is as simple as this: if you code female as 1 and male as 0, the mean of a sample, say 1 1 1 1 1 1 1 0 0 0, is a proportion female, here 0.7.
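The arithmetic above can be checked directly. This is a minimal sketch using the ten-value sample from the text; the variable names are just illustrative.

```python
# Sample from the text: female coded as 1, male coded as 0.
sample = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

# The sum of a 0/1 indicator is the count of 1s ...
count_female = sum(sample)

# ... so its mean is the proportion of 1s.
proportion_female = count_female / len(sample)

print(count_female)       # 7
print(proportion_female)  # 0.7
```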
It is often quite practical and positive to persuade people new to statistics to code binary variables as 0 or 1, not say 1 or 2, so that this key feature can be exploited. That way lie connections not just to binomial distributions but to logit and probit models and beyond.
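To see why the coding matters, here is a small sketch comparing the same ten observations under the two codings. It assumes the 1/2 coding uses 2 for female and 1 for male, which is a choice made purely for illustration.

```python
# The same ten observations under two codings.
coded_01 = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]   # female = 1, male = 0
coded_12 = [x + 1 for x in coded_01]        # female = 2, male = 1 (assumed coding)

mean_01 = sum(coded_01) / len(coded_01)
mean_12 = sum(coded_12) / len(coded_12)

print(mean_01)  # 0.7, directly the proportion female
print(mean_12)  # 1.7, which is 1 + the proportion and needs decoding

# Recoding 1/2 to 0/1 by subtracting 1 recovers the interpretable mean.
recoded = [x - 1 for x in coded_12]
print(sum(recoded) / len(recoded))  # 0.7
```

No information is lost under either coding, but only the 0/1 version gives a mean that can be read off as a proportion without further arithmetic.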
Miniature rant
The NOIR classification -- nominal, ordinal, interval, ratio -- can be used to make some valuable distinctions, but it can also be used confusingly. Whether it is authors or readers who are confused could be discussed at enormous length, but one of several limitations of that classification, or of how it is discussed, is that it doesn't really address the special and invaluable role of indicator variables, whether as outcomes or as predictors.
One point often missed in discussion, and relevant here, is that data may well arrive in one form but are transmutable to other forms without loss of important or even any information. So, original observations of the form "frog", "toad", "newt" may be denigrated as nominal scale, but that doesn't or shouldn't inhibit analysis. We can create indicators, is frog? is toad? is newt?, or count categories, or work with proportions, and so on. Hence, we can quantify in various ways without making indefensible assumptions or assertions.
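As a rough illustration of that last point, the sketch below turns a nominal variable into indicators, counts, and proportions using only the Python standard library. The six observations are made up for the example and are not data from the question.

```python
from collections import Counter

# Hypothetical nominal-scale observations (invented for illustration).
observations = ["frog", "toad", "newt", "frog", "frog", "toad"]

categories = sorted(set(observations))  # ['frog', 'newt', 'toad']

# One 0/1 indicator per category: is frog? is newt? is toad?
indicators = {
    cat: [1 if obs == cat else 0 for obs in observations]
    for cat in categories
}

# Category counts, and proportions as means of the indicators.
counts = Counter(observations)
proportions = {cat: sum(ind) / len(ind) for cat, ind in indicators.items()}

print(indicators["frog"])   # [1, 0, 0, 1, 1, 0]
print(counts["frog"])       # 3
print(proportions["frog"])  # 0.5
```

Each observation has exactly one indicator equal to 1, so the original labels can be rebuilt from the indicators; the recoding loses nothing.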
- Thank you so much for taking the time to clarify this, this helps a lot and is much appreciated! – izzi3880, Jun 16, 2024 at 7:55