Questions tagged [categorical-data]
Categorical (also called nominal) data can take on a limited number of possible values, called categories. Categorical values "label"; they do not "measure". Please use the [ordinal-data] tag for discrete but ordered data types.
3,624 questions
2 votes
1 answer
94 views
How do I estimate the linear effect for a factor so that my estimate doesn't depend on the sample size?
I’m trying to use the R poly() function with degree 1 to force glm to interpret a factor linearly. I’m puzzled by the fact that the size of the sample seems to increase the coefficient of the ...
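One possible reason for the behaviour described in this excerpt, offered as a hedged sketch and not as the accepted answer: R's `poly()` returns orthonormal columns, so the degree-1 column is the centred variable divided by its Euclidean norm, and that norm grows with the number of rows. The Python below mimics this scaling with hypothetical data (a factor coded 1 to 5 and an exactly linear response); the helper names `orthonormal_linear` and `slope` are made up for illustration, and a plain least-squares slope stands in for the `glm` fit.

```python
import math

def orthonormal_linear(x):
    """Centre x and scale it to unit length, like the single column of R's poly(x, 1)."""
    mean = sum(x) / len(x)
    centred = [v - mean for v in x]
    norm = math.sqrt(sum(c * c for c in centred))
    return [c / norm for c in centred]

def slope(col, y):
    """Least-squares slope of y on col (col is centred, so the intercept drops out)."""
    return sum(c * v for c, v in zip(col, y)) / sum(c * c for c in col)

# Hypothetical data: a factor coded 1..5 with an exactly linear response.
levels = [1, 2, 3, 4, 5]
response = [2.0 * v + 1.0 for v in levels]

# Same pattern, four times as many observations.
levels_big = levels * 4
response_big = response * 4

# Coefficient on the orthonormal column: depends on the sample size.
b_small = slope(orthonormal_linear(levels), response)
b_big = slope(orthonormal_linear(levels_big), response_big)

# Coefficient on the centred raw codes: the same for both sample sizes.
raw_small = slope([v - 3 for v in levels], response)
raw_big = slope([v - 3 for v in levels_big], response_big)

print(b_small, b_big, raw_small, raw_big)
```

In this sketch, quadrupling the sample doubles the coefficient on the orthonormal column, while the slope on the centred raw codes is unchanged. Under that reading, regressing on the numeric level codes (or `poly(x, 1, raw = TRUE)` in R) would give an estimate that does not scale with the sample size.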
1 vote
0 answers
23 views
Is there a way to perform a correspondence analysis with ordered variables?
I am trying to perform a correspondence analysis on a dataset of anatomical measurements of ecologically relevant features. Most of these variables are ordered factor variables representing binning of ...
1 vote
0 answers
13 views
Log-linear models and multiple comparisons: exploring multiple categorical and binary variables
I'm trying to understand how three categorical variables affect several binary variables. I am roughly following these instructions. Here is what my data look like (not my real data): Binary answers ...
2 votes
1 answer
314 views
Why are ordinal variable levels not kept in order in glm?
I've been following the method illustrated here: Polynomial contrasts for regression to transform the results .L, .Q, .C, etc. of a glm ordinal factor regression into the values for each of the levels ...
0 votes
0 answers
102 views
Linearity assumption with categorical mean-encoded variables
I'm struggling to understand the linearity assumption when running OLS with continuous dependent var and categorical independent variables that have been mean-encoded (simple group mean per category). ...
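As a minimal sketch of the encoding this excerpt describes, assuming a single categorical predictor and in-sample group means: each label is replaced by the mean of the dependent variable within its category, and OLS is then fitted on the encoded column. The data and the helper names `mean_encode` and `ols_line` are hypothetical.

```python
from collections import defaultdict

def mean_encode(categories, y):
    """Replace each category label with the mean of y within that category."""
    sums, counts = defaultdict(float), defaultdict(int)
    for c, v in zip(categories, y):
        sums[c] += v
        counts[c] += 1
    means = {c: sums[c] / counts[c] for c in sums}
    return [means[c] for c in categories], means

def ols_line(x, y):
    """Intercept and slope of the least-squares line of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

# Hypothetical data, not taken from the question.
cats = ["a", "a", "b", "b", "b", "c"]
y = [1.0, 3.0, 4.0, 5.0, 9.0, 10.0]

encoded, group_means = mean_encode(cats, y)
intercept, slope_hat = ols_line(encoded, y)
print(group_means, intercept, slope_hat)
```

With one mean-encoded predictor and means computed on the same sample, the fitted line has slope 1 and intercept 0 by construction, so the relationship between the outcome and the encoded column is linear in this simple case. This need not carry over to several encoded predictors or to means computed out of sample.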
0 votes
0 answers
69 views
Confusion about strange results of difference checking tests (Classical Chi-square test and Bayesian Chi-square test)
I am a newbie at conducting difference checking tests (Chi-square test). When I make a contingency table for doing the Chi-square test (classical and Bayesian tests), I get some phenomena that they would be ...
2 votes
1 answer
59 views
Analyzing Differences in Dependent Categorical Variable Given Two Different Subject Types
I am trying to analyze some survey data in R but I am a bit confused about how to run the right type of analysis. In the survey of college students, the participants were put in a hypothetical ...
0 votes
0 answers
64 views
Which technique should I use to test the independence of categorical variables over repeated samples?
I have individual level data with a performance measure (good/bad) and characteristic variables for the individual (e.g. gender). I usually analyse this using a chi-squared test to see if the ...
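For reference, the single-sample chi-squared test of independence mentioned in this excerpt can be sketched as follows. This is an illustration under stated assumptions, not a solution to the repeated-samples problem: the counts are hypothetical, the function name `chi_square_independence` is made up, and no continuity correction is applied.

```python
def chi_square_independence(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical counts: rows are gender, columns are performance (good, bad).
counts = [[30, 20],
          [20, 30]]
stat, dof = chi_square_independence(counts)
print(stat, dof)
```

The statistic is compared with a chi-squared distribution on `dof` degrees of freedom; with 1 degree of freedom the 5% critical value is about 3.84. Applying this test separately to each repeated sample treats the samples as independent, which may not hold when the same individuals appear more than once.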
4 votes
1 answer
151 views
How to generate random categorical data when number of categories is very large?
Problem in brief: I would like to generate several samples of iid categorical data. The standard approach does not work because the potential number of categories is large, and I do not want to impose ...
0 votes
0 answers
68 views
Meaning of "residuals" in calculating correlations from Spearman 1904
The free Statistics package "JASP" has a data library that illustrates various tests and operations. One of them is Factor Analysis. They use the data from Spearman's 1904 "General ...
0 votes
0 answers
47 views
Level-wise effect sizes of a categorical variable in a GLM
I am running a GLM (Gaussian Family; Identity link) on some medical data. I intend to find out if the level of disease severity has any effect on task performance. A minimum reproducible example (...
6 votes
2 answers
163 views
Are there rules of thumb for the sample size required when using a categorical predictor in linear regression?
I’ve had a reviewer suggest that I use ethnicity as a covariate in a linear regression. Some ethnic groups in the sample are small enough that I am a little worried that I will overfit if I do this. ...
2 votes
1 answer
124 views
GLM with 2 variables with factors, where neither has a "baseline"
I am trying to do a GLM with a dataset. My dataset consists of days individuals go on a social outing, and whether the outing was "better than average" (subjective). I have recorded the ...
9 votes
2 answers
291 views
Treating two columns in R with shared factors with the same coefficients
I am attempting to analyse a dataset using a GLM. In this dataset I have two columns containing codes about individuals, and I am trying to infer whether an individual passes. For example: ...
4 votes
2 answers
256 views
Should I dummy code my categorical variable in SEM model?
I am working on doing a path analysis and using lavaan(). One of my endogenous variables is an ordered factor HOWEVER, the difference between each group is not ...