Revisions to Should I use a Simple Linear or Logistic Regression

added 441 characters in body

edited Mar 28, 2022 at 6:58

Sextus Empiricus

In the case of Poisson regression, an exponential function (which makes the link function its inverse, the log function) might work well. You then effectively model the outcome as a product of terms: the expected count is modeled as a product of (exponentiated) coefficients, one for each main effect. An example of how the exponential function makes a multiplicative model is here.
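
To make the multiplicative structure explicit, here is a short sketch for two categorical main effects. The symbols $\mu_{ij}$ (expected count in cell $i,j$), $\beta_0$, $\alpha_i$ and $\gamma_j$ are notation introduced here for illustration; they do not appear in the original answer.

$$\log \mu_{ij} = \beta_0 + \alpha_i + \gamma_j \quad\Longleftrightarrow\quad \mu_{ij} = e^{\beta_0}\, e^{\alpha_i}\, e^{\gamma_j}$$

A sum on the scale of the log link therefore becomes a product on the scale of the counts, with one factor per main effect.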

added 148 characters in body

edited Mar 27, 2022 at 23:09

Sextus Empiricus

In addition, when two or more main effects are combined, the resulting model depends on which function you use.

Interaction term

Post Undeleted by Sextus Empiricus

occurred Mar 27, 2022 at 20:46

added 2074 characters in body

edited Mar 27, 2022 at 20:46

Sextus Empiricus

Poisson regression

What you could use is a Poisson regression. This would model the rate/counts for the number of patients occurring as a function of the predictor.

However, when the predictor is a numeric variable such as age, the choice of function (for example a logistic curve) might make a difference.

> I'm also feeling uncertain about which variable would be my predictor (x) and response (y) (y~x) if I want to see if sex is associated with race of study patients.

You would use an interaction term: the combination of gender and race. Gender and race are both predictors, and the response is the count of patients.

The result should be more or less the same as the $\chi^2$ test. The R computation below demonstrates this:

```r
n = (2^6*3^4*5^2*11*283)
response = c(0.003530840, 0.005185921,
             0.076133731, 0.077347457,
             0.014895730, 0.017102505,
             0.005627276, 0.006730663,
             0.336864173, 0.456581706)*n
gender = rep(c("female", "male"), times = 5)
background = rep(c("asian", "black", "hispanic", "other", "white"), each = 2)

mod = glm(response ~ background * gender, family = poisson())
anova(mod, test = "Chisq")

# Analysis of Deviance Table
#
# Model: poisson, link: log
#
# Response: response
#
# Terms added sequentially (first to last)
#
#                   Df  Deviance Resid. Df Resid. Dev  Pr(>Chi)
# NULL                                   9  759919240
# background         4 752371092         5    7548148 < 2.2e-16 ***
# gender             1   6411572         4    1136576 < 2.2e-16 ***
# background:gender  4   1136576         0          0 < 2.2e-16 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

chisq.test(matrix(response,5,byrow=1))

#
#  Pearson's Chi-squared test
#
# data:  matrix(response, 5, byrow = 1)
# X-squared = 1142900, df = 4, p-value < 2.2e-16
```

The values of the $\chi^2$ statistic are close to each other for both methods: 1136576 (the deviance of the interaction term in the Poisson regression) and 1142900 (Pearson's chi-squared test). The difference between the two methods is that the chi-squared test considers the marginals (the totals of female/male and the totals of backgrounds) as fixed, and the Poisson regression does not.
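
The relation between the two statistics can be checked on a small table. The sketch below uses made-up counts (the numbers and the background labels `a`, `b`, `c` are hypothetical, not from the question). It assumes the main-effects Poisson model with log link as the independence model, and compares its fitted values and its two goodness-of-fit statistics with the usual chi-squared quantities.

```r
# Hypothetical 3x2 table of counts (illustrative data only)
counts     <- c(20, 30, 15, 25, 40, 10)
gender     <- factor(rep(c("female", "male"), times = 3))
background <- factor(rep(c("a", "b", "c"), each = 2))

# Main-effects Poisson model: no interaction, i.e. independence
mod0 <- glm(counts ~ background + gender, family = poisson())

# Expected counts under independence: row total * column total / grand total
tab      <- xtabs(counts ~ background + gender)
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Fitted values of the main-effects model, arranged as the same table
fitted_tab <- xtabs(fitted(mod0) ~ background + gender)

# Pearson statistic computed from the GLM residuals and from chisq.test
pearson_glm  <- sum(residuals(mod0, type = "pearson")^2)
pearson_test <- unname(chisq.test(tab, correct = FALSE)$statistic)

# Likelihood-ratio (G) statistic, to compare with the residual deviance
G <- 2 * sum(tab * log(tab / expected))

print(all.equal(as.numeric(fitted_tab), as.numeric(expected), tolerance = 1e-6))
print(c(pearson_glm = pearson_glm, pearson_test = pearson_test))
print(c(deviance = deviance(mod0), G = G))
```

In this sketch the residual deviance of the main-effects model is the likelihood-ratio statistic, while `chisq.test` reports Pearson's statistic on the same expected counts, which is one way to see why the two numbers are close but not identical.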

Binomial regression

What you could use is a binomial regression. This would model the probability of a patient being female or male as a function of the predictor.

However, when the predictor is a numeric variable such as age, the choice of function (for example a logistic curve) might make a difference.
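
As a minimal sketch of such a binomial regression, the code below models the female/male split per background group and compares it with a Poisson main-effects fit on the same counts. The counts and the labels `a`, `b`, `c` are hypothetical and only serve as an illustration.

```r
# Hypothetical counts per background group (illustrative data only)
female     <- c(20, 15, 40)
male       <- c(30, 25, 10)
background <- factor(c("a", "b", "c"))

# Binomial regression: probability of being female as a function of background
mod_bin <- glm(cbind(female, male) ~ background, family = binomial())

# Likelihood-ratio statistic for the effect of background
lr_bin <- mod_bin$null.deviance - deviance(mod_bin)

# The same data in long format for a Poisson regression on the counts
long <- data.frame(
  count      = c(female, male),
  gender     = factor(rep(c("female", "male"), each = 3)),
  background = rep(background, times = 2)
)

# Poisson model without interaction; its residual deviance is the
# deviance attributed to the background:gender interaction
mod_pois <- glm(count ~ background + gender, family = poisson(), data = long)
lr_pois  <- deviance(mod_pois)

print(c(binomial = lr_bin, poisson = lr_pois))
```

With a categorical predictor the two approaches give the same likelihood-ratio statistic in this sketch; they only start to differ in form once a numeric predictor and a particular curve are involved.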
