To answer your question "Is there a reason why this shouldn't be done?":
Are you familiar with the concept of linear dependence? The columns of your $X$ matrix must be linearly independent; otherwise there will be multiple coefficient vectors that produce the same fit.
An example:
```r
set.seed(123987)

link <- function(x) exp(x) / (1 + exp(x))
curve(link(x), -5, 5)  # Maps R to [0, 1]

n <- 100
df <- data.frame(x=runif(n, -0.5, 0.5))  # A continuous predictor, x
df$f_1 <- factor(sample(letters[1:3], size=n, replace=TRUE), levels=letters[1:3])  # A factor
colors <- c("green", "purple", "blue")
df$f_2 <- factor(sample(colors, size=n, replace=TRUE), levels=colors)  # A second factor
df$y <- 1 * (runif(n) < link(rnorm(n) + df$x +
    ifelse(df$f_1 == "a", -1, ifelse(df$f_1 == "b", 1, 2)) +
    ifelse(df$f_2 == "green", -0.5, ifelse(df$f_2 == "purple", 0, 5))))
stopifnot(setequal(unique(df$y), c(0, 1)))

fit <- glm(y ~ x + f_1 + f_2, data=df, family=binomial("logit"))
coefficients(fit)  # Constant, x, f_1b, f_1c, f_2purple, f_2blue

# Manually create the X matrix that glm() builds for us
X <- matrix(1, nrow=n, ncol=length(fit$coefficients))
X[, 2] <- df$x
## No column for "a"
X[, 3] <- 1*(df$f_1 == "b")
X[, 4] <- 1*(df$f_1 == "c")
## No column for "green"
X[, 5] <- 1*(df$f_2 == "purple")
X[, 6] <- 1*(df$f_2 == "blue")
colnames(X) <- c("constant", "x", "f_1b", "f_1c", "f_2purple", "f_2blue")

Y <- matrix(df$y, ncol=1)
colnames(Y) <- "y"
fit2 <- glm(Y ~ 0 + X, family=binomial("logit"))  # X already includes the constant
all(coefficients(fit) == coefficients(fit2))  # TRUE

# What happens if we drop the constant and put all levels of f_1 and f_2 in our matrix X?
X <- matrix(NA, nrow=n, ncol=length(fit$coefficients) + 1)
X[, 1] <- df$x
X[, 2] <- 1*(df$f_1 == "a")
X[, 3] <- 1*(df$f_1 == "b")
X[, 4] <- 1*(df$f_1 == "c")
X[, 5] <- 1*(df$f_2 == "green")
X[, 6] <- 1*(df$f_2 == "purple")
X[, 7] <- 1*(df$f_2 == "blue")
colnames(X) <- c("x", "f_1a", "f_1b", "f_1c", "f_2green", "f_2purple", "f_2blue")

## The problem with this matrix is that the columns are linearly dependent
X[, 2] + X[, 3] + X[, 4]  # Gives a vector of all 1s -- do you understand why?
X[, 5] + X[, 6] + X[, 7]  # Gives a vector of all 1s, for the same reason
zero_vector <- X[, 2] + X[, 3] + X[, 4] - (X[, 5] + X[, 6] + X[, 7])
all(zero_vector == 0)  # TRUE
```
If you have one factor, you can drop the constant from your model and estimate coefficients for all factor levels. (Note that this produces the exact same fit either way, just with different interpretations of the coefficients.)
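A minimal sketch of that equivalence (not part of the example above; the factor and data here are made up for illustration) — the drop-one-level fit with a constant and the all-levels fit without a constant produce identical fitted values:

```r
set.seed(1)
n <- 50
f <- factor(sample(letters[1:3], n, replace=TRUE))
y <- rbinom(n, 1, ifelse(f == "a", 0.3, ifelse(f == "b", 0.5, 0.7)))

m1 <- glm(y ~ f,     family=binomial("logit"))  # constant, f_b, f_c
m2 <- glm(y ~ 0 + f, family=binomial("logit"))  # f_a, f_b, f_c, no constant

max(abs(fitted(m1) - fitted(m2)))  # ~0: same fit, just reparameterized
```

In `m1` the constant is the level-"a" log-odds and `f_b`, `f_c` are differences from it; in `m2` each coefficient is that level's log-odds directly.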
But when you have two factors, it doesn't make sense to try to estimate coefficients for all levels of both factors: that will create linearly dependent columns in your $X$. You always have to drop one level from one factor (or two levels, one from each factor, if you include a constant).
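A hypothetical two-factor sketch (the data here is made up, not from the example above). R's formula machinery applies this rule automatically: with no intercept, the first factor gets a column per level and the second gets one level dropped, so the two parameterizations again give the same fit:

```r
set.seed(2)
n <- 80
f_1 <- factor(sample(letters[1:3], n, replace=TRUE))
f_2 <- factor(sample(c("green", "purple", "blue"), n, replace=TRUE))
y <- rbinom(n, 1, 0.5)

# Constant; one level of each factor dropped (5 coefficients)
m_const <- glm(y ~ f_1 + f_2,     family=binomial("logit"))
# No constant; all levels of f_1, one level of f_2 dropped (5 coefficients)
m_noconst <- glm(y ~ 0 + f_1 + f_2, family=binomial("logit"))

length(coefficients(m_const))    # 5
length(coefficients(m_noconst))  # 5 -- same number of free parameters
max(abs(fitted(m_const) - fitted(m_noconst)))  # ~0: same fit
```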
There is another aspect of your question, concerning statistical significance. I think you slightly misunderstand what the coefficients in your model mean, and how their interpretation changes depending on whether or not you've included a constant.