The goal is to evaluate chess players using a novel analysis system I've been working on. Not all wins are created equal: finding the only move in a razor-sharp position is better than finding the best move when the ten next-best alternatives are negligibly worse, and so on.
The current dataset, which I'm using as a proof of concept, has 30 players. The design matrix has players as the columns, but each player gets two columns: one for when they're playing as white, one for when they're playing as black. Each row of the design matrix represents half of a match, and 1/0/-1 dummies are used for white/not present/black.
Example: if Player 4 and Player 9 played a match, the design matrix will have two rows for this match. One row will have p4w assigned a "1" and p9b assigned a "-1". The other row will have p4b assigned a "-1" and p9w assigned a "1". All other player columns are 0.
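To make the encoding concrete, here is a minimal sketch that builds the two rows for a single match. The helper name `match_rows` and the column ordering (P1w, P1b, P2w, ...) are my own assumptions for illustration; the Sw/Sb columns are left out.

```r
# Hypothetical helper: build the two design-matrix rows for one match
# between players a and b. Column names follow the P<k>w / P<k>b pattern.
match_rows <- function(a, b, n_players = 30) {
  cols <- as.vector(rbind(paste0("P", 1:n_players, "w"),
                          paste0("P", 1:n_players, "b")))
  rows <- matrix(0, nrow = 2, ncol = length(cols),
                 dimnames = list(NULL, cols))
  # First half: a plays white (+1), b plays black (-1)
  rows[1, paste0("P", a, "w")] <- 1
  rows[1, paste0("P", b, "b")] <- -1
  # Second half: colours are swapped
  rows[2, paste0("P", b, "w")] <- 1
  rows[2, paste0("P", a, "b")] <- -1
  rows
}

m <- match_rows(4, 9)
# Show only the four columns touched by this match
m[, c("P4w", "P4b", "P9w", "P9b")]
```

Every row ends up with exactly one +1 and one -1, which matters later for the singularity issue.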
The result vector is the Engine's score for the player playing as white in that half of the match.
There are also two other columns, Sw and Sb, which attempt to quantify the value of playing white first in a given match, and whether a penalty exists for the player who started as black once they switch to white. Since white always moves first and wins more games than black, the player who starts as black is more likely to be at a disadvantage after the first game.
I'm using matrix math rather than an R function:
    csv <- read.csv("~/chess.csv", header=TRUE)
    engine <- as.numeric(csv$Engine)

    # ready design matrix/remove dropped variables
    csv$Engine <- NULL
    csv$Sb <- NULL
    csv$P30w <- NULL
    csv$P30b <- NULL

    # readies X and Y
    X <- data.matrix(csv)
    Y <- engine

    # remove copies
    remove(csv)
    remove(engine)

    # Add one column of "1" to X
    one.col <- matrix(1, nrow(X), 1)
    X <- cbind(X, one.col)

    # transposing X
    X.t <- t(X)

    # X'X, X'Y
    X.t.X <- X.t %*% X
    X.t.Y <- X.t %*% Y

    # MATHS
    betahat <- solve(X.t.X) %*% X.t.Y

Here's the CSV: http://www.filedropper.com/chess_1
Right from the top, I have to drop Sb because it's redundant. I am then forced to drop a player to get past the "system is computationally singular" error. In this case, I'm dropping the same player lm() would: the last one.
I have no philosophical objection to dropping variables, but for the purpose of evaluating players against each other, the incompleteness is troublesome.
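The singularity can be reproduced on a toy version of the design. The sketch below assumes a three-player round robin (not the real data) and omits Sw/Sb. Because every row has exactly one +1 among the white columns and one -1 among the black columns, the white columns sum to the intercept and the black columns sum to its negative.

```r
# Toy round robin with 3 players; each match contributes two rows.
players <- 3
pairs <- t(combn(players, 2))   # one row per pairing
cols <- c(paste0("P", 1:players, "w"), paste0("P", 1:players, "b"))
X <- matrix(0, nrow = 2 * nrow(pairs), ncol = length(cols),
            dimnames = list(NULL, cols))
for (i in seq_len(nrow(pairs))) {
  a <- pairs[i, 1]
  b <- pairs[i, 2]
  X[2 * i - 1, c(paste0("P", a, "w"), paste0("P", b, "b"))] <- c(1, -1)
  X[2 * i,     c(paste0("P", b, "w"), paste0("P", a, "b"))] <- c(1, -1)
}
X1 <- cbind(X, Intercept = 1)

is_white <- grepl("w$", colnames(X))
all(rowSums(X[, is_white]) == 1)     # white columns add up to the intercept
all(rowSums(X[, !is_white]) == -1)   # black columns add up to minus the intercept

rank_full <- qr(X1)$rank             # smaller than ncol(X1)
# Dropping the last player's two columns removes both dependencies
X_drop <- X1[, !(colnames(X1) %in% c("P3w", "P3b"))]
rank_drop <- qr(X_drop)$rank         # equals ncol(X_drop)
c(rank_full, ncol(X1), rank_drop, ncol(X_drop))
```

In this toy case the dependency is structural rather than an accident of the data, so any single player could be dropped; the choice only changes which player acts as the reference level.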
Using ridge regression "works" in that no variable has to be dropped, but this is unsatisfying: do the results still mean what they should? Fitting with X + 0 doesn't help with this problem either.
Are there any other tools I'm missing? Is ridge regression the right path for this problem, but penalizing towards priors rather than towards zero?
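For reference, the ridge fit in the same matrix style looks roughly like this. The data are simulated, the third column is deliberately the sum of the first two to mimic the redundancy, and `lambda` is an arbitrary value that would normally be tuned.

```r
set.seed(1)
# Toy rank-deficient design: column "c" is exactly a + b
X <- cbind(a = rnorm(20), b = rnorm(20))
X <- cbind(X, c = X[, "a"] + X[, "b"])
Y <- rnorm(20)

lambda <- 0.1                      # assumed penalty strength
p <- ncol(X)
# (X'X + lambda * I)^{-1} X'Y; adding lambda * I makes the system invertible
beta_ridge <- solve(crossprod(X) + lambda * diag(p), crossprod(X, Y))
beta_ridge

# The unidentified direction here is (1, 1, -1); the ridge solution has
# no component along it, so this is zero up to rounding error
null_dir <- c(1, 1, -1)
sum(null_dir * beta_ridge)
```

So the penalty does not recover the "missing" information; among all coefficient vectors that give the same fitted values, it selects one by convention, which may be why the result feels hard to interpret.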
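To spell out what I mean by penalizing towards priors: minimizing ||Y - Xb||^2 + lambda * ||b - b0||^2 gives b = (X'X + lambda I)^{-1} (X'Y + lambda b0). A minimal sketch, where the function name `ridge_prior`, the simulated data, and the prior vector `b0` are all placeholders of mine:

```r
# Ridge regression shrinking towards a prior vector beta0 instead of zero
ridge_prior <- function(X, Y, beta0, lambda) {
  p <- ncol(X)
  solve(crossprod(X) + lambda * diag(p),
        crossprod(X, Y) + lambda * beta0)
}

set.seed(2)
# Simulated, rank-deficient toy data (third column = sum of first two)
X <- cbind(rnorm(25), rnorm(25))
X <- cbind(X, X[, 1] + X[, 2])
Y <- rnorm(25)

b0 <- c(1, -1, 0.5)                              # hypothetical prior ratings
fit_prior <- ridge_prior(X, Y, b0, lambda = 0.1)
fit_zero  <- ridge_prior(X, Y, rep(0, 3), lambda = 0.1)  # ordinary ridge
fit_heavy <- ridge_prior(X, Y, b0, lambda = 1e8)         # dominated by the prior
cbind(fit_prior, fit_zero, fit_heavy)
```

With a zero prior this reduces to ordinary ridge, and as lambda grows the estimate collapses onto `b0`, so the prior fills in exactly the directions the data cannot identify.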