
I am trying to run svm() (from the e1071 package) on a document-feature matrix produced by the quanteda package. I start by fitting the SVM on the training data:

svm_fit <- svm(x=dfm_train, y=as.factor(y_train), kernel="radial", cost=10) 

where dfm_train is an S4 object of class dfm, produced by the dfm() function in quanteda. Next, I want to apply the resulting model to a validation set. dfm_val is also produced by applying dfm() to the validation set observations, and then making sure the features match the ones in the training dfm:

dfm_val <- dfm_match(dfm_val, features = featnames(dfm_train)) 

However, when I run:

predictions_val <- predict(svm_fit, newx=dfm_val, type="class") 

predict.svm() appears to ignore the newx input, much as I would expect if the newx columns did not match the data the model was fitted on. Instead, it predicts on the training set, so the line above gives the same result as:

predict(svm_fit, type="class") 

I have previously successfully used the same pipeline to predict on models fitted with glmnet(), so this problem appears to be specific to svm().

I tried double checking whether the training and validation sets have the same columns:

> sum(dfm_val@Dimnames$features != dfm_train@Dimnames$features)
[1] 0
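As a side note on the check itself: `!=` compares element by element and recycles the shorter vector, so a sum of 0 does not strictly prove that the two name vectors are the same. A stricter comparison could use `identical()`. The sketch below uses two made-up character vectors (`a` and `b` are illustrative, not taken from the dfm objects) to show the difference:

```r
# Hypothetical feature-name vectors of different lengths.
a <- c("x", "y", "x", "y")
b <- c("x", "y")

# b is recycled to c("x", "y", "x", "y"), so no mismatch is counted.
n_mismatch <- sum(a != b)

# identical() also compares length, so it reports the difference.
same <- identical(a, b)

print(n_mismatch)
print(same)
```

Applied to the question, the equivalent call would be `identical(featnames(dfm_val), featnames(dfm_train))`.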

Here is a minimal reproducible example:

library("textdata")
library("quanteda")
library("e1071")

# Load the AG News data and split it into training and validation sets
d <- dataset_ag_news()
d_train <- d[1:1000,]
d_val <- d[1001:2000,]

# Build the training dfm and labels
dfm_train <- dfm(tokens(d_train$description))
y_train <- as.factor(d_train$class)

# Build the validation dfm and align its features with the training dfm
dfm_val <- dfm(tokens(d_val$description))
dfm_val <- dfm_match(dfm_val, features = featnames(dfm_train))

svm_fit <- svm(x=dfm_train, y=y_train, kernel="radial", cost=10)

# Both calls return the same predictions
predictions_val <- predict(svm_fit, newx=dfm_val, type="class")
predictions_train <- predict(svm_fit, type="class")
table(predictions_val)
table(predictions_train)

1 Answer


It turns out I was using the wrong argument: it should have been newdata, not newx. So:

predict(svm_fit, newdata=dfm_val, type="class")

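gives the expected predictions on the validation data. predict.svm() has no newx argument (that name belongs to predict() for glmnet models), so newx is absorbed by the function's ... argument and silently ignored. With newdata missing, the function returns the fitted values for the training data.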
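To see the difference on a small dataset, here is a minimal sketch that uses the built-in iris data as a stand-in for the dfm objects. The variable names and the 100/50 split are illustrative assumptions, not part of the original pipeline. It compares how many predictions come back with the misnamed newx argument and with newdata:

```r
library(e1071)

# Illustrative data only: iris stands in for the dfm objects in the question.
set.seed(1)
train_idx <- sample(nrow(iris), 100)
x_train <- as.matrix(iris[train_idx, 1:4])
y_train <- iris$Species[train_idx]
x_val   <- as.matrix(iris[-train_idx, 1:4])

fit <- svm(x = x_train, y = y_train, kernel = "radial", cost = 10)

# Misnamed argument: no error or warning is raised, and the result
# has one prediction per training row.
pred_wrong <- predict(fit, newx = x_val)

# Correct argument name: one prediction per validation row.
pred_right <- predict(fit, newdata = x_val)

print(length(pred_wrong))
print(length(pred_right))
```

Comparing the length of the predictions with `nrow()` of the validation data is a cheap sanity check that would have flagged the problem in the question.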
