This happens because gbm by default makes predictions on the scale of the link function, which is log for distribution = "poisson". This is governed by the type argument of gbm::predict.gbm (see that function's help page). Unfortunately, mlr currently does not expose this parameter (the issue has been reported in the mlr bug tracker). A workaround for now is to add the parameter by hand:
library(mlr)

lrn <- makeLearner("regr.gbm", distribution = "poisson")
# add the missing "type" parameter to the learner's parameter set ...
lrn$par.set <- c(lrn$par.set, makeParamSet(
  makeDiscreteLearnerParam("type", c("link", "response"),
    default = "link", when = "predict", tunable = FALSE)))
# ... and set it so predictions are made on the response (count) scale
lrn <- setHyperPars(lrn, type = "response")

# show that it works:
counttask <- makeRegrTask("counttask", getTaskData(pid.task), target = "pregnant")
pred <- predict(train(lrn, counttask), counttask)
pred
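To see the underlying gbm behaviour directly, you can fit the model outside of mlr and compare the two prediction scales. This is only a minimal sketch; the formula and the n.trees value are illustrative assumptions, not part of the workaround above:

library(gbm)

df <- getTaskData(pid.task)
# illustrative fit; n.trees = 100 chosen arbitrarily
m <- gbm(pregnant ~ ., data = df, distribution = "poisson", n.trees = 100)
# the default type = "link" gives predictions on the log scale
p.link <- predict(m, newdata = df, n.trees = 100)
# type = "response" transforms them back to the count scale
p.resp <- predict(m, newdata = df, n.trees = 100, type = "response")
all.equal(exp(p.link), p.resp)  # TRUE: response = exp(link) for Poisson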
Be aware that when tuning parameters on count data, the default regression measure (mean squared error) will possibly overemphasize the fit for data points with large count values. The squared error for predicting "10" instead of "1" is the same as for predicting "1010" instead of "1001", but depending on your objective you probably want to put more weight on the first error.
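A quick numeric check of this point, using the numbers from the example above:

(10 - 1)^2       # 81
(1010 - 1001)^2  # 81 -- squared error treats both mistakes the same
# the Poisson log likelihood penalizes the first mistake much more heavily
dpois(1, lambda = 10, log = TRUE) - dpois(1, lambda = 1, log = TRUE)             # about -6.7
dpois(1001, lambda = 1010, log = TRUE) - dpois(1001, lambda = 1001, log = TRUE)  # about -0.04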
A possible solution is to use the (normalized) mean Poisson log likelihood as the measure:
poisllmeasure <- makeMeasure(
  id = "poissonllnorm", minimize = FALSE, best = 0, worst = -Inf,
  properties = "regr", name = "Mean Poisson Log Likelihood",
  note = "For count data. Normalized to 0 for perfect fit.",
  fun = function(task, model, pred, feats, extra.args) {
    mean(dpois(pred$data$truth, pred$data$response, log = TRUE) -
      dpois(pred$data$truth, pred$data$truth, log = TRUE))
  })

# example
performance(pred, poisllmeasure)
This measure can be used for tuning by passing it to the measures argument of tuneParams(). Note that it has to be wrapped in a list: tuneParams(..., measures = list(poisllmeasure), ...).
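A minimal tuning sketch could then look like the following; the tuned parameters, their ranges, the random search budget and the cv3 resampling are illustrative assumptions:

ps <- makeParamSet(
  makeIntegerParam("n.trees", lower = 50, upper = 500),
  makeIntegerParam("interaction.depth", lower = 1, upper = 5))
ctrl <- makeTuneControlRandom(maxit = 10L)
res <- tuneParams(lrn, counttask, resampling = cv3, par.set = ps,
  control = ctrl, measures = list(poisllmeasure))
res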