I'm trying to reproduce a gbm model that was estimated without setting a seed. To do so I need to determine which seed was used, and I can identify it by matching one of the summary metrics from the estimated model (as shown below).
    require(MatchIt)
    require(gbm)
    data("lalonde")

    i <- 1
    while(!(tmp$rel.inf[1] == 82.3429390)){
      gps <- gbm(treat ~ age + educ + nodegree + re74 + re75,
                 distribution = "bernoulli", data = lalonde,
                 n.trees = 100, interaction.depth = 4,
                 train.fraction = 0.8, shrinkage = 0.0005,
                 set.seed(i))
      tmp <- summary(gps, plotit = F)
      cat(i, "\n")
      i <- i + 1
    }

I think it would be very helpful, both for this specific use case and for general future reference, to know of a more efficient way of carrying this out. A multicore solution might be a good way to go; I'm researching that myself now (a rough sketch of what I have in mind is below). Or perhaps there's a way to improve it by using apply?
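For reference, here is a rough, untested sketch of the multicore idea using parallel::mclapply. It forks, so it would only work on Linux/macOS; the seed range 1:1000, the core count, and the 5e-8 tolerance are placeholders I picked, not tested values:

    library(parallel)
    library(gbm)
    library(MatchIt)
    data("lalonde")

    # Fit the model under one candidate seed and return the metric of interest
    rel_inf_for_seed <- function(i) {
      set.seed(i)
      gps <- gbm(treat ~ age + educ + nodegree + re74 + re75,
                 distribution = "bernoulli", data = lalonde,
                 n.trees = 100, interaction.depth = 4,
                 train.fraction = 0.8, shrinkage = 0.0005)
      summary(gps, plotit = FALSE)$rel.inf[1]
    }

    # Evaluate candidate seeds in parallel across 4 cores (placeholder count)
    vals <- unlist(mclapply(1:1000, rel_inf_for_seed, mc.cores = 4))

    # Seeds whose top relative influence matches the published value,
    # compared with a small tolerance rather than exact equality
    which(abs(vals - 82.3429390) < 5e-8)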
"82.3429390"in a publication, then it's likely that the true experimental value was between82.34293895and82.34293905. \$\endgroup\$set.seed(i)as an argument togbm?set.seedreturnsNULLso you are essentially passingNULLas an unnamed argument togbmand potentially messing up with it. Should you instead be runningset.seed(i)as its own statement, before callinggbm? \$\endgroup\$set.seed(123); x = runif(1); print(x)gives me[1] 0.2875775. But now if I runset.seed(123); runif(1) == 0.2875775it returnsFALSE. What I am saying is that your condition for exiting the loop should bewhile(abs(tmp$rel.inf[1] - 82.3429390) > eps)for some smalleps, probably 5e-8. \$\endgroup\$set.seed()is a valid argument togbm. I got it from the manual page and when I run it I get the same result every time (and it's in the expected range) but when I don't it varies slightly every time. I tried setting the seed at the top of my script but that didn't work for this. \$\endgroup\$