1

I'm performing IDW interpolation using the idw R function from the gstat package. For the purpose of determining what combination of "idp" and "nmax" produces the lowest RMSE in leave-one-out-cross validation, how can I specify the "idp" and the "nmax" that are going to be use in the krige.cv function?

2 Answers 2

2

I've had the same problem and just found a solution: In the gstat-documentation on page 21 they use for defining idp the expression set = list(idp = 0.5). For "nmax" it's as usual:

meuse.idw_cv <- krige.cv(log(zinc)~1, meuse, nmax = 7, set = list(idp = .5)) 
2

Here is a worked out CV example using the meuse dataset:

# Load Libraries library(data.table) library(ggplot2) library(viridisLite) library(sp) library(gstat) # Settings seed=123L # Convert data to SpatialPointsDataFrame data(meuse) coordinates(meuse)= ~x+y # Log transform meuse$logZinc=log(meuse$zinc) # Save data.frame meuse.df=as.data.table(meuse) # View data spplot(meuse['logZinc'],,auto.key=T,key.space='right',scales=list(draw=T)) # Initialize cv variables nfold=10L pmax=8L nTimes=5L cv=expand.grid(B=1:nTimes,p=0:pmax,RMSE=NA_real_) setDT(cv) setorder(cv,B,p) dim(cv) # 180 xx fit.cv=vector(mode='list',length=nrow(cv)) # B-times k-fold CV i=1L set.seed(seed) for(i in 1:nrow(cv)){ fit.cv[[i]]=krige.cv(logZinc~1,locations=meuse,nfold=nfold,set=list(idp=cv[i,p])) head(fit.cv[[i]]) spplot(fit.cv[[i]]['var1.pred'],auto.key=T,key.space='right',scales=list(draw=T)) cv[i,RMSE:=sqrt(mean(fit.cv[[i]]$residual^2))] } # END Loop # cv # Aggregate across B times cvAvg=cv[,.(RMSE=mean(RMSE)), by=p] # cvAvg # Plot CV results par(mar=c(4.5,4.5,1,1)) plot(RMSE~p,data=cv,type='p') lines(RMSE~p,data=cvAvg,lwd=3,col=4) # Set optimum p which.min(cvAvg$RMSE) # 4 p=3 # Create gridded newdata data(meuse.grid) newdata=meuse.grid coordinates(newdata)= ~x+y # Fit IDW to newdata rm(fit) fit=idw(logZinc~1,locations=meuse,newdata=newdata,idp=p) # fit=krige(logZinc~1,locations=meuse,newdata=newdata,set=list(idp=p)) # Equivalent fit.df=as.data.table(fit) # Plot - Observed ggplot(meuse.df,aes(x,y,fill=zinc))+ geom_point(size=3,shape=24)+ coord_equal()+ scale_fill_viridis_c()+ labs(y='Y',x='X',title='Observed Data',fill='Zinc, mg/L')+ theme_classic()+ theme(legend.position='bottom') # Plot - Fitted ggplot(fit.df,aes(x,y,fill=var1.pred))+ geom_raster()+ geom_point(data=meuse.df,aes(y=coordinates(meuse)[,'y'],x=coordinates(meuse)[,'x']),inherit.aes=F,size=3,shape=24,fill='orange')+ # Observed coord_equal()+ scale_fill_viridis_c()+ labs(y='Y',x='X',title=sprintf('IDW Fitted (p=%d)',p),fill='Zinc, mg/L')+ theme_classic()+ theme(legend.position='bottom') 

The code applies 10-fold CV (the OP requested LOOCV) to select the ideal power value. The CV is averaged across 5 instances to account for potential re-sampling anomalies due to the random seed. Note that krige.cv creates folds using simple random sampling. A more customized re-sampling by strata approach may be more appropriate for your data.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.