Linear regresion at each raster pixel's value to predict future value (in R language)

Question

I want to predict future value with existing time series raster. For simplicity, I want use linear regression at each raster pixel's value to predict future value

I have successfull run this code. I have read it from:

https://stackoverflow.com/questions/47435206/cant-calculate-pixel-wise-regression-in-r-on-raster-stack-with-fun?rq=1

library(raster) # Example data r <- raster(nrow=15, ncol=10) set.seed(0) # Now I make 6 raster (1 raster/months), then assign each pixel's value randomly s <- stack(lapply(1:6, function(i) setValues(r, rnorm(ncell(r), i, 3)))) names(s) <- paste0('Month', c(1,2,3,4,5,6)) # Extract each pixel values x <- values(s) # Model with linreg m <- lm(Month6 ~ ., data=data.frame(x)) # Prediction raster p <- predict(s, m)

If you run that code, p will be a raster. But, I still confused. How to make raster in 'Month8' based on 6 previous raster?

What I mean is, each pixels has different linreg equations (where X=Month1, ..., Months6). If I input X=Month8, I will have 150 cells of Y for 8th Month that represent in each pixel of raster.

What I have done

# Lets try make a data frame for clear insight in my data x <- values(s) DF <- data.frame(x) # Make X as month, and y is target. library(data.table) DF_T <- transpose(DF) Month <- seq(1,nrow(DF_T)) DF_T <- cbind(Month, DF_T) # Make prediction for first pixel V1_lr <- lm(V1 ~ Month, data=DF_T) # prediction for 8th Months in a pixel V1_p <- predict(V1_lr, data.frame(Month=8)) V1_p

This is just one pixel. I want the entire raster

What's DF in your second block of code? Your first model predicts Month6 as a linear combination of the other months, so there's no way it can predict Month8 - the model is actually independent of the ordering of the months. It looks like you are trying to do that in your second model, ie fit Month as a linear function of time from 1 to 6 for each pixel? That would be 150 different linear models? — Spacedman
– Spacedman, Commented Jun 25, 2020 at 11:16
Thanks for your suggestion. I forget to explain what DF is. Let me edit my question. BTW, Is it strange to have 150 different linear models where 1 models for each pixel?. I m sorry I don't know another ways to get prediction for Month8 — Ichsan
– Ichsan, Commented Jun 25, 2020 at 12:04
You can have 150 models if you want. Do you want that? Do you see each pixel's time-series of 6 months as independent? If so, then okay, otherwise there are other better models that take correlation into account. We can help you fit one model per pixel but I don't see the point if its going to produce a useless answer. — Spacedman
– Spacedman, Commented Jun 25, 2020 at 12:52
You are welcome. I want make 150 models (1 model/pixel). Sorry if I OOT, If you want, please tell me a better algorithm to predict 8th month instead using 150 models (which take correlation into account). — Ichsan
– Ichsan, Commented Jun 26, 2020 at 0:43

Spacedman · Accepted Answer · 2020-06-26 08:49:03Z

Starting with your raster stack s:

> s class : RasterStack dimensions : 15, 10, 150, 6 (nrow, ncol, ncell, nlayers)

I'll show how to fit and predict in various ways. I'm going to try to spell out every stage and use data structures that make it clear what's going on - some of these steps can be made quicker in various ways but I'm aiming for clarity here.

First convert your stack to a data frame and then to a matrix.

> sdata = as.matrix(as.data.frame(s)) > head(sdata) Month1 Month2 Month3 Month4 Month5 Month6 [1,] 4.78886285 6.410771 2.6266950 1.0362426 -0.06340309 5.734149 [2,] 0.02129992 -1.934262 7.4002338 6.1854661 6.94293798 9.440994 [3,] 4.98939779 1.710425 5.0217860 1.3459453 6.34638268 6.260438 [4,] 4.81728796 9.109160 8.8692758 -0.6153243 8.07890647 5.114985 [5,] 2.24392430 4.671879 2.1928770 1.0833196 8.22493467 7.536999 [6,] -3.61985013 1.243451 -0.7336546 -1.1544086 6.37492884 6.849941

Next make a similar matrix containing the time points for each element in that matrix:

> t = matrix(1:ncol(sdata), nrow=nrow(sdata), ncol=ncol(sdata), byrow=TRUE) > head(t) [,1] [,2] [,3] [,4] [,5] [,6] [1,] 1 2 3 4 5 6 [2,] 1 2 3 4 5 6 [3,] 1 2 3 4 5 6 [4,] 1 2 3 4 5 6 [5,] 1 2 3 4 5 6 [6,] 1 2 3 4 5 6

and then make another similar matrix containing the cell number:

> cell = matrix(1:nrow(sdata), nrow=nrow(sdata), ncol=ncol(sdata)) > head(cell) [,1] [,2] [,3] [,4] [,5] [,6] [1,] 1 1 1 1 1 1 [2,] 2 2 2 2 2 2 [3,] 3 3 3 3 3 3 [4,] 4 4 4 4 4 4 [5,] 5 5 5 5 5 5 [6,] 6 6 6 6 6 6

Now make a full data frame with one row per measurement. The cell number isn't really numeric data, so we'll convert it to a factor by adding "C" to the cell number. (I'm also using a sprintf string so there's enough zeroes to keep the sort order, otherwise C100 appears before C2 and that messes up the order when packing it back into the raster)

> d = data.frame(Month=c(sdata), t=c(t), cell=factor(sprintf("C%04d",c(cell)))) > head(d) Month t cell 1 4.78886285 1 C0001 2 0.02129992 1 C0002 3 4.98939779 1 C0003 4 4.81728796 1 C0004 5 2.24392430 1 C0005 6 -3.61985013 1 C0006

Now I can fit 150 models by splitting the data frame on cell and using lapply to get a list of 150 models:

> m150 = lapply(split(d,d$cell),function(s){lm(Month~t, data=s)})

Here's the 56th:

> m150[[56]] Call: lm(formula = Month ~ t, data = s) Coefficients: (Intercept) t 4.40613 -0.04868

Then you can use sapply to predict over those 150 models for month 8:

> sapply(m150, function(m){predict(m, newdata=data.frame(t=8))}) C0001.1 C0002.1 C0003.1 C0004.1 C0005.1 C0006.1 C0007.1 1.3282355 13.9995073 6.4117065 4.4702597 8.9561532 10.1491687 12.5426525 C0008.1 C0009.1 C0010.1 C0011.1 C0012.1 C0013.1 C0014.1 8.1974870 9.7911019 1.7776975 3.9731137 11.5130154 12.4518928 8.6699895

and that is a vector you can plug into a new raster (here r is your r above):

> p8 = sapply(m150, function(m){predict(m, newdata=data.frame(t=8))}) > rp8 = setValues(r, p8) > plot(rp8)

Now that is 150 totally independent models. If you really believe your pixels are all independent, (and this can be tested with plots and diagnostics) then that's that done.

Alternatively:

If you think the pixels all have the same slope in time and the same variance about the mean, but with different levels (intercepts), you can do it in a single model with:

> mOffset = lm(Month~t+cell-1,data=d)

which gives:

> mOffset Call: lm(formula = Month ~ t + cell - 1, data = d) Coefficients: t cellC0001 cellC0002 cellC0003 cellC0004 cellC0005 cellC0006 0.994784 -0.059526 1.194366 0.797317 2.413970 0.843910 -1.988344 cellC0007 cellC0008 cellC0009 cellC0010 cellC0011 cellC0012 cellC0013 -0.166818 -0.327839 0.466276 0.116303 -0.671000 -0.186959 0.775498 cellC0014 cellC0015 cellC0016 cellC0017 cellC0018 cellC0019 cellC0020 1.258440 0.479706 1.774456 -0.095878 -1.591700 0.190463 -1.819975 [etc]

Now all models have the same slope (the t coefficient) but and so are parallel lines but start at different heights given by the individual cell coefficients. A predict with the cell factors and t=8 will work with these coefficients.

If you want each pixel to have its own level (intercept) and slope, then fit interaction terms:

> mFree = lm(Month~t*cell-1,data=d)

This model has lots of coefficients:

> mFree Call: lm(formula = Month ~ t * cell - 1, data = d) Coefficients: t cellC0001 cellC0002 cellC0003 cellC0004 cellC0005 -0.465330 5.050874 -2.575418 2.620339 7.004403 0.724158 [etc]

those are the overall slope (t) and the individual intercepts. Then...

 t:cellC0002 t:cellC0003 t:cellC0004 t:cellC0005 t:cellC0006 2.537196 0.939251 0.148562 1.494329 2.388834

Those are the gradients (slopes) for each cell's fit. There's not one for cellC0001 because (I think) it has the t coefficient slope and all the other are relative to it. Anyway, you can again predict with cell values and t=8 and get a prediction over the raster.

The only difference between that model and 150 independent models is (I think) that the one model assumes the variance is the same across all pixels, but the 150 models will have 150 separate variances.

But all this independence is probably clearly untrue when you plot your data - there's likely to be correlations in space and time, so you need a method that considers both in order to use the correlations to your advantage. The first thing you should probably investigate is space-time kriging. I would only use these independent models to compare with a method like kriging in order to show the improvements made and the mistakes that not considering spatial and temporal correlation can bring.

Wow, your answer really fabulous. Thank you sir.

Ichsan
– Ichsan

2020-06-26 09:47:12 +00:00
Commented Jun 26, 2020 at 9:47 — Ichsan
– Ichsan, Commented Jun 26, 2020 at 9:47

Stack Exchange Network

Linear regresion at each raster pixel's value to predict future value (in R language)

1 Answer 1

Hot Network Questions

Linear regresion at each raster pixel's value to predict future value (in R language)

1 Answer 1

Related

Hot Network Questions