I have a data frame like this:

                          Open   High    Low  Close Volume
    1998-09-08 10:32:00 106.44 106.44 106.44 106.44      1
    1998-09-08 10:33:00 106.42 106.42 106.35 106.35 628225
    1998-09-08 10:34:00 106.31 106.38 106.31 106.38 135840
    1998-09-08 10:35:00 106.35 106.35 106.32 106.34 170010
    1998-09-08 10:36:00 106.35 106.36 106.35 106.36 309560
    1998-09-08 10:37:00 106.44 106.50 106.44 106.50 115540
    1998-09-08 10:38:00 106.49 106.53 106.49 106.52 427620
    1998-09-08 10:39:00 106.53 106.54 106.52 106.53 321350
    1998-09-08 10:40:00 106.55 106.60 106.54 106.54 317647
    1998-09-08 10:41:00 106.56 106.63 106.56 106.63 233901

I need to change the Open column using parallel processing. I wrote a function like this:

    parTest <- function(x){
      foreach(i = 1:nrow(x)) %dopar% {
        x[i,1] <- i
      }
      return(x)
    }

But when I call this function, nothing changes and it returns the data frame unchanged.

    zz <- parTest(x)
    zz

When I use a simple for loop it works, but foreach does not!

I also loaded the appropriate packages and set up the cores:

    library(foreach)
    library(doParallel)
    cl <- makeCluster(4)
    registerDoParallel(cl)

Thanks for your help.

1 Answer

foreach takes the return value of the code block from each iteration and combines those values in some way. In your case, since you do not specify the .combine argument, it returns them in a list. (The first paragraph of help(foreach) says this.)

Okay, so what is happening with each instantiation of your code block? It takes a copy of the data.frame as it was when the call started (meaning the iteration for row 2 does not see the change made for row 1, etc.), updates that copy, and then returns "something".

This "something" is not what you think it should be. To see this, try manually updating the data.frame with something like (x[1,1] <- 1); the surrounding parentheses make R print the value of the assignment, which is 1, not the contents of x. In other words, the return value of an assignment is the value assigned, not the whole variable to which it was assigned.
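As a quick check of that claim, here is a minimal sketch on a toy two-row data.frame (the values are just borrowed from your sample, and the name val is made up for illustration):

```r
# Toy data.frame standing in for the real one (illustrative values only)
x <- data.frame(Open = c(106.44, 106.42), Close = c(106.44, 106.35))

# Capture the value of the assignment expression itself
val <- (x[1, 1] <- 1)

print(val)      # the value that was assigned, not the data.frame
print(x[1, 1])  # the data.frame itself was still updated in this session
```

The assignment does modify x locally, but what the expression hands back is only the right-hand side.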

So, in your case, x[i,1] <- i silently returns i, so the value returned from the child processes of foreach (which you are not capturing) is a list of the values 1 through nrow(x), which is useless to you. If you assigned the result of foreach to a variable and returned that from your function, you would see this.
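To make that visible, here is a rough sketch that captures what foreach hands back from your original loop body. It assumes a sequential backend via registerDoSEQ() purely so it runs without a cluster, and the toy data.frame and the name res are made up for illustration. Note that a sequential backend may evaluate the block in the calling environment, so this sketch only inspects the return value and says nothing about whether x changes under a real parallel backend.

```r
library(foreach)

# Sequential backend so the sketch runs anywhere (assumption: no cluster needed)
registerDoSEQ()

# Toy data.frame standing in for the real one (illustrative values only)
x <- data.frame(Open  = c(106.44, 106.42, 106.31),
                Close = c(106.44, 106.35, 106.38))

# Same loop body as in the question, but this time the result is captured
res <- foreach(i = 1:nrow(x)) %dopar% {
  x[i, 1] <- i
}

str(res)  # a list with one element per row, each holding the assigned i
```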

What I think you want is for the code block to return the specific row that has been adjusted, and then combine them into a data.frame at the end. Note, if you return the whole data.frame, then the return from foreach will be a list of data.frames, not (I think) what you want.

There are many ways to do this; I'll show three. This first one will work just fine, and it's a little more literal in how you are managing the data.frame.

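    parTest <- function(x) {
      ret <- foreach(i = 1:nrow(x)) %dopar% {
        x[i,1] <- i
        # return the modified row as a one-row data.frame
        x[i,,drop=FALSE]
      }
      # ret is a list of one-row data.frames; stack them back together
      do.call('rbind', ret)
    }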

If your data.frame is rather large, realize you are making a lot of copies of this data.frame. If you only need one row (I'm assuming your example is contrived as a simple MWE), then this is unnecessary. You can simplify this a little with:

    parTest <- function(x) {
      # .combine=rbind stacks the returned rows, so no do.call is needed
      foreach(i = 1:nrow(x), .combine=rbind) %dopar% {
        x[i,1] <- i
        x[i,,drop=FALSE]
      }
    }

Another technique, using the iterators package:

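    library(iterators)
    parTest <- function(x) {
      # df is a one-row data.frame on each iteration
      foreach(df = iter(x, by='row'), .combine=rbind) %dopar% {
        # no row index is available here, so a constant is assigned
        df[,1] <- 1
        df
      }
    }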

This latter technique seems to me to be a little more readable. And, if you really only care about a single row at a time, it may perform faster than the other.
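The iterator example sets every Open value to the constant 1, because the row index is no longer available inside the block. If you still need the index, one possible variation is to pass a second iterator such as icount() from the iterators package, which counts upward and stops when the row iterator is exhausted. This is only a sketch: the sequential backend from registerDoSEQ() and the toy data.frame are assumptions made so it runs without a cluster.

```r
library(foreach)
library(iterators)

# Sequential backend so the sketch runs anywhere (assumption: no cluster needed)
registerDoSEQ()

# Toy data.frame standing in for the real one (illustrative values only)
x <- data.frame(Open  = c(106.44, 106.42, 106.31),
                Close = c(106.44, 106.35, 106.38))

parTest <- function(x) {
  # df is a one-row data.frame; i counts 1, 2, 3, ... alongside it
  foreach(df = iter(x, by = "row"), i = icount(), .combine = rbind) %dopar% {
    df[, 1] <- i
    df
  }
}

zz <- parTest(x)
print(zz)
```

With a real cluster registered, as in your setup, the same function body should work unchanged, since each task only receives its own row and counter.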

BTW: I'm assuming that you are really looking for the resulting data.frame, not specifically for the side-effect of changing the data.frame in the current environment. When dealing with parallel stuff using %dopar%, realize that the child processes do not get to see or work with the actual calling environment.

3 Comments

Your last example should be changed to use df. It shouldn't use either x or i.
Steve, thanks a lot. But I ran into a new problem: my data frame is actually a zoo object with a POSIXct index. I need to change the index of the zoo object inside foreach, but this function returns a matrix. I need to pass in a zoo object and get a zoo object back.
@Steve, what happened when you coerced with as.zoo?