I have two zoo objects of unequal size (inflow and outflow). Outflow values lag inflow by some unknown amount of time. I would like to compute the correlation between the smaller outflow object (6 rows) and the larger inflow object (many rows), 6 rows at a time, starting at the beginning of inflow and advancing by one row each time, and then find the period of highest correlation. I think this is called a "sliding window" comparison. I've tried many different ways, unsuccessfully, to use the "rollapply" function to do this, but I get an error because of the size difference between the two objects. I hope someone will understand what I'm trying to ask and can offer a solution. Below is a portion of my data and an example of how I have tried to use the rollapply function.

> inflow
(03/14/13 07:00:00) 11.20451
(03/14/13 07:02:00) 11.03810
(03/14/13 07:04:00) 11.03012
(03/14/13 07:06:00) 11.09517
(03/14/13 07:08:00) 10.90878
(03/14/13 07:10:00) 11.23285
(03/14/13 07:12:00) 11.14890
(03/14/13 07:14:00) 11.17002
(03/14/13 07:16:00) 11.38342
(03/14/13 07:18:00) 11.70833
(03/14/13 07:20:00) 11.93776
(03/14/13 07:22:00) 12.17832
(03/14/13 07:24:00) 12.39648
(03/14/13 07:26:00) 12.24020
(03/14/13 07:28:00) 12.18667
(03/14/13 07:30:00) 12.45410
(03/14/13 07:32:00) 12.50012
(03/14/13 07:34:00) 12.54736
(03/14/13 07:36:00) 13.05010
(03/14/13 07:38:00) 13.06495
(03/14/13 07:40:00) 13.14084
(03/14/13 07:42:00) 12.92427
(03/14/13 07:44:00) 12.98699
(03/14/13 07:46:00) 12.84172
(03/14/13 07:48:00) 12.87263
(03/14/13 07:50:00) 12.51861
(03/14/13 07:52:00) 12.98763
(03/14/13 07:54:00) 12.31124
(03/14/13 07:56:00) 12.33696
(03/14/13 07:58:00) 12.49630
(03/14/13 08:00:00) 12.40648
(03/14/13 08:02:00) 11.87164
(03/14/13 08:04:00) 12.76058
(03/14/13 08:06:00) 12.50016
(03/14/13 08:08:00) 12.68696
(03/14/13 08:10:00) 12.88447
(03/14/13 08:12:00) 12.33336
(03/14/13 08:14:00) 13.06670
(03/14/13 08:16:00) 13.15070
(03/14/13 08:18:00) 12.82410
(03/14/13 08:20:00) 12.91953

> outflow2
(03/14/13 07:54:00) 11.51110
(03/14/13 07:56:00) 11.11878
(03/14/13 07:58:00) 11.05775
(03/14/13 08:00:00) 11.11303
(03/14/13 08:02:00) 10.95417
(03/14/13 08:04:00) 10.98035

Use:

> test <- rollapply(inflow, width = 6, by = 1, FUN = cor(inflow, outflow))
Error in cor(inflow, outflow) : incompatible dimensions
  • I think you need to define what you mean by "correlation" between a single value and multiple values. And if the outflow variable is measured at irregular times, how do you expect the inflow values to be partitioned? Commented Aug 22, 2013 at 18:22
  • I want to correlate the outflow values with the inflow values, as groups of 6. Sorry, I'm not able to describe this better. Here's a text example: Commented Aug 22, 2013 at 18:43
  • I guess we need to get concrete: What is the "correlation" of value 3 with the sequence 1:6? Or are you really asking to have the correlation of the aggregated means of the longer vector with the shorter vector? Commented Aug 22, 2013 at 18:46
  • ab| a a ab| cor1 ab| a ab| ab|cor2 ab| a ab| ab|cor3 a a ab| Etc. Commented Aug 22, 2013 at 18:50
  • Another way of explaining this… get the correlation between the smaller object (outflow, 6 rows) and the first six rows of the larger object (inflow). Increment the inflow start position by one, select 6 rows and get correlation with outflow. Continue comparing inflow to outflow, by six rows, incrementing inflow by one each time, to the end of inflow. Thanks for trying to help. Commented Aug 22, 2013 at 19:10

1 Answer

Try this:

rollapply(inflow, 6, cor, y = outflow) 

This computes

value <- c(
  cor(inflow[1:6], outflow),
  cor(inflow[2:7], outflow),
  ...etc...
)
ix <- seq(3, length = length(inflow) - 6 + 1)
zoo(value, time(inflow)[ix])
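The window-by-window calculation can also be written out in base R without zoo. This is only a rough sketch: the two vectors below are made-up numbers (not the poster's series), and the names `inflow_vals`, `outflow_vals`, `width`, `n_window` and `best_start` are chosen here for illustration.

```r
# Made-up data for illustration only (not the series from the question).
inflow_vals  <- c(1, 3, 2, 5, 4, 6, 8, 7, 9, 12, 10, 11)
outflow_vals <- c(2, 5, 4, 6, 8, 7)  # deliberately equal to inflow_vals[3:8]

width    <- length(outflow_vals)
n_window <- length(inflow_vals) - width + 1

# One correlation per window: inflow[1:6], inflow[2:7], and so on.
value <- sapply(seq_len(n_window), function(start) {
  cor(inflow_vals[start:(start + width - 1)], outflow_vals)
})

# Starting row of the window that correlates best with the outflow values.
best_start <- which.max(value)
```

With these made-up numbers the best window starts at row 3, because the outflow values were copied from that stretch of the inflow vector.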

Depending on what you want to get out you may need the align= argument too.
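As a hedged illustration of the align= point: with `align = "left"` each correlation is stamped with the first time of its window, which makes it easy to read off when the best-matching 6-row stretch of inflow begins. The data are made up, and `times`, `outflow_vals`, `r` and `best_time` are names assumed for this sketch; the outflow is passed as a plain numeric vector so that only its values enter `cor`.

```r
library(zoo)

# Made-up series on a 2-minute grid (illustrative, not the poster's data).
times  <- as.POSIXct("2013-03-14 07:00:00", tz = "UTC") + 120 * (0:11)
inflow <- zoo(c(1, 3, 2, 5, 4, 6, 8, 7, 9, 12, 10, 11), times)
outflow_vals <- c(2, 5, 4, 6, 8, 7)  # deliberately equal to inflow rows 3 to 8

# align = "left" labels each result with the start time of its window.
r <- rollapply(inflow, width = 6, FUN = cor, y = outflow_vals, align = "left")

# Start time of the window with the highest correlation.
best_time <- time(r)[which.max(coredata(r))]
```

Here `best_time` is the timestamp of the third inflow row, since the made-up outflow values were taken from the window starting there. With the default centered alignment the same correlations would be stamped two rows later.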

5 Comments

I have added a description of the output.
I don't think that is what is being expected. outflow[2] is being measured during the same interval as inflow[7:12]. Furthermore when I try using cor(1:6, 3) I get a dimension error. (Actually the English description by the OP does not agree with that data.)
The poster's code and the description by the poster in the comments both seem to describe what I have shown.
The first element of outflow has an index of (03/14/13 07:54:00). The inflow-object's matching index element is at row 28. The next element in outflow is at (03/14/13 07:56:00) and the inflow row that matches that is number 30.
The times are not relevant. The desired result is just a matter of calculating cor of each successive 6 inflows with the outflow.
