3

The dataset named crass looks like -

> dput(crass) structure(list(WT_TRADE_PRICE = c(3801, 3801, 3801, 3797, 3797, 3796.2125, 3800, 3797, 3795.09523809524, 3794, 3793, 3793, 3793.8, 3794.72, 3793.02777777778, 3789, 3790, 3788, 3788, 3788), min = c(3801, 3801, 3801, 3797, 3797, 3795, 3800, 3797, 3794, 3794, 3793, 3793, 3793, 3794, 3790, 3789, 3790, 3788, 3788, 3788), max = c(3801, 3801, 3801, 3797, 3797, 3800, 3800, 3797, 3797, 3794, 3793, 3793, 3794, 3797, 3794, 3789, 3790, 3788, 3788, 3788), Bid = c(3801, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), Ask = c(3802, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, -20L ), class = c("tbl_df", "tbl", "data.frame")) 
# A tibble: 20 x 5 WT_TRADE_PRICE min max Bid Ask <dbl> <dbl> <dbl> <dbl> <dbl> 1 3801 3801 3801 3801 3802 2 3801 3801 3801 NA NA 3 3801 3801 3801 NA NA 4 3797 3797 3797 NA NA 5 3797 3797 3797 NA NA 6 3796. 3795 3800 NA NA 7 3800 3800 3800 NA NA 8 3797 3797 3797 NA NA 9 3795. 3794 3797 NA NA 10 3794 3794 3794 NA NA 11 3793 3793 3793 NA NA 12 3793 3793 3793 NA NA 13 3794. 3793 3794 NA NA 14 3795. 3794 3797 NA NA 15 3793. 3790 3794 NA NA 16 3789 3789 3789 NA NA 17 3790 3790 3790 NA NA 18 3788 3788 3788 NA NA 19 3788 3788 3788 NA NA 20 3788 3788 3788 NA NA 

As can be seen, two variables ask & bid have only initial values which need to be filled iteratively using the following logic.

  • If WT_TRDAE_PRICE is >= than previous value of Ask OR WT_TRADE_PRICE will be checked whether > than mean of previous bid and ask - then current ask will be set equal to current row max variable and bid will be set equal to previous bid value.
  • Else , current ask will be set to previous ask value & bid to max.

Pseudo code -

if(WT_TRADE_PRICE >= L(Ask) | WT_TRADE_PRICE > (L(Bid)+L(Ask))/2) { Bid = L(Bid), Ask = max } else { Bid = min, Ask = L(Ask) } 

Final output -

SNo. WT_TRADE_PRICE min max Bid Ask
1 3801 3801 3801 3801 3802
2 3801 3801 3801 3801 3802
3 3801 3801 3801 3801 3802
4 3797 3797 3797 3797 3802
5 3797 3797 3797 3797 3802
6 3796. 3795 3800 3795 3802
7 3800 3800 3800 3795 3800
8 3797 3797 3797 3797 3800
9 3795. 3794 3797 3794 3800
10 3794 3794 3794 3794 3800
11 3793 3793 3793 3793 3800
12 3793 3793 3793 3793 3800
13 3794. 3793 3794 3793 3800
14 3795. 3794 3797 3793 3797
15 3793. 3790 3794 3790 3797
16 3789 3789 3789 3789 3797
17 3790 3790 3790 3790 3797
18 3788 3788 3788 3788 3797
19 3788 3788 3788 3788 3797
20 3788 3788 3788 3788 3797
5
  • I think your input dput and table input values are different. Can you please correct Commented Mar 24, 2021 at 19:04
  • crass %>% mutate(Bid = min, Prev_Bid = lag(Bid)) this gives the expected output. May be a better example would make it more clear Commented Mar 24, 2021 at 19:06
  • Simplified your condition Commented Apr 15, 2021 at 9:25
  • @akrun, In your leisure time please see this question. Since accumulate2 can handle at most two variables, I converted three variables to one row tibble each and used accumulate to generate two simultaneous variables. But I am sure you can provide some alternate strategy in these cases where more than 2 variables need to be passed into accumulate since we do not have paccumulate kinda thing Commented Apr 15, 2021 at 9:51
  • @AnilGoyal I would use for loop as it have greater flexibility Commented Apr 15, 2021 at 16:37

2 Answers 2

4

In this case, we require to generate two output columns simultaneously; and iteratively with the help of three inputs. So purrr::accumulate normally works on one output based on one input, whereas purrr::accumulate2() works on 2 inputs for again one output. So instead, my strategy for accumulate is as under:-

  • Re-arrange three input columns into row-wise tibbles each, so that each of three columns inputs are now a single column. For this, I generated a dummy column id so that each row is converted to a tibble each.
  • I used tidyr::nest_by() for this
  • again for output I generated a tibble instead of a vector through accumulate.
  • Lastly I converted both tibbles back to their original shapes by using tidyr::unnnest_wider()
crass[1:3] %>% nest_by(id = row_number()) %>% ungroup() %>% mutate(new = accumulate(data, .init = list(Bid = 3801, Ask = 3802), ~ tibble(Bid = ifelse(.y$WT_TRADE_PRICE >= min(.x$Ask, (.x$Ask + .x$Bid)/2), .x$Bid, .y$min), Ask = ifelse(.y$WT_TRADE_PRICE >= min(.x$Ask, (.x$Ask + .x$Bid)/2), .y$max, .x$Ask)) )[-1]) %>% unnest_wider(data) %>% unnest_wider(new) # A tibble: 20 x 6 id WT_TRADE_PRICE min max Bid Ask <int> <dbl> <dbl> <dbl> <dbl> <dbl> 1 1 3801 3801 3801 3801 3802 2 2 3801 3801 3801 3801 3802 3 3 3801 3801 3801 3801 3802 4 4 3797 3797 3797 3797 3802 5 5 3797 3797 3797 3797 3802 6 6 3796. 3795 3800 3795 3802 7 7 3800 3800 3800 3795 3800 8 8 3797 3797 3797 3797 3800 9 9 3795. 3794 3797 3794 3800 10 10 3794 3794 3794 3794 3800 11 11 3793 3793 3793 3793 3800 12 12 3793 3793 3793 3793 3800 13 13 3794. 3793 3794 3793 3800 14 14 3795. 3794 3797 3794 3800 15 15 3793. 3790 3794 3790 3800 16 16 3789 3789 3789 3789 3800 17 17 3790 3790 3790 3790 3800 18 18 3788 3788 3788 3788 3800 19 19 3788 3788 3788 3788 3800 20 20 3788 3788 3788 3788 3800 

Earlier Revised for loop

Syntax

 for(i in 2:nrow(crass)){ if(crass[i, 1] >= min(crass[i-1, 5], (crass[i-1, 4] + crass[i-1, 5])/2)){ crass[i, 5] <- crass[i, 3] crass[i, 4] <- crass[i-1, 4] } else { crass[i, 4] <- crass[i, 2] crass[i, 5] <- crass[i-1, 5] } } crass # A tibble: 20 x 5 WT_TRADE_PRICE min max Bid Ask <dbl> <dbl> <dbl> <dbl> <dbl> 1 3801 3801 3801 3801 3802 2 3801 3801 3801 3801 3802 3 3801 3801 3801 3801 3802 4 3797 3797 3797 3797 3802 5 3797 3797 3797 3797 3802 6 3796. 3795 3800 3795 3802 7 3800 3800 3800 3795 3800 8 3797 3797 3797 3797 3800 9 3795. 3794 3797 3794 3800 10 3794 3794 3794 3794 3800 11 3793 3793 3793 3793 3800 12 3793 3793 3793 3793 3800 13 3794. 3793 3794 3793 3800 14 3795. 3794 3797 3794 3800 15 3793. 3790 3794 3790 3800 16 3789 3789 3789 3789 3800 17 3790 3790 3790 3790 3800 18 3788 3788 3788 3788 3800 19 3788 3788 3788 3788 3800 20 3788 3788 3788 3788 3800 

crass before running of for loop

# A tibble: 20 x 5 WT_TRADE_PRICE min max Bid Ask <dbl> <dbl> <dbl> <dbl> <dbl> 1 3801 3801 3801 3801 3802 2 3801 3801 3801 NA NA 3 3801 3801 3801 NA NA 4 3797 3797 3797 NA NA 5 3797 3797 3797 NA NA 6 3796. 3795 3800 NA NA 7 3800 3800 3800 NA NA 8 3797 3797 3797 NA NA 9 3795. 3794 3797 NA NA 10 3794 3794 3794 NA NA 11 3793 3793 3793 NA NA 12 3793 3793 3793 NA NA 13 3794. 3793 3794 NA NA 14 3795. 3794 3797 NA NA 15 3793. 3790 3794 NA NA 16 3789 3789 3789 NA NA 17 3790 3790 3790 NA NA 18 3788 3788 3788 NA NA 19 3788 3788 3788 NA NA 20 3788 3788 3788 NA NA 
Sign up to request clarification or add additional context in comments.

4 Comments

On row 13, wt_trade_price = 3794 which is greater than L(Bid) ~ 3793. also wt_trade_price <(L(bid) + L(ask))/2 = 3796.5. So it works for the fourth conditional.
Minor correction done in loop. Still I can see some problem in row 14. WT (3795) is > L(Bid) i.e. 3793 and lower than mean of 3973 & 3800 (3796.5), then row-14 Bid should be 3794 & ask should be 3800. Please re-check
This process is fine but this is too slow - the dataset has 3 million rows. Any vectorized solutions?
@shoonya, have you checked the vectorised solution
3

You can also use the following solution in base R. For this solution I used row numbers as the primary vector to be used in Reduce. We can use this technique where we have more than one output variables. It should be noted that for differentiating between the previous value of Bid and Ask I used double brackets to subset the required value.

cbind(crass[1:3], do.call(rbind, Reduce(function(x, y) { data.frame(Bid = ifelse(crass$WT_TRADE_PRICE[y] >= min(x[["Ask"]], (x[["Ask"]] + x[["Bid"]])/2), x[["Bid"]], crass$min[y]), Ask = ifelse(crass$WT_TRADE_PRICE[y] >= min(x[["Ask"]], (x[["Ask"]] + x[["Bid"]])/2), crass$max[y], x[["Ask"]])) }, init = data.frame(Bid = crass$Bid[1], Ask = crass$Ask[1]), seq_len(nrow(crass))[-1], accumulate = TRUE))) 

Result

 WT_TRADE_PRICE min max Bid Ask 1 3801.000 3801 3801 3801 3802 2 3801.000 3801 3801 3801 3802 3 3801.000 3801 3801 3801 3802 4 3797.000 3797 3797 3797 3802 5 3797.000 3797 3797 3797 3802 6 3796.213 3795 3800 3795 3802 7 3800.000 3800 3800 3795 3800 8 3797.000 3797 3797 3797 3800 9 3795.095 3794 3797 3794 3800 10 3794.000 3794 3794 3794 3800 11 3793.000 3793 3793 3793 3800 12 3793.000 3793 3793 3793 3800 13 3793.800 3793 3794 3793 3800 14 3794.720 3794 3797 3794 3800 15 3793.028 3790 3794 3790 3800 16 3789.000 3789 3789 3789 3800 17 3790.000 3790 3790 3790 3800 18 3788.000 3788 3788 3788 3800 19 3788.000 3788 3788 3788 3800 20 3788.000 3788 3788 3788 3800 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.