Goal: Summarize/count the responses that follow each stimulus into the same row as that stimulus, using dplyr.
Background: I got some excellent help in another topic: Loop through dataframe in R and measure time difference between two values
Now I am working with the same (or a similar) dataset, and my goal is to count users' responses to perceived stimuli in the same row as the stimulus that occurred. The dataset looks like this:
```r
structure(list(
  User = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
  StimuliA = c(1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  StimuliB = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L),
  R2 = c(0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 0L),
  R3 = c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  R4 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  R5 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  R6 = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L),
  R7 = c(0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)),
  .Names = c("User", "StimuliA", "StimuliB", "R2", "R3", "R4", "R5", "R6", "R7"),
  class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L),
  spec = structure(list(
    cols = structure(list(
      User = structure(list(), class = c("collector_integer", "collector")),
      StimuliA = structure(list(), class = c("collector_integer", "collector")),
      StimuliB = structure(list(), class = c("collector_integer", "collector")),
      R2 = structure(list(), class = c("collector_integer", "collector")),
      R3 = structure(list(), class = c("collector_integer", "collector")),
      R4 = structure(list(), class = c("collector_integer", "collector")),
      R5 = structure(list(), class = c("collector_integer", "collector")),
      R6 = structure(list(), class = c("collector_integer", "collector")),
      R7 = structure(list(), class = c("collector_integer", "collector"))),
      .Names = c("User", "StimuliA", "StimuliB", "R2", "R3", "R4", "R5", "R6", "R7")),
    default = structure(list(), class = c("collector_guess", "collector"))),
    .Names = c("cols", "default"), class = "col_spec"))
```

Desired output: a summarized table in which all responses are aggregated into the row of the stimulus that preceded them:
```
User StimuliA StimuliB R2 R3 R4 R5 R6 R7
   1        1        0  0  0  0  0  0  1
   1        1        0  1  1  0  0  1  0
   1        0        1  1  2  0  0  1  0
   1        0        1  0  0  0  0  0  0
   2        1        0  3  0  0  0  0  0
   2        0        1  1  0  0  0  2  0
```

In the sample, row 1 records stimulus A and row 2 a 1 for R7, so the first result row has a 1 for StimuliA and a 1 for R7. Then a new summary row starts, because row 3 has a new 1 for StimuliA.
In the end, every stimulus gets one summary row that totals the responses (R2 to R7) occurring after it, until the next stimulus or the end of that user's rows. The stimulus value (StimuliA or StimuliB) stays 1.
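A minimal sketch of the grouping rule described above (an illustration, not the asker's code): within each user, a running count of the rows where either stimulus fires gives every row the number of the stimulus "episode" it belongs to. The helper column name `episode` is hypothetical and not part of the original data; only the stimulus columns from the sample are used here.

```r
library(dplyr)

# Stimulus columns from the question's sample data (responses omitted)
stim <- tibble(
  User     = c(rep(1L, 12), rep(2L, 8)),
  StimuliA = c(1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  StimuliB = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L)
)

tagged <- stim %>%
  group_by(User) %>%
  # hypothetical helper column: increases by one on every row where
  # StimuliA or StimuliB is 1, so a stimulus and the response rows after it
  # share one number; the count restarts for each user
  mutate(episode = cumsum(StimuliA == 1 | StimuliB == 1)) %>%
  ungroup()

print(tagged, n = 20)
```

Rows that come before a user's first stimulus would get `episode` 0 and form a group of their own; the sample has no such rows.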
Question: I feel I can achieve this with the dplyr package, but my previous attempts have not produced much useful output. How would I structure the dplyr pipeline, or should I look for a solution in another direction? Would I mutate the existing data frame or create a new one?
Thanks for all the inputs and help!
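One possible dplyr pipeline, sketched under the assumption of dplyr >= 1.0.0 (needed for `across()` and the `.groups` argument); the helper column `episode` is a hypothetical name. It also speaks to the mutate-or-new-data-frame question: `mutate()` only adds a column to the existing rows, while `summarise()` returns a new, collapsed data frame.

```r
library(dplyr)

# Sample data from the question (the readr `spec` attribute is left out)
dat <- tibble(
  User     = c(rep(1L, 12), rep(2L, 8)),
  StimuliA = c(1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  StimuliB = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L),
  R2 = c(0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 0L),
  R3 = c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  R4 = rep(0L, 20),
  R5 = rep(0L, 20),
  R6 = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L),
  R7 = c(0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)
)

result <- dat %>%
  group_by(User) %>%
  # mutate() keeps all 20 rows and adds the hypothetical `episode` column,
  # which increases whenever StimuliA or StimuliB is 1
  mutate(episode = cumsum(StimuliA == 1 | StimuliB == 1)) %>%
  group_by(User, episode) %>%
  # summarise() builds a new data frame with one row per (User, episode)
  summarise(across(StimuliA:R7, sum), .groups = "drop") %>%
  select(-episode)

print(result)
```

A single episode counter stands in for separate running counts per stimulus column; the stimulus columns are summed as well, so each summary row keeps the 1 of the stimulus that opened it.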
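```r
aggregate(. ~ User + StimuliA + StimuliB, data = dat, sum)
```

In dplyr syntax, maybe:

```r
dat %>% group_by(User, StimuliA, StimuliB) %>% summarize_all(sum)
```

Both of these group by the stimulus values rather than by each stimulus occurrence, so for each user all StimuliA rows are pooled together and all responses land in the single group where both stimuli are 0. Numbering the occurrences with cumsum() keeps them apart:

```r
library(dplyr)

df %>%
  group_by(User) %>%
  # running counts: every new stimulus starts a new block of rows
  mutate(Sta = cumsum(StimuliA), Stb = cumsum(StimuliB)) %>%
  group_by(User, Sta, Stb) %>%
  summarise(StA = sum(StimuliA), StB = sum(StimuliB),
            R2 = sum(R2), R3 = sum(R3), R4 = sum(R4),
            R5 = sum(R5), R6 = sum(R6), R7 = sum(R7)) %>%
  # drop the leftover grouping; otherwise select() re-adds Sta
  ungroup() %>%
  select(-Sta, -Stb)
```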