I have a data set like the one below:

```r
library(tibble)

df <- tribble(
  ~id, ~price, ~type, ~number_of_book,
  "1", 10, "X", 3,
  "1",  2, "X", 1,
  "1",  5, "Y", 1,
  "2",  7, "X", 4,
  "2",  6, "X", 1,
  "2",  6, "Y", 2,
  "3",  2, "X", 4,
  "3",  8, "X", 2,
  "3",  1, "Y", 4,
  "3",  9, "Y", 5
)
```

Now I want to answer this question: for each id and each price group, what percentage of the books is of type X and what percentage is of type Y? In other words, what is the distribution of book types within each id and price group?
To do this, I think I first need an intermediate data set with the number of books per type, id and price group, like this:
```r
agg_df <- tribble(
  ~type, ~id, ~less_than_two, ~three_five, ~five_six, ~more_than_six,
  "X", "1", 1, 0, 0, 3,
  "Y", "1", 0, 1, 0, 0,
  "X", "2", 0, 0, 1, 4,
  "Y", "2", 0, 0, 2, 0,
  "X", "3", 4, 0, 0, 2,
  "Y", "3", 4, 0, 0, 5
)
```

And then, this is my desired data set:

```r
desired_df <- tribble(
  ~type, ~id, ~less_than_two, ~three_five, ~five_six, ~more_than_six,
  "X", "1", "100%", "0%",   "0%",    "100%",
  "Y", "1", "0%",   "100%", "0%",    "0%",
  "X", "2", "0%",   "0%",   "33.3%", "100%",
  "Y", "2", "0%",   "0%",   "66.7%", "0%",
  "X", "3", "50%",  "0%",   "0%",    "28.6%",
  "Y", "3", "50%",  "0%",   "0%",    "71.4%"
)
```

For example, when id is "3" and the price is more than six dollars, there are two books of type X and five books of type Y, so the distribution is X = 2/7 (28.6%) and Y = 5/7 (71.4%).
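For comparison, the two steps described above (aggregate the counts, then normalize within each id and price group) can be sketched in base R without dplyr. This is an alternative technique, not the accepted answer; it assumes right-closed bins at 2, 5 and 6 (so a price of exactly 2 counts as less_than_two) and reuses the bin labels from the question. `cut()`, `xtabs()`, `prop.table()` and `ftable()` are all base R.

```r
# Toy data from the question, rebuilt as a base-R data frame.
df <- data.frame(
  id = c("1", "1", "1", "2", "2", "2", "3", "3", "3", "3"),
  price = c(10, 2, 5, 7, 6, 6, 2, 8, 1, 9),
  type = c("X", "X", "Y", "X", "X", "Y", "X", "X", "Y", "Y"),
  number_of_book = c(3, 1, 1, 4, 1, 2, 4, 2, 4, 5)
)

# Right-closed price bins: (-Inf, 2], (2, 5], (5, 6], (6, Inf].
df$price_group <- cut(
  df$price,
  breaks = c(-Inf, 2, 5, 6, Inf),
  labels = c("less_than_two", "three_five", "five_six", "more_than_six")
)

# Step 1 (like agg_df): total books per type x id x price_group.
# Combinations with no books are filled with 0.
counts <- xtabs(number_of_book ~ type + id + price_group, data = df)

# Step 2 (like desired_df): divide each cell by the total over both types
# within the same id (dimension 2) and price_group (dimension 3).
# A group with no books at all gives 0/0 = NaN.
shares <- round(100 * prop.table(counts, margin = c(2, 3)), 1)

# Print one row per (id, type) and one column per price group.
ftable(shares, row.vars = c("id", "type"))
```

For id 3 in the more_than_six group this yields 28.6 for X and 71.4 for Y. Price groups where an id has no books at all show up as `NaN` rather than 0%, which keeps "no data" distinguishable from "0% of the books".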
Note: I asked a similar question before (How to manipulate (aggregate) the data in R?), but this one needs a more complex manipulation that I have not managed to get working.

I would appreciate it if you could help me. Thanks in advance.
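```r
library(dplyr)
library(tidyr)

df %>%
  # Bin price into (-Inf, 2], (2, 5], (5, 6], (6, Inf)
  mutate(price_group = c("less_than_two", "three_five", "five_six", "more_than_six")[
    findInterval(price, c(2, 5, 6), left.open = TRUE) + 1]) %>%
  # Total books per id, type and price group
  group_by(id, type, price_group) %>%
  summarise(number_of_book = sum(number_of_book)) %>%
  # Percentage of each type within an id/price group
  group_by(id, price_group) %>%
  mutate(n = number_of_book / sum(number_of_book) * 100) %>%
  ungroup() %>%
  select(-number_of_book) %>%
  # One column per price group; missing type/group combinations become NA
  pivot_wider(names_from = price_group, values_from = n)
```

(`findInterval` was already mentioned in the comments.)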
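The `left.open = TRUE` argument is what sends a price of exactly 2 to less_than_two (and exactly 5 to three_five, exactly 6 to five_six). A small base-R check on a few illustrative boundary prices (the `prices` vector is made up for this demo) shows the difference and the `+ 1` label lookup used above:

```r
# Breakpoints used in the answer and some illustrative prices on/around them.
breaks <- c(2, 5, 6)
prices <- c(1, 2, 5, 6, 7)

# Default: intervals are closed on the left, i.e. [2, 5), [5, 6), [6, Inf).
findInterval(prices, breaks)
# [1] 0 1 2 3 3

# left.open = TRUE: intervals are closed on the right, i.e. (2, 5], (5, 6], (6, Inf).
findInterval(prices, breaks, left.open = TRUE)
# [1] 0 0 1 2 3

# findInterval() returns 0..3, so adding 1 turns it into a 1-based index into the labels.
price_labels <- c("less_than_two", "three_five", "five_six", "more_than_six")
price_labels[findInterval(prices, breaks, left.open = TRUE) + 1]
# [1] "less_than_two" "less_than_two" "three_five"    "five_six"      "more_than_six"
```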