2

My data comes from a database which, depending on when I run my SQL query could contain different values for POS from one week to the other.

Not knowing which values will be in a variable makes it very hard to automate the creation of a report.

My data looks as follows:

sample <- data.frame(DRUG = c("A","A","B"),POS = c("Hospital","Physician","Home"),GROSS_COST = c(50,100,60), NET_COST = c(45,80,40)) 

enter image description here

I need to pivot this data frame wider so that there's a column for each pos by cost (gross & net).

This can be easily achieve using pivot_wider:

x <- sample %>% pivot_wider(names_from = POS, values_from = c(GROSS_COST,NET_COST)) 

enter image description here

Objective I would like to be able to keep the columns for each POS together i.e. the GROSS_COST_Hospital and NET_COST_Hospital would be side by side, similar for all other POS columns.

Is there an elegant way to group columns using string matching?

4 Answers 4

3

Unfortunately, I don't think there is a direct solution to this (yet!). See https://github.com/tidyverse/tidyr/issues/839 .

For now you can get the data in long format so you can control their ordering the way you want.

library(tidyr) sample %>% pivot_longer(cols = c(GROSS_COST, NET_COST)) %>% pivot_wider(names_from = c(name, POS), values_from = value) # DRUG GROSS_COST_Hosp… NET_COST_Hospit… GROSS_COST_Phys… NET_COST_Physic… # <chr> <dbl> <dbl> <dbl> <dbl> #1 A 50 45 100 80 #2 B NA NA NA NA # … with 2 more variables: GROSS_COST_Home <dbl>, NET_COST_Home <dbl> 
Sign up to request clarification or add additional context in comments.

Comments

2

We can do an ordering on the select step

library(dplyr) library(tidyr) library(stringr) sample %>% pivot_wider(names_from = POS, values_from = c(GROSS_COST,NET_COST)) %>% select(DRUG, names(.)[-1][order(str_extract(names(.)[-1], '[^_]+$'))]) # A tibble: 2 x 7 # DRUG GROSS_COST_Home NET_COST_Home GROSS_COST_Hospital NET_COST_Hospital GROSS_COST_Physician NET_COST_Physician # <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> #1 A NA NA 50 45 100 80 #2 B 60 40 NA NA NA NA 

Comments

1

A data.table option using dcast + melt

> dcast(melt(setDT(sample), id.vars = c("DRUG", "POS")), DRUG ~ variable + POS) DRUG GROSS_COST_Home GROSS_COST_Hospital GROSS_COST_Physician NET_COST_Home 1: A NA 50 100 NA 2: B 60 NA NA 40 NET_COST_Hospital NET_COST_Physician 1: 45 80 2: NA NA 

2 Comments

There is also recast from reshape2 recast(sample, id.var = c('DRUG', 'POS'), DRUG ~ variable + POS)
@akrun Thanks a lot! Learnt a new technique from you :)
0

With the advent of tidyr 1.2.0, the issue is finally resolved, you may do this directly using names_vary argument

library(tidyr) sample <- data.frame(DRUG = c("A","A","B"),POS = c("Hospital","Physician","Home"),GROSS_COST = c(50,100,60), NET_COST = c(45,80,40)) sample %>% pivot_wider(names_from = POS, values_from = c(GROSS_COST,NET_COST), names_vary = 'slowest') #> # A tibble: 2 x 7 #> DRUG GROSS_COST_Hospital NET_COST_Hospital GROSS_COST_Physi~ NET_COST_Physic~ #> <chr> <dbl> <dbl> <dbl> <dbl> #> 1 A 50 45 100 80 #> 2 B NA NA NA NA #> # ... with 2 more variables: GROSS_COST_Home <dbl>, NET_COST_Home <dbl> 

Created on 2022-02-18 by the reprex package (v2.0.1)

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.