I created a list of dataframes. I need to loop over them, filter what I need, and save the result as a single file. However, I need to know which file each value comes from.

Each dataframe has a name like Plastic Chair 1111, Wooden Chair 3950, Table 6909, etc. They are stored in a list named "listed" with the following structure:

listed[1]
Material_ID  ABC  Key.Figure  W01   W02   W03
46548970     A    Actuals     1048  564   548
46548970     A    Forecasted  848   500   590
18969856     A    Actuals     358   1500  900
18969856     A    Forecasted  460   1602  1000

listed[2]
Material_ID  ABC  Key.Figure  W01   W02   W03
24564897     A    Actuals     1258  444   798
26548970     A    Forecasted  1345  500   850
34879856     A    Actuals     985   1020  980
15486856     A    Forecasted  846   1064  1100

What I would like to obtain is:

Group name     Group Code  Material_ID  ABC  Key.Figure  W01   W02   W03
Plastic Chair  1111        46548970     A    Actuals     1048  564   548
Plastic Chair  1111        18969856     A    Actuals     358   1500  900
Wooden Chair   3950        24564897     A    Actuals     1258  444   798
Wooden Chair   3950        34879856     A    Actuals     985   1020  980

Is it possible to create these two columns on the left from the dataframes' names?

Thank you very much for the help!

Here is my code if you need to better understand the situation.

library(openxlsx)
library(dplyr)
library(purrr)

# read the data
filename = 'Dataset.xlsx'
wb <- loadWorkbook(filename)

# get a list of the spreadsheets in the excel file
sheetNames <- sheets(wb)
sheetNames <- make_names(sheetNames)

# create an empty list
listed <- list()

# assign each spreadsheet as a dataframe inside a list
for(i in 1:length(sheetNames)) {
  listed[[i]] <- assign(sheetNames[i], readWorkbook(wb, sheet = i))
  print(paste0("read the ", i, " file")) # here it says what it's doing
}

# remove variable Sales.Org.ID
map(listed, ~ (.x %>% select(-Sales.Org.ID)))

# filter the dataframes to only show rows with Key.Figure = "Actual Totals"
list_actuals <- lapply(listed, function(x) x %>% filter(Key.Figure == "Actual Totals"), )

# put the result in a single dataframe
result_actuals = do.call(rbind, list_actuals)

  • Have you looked at purrr::map_dfr, which works for a list of named dataframes, appending the dataframe name using the .id argument? Commented Apr 1, 2021 at 14:03
  • Very difficult to provide an answer to this without a minimal reproducible example. I will note that you can now put dataframes in the columns of a bigger dataframe with list columns. If you had a column with the names of your dataframes, you can then add a column called data containing each dataframe and then mutate a function to do the filtering, and then do an rbind on the mutated column. Commented Apr 1, 2021 at 14:26
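
The first comment's suggestion can be sketched as follows. This is a minimal illustration, assuming the list carries the sheet names as its element names; the two small data frames are toy stand-ins built from the question's sample rows, and the column name group_name is a placeholder.

```r
library(dplyr)
library(purrr)

# Toy stand-ins for two sheets; names and values follow the question's sample.
listed <- list(
  "Plastic Chair 1111" = data.frame(
    Material_ID = c(46548970, 46548970),
    Key.Figure  = c("Actuals", "Forecasted"),
    W01         = c(1048, 848)
  ),
  "Wooden Chair 3950" = data.frame(
    Material_ID = c(24564897, 26548970),
    Key.Figure  = c("Actuals", "Forecasted"),
    W01         = c(1258, 1345)
  )
)

# Filter each data frame, then row-bind them; .id stores each list
# element's name in a new column called group_name.
result_actuals <- map_dfr(
  listed,
  ~ filter(.x, Key.Figure == "Actuals"),
  .id = "group_name"
)
```

The list in the question is filled by index and therefore has no names, so it would first need something like names(listed) <- sheets(wb) for .id to pick up the sheet names.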

1 Answer

I think simplifying the code a little would help. For example, don't change the sheet names with make_names first and then iterate over sheet numbers to import. Instead, use the unaltered sheet names until after importing the data, and change the names later if desired. Also, instead of lapply followed by rbind, try map_df. It is not quite as specialized as the purrr::map_dfr suggested in the comments, but it is a bit easier to see what is happening. In the example code below I used a mutate inside the map_df to insert the sheet name into each data frame before map_df combines them.

library(openxlsx)
library(dplyr)
library(purrr)

# read the data
filename = 'Dataset.xlsx'
wb <- loadWorkbook(filename)

wb %>%
  sheets() %>%
  # read all of the sheets, put the sheet name in a new column
  map_df(~ readWorkbook(wb, sheet = .x) %>% mutate(group_name = .x)) %>%
  # remove variable Sales.Org.ID
  select(-Sales.Org.ID) %>%
  # filter the dataframes to only show rows with Key.Figure = "Actual Totals"
  filter(Key.Figure == "Actual Totals") %>%
  # if you still want to change the names taken from the sheet names
  # (make_names() as used in the question; base R's make.names() is a similar option)
  mutate(group_name = make_names(group_name))

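
The question asks for two columns, a group name and a group code, while the code above stores the whole sheet name in one column. A minimal sketch of splitting it, assuming every sheet name ends in a space followed by a numeric code (as in "Plastic Chair 1111"); the column name group_code and the toy data frame are illustrative, not part of the original answer:

```r
library(dplyr)

# Toy combined result with the full sheet name in group_name
# (rows taken from the question's sample data).
combined <- data.frame(
  group_name  = c("Plastic Chair 1111", "Wooden Chair 3950"),
  Material_ID = c(46548970, 24564897),
  Key.Figure  = c("Actuals", "Actuals")
)

split_names <- combined %>%
  mutate(
    # trailing digits become the code (computed first, from the full name)
    group_code = sub("^.*\\s+(\\d+)$", "\\1", group_name),
    # everything before the trailing code becomes the name
    group_name = sub("\\s+\\d+$", "", group_name)
  ) %>%
  # move the two new columns to the left
  relocate(group_name, group_code)
```

This split should run before any name cleaning, since a function like make.names() replaces the spaces the pattern relies on.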

5 Comments

Thanks a lot for all the tips and help. I tried to use your code, but I get an error: <error/dplyr:::mutate_error> Problem with mutate() input group_name. x object '.x' not found i Input group_name is .x.
looks like there was an extra paren after readWorkbook(...). I edited the answer to fix it
I had already seen the extra parenthesis and corrected it. That's not what is causing my error...
It's hard to confirm without the actual data on hand (or code that can create a reproducible example of the error), but I do have a few more suggestions for things to check based on what I can see or guess. The ~ could be missing from the beginning of the code inside the map() step. The example data shown doesn't have a column called "Sales.Org.ID". Lastly, the make_names() function isn't defined here; perhaps it should be make.names() instead?
You were right! There was a missing "s" in the name of the function make_name. Thank you so much for all the help! I had reached the result as well, but your solution is much cleaner and better executed, which helps me learn as well. Thank you also for the time spent figuring out the problems...
