Big picture: I want my user-defined function to iterate through a list (or vector) of arguments, like a loop. (In this case, each argument is a character string.)

get_avg2 <- function(v_name) {
  avg <- "_Average"
  data_1 <- PFF_College_Defense_data %>%
    dplyr::group_by(Name) %>%
    dplyr::summarise("{{ v_name }}_{avg}" := mean({{ v_name }}, na.rm = TRUE))
  PFF_NCAA_Average_grades <- merge(PFF_NCAA_Average_grades, data_1, by = "Name")
  return(PFF_NCAA_Average_grades)
}

v_names <- list("hits", "tackles", "forced_fumbles")

for (i in v_names) {
  get_avg2(i)
} # didn't work

PFF_NCAA_Average_grades <- purrr::map_df(v_names, get_avg2) # didn't work

I am trying to compute averages by group from a dataframe and store them in another dataframe. I have written a UDF that accepts one argument, the name of a variable in the original dataframe. The UDF runs the calculation and merges the result into a dataframe that I pre-formatted to fit the results. I want to pass a list to my function and have it iterate over that list like a loop, but I just can't seem to grasp how to do this, or how to use purrr::map, which I thought would do the trick.

I know I can do this:

PFF_NCAA_Average_grades <- get_avg2(hits)
PFF_NCAA_Average_grades <- get_avg2(tackles)
PFF_NCAA_Average_grades <- get_avg2(forced_fumbles)

But that seems ugly and slow. Can someone please help me conceptually understand the best way to do this?

Thanks in advance!!!

*** UPDATED WITH REPREX ******

library(tidyverse)

data_sample <- data.frame(
  Name = c("Dalton Campbell", "Dalton Campbell", "Dalton Campbell",
           "Andre Walker", "Andre Walker", "Andre Walker"),
  Defense_Grade = c(88, 86, 92, 94, 97, 95),
  Tackle_Grade = c(66, 69, 72, 74, 76, 78),
  Coverage_Grade = c(44, 43, 44, 76, 73, 78)
)

# Here I set up the dataframe which the function will bind to
data_sample_averages <- data_sample %>%
  group_by(Name) %>%
  dplyr::summarise(Defense_Grade_Average = mean(Defense_Grade))
#> `summarise()` ungrouping output (override with `.groups` argument)

# Function which computes the average of a variable (the only argument)
# and merges it back to data_sample_averages
get_avg2 <- function(v_name) {
  avg <- "_Average"
  data_1 <- data_sample %>%
    dplyr::group_by(Name) %>%
    dplyr::summarise("{{ v_name }}_{avg}" := mean({{ v_name }}, na.rm = TRUE))
  data_sample_averages <- merge(data_sample_averages, data_1, by = "Name")
  return(data_sample_averages)
}

# This works - it computes the average of Tackle_Grade and binds it to data_sample_averages
data_sample_averages <- get_avg2(Tackle_Grade)
#> `summarise()` ungrouping output (override with `.groups` argument)

# Shows you the averages
print(data_sample_averages)
#>              Name Defense_Grade_Average Tackle_Grade__Average
#> 1    Andre Walker              95.33333                    76
#> 2 Dalton Campbell              88.66667                    69

# Neither of these work - this is where I'm stuck
variable_list <- list("Defense_Grade", "Tackle_Grade", "Coverage Grade")

data_sample_averages <- lapply(variable_list, get_avg2)
#> Warning in mean.default(~"Defense_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(~"Defense_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> `summarise()` ungrouping output (override with `.groups` argument)
#> Warning in mean.default(~"Tackle_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(~"Tackle_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> `summarise()` ungrouping output (override with `.groups` argument)
#> Warning in mean.default(~"Coverage Grade", na.rm = TRUE): argument is not
#> numeric or logical: returning NA
#> Warning in mean.default(~"Coverage Grade", na.rm = TRUE): argument is not
#> numeric or logical: returning NA
#> `summarise()` ungrouping output (override with `.groups` argument)

data_sample_averages <- purrr::map(variable_list, get_avg2)
#> Warning in mean.default(~"Defense_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(~"Defense_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> `summarise()` ungrouping output (override with `.groups` argument)
#> Warning in mean.default(~"Tackle_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(~"Tackle_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> `summarise()` ungrouping output (override with `.groups` argument)
#> Warning in mean.default(~"Coverage Grade", na.rm = TRUE): argument is not
#> numeric or logical: returning NA
#> Warning in mean.default(~"Coverage Grade", na.rm = TRUE): argument is not
#> numeric or logical: returning NA
#> `summarise()` ungrouping output (override with `.groups` argument)

This feels like a really simple operation (compute the mean by group from one dataframe and bind it to another dataframe), and that is not really the part I'm struggling with. What I want is for my function to iterate through a series of arguments automatically. I want to be able to quickly build a list (or vector - I'm not set on using lists) of variables and pass it to the function as the argument, so that it builds a dataframe with the variables I feed it. But I'm open to the idea that I have something conceptually wrong and that I should be using a loop, purrr::map, etc., or that I should change the way my function is written.

  • Did you try unlist? Commented Dec 16, 2020 at 22:52
  • How do you mean? Where should I try it? Commented Dec 16, 2020 at 22:55
  • Your function seems to be a rather convoluted way of doing in the tidyverse what the base function ave does. Commented Dec 16, 2020 at 23:02
  • I've never used that function, but it looks like it is designed to work with factors, whereas my grouping category is a character. Will it work if I just convert my characters to factors and run it? Commented Dec 16, 2020 at 23:11
  • @Spence_p yes. It should work with characters directly though Commented Dec 16, 2020 at 23:44

1 Answer

The difference between your standalone example and the call where you pass a list is that in the standalone example you pass an unquoted variable (get_avg2(Tackle_Grade)), whereas with a vector/list you pass quoted strings (variable_list <- list("Defense_Grade", "Tackle_Grade", "Coverage Grade")).
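A minimal sketch of that difference, using a toy data frame (`toy`) and two helper functions (`avg_embrace`, `avg_string`) that are made up for illustration and are not part of the question. Embracing with `{{ }}` expects a bare column name, while the `.data` pronoun looks a column up by a string:

```r
library(dplyr)

# Toy data, assumed for illustration only
toy <- data.frame(g = c("a", "a", "b"), x = c(1, 3, 5))

# Embracing with {{ }} expects a bare (unquoted) column name
avg_embrace <- function(col) {
  toy %>%
    group_by(g) %>%
    summarise(avg = mean({{ col }}, na.rm = TRUE))
}

# The .data pronoun looks a column up by its name given as a string
avg_string <- function(col) {
  toy %>%
    group_by(g) %>%
    summarise(avg = mean(.data[[col]], na.rm = TRUE))
}

res_bare   <- avg_embrace(x)    # bare name: per-group means
res_string <- avg_string("x")   # string: per-group means

# Passing a string to the embracing version calls mean() on the
# string itself, which warns and returns NA (as in the question)
res_wrong <- suppressWarnings(avg_embrace("x"))
```

Since lapply and purrr::map hand each list element to the function as a value, the string-based version is the one that fits naturally into iteration.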

It is not easy to pass unquoted variables in a list, so it is better to change the function so that it accepts quoted variables (strings). For that, change the function to:

library(dplyr)

get_avg2 <- function(v_name) {
  avg <- "_Average"
  data_1 <- data_sample %>%
    dplyr::group_by(Name) %>%
    dplyr::summarise(!!paste0(v_name, avg) := mean(.data[[v_name]], na.rm = TRUE))
  data_sample_averages <- merge(data_sample_averages, data_1, by = "Name")
  return(data_sample_averages)
}

For a single value you call it as:

get_avg2("Tackle_Grade")
#             Name Defense_Grade_Average Tackle_Grade_Average
#1    Andre Walker              95.33333                   76
#2 Dalton Campbell              88.66667                   69

For a list/vector of values you can then use lapply:

variable_list <- c("Defense_Grade", "Tackle_Grade", "Coverage_Grade")
lapply(variable_list, get_avg2)
#[[1]]
#             Name Defense_Grade_Average.x Defense_Grade_Average.y
#1    Andre Walker                95.33333                95.33333
#2 Dalton Campbell                88.66667                88.66667

#[[2]]
#             Name Defense_Grade_Average Tackle_Grade_Average
#1    Andre Walker              95.33333                   76
#2 Dalton Campbell              88.66667                   69

#[[3]]
#             Name Defense_Grade_Average Coverage_Grade_Average
#1    Andre Walker              95.33333               75.66667
#2 Dalton Campbell              88.66667               43.66667
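Note that lapply returns a list with one data frame per variable rather than a single combined data frame. If one data frame is the goal, a possible sketch is to have the function return only the per-group summary and then join the pieces afterwards. The helper name `avg_one` is hypothetical, and the use of purrr::reduce with left_join is an assumption about what fits here, not something taken from the code above:

```r
library(dplyr)
library(purrr)

data_sample <- data.frame(
  Name = c("Dalton Campbell", "Dalton Campbell", "Dalton Campbell",
           "Andre Walker", "Andre Walker", "Andre Walker"),
  Defense_Grade = c(88, 86, 92, 94, 97, 95),
  Tackle_Grade = c(66, 69, 72, 74, 76, 78),
  Coverage_Grade = c(44, 43, 44, 76, 73, 78)
)

# Hypothetical helper: returns only the per-group average of one column,
# without merging into any outside data frame
avg_one <- function(v_name) {
  data_sample %>%
    group_by(Name) %>%
    summarise(!!paste0(v_name, "_Average") := mean(.data[[v_name]], na.rm = TRUE))
}

variable_list <- c("Defense_Grade", "Tackle_Grade", "Coverage_Grade")

# Compute one summary per variable, then join them pairwise on Name
all_averages <- variable_list %>%
  map(avg_one) %>%
  reduce(function(a, b) left_join(a, b, by = "Name"))
```

Keeping the merge out of the function also avoids relying on a data frame defined outside of it, so the function does the same thing no matter how many times it is called.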

However, ideally you would not pass such variables one by one using lapply/map but use across instead:

data_sample %>%
  group_by(Name) %>%
  summarise(across(ends_with('Grade'), mean, na.rm = TRUE))
#  Name            Defense_Grade Tackle_Grade Coverage_Grade
#  <chr>                   <dbl>        <dbl>          <dbl>
#1 Andre Walker             95.3           76           75.7
#2 Dalton Campbell          88.7           69           43.7

But maybe you are building this function for something else.
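If the columns do not share a naming pattern, one option is to pass a character vector of names through all_of() and let the `.names` argument add the suffix. This is a sketch assuming a reasonably recent dplyr (the `{.col}` placeholder and the lambda style are what I would expect to work there); the vector `cols` is just an example selection:

```r
library(dplyr)

data_sample <- data.frame(
  Name = c("Dalton Campbell", "Dalton Campbell", "Dalton Campbell",
           "Andre Walker", "Andre Walker", "Andre Walker"),
  Defense_Grade = c(88, 86, 92, 94, 97, 95),
  Tackle_Grade = c(66, 69, 72, 74, 76, 78),
  Coverage_Grade = c(44, 43, 44, 76, 73, 78)
)

# Example selection: any character vector of column names
cols <- c("Tackle_Grade", "Coverage_Grade")

averages <- data_sample %>%
  group_by(Name) %>%
  summarise(across(all_of(cols),
                   ~ mean(.x, na.rm = TRUE),
                   .names = "{.col}_Average"))
```

This produces one row per Name with columns Tackle_Grade_Average and Coverage_Grade_Average, so no separate merge step is needed.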

7 Comments

Wow, this is a very comprehensive answer. I'm definitely struggling conceptually with NSE. Thank you so much!!!!! I think the last solution using across may work, but there are only about 6 "Grade" variables and I have another 50+ columns, all with unique names, so I ruled out using the _with style helpers. What is another way I could use across() to solve this?
You can select columns by a pattern or regex (e.g. ends_with('Grade')), or you can pass column numbers, i.e. cols <- c(2:5, 8, 12:15).
I'm sorry to bug you, I'm trying to understand: so I would assign cols before this sequence and then use it where 'Grade' is right now?
Yes, cols is assigned beforehand and then used in across, like data_sample %>% group_by(Name) %>% summarise(across(cols, mean, na.rm = TRUE))
Wow, works! Thanks so much!! The only issue I had was that when assigning 'cols' it wanted zero indexing. In this same example, when I tried to set cols <- c(2:4) it told me the 4th column doesn't exist. I had to use 1:3, which implies zero indexing? I thought R always used 1-based indexing?
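On the indexing question: R does use 1-based indexing. A likely explanation, as far as I understand across(), is that it never selects grouping columns, so after group_by(Name) the positions are counted among the remaining columns only. The sketch below illustrates this under that assumption and shows selection by name with all_of(), which avoids the question altogether:

```r
library(dplyr)

data_sample <- data.frame(
  Name = c("Dalton Campbell", "Dalton Campbell", "Dalton Campbell",
           "Andre Walker", "Andre Walker", "Andre Walker"),
  Defense_Grade = c(88, 86, 92, 94, 97, 95),
  Tackle_Grade = c(66, 69, 72, 74, 76, 78),
  Coverage_Grade = c(44, 43, 44, 76, 73, 78)
)

# Positions 1:3 are counted after the grouping column Name is set aside
by_position <- data_sample %>%
  group_by(Name) %>%
  summarise(across(1:3, ~ mean(.x, na.rm = TRUE)))

# Selecting by name does not depend on where the grouping column sits
cols <- c("Defense_Grade", "Tackle_Grade", "Coverage_Grade")
by_name <- data_sample %>%
  group_by(Name) %>%
  summarise(across(all_of(cols), ~ mean(.x, na.rm = TRUE)))
```

Selecting by name is also more robust if columns are later reordered or added.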