Why is R dplyr::mutate inconsistent with custom functions

Question

This question is a "why", not a how. In the following code I'm trying to understand why dplyr::mutate evaluates one custom function (f()) with the entire vector but not with the other custom function (g()). What exactly is mutate doing?

    library(dplyr)

    # Reference value: the three means are recycled across the 100 draws
    set.seed(1); sum(rnorm(100, c(0, 10, 100)))

    f <- function(m) {
      set.seed(1)
      sum(rnorm(100, mean = m))
    }
    g <- function(m) sin(m)

    df <- data.frame(a = c(0, 10, 100))
    y1 <- mutate(df, asq = a^2, fout = f(a), gout = g(a))
    y2 <- rowwise(df) %>% mutate(asq = a^2, fout = f(a), gout = g(a))
    y3 <- group_by(df, a) %>% summarize(asq = a^2, fout = f(a), gout = g(a))

For all three columns, asq, fout, and gout, evaluation is rowwise in y2 and y3 and the results are identical. However, y1$fout is 3640.889 for all three rows, which is the result of evaluating sum(rnorm(100, c(0, 10, 100))). So the function f() is evaluating the entire vector for each row.

A closely related question has been asked elsewhere, "mutate/transform in R dplyr (Pass custom function)", but the "why" was not explained there.

MrFlick · Accepted Answer · 2019-12-03 20:09:08Z

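sin and ^ are vectorized: given a vector, they return a vector of the same length, operating on each value individually. f is not vectorized. It passes the whole vector to rnorm and then sums the result, so it returns a single number for the entire column, and mutate repeats that number in every row. But you can do f = Vectorize(f) and it will operate on each individual value as well.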

    y1 <- mutate(df, asq=a^2, fout=f(a), gout=g(a))
    y1

        a   asq     fout       gout
    1   0     0 3640.889  0.0000000
    2  10   100 3640.889 -0.5440211
    3 100 10000 3640.889 -0.5063656

    f = Vectorize(f)
    y1a <- mutate(df, asq=a^2, fout=f(a), gout=g(a))
    y1a

        a   asq        fout       gout
    1   0     0    10.88874  0.0000000
    2  10   100  1010.88874 -0.5440211
    3 100 10000 10010.88874 -0.5063656
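The difference can also be checked without dplyr. The following is a minimal base R sketch, assuming the same f and a as in the question; f_vec is only an illustrative name for the wrapped function. It compares how many values each function returns for a three-element input.

```r
a <- c(0, 10, 100)

f <- function(m) {
  set.seed(1)
  sum(rnorm(100, mean = m))  # rnorm recycles m over 100 draws; sum() collapses them
}

length(sin(a))  # 3: one result per element
length(f(a))    # 1: a single number for the whole vector

# Vectorize() wraps f so that it is called once per element of its argument
f_vec <- Vectorize(f)
length(f_vec(a))                  # 3
all.equal(f_vec(a), sapply(a, f)) # TRUE: same as an explicit element-wise loop
```

Because f(a) has length 1, mutate recycles that one value to fill every row, which matches the repeated fout in y1.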

Some additional info on vectorization here, here, and here.

This is a great answer, thank you. The links on vectorization are on point. My mental model was that mutate was implicitly doing a loop, but if I understand correctly it's not, it's passing a vector. This makes sense and explains the differing results in my example.
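One way to test that mental model is to pass mutate a function that records how long its input is. This is a rough sketch, assuming dplyr is installed; probe and seen are hypothetical names introduced here for illustration and are not part of dplyr.

```r
library(dplyr)

df <- data.frame(a = c(0, 10, 100))

# Hypothetical helper: remembers the length of every input it receives
seen <- integer(0)
probe <- function(x) {
  seen <<- c(seen, length(x))
  x
}

# Plain mutate: the function receives the whole column
invisible(mutate(df, b = probe(a)))
seen_plain <- seen

# rowwise mutate: the function receives one value at a time
seen <- integer(0)
invisible(mutate(rowwise(df), b = probe(a)))
seen_rowwise <- seen

seen_plain    # lengths of the inputs seen without rowwise()
seen_rowwise  # lengths of the inputs seen with rowwise()
```

If the model is right, every entry of seen_plain should be 3 and every entry of seen_rowwise should be 1.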

akrun · Answer · 2018-04-22 15:52:07Z

We can loop through each element of 'a' using map_dbl from purrr and apply the function f to each one.

    library(tidyverse)

    df %>%
      mutate(asq = a^2, fout = map_dbl(a, f), gout = g(a))
    #    a   asq        fout       gout
    #1   0     0    10.88874  0.0000000
    #2  10   100  1010.88874 -0.5440211
    #3 100 10000 10010.88874 -0.5063656
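For comparison, map_dbl(a, f) behaves much like base R's vapply(a, f, numeric(1)): both call f once per element and require each call to return a single number. A small sketch, assuming purrr is installed and reusing the unvectorized f from the question; via_map and via_vapply are illustrative names.

```r
library(purrr)

a <- c(0, 10, 100)

f <- function(m) {
  set.seed(1)
  sum(rnorm(100, mean = m))
}

via_map    <- map_dbl(a, f)             # purrr: one call to f per element
via_vapply <- vapply(a, f, numeric(1))  # base R equivalent of the same loop

all.equal(via_map, via_vapply)  # TRUE
```

The type-checked variants are a reasonable default here, since a function that accidentally returns more than one value per element raises an error instead of silently producing an odd column.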

Thanks. This solution follows Jim Hester's recommendation, the third link in the answer I accepted.
