As the title says, I'd like to know how to define a vectorized function in R.
- Is it just by using a loop in the function?
- Is this method efficient?
- And what's the best practice?
A loop at the R level is not vectorized. An R loop will be calling the same R code for each element of a vector, which will be inefficient. Vectorized functions usually refer to those that take a vector and operate on the entire vector in an efficient way. Ultimately this will involve some form of loop, but as that loop is being performed in a low-level language such as C it can be highly efficient and tailored to the particular task.
Consider this silly function to add pairwise the elements of two vectors:

    sillyplus <- function(x, y) {
        out <- numeric(length = length(x))
        for(i in seq_along(x)) {
            out[i] <- x[i] + y[i]
        }
        out
    }

It gives the right result

    R> sillyplus(1:10, 1:10)
     [1]  2  4  6  8 10 12 14 16 18 20

and is vectorised in the sense that it can operate on entire vectors at once, but it is not vectorised in the sense I describe above because it is exceptionally inefficient. + is vectorised at the C level in R so we really only need 1:10 + 1:10, not an explicit loop in R.
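To see the difference in practice, here is a minimal timing sketch. The vector length n is an arbitrary choice, and the timings depend on the machine and the R version (recent versions byte-compile loops, which narrows the gap), so no particular numbers are claimed here.

```r
# The loop-based function from above, repeated so the snippet is self-contained
sillyplus <- function(x, y) {
  out <- numeric(length = length(x))
  for (i in seq_along(x)) {
    out[i] <- x[i] + y[i]
  }
  out
}

n <- 1e6          # arbitrary size, chosen only for illustration
x <- runif(n)
y <- runif(n)

# Time the R-level loop and the built-in vectorised operator
t_loop <- system.time(res_loop <- sillyplus(x, y))
t_vec  <- system.time(res_vec  <- x + y)

identical(res_loop, res_vec)   # both approaches compute the same values
t_loop["elapsed"]              # elapsed seconds for the R loop
t_vec["elapsed"]               # elapsed seconds for x + y
```

Comparing the two elapsed times on your own machine shows how much the R-level loop costs relative to the C-level one.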
The usual way to write a vectorised function is to use existing R functions that are already vectorised. If you want to start from scratch and the thing you want to do with the function doesn't exist as a vectorised function in R (odd, but possible) then you will need to get your hands dirty and write the guts of the function in C and prepare a little wrapper in R to call the C function you wrote with the vector of data you want it to work on. There are ways with functions like Vectorize() to fake vectorisation for R functions that are not vectorised.
C is not the only option here: FORTRAN is a possibility, as is C++, and thanks to Dirk Eddelbuettel & Romain Francois, the latter is much easier to do now with the Rcpp package.
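As a rough sketch of the Rcpp route, the pairwise addition could be written as below. This assumes the Rcpp package and a working C++ compiler are installed; the function name cplus is hypothetical and only used for illustration.

```r
# Only run if Rcpp is available (assumption: a C++ toolchain is installed too)
if (requireNamespace("Rcpp", quietly = TRUE)) {
  # cppFunction() compiles the C++ source and creates an R function 'cplus'
  Rcpp::cppFunction('
    NumericVector cplus(NumericVector x, NumericVector y) {
      int n = x.size();
      if (y.size() != n) stop("x and y must have the same length");
      NumericVector out(n);
      // The loop runs in compiled code, not in the R interpreter
      for (int i = 0; i < n; ++i) {
        out[i] = x[i] + y[i];
      }
      return out;
    }
  ')

  cplus(c(1, 2, 3), c(10, 20, 30))
}
```

For something as simple as addition the built-in + is of course the better choice; the point is only to show the shape of a compiled loop behind a thin R wrapper.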
Vectorize is creating a function that is a wrapper around the mapply function. This allows you to call the scalar R function on vector arguments, but it is going via a call to mapply() which calls the scalar function on the 1st elements of the arguments, then the 2nd elements of the arguments and so on. This is going to be slower than having a native vectorised function as there is more R code involved in calling the function repeatedly rather than the function calling out to a loop over arguments at the C level.

A vectorized function will return a vector of the same length as one of its arguments. Generally one can get such a function by using combinations of built-in functions like "+", cos or exp that are vectorized as well.
    vecexpcos <- function(x) exp(cos(x))
    > vecexpcos( (1:10)*pi )
    # [1] 0.3678794 2.7182818 0.3678794 2.7182818 0.3678794 2.7182818 0.3678794 2.7182818 0.3678794 2.7182818

If you need to use a non-vectorized function like sum, you may need to invoke mapply or Vectorize in order to get the desired behavior.
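A small sketch of why sum needs this treatment: called on two vectors it collapses everything into a single total, whereas mapply applies it pair by pair. The vectors x and y here are made up for illustration.

```r
x <- 1:3
y <- 4:6

# sum() adds up all of its arguments and returns one number
sum(x, y)            # 21

# mapply() calls sum() on each pair (x[1], y[1]), (x[2], y[2]), ...
mapply(sum, x, y)    # 5 7 9

# The natively vectorized operator gives the same element-wise result
x + y                # 5 7 9
```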
'+' (for example, outer(a, b, '+')). A slower, Vectorize()-based sum() function can be written:

    sumV <- Vectorize(function(x, y) sum(x, y))

outer is vectorized. I suppose if it is I would need to revise my answer.

The purpose of the Vectorize function is to let an ordinary function that only handles single values accept vector arguments.
For instance, consider the function below for subtraction:
    difftemp <- function(x){
        if(x > 10) return(x*10 - x) else return(x)
    }

This is a simple function that returns x*10 - x (that is, 9 times the input) if the input is greater than 10. If the input is 10 or less, it simply returns the same value.
    > difftemp(100)
    # [1] 900

But when you apply the same function to a vector, it fails.

    > difftemp(mtcars$mpg)
    # Error in if (x > 10) return(x * 10 - x) else return(x) :
    #   the condition has length > 1

This is because the function does not support vectorization. To make this function vectorized, we can use the Vectorize function in R. For example:

    # Vectorize difftemp function
    > difftemp_v <- Vectorize(difftemp)
    > difftemp_v(mtcars$mpg)
    # [1] 189.0 189.0 205.2 192.6 168.3 162.9 128.7 219.6 205.2 172.8 160.2 147.6 155.7 136.8 93.6 93.6 132.3 291.6 273.6 305.1 193.5 139.5
    # [23] 136.8 119.7 172.8 245.7 234.0 273.6 142.2 177.3 135.0 192.6

Keep Coding!
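For comparison, here is a sketch of two alternatives to the Vectorize() wrapper, which still calls difftemp once per element through mapply(). The name difftemp_vec and the test vector are hypothetical; ifelse() is a base R function that evaluates the condition on the whole vector at once.

```r
# The scalar function and its Vectorize() wrapper, repeated for self-containment
difftemp <- function(x) {
  if (x > 10) return(x * 10 - x) else return(x)
}
difftemp_v <- Vectorize(difftemp)

# Natively vectorized alternative (hypothetical name): no per-element R calls
difftemp_vec <- function(x) ifelse(x > 10, x * 10 - x, x)

x <- c(5, 10, 21, 100)   # made-up inputs on both sides of the threshold

difftemp_v(x)            # 5 10 189 900
mapply(difftemp, x)      # same values: the kind of call Vectorize() makes
difftemp_vec(x)          # same values, computed on the whole vector at once
```

When the logic can be expressed with vectorized building blocks such as ifelse(), that form is usually preferable to wrapping a scalar function.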
Late to the party, but I think the question is still highly relevant, and some new methods have gained popularity recently. So here's one more way to vectorize functions in R, using tidyverse methods.
First, define some data:
    x <- c(1,2,3)
    y <- c(1,2,4)

Now assume we'd like to perform some computation f(x, y) element-wise on these two vectors.
For instance, computing the sum for each pair of elements of x and y should yield: 2, 4, 7.
Let's use map2_dbl from purrr (a package from the tidyverse ecosystem):
    x <- c(1,2,3)
    y <- c(1,2,4)

    library(tidyverse)

    map2_dbl(.x = x, .y = y, .f = sum)
    #> [1] 2 4 7

As can be seen, the result is vectorized in the sense that the sum was computed for each pair of elements from x and y.
In sum, using map() and its variants is a convenient way to vectorize functions, at least in some situations.
Note that the map functions are analogous to mapply: they call the function once per element at the R level, so they are not vectorized in the efficient sense.
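One way to make this per-element behaviour visible is to count the calls. The sketch below uses base R's mapply rather than purrr so it runs without extra packages; counted_sum and the calls counter are hypothetical helpers introduced only for this illustration.

```r
x <- c(1, 2, 3)
y <- c(1, 2, 4)

calls <- 0
# Hypothetical helper: adds its arguments and records each invocation
counted_sum <- function(a, b) {
  calls <<- calls + 1
  a + b
}

# Element-wise mapping: the function is invoked once per pair
res_mapply <- mapply(counted_sum, x, y)
calls_mapply <- calls     # 3

# Direct call on whole vectors: one invocation, the loop happens inside +
calls <- 0
res_vec <- counted_sum(x, y)
calls_vec <- calls        # 1

res_mapply                # 2 4 7
res_vec                   # 2 4 7
```

Both approaches return the same values, but the mapped version pays for one R function call per element.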