
I have a dataset with 460K observations loaded into a data frame named data. One of the variables is defined as follows:

$ exeroft1 <int> NA, 105, NA, 205, NA, 102, 220, 102, 102, 220, 230, NA, NA, 105, 102, 210, 203, NA, NA, 107, 103, NA, 203, NA, NA, 105, 107, NA, 102, NA, 107, NA, 107, 103, ... 

I need to pass each value of exeroft1 to the following function, which converts the value into another value:

    calculateWeeklyExercise <- function(value) {
      if (value > 200) {
        timesWeekly = (value - 200) / 4
      } else {
        timesWeekly = (value - 100)
      }
      timesWeekly
    }

Here is some R code that does all the processing:

    data %>%
      # Filter missing values
      filter(!is.na(exeroft1)) %>%
      # Add a column to the data frame which represents exercise rate
      mutate(weeklyExercise = calculateWeeklyExercise(exeroft1)) %>%
      # Select some values
      select(educa, sex, exeroft1, weeklyExercise)

When I execute this code, I get the following warning, which I do not understand:

    Warning message:
    In if (value > 200) { :
      the condition has length > 1 and only the first element will be used

I'm not very experienced with R. It seems that the value I'm passing to the function is not being treated as an integer, even though it is. For any value < 200, the correct value is calculated. For any value > 200, it's not. So, essentially, in the function, only the else clause seems to ever get executed.

  • It is related to the if/else problem when the condition has length > 1. Use ifelse. Or, if we are applying the function to each row, start with data %>% filter(!is.na(exeroft1)) %>% rowwise() %>% and then continue with the rest of the pipeline. Commented Aug 19, 2016 at 15:01
  • @akrun - I don't understand though. Why is 'value' being treated as having a length > 1 when it's an integer? Commented Aug 19, 2016 at 15:03
  • If I understand correctly (without a reproducible example), value receives a whole column as input, and the column has length > 1; i.e., if (1:3 > 2) 1 gives the same warning. Commented Aug 19, 2016 at 15:05
  • Or use calculateWeeklyExercise <- Vectorize(calculateWeeklyExercise) and run it again, but you need a condition to handle NA in this case. Commented Aug 19, 2016 at 16:10
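The Vectorize suggestion in the last comment can be sketched as follows. This is only an illustration under assumptions: the is.na() guard and the name calculateWeeklyExerciseVec are additions for this example and do not appear in the original question. Without the guard, a missing value would make the if condition NA, which R rejects.

```r
# Scalar function with an explicit NA guard (the guard is an assumption added
# for this sketch; the original function has no NA handling).
calculateWeeklyExercise <- function(value) {
  if (is.na(value)) {
    return(NA_real_)
  }
  if (value > 200) {
    (value - 200) / 4
  } else {
    value - 100
  }
}

# Vectorize() wraps the scalar function so it is called once per element.
# calculateWeeklyExerciseVec is a hypothetical name used only here.
calculateWeeklyExerciseVec <- Vectorize(calculateWeeklyExercise)

vec_result <- calculateWeeklyExerciseVec(c(NA, 105L, 205L))
print(vec_result)
```

Because the wrapped function is still called element by element, this is likely to be slower on 460K rows than a natively vectorized function.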

1 Answer


If we modify the function to use ifelse, i.e., the vectorized form of if/else that can take multiple values, then it should work:

    calculateWeeklyExerciseNew <- function(value) {
      ifelse(value > 200, (value - 200) / 4, value - 100)
    }
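As a quick check, the ifelse version can be tried on a handful of values like those shown in the question. The vector sample_values below is made up for illustration and is not the OP's data.

```r
# Vectorized version from the answer: ifelse() tests every element.
calculateWeeklyExerciseNew <- function(value) {
  ifelse(value > 200, (value - 200) / 4, value - 100)
}

# Made-up sample resembling the exeroft1 column (an assumption for this sketch).
sample_values <- c(NA, 105L, 205L, 102L, 220L, 230L)

new_result <- calculateWeeklyExerciseNew(sample_values)
print(new_result)
# NA stays NA; values above 200 use (value - 200) / 4, the rest use value - 100
```

Note that ifelse passes NA through as NA, so filtering missing values first is optional with this version.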

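The warning arises because the OP's function is applied to a whole column of the dataset, which has more than one element. As if/else evaluates only a single condition, it uses the first element and throws the warning. In the data shown, the first non-missing value is 105, so the condition is FALSE and the else branch is applied to every value, which is why only values below 200 come out right. i.e.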

    if (1:3 > 2) 1

    Warning message:
    In if (1:3 > 2) 1 :
      the condition has length > 1 and only the first element will be used

In the above example, the condition is a vector of length 3 (1:3 > 2), so it gives the warning. If we use ifelse instead, each element is tested:

    ifelse(1:3 > 2, 1, 0)
    #[1] 0 0 1
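To see why the original function looked as if it only ever ran the else clause, the "first element only" behaviour can be imitated by hand. This is a sketch with made-up values. It takes the first element of the condition explicitly because, in R 4.2.0 and later, a condition of length > 1 in if is an error rather than a warning.

```r
# Made-up values; the first one is below 200, as in the OP's filtered column.
values <- c(105L, 205L, 102L, 220L)

# What the old if() effectively did: look only at the first element.
first_condition <- (values > 200)[1]

# The single chosen branch is then applied to the whole vector.
old_result <- if (first_condition) (values - 200) / 4 else values - 100
print(old_result)   # 205 and 220 are handled by the wrong branch

# ifelse() picks the branch element by element.
vectorized_result <- ifelse(values > 200, (values - 200) / 4, values - 100)
print(vectorized_result)
```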

However, we can still use the OP's function, which handles only a single observation at a time, by making the data frame rowwise, i.e.

    data %>%
      filter(!is.na(exeroft1)) %>%
      rowwise() %>%
      mutate(weeklyExercise = calculateWeeklyExercise(exeroft1)) %>%
      select(educa, sex, exeroft1, weeklyExercise)

but it would be slower, because the function is then called once per row.
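A rough way to compare the two approaches without dplyr is sketched below. It uses vapply as a base R stand-in for calling the function once per row; that is an assumption made for illustration, not a description of how rowwise works internally. The values are randomly generated, not the OP's data, and timings will vary by machine.

```r
# One-value-at-a-time version (as in the question) and the vectorized version.
calculateWeeklyExercise <- function(value) {
  if (value > 200) (value - 200) / 4 else value - 100
}
calculateWeeklyExerciseNew <- function(value) {
  ifelse(value > 200, (value - 200) / 4, value - 100)
}

# Randomly generated codes in the 101-107 and 201-230 ranges (made up).
set.seed(1)
values <- sample(c(101:107, 201:230), 10000, replace = TRUE)

# Element-by-element calls versus a single vectorized call.
time_one_at_a_time <- system.time(
  one_at_a_time <- vapply(values, calculateWeeklyExercise, numeric(1))
)
time_all_at_once <- system.time(
  all_at_once <- calculateWeeklyExerciseNew(values)
)

# Both approaches should agree on the results.
results_agree <- isTRUE(all.equal(one_at_a_time, all_at_once))
print(results_agree)
print(time_one_at_a_time)
print(time_all_at_once)
```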


7 Comments

Thank you. Great explanation.
Just an FYI, I tried the ifelse and I'm still getting the warning, and incorrect results.
@RandyMinder You haven't provided a reproducible example for me to test it. My suggestion was based on the warning you showed.
Your answer is still useful and the rowwise suggestion works perfectly.
@RandyMinder Can you change the ifelse to if_else (dplyr specific)?