
I have a data frame that contains, among other columns, a factor column whose levels are ranges of values. From what I understand, these are essentially bins for numeric values.

I want to convert these to numeric values so I can use them in downstream analysis. The idea is simple enough: (a) write a function that takes a factor level, splits it at the dash, extracts the numeric values and calculates their average, and (b) apply that function to the column:

data$Range.mean <- sapply(data$Range, function(d) {
  range <- as.matrix(strsplit(as.character(d), "-"))
  (as.numeric(range[,1]) + as.numeric(range[,2]))/2
})

This gives the following error:

Error in FUN(X[[1L]], ...) : (list) object cannot be coerced to type 'double' 

I tried lapply instead, which makes no difference. While looking for answers, I found other solutions to this problem, which essentially extract the lower and upper bounds into separate vectors; after that, calculating the pairwise average is trivial.

I would like to understand what I am doing/thinking wrong here though. Why is my code giving an error, and what does that error mean, really?

  • perhaps strsplit(as.character(d), "-")[[1]]? Commented Nov 12, 2013 at 12:38
  • or, maybe, unlist(strsplit(as.character(d), "-"))? Commented Nov 12, 2013 at 12:41
  • @alexis_laz unlist seems to do the trick, together with changing the way I access the first and second element. Although I am still a bit confused as to why I got the error message in the first place. The error message is a bit cryptic to me. Commented Nov 12, 2013 at 12:45
  • In ?strsplit you'll see that the value of this function is a list. as.numeric tries to be applied on a "list-y" object and gives this error. Commented Nov 12, 2013 at 12:48

1 Answer


You are correct that factors are in fact integers with labeled bins. So if you have a factor like this

x <- factor(c("0-1", "0-1", "1-2", "1-2")) 

it is essentially a combination of the following components

as.integer(x)
levels(x)
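
As a small illustration of those two components, here is what they look like for the `x` defined above. The name `rebuilt` is only used for this sketch; indexing the levels by the integer codes gives back the same labels as `as.character(x)`.

```r
x <- factor(c("0-1", "0-1", "1-2", "1-2"))

as.integer(x)  # 1 1 2 2 (the integer codes)
levels(x)      # "0-1" "1-2" (the labels the codes point to)

# Indexing the labels by the codes recovers the original strings
rebuilt <- levels(x)[as.integer(x)]
identical(rebuilt, as.character(x))  # TRUE
```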

To convert the factor to the actual values specified by its labels, you can take a detour through as.character and parse that into numbers.

# Recreating a data frame with a factor like yours
data <- data.frame(Range = cut(runif(100), 0:10/10))
levels(data$Range) <- sub("\\((.*),(.*)]", "\\1-\\2", levels(data$Range))

# Calculating range means
sapply(strsplit(as.character(data$Range), "-"), function(x) mean(as.numeric(x)))
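
As for why the original attempt failed: strsplit returns a list, even for a single string, so wrapping it in as.matrix yields a list-matrix, and as.numeric cannot coerce a list element that holds two strings. The following is a minimal sketch using a single made-up label `"0-1"`; the names `parts`, `failed` and `bounds` are illustrative only. The exact wording of the error message may differ between R versions, so the sketch only checks that an error occurs.

```r
d <- factor("0-1")
parts <- strsplit(as.character(d), "-")
is.list(parts)  # TRUE: a list of length 1 holding c("0", "1")

# The approach from the question: as.matrix() keeps the list structure,
# so as.numeric() is handed a list and raises an error.
failed <- tryCatch(
  { as.numeric(as.matrix(parts)[, 1]); FALSE },
  error = function(e) TRUE
)
failed  # TRUE

# Extracting the list element first gives a plain character vector
bounds <- as.numeric(parts[[1]])
mean(bounds)  # 0.5
```

Using `[[1]]` or unlist, as suggested in the comments, unwraps the list before the conversion; the sapply call above sidesteps the issue because each `x` it receives is already one element of the list.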