4

I have a dataframe that originated from an excel file. It has the usual headers above the columns but some of the columns have % signs in them which I want to remove.

Searching stackoverflow gives some nice code for removing percentages from matrices, Any way to edit values in a matrix in R?, which did not work when I tried to apply it to my dataframe

as.numeric(gsub("%", "", my.dataframe)) 

instead it just returns a string of "NA"s with a warning message explaining that they were introduced by coercion. When I applied,

gsub("%", "", my.dataframe)) 

I got the values in "c(...)" form, where the ... represent numbers followed by commas which was reproduced for every column that I had. No % was in evidence; if I could just put this back together ... I'd be cooking.

Any help greatfully received, thanks.

3
  • 3
    As usual, it is difficult to help much unless you provide data for us to work with. dput(head(...)) is often useful. Commented Jan 17, 2013 at 17:52
  • it sounds like you may have factors in your data.frame as well. Try reading it in setting stringsAsFactors to FALSE: read.table(file="file.csv", stringsAsFactors = FALSE) Commented Jan 17, 2013 at 17:52
  • 1
    @tcash21 stringsAsFactors = FALSE doesn't solve the problem because once you read the data and then apply Arun's solution the resulting data.frame will consist of only factors, the numeric elements will be coerced to be factors, so the solution is using Arun's suggestion and adding as.numeric inside sapply call. Commented Jan 17, 2013 at 18:05

1 Answer 1

7

Based on @Arun's comment and imaging how your data.frame looks like:

> DF <- data.frame(X = paste0(1:5,'%'), Y = paste0(2*(1:5),'%'), Z = 3*(1:5), stringsAsFactors=FALSE ) > DF # this is how I imagine your data.frame looks like X Y Z 1 1% 2% 3 2 2% 4% 6 3 3% 6% 9 4 4% 8% 12 5 5% 10% 15 > # Using @Arun's suggestion > (DF2 <- data.frame(sapply(DF, function(x) as.numeric(gsub("%", "", x))))) X Y Z 1 1 2 3 2 2 4 6 3 3 6 9 4 4 8 12 5 5 10 15 

I added as.numeric in sapply call for the resulting cols to be numeric, if I don't use as.numeric the result will be factor. Check it out using sapply(DF2, class)

Sign up to request clarification or add additional context in comments.

2 Comments

Thanks to @Jiber and everyone else (I could only mention one person). The code above worked like a charm. It was said, 'as usual no data' apologies for that but the data I'm working with is proprietary; it is from a number of experiments. In fact, the cleaned up data is in triplicate. I most probably have to start another question thread but, how can I reduce my triplicate data to a third of the size by finding the mean of values 1:3, 4:6, 7:9, etc until I get up to 22:24 and save everything in a new dataframe?
@user1945827 if this answer meets your needs, then consider accepting it by doing clic in the green tick-mark

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.