28

This question is related to a post with a similar title (replace NA in an R vector with adjacent values). I would like to scan a column in a data frame and replace NA's with the value in the adjacent cell. In the aforementioned post, the solution was to replace the NA not with the value from the adjacent vector (e.g. the adjacent element in the data matrix) but was a conditional replace for a fixed value. Below is a reproducible example of my problem:

UNIT <- c(NA,NA, 200, 200, 200, 200, 200, 300, 300, 300,300) STATUS <-c('ACTIVE','INACTIVE','ACTIVE','ACTIVE','INACTIVE','ACTIVE','INACTIVE','ACTIVE','ACTIVE', 'ACTIVE','INACTIVE') TERMINATED <- c('1999-07-06' , '2008-12-05' , '2000-08-18' , '2000-08-18' ,'2000-08-18' ,'2008-08-18', '2008-08-18','2006-09-19','2006-09-19' ,'2006-09-19' ,'1999-03-15') START <- c('2007-04-23','2008-12-06','2004-06-01','2007-02-01','2008-04-19','2010-11-29','2010-12-30', '2007-10-29','2008-02-05','2008-06-30','2009-02-07') STOP <- c('2008-12-05','4712-12-31','2007-01-31','2008-04-18','2010-11-28','2010-12-29','4712-12-31', '2008-02-04','2008-06-29','2009-02-06','4712-12-31') #creating dataframe TEST <- data.frame(UNIT,STATUS,TERMINATED,START,STOP); TEST UNIT STATUS TERMINATED START STOP 1 NA ACTIVE 1999-07-06 2007-04-23 2008-12-05 2 NA INACTIVE 2008-12-05 2008-12-06 4712-12-31 3 200 ACTIVE 2000-08-18 2004-06-01 2007-01-31 4 200 ACTIVE 2000-08-18 2007-02-01 2008-04-18 5 200 INACTIVE 2000-08-18 2008-04-19 2010-11-28 6 200 ACTIVE 2008-08-18 2010-11-29 2010-12-29 7 200 INACTIVE 2008-08-18 2010-12-30 4712-12-31 8 300 ACTIVE 2006-09-19 2007-10-29 2008-02-04 9 300 ACTIVE 2006-09-19 2008-02-05 2008-06-29 10 300 ACTIVE 2006-09-19 2008-06-30 2009-02-06 11 300 INACTIVE 1999-03-15 2009-02-07 4712-12-31 #using the syntax for a conditional replace and hoping it works :/ TEST$UNIT[is.na(TEST$UNIT)] <- TEST$STATUS; TEST UNIT STATUS TERMINATED START STOP 1 1 ACTIVE 1999-07-06 2007-04-23 2008-12-05 2 2 INACTIVE 2008-12-05 2008-12-06 4712-12-31 3 200 ACTIVE 2000-08-18 2004-06-01 2007-01-31 4 200 ACTIVE 2000-08-18 2007-02-01 2008-04-18 5 200 INACTIVE 2000-08-18 2008-04-19 2010-11-28 6 200 ACTIVE 2008-08-18 2010-11-29 2010-12-29 7 200 INACTIVE 2008-08-18 2010-12-30 4712-12-31 8 300 ACTIVE 2006-09-19 2007-10-29 2008-02-04 9 300 ACTIVE 2006-09-19 2008-02-05 2008-06-29 10 300 ACTIVE 2006-09-19 2008-06-30 2009-02-06 11 300 INACTIVE 1999-03-15 2009-02-07 4712-12-31 

The outcome should be:

 UNIT STATUS TERMINATED START STOP 1 ACTIVE ACTIVE 1999-07-06 2007-04-23 2008-12-05 2 INACTIVE INACTIVE 2008-12-05 2008-12-06 4712-12-31 3 200 ACTIVE 2000-08-18 2004-06-01 2007-01-31 4 200 ACTIVE 2000-08-18 2007-02-01 2008-04-18 5 200 INACTIVE 2000-08-18 2008-04-19 2010-11-28 6 200 ACTIVE 2008-08-18 2010-11-29 2010-12-29 7 200 INACTIVE 2008-08-18 2010-12-30 4712-12-31 8 300 ACTIVE 2006-09-19 2007-10-29 2008-02-04 9 300 ACTIVE 2006-09-19 2008-02-05 2008-06-29 10 300 ACTIVE 2006-09-19 2008-06-30 2009-02-06 11 300 INACTIVE 1999-03-15 2009-02-07 4712-12-31 
2
  • maybe try TEST$UNIT[is.na(TEST$UNIT)] <- TEST$STATUS[is.na(TEST$UNIT)]; TEST Commented Mar 26, 2013 at 5:23
  • 2
    You can't mix types within a column in a data frame. Commented Mar 26, 2013 at 5:24

3 Answers 3

34

It didn't work because status was a factor. When you mix factor with numeric then numeric is the least restrictive. By forcing status to be character you get the results you're after and the column is now a character vector:

TEST$UNIT[is.na(TEST$UNIT)] <- as.character(TEST$STATUS[is.na(TEST$UNIT)]) ## UNIT STATUS TERMINATED START STOP ## 1 ACTIVE ACTIVE 1999-07-06 2007-04-23 2008-12-05 ## 2 INACTIVE INACTIVE 2008-12-05 2008-12-06 4712-12-31 ## 3 200 ACTIVE 2000-08-18 2004-06-01 2007-01-31 ## 4 200 ACTIVE 2000-08-18 2007-02-01 2008-04-18 ## 5 200 INACTIVE 2000-08-18 2008-04-19 2010-11-28 ## 6 200 ACTIVE 2008-08-18 2010-11-29 2010-12-29 ## 7 200 INACTIVE 2008-08-18 2010-12-30 4712-12-31 ## 8 300 ACTIVE 2006-09-19 2007-10-29 2008-02-04 ## 9 300 ACTIVE 2006-09-19 2008-02-05 2008-06-29 ## 10 300 ACTIVE 2006-09-19 2008-06-30 2009-02-06 ## 11 300 INACTIVE 1999-03-15 2009-02-07 4712-12-31 
Sign up to request clarification or add additional context in comments.

1 Comment

This won't work for me because I also have some rows with NA in the adjacent column. Anyways to work around it? Both my columns are of int.
17

You have to do

TEST$UNIT[is.na(TEST$UNIT)] <- TEST$STATUS[is.na(TEST$UNIT)] 

so that the value will be replaced with the adjacent value. Otherwise there is a mismatch between the number of values to be replaced and the values to replace them with. This would result in the values being replaced in row order. It works in this case because the two values being replaced are the first two.

1 Comment

I think this is OK as an answer. Sure, the solution is the same as the one given by others, but you have added an explanation of what is going on. It shouldn't be a comment, in my opinion.
2
TEST$UNIT = ifelse(is.na(TEST$UNIT), paste(TEST$STATUS),paste(TEST$UNIT));TEST UNIT STATUS TERMINATED START STOP 1 ACTIVE ACTIVE 1999-07-06 2007-04-23 2008-12-05 2 INACTIVE INACTIVE 2008-12-05 2008-12-06 4712-12-31 3 200 ACTIVE 2000-08-18 2004-06-01 2007-01-31 4 200 ACTIVE 2000-08-18 2007-02-01 2008-04-18 5 200 INACTIVE 2000-08-18 2008-04-19 2010-11-28 6 200 ACTIVE 2008-08-18 2010-11-29 2010-12-29 7 200 INACTIVE 2008-08-18 2010-12-30 4712-12-31 8 300 ACTIVE 2006-09-19 2007-10-29 2008-02-04 9 300 ACTIVE 2006-09-19 2008-02-05 2008-06-29 10 300 ACTIVE 2006-09-19 2008-06-30 2009-02-06 11 300 INACTIVE 1999-03-15 2009-02-07 4712-12-31 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.