
I have a data frame in R that looks like this: [image: screenshot of the data frame]

And I want to apply the following operation across the columns of each row:

((abs(a-b))+(abs(a-c))+(abs(a-d)))/200

The problem is that some rows have NA values, so when I do this:

data$E = (abs(data$a-data$b)+abs(data$a-data$c)+abs(data$a-data$d))/200

The result in column E is NA for many rows; only rows without any NA values (like ID 1) get a number. Ideally, I would like the numerator to stop when it encounters the first NA. For ID 4, let's say, it would look like this:

(abs(a-b)+(abs(a-c)))/200

thereby omitting column d because of its NA value.
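The "stop at the first NA" behaviour described above can be sketched in base R as follows. This is only an illustration under assumptions: the helper name `sum_until_first_na`, the column vector `neighbour_cols`, and the sample values are all hypothetical and not taken from the original data.

```r
# Hypothetical helper (not from the original post): sum |a - x| over the
# neighbour values x, stopping at the first NA in the row.
sum_until_first_na <- function(a, neighbours) {
  keep <- cumsum(is.na(neighbours)) == 0  # TRUE until the first NA appears
  sum(abs(a - neighbours[keep]))
}

# Illustrative values only, not the poster's data
data <- data.frame(a = c(40, 20, 2, 4),
                   b = c(3, 3, 5, 0),
                   c = c(0, NA, 4, 8),
                   d = c(10, 7, 10, NA))

neighbour_cols <- c("b", "c", "d")  # assumed layout: a is compared to b, c, d
data$E <- vapply(seq_len(nrow(data)), function(i) {
  sum_until_first_na(data$a[i], unlist(data[i, neighbour_cols]))
}, numeric(1)) / 200
```

This differs from simply skipping NA values only when a non-NA value follows an NA, as in the second row here, where d is ignored because c is NA.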

Any help will be appreciated. Thanks!

  • Replace NA by 0? But -- is the semantics of that move coherent? If a value is not available, assuming that it is 0 is as arbitrary as assuming that it is e.g. 42. What you are trying to do seems ad hoc. Commented Jan 2, 2020 at 12:12
  • I can't assume it is 0 because I'm calculating variability, and 0 has a real meaning. The actual data is from cells in an embryo. Commented Jan 2, 2020 at 12:14
  • I guess you are not assuming it is zero, so much as assuming that it is a, but you are still making assumptions on missing values. Commented Jan 2, 2020 at 12:16
  • Each column represents a cell that touches another cell. Some cells are touched by many cells (up to 8), but other cells are only touched by 2 or 3 cells, depending on their position in the embryo. Unfortunately, they are not a perfect grid. Commented Jan 2, 2020 at 12:22

2 Answers


Here is a base R solution using rowSums(), where the option na.rm should be set to TRUE.

You can try the code below for your objective:

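# Subtract every column after ID and a from a, then sum the absolute
# differences per row, ignoring NA values
data$j <- rowSums(abs(replicate(ncol(data) - 2, data$a) - data[-(1:2)]), na.rm = TRUE)/156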

such that

> data
  ID a b c  d  e  f  g  h  i           j
1  1 0 0 0  1 NA NA NA NA NA 0.006410256
2  2 0 0 0  1  1 NA NA NA NA 0.012820513
3  3 0 0 0  0  0 NA NA NA NA 0.000000000
4  4 0 0 0  0  0  0 NA NA NA 0.000000000
5  5 0 0 0 NA NA NA NA NA NA 0.000000000
6  6 0 0 0  0  0 NA NA NA NA 0.000000000

DATA

data <- structure(list(
  ID = 1:6,
  a = c(0, 0, 0, 0, 0, 0),
  b = c(0, 0, 0, 0, 0, 0),
  c = c(0, 0, 0, 0, 0, 0),
  d = c(1, 1, 0, 0, NA, 0),
  e = c(NA, 1, 0, 0, NA, 0),
  f = c(NA, NA, NA, 0, NA, NA),
  g = c(NA, NA, NA, NA, NA, NA),
  h = c(NA, NA, NA, NA, NA, NA),
  i = c(NA, NA, NA, NA, NA, NA)),
  row.names = c(NA, -6L), class = "data.frame")
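As a possible variation on the rowSums() approach above, the neighbour columns can be selected by name instead of by position, which avoids replicate() and keeps the result stable if the line is run again after column j exists. This is a sketch under assumptions: the name `neighbour_cols` is hypothetical, only a subset of the DATA columns is used to keep it short, and the divisor 156 is simply carried over from the answer (the question uses 200).

```r
# Subset of the DATA above (columns f to i omitted for brevity)
data <- data.frame(
  ID = 1:6,
  a = c(0, 0, 0, 0, 0, 0),
  b = c(0, 0, 0, 0, 0, 0),
  c = c(0, 0, 0, 0, 0, 0),
  d = c(1, 1, 0, 0, NA, 0),
  e = c(NA, 1, 0, 0, NA, 0)
)

# Every column except the ID, the reference column a, and the result column j
neighbour_cols <- setdiff(names(data), c("ID", "a", "j"))

# data$a has one value per row, so it is recycled down each neighbour column
diffs <- abs(data$a - data[neighbour_cols])

# na.rm = TRUE skips missing neighbours instead of propagating NA
data$j <- rowSums(diffs, na.rm = TRUE) / 156
```

Note that na.rm = TRUE skips every NA in a row, which matches "stop at the first NA" only when all the NA values sit at the end of the row, as they do in this example.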

13 Comments

I calculated it manually and the result isn't right for the rows with NA. It also didn't work with my real dataset, which is bigger than the example. Thanks though.
@Amaranta_Remedios Why is it not working? Do you have more columns? I am not sure whether you want abs((a-b)+(a-c)+(a-d))/200 or (abs(a-b)+abs(a-c)+abs(a-d))/200.
Yes, the real data has up to 8 columns (a to h) and 1982 different IDs.
@Amaranta_Remedios See my updated solution. One more thing: I am not sure whether you want abs((a-b)+(a-c)+(a-d))/200 or (abs(a-b)+abs(a-c)+abs(a-d))/200, since the two formulas give different results.
@Amaranta_Remedios Maybe it is better to dput() your data and paste it into your post. By the way, I guess the answer you accepted needs a lot of manual work when you have 8 columns, whereas mine does not.
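The difference between the two formulas raised in the comments above shows up as soon as the individual differences have mixed signs. A minimal sketch with made-up values (the numbers and variable names are illustrative assumptions, not from the poster's data):

```r
a <- 5
x <- c(b = 3, c = 8, d = 6)  # illustrative neighbour values

# Absolute value of the summed differences: positive and negative terms cancel
abs_of_sum <- abs(sum(a - x)) / 200   # |2 - 3 - 1| / 200

# Sum of the absolute differences: every deviation counts
sum_of_abs <- sum(abs(a - x)) / 200   # (2 + 3 + 1) / 200
```

Here abs_of_sum is 0.01 while sum_of_abs is 0.03; the question's formula corresponds to the second form.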

This is my attempt using the tidyverse. Please let me know if I misunderstood your question:

data <- data.frame(a = c(40, 20, 2, 4, 5),
                   b = c(3, 3, 5, 0, 0),
                   c = c(0, NA, 4, 8, 0),
                   d = c(10, NA, 10, NA, 10))

library(tidyverse)

data %>%
  mutate(
    x = (ifelse(!is.na(a-b), abs(a-b), 0) +
         ifelse(!is.na(a-c), abs(a-c), 0) +
         ifelse(!is.na(a-d), abs(a-d), 0))/200
  )
#>    a b  c  d     x
#> 1 40 3  0 10 0.535
#> 2 20 3 NA NA 0.085
#> 3  2 5  4 10 0.065
#> 4  4 0  8 NA 0.040
#> 5  5 0  0 10 0.075

Created on 2020-01-02 by the reprex package (v0.3.0)
