4

I have a data table, A like:

year location sigma_NN_1 sigma_NN_2 sigma_NN_3 2076 43.59375_-116.78125 1.4681173 1.664289 1.735974 2077 43.59375_-116.78125 1.3798515 1.550524 1.551269 2078 43.59375_-116.78125 0.7934367 1.064248 1.177981 2079 43.59375_-116.78125 1.8235574 1.991018 2.288402 2080 43.59375_-116.78125 2.5560329 2.578093 2.589334 

And I want to use it to mask another data table where the value of sigmas are below a threshold, lets say 2. Lets say my second data table is B

year location location_NN_1 location_NN_2 location_NN_3 2076 43.59375_-116.78125 41.15625_-90.65625 41.21875_-90.65625 41.15625_-90.65625 2077 43.59375_-116.78125 43.34375_-78.15625 43.34375_-78.21875 43.28125_-78.15625 2078 43.59375_-116.78125 41.34375_-90.78125 41.21875_-90.65625 41.53125_-73.96875 2079 43.59375_-116.78125 43.53125_-116.78125 41.34375_-90.78125 41.71875_-74.15625 2080 43.59375_-116.78125 41.34375_-90.78125 41.96875_-86.21875 41.21875_-90.65625 

So, I want to have something like B[A<2], but obviously this does not work, otherwise, I would not be here.

Any suggestions?

Expected output:

output

year location location_NN_1 location_NN_2 location_NN_3 2076 43.59375_-116.78125 41.15625_-90.65625 41.21875_-90.65625 41.15625_-90.65625 2077 43.59375_-116.78125 43.34375_-78.15625 43.34375_-78.21875 43.28125_-78.15625 2078 43.59375_-116.78125 41.34375_-90.78125 41.21875_-90.65625 41.53125_-73.96875 2079 43.59375_-116.78125 43.53125_-116.78125 41.34375_-90.78125 NA 2080 43.59375_-116.78125 NA NA NA 

The goal is to find locations whose corresponding sigma in data table A is less than 2.

1
  • You may need something like i1 <- setDT(A)[, Reduce('&', lapply(.SD, <, 2)), .SDcols = 3:5]; B[i1] Commented Apr 29, 2019 at 16:56

3 Answers 3

4

We can use base R subsetting to identify the appropriate cells of B and replace them with NA. This approach requires the order of the columns in A and B to be the same.

We can use a simple conditional statement on dfa to find the cells with a sigma value not less than 2. Since we don't want to apply the conditional to the year and condition columns, we subset them out before applying the conditional:

!(dfa[-c(1,2)] < 2) sigma_NN_1 sigma_NN_2 sigma_NN_3 [1,] FALSE FALSE FALSE [2,] FALSE FALSE FALSE [3,] FALSE FALSE FALSE [4,] FALSE FALSE TRUE [5,] TRUE TRUE TRUE 

This returns a logical matrix we can use to subset B to replace values. What happens here is we subset B twice: first we ignore the year and location columns to get just the location columns, then we use the earlier conditional to select the rows where the matching sigma value is not less than 2 and insert NA into it:

dfb[-c(1,2)][!(dfa[-c(1,2)] < 2)] <- NA dfb year location location_NN_1 location_NN_2 location_NN_3 1 2076 43.59375_-116.78125 41.15625_-90.65625 41.21875_-90.65625 41.15625_-90.65625 2 2077 43.59375_-116.78125 43.34375_-78.15625 43.34375_-78.21875 43.28125_-78.15625 3 2078 43.59375_-116.78125 41.34375_-90.78125 41.21875_-90.65625 41.53125_-73.96875 4 2079 43.59375_-116.78125 43.53125_-116.78125 41.34375_-90.78125 <NA> 5 2080 43.59375_-116.78125 <NA> <NA> <NA> 
Sign up to request clarification or add additional context in comments.

Comments

3

Simple base R solution:

B[-(1:2)][A[-(1:2)]>=2] <- NA 

Select all columns except first and second B[-(1:2)].

Then use the vectorized logical expression A[-(1:2)]>=2 to set the correct elements to NA.

Results:

 year location location_NN_1 location_NN_2 location_NN_3 1 2076 43.59375_-116.78125 41.15625_-90.65625 41.21875_-90.65625 41.15625_-90.65625 2 2077 43.59375_-116.78125 43.34375_-78.15625 43.34375_-78.21875 43.28125_-78.15625 3 2078 43.59375_-116.78125 41.34375_-90.78125 41.21875_-90.65625 41.53125_-73.96875 4 2079 43.59375_-116.78125 43.53125_-116.78125 41.34375_-90.78125 <NA> 5 2080 43.59375_-116.78125 <NA> <NA> <NA> 

Comments

3

Assuming that these data.table objects, and assuming that the rows of the 'sigma' columns in 'A' should all be less than the threshold 2.

library(data.table) nm1 <- grep("sigma", names(A), value = TRUE) i1 <- setDT(A)[, Reduce(`&`, lapply(.SD, `<`, 2)), .SDcols = nm1] setDT(B)[i1] 

Update

Based on the expected output

nm2 <- grep("sigma", names(A)) B[, (nm2) := Map(function(x, y) replace(x, y >= 2, NA_character_), .SD, A[, nm2, with = FALSE]), .SDcols = nm2][] # year location location_NN_1 location_NN_2 location_NN_3 #1: 2076 43.59375_-116.78125 41.15625_-90.65625 41.21875_-90.65625 41.15625_-90.65625 #2: 2077 43.59375_-116.78125 43.34375_-78.15625 43.34375_-78.21875 43.28125_-78.15625 #3: 2078 43.59375_-116.78125 41.34375_-90.78125 41.21875_-90.65625 41.53125_-73.96875 #4: 2079 43.59375_-116.78125 43.53125_-116.78125 41.34375_-90.78125 <NA> #5: 2080 43.59375_-116.78125 <NA> <NA> <NA> 

data

A <- structure(list(year = 2076:2080, location = c("43.59375_-116.78125", "43.59375_-116.78125", "43.59375_-116.78125", "43.59375_-116.78125", "43.59375_-116.78125"), sigma_NN_1 = c(1.4681173, 1.3798515, 0.7934367, 1.8235574, 2.5560329), sigma_NN_2 = c(1.664289, 1.550524, 1.064248, 1.991018, 2.578093), sigma_NN_3 = c(1.735974, 1.551269, 1.177981, 2.288402, 2.589334)), class = "data.frame", row.names = c(NA, -5L)) B <- structure(list(year = 2076:2080, location = c("43.59375_-116.78125", "43.59375_-116.78125", "43.59375_-116.78125", "43.59375_-116.78125", "43.59375_-116.78125"), location_NN_1 = c("41.15625_-90.65625", "43.34375_-78.15625", "41.34375_-90.78125", "43.53125_-116.78125", "41.34375_-90.78125"), location_NN_2 = c("41.21875_-90.65625", "43.34375_-78.21875", "41.21875_-90.65625", "41.34375_-90.78125", "41.96875_-86.21875"), location_NN_3 = c("41.15625_-90.65625", "43.28125_-78.15625", "41.53125_-73.96875", "41.71875_-74.15625", "41.21875_-90.65625")), class = "data.frame", row.names = c(NA, -5L)) 

6 Comments

the i1 <- setDT(A)[, Reduce(&, lapply(.SD, <, 2)), .SDcols = nm1] does not work: Error: unexpected '<' in "i1 <- setDT(A_sigma)[, Reduce(&, lapply(.SD, <"
I meant: Error: unexpected '<' in "i1 <- setDT(A)[, Reduce(&, lapply(.SD, <"
sorry backquotes didnt copy' can u test updated code'
your output looks like ` year location location_NN_1 location_NN_2 location_NN_3 1: 2076 43.59375_-116.78125 41.15625_-90.65625 41.21875_-90.65625 41.15625_-90.65625 2: 2077 43.59375_-116.78125 43.34375_-78.15625 43.34375_-78.21875 43.28125_-78.15625 3: 2078 43.59375_-116.78125 41.34375_-90.78125 41.21875_-90.65625 41.53125_-73.96875` You are missing two values one in sigma_1 and one in sigma_2 column, both in 4th row. are less than 2
@NoLie Sorry, I didn't know the expected output at the time I write the answer.
|

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.