I have been using the %in% operator for a long time, ever since I learned about it.
However, I still don't fully understand how it works. At least, I thought I knew, but I am never sure about the order of the elements.
Here is an example.
This is my dataframe:
    df <- data.frame("col1" = c(1, 2, 3, 4, 30, 21, 320, 123, 4351, 1234, 3, 0, 43),
                     "col2" = rep("something", 13))

This is how it looks:
    > df
       col1      col2
    1     1 something
    2     2 something
    3     3 something
    4     4 something
    5    30 something
    6    21 something
    7   320 something
    8   123 something
    9  4351 something
    10 1234 something
    11    3 something
    12    0 something
    13   43 something

Let's say I have a numerical vector:
    myvector <- c(30, 43, 12, 333334, 14, 4351, 0, 5, 55, 66)

I want to check whether all (or some) of the numbers in my vector appear in the dataframe. To do that, I always use %in%.
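A minimal sketch of that check, using the data from the question: `x %in% table` returns one `TRUE`/`FALSE` per element of `x`, and `all()` / `any()` summarise that vector. The variable name `found` is only for illustration.

```r
# Data from the question
df <- data.frame(col1 = c(1, 2, 3, 4, 30, 21, 320, 123, 4351, 1234, 3, 0, 43),
                 col2 = rep("something", 13))
myvector <- c(30, 43, 12, 333334, 14, 4351, 0, 5, 55, 66)

# One flag per element of myvector: does it occur anywhere in df$col1?
found <- myvector %in% df$col1
found            # TRUE TRUE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE

all(found)       # are *all* numbers of myvector in df$col1?  FALSE
any(found)       # is *at least one* of them there?           TRUE
myvector[found]  # the ones that are there: 30 43 4351 0
```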
I came up with two approaches:
    # common to both: 30, 4351, 0, 43

    # are the numbers from df$col1 in my vector?
    trial1 <- subset(df, df$col1 %in% myvector)

    # are the numbers of the vector in df$col1?
    trial2 <- subset(df, myvector %in% df$col1)

Both approaches make sense to me, and I expected them to give the same result. However, only trial1 gives the right result:
    > trial1
       col1      col2
    5    30 something
    9  4351 something
    12    0 something
    13   43 something

What I don't understand is why the second approach returns some of the common numbers plus some that are not in the vector:
    > trial2
       col1      col2
    1     1 something
    2     2 something
    6    21 something
    7   320 something
    11    3 something
    12    0 something

Could someone explain how the `%in%` operator works and why the second approach gives the wrong result?
Thanks very much in advance
Regards
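left operand. `myvector %in% df$col1` will always return a vector of length `length(myvector)`, regardless of `nrow(df)`, which means that its return value is not safe for subsetting `df`. When a logical row index is shorter than the number of rows, R recycles it, so the 10 `TRUE`/`FALSE` values are reused for rows 11 to 13; that is why `trial2` returns rows 1, 2, 6, 7, 11 and 12. `df$col1 %in% myvector`, on the other hand, has exactly one value per row of `df`, which is what `subset()` needs.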
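A minimal sketch, using the data from the question, that makes the length rule and the recycling visible. It relies on the documented equivalence between `x %in% table` and `match(x, table, nomatch = 0) > 0`; the names `idx1` and `idx2` are only for illustration, and `rep_len()` is used here just to imitate the recycling that happens inside `subset()`.

```r
df <- data.frame(col1 = c(1, 2, 3, 4, 30, 21, 320, 123, 4351, 1234, 3, 0, 43),
                 col2 = rep("something", 13))
myvector <- c(30, 43, 12, 333334, 14, 4351, 0, 5, 55, 66)

# %in% is documented as match(x, table, nomatch = 0) > 0:
# one answer per element of the LEFT operand.
stopifnot(identical(myvector %in% df$col1,
                    match(myvector, df$col1, nomatch = 0) > 0))

idx1 <- df$col1 %in% myvector  # one flag per row of df
idx2 <- myvector %in% df$col1  # one flag per element of myvector
length(idx1)                   # 13, same as nrow(df)
length(idx2)                   # 10, unrelated to the rows of df

# As a row index, the 10 flags of idx2 are recycled over 13 rows,
# so rows 11-13 reuse flags 1-3. These are exactly trial2's rows.
which(rep_len(idx2, nrow(df)))  # 1 2 6 7 11 12

# Asking each question the right way round:
df[idx1, ]      # rows of df whose col1 is in myvector: rows 5, 9, 12, 13 (as trial1)
myvector[idx2]  # values of myvector found in df$col1: 30 43 4351 0
```

If the goal is the common values themselves rather than rows of `df`, indexing `myvector` with its own flags (or `intersect(myvector, df$col1)`, which also drops duplicates) answers the question without touching `df` at all.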