Filter rows that have 2 or more instances of certain values in R

Question

I am having trouble making a frequency table where the data is spread across multiple columns. My dataset is structured like this:

    ID  shape      shape2     shape3    shape4  shape5
    12  triangle   rhombus    circle    square  rectangle
    68  hexagon    NA         NA        NA      NA
    32  rectangle  octagon    square    NA      NA
    18  square     triangle   NA        NA      NA
    27  circle     NA         NA        NA      NA
    77  square     trapezoid  triangle  NA      NA

I want to get a frequency table of all of the different shapes, like this:

    shape      freq
    triangle   3
    square     4
    circle     2
    hexagon    1
    octagon    1
    trapezoid  1
    rhombus    1
    rectangle  2

Then I really am only interested in 5 shapes, so I would like it to look like this:

    shape      freq
    triangle   3
    square     4
    circle     2
    rectangle  2

Thank you!
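For anyone who wants to run the answers below, here is a minimal sketch of the question's data as an R data frame. It assumes the periods in the original listing were only column separators and that the empty cells are genuine NA values; the name `dat` is borrowed from the first answer. The last lines compute the first requested table (all shapes with their counts).

```r
# Sketch: the question's data, assuming "." was only a separator and NA means missing
dat <- data.frame(
  ID     = c(12, 68, 32, 18, 27, 77),
  shape  = c("triangle", "hexagon", "rectangle", "square", "circle", "square"),
  shape2 = c("rhombus", NA, "octagon", "triangle", NA, "trapezoid"),
  shape3 = c("circle", NA, "square", NA, NA, "triangle"),
  shape4 = c("square", NA, NA, NA, NA, NA),
  shape5 = c("rectangle", NA, NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# unlist() stacks every shape column into one character vector;
# table() counts each distinct value and skips NA by default
freq <- as.data.frame(table(shape = unlist(dat[, -1])), responseName = "freq")
freq
```

The rows come out in alphabetical order rather than in the order shown in the question; the counts are the same.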

user438383 · Accepted Answer · 2021-12-01 15:51:42Z


Say you have your data in dat, then:

    library(data.table)
    data.table(table(gsub("\\.", "", unlist(dat[, -1]))))[N >= 2]

which gives:

              V1 N
    1:    circle 2
    2:        NA 4
    3: rectangle 2
    4:    square 4
    5:  triangle 3

(The NA row here counts the text "NA", not missing values: table() leaves genuine NA values out by default.)

Note that I removed the . from some of the names with gsub(). data.table isn't really necessary; I just find the [N >= 2] syntax cleaner and more convenient than the base R alternatives.

For a base R alternative:

    subset(data.frame(table(gsub("\\.", "", unlist(dat[, -1])))), Freq >= 2)
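The two regular expressions in this thread behave differently: "\\." matches every period in a string, while "[.]$" (from r2evans's comment) matches only a period at the very end. Here is a small sketch on a made-up vector (hypothetical values, not the question's data) that also shows table() skipping genuine NA values.

```r
# Made-up values, some containing periods (hypothetical example data)
x <- c("circle.", "square", "square.", "st.ar.", NA)

all_dots      <- gsub("\\.", "", x)   # removes every period
trailing_dots <- gsub("[.]$", "", x)  # removes only a final period

all_dots       # "circle" "square" "square" "star"  NA
trailing_dots  # "circle" "square" "square" "st.ar" NA

# table() uses useNA = "no" by default, so the NA is not counted
counts <- table(all_dots)
counts         # circle 1, square 2, star 1
```

If the only stray periods are trailing ones, either pattern gives the same counts; "[.]$" is the safer choice when a period could legitimately appear inside a name.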

edited Dec 1, 2021 at 15:51

answered Nov 29, 2021 at 15:21



4 Comments

r2evans Over a year ago

This does not strictly require data.table, it'll work in base too: subset(data.frame(table(gsub("[.]$","",unlist(dat[,-1])))), Freq > 1)

user438383 Over a year ago

@r2evans That's true, I just personally don't really like the base R syntax for subsetting data.frames.

r2evans Over a year ago

That's not a problem, nor was I trying to convince you to change your answer. Perhaps augment it, not change it. I provided the comment mostly as a reference for users who may be averse to adding a new package solely for one step in their data pipelines; in this case, if they're not already using data.table and have no impetus to learn it, then perhaps the higher code-golf of subset(., Freq > 1) will be good enough to use the core of your methodology.

user438383 Over a year ago

@r2evans yeah I know - thanks for the comment :)

Karthik S · Accepted Answer · 2021-11-29 15:24:55Z

Does this work?

    library(dplyr)
    library(tidyr)

    df %>%
      pivot_longer(-ID, values_to = 'shape') %>%
      count(shape, name = 'freq') %>%
      filter(shape %in% c('triangle', 'square', 'circle', 'rectangle'))

    # A tibble: 4 x 2
      shape      freq
      <chr>     <int>
    1 circle        2
    2 rectangle     2
    3 square        4
    4 triangle      3
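If you would rather not load dplyr and tidyr, a rough base R counterpart of this reshape, count, and filter pipeline is sketched below. It is not the answer's code: stack() stands in for pivot_longer(), and `df` is a hypothetical re-creation of the question's data with real NA values.

```r
# Hypothetical re-creation of the question's data (NA = missing cell)
df <- data.frame(
  ID     = c(12, 68, 32, 18, 27, 77),
  shape  = c("triangle", "hexagon", "rectangle", "square", "circle", "square"),
  shape2 = c("rhombus", NA, "octagon", "triangle", NA, "trapezoid"),
  shape3 = c("circle", NA, "square", NA, NA, "triangle"),
  shape4 = c("square", NA, NA, NA, NA, NA),
  shape5 = c("rectangle", NA, NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Long format: stack() puts every shape column into one `values` column
long <- stack(df[, -1])

# Count each shape (NA values are dropped by table())
freq <- as.data.frame(table(shape = long$values), responseName = "freq")

# Keep only the shapes of interest, like filter(shape %in% ...)
wanted <- c("triangle", "square", "circle", "rectangle")
result <- freq[freq$shape %in% wanted, ]
result
```

Swapping the last filter for `freq[freq$freq >= 2, ]` gives the "2 or more instances" version from the title instead of a fixed list of shapes.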

PaulS · Accepted Answer · 2021-11-29 16:49:35Z

Another solution, also based on data.table:

    library(data.table)

    dt <- data.table(
      ID = c(12L, 68L, 32L, 18L, 27L, 77L),
      shape = c("triangle", "hexagon", "rectangle", "square", "circle", "square"),
      shape2 = c("rhombus", NA, "octagon", "triangle", NA, "trapezoid"),
      shape3 = c("circle", NA, "square", NA, NA, "triangle"),
      shape4 = c("square", NA, NA, NA, NA, NA),
      shape5 = c("rectangle", NA, NA, NA, NA, NA)
    )

    melt(dt, id.vars = "ID", measure.vars = 2:6, na.rm = TRUE, value.name = "shape")[
      , .(freq = .N), by = "shape"][freq > 1]
    #>        shape freq
    #> 1:  triangle    3
    #> 2: rectangle    2
    #> 3:    square    4
    #> 4:    circle    2
