0

I am having trouble making a frequency table where the data is in multiple columns.My dataset is structured like this:

ID shape shape2 shape3 shape4 shape5 12 triangle rhombus circle square rectangle 68 hexagon NA NA NA NA 32 rectangle octagon square NA NA 18 square triangle NA NA NA 27. circle. NA. NA. NA. NA 77. square. trapezoid. triangle. NA. NA 

I want to get a frequency table of all of the different shapes, like this:

shape. freq triangle. 3 square. 4 circle. 2 hexagon. 1 octagon. 1 trapezoid. 1 rhombus. 1 rectangle 2 

Then I really am only interested in 5 shapes, so I would like it to look like this:

shape. freq triangle. 3 square. 4 circle. 2 rectangle 2 

Thank you!

3 Answers 3

1

Say you have your data in dat, then:

library(data.table) data.table(table(gsub("\\.", "", unlist(dat[,-1]))))[N>=2] 

Note that I removed the . from some of the names. data.table isn't really necessary, I just find the syntax [N>=2] to be cleaner and more convenient than the base-r alternatives.

For a base-r alternative:

subset(data.frame(table(gsub("\\.", "", unlist(dat[,-1])))), Freq > 2) 
 V1 N 1: circle 2 2: NA 4 3: rectangle 2 4: square 4 5: triangle 3 
Sign up to request clarification or add additional context in comments.

4 Comments

This does not strictly require data.table, it'll work in base too: subset(data.frame(table(gsub("[.]$","",unlist(dat[,-1])))), Freq > 1)
@r2evans That's true, I just personally don't really like the base-r syntax of sub-setting data.frames.
That's not a problem, nor was I trying to convince you to change your answer. Perhaps augment it, not change it. I provided the comment mostly as a reference for users who may be averse to adding a new package solely for one step in their data pipelines; in this case, if they're not already using data.table and have no impetus to learn it, then perhaps the higher code-golf of subset(., Freq > 1) will be good enough to use the core of your methodology.
@r2evans yeah I know - thanks for the comment :)
1

Does this work:

library(dplyr) library(tidyr) df %>% pivot_longer(-ID, values_to = 'shape') %>% count(shape, name = 'freq') %>% filter(shape %in% c('triangle','square','circle','rectangle')) # A tibble: 4 x 2 shape freq <chr> <int> 1 circle 2 2 rectangle 2 3 square 4 4 triangle 3 

Comments

1

Another solution, based also on data.table:

library(data.table) dt <- data.table::data.table( ID = c(12L, 68L, 32L, 18L, 27L, 77L), shape = c("triangle","hexagon", "rectangle","square","circle","square"), shape2 = c("rhombus", NA, "octagon", "triangle", NA, "trapezoid"), shape3 = c("circle", NA, "square", NA, NA, "triangle"), shape4 = c("square", NA, NA, NA, NA, NA), shape5 = c("rectangle", NA, NA, NA, NA, NA) ) melt(dt, id="ID", 2:6, na.rm = T, value.name = "shape" )[,.(freq = .N), by = "shape"][freq > 1] #> shape freq #> 1: triangle 3 #> 2: rectangle 2 #> 3: square 4 #> 4: circle 2 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.