How do I filter rows based on two values within a column?

Question

I'm totally new to R and am trying to solve this using the dplyr package. I want to keep only the countries that have both Import and Export values, and view them separately. I have tried several approaches, such as select() and filter(), without success.

    Country  Year  Quantity  Description  Import/Export
    A        2001        10  Frozen       Export
    B        2001        50  Fresh        Import
    B        2004        20  Frozen       Export
    C        2003        30  Frozen       Import
    C        2005        40  Fresh        Export
    C        2006        60  Frozen       Import
    D        2007       290  Fresh        Import

Ideally, the end result should be this:

    Country  Year  Quantity  Description  Import/Export
    B        2001        50  Fresh        Import
    B        2004        20  Frozen       Export
    C        2003        30  Frozen       Import
    C        2005        40  Fresh        Export
    C        2006        60  Frozen       Import

GuedesBF · Accepted Answer · 2021-09-11 13:52:58Z

We can group_by() Country, then filter() to keep only the groups that have any `Import/Export` == 'Import' and any `Import/Export` == 'Export'.

    library(dplyr)

    df %>%
        group_by(Country) %>%
        filter(any(`Import/Export`=='Import') & any(`Import/Export`=='Export')) %>%
        ungroup()

    # A tibble: 5 x 5
      Country  Year Quantity Description `Import/Export`
      <chr>   <dbl>    <dbl> <chr>       <chr>
    1 B        2001       50 Fresh       Import
    2 B        2004       20 Frozen      Export
    3 C        2003       30 Frozen      Import
    4 C        2005       40 Fresh       Export
    5 C        2006       60 Frozen      Import

data

    structure(list(
        Country = c("A", "B", "B", "C", "C", "C", "D"),
        Year = c(2001, 2001, 2004, 2003, 2005, 2006, 2007),
        Quantity = c(10, 50, 20, 30, 40, 60, 290),
        Description = c("Frozen", "Fresh", "Frozen", "Frozen", "Fresh", "Frozen", "Fresh"),
        `Import/Export` = c("Export", "Import", "Export", "Import", "Export", "Import", "Import")
    ), row.names = c(NA, -7L), class = c("tbl_df", "tbl", "data.frame"))
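To see why the any() & any() condition keeps only B and C, it can help to compute the two flags per country before filtering. The following is a minimal sketch, assuming a cut-down copy of the data; the names `df_small`, `flags`, `has_import` and `has_export` are made up for this illustration and do not appear in the answer.

```r
library(dplyr)

# Cut-down copy of the example data (only the columns needed here).
df_small <- tibble(
  Country = c("A", "B", "B", "C", "C", "C", "D"),
  `Import/Export` = c("Export", "Import", "Export", "Import",
                      "Export", "Import", "Import")
)

# One row per country, with one flag per condition used in filter().
flags <- df_small %>%
  group_by(Country) %>%
  summarise(
    has_import = any(`Import/Export` == "Import"),
    has_export = any(`Import/Export` == "Export")
  )

flags

# Countries where both flags are TRUE are the ones filter() keeps.
kept <- flags$Country[flags$has_import & flags$has_export]
kept
```

Inside a grouped filter(), each any() call returns a single TRUE or FALSE per group, and that value is recycled across every row of the group, so a group is kept or dropped as a whole.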

Amazing!! What would be the benefit of presenting the data as shown in the 2nd part?
The structure(...) call under "data" is your data in a reproducible form. If you run that code, it recreates your data frame, which makes the data easier to share. You can get the same code with dput(data).
You should always share your data this way; it makes it much easier for others to test the answers and manipulate your data.
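As a small illustration of the comment above, here is a sketch of the dput() round trip on a toy data frame. The names `toy`, `code` and `rebuilt` are hypothetical and used only for this example: dput() prints R code that rebuilds the object, and evaluating that code should give back an equivalent object.

```r
# Toy data frame; the name `toy` is made up for this illustration.
toy <- data.frame(
  Country = c("A", "B"),
  Quantity = c(10, 50)
)

# dput() prints R code that recreates the object.
dput(toy)

# Round trip: capture the printed code, parse and evaluate it,
# then compare the rebuilt object with the original.
code <- paste(capture.output(dput(toy)), collapse = "\n")
rebuilt <- eval(parse(text = code))
all.equal(toy, rebuilt)
```

Pasting the dput() output into a question lets readers recreate the exact object, including column types, without retyping a printed table.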

Ronak Shah · Accepted Answer · 2021-09-11 13:40:00Z

Using the data from @GuedesBF's answer, here is another dplyr way to keep the groups that have both 'Import' and 'Export'.

    library(dplyr)

    df %>%
        group_by(Country) %>%
        filter(all(c('Import', 'Export') %in% `Import/Export`)) %>%
        ungroup()

    #  Country  Year Quantity Description `Import/Export`
    #  <chr>   <dbl>    <dbl> <chr>       <chr>
    #1 B        2001       50 Fresh       Import
    #2 B        2004       20 Frozen      Export
    #3 C        2003       30 Frozen      Import
    #4 C        2005       40 Fresh       Export
    #5 C        2006       60 Frozen      Import
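The all(... %in% ...) test can be tried on plain vectors outside of dplyr. This is a minimal sketch with two made-up vectors (`both_flows` and `one_flow` are hypothetical names) standing in for the `Import/Export` values of two groups.

```r
# Values observed for two hypothetical groups.
both_flows <- c("Import", "Export", "Import")
one_flow <- c("Import", "Import")

# %in% returns one logical per element on the left-hand side,
# saying whether that element occurs anywhere on the right.
c("Import", "Export") %in% both_flows
c("Import", "Export") %in% one_flow

# all() collapses that to a single TRUE/FALSE per group, which a
# grouped filter() then applies to every row of the group.
keep_both <- all(c("Import", "Export") %in% both_flows)
keep_one <- all(c("Import", "Export") %in% one_flow)
```

One practical difference from chaining any() calls is that the list of required values lives in one vector, so adding a third required value only means extending that vector.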

akrun · Accepted Answer · 2021-09-11 19:09:26Z

Using data.table

    library(data.table)
    setDT(df)[df[, .I[all(c('Import', 'Export') %in% `Import/Export`)], Country]$V1]

       Country Year Quantity Description Import/Export
    1:       B 2001       50       Fresh        Import
    2:       B 2004       20      Frozen        Export
    3:       C 2003       30      Frozen        Import
    4:       C 2005       40       Fresh        Export
    5:       C 2006       60      Frozen        Import
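The one-liner above packs two steps together. As a rough sketch of what it does, the version below splits it into an inner step that collects row numbers and an outer step that subsets by them. It assumes a cut-down copy of the data, and the names `dt`, `idx` and `result` are made up for this illustration.

```r
library(data.table)

# Cut-down copy of the example data (only the columns needed here).
dt <- data.table(
  Country = c("A", "B", "B", "C", "C", "C", "D"),
  Year = c(2001, 2001, 2004, 2003, 2005, 2006, 2007),
  `Import/Export` = c("Export", "Import", "Export", "Import",
                      "Export", "Import", "Import")
)

# Inner step: within each Country, .I holds the row numbers of that
# group in dt. Indexing .I with a single TRUE keeps all of them;
# indexing with FALSE keeps none, so the group contributes no rows.
idx <- dt[, .I[all(c("Import", "Export") %in% `Import/Export`)],
          by = Country]$V1

# Outer step: subset the table by the collected row numbers.
result <- dt[idx]
result
```

Collecting row numbers with .I and subsetting once is a common data.table idiom for group-wise filtering.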

danlooo · Accepted Answer · 2021-09-11 13:03:58Z

    library(tidyverse)

    data <- tribble(
      ~Country, ~Year, ~Quantity, ~Description, ~`Import/Export`,
      "A", 2001, 10, "Frozen", "Export",
      "B", 2001, 50, "Fresh", "Import",
      "B", 2004, 20, "Frozen", "Export",
      "C", 2003, 30, "Frozen", "Import",
      "C", 2005, 40, "Fresh", "Export",
      "C", 2006, 60, "Frozen", "Import",
      "D", 2007, 290, "Fresh", "Import"
    )
    data
    #> # A tibble: 7 x 5
    #>   Country  Year Quantity Description `Import/Export`
    #>   <chr>   <dbl>    <dbl> <chr>       <chr>
    #> 1 A        2001       10 Frozen      Export
    #> 2 B        2001       50 Fresh       Import
    #> 3 B        2004       20 Frozen      Export
    #> 4 C        2003       30 Frozen      Import
    #> 5 C        2005       40 Fresh       Export
    #> 6 C        2006       60 Frozen      Import
    #> 7 D        2007      290 Fresh       Import

    selected_countries <-
      data %>%
      mutate(is_there = TRUE) %>%
      distinct(Country, `Import/Export`, is_there) %>%
      pivot_wider(names_from = "Import/Export", values_from = is_there) %>%
      filter(!is.na(Export) & !is.na(Import)) %>%
      pull(Country) %>%
      unique()
    selected_countries
    #> [1] "B" "C"

    data %>% filter(Country %in% selected_countries)
    #> # A tibble: 5 x 5
    #>   Country  Year Quantity Description `Import/Export`
    #>   <chr>   <dbl>    <dbl> <chr>       <chr>
    #> 1 B        2001       50 Fresh       Import
    #> 2 B        2004       20 Frozen      Export
    #> 3 C        2003       30 Frozen      Import
    #> 4 C        2005       40 Fresh       Export
    #> 5 C        2006       60 Frozen      Import

Created on 2021-09-11 by the reprex package (v2.0.1)

The is_there column is there to indicate that the property is "there". pivot_wider() requires a names column and a values column. After the pivot, we have all the info we need in one row per country.
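To make the "one row per country" point concrete, the sketch below stops the pipeline right after pivot_wider() so the intermediate table can be inspected. It assumes a cut-down copy of the data; the names `data_small` and `wide` are made up for this illustration.

```r
library(dplyr)
library(tidyr)

# Cut-down copy of the example data (only the columns needed here).
data_small <- tibble(
  Country = c("A", "B", "B", "C", "C", "C", "D"),
  `Import/Export` = c("Export", "Import", "Export", "Import",
                      "Export", "Import", "Import")
)

# Same steps as in the answer, stopped right after the pivot.
wide <- data_small %>%
  mutate(is_there = TRUE) %>%
  distinct(Country, `Import/Export`, is_there) %>%
  pivot_wider(names_from = "Import/Export", values_from = is_there)

# One row per country; a missing combination shows up as NA.
wide
```

Countries that lack one of the two values end up with NA in the corresponding column, which is what the later filter(!is.na(Export) & !is.na(Import)) step relies on.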
