Select unique entries showing at least one value from another column

Question

I have the following dataset (32,000 entries) of annual means for chemical compounds in water, organized by monitoring site and sampling year:

data <- data.frame(
  Site_ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  Year = c(1976, 1977, 1978, 2004, 2005, 2006, 2003, 2004, 2005),
  AnnualMean = c(1.1, 1.2, 1.1, 2.1, 2.6, 3.1, 2.7, 2.6, 1.9)
)

Site_ID  Year  AnnualMean
      1  1976         1.1
      1  1977         1.2
      1  1978         1.1
      2  2004         2.1
      2  2005         2.6
      2  2006         3.1
      3  2003         2.7
      3  2004         2.6
      3  2005         1.9

I would like to keep all rows for every monitoring site that has at least one measurement in 2005. With the dataset above, the expected output would be:

Site_ID  Year  AnnualMean
      2  2004         2.1
      2  2005         2.6
      2  2006         3.1
      3  2003         2.7
      3  2004         2.6
      3  2005         1.9

I am completely new to R and have been struggling with data manipulation, so thank you in advance!

Gregor Thomas · Accepted Answer · 2020-04-16 14:31:51Z

4

With dplyr:

library(dplyr)

data %>%
  group_by(Site_ID) %>%
  filter(2005 %in% Year)
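To see why this works: after group_by(), the condition 2005 %in% Year is evaluated once per site and yields a single TRUE or FALSE, which filter() then applies to every row of that site. Below is a minimal, self-contained sketch on the sample data from the question; the ungroup() call and the name `result` are additions for illustration, not part of the answer.

```r
library(dplyr)

# Sample data from the question
data <- data.frame(
  Site_ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  Year = c(1976, 1977, 1978, 2004, 2005, 2006, 2003, 2004, 2005),
  AnnualMean = c(1.1, 1.2, 1.1, 2.1, 2.6, 3.1, 2.7, 2.6, 1.9)
)

# Keep every row of a site if any of that site's years is 2005
result <- data %>%
  group_by(Site_ID) %>%
  filter(2005 %in% Year) %>%
  ungroup()  # drop the grouping so later steps see a plain table

# Sites that survived the filter
unique(result$Site_ID)
```

On the sample data this should keep sites 2 and 3 and drop site 1, which has no 2005 measurement.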


3 Comments

tcash21 Over a year ago

And this is why dplyr is so elegant. This would be a base R way of doing it:

data[data$Site_ID %in% data[data$Year %in% 2005, ]$Site_ID, ]

ALEXIS CAZORLA Over a year ago

As a small revision to my question, how should I write the filter request to only select the data from monitoring sites that have at least 10 measurements between 1990 and 2005? I tried filter(n()>=10 %in% between(phenomenonTimeReferenceYear, 1990, 2012)) without success. Thank you!

Gregor Thomas Over a year ago

That's quite a bit different - I'd recommend a new question.
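For readers with the same follow-up, one possible approach (not given in this thread) is to count the in-range years per site with sum(between(...)) and compare that count to a threshold. The sketch below makes several assumptions: it uses the Year column from the original sample data rather than phenomenonTimeReferenceYear, the 1990 to 2005 range from the comment's prose (the attempted code used 2012), and a hypothetical threshold `min_n` lowered to 3 so the small sample shows an effect.

```r
library(dplyr)

# Sample data from the question
data <- data.frame(
  Site_ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  Year = c(1976, 1977, 1978, 2004, 2005, 2006, 2003, 2004, 2005),
  AnnualMean = c(1.1, 1.2, 1.1, 2.1, 2.6, 3.1, 2.7, 2.6, 1.9)
)

min_n <- 3  # hypothetical threshold; the comment asks for 10

# between() is inclusive on both ends; sum() counts the TRUE values per site
result <- data %>%
  group_by(Site_ID) %>%
  filter(sum(between(Year, 1990, 2005)) >= min_n) %>%
  ungroup()

unique(result$Site_ID)
```

With this threshold only site 3 remains, since site 2 has just two measurements (2004 and 2005) inside the range.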

ThomasIsCoding · Accepted Answer · 2020-04-16 14:35:01Z

Here is a base R solution, using subset + ave:

dfout <- subset(data, !!ave(Year, Site_ID, FUN = function(x) 2005 %in% x))

such that

> dfout
  Site_ID Year AnnualMean
4       2 2004        2.1
5       2 2005        2.6
6       2 2006        3.1
7       3 2003        2.7
8       3 2004        2.6
9       3 2005        1.9
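The intermediate step may be easier to follow when split out. ave() returns a vector as long as Year, with each site's result repeated for every row of that site; because Year is numeric, the TRUE/FALSE result is stored as 1/0, which is why it has to be converted back to logical before subsetting. A small sketch on the question's sample data, where the name `flag` is introduced only for illustration:

```r
# Sample data from the question
data <- data.frame(
  Site_ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  Year = c(1976, 1977, 1978, 2004, 2005, 2006, 2003, 2004, 2005),
  AnnualMean = c(1.1, 1.2, 1.1, 2.1, 2.6, 3.1, 2.7, 2.6, 1.9)
)

# One value per row: 1 if the row's site has a 2005 measurement, else 0
flag <- ave(data$Year, data$Site_ID, FUN = function(x) 2005 %in% x)

# Convert the 0/1 flag to logical and use it to pick rows
dfout <- data[as.logical(flag), ]
dfout
```

Here as.logical() does the same job as the double negation !! in the one-liner.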

akrun · Accepted Answer · 2020-04-16 19:13:15Z

0

An option with data.table:

library(data.table)
setDT(data)[, .SD[2005 %in% Year], Site_ID]
