0

I would like to perform a query that will select only the most recent item from a given group.


In this example, I'm tracking vans:

  • Each time they return to base, a check-in is recorded with information - mileage, etc...
  • Each time they make a delivery, a delivery is recorded - customer, etc...

This table lets us know the history for a given van. The data can be produced with a query or stored as we go - this isn't the problem.

 id | checkin_id | delivery_id | van_id
----+------------+-------------+--------
 24 |         15 |        NULL |      3
 25 |       NULL |          28 |      3
 26 |         16 |        NULL |      4
 27 |       NULL |          29 |      3
 28 |       NULL |          30 |      4
 29 |         17 |        NULL |      5

I can see the van's history by querying with ... WHERE van_id=3; - fine.

Conversely, I would like to be able to get a list of vans with their most recent "event". Resulting in this:

 id | checkin_id | delivery_id | van_id
----+------------+-------------+--------
 27 |       NULL |          29 |      3
 28 |       NULL |          30 |      4
 29 |         17 |        NULL |      5

I jumped to the following query:

SELECT * FROM `history` GROUP BY `van_id`; 

But this returns the following error:

#1055 - Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'database.history.checkin_id' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

After reading up, I understand what this is about and have to admit that my SQL is somewhat out of date - which of the items from the group do I want returned?

Adding checkin_id and delivery_id to the GROUP BY just shifts the problem - ultimately I end up with the same set of data, just sorted differently.


This answer piqued my interest, and the graphic really helps to clearly outline the problem, thanks @azerafati!

I want to use the FIRST() or LAST() aggregate function - but MySQL doesn't appear to have them.

How do I reproduce this behaviour without processing all of the data in my application?

4
  • Learn to use proper GROUP BY. Then you won't have this error. Commented Jul 20, 2018 at 18:17
  • The reason you get that particular error is because recent versions of MySQL default to disallowing it. As far as I know, MySQL is the only RDBMS that allows it at all; the problem is that unless the non-grouped, non-aggregated fields selected are "functionally dependent" on the grouped ones, the values used are not guaranteed to be a specific one encountered (effectively a random selection from those encountered in the process of grouping). Basically, if your configuration allowed it, your query could get a result row like (27, 15, 28, 3). Commented Jul 20, 2018 at 18:22
  • You can tell that your query doesn't make sense. When you group by van_id and don't aggregate checkin_id and delivery_id, how would MySQL know which values to return? For van_id 3, why would checkin_id be NULL rather than 15? Why would delivery_id be 29 rather than NULL? Commented Jul 20, 2018 at 18:22
  • You are trying to use MySQL's deprecated non-standard extension to GROUP BY. Please read this. Commented Jul 20, 2018 at 20:41

2 Answers 2

9

I guess your id values are unique, and later records have higher values than earlier records.

You need to use a subquery that gets the latest id for each van:

 SELECT MAX(id) AS id, van_id
 FROM history
 GROUP BY van_id

Then join that to your detail query.

 SELECT h.*
 FROM history h
 JOIN (
     SELECT MAX(id) AS id, van_id
     FROM history
     GROUP BY van_id
 ) m ON h.id = m.id AND h.van_id = m.van_id

But because your id values are unique you can simplify this even more.

 SELECT h.*
 FROM history h
 JOIN (
     SELECT MAX(id) AS id
     FROM history
     GROUP BY van_id
 ) m ON h.id = m.id
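
As a quick sanity check, the simplified query can be run against the sample rows from the question. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL. That is an assumption made only to keep the example self-contained, and the column types are guesses. The SQL is the same join on MAX(id), with an ORDER BY added so the output order is predictable.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history ("
    " id INTEGER PRIMARY KEY,"
    " checkin_id INTEGER,"
    " delivery_id INTEGER,"
    " van_id INTEGER NOT NULL)"
)

# Sample rows copied from the question; None is stored as NULL.
rows = [
    (24, 15, None, 3),
    (25, None, 28, 3),
    (26, 16, None, 4),
    (27, None, 29, 3),
    (28, None, 30, 4),
    (29, 17, None, 5),
]
conn.executemany("INSERT INTO history VALUES (?, ?, ?, ?)", rows)

# The subquery finds the highest id per van; the join fetches the full row.
LATEST_PER_VAN = """
SELECT h.*
FROM history h
JOIN (
    SELECT MAX(id) AS id
    FROM history
    GROUP BY van_id
) m ON h.id = m.id
ORDER BY h.van_id
"""

latest = conn.execute(LATEST_PER_VAN).fetchall()
for row in latest:
    print(row)
```

With the sample data this yields one row per van (ids 27, 28 and 29), matching the result the question asks for.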

4

I was going to mark this as a duplicate because the question is actually asked fairly frequently, but I found those questions and answers fairly hard to search for, so here is the generic template:

SELECT t.*
FROM theTable AS t
INNER JOIN (
    SELECT groupingValue, MIN(someValue) AS lowestValue
    FROM theTable
    GROUP BY groupingValue
) AS rIdent
    ON rIdent.groupingValue = t.groupingValue
    AND rIdent.lowestValue = t.someValue

In your particular case, someValue is id and groupingValue is van_id. The template uses MIN, which returns the first row per group. Your question says first, but the detail says most recent (which I would interpret as last), so just use MAX instead of MIN.

Edit: The query should be fairly efficient if there is an index on the grouping fields and the field used to identify the lowest/first/highest/most recent row.
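
A minimal sketch of that setup, again using Python's sqlite3 module as a stand-in for MySQL (an assumption, as are the column types and the index name idx_history_van_id_id). It creates a composite index on the grouping column followed by the ordering column, then runs the generic template with MAX substituted for MIN. Whether a given MySQL version actually uses such an index depends on its optimizer and the data, so checking with EXPLAIN on the real table is advisable.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history ("
    " id INTEGER PRIMARY KEY,"
    " checkin_id INTEGER,"
    " delivery_id INTEGER,"
    " van_id INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?, ?)",
    [
        (24, 15, None, 3),
        (25, None, 28, 3),
        (26, 16, None, 4),
        (27, None, 29, 3),
        (28, None, 30, 4),
        (29, 17, None, 5),
    ],
)

# Composite index: grouping column first, then the column used to pick the row.
# The index name is hypothetical.
conn.execute("CREATE INDEX idx_history_van_id_id ON history (van_id, id)")

# Generic template with theTable = history, groupingValue = van_id,
# someValue = id, and MAX in place of MIN to get the most recent row.
TEMPLATE_MAX = """
SELECT t.*
FROM history AS t
INNER JOIN (
    SELECT van_id, MAX(id) AS highestValue
    FROM history
    GROUP BY van_id
) AS rIdent
    ON rIdent.van_id = t.van_id
    AND rIdent.highestValue = t.id
ORDER BY t.van_id
"""

most_recent = conn.execute(TEMPLATE_MAX).fetchall()
for row in most_recent:
    print(row)
```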

2 Comments

Thanks - I'll play... I've addressed the first/most recent ambiguity.
The similar questions should all be tagged greatest-n-per-group.
