683

Table:

UserId, Value, Date. 

I want to get the UserId, Value for the max(Date) for each UserId. That is, the Value for each UserId that has the latest date.

How do I do this in SQL? (Preferably Oracle.)

I need to get ALL the UserIds. But for each UserId, only that row where that user has the latest date.

9
  • 26
    What if there are multiple rows having the maximum date value for a particular userid? Commented Sep 23, 2008 at 18:29
  • What are the key fields of the table? Commented Jun 20, 2013 at 9:53
  • some solutions below compared: sqlfiddle.com/#!4/6d4e81/1 Commented Aug 7, 2014 at 7:27
  • 1
    @DavidAldridge, That column is likely unique. Commented Feb 3, 2015 at 3:38
  • stackoverflow.com/questions/2854257/… Commented Oct 11, 2015 at 10:29

36 Answers

531

I see many people use subqueries or else window functions to do this, but I often do this kind of query without subqueries in the following way. It uses plain, standard SQL so it should work in any brand of RDBMS.

SELECT t1.* FROM mytable t1 LEFT OUTER JOIN mytable t2 ON (t1.UserId = t2.UserId AND t1."Date" < t2."Date") WHERE t2.UserId IS NULL; 

In other words: fetch the row from t1 where no other row exists with the same UserId and a greater Date.

(I put the identifier "Date" in delimiters because it's an SQL reserved word.)

If t1."Date" = t2."Date" for the latest rows, duplicates appear. Tables usually have an auto-increment (sequence) key, e.g. id. To avoid duplicates, the following can be used:

SELECT t1.* FROM mytable t1 LEFT OUTER JOIN mytable t2 ON t1.UserId = t2.UserId AND ((t1."Date" < t2."Date") OR (t1."Date" = t2."Date" AND t1.id < t2.id)) WHERE t2.UserId IS NULL; 

Re comment from @Farhan:

Here's a more detailed explanation:

An outer join attempts to join t1 with t2. By default, all results of t1 are returned, and if there is a match in t2, it is also returned. If there is no match in t2 for a given row of t1, then the query still returns the row of t1, and uses NULL as a placeholder for all of t2's columns. That's just how outer joins work in general.

The trick in this query is to design the join's matching condition such that t2 must match the same userid, and a greater date. The idea being if a row exists in t2 that has a greater date, then the row in t1 it's compared against can't be the greatest date for that userid. But if there is no match -- i.e. if no row exists in t2 with a greater date than the row in t1 -- we know that the row in t1 was the row with the greatest date for the given userid.

In those cases (when there's no match), the columns of t2 will be NULL -- even the columns specified in the join condition. So that's why we use WHERE t2.UserId IS NULL, because we're searching for the cases where no row was found with a greater date for the given userid.
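To try the join trick end to end, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database. The sample rows and the id column are assumptions made up for illustration; the query is the tie-breaking variant from above.

```python
import sqlite3

# Hypothetical sample data; the table layout is an assumption for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE mytable (
        id INTEGER PRIMARY KEY,
        UserId INTEGER,
        Value TEXT,
        "Date" TEXT
    )
    """
)
conn.executemany(
    'INSERT INTO mytable (UserId, Value, "Date") VALUES (?, ?, ?)',
    [
        (1, "a", "2008-01-01"),
        (1, "b", "2008-06-01"),
        (2, "c", "2008-03-01"),
        (2, "d", "2008-09-01"),
        (2, "e", "2008-09-01"),  # tie on the latest date for user 2
    ],
)

# t2 matches only rows that "beat" t1: a later date, or the same date with a higher id.
# Rows of t1 with no such match are the latest per user, so t2.UserId is NULL for them.
latest = conn.execute(
    """
    SELECT t1.UserId, t1.Value, t1."Date"
    FROM mytable t1
    LEFT OUTER JOIN mytable t2
      ON t1.UserId = t2.UserId
     AND (t1."Date" < t2."Date" OR (t1."Date" = t2."Date" AND t1.id < t2.id))
    WHERE t2.UserId IS NULL
    ORDER BY t1.UserId
    """
).fetchall()

print(latest)  # [(1, 'b', '2008-06-01'), (2, 'e', '2008-09-01')]
```

Without the id tie-breaker, both of user 2's rows dated 2008-09-01 would come back.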


36 Comments

Wow Bill. This is the most creative solution to this problem I've seen. It is pretty performant too on my fairly large data set. This sure beats many of the other solutions I've seen or my own attempts at solving this quandary.
When applied to a table having 8.8 million rows, this query took almost twice as long as that in the accepted answer.
@Derek: Optimizations depend on the brand and version of RDBMS, as well as presence of appropriate indexes, data types, etc.
On MySQL, this kind of query appears to actually cause it to loop over the result of a Cartesian join between the tables, resulting in O(n^2) time. Using the subquery method instead reduced the query time from 2.0s to 0.003s. YMMV.
@frank, because t2.UserId is not null until after the outer join has been evaluated. Please study about outer joins.
466

This will retrieve all rows for which the my_date column value is equal to the maximum value of my_date for that userid. This may retrieve multiple rows for the userid where the maximum date is on multiple rows.

select userid, my_date, ... from ( select userid, my_date, ..., max(my_date) over (partition by userid) max_my_date from users ) where my_date = max_my_date 

"Analytic functions rock"

Edit: With regard to the first comment ...

"using analytic queries and a self-join defeats the purpose of analytic queries"

There is no self-join in this code. There is instead a predicate placed on the result of the inline view that contains the analytic function -- a very different matter, and completely standard practice.

"The default window in Oracle is from the first row in the partition to the current one"

The windowing clause is only applicable in the presence of the order by clause. With no order by clause, no windowing clause is applied by default and none can be explicitly specified.

The code works.
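For anyone without Oracle to hand, the same inline-view pattern can be checked with Python's sqlite3 module. This is only a sketch: the sample rows are invented, and it assumes the bundled SQLite is 3.25 or later (the first version with window functions).

```python
import sqlite3

# Hypothetical sample data; user 2 has two rows sharing the maximum date.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (userid INTEGER, value TEXT, my_date TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "a", "2008-01-01"),
        (1, "b", "2008-06-01"),
        (2, "c", "2008-09-01"),
        (2, "d", "2008-09-01"),
    ],
)

# The inner query attaches each user's maximum date to every row of that user;
# the outer predicate keeps only rows whose own date equals that maximum.
rows = conn.execute(
    """
    SELECT userid, value, my_date
    FROM (
        SELECT userid, value, my_date,
               MAX(my_date) OVER (PARTITION BY userid) AS max_my_date
        FROM users
    )
    WHERE my_date = max_my_date
    ORDER BY userid, value
    """
).fetchall()

print(rows)  # [(1, 'b', '2008-06-01'), (2, 'c', '2008-09-01'), (2, 'd', '2008-09-01')]
```

Both of user 2's rows are returned, which matches the caveat that ties on the maximum date yield multiple rows.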

15 Comments

When applied to a table having 8.8 million rows, this query took half the time of the queries in some of the other highly voted answers.
Anyone care to post a link to the MySQL equivalent of this, if there is one?
Couldn't this return duplicates? Eg. if two rows have the same user_id and the same date (which happens to be the max).
@jastr I think that was acknowledged in the question
Instead of MAX(...) OVER (...) you can also use ROW_NUMBER() OVER (...) (for the top-n-per-group) or RANK() OVER (...) (for the greatest-n-per-group).
172
SELECT userid, MAX(value) KEEP (DENSE_RANK FIRST ORDER BY date DESC) FROM table GROUP BY userid 

4 Comments

In my tests using a table having a large number of rows, this solution took about twice as long as that in the accepted answer.
I confirm it's much faster than other solutions
trouble is it does not return the full record
@user2067753 No, it doesn't return the full record. You can use the same MAX()..KEEP.. expression on multiple columns, so you can select all the columns you need. But it is inconvenient if you want a large number of columns and would prefer to use SELECT *.
60

I don't know your exact columns names, but it would be something like this:

SELECT userid, value FROM users u1 WHERE date = ( SELECT MAX(date) FROM users u2 WHERE u1.userid = u2.userid ) 

13 Comments

Probably not very efficient, Steve.
You are probably underestimating the Oracle query optimizer.
Not at all. This will almost certainly be implemented as a full scan with a nested loop join to get the dates. You're talking about logical I/Os on the order of 4 times the number of rows in the table, which will be dreadful for non-trivial amounts of data.
FYI, "Not efficient, but works" is the same as "Works, but is not efficient". When did we give up on efficient as a design goal?
+1 because when your data tables are not millions of rows long anyway, this is the most easily understood solution. When you have multiple developers of all skill levels modifying the code, understandability is more important than a fraction of a second in performance that is unnoticeable.
48

Not being at work, I don't have Oracle to hand, but I seem to recall that Oracle allows multiple columns to be matched in an IN clause, which should at least avoid the options that use a correlated subquery, which is seldom a good idea.

Something like this, perhaps (can't remember if the column list should be parenthesised or not):

SELECT * FROM MyTable WHERE (User, Date) IN ( SELECT User, MAX(Date) FROM MyTable GROUP BY User) 

EDIT: Just tried it for real:

SQL> create table MyTable (usr char(1), dt date);
SQL> insert into mytable values ('A','01-JAN-2009');
SQL> insert into mytable values ('B','01-JAN-2009');
SQL> insert into mytable values ('A', '31-DEC-2008');
SQL> insert into mytable values ('B', '31-DEC-2008');
SQL> select usr, dt from mytable
  2  where (usr, dt) in
  3  ( select usr, max(dt) from mytable group by usr)
  4  /

U DT
- ---------
A 01-JAN-09
B 01-JAN-09

So it works, although some of the new-fangly stuff mentioned elsewhere may be more performant.

1 Comment

This works nicely on PostgreSQL too. And I like the simplicity and generality of it -- the subquery says "Here's my criteria", the outer query says "And here's the details I want to see". +1.
16

I know you asked for Oracle, but in SQL 2005 we can now use ROW_NUMBER() and RANK() within a partition.

-- Single Value
;WITH ByDate AS (
    SELECT UserId, Value,
           ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY Date DESC) RowNum
    FROM UserDates
)
SELECT UserId, Value
FROM ByDate
WHERE RowNum = 1

-- Multiple values where dates match
;WITH ByDate AS (
    SELECT UserId, Value,
           RANK() OVER (PARTITION BY UserId ORDER BY Date DESC) Rnk
    FROM UserDates
)
SELECT UserId, Value
FROM ByDate
WHERE Rnk = 1
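The difference between the two variants only shows up when a user has several rows on the latest date. Here is a rough sketch that runs both through Python's sqlite3 module; the sample rows are made up, and SQLite 3.25+ is assumed for window-function support.

```python
import sqlite3

# Hypothetical sample data; user 2 has a tie on the latest date.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UserDates (UserId INTEGER, Value TEXT, Date TEXT)")
conn.executemany(
    "INSERT INTO UserDates VALUES (?, ?, ?)",
    [
        (1, "a", "2008-01-01"),
        (1, "b", "2008-06-01"),
        (2, "c", "2008-09-01"),
        (2, "d", "2008-09-01"),
    ],
)


def latest_rows(ranking_function):
    """Return the rows ranked first per user by the given ranking function."""
    # ranking_function comes only from the fixed strings below, never from user input.
    query = f"""
        WITH ByDate AS (
            SELECT UserId, Value,
                   {ranking_function}() OVER (PARTITION BY UserId ORDER BY Date DESC) AS Rnk
            FROM UserDates
        )
        SELECT UserId, Value FROM ByDate WHERE Rnk = 1 ORDER BY UserId, Value
    """
    return conn.execute(query).fetchall()


single = latest_rows("ROW_NUMBER")  # exactly one row per user; which tied row wins is arbitrary
with_ties = latest_rows("RANK")     # every row sharing the latest date

print(len(single), with_ties)
```

If the single-row choice needs to be repeatable, adding a unique column to the ORDER BY inside the OVER clause is one way to pin it down.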

Comments

8

I don't have Oracle to test it, but the most efficient solution is to use analytic queries. It should look something like this:

SELECT DISTINCT UserId, MaxValue FROM ( SELECT UserId, FIRST_VALUE(Value) OVER ( PARTITION BY UserId ORDER BY Date DESC ) MaxValue FROM SomeTable ) 

I suspect that you can get rid of the outer query and put distinct on the inner, but I'm not sure. In the meantime I know this one works.

If you want to learn about analytic queries, I'd suggest reading http://www.orafaq.com/node/55 and http://www.akadia.com/services/ora_analytic_functions.html. Here is the short summary.

Under the hood, analytic queries sort the whole dataset, then process it sequentially. As they process it, they partition the dataset according to certain criteria, and then for each row they look at some window (which defaults to the first value in the partition to the current row; that default is also the most efficient) and can compute values using a number of analytic functions (the list of which is very similar to the aggregate functions).

In this case here is what the inner query does. The whole dataset is sorted by UserId then Date DESC. Then it processes it in one pass. For each row you return the UserId and the first Date seen for that UserId (since dates are sorted DESC, that's the max date). This gives you your answer with duplicated rows. Then the outer DISTINCT squashes duplicates.
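The pass described above can be reproduced on a small scale with Python's sqlite3 module. Treat this as a sketch only: the rows are invented, SQLite 3.25+ is assumed, and it uses FIRST_VALUE, which is the name SQLite (like Oracle) gives this analytic function.

```python
import sqlite3

# Hypothetical sample data with no ties, so the result is deterministic.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SomeTable (UserId INTEGER, Value TEXT, Date TEXT)")
conn.executemany(
    "INSERT INTO SomeTable VALUES (?, ?, ?)",
    [
        (1, "a", "2008-01-01"),
        (1, "b", "2008-06-01"),
        (2, "c", "2008-03-01"),
        (2, "d", "2008-09-01"),
    ],
)

# Inner query: every row gets the Value of the newest row in its UserId partition.
# Outer DISTINCT: collapses the repeated (UserId, MaxValue) pairs to one per user.
result = conn.execute(
    """
    SELECT DISTINCT UserId, MaxValue
    FROM (
        SELECT UserId,
               FIRST_VALUE(Value) OVER (PARTITION BY UserId ORDER BY Date DESC) AS MaxValue
        FROM SomeTable
    )
    ORDER BY UserId
    """
).fetchall()

print(result)  # [(1, 'b'), (2, 'd')]
```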

This is not a particularly spectacular example of analytic queries. For a much bigger win consider taking a table of financial receipts and calculating for each user and receipt, a running total of what they paid. Analytic queries solve that efficiently. Other solutions are less efficient. Which is why they are part of the 2003 SQL standard. (Unfortunately Postgres doesn't have them yet. Grrr...)

5 Comments

You also need to return the date value to answer the question completely. If that means another first_value clause then I'd suggest that the solution is more complex than it ought to be, and the analytic method based on max(date) reads better.
The question statement says nothing about returning the date. You can do that either by adding another FIRST(Date) or else just by querying the Date and changing the outer query to a GROUP BY. I'd use the first and expect the optimizer to calculate both in one pass.
"The question statement says nothing about returning the date" ... yes, you're right. Sorry. But adding more FIRST_VALUE clauses would become messy pretty quickly. It's a single window sort, but if you had 20 columns to return for that row then you've written a lot of code to wade through.
It also occurs to me that this solution is non-deterministic for data where a single userid has multiple rows that have the maximum date and different VALUEs. More a fault in the question than the answer though.
I agree it is painfully verbose. However isn't that generally the case with SQL? And you're right that the solution is non-deterministic. There are multiple ways to deal with ties, and sometimes each is what you want.
7

Wouldn't a QUALIFY clause be both simplest and best?

select userid, my_date, ... from users qualify rank() over (partition by userid order by my_date desc) = 1 

For context, here on Teradata a decent-sized test of this runs in 17s with this QUALIFY version and in 23s with the 'inline view'/Aldridge solution #1.

2 Comments

This is the best answer in my opinion. However, be careful with the rank() function in situations where there are ties. You could end up with more than one rank=1. Better to use row_number() if you really do want just one record returned.
Also, be aware that the QUALIFY clause is specific to Teradata. In Oracle (at least) you have to nest your query and filter using a WHERE clause on the wrapping select statement (which probably hits performance a touch, I'd imagine).
7

In Oracle 12c+, you can use Top n queries along with analytic function rank to achieve this very concisely without subqueries:

select * from your_table order by rank() over (partition by user_id order by my_date desc) fetch first 1 row with ties; 

The above returns all the rows with max my_date per user.

If you want only one row with max date, then replace the rank with row_number:

select * from your_table order by row_number() over (partition by user_id order by my_date desc) fetch first 1 row with ties; 

Comments

7

With PostgreSQL 8.4 or later, you can use this:

select user_id, user_value_1, user_value_2 from (select user_id, user_value_1, user_value_2, row_number() over (partition by user_id order by user_date desc) from users) as r where r.row_number=1 

1 Comment

For PostgreSQL we can now use DISTINCT ON which perfectly addresses the use case: stackoverflow.com/questions/586781/…
5

I'm quite late to the party but the following hack will outperform both correlated subqueries and any analytics function but has one restriction: values must convert to strings. So it works for dates, numbers and other strings. The code does not look good but the execution profile is great.

select userid, to_number(substr(max(to_char(date,'yyyymmdd') || to_char(value)), 9)) as value, max(date) as date from users group by userid 

The reason why this code works so well is that it only needs to scan the table once. It does not require any indexes and most importantly it does not need to sort the table, which most analytics functions do. Indexes will help though if you need to filter the result for a single userid.
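The same idea can be tried outside Oracle. The sketch below uses Python's sqlite3 module, so to_char and to_number are swapped for SQLite's implicit text conversion and CAST; dates are assumed to be stored as fixed-width 'YYYY-MM-DD' strings, and the sample rows are made up.

```python
import sqlite3

# Hypothetical sample data; my_date is a fixed-width ISO date string (10 characters).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (userid INTEGER, value INTEGER, my_date TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, 10, "2008-01-01"),
        (1, 20, "2008-06-01"),
        (2, 5, "2008-03-01"),
        (2, 7, "2008-09-01"),
    ],
)

# Prefix each value with its date, take the string maximum per user,
# then strip the 10-character date prefix to recover the value.
packed = conn.execute(
    """
    SELECT userid,
           CAST(substr(MAX(my_date || value), 11) AS INTEGER) AS value,
           MAX(my_date) AS my_date
    FROM users
    GROUP BY userid
    ORDER BY userid
    """
).fetchall()

print(packed)  # [(1, 20, '2008-06-01'), (2, 7, '2008-09-01')]
```

Note that when two rows share the latest date, the winner is decided by string comparison of the values, which may not match numeric order.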

3 Comments

It is a good execution plan compared to most, but applying all those tricks to more than a few fields would be tedious and may work against it. But very interesting - thanks. See sqlfiddle.com/#!4/2749b5/23
You are right it can become tedious, which is why this should be done only when the performance of the query requires it. Such is often the case with ETL scripts.
This is very nice. I did something similar using LISTAGG, but it looks ugly. Postgres has a better alternative using array_agg. See my answer :)
4

Just had to write a "live" example at work :)

This one supports multiple values for UserId on the same date.

Columns: UserId, Value, Date

SELECT DISTINCT UserId, MAX(Date) OVER (PARTITION BY UserId ORDER BY Date DESC), MAX(Values) OVER (PARTITION BY UserId ORDER BY Date DESC) FROM ( SELECT UserId, Date, SUM(Value) As Values FROM <<table_name>> GROUP BY UserId, Date ) 

You can use FIRST_VALUE instead of MAX and look it up in the explain plan. I didn't have the time to play with it.

Of course, if searching through huge tables, it's probably better if you use FULL hints in your query.

Comments

4

If you're using Postgres, you can use array_agg like

SELECT userid,MAX(adate),(array_agg(value ORDER BY adate DESC))[1] as value FROM YOURTABLE GROUP BY userid 

I'm not familiar with Oracle. This is what I came up with

SELECT userid, MAX(adate), SUBSTR( (LISTAGG(value, ',') WITHIN GROUP (ORDER BY adate DESC)), 0, INSTR((LISTAGG(value, ',') WITHIN GROUP (ORDER BY adate DESC)), ',')-1 ) as value FROM YOURTABLE GROUP BY userid 

Both queries return the same results as the accepted answer. See SQLFiddles:

  1. Accepted answer
  2. My solution with Postgres
  3. My solution with Oracle

1 Comment

Thanks. Nice to know about the array-agg function. Hypothetically, array-agg may not work well for cases where there are too many rows per userid (the group by column) ? And, also when we need multiple select columns in the result; Then , we would need to apply array_agg to every other column, i.e do a group by with adate with every other select column ? Great answer for OP's question though!
4

Use ROW_NUMBER() to assign a unique ranking on descending Date for each UserId, then filter to the first row for each UserId (i.e., ROW_NUMBER = 1).

SELECT UserId, Value, Date FROM (SELECT UserId, Value, Date, ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY Date DESC) rn FROM users) u WHERE rn = 1; 

Comments

2

I think something like this. (Forgive me for any syntax mistakes; I'm used to using HQL at this point!)

EDIT: Also misread the question! Corrected the query...

SELECT UserId, Value FROM Users AS user WHERE Date = ( SELECT MAX(Date) FROM Users AS maxtest WHERE maxtest.UserId = user.UserId ) 

2 Comments

Doesn't meet the "for each UserId" condition
Where would it fail? For every UserID in Users, it will be guaranteed that at least one row containing that UserID will be returned. Or am I missing a special case somewhere?
2

I think you should make this variant of the previous query:

SELECT UserId, Value FROM Users U1 WHERE Date = ( SELECT MAX(Date) FROM Users where UserId = U1.UserId) 

Comments

2
Select UserID, Value, Date From Table, ( Select UserID, Max(Date) as MDate From Table Group by UserID ) as subQuery Where Table.UserID = subQuery.UserID and Table.Date = subQuery.mDate 

Comments

2
select VALUE from TABLE1 where TIME = (select max(TIME) from TABLE1 where DATE= (select max(DATE) from TABLE1 where CRITERIA=CRITERIA)) 

Comments

1

(T-SQL) First get all the users and their maxdate. Join with the table to find the corresponding values for the users on the maxdates.

create table users (userid int, value int, date datetime)

insert into users values (1, 1, '20010101')
insert into users values (1, 2, '20020101')
insert into users values (2, 1, '20010101')
insert into users values (2, 3, '20030101')

select T1.userid, T1.value, T1.date
from users T1,
     (select max(date) as maxdate, userid from users group by userid) T2
where T1.userid = T2.userid
  and T1.date = T2.maxdate

results:

userid      value       date
----------- ----------- --------------------------
2           3           2003-01-01 00:00:00.000
1           2           2002-01-01 00:00:00.000

Comments

1

The answer here is Oracle only. Here's a bit more sophisticated answer in all SQL:

Who has the best overall homework result (maximum sum of homework points)?

SELECT FIRST, LAST, SUM(POINTS) AS TOTAL FROM STUDENTS S, RESULTS R WHERE S.SID = R.SID AND R.CAT = 'H' GROUP BY S.SID, FIRST, LAST HAVING SUM(POINTS) >= ALL (SELECT SUM (POINTS) FROM RESULTS WHERE CAT = 'H' GROUP BY SID) 

And a more difficult example, which needs some explanation that I don't have time for at the moment:

Give the book (ISBN and title) that is most popular in 2008, i.e., which is borrowed most often in 2008.

SELECT X.ISBN, X.title, X.loans FROM (SELECT Book.ISBN, Book.title, count(Loan.dateTimeOut) AS loans FROM CatalogEntry Book LEFT JOIN BookOnShelf Copy ON Book.bookId = Copy.bookId LEFT JOIN (SELECT * FROM Loan WHERE YEAR(Loan.dateTimeOut) = 2008) Loan ON Copy.copyId = Loan.copyId GROUP BY Book.title) X HAVING loans >= ALL (SELECT count(Loan.dateTimeOut) AS loans FROM CatalogEntry Book LEFT JOIN BookOnShelf Copy ON Book.bookId = Copy.bookId LEFT JOIN (SELECT * FROM Loan WHERE YEAR(Loan.dateTimeOut) = 2008) Loan ON Copy.copyId = Loan.copyId GROUP BY Book.title); 

Hope this helps (anyone).. :)

Regards, Guus

1 Comment

The accepted answer is not "Oracle only" - it's standard SQL (supported by many DBMS)
1

Assuming Date is unique for a given UserID, here's some TSQL:

SELECT UserTest.UserID, UserTest.Value FROM UserTest INNER JOIN ( SELECT UserID, MAX(Date) MaxDate FROM UserTest GROUP BY UserID ) Dates ON UserTest.UserID = Dates.UserID AND UserTest.Date = Dates.MaxDate 

Comments

1

Solution for MySQL, which doesn't have the concepts of PARTITION, KEEP, or DENSE_RANK.

select userid, my_date, ... from ( select @sno:= case when @pid = userid then @sno+1 else 0 end as serialnumber, @pid:=userid as userid, my_date, ... from users order by userid, my_date desc ) a where a.serialnumber=0 

Reference: http://benincampus.blogspot.com/2013/08/select-rows-which-have-maxmin-value-in.html

2 Comments

This does not work "on other DBs too". This only works on MySQL and possibly on SQL Server because it has a similar concept of variables. It will definitely not work on Oracle, Postgres, DB2, Derby, H2, HSQLDB, Vertica, Greenplum. Additionally the accepted answer is standard ANSI SQL (which by now only MySQL doesn't support)
horse, I guess you are right. I don't have knowledge about other DBs, or ANSI. My solution is able to solve the issue in MySQL, which doesn't have proper support for ANSI SQL to solve it in standard way.
0
select userid, value, date from thetable t1 , ( select t2.userid, max(t2.date) date2 from thetable t2 group by t2.userid ) t3 where t3.userid = t1.userid and t3.date2 = t1.date 

IMHO this works. HTH

Comments

0

I think this should work?

Select T1.UserId, (Select Top 1 T2.Value From Table T2 Where T2.UserId = T1.UserId Order By Date Desc) As 'Value' From Table T1 Group By T1.UserId Order By T1.UserId 

Comments

0

On my first try I misread the question. Following the top answer, here is a complete example with correct results:

CREATE TABLE table_name (id int, the_value varchar(2), the_date datetime);

INSERT INTO table_name (id,the_value,the_date) VALUES(1 ,'a','1/1/2000');
INSERT INTO table_name (id,the_value,the_date) VALUES(1 ,'b','2/2/2002');
INSERT INTO table_name (id,the_value,the_date) VALUES(2 ,'c','1/1/2000');
INSERT INTO table_name (id,the_value,the_date) VALUES(2 ,'d','3/3/2003');
INSERT INTO table_name (id,the_value,the_date) VALUES(2 ,'e','3/3/2003');

--

 select id, the_value from table_name u1 where the_date = (select max(the_date) from table_name u2 where u1.id = u2.id) 

--

id          the_value
----------- ---------
2           d
2           e
1           b

(3 row(s) affected)

Comments

0

This will also take care of duplicates (return one row for each user_id):

SELECT * FROM ( SELECT u.*, FIRST_VALUE(u.rowid) OVER(PARTITION BY u.user_id ORDER BY u.date DESC) AS last_rowid FROM users u ) u2 WHERE u2.rowid = u2.last_rowid 

Comments

0

Just tested this and it seems to work on a logging table

select ColumnNames, max(DateColumn) from log group by ColumnNames order by 1 desc 

Comments

0

This should be as simple as:

SELECT UserId, Value FROM Users u WHERE Date = (SELECT MAX(Date) FROM Users WHERE UserID = u.UserID) 

Comments

0

Oracle Database 23ai added a new way to solve this - adding the partition by clause to fetch first:

create table t ( userid int, value int, dt date );

insert into t values
  ( 1, 1, date'2024-01-01' ), ( 1, 2, date'2024-02-02' ), ( 1, 3, date'2024-03-03' ),
  ( 2, 1, date'2024-01-01' ), ( 2, 2, date'2024-02-02' ), ( 2, 3, date'2024-03-03' );

select * from t
order by userid, dt desc
fetch first 9999999 partition by userid, 1 row only;

    USERID      VALUE DT
---------- ---------- -----------
         1          3 03-MAR-2024
         2          3 03-MAR-2024

Note you have to specify a number before partition by; this states how many userids you want to fetch. Assuming you want them all, set this to a value (much) larger than the number of userids in the table.

Comments

-1

If (UserID, Date) is unique, i.e. no date appears twice for the same user then:

select TheTable.UserID, TheTable.Value from TheTable inner join (select UserID, max([Date]) MaxDate from TheTable group by UserID) UserMaxDate on TheTable.UserID = UserMaxDate.UserID and TheTable.[Date] = UserMaxDate.MaxDate; 

1 Comment

I believe that you need to join by the UserID as well
