
I'm in my absolute infancy in learning PostgreSQL (I'm working my way through the postgresqltutorial SELECT tutorial right now) and saw DISTINCT ON mentioned. Based on my attempted research, this is actually a very complicated & powerful keyword that's a bit beyond where I'm at now, but I wanted a basic understanding.

From toying around in psql, it seems like it's a way to filter data based on values in one column regardless of whether or not the data in other columns also match. E.g. if you had a table of people and their grades, you could filter so that you'd only have one person per grade without also needing them to have the same name, etc.

Am I on the right track?

  • This SO question likely comes very close to explaining your doubts. You should review a good tutorial on Postgres and DISTINCT ON to better understand the basics of it. Commented Jul 9, 2021 at 5:51
  • @TimBiegeleisen Thanks! I read over this exact question a little bit ago and didn't understand most of it, so I guess I was right in thinking it's a bit more advanced a concept than where I'm at right now. I'll revisit it once I've progressed a bit to see if it makes more sense. Commented Jul 9, 2021 at 5:54
  • Explanation in the documentation is pretty straightforward. Commented Jul 9, 2021 at 8:12
  • Most questions around DISTINCT ON are answered here: stackoverflow.com/a/7630564/939860 Commented Jul 9, 2021 at 13:15

1 Answer


DISTINCT ON is a strange beast and is hard to grasp without examples. My answer borrows heavily from this tutorial.

Create table:

CREATE TABLE table1 (
    id serial,
    fruit_1 TEXT,
    fruit_2 TEXT
);

Insert data:

INSERT INTO table1 (fruit_1, fruit_2) VALUES
    ('apple', 'apple'),
    ('apple', 'apple'),
    ('apple', NULL),
    (NULL, 'apple'),
    ('apple', 'mango'),
    ('apple', 'blueberry'),
    ('mango', 'apple'),
    ('mango', 'blueberry'),
    ('mango', 'mango'),
    ('blueberry', 'apple'),
    ('blueberry', 'mango'),
    ('blueberry', 'blueberry');

SELECT * FROM table1; gives:

id  fruit_1    fruit_2
1   apple      apple
2   apple      apple
3   apple      <NULL>
4   <NULL>     apple
5   apple      mango
6   apple      blueberry
7   mango      apple
8   mango      blueberry
9   mango      mango
10  blueberry  apple
11  blueberry  mango
12  blueberry  blueberry

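DISTINCT ON keeps only the first row of each group of rows that share the same value in the chosen column(s). Without sorting, which row counts as "first" is unpredictable, so you'll want to use ORDER BY to get a predictable first row. The DISTINCT ON column(s) must be the leftmost column(s) in the ORDER BY list. Rows that tie on every ORDER BY column (such as ids 1 and 2 above) can still come back in either order, so add a tie-breaker such as id if that matters. Here the rows are sorted by fruit_1 and then fruit_2: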

SELECT DISTINCT ON (fruit_1) id, fruit_1, fruit_2
FROM table1
ORDER BY fruit_1, fruit_2;

id  fruit_1    fruit_2
2   apple      apple
10  blueberry  apple
7   mango      apple
4   <NULL>     apple
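For intuition, here is a rough plain-Python emulation of what that query does. It is only an illustration of the semantics, not how PostgreSQL implements it: sort by the DISTINCT ON column, then by the remaining ORDER BY columns, and keep the first row of each group. The helper names distinct_on and nulls_last are hypothetical, None stands in for NULL, and the sketch appends id to the sort as a tie-breaker (as if the query said ORDER BY fruit_1, fruit_2, id), so for the tied rows 1 and 2 it returns id 1, whereas PostgreSQL happened to return id 2.

```python
from itertools import groupby

# Rows of table1 as (id, fruit_1, fruit_2); None plays the role of SQL NULL.
TABLE1 = [
    (1, "apple", "apple"),
    (2, "apple", "apple"),
    (3, "apple", None),
    (4, None, "apple"),
    (5, "apple", "mango"),
    (6, "apple", "blueberry"),
    (7, "mango", "apple"),
    (8, "mango", "blueberry"),
    (9, "mango", "mango"),
    (10, "blueberry", "apple"),
    (11, "blueberry", "mango"),
    (12, "blueberry", "blueberry"),
]


def nulls_last(value):
    """Sort key mimicking PostgreSQL's default ascending order: NULLs sort last."""
    return (value is None, "" if value is None else value)


def distinct_on(rows, on, order_by):
    """Hypothetical helper: keep the first row of each group sharing the same `on` value.

    Rows are sorted by `on` first and then by `order_by`, mirroring the rule that
    the DISTINCT ON expressions must lead the ORDER BY list.
    """
    def sort_key(row):
        return tuple(nulls_last(v) for v in (on(row), *order_by(row)))

    ordered = sorted(rows, key=sort_key)
    # groupby only merges *adjacent* equal keys, which is why we sort first.
    return [next(group) for _, group in groupby(ordered, key=on)]


# Roughly: SELECT DISTINCT ON (fruit_1) id, fruit_1, fruit_2
#          FROM table1 ORDER BY fruit_1, fruit_2, id;
result = distinct_on(
    TABLE1,
    on=lambda r: r[1],                # fruit_1
    order_by=lambda r: (r[2], r[0]),  # fruit_2, then id as a tie-breaker
)
for row in result:
    print(row)
```

Apart from the tie between ids 1 and 2, this yields the same four rows as the query above: one per distinct fruit_1 value, with the NULL group last.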

You can also refer to a column by its position in the select list; here 2 means fruit_1, the second output column:

SELECT DISTINCT ON (2) id, fruit_1, fruit_2
FROM table1
ORDER BY fruit_1, fruit_2;