Scope of a column alias in a SELECT with GROUP BY

Question

I have been asked to solve this question:

Say that a big city is defined as a place of type city with a population of at least 100,000. Write an SQL query that returns the scheme (state_name,no_big_city,big_city_population) ordered by state_name, listing those states which have either (a) at least five big cities or (b) at least one million people living in big cities. The column state_name is the name of the state, no_big_city is the number of big cities in the state, and big_city_population is the number of people living in big cities in the state.

Now, as far as I can see, the following query returns correct results:

SELECT state.name AS state_name , COUNT(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN 1 ELSE NULL END) AS no_big_city , SUM (CASE WHEN place.type = 'city' AND place.population >= 100000 THEN place.population ELSE NULL END) AS big_city_population FROM state JOIN place ON state.code = place.state_code GROUP BY state_name HAVING COUNT(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN 1 ELSE NULL END) >= 5 OR SUM (CASE WHEN place.type = 'city' AND place.population >= 100000 THEN place.population ELSE NULL END) >= 1000000 ORDER BY state_name;

The lengthy aggregate function expressions appear twice.
Is there any way to avoid code duplication while preserving functionality?

I have already tried using the alias, but I just get a "column does not exist" error.

Unfortunately I tried this and I just get an error saying that the first column referenced, no_big_city, does not exist... — Paul Benn
– Paul Benn, Commented Nov 10, 2016 at 21:06
@GordonLinoff: The HAVING clause only accepts input column names. I added the relevant quote from the manual below. — Erwin Brandstetter
– Erwin Brandstetter, Commented Nov 10, 2016 at 23:14

Erwin Brandstetter · Accepted Answer · 2025-02-24 22:29:17Z

The manual clarifies:

An output column's name can be used to refer to the column's value in ORDER BY and GROUP BY clauses, but not in the WHERE or HAVING clauses; there you must write out the expression instead.

Bold emphasis mine.

You can avoid spelling out lengthy expressions repeatedly with a subquery or CTE:

SELECT state_name, no_big_city, big_city_population FROM ( SELECT s.name AS state_name , count(*) FILTER (WHERE p.type = 'city' AND p.population >= 100000) AS no_big_city , sum(population) FILTER (WHERE p.type = 'city' AND p.population >= 100000) AS big_city_population FROM state s JOIN place p ON s.code = p.state_code GROUP BY s.name -- can be input column name as well, best schema-qualified to avoid ambiguity ) sub WHERE no_big_city >= 5 OR big_city_population >= 1000000 ORDER BY state_name;

While being at it, I simplified with the aggregate FILTER clause. See:

Aggregate columns with additional (distinct) filters

However, I suggest this simpler and faster query to begin with:

SELECT s.state_name, p.no_big_city, p.big_city_population FROM state s JOIN ( SELECT state_code AS code -- alias just to simplify join , count(*) AS no_big_city , sum(population) AS big_city_population FROM place WHERE type = 'city' AND population >= 100000 GROUP BY 1 -- can be ordinal number referencing position in SELECT list HAVING count(*) >= 5 OR sum(population) >= 1000000 -- simple expressions now ) p USING (code) ORDER BY 1; -- can also be ordinal number

In GROUP BY and ORDER BY you can also reference output columns in the SELECT list by ordinal position. Can be useful to avoid confusion about the scope of input and output column names.

Aron · Accepted Answer · 2016-11-10 21:27:19Z

Not sure if this is a comment or an answer, since it is more preference based as opposed to technical, but I'll post it anyway

What I usually do when I need to reference calculated columns (usually a LOT at the same time) is I put my calculated columns within a derived table and then reference the calculated columns using its alias outside of the derived table. This syntax should be ANSI-SQL correct, but I am not familiar with PostGRES

select * from ( SELECT STATE.NAME AS state_name ,COUNT(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN 1 ELSE NULL END) AS no_big_city ,SUM(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN place.population ELSE NULL END) AS big_city_population FROM STATE INNER JOIN place ON STATE.code = place.state_code GROUP BY state_name ) sub where no_big_city >= 5 and big_city_population >=100000 --HAVING COUNT(CASE WHEN place.type = 'city' -- AND place.population >= 100000 THEN 1 ELSE NULL END) >= 5 -- OR SUM(CASE WHEN place.type = 'city' -- AND place.population >= 100000 THEN place.population ELSE NULL END) >= 1000000 ORDER BY state_name;

The nice thing about this approach is, although you are adding complication via a subquery/derived table, the formula is kept in one place, so any changes only have to happen once. I do not know if this will perform worse than simply repeating the calcuation in the group-by, but I can't imagine it would be that much worse.

Vadim · Accepted Answer · 2016-11-10 21:49:04Z

SELECT clause is what you want to select from the filtred by WHERE clause table(s).
GROUP BY is a condition how to group filtered records to use in aggregation functions in the SELECT. So alias cannot be there. But you can wrap your filtered records and select from them. Something like that:

SELECT state_name, no_big_city, big_city_population FROM ( SELECT state.name AS state_name, COUNT(1) no_big_city, MAX(place.population) max_city_population, SUM(place.population) AS big_city_population FROM state JOIN place ON state.code = place.state_code WHERE place.type = 'city' AND place.population >= 100000 GROUP BY state.name ) WHERE no_big_city >= 5 OR max_city_population > 1000000 ORDER BY state_name

Also, moving conditions

 place.type = 'city' AND place.population >= 100000

out of CASE to WHERE will perform better. "No city" or "small city records will not be processed. especially if there is an index on place.type column.

Collectives™ on Stack Overflow

Scope of a column alias in a SELECT with GROUP BY

3 Answers 3

Comments

Comments

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

Comments

Comments

Linked

Related