Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
The non-aggregating expression <expression> is based on columns which are not participating in the GROUP BY clause.
Add the columns or the expression to the GROUP BY, aggregate the expression, or use <expressionAnyValue> if you do not care which of the values within a group is returned.
Parameters
- expression: Non aggregating, non grouping expression in the
SELECTlist. - expressionAnyValue:
expressionwrapped in an any_value() aggregate function.
Explanation
Within the context of a query with a GROUP BY clause, the local column-references in the SELECT list must be:
- Consumed as an argument to an aggregate function, or
- Part of an expression which matches an expression on the
GROUP BYclause.
A local column reference is a column that has been resolved to a table-reference in the query's FROM clause.
In other words: Column-references must either be part of the grouping keys, or they must be part of the aggregation.
Azure Databricks matches expressions on best effort: For example it will recognize: SELECT c1 + 5 FROM T GROUP BY 5 + c1 as mathing expressions. But SELECT c1 FROM T GROUP BY c1 + 5 is not a match.
Mitigation
The mitigation of the error depends on the cause:
Did you miss a grouping column?
Add
expression, or the relevant subexpression ofexpressionto theGROUP BYclause.Is the column reference part of a
GROUP BYexpression which differs fromepression?Match the expression in the
SELECTlist or simplify theGROUP BYexpression.Are you missing the aggregation?
Wrap the column reference with an aggregate function. If you only want a representative value from the group, you can use any_value(epression).
Examples
-- Sample data > CREATE OR REPLACE TEMPORARY VIEW tasks(name, firstname, task, cost) AS VALUES ('Smith' , 'Sam' , 'UNPIVOT', 10), ('Smith' , 'Sam' , 'LATERAL', 5), ('Shuster', 'Sally' , 'DELETE' , 7), ('Shuster', 'Sally' , 'GRANT' , 8); -- `name` and `firstname` are part of the group by coumns, but incomplete > SELECT name, firstname, sum(cost) FROM tasks GROUP BY firstname || ' ' || name; [MISSING_AGGREGATION] The expression "name" is neither present in the group by, nor is it an aggregate function. -- Match the GROUP BY expression > SELECT firstname || ' ' || name, sum(cost) FROM tasks GROUP BY firstname || ' ' || name; Sam Smith 15 Sally Shuster 15 -- Break up the GROUP BY expression > SELECT firstname, name, sum(cost) FROM tasks GROUP BY firstname, name; Sam Smith 15 Sally Shuster 15 -- Missing grouping column > SELECT name, firstname, sum(cost) FROM tasks GROUP BY name; [MISSING_AGGREGATION] The expression "firstname" is neither present in the group by, nor is it an aggregate function. -- Add the grouping column > SELECT firstname, name, sum(cost) FROM tasks GROUP BY firstname, name; Sam Smith 15 Sally Shuster 15 -- Missing aggregate > SELECT firstname, name, sum(cost), task FROM tasks GROUP BY firstname, name; [MISSING_AGGREGATION] The expression "task" is neither present in the group by, nor is it an aggregate function. -- Add an aggregate > SELECT firstname, name, sum(cost), array_agg(task) FROM tasks GROUP BY firstname, name; Sam Smith 15 ["UNPIVOT","LATERAL"] Sally Shuster 15 ["DELETE","GRANT"] -- Return any task > SELECT firstname, name, sum(cost), any_value(task) FROM tasks GROUP BY firstname, name; Sam Smith 15 LATERAL Sally Shuster 15 DELETE