285

How do I create crosstab queries in PostgreSQL? For example I have the following table:

Section Status Count A Active 1 A Inactive 2 B Active 4 B Inactive 5 

I would like the query to return the following crosstab:

Section Active Inactive A 1 2 B 4 5 
1
  • 1
    I had a slightly different structure and found this example a little hard to understand so I documented my way of thinking of this stackoverflow.com/q/49051959/808723. Maybe its helpful for anyone. Commented Mar 1, 2018 at 15:14

7 Answers 7

462

Install the additional module tablefunc once per database, which provides the function crosstab(). Since Postgres 9.1 you can use CREATE EXTENSION for that:

CREATE EXTENSION IF NOT EXISTS tablefunc; 

Improved test case

CREATE TABLE tbl ( section text , status text , ct integer -- "count" is a reserved word in standard SQL ); INSERT INTO tbl VALUES ('A', 'Active', 1), ('A', 'Inactive', 2) , ('B', 'Active', 4), ('B', 'Inactive', 5) , ('C', 'Inactive', 7); -- ('C', 'Active') is missing 

Simple form - not fit for missing attributes

crosstab(text) with 1 input parameter:

SELECT * FROM crosstab( 'SELECT section, status, ct FROM tbl ORDER BY 1,2' -- needs to be "ORDER BY 1,2" here ) AS ct ("Section" text, "Active" int, "Inactive" int); 

Returns:

 Section | Active | Inactive ---------+--------+---------- A | 1 | 2 B | 4 | 5 C | 7 | -- !! 
  • No need for casting and renaming.
  • Note the incorrect result for C: the value 7 is filled in for the first column. Sometimes, this behavior is desirable, but not for this use case.
  • The simple form is also limited to exactly three columns in the provided input query: row_name, category, value. There is no room for extra columns like in the 2-parameter alternative below.

Safe form

crosstab(text, text) with 2 input parameters:

SELECT * FROM crosstab( 'SELECT section, status, ct FROM tbl ORDER BY 1,2' -- could also just be "ORDER BY 1" here , $$VALUES ('Active'::text), ('Inactive')$$ ) AS ct ("Section" text, "Active" int, "Inactive" int);

Returns:

 Section | Active | Inactive ---------+--------+---------- A | 1 | 2 B | 4 | 5 C | | 7 -- !! 
  • Note the correct result for C.

  • The second parameter can be any query that returns one row per attribute matching the order of the column definition at the end. Often you will want to query distinct attributes from the underlying table like this:

     'SELECT DISTINCT attribute FROM tbl ORDER BY 1' 

That's in the manual.

Since you have to spell out all columns in a column definition list anyway (except for pre-defined crosstabN() variants), it is typically more efficient to provide a short list in a VALUES expression like demonstrated:

 $$VALUES ('Active'::text), ('Inactive')$$ 

Or (not in the manual):

 $$SELECT unnest('{Active,Inactive}'::text[])$$ -- short syntax for long lists 
  • I used dollar quoting to make quoting easier.

  • You can even output columns with different data types with crosstab(text, text) - as long as the text representation of the value column is valid input for the target type. This way you might have attributes of different kind and output text, date, numeric etc. for respective attributes. There is a code example at the end of the chapter crosstab(text, text) in the manual.

db<>fiddle here

Effect of excess input rows

Excess input rows are handled differently - duplicate rows for the same ("row_name", "category") combination - (section, status) in the above example.

The 1-parameter form fills in available value columns from left to right. Excess values are discarded.
Earlier input rows win.

The 2-parameter form assigns each input value to its dedicated column, overwriting any previous assignment.
Later input rows win.

Typically, you don't have duplicates to begin with. But if you do, carefully adjust the sort order to your requirements - and document what's happening.
Or get fast arbitrary results if you don't care. Just be aware of the effect.

Advanced examples


\crosstabview in psql

Postgres 9.6 added this meta-command to its default interactive terminal psql. You can run the query you would use as first crosstab() parameter and feed it to \crosstabview (immediately or in the next step). Like:

db=> SELECT section, status, ct FROM tbl \crosstabview 

Similar result as above, but it's a representation feature on the client side exclusively. Input rows are treated slightly differently, hence ORDER BY is not required. Details for \crosstabview in the manual. There are more code examples at the bottom of that page.

Related answer on dba.SE by Daniel Vérité (the author of the psql feature):

Sign up to request clarification or add additional context in comments.

8 Comments

+1, good writeup, thanks for noticing In practice the SQL query should always specify ORDER BY 1,2 to ensure that the input rows are properly ordered
I've some problems using $$VALUES .. $$. I've used instead 'VALUES (''<attr>'':: <type>), .. '
@ErwinBrandstetter This is another example of you explaining complicated things in an extremely competent, thoughtful and easy to grasp way. If there were a nobel price for helping on stack overflow you should get it
@AndreSilva: All rows of the same set must have the same column types. It's enough to cast explicitly in one row, and the rest will fall in line. Related: stackoverflow.com/a/30204394/939860; stackoverflow.com/a/12427434/939860
Thanks so much for pointing out that $$ quoting can be used to keep the inner sql 'independently executable' with tools like dbeaver (by selecting the inner sql text only); not to mention keeping any colour coding the editor provides for sql.
|
43
SELECT section, SUM(CASE status WHEN 'Active' THEN count ELSE 0 END) AS active, --here you pivot each status value as a separate column explicitly SUM(CASE status WHEN 'Inactive' THEN count ELSE 0 END) AS inactive --here you pivot each status value as a separate column explicitly FROM t GROUP BY section 

4 Comments

Can someone explain what the crosstab function in the tablefunc module adds to this answer, which both does the job at hand, and to my mind is easier to understand?
@JohnBarça: A simple case like this can easily be solved with CASE statements. However, this gets unwieldy very quickly with more attributes and / or other data types than just integers. As an aside: this form uses the aggregate function sum(), it would be better to use min() or max() and no ELSE which works for text also. But this has subtly different effects than corosstab(), which only uses the "first" value per attribute. Doesn't matter as long as there can only be one. Finally, performance is relevant, too. crosstab() is written in C and optimized for the task.
Consider adding explaination vs just a block of code
In my postgresql for some reason tablefunc and crosstab are not defined, and I am not permitted to define them. This intuitive solution worked for me, so thums up!
35

You can use the crosstab() function of the additional module tablefunc - which you have to install once per database. Since PostgreSQL 9.1 you can use CREATE EXTENSION for that:

CREATE EXTENSION tablefunc; 

In your case, I believe it would look something like this:

CREATE TABLE t (Section CHAR(1), Status VARCHAR(10), Count integer); INSERT INTO t VALUES ('A', 'Active', 1); INSERT INTO t VALUES ('A', 'Inactive', 2); INSERT INTO t VALUES ('B', 'Active', 4); INSERT INTO t VALUES ('B', 'Inactive', 5); SELECT row_name AS Section, category_1::integer AS Active, category_2::integer AS Inactive FROM crosstab('select section::text, status, count::text from t',2) AS ct (row_name text, category_1 text, category_2 text); 

DB Fiddle here:

  • Everything works: https://dbfiddle.uk/iKCW9Uhh
  • Without CREATE EXTENSION tablefunc; you get this error: https://dbfiddle.uk/j8W1CMvI
    ERROR: function crosstab(unknown, integer) does not exist LINE 4: FROM crosstab('select section::text, status, count::text fro... ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts. 

1 Comment

In case you use a parameter in the crosstab query, you have to escape it properly. Example: (from above) say you want only the active ones: SELECT ... FROM crosstab('select section::text, status, count::text from t where status=''active''', 2) AS ... (notice the double quotes). In case the parameter is passed at runtime by the user (as a function parameter for example) you can say: SELECT ... FROM crosstab('select section::text, status, count::text from t where status=''' || par_active || '''', 2) AS ... (triple quotes here!). In BIRT this also works with the ? placeholder.
15

Solution with JSON aggregation:

CREATE TEMP TABLE t ( section text , status text , ct integer -- don't use "count" as column name. ); INSERT INTO t VALUES ('A', 'Active', 1), ('A', 'Inactive', 2) , ('B', 'Active', 4), ('B', 'Inactive', 5) , ('C', 'Inactive', 7); SELECT section, (obj ->> 'Active')::int AS active, (obj ->> 'Inactive')::int AS inactive FROM (SELECT section, json_object_agg(status,ct) AS obj FROM t GROUP BY section )X 

Comments

2

Sorry this isn't complete because I can't test it here, but it may get you off in the right direction. I'm translating from something I use that makes a similar query:

select mt.section, mt1.count as Active, mt2.count as Inactive from mytable mt left join (select section, count from mytable where status='Active')mt1 on mt.section = mt1.section left join (select section, count from mytable where status='Inactive')mt2 on mt.section = mt2.section group by mt.section, mt1.count, mt2.count order by mt.section asc; 

The code I'm working from is:

select m.typeID, m1.highBid, m2.lowAsk, m1.highBid - m2.lowAsk as diff, 100*(m1.highBid - m2.lowAsk)/m2.lowAsk as diffPercent from mktTrades m left join (select typeID,MAX(price) as highBid from mktTrades where bid=1 group by typeID)m1 on m.typeID = m1.typeID left join (select typeID,MIN(price) as lowAsk from mktTrades where bid=0 group by typeID)m2 on m1.typeID = m2.typeID group by m.typeID, m1.highBid, m2.lowAsk order by diffPercent desc; 

which will return a typeID, the highest price bid and the lowest price asked and the difference between the two (a positive difference would mean something could be bought for less than it can be sold).

1 Comment

You're missing a from clause, otherwise this is correct. The explain plans are wildly different on my system - the crosstab function has a cost of 22.5 while the LEFT JOIN approach is about 4 times as expensive with a cost of 91.38. It also produces about twice as many physical reads and performs hash joins - which can be quite expensive compared to other join types.
0

There's a different dynamic method that I've devised, one that employs a dynamic rec. type (a temp table, built via an anonymous procedure) & JSON. This may be useful for an end-user who can't install the tablefunc/crosstab extension, but can still create temp tables or run anon. proc's.

The example assumes all the xtab columns are the same type (INTEGER), but the # of columns is data-driven & variadic. That said, JSON aggregate functions do allow for mixed data types, so there's potential for innovation via the use of embedded composite (mixed) types.

The real meat of it can be reduced down to one step if you want to statically define the rec. type inside the JSON recordset function (via nested SELECTs that emit a composite type).

dbfiddle.uk

https://dbfiddle.uk/N1EzugHk

Comments

0

Crosstab function is available under the tablefunc extension. You'll have to create this extension one time for the database.

CREATE EXTENSION tablefunc;

You can use the below code to create pivot table using cross tab:

create table test_Crosstab( section text, status text, count numeric) insert into test_Crosstab values ( 'A','Active',1) ,( 'A','Inactive',2) ,( 'B','Active',4) ,( 'B','Inactive',5) select * from crosstab( 'select section ,status ,count from test_crosstab' )as ctab ("Section" text,"Active" numeric,"Inactive" numeric) 

1 Comment

This answer adds nothing over pre-existing answers.