Revisions to Pivot time series into date period columns with array aggregate cells

added 404 characters in body

Source Link

edited Feb 4, 2017 at 13:14

joanolo

13.7k
8
39
67

Intermediate "success_summary" table

Intermediate "all_success_summary" table

Final PIVOT table

the resultresulting table is:

NOTE 1: In order to know which isIt doesn't contain the appropriate column definitionaggregate vectors or any other representation, required when usingbut the CrossTab function,formatted result you can use the following query:could directly import to a SpreadSheet.

NOTES

1: In order to know which is the appropriate column definition, required when using the CrossTab function, you can use the following query:

NOTE 22: I've chosen "date_period" to be just one day (and, in some places, formatted the result for ease of display). All the same can be achieved by using something such as date_trunc('week', date_time) AS date_period, to summarize by weeks instead of days, instead of the definition I used. This can be generalized to any type of grouping.

3: If you still want to have your arrays, there's a hint on the definition of success_summary on where you'd start getting them.

4: The intermediate tables can be skipped alltogether (by redundantly putting their definition where there name appears). They can also be hidden within a user defined function, that would drop them after use. You cannot avoid them by means of a CTE, because crosstab won't "understand" the virtual tables created by the WITH statement. Anyhow, as you normally need also the column_definition... the temporary tables come in handy.

5: There are other variations, instead of using crosstab you could also get all the rows from all_success_summary in JSON format, and have this information post-processed. It all depends on your specific use-case (but I've seen a screenshot of an Excel... and I moved in the closest direction ;-). I must say that Excel can PIVOT itself all the data from just the "success_summary" data (and probably also from the original one).

the result is:

NOTE 1: In order to know which is the appropriate column definition, required when using the CrossTab function, you can use the following query:

NOTE 2: I've chosen "date_period" to be just one day (and, in some places, formatted the result for ease of display). All the same can be achieved by using something such as date_trunc('week', date_time) AS date_period, to summarize by weeks instead of days, instead of the definition I used. This can be generalized to any type of grouping.

Intermediate "success_summary" table

Intermediate "all_success_summary" table

Final PIVOT table

the resulting table is:

It doesn't contain the aggregate vectors or any other representation, but the formatted result you could directly import to a SpreadSheet.

NOTES

1: In order to know which is the appropriate column definition, required when using the CrossTab function, you can use the following query:

2: I've chosen "date_period" to be just one day (and, in some places, formatted the result for ease of display). All the same can be achieved by using something such as date_trunc('week', date_time) AS date_period, to summarize by weeks instead of days, instead of the definition I used. This can be generalized to any type of grouping.

3: If you still want to have your arrays, there's a hint on the definition of success_summary on where you'd start getting them.

4: The intermediate tables can be skipped alltogether (by redundantly putting their definition where there name appears). They can also be hidden within a user defined function, that would drop them after use. You cannot avoid them by means of a CTE, because crosstab won't "understand" the virtual tables created by the WITH statement. Anyhow, as you normally need also the column_definition... the temporary tables come in handy.

5: There are other variations, instead of using crosstab you could also get all the rows from all_success_summary in JSON format, and have this information post-processed. It all depends on your specific use-case (but I've seen a screenshot of an Excel... and I moved in the closest direction ;-). I must say that Excel can PIVOT itself all the data from just the "success_summary" data (and probably also from the original one).

added 404 characters in body

Source Link

edited Feb 4, 2017 at 12:59

joanolo

13.7k
8
39
67

CREATE TEMPORARY TABLE success_summary AS SELECT date_period, entity, /* If needed, following line gives approx. your aggs. */ /* array[count_successes + count_failures, count_successes, count_failures] AS summary, */ /* Next computation renders the percentage of success as etxt */ to_char( count_successes::double precision*100.0 / (count_successes + count_failures), '990.00%') AS pct_text FROM ( SELECT /* date_trunc('day', date_time) AS date_period */ date_time::date AS date_period, entity, /* We count successes and failures. We profit from the facxt that CASE always has an implicit ELSE NULL */ count(caseCASE whenWHEN  result thenTHEN 1 endEND) AS count_successes, count(caseCASE whenWHEN not result thenTHEN 1 endEND) AS count_failures FROM t GROUP BY date_period, entity ) AS q1 ;

The table contains at this point:

Now, in order to be able to (easily) crosstab we need to "fill in" all the missing values, so that everything is filled in our rectangular matrix. This means, for instance, so that there is a row with (2016-01-01, 'a', *something*) values, which is actually not present in the previous table. (I'veNOTE: I've chosen NULL to be the something, but you could use a text such as 'N/A' by using a coalesce(pct_text, 'N/A') instead of pct_text).

We do so with an otherby using yet another intermediate table, making a cartesian product of (date_periods) x (entities)(date_periods) x (entities):

CREATE TEMPORARY TABLE all_success_summary   AS SELECT date_period, entity, pct_text FROM ( -- Cross join to have all (date_period, entity) possible pairs (SELECT DISTINCT date_period FROM success_summary) AS q00 CROSS JOIN (SELECT DISTINCT entity FROM success_summary) AS q01 ) AS q0 -- Left join with original data to retrieve actual pct_text -- where it exists (it will be NULL, otherwise) LEFT JOIN success_summary USING(date_period, entity) ;

At this point, we can use the first version of crosstab to get all the data PIVOTed:

SELECT * FROM crosstab( 'SELECT entity, date_period, pct_text FROM all_success_summary ORDER BY entity, date_period') AS ct (entity text, "2016-01-01" text, "2016-01-02" text, "2016-01-03" text) ;   | entity | 2016-01-01 | 2016-01-02 | 2016-01-03 | |--------|------------|------------|------------| | a | 100.00% | 100.00% | | | b | 0.00% | 0.00% | | | c | 100.00% | | | | d | 100.00% | | | | e | | | 75.00% |

the result is:

| entity | 2016-01-01 | 2016-01-02 | 2016-01-03 | |--------|------------|------------|------------| | a | 100.00% | 100.00% | | | b | 0.00% | 0.00% | | | c | 100.00% | | | | d | 100.00% | | | | e | | | 75.00% |

NOTEsNOTE 1: In order to know which is the appropriate column definition, required when using the CrossTab function, you can use the following query:

NOTE 2: I've chosen "date_period""date_period" to be just one day (and, in some places, formatted the result for ease of display). All the same can be achieved by using something such as date_trunc('week', date_time) AS date_period, to summarize by weeks instead of days, instead of the definition I used. This can be generalized to any type of grouping.

CREATE TEMPORARY TABLE success_summary AS SELECT date_period, entity, /* If needed, following line gives approx. your aggs. */ /* array[count_successes + count_failures, count_successes, count_failures] AS summary, */ /* Next computation renders the percentage of success as etxt */ to_char( count_successes::double precision*100.0 / (count_successes + count_failures), '990.00%') AS pct_text FROM ( SELECT /* date_trunc('day', date_time) AS date_period */ date_time::date AS date_period, entity, count(case when result then 1 end) AS count_successes, count(case when not result then 1 end) AS count_failures FROM t GROUP BY date_period, entity ) AS q1 ;

The table contains:

Now, in order to be able to (easily) crosstab we need to "fill in" all the missing values, so that everything is filled in our rectangular matrix. This means, for instance, so that there is a row with (2016-01-01, 'a', *something*) values (I've chosen NULL to be the something).

We do so with an other intermediate table, making a cartesian product of (date_periods) x (entities):

CREATE TEMPORARY TABLE all_success_summary   AS SELECT date_period, entity, pct_text FROM ( -- Cross join to have all (date_period, entity) possible pairs (SELECT DISTINCT date_period FROM success_summary) AS q00 CROSS JOIN (SELECT DISTINCT entity FROM success_summary) AS q01 ) AS q0 -- Left join with original data to retrieve actual pct_text -- where it exists (it will be NULL, otherwise) LEFT JOIN success_summary USING(date_period, entity) ;

At this point, we can use the first version of crosstab:

SELECT * FROM crosstab( 'SELECT entity, date_period, pct_text FROM all_success_summary ORDER BY entity, date_period') AS ct (entity text, "2016-01-01" text, "2016-01-02" text, "2016-01-03" text) ;   | entity | 2016-01-01 | 2016-01-02 | 2016-01-03 | |--------|------------|------------|------------| | a | 100.00% | 100.00% | | | b | 0.00% | 0.00% | | | c | 100.00% | | | | d | 100.00% | | | | e | | | 75.00% |

NOTEs: In order to know which is the appropriate column definition, you can use the following query:

I've chosen "date_period" to be just one day (and, in some places, formatted the result for ease of display). All the same can be achieved by using something such as date_trunc('week', date_time) AS date_period, to summarize by weeks instead of days, instead of the definition I used. This can be generalized to any type of grouping.

CREATE TEMPORARY TABLE success_summary AS SELECT date_period, entity, /* If needed, following line gives approx. your aggs. */ /* array[count_successes + count_failures, count_successes, count_failures] AS summary, */ /* Next computation renders the percentage of success as etxt */ to_char( count_successes::double precision*100.0 / (count_successes + count_failures), '990.00%') AS pct_text FROM ( SELECT /* date_trunc('day', date_time) AS date_period */ date_time::date AS date_period, entity, /* We count successes and failures. We profit from the facxt that CASE always has an implicit ELSE NULL */ count(CASE WHEN  result THEN 1 END) AS count_successes, count(CASE WHEN not result THEN 1 END) AS count_failures FROM t GROUP BY date_period, entity ) AS q1 ;

The table contains at this point:

Now, in order to be able to (easily) crosstab we need to "fill in" all the missing values, so that everything is filled in our rectangular matrix. This means, for instance, that there is a row with (2016-01-01, 'a', *something*) values, which is actually not present in the previous table. (NOTE: I've chosen NULL to be the something, but you could use a text such as 'N/A' by using a coalesce(pct_text, 'N/A') instead of pct_text).

We do so by using yet another intermediate table, making a cartesian product of (date_periods) x (entities):

CREATE TEMPORARY TABLE all_success_summary AS SELECT date_period, entity, pct_text FROM ( -- Cross join to have all (date_period, entity) possible pairs (SELECT DISTINCT date_period FROM success_summary) AS q00 CROSS JOIN (SELECT DISTINCT entity FROM success_summary) AS q01 ) AS q0 -- Left join with original data to retrieve actual pct_text -- where it exists (it will be NULL, otherwise) LEFT JOIN success_summary USING(date_period, entity) ;

At this point, we can use the first version of crosstab to get all the data PIVOTed:

SELECT * FROM crosstab( 'SELECT entity, date_period, pct_text FROM all_success_summary ORDER BY entity, date_period') AS ct (entity text, "2016-01-01" text, "2016-01-02" text, "2016-01-03" text) ;

the result is:

| entity | 2016-01-01 | 2016-01-02 | 2016-01-03 | |--------|------------|------------|------------| | a | 100.00% | 100.00% | | | b | 0.00% | 0.00% | | | c | 100.00% | | | | d | 100.00% | | | | e | | | 75.00% |

NOTE 1: In order to know which is the appropriate column definition, required when using the CrossTab function, you can use the following query:

NOTE 2: I've chosen "date_period" to be just one day (and, in some places, formatted the result for ease of display). All the same can be achieved by using something such as date_trunc('week', date_time) AS date_period, to summarize by weeks instead of days, instead of the definition I used. This can be generalized to any type of grouping.

Source Link

answered Feb 4, 2017 at 12:52

joanolo

13.7k
8
39
67

A slight variation of your request, with equivalent response (I think).

First, let's assume this is our starting data:

CREATE TABLE t ( date_time timestamp NOT NULL, entity text NOT NULL, result boolean NOT NULL, /* true means 'success', false 'fail' */ PRIMARY KEY(date_time, entity, result) ) ; INSERT INTO t (date_time, entity, result) VALUES ('2016-01-01 11:00', 'a', true), ('2016-01-01 17:00', 'a', true), -- two events for a on same day ('2016-01-01 11:01', 'b', false), ('2016-01-01 11:03', 'c', true), ('2016-01-01 13:00', 'd', true), -- only one event for d ('2016-01-02 11:00', 'a', true), ('2016-01-02 11:01', 'b', false), ('2016-01-03 11:03', 'e', true), -- 75% success 'e' on day 3 ('2016-01-03 11:04', 'e', true), ('2016-01-03 11:05', 'e', true), ('2016-01-03 11:06', 'e', false) ;

We create a (temporary intermediate) table wich I'll call "success_summary" which has all the different success rates for all the different entities and days (where they actually happen):

CREATE TEMPORARY TABLE success_summary AS SELECT date_period, entity, /* If needed, following line gives approx. your aggs. */ /* array[count_successes + count_failures, count_successes, count_failures] AS summary, */ /* Next computation renders the percentage of success as etxt */ to_char( count_successes::double precision*100.0 / (count_successes + count_failures), '990.00%') AS pct_text FROM ( SELECT /* date_trunc('day', date_time) AS date_period */ date_time::date AS date_period, entity, count(case when result then 1 end) AS count_successes, count(case when not result then 1 end) AS count_failures FROM t GROUP BY date_period, entity ) AS q1 ;

The table contains:

SELECT to_char(date_period, 'yyyy-mm-dd') AS date_period, entity, pct_text FROM success_summary ORDER BY entity, date_period; | date_period | entity | pct_text | |-------------|--------|----------| | 2016-01-01 | a | 100.00% | | 2016-01-02 | a | 100.00% | | 2016-01-01 | b | 0.00% | | 2016-01-02 | b | 0.00% | | 2016-01-01 | c | 100.00% | | 2016-01-01 | d | 100.00% | | 2016-01-03 | e | 75.00% |

Now, in order to be able to (easily) crosstab we need to "fill in" all the missing values, so that everything is filled in our rectangular matrix. This means, for instance, so that there is a row with (2016-01-01, 'a', *something*) values (I've chosen NULL to be the something).

We do so with an other intermediate table, making a cartesian product of (date_periods) x (entities):

CREATE TEMPORARY TABLE all_success_summary AS SELECT date_period, entity, pct_text FROM ( -- Cross join to have all (date_period, entity) possible pairs (SELECT DISTINCT date_period FROM success_summary) AS q00 CROSS JOIN (SELECT DISTINCT entity FROM success_summary) AS q01 ) AS q0 -- Left join with original data to retrieve actual pct_text -- where it exists (it will be NULL, otherwise) LEFT JOIN success_summary USING(date_period, entity) ;

The content of this intermediate table is:

| date_period | entity | pct_text | |-------------|--------|----------| | 2016-01-01 | a | 100.00% | | 2016-01-02 | a | 100.00% | | 2016-01-03 | a | (null) | | 2016-01-01 | b | 0.00% | | 2016-01-02 | b | 0.00% | | 2016-01-03 | b | (null) | | 2016-01-01 | c | 100.00% | | 2016-01-02 | c | (null) | | 2016-01-03 | c | (null) | | 2016-01-01 | d | 100.00% | | 2016-01-02 | d | (null) | | 2016-01-03 | d | (null) | | 2016-01-01 | e | (null) | | 2016-01-02 | e | (null) | | 2016-01-03 | e | 75.00% |

At this point, we can use the first version of crosstab:

SELECT * FROM crosstab( 'SELECT entity, date_period, pct_text FROM all_success_summary ORDER BY entity, date_period') AS ct (entity text, "2016-01-01" text, "2016-01-02" text, "2016-01-03" text) ; | entity | 2016-01-01 | 2016-01-02 | 2016-01-03 | |--------|------------|------------|------------| | a | 100.00% | 100.00% | | | b | 0.00% | 0.00% | | | c | 100.00% | | | | d | 100.00% | | | | e | | | 75.00% |

NOTEs: In order to know which is the appropriate column definition, you can use the following query:

SELECT '(entity text, ' || string_agg(c, ', ') || ')' AS column_definition FROM ( SELECT DISTINCT '"' || date_period || '" text' AS c FROM all_success_summary ORDER BY c ) AS q1 ;

I've chosen "date_period" to be just one day (and, in some places, formatted the result for ease of display). All the same can be achieved by using something such as date_trunc('week', date_time) AS date_period, to summarize by weeks instead of days, instead of the definition I used. This can be generalized to any type of grouping.

You can check most of this (except the crosstab itself) at SQLFiddle.

Stack Exchange Network

Return to Answer

Intermediate "success_summary" table

Intermediate "all_success_summary" table

Final PIVOT table

NOTES

Intermediate "success_summary" table

Intermediate "all_success_summary" table

Final PIVOT table

NOTES