1

I would like to query a table which has few columns (such as 4 below) from where i would like to create a snapshot table which can provide insight on the attribute count everyday

userId attr1 attr2 attr3
1 true Hello Dog
2 false Hi
3 Hello
4 false Cat
5 true Hi Dog
6 false Cat
7 Hi
8 false Dog
9 Cat
10 false Hello Cat

Such as from above

userIdCount attrName dateWhenSnapshotInserted
7 attr1 April 16, 2024
6 attr2 April 16, 2024
7 attr3 April 16, 2024

In this case, we may have millions of rows in original table with more than 100 columns and we would like to populate snapshot table everyday with count of user with non null values for each attribute and date when this snapshot created.

I am framing my query as below :-

select COUNT(DISTINCT userId) as userIdCount from orig_table where attr1 IS NOT NULL UNION Select COUNT(DISTINCT userId) as userIdCount from orig_table where attr2 IS NOT NULL UNION Select COUNT(DISTINCT userId) as userIdCount from orig_table where attr3 IS NOT NULL; 

Is there a optimized way to write multiple select query together. I started with creating multiple select query and inserting it to output table one by one but that seems like a bad pattern. Please suggest how to achieve this so as this can be scaled.

ps: Newbie (who is exploring DB queries first time)

4
  • try using views . Commented Apr 16, 2024 at 7:39
  • Thanks Abhishek! Any suggestion on how to merge multiple select to create views Commented Apr 16, 2024 at 7:47
  • I think you mean a materialized view. A regular view would recalculate the query on each select from it, whereas a materialized view keeps a snapshot of the result until you refresh it. The multiple selects part of the problem can be handled with dynamic SQL in a PL/pgSQL block. Commented Apr 16, 2024 at 7:51
  • Removed conflicting product tags. Please add the one back that you actually use! Commented Apr 16, 2024 at 9:11

1 Answer 1

1

I think you mean a materialized view. A regular view would recalculate the query on each select from it, whereas a materialized view keeps a snapshot of the result until you refresh it.

The multiple selects part of the problem can be handled with dynamic SQL in a PL/pgSQL block.

Demo at db<>fiddle:

do $f$ declare column_name_ text; dynamic_sql_query text; dynamic_sql_query_per_table text:=$$SELECT COUNT(DISTINCT "userId") AS %1$I ,'%2$I' as "attrName" ,now() AS "dateWhenSnapshotInserted" FROM orig_table WHERE %2$I IS NOT NULL$$; begin for column_name_ in select column_name from information_schema.columns where table_name='orig_table' loop dynamic_sql_query:=concat_ws(' UNION ALL ' ,dynamic_sql_query ,format( dynamic_sql_query_per_table ,column_name_||'Count' ,column_name_)); end loop; execute format('CREATE MATERIALIZED VIEW orig_table_counts AS %s',dynamic_sql_query); end $f$; select * from orig_table_counts; 
userIdCount attrName dateWhenSnapshotInserted
10 "userId" 2024-04-16 09:07:44.604273+00
7 attr1 2024-04-16 09:07:44.604273+00
6 attr2 2024-04-16 09:07:44.604273+00
7 attr3 2024-04-16 09:07:44.604273+00
  1. concat_ws() takes care of putting a union all between each query that collects counts for each column. It skips null arguments, so having dynamic_sql_query initially declared but unpopulated prevents a leading/trailing union separator.
  2. $$ are quotes that let you use other types of quotes inside them. %1$I and %2$I are format() placeholders, instructing it to insert the 1st/2nd argument as an optionally double-quoted I dentifier. %s tells it to put another one, as a plain string, without any special handling.
  3. Make sure you pay attention to your identifier case sensitivity: unless you use " double-quotes, they all get folded to lowercase, meaning that userId might in reality be called userid.
Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.