Measure the size of a PostgreSQL table row

Question

I have a PostgreSQL table. select * is very slow whereas select id is nice and quick. I think it may be that the size of the row is very large and it's taking a while to transport, or it may be some other factor.

I need all of the fields (or nearly all of them), so selecting just a subset isn't a quick fix. Selecting the fields that I want is still slow.

Here's my table schema minus the names:

integer | not null default nextval('core_page_id_seq'::regclass) character varying(255) | not null character varying(64) | not null text | default '{}'::text character varying(255) | integer | not null default 0 text | default '{}'::text text | timestamp with time zone | integer | timestamp with time zone | integer |

The size of the text field may be any size. But still, no more than a few kilobytes in the worst case.

Questions

Is there anything about this that screams 'crazy inefficient'?
Is there a way to measure page size at the Postgres command-line to help me debug this?

Actually... one of the columns is 11 MB. That will explain it I think. So is there a way to do length(*) rather than just length(field)? I know that's chars not bytes but I only need an approx value. — Joe
– Joe, Commented Sep 7, 2012 at 9:51

Erwin Brandstetter · Accepted Answer · 2021-04-16 22:34:29Z

Q2: `way to measure page size`

PostgreSQL provides a number of Database Object Size Functions. I packed the most interesting ones in this query and added some Statistics Access Functions at the bottom. (The additional module pgstattuple provides more useful functions, yet.)

This is going to show that different methods to measure the "size of a row" lead to very different results. It all depends on what you want to measure, exactly.

This query requires Postgres 9.3 or later. For older versions see below.

Using a VALUES expression in a LATERAL subquery, to avoid spelling out calculations for every row.

Replace public.tbl with your optionally schema-qualified table name to get a compact view of collected row size statistics. You could wrap this into a plpgsql function for repeated use, hand in the table name as parameter and use EXECUTE ...

SELECT l.metric, l.nr AS bytes , CASE WHEN is_size THEN pg_size_pretty(nr) END AS bytes_pretty , CASE WHEN is_size THEN nr / NULLIF(x.ct, 0) END AS bytes_per_row FROM ( SELECT min(tableoid) AS tbl -- = 'public.tbl'::regclass::oid , count(*) AS ct , sum(length(t::text)) AS txt_len -- length in characters FROM public.tbl t -- provide table name *once* ) x CROSS JOIN LATERAL ( VALUES (true , 'core_relation_size' , pg_relation_size(tbl)) , (true , 'visibility_map' , pg_relation_size(tbl, 'vm')) , (true , 'free_space_map' , pg_relation_size(tbl, 'fsm')) , (true , 'table_size_incl_toast' , pg_table_size(tbl)) , (true , 'indexes_size' , pg_indexes_size(tbl)) , (true , 'total_size_incl_toast_and_indexes', pg_total_relation_size(tbl)) , (true , 'live_rows_in_text_representation' , txt_len) , (false, '------------------------------' , NULL) , (false, 'row_count' , ct) , (false, 'live_tuples' , pg_stat_get_live_tuples(tbl)) , (false, 'dead_tuples' , pg_stat_get_dead_tuples(tbl)) ) l(is_size, metric, nr);

Result:

 metric | bytes | bytes_pretty | bytes_per_row -----------------------------------+----------+--------------+--------------- core_relation_size | 44138496 | 42 MB | 91 visibility_map | 0 | 0 bytes | 0 free_space_map | 32768 | 32 kB | 0 table_size_incl_toast | 44179456 | 42 MB | 91 indexes_size | 33128448 | 32 MB | 68 total_size_incl_toast_and_indexes | 77307904 | 74 MB | 159 live_rows_in_text_representation | 29987360 | 29 MB | 62 ------------------------------ | | | row_count | 483424 | | live_tuples | 483424 | | dead_tuples | 2677 | |

For older versions (Postgres 9.2 or older):

WITH x AS ( SELECT count(*) AS ct , sum(length(t::text)) AS txt_len -- length in characters , 'public.tbl'::regclass AS tbl -- provide table name as string FROM public.tbl t -- provide table name as name ), y AS ( SELECT ARRAY [pg_relation_size(tbl) , pg_relation_size(tbl, 'vm') , pg_relation_size(tbl, 'fsm') , pg_table_size(tbl) , pg_indexes_size(tbl) , pg_total_relation_size(tbl) , txt_len ] AS val , ARRAY ['core_relation_size' , 'visibility_map' , 'free_space_map' , 'table_size_incl_toast' , 'indexes_size' , 'total_size_incl_toast_and_indexes' , 'live_rows_in_text_representation' ] AS name FROM x ) SELECT unnest(name) AS metric , unnest(val) AS bytes , pg_size_pretty(unnest(val)) AS bytes_pretty , unnest(val) / NULLIF(ct, 0) AS bytes_per_row FROM x, y UNION ALL SELECT '------------------------------', NULL, NULL, NULL UNION ALL SELECT 'row_count', ct, NULL, NULL FROM x UNION ALL SELECT 'live_tuples', pg_stat_get_live_tuples(tbl), NULL, NULL FROM x UNION ALL SELECT 'dead_tuples', pg_stat_get_dead_tuples(tbl), NULL, NULL FROM x;

Same result.

Q1: `anything inefficient?`

You could optimize column order to save some bytes per row, currently wasted to alignment padding:

integer | not null default nextval('core_page_id_seq'::regclass) integer | not null default 0 character varying(255) | not null character varying(64) | not null text | default '{}'::text character varying(255) | text | default '{}'::text text | timestamp with time zone | timestamp with time zone | integer | integer |

This saves between 8 and 18 bytes per row. I call it Column Tetris. See:

Also consider:

Would index lookup be noticeably faster with char vs varchar when all values are 36 chars

am I in trouble if the first query always times out in my case? — Sanandrea
– Sanandrea, Commented Oct 11, 2021 at 15:45
@Sanandrea: The query runs a sequential scan over the whole table. So, depending on your timeout settings, that might happen if your table is huge, or if there are locking issues ... You might omit ct and txt_len, and provide the table name manually ('public.tbl'::regclass AS tbl ) to get the rest quickly. — Erwin Brandstetter
– Erwin Brandstetter, Commented Oct 11, 2021 at 20:59
Note that the live_rows_in_text_representation seems to be the best representation of actual row size. The three above it are more so representations of other table data divided by the number of rows, which is very different. They are more helpful when you have enough rows, but with less than 10 rows for example, those numbers can be very misleading. — Akaisteph7
– Akaisteph7, Commented Aug 11, 2023 at 20:53
@Akaisteph7: Completely depends what you are looking for, exactly. If it's about storage requirements, then the size of the text representation is barely useful. — Erwin Brandstetter
– Erwin Brandstetter, Commented Aug 12, 2023 at 1:33
@ErwinBrandstetter I think it's the most useful if all you want to know is the size of the row's content (from DanielVerite's answer below, it seems it does include any toasted content too). If you want to predict the needed size for a table, then it's less helpful but still needed to know how much space indices are taking and all. Overall, very important still. And it's good for the distinction to be clear either ways. — Akaisteph7
– Akaisteph7, Commented Aug 14, 2023 at 14:21

Daniel Vérité · Accepted Answer · 2012-09-07 14:23:02Z

An approximation of the size of a row, including the TOAST'ed contents, is easy to get by querying the length of the TEXT representation of the entire row:

SELECT octet_length(t.*::text) FROM tablename AS t WHERE primary_key=:value;

This is a close approximation to the number of bytes that will be retrieved client-side when executing:

SELECT * FROM tablename WHERE primary_key=:value;

...assuming that the caller of the query is requesting results in text format, which is what most programs do (binary format is possible, but it's not worth the trouble in most cases).

The same technique could be applied to locate the N "biggest-in-text" rows of tablename:

SELECT primary_key, octet_length(t.*::text) FROM tablename AS t ORDER BY 2 DESC LIMIT :N;

Excellent way to quickly get some estimates when working with big data (e.g. the majority of the row size lies in variable-length toast-stored columns), good idea! — fgblomqvist
– fgblomqvist, Commented Mar 27, 2019 at 16:06
@AkmalSalikhov yes, just checked it in documentation. I have also a question - am I correct - octet_length gives uncompressed size, not actual size on disk it takes? I found this answer that suggests it to me. — dankal444
– dankal444, Commented Jul 2, 2020 at 9:56
I don't think this considers the compression. So the real size is less than this. — Milad
– Milad, Commented Jan 4, 2023 at 13:35
The question says the size of the row is very large and it's taking a while to transport so it refers to the size of what goes through the network when an SQL client issues a select * from table query, which is the uncompressed size. — Daniel Vérité
– Daniel Vérité, Commented Jan 4, 2023 at 15:28

WhiteFire Sondergaard · Accepted Answer · 2015-04-30 19:58:19Z

20

Using the Database Object Size Functions mentioned above:

SELECT primary_key, pg_column_size(tablename.*) FROM tablename;

answered Apr 30, 2015 at 19:58

WhiteFire Sondergaard

3012 silver badges3 bronze badges

looked promising, but for any reason it does not work in my case. pg_column_size(tablename.big_column) exceeded the value of pg_column_size(tablename.*)

linqu
– linqu

2016-03-17 17:03:04 +00:00
Commented Mar 17, 2016 at 17:03
That's probably because your big column was TOASTed (store out of line). pg_column_size(tablename.*) only record the bytes in the main table.

kleptog
– kleptog

2022-08-02 07:50:57 +00:00
Commented Aug 2, 2022 at 7:50

Add a comment |

Imanol Y. · Accepted Answer · 2018-06-01 08:16:33Z

There are a few things that could be happening. In general, I doubt that length is the proximal problem. I suspect instead you have a length-related problem.

You say the text fields can get up to a few k. A row cannot go over 8k in main storage, and it is likely that your larger text fields have been TOASTed, or moved out of main storage into an extended storage in separate files. This makes your main storage faster (so select id actually is faster because fewer disk pages to access) but select * becomes slower because there is more random I/O.

If your total row sizes are still all well under 8k you could try altering the storage settings. I would, however, warn that you can get bad things happen when inserting an oversized attribute into main storage so best not to touch this if you don't have to and if you do, set appropriate limits via check constraints. So transportation is not likely the only thing. It may be collating many, many fields that require random reads. Large numbers of random reads may also cause cache misses, and large amounts of memory required can require that things get materialized on disk and large numbers of wide rows, if a join is present (and there is one if TOAST is involved) may require costlier join patterns, etc.

The first thing I would look at doing is selecting fewer rows and see if that helps. If that works, you could try adding more RAM to the server too, but I would start and see where the performance starts falling off due to plan changes and cache misses first.

FredG · Accepted Answer · 2020-06-30 14:17:57Z

If an average of the current rows size is wanted, you could use pg_column_size :

SELECT SUM(pg_column_size(table_name.*))/COUNT(*) FROM tablename;

Using it per column :

SELECT SUM(pg_column_size(table_name.column_name))/COUNT(*) FROM tablename;

Or SELECT AVG(pg_column_size(table_name.*)) FROM tablename; — Brandon
– Brandon, Commented Aug 11, 2021 at 21:54

Stack Exchange Network

Measure the size of a PostgreSQL table row

Questions

5 Answers 5

Q2: `way to measure page size`

Q1: `anything inefficient?`

Linked

Hot Network Questions

Measure the size of a PostgreSQL table row

Questions

5 Answers 5

Q2: way to measure page size

Q1: anything inefficient?

Linked

Related

Hot Network Questions

Q2: `way to measure page size`

Q1: `anything inefficient?`