The SUMMARIZE command can be used to easily compute a number of aggregates over a table or a query. The SUMMARIZE command launches a query that computes a number of aggregates over all columns (min, max, approx_unique, avg, std, q25, q50, q75, count), and return these along the column name, column type, and the percentage of NULL values in the column.
Usage
In order to summarize the contents of a table, use SUMMARIZE followed by the table name.
SUMMARIZE tbl; In order to summarize a query, prepend SUMMARIZE to a query.
SUMMARIZE SELECT * FROM tbl; Example
Below is an example of SUMMARIZE on the lineitem table of TPC-H SF1 table, generated using the tpch extension.
INSTALL tpch; LOAD tpch; CALL dbgen(sf = 1); SUMMARIZE lineitem; | column_name | column_type | min | max | approx_unique | avg | std | q25 | q50 | q75 | count | null_percentage |
|---|---|---|---|---|---|---|---|---|---|---|---|
| l_orderkey | INTEGER | 1 | 6000000 | 1508227 | 3000279.604204982 | 1732187.8734803519 | 1509447 | 2989869 | 4485232 | 6001215 | 0.0% |
| l_partkey | INTEGER | 1 | 200000 | 202598 | 100017.98932999402 | 57735.69082650496 | 49913 | 99992 | 150039 | 6001215 | 0.0% |
| l_suppkey | INTEGER | 1 | 10000 | 10061 | 5000.602606138924 | 2886.9619987306114 | 2501 | 4999 | 7500 | 6001215 | 0.0% |
| l_linenumber | INTEGER | 1 | 7 | 7 | 3.0005757167506912 | 1.7324314036519328 | 2 | 3 | 4 | 6001215 | 0.0% |
| l_quantity | DECIMAL(15,2) | 1.00 | 50.00 | 50 | 25.507967136654827 | 14.426262537016918 | 13 | 26 | 38 | 6001215 | 0.0% |
| l_extendedprice | DECIMAL(15,2) | 901.00 | 104949.50 | 923139 | 38255.138484656854 | 23300.43871096221 | 18756 | 36724 | 55159 | 6001215 | 0.0% |
| l_discount | DECIMAL(15,2) | 0.00 | 0.10 | 11 | 0.04999943011540163 | 0.03161985510812596 | 0 | 0 | 0 | 6001215 | 0.0% |
| l_tax | DECIMAL(15,2) | 0.00 | 0.08 | 9 | 0.04001350893110812 | 0.025816551798842728 | 0 | 0 | 0 | 6001215 | 0.0% |
| l_returnflag | VARCHAR | A | R | 3 | NULL | NULL | NULL | NULL | NULL | 6001215 | 0.0% |
| l_linestatus | VARCHAR | F | O | 2 | NULL | NULL | NULL | NULL | NULL | 6001215 | 0.0% |
| l_shipdate | DATE | 1992-01-02 | 1998-12-01 | 2516 | NULL | NULL | NULL | NULL | NULL | 6001215 | 0.0% |
| l_commitdate | DATE | 1992-01-31 | 1998-10-31 | 2460 | NULL | NULL | NULL | NULL | NULL | 6001215 | 0.0% |
| l_receiptdate | DATE | 1992-01-04 | 1998-12-31 | 2549 | NULL | NULL | NULL | NULL | NULL | 6001215 | 0.0% |
| l_shipinstruct | VARCHAR | COLLECT COD | TAKE BACK RETURN | 4 | NULL | NULL | NULL | NULL | NULL | 6001215 | 0.0% |
| l_shipmode | VARCHAR | AIR | TRUCK | 7 | NULL | NULL | NULL | NULL | NULL | 6001215 | 0.0% |
| l_comment | VARCHAR | Tiresias | zzle? furiously iro | 3558599 | NULL | NULL | NULL | NULL | NULL | 6001215 | 0.0% |
Using SUMMARIZE in a Subquery
SUMMARIZE can be used a subquery. This allows creating a table from the summary, for example:
CREATE TABLE tbl_summary AS SELECT * FROM (SUMMARIZE tbl); Summarizing Remote Tables
It is possible to summarize remote tables via the httpfs extension using the SUMMARIZE TABLE statement. For example:
SUMMARIZE TABLE 'https://blobs.duckdb.org/data/Star_Trek-Season_1.csv';