- Notifications
You must be signed in to change notification settings - Fork 1.3k
Description
Search before asking
- I searched in the issues and found nothing similar.
Motivation
my table detail
PARTITIONED BY (dt,a,b) TBLPROPERTIES ( 'chain-table.enabled' = 'true', 'partition.timestamp-formatter' = 'yyyyMMdd', 'partition.timestamp-pattern' = '$dt', 'primary-key' = 'dt,a,b,c,d'); Because it involves hundreds of millions of data row, i wish using PARTITIONED BY (dt,a,b) parition path for a and b column, to query more faster than using PARTITIONED BY (dt),
but i find using select count(1) from table where dt=20250101 result is 7000+
using select count(1) from table$branch_snapshot where dt=20250101 result is 16m+
using select count(1) from table$branch_delta where dt=20250102 result is 170k+
i assumed that chain table filter data by dt , and also by a 、b column
example
two table:
t1 and t1_only_dt
CREATE TABLE default.t1 ( `t1` string , `t2` string , `t3` string, `dt` string ) PARTITIONED BY (dt,t1) TBLPROPERTIES ( 'chain-table.enabled' = 'true', -- props about primary key table 'primary-key' = 'dt,t1,t3', 'sequence.field' = 't2', 'bucket' = '2', -- props about partition 'partition.timestamp-pattern' = '$dt', 'partition.timestamp-formatter' = 'yyyyMMdd' ); CALL sys.create_branch('default.t1', 'snapshot'); CALL sys.create_branch('default.t1', 'delta'); ALTER TABLE default.t1 SET tblproperties ('scan.fallback-snapshot-branch' = 'snapshot', 'scan.fallback-delta-branch' = 'delta'); ALTER TABLE `default`.`t1$branch_snapshot` SET tblproperties ('scan.fallback-snapshot-branch' = 'snapshot', 'scan.fallback-delta-branch' = 'delta'); ALTER TABLE `default`.`t1$branch_delta` SET tblproperties ('scan.fallback-snapshot-branch' = 'snapshot', 'scan.fallback-delta-branch' = 'delta'); The only difference between t1 and t1_only_dt is that t1_only_dt is: PARTITIONED BY (dt)
insert data sql
INSERT OVERWRITE `default`.`t1$branch_snapshot` PARTITION (dt = '20250810') VALUES ('aaa', 'seq1', 'x'), ('bbb', 'seq2', 'y'), ('ccc', 'seq3', 'z1'), ('ccc', 'seq4', 'z2'); INSERT OVERWRITE `default`.`t1$branch_delta` PARTITION (dt = '20250811') VALUES ('aaa', 'seq5', 'x_new'), ('ddd', 'seq6', 'w'), ('eee', 'seq7', 'm'), ('eee', 'seq8', 'n'); INSERT OVERWRITE `default`.`t1_only_dt$branch_snapshot` PARTITION (dt = '20250810') VALUES ('aaa', 'seq1', 'x'), ('bbb', 'seq2', 'y'), ('ccc', 'seq3', 'z1'), ('ccc', 'seq4', 'z2'); INSERT OVERWRITE `default`.`t1_only_dt$branch_delta` PARTITION (dt = '20250811') VALUES ('aaa', 'seq5', 'x_new'), ('ddd', 'seq6', 'w'), ('eee', 'seq7', 'm'), ('eee', 'seq8', 'n'); query result:
select *,"t1_only_dt" from `default`.`t1_only_dt` where dt = '20250811' union all select *,"t1" from `default`.`t1` where dt = '20250811'; aaa seq5 x_new 20250811 t1_only_dt bbb seq2 y 20250811 t1_only_dt aaa seq1 x 20250811 t1_only_dt ccc seq3 z1 20250811 t1_only_dt ccc seq4 z2 20250811 t1_only_dt ddd seq6 w 20250811 t1_only_dt eee seq7 m 20250811 t1_only_dt eee seq8 n 20250811 t1_only_dt aaa seq3 z1 20250811 t1 aaa seq4 z2 20250811 t1 ddd seq3 z1 20250811 t1 ddd seq4 z2 20250811 t1 eee seq3 z1 20250811 t1 eee seq4 z2 20250811 t1 Solution
adding a new table options like
chain-table.fallback-partition-keys , using this options to fallback and filtered data ,
not using partition keys
Anything else?
No response
Are you willing to submit a PR?
- I'm willing to submit a PR!