I am looking for advice on large tables and partitioning
The Data
- I have two resources: `participants` and `measurements`
- There are 500,000 participants
- Each participant will have 8,760 measurements
- That means there will be 4.38 billion measurements
- Once the data is loaded, it will not change
- Values will not be updated (except, e.g., to fix an error)
- No new participants or measurements will be added
The Queries
- The typical query will fetch all the measurements for a single participant, e.g.:

  ```sql
  SELECT * FROM measurements WHERE part_id = 384352;
  ```

- There will be no aggregation of values across participants, e.g. this won't happen:

  ```sql
  SELECT pulse FROM measurements WHERE hour = 23 AND day = 12;
  ```
The Questions
This seems like a candidate for table partitioning, but I'm not familiar enough with partitioning to be sure.
I'm hoping someone can tell me:
- Will a partitioned table likely be faster than a single table in this use case?
- Given the data will be "static" once loaded, is there a better approach?
- Which partition method would be most appropriate? Range?
- It seems like it would be best to get all measurements for a single participant on the same partition.
- What's a good rule of thumb for partition sizing?
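To make the questions concrete, here is the kind of declarative partitioning I'm imagining: hash on `part_id`, so that all of one participant's rows land in the same partition. The method (HASH) and the partition count (64) are just illustrative guesses, not something I've settled on:

```sql
-- A sketch of a possible partitioned layout; HASH and the count of 64
-- are placeholder choices, not a recommendation.
CREATE TABLE measurements (
    id      bigserial,
    part_id integer NOT NULL,
    hour    integer NOT NULL,
    day     integer NOT NULL,
    month   integer NOT NULL,
    pulse   real NOT NULL,
    PRIMARY KEY (id, part_id)  -- the partition key must be part of the PK
) PARTITION BY HASH (part_id);

-- One child table per hash bucket:
CREATE TABLE measurements_p0 PARTITION OF measurements
    FOR VALUES WITH (MODULUS 64, REMAINDER 0);
-- ... repeat for remainders 1 through 63 ...
```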
The Tables
The participants table with some example data:
```sql
CREATE TABLE participants (
    id   serial PRIMARY KEY,
    uuid uuid UNIQUE NOT NULL
);
```

```
-- for example...
   id   | uuid
--------+--------------
      1 | 51243542...
      2 | abcbdbab...
    ...
 500000 | efe65e76...
```

Without partitioning, the `measurements` table would have data like this:
```sql
CREATE TABLE measurements (
    id      bigserial PRIMARY KEY,
    part_id integer NOT NULL,
    hour    integer NOT NULL,
    day     integer NOT NULL,
    month   integer NOT NULL,
    pulse   real NOT NULL
);

ALTER TABLE measurements
    ADD CONSTRAINT fk_measurements_to_participant_id
    FOREIGN KEY (part_id) REFERENCES participants(id)
    ON DELETE CASCADE;
```

```
-- for example...
     id     | part_id | hour | day | month | pulse
------------+---------+------+-----+-------+-------
          1 |       1 |    0 |   1 |     1 |  58.2
          2 |       1 |    1 |   1 |     1 |  52.6
          3 |       1 |    2 |   1 |     1 |  56.2
          4 |       1 |    3 |   1 |     1 |  57.4
        ...
       8760 |       1 |   23 |  31 |    12 |  67.9
       8761 |       2 |    0 |   1 |     1 |  81.0
       8762 |       2 |    1 |   1 |     1 |  83.3
        ...
 4380000000 |  500000 |   23 |  31 |    12 |  77.7
```

Cheers.
The `measurements` table's rows would be 28 bytes long, so multiplied by 4.38 billion rows that is 122,640,000,000 bytes, or about 114 GiB in total. Now, in 2021, a DB server with 256GB+ of RAM is not that extraordinary, so it might not be worth partitioning at all in your case.
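If the table does stay unpartitioned, one possible sketch for getting fast per-participant reads anyway is to index the lookup column and physically order the table on it with `CLUSTER`. This assumes a one-off load, since `CLUSTER` rewrites the whole table under an exclusive lock and is not maintained on later writes, which fits data that never changes after loading (the index name here is just an example):

```sql
-- Index the typical lookup column, then physically reorder the table by it
-- so each participant's 8,760 rows sit in adjacent pages.
CREATE INDEX idx_measurements_part_id ON measurements (part_id);
CLUSTER measurements USING idx_measurements_part_id;
ANALYZE measurements;
```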