CREATE TABLE `states` (
  `state_id` bigint(20) NOT NULL AUTO_INCREMENT,
  `domain` varchar(64) DEFAULT NULL,
  `entity_id` char(0) DEFAULT NULL,
  `state` varchar(255) DEFAULT NULL,
  `attributes` char(0) DEFAULT NULL,
  `event_id` bigint(20) DEFAULT NULL,
  `last_changed` char(0) DEFAULT NULL,
  `last_changed_ts` double DEFAULT NULL,
  `last_updated` char(0) DEFAULT NULL,
  `created` datetime(6) DEFAULT NULL,
  `last_updated_ts` double DEFAULT NULL,
  `old_state_id` bigint(20) DEFAULT NULL,
  `attributes_id` bigint(20) DEFAULT NULL,
  `context_id` char(0) DEFAULT NULL,
  `context_user_id` char(0) DEFAULT NULL,
  `context_parent_id` char(0) DEFAULT NULL,
  `origin_idx` smallint(6) DEFAULT NULL,
  `context_id_bin` tinyblob DEFAULT NULL,
  `context_user_id_bin` tinyblob DEFAULT NULL,
  `context_parent_id_bin` tinyblob DEFAULT NULL,
  `metadata_id` bigint(20) DEFAULT NULL,
  `last_reported_ts` double DEFAULT NULL,
  PRIMARY KEY (`state_id`),
  KEY `ix_states_last_updated_ts` (`last_updated_ts`),
  KEY `ix_states_context_id_bin` (`context_id_bin`(16)),
  KEY `ix_states_attributes_id` (`attributes_id`),
  KEY `ix_states_old_state_id` (`old_state_id`),
  KEY `ix_states_metadata_id_last_updated_ts` (`metadata_id`,`last_updated_ts`),
  CONSTRAINT `states_ibfk_1` FOREIGN KEY (`old_state_id`) REFERENCES `states` (`state_id`),
  CONSTRAINT `states_ibfk_2` FOREIGN KEY (`attributes_id`) REFERENCES `state_attributes` (`attributes_id`),
  CONSTRAINT `states_ibfk_3` FOREIGN KEY (`metadata_id`) REFERENCES `states_meta` (`metadata_id`)
) ENGINE=InnoDB AUTO_INCREMENT=373236082 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
I am using Home Assistant with MariaDB, hosted on a Pi 4 with 8 GB of RAM. I have been collecting data for a long time, and a single table has grown to 20 GB. The storage is an M.2 NVMe drive connected via USB 3. CPU usage is normally below 20%, climbs to 40% at times, and briefly spikes a few times a day. Memory usage has never been above 2 GB, with more than 6 GB free.
Right now, roughly 350 sensors periodically record data to a single table.
Issues to solve:
- Performance gets slower over time.
- Too much data in one table creates a single point of failure: all my data could be lost through a single error.
- Backups are slow, and errors occur because the entire table is locked during a backup, so new data can't be inserted while the lock is held.
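On the backup point specifically: since the table is InnoDB (per the `ENGINE=InnoDB` above), a snapshot-style dump should avoid the table lock entirely. A minimal sketch, assuming the database is named `homeassistant`:

```sh
# --single-transaction takes a consistent InnoDB snapshot instead of
# locking tables, so the recorder can keep inserting during the backup.
# --quick streams rows instead of buffering the whole table in memory.
mariadb-dump --single-transaction --quick homeassistant | gzip > ha_backup.sql.gz
```

This only helps if all tables involved are InnoDB; any MyISAM table in the dump would still be locked.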
Solutions I came up with:
- Split the table by date (one table per month).
- Split the table by sensor number (each sensor gets its own table).
- Ask the community for alternative solutions.
Regarding solution 1
All sensor data is combined, so even if I only ask for 12 sensors, the database has to sift through data for all 350. When I add sensors, the table will keep growing. Will it grow into a problem? I hope not.
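A variant of solution 1 that might avoid query rewriting altogether is MariaDB's native RANGE partitioning: one logical table, physically stored in chunks, with old chunks droppable almost instantly and irrelevant chunks pruned by the optimizer. A rough sketch only; the boundary values here are made up, and the foreign keys above would have to go first because partitioned InnoDB tables can't have them:

```sql
-- Partitioned InnoDB tables cannot carry foreign keys, so drop them first.
ALTER TABLE states DROP FOREIGN KEY states_ibfk_1;
ALTER TABLE states DROP FOREIGN KEY states_ibfk_2;
ALTER TABLE states DROP FOREIGN KEY states_ibfk_3;

-- state_id is AUTO_INCREMENT and already the primary key, so id ranges
-- roughly track time and satisfy the rule that the partitioning column
-- must be part of every unique key.
ALTER TABLE states
  PARTITION BY RANGE (state_id) (
    PARTITION p0   VALUES LESS THAN (100000000),
    PARTITION p1   VALUES LESS THAN (200000000),
    PARTITION p2   VALUES LESS THAN (300000000),
    PARTITION pmax VALUES LESS THAN MAXVALUE
  );

-- Retiring an old chunk is then a near-instant metadata operation:
-- ALTER TABLE states DROP PARTITION p0;
```

Whether dropping the recorder's foreign keys is safe across Home Assistant upgrades is something I'd want confirmed.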
Regarding solution 2
Will having 350 tables (maybe 500 in the future) cause performance issues of its own, even though most tables are used infrequently?
Implementation
I envision creating a view and/or triggers and procedures. I would create a view with the existing table name and use it to split the data underneath.
In either solution 1 or 2, how would I identify and query only the necessary tables transparently, using a view or similar? When Home Assistant executes a query, it should get back all relevant data as if everything were still in a single table.
Using solution 1, the date would have to be extracted from the query, so that if the date range spans more than one month the FROM clause is rewritten to include every relevant monthly table. Instead of SELECT * FROM states WHERE date BETWEEN ..., the FROM clause would have to reference 062023, 052023, 042023 (or whichever months are relevant).
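The closest static approximation of this I can see is a UNION ALL view over the monthly tables. A sketch, with hypothetical table names (the view could only take over the name `states` once the original table has been renamed away):

```sql
-- Glue hypothetical monthly tables back together under one name.
-- The WHERE clause of a query against the view is merged into each
-- branch, so indexed range conditions stay cheap per table, but the
-- view text itself is fixed: it must be regenerated every month.
CREATE OR REPLACE VIEW states AS
          SELECT * FROM states_202304
UNION ALL SELECT * FROM states_202305
UNION ALL SELECT * FROM states_202306;
```

One catch: a UNION view is not insertable, and MariaDB has no INSTEAD OF triggers on views, so Home Assistant's writes would still have to land in a real table.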
Using solution 2: for example, if Home Assistant asks SELECT * FROM table WHERE sensors IN (1,2,3,5,9,11), the query would need to be rewritten so the FROM clause pulls from the per-sensor tables 1, 2, 3, 5, 9 and 11.
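For the per-sensor split, the same UNION ALL trick could tag each branch with its sensor id so a WHERE clause can single out tables. A sketch with hypothetical tables sensor_1 and sensor_2:

```sql
-- Each branch is tagged with a constant sensor_id column.
-- A condition like WHERE sensor_id = 2 pushed into the sensor_1 branch
-- reduces to a constant-false predicate, so that branch should be
-- discarded cheaply, though every branch is still parsed.
CREATE OR REPLACE VIEW all_sensors AS
          SELECT 1 AS sensor_id, t.* FROM sensor_1 t
UNION ALL SELECT 2 AS sensor_id, t.* FROM sensor_2 t;
```

With 350-500 branches, the view definition itself becomes large, and it must be regenerated every time a sensor is added.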
Would using a view allow me to make the necessary changes transparently?
Is there another solution I have overlooked?
In short: I am looking for a way to transparently split a single table so the program doesn't notice.