Let's start with an example. I've got an application that logs a point every minute. When I aggregate this data into a query where I'll have the count of logs per day, I get the following result. Keep in mind, that the query and the data is in UTC.
SELECT count(int_3) FROM data WHERE time >= '2023-10-26T00:00:00Z' AND time < '2023-10-31T23:59:59Z' GROUP BY time(1d) fill(none) 2023-10-26T00:00:00Z 1440 2023-10-27T00:00:00Z 1440 2023-10-28T00:00:00Z 1440 2023-10-29T00:00:00Z 1440 2023-10-30T00:00:00Z 1440 2023-10-31T00:00:00Z 1440 This is the expected, result. Also when I use a timezone in the query I get the expected result. Keep in mind that this query includes daylight saving, which results in one day having 1 bucket more.
SELECT count(int_3) FROM data WHERE time >= '2023-10-26T00:00:00+02:00' AND time < '2023-10-31T23:59:59+01:00' GROUP BY time(1d) fill(none) TZ('Europe/Amsterdam') 2023-10-26T00:00:00+02:00 1320 2023-10-27T00:00:00+02:00 1440 2023-10-28T00:00:00+02:00 1440 2023-10-29T00:00:00+02:00 1500 2023-10-30T00:00:00+01:00 1440 2023-10-31T00:00:00+01:00 1440 So far so good right. Let's focus on the issue I'm facing. When ever somebody want to move the start time of the query, in the example (2023-10-26T00:00:00Z), we need to add an offset to the GROUP BY time() clause, as InfluxDB normally start bucketing from the unixtimestamp 0.
According to their documentation page you can use a offset_interval to fix the start time of the query results
The offset_interval is a duration literal. It shifts forward or back the InfluxDB database’s preset time boundaries. The offset_interval can be positive or negative.
For example let's use the first query (UTC) again and shift the start time and endtime one hour back.
SELECT count(int_3) FROM data WHERE time >= '2023-10-25T23:00:00Z' AND time < '2023-10-31T22:59:59Z' GROUP BY time(1d) fill(none) Without using a offset_interval, the results are still timestamped on midnight.
2023-10-26T00:00:00Z 1440 2023-10-27T00:00:00Z 1440 2023-10-28T00:00:00Z 1440 2023-10-29T00:00:00Z 1440 2023-10-30T00:00:00Z 1440 2023-10-31T00:00:00Z 1380 Because the starttime of the query is 2023-10-25T23:00:00Z I also want the results to reflect that. I need to use an offset of 23 hours (calculated with this post https://stackoverflow.com/a/54807316/5686835)
SELECT count(int_3) FROM data WHERE time >= '2023-10-25T23:00:00Z' AND time < '2023-10-31T22:59:59Z' GROUP BY time(1d,23h) fill(none) 2023-10-25T23:00:00Z 1380 2023-10-26T23:00:00Z 1440 2023-10-27T23:00:00Z 1440 2023-10-28T23:00:00Z 1440 2023-10-29T23:00:00Z 1440 2023-10-30T23:00:00Z 1440 Perfect, now with the Timezone I want.
SELECT count(int_3) FROM data WHERE time >= '2023-10-25T23:00:00+02:00' AND time < '2023-10-31T22:59:59+01:00' GROUP BY time(1d,23h) fill(none) TZ('Europe/Amsterdam') 2023-10-25T23:00:00+02:00 1260 2023-10-26T23:00:00+02:00 1440 2023-10-27T23:00:00+02:00 1440 2023-10-28T23:00:00+02:00 1440 2023-10-29T23:00:00+01:00 1320 2023-10-30T23:00:00+01:00 1440 And there is the problem. 2023-10-29 should have one hour of point (60 points) more than the other days, but actually got 2 hours (120 points) less, which I find difficult to explain.
Does somebody have an explanation or a solution for this?
I'm using InfluxDB v1.8.10 and v1.11.1.