Time Series - Pre Development Stage #1180

lvca · 2023-07-21T16:36:24Z
Maintainer

We have some users who are already using ArcadeDB for time series, even though it is not an officially supported model. We collected many use cases and came up with a design that should be easy to implement and, most importantly, blazing fast and space efficient.

The idea is simple: when you create a time series type, you define the following special attributes:

timestamp property name
file aggregation unit (year, month, day, hour)
page aggregation unit (day, hour, minute, second)
page size (default 64K)

You can find some of these concepts in clustered tables.

For example, if you have sensor data with millisecond precision, let's say around 1-10K measurements per minute, you could create a type "Sensor" with the following settings:

timestamp property name = "timestamp"
file aggregation unit = "day"
page aggregation unit = "minute"
page size = 32K
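To make the relationship between the two units concrete, here is a minimal Python sketch of these settings. `TimeSeriesTypeSettings` and `UNIT_MS` are hypothetical names, not ArcadeDB's API. The sketch only checks that the page unit evenly divides the file unit and counts how many page ranges one file spans; month and year are left out because their length varies. With day files and minute pages, each Sensor file covers 1,440 one-minute ranges.

```python
from dataclasses import dataclass

# Fixed-length units in milliseconds. "month" and "year" are left out because
# their length varies, so this sketch cannot divide them evenly.
UNIT_MS = {
    "second": 1_000,
    "minute": 60_000,
    "hour": 3_600_000,
    "day": 86_400_000,
}


@dataclass(frozen=True)
class TimeSeriesTypeSettings:
    """Hypothetical holder for the four special attributes of a time series type."""
    type_name: str
    timestamp_property: str
    file_unit: str               # year, month, day or hour
    page_unit: str               # day, hour, minute or second
    page_size: int = 64 * 1024   # default 64K

    def page_ranges_per_file(self) -> int:
        """How many page time ranges one file spans (fixed-length units only)."""
        file_ms = UNIT_MS[self.file_unit]
        page_ms = UNIT_MS[self.page_unit]
        if page_ms > file_ms or file_ms % page_ms:
            raise ValueError("the page unit must evenly divide the file unit")
        return file_ms // page_ms


SENSOR = TimeSeriesTypeSettings("Sensor", "timestamp", file_unit="day",
                                page_unit="minute", page_size=32 * 1024)
print(SENSOR.page_ranges_per_file())  # 1440
```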

This means ArcadeDB will create a new file every day (with a name such as "Sensor_20230721") and store that day's sensor data in it.
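A minimal sketch of how the file name could be derived from a record's timestamp. It assumes UTC (the example below uses GMT) and `strftime` patterns of which only the "day" one is confirmed by the name Sensor_20230721; the other patterns and `file_name_for` are hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical suffix patterns per file aggregation unit. Only "day" matches a
# name given in the proposal (Sensor_20230721); the others are assumptions.
FILE_SUFFIX_FORMAT = {
    "year": "%Y",
    "month": "%Y%m",
    "day": "%Y%m%d",
    "hour": "%Y%m%d%H",
}


def file_name_for(type_name: str, timestamp_ms: int, file_unit: str) -> str:
    """Return the name of the file that would hold a record with this timestamp (UTC)."""
    moment = datetime.fromtimestamp(timestamp_ms // 1000, tz=timezone.utc)
    return f"{type_name}_{moment.strftime(FILE_SUFFIX_FORMAT[file_unit])}"


print(file_name_for("Sensor", 1689956195339, "day"))   # Sensor_20230721
print(file_name_for("Sensor", 1689956195339, "hour"))  # Sensor_2023072116
```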

In this case, each page stores the data of a single minute. The page size is configurable. Suppose the following data arrives from a measurement:

{ "timestamp": 1689956195339, "sensorId": 2321, "temperature": 40.5 }

ArcadeDB will then store the record in the page that covers the minute of timestamp 1689956195339 (Friday, July 21, 2023 4:16:35.339 PM GMT), inside the file Sensor_20230721.
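The page a record belongs to can be found by truncating its timestamp to the page unit. This Python sketch assumes fixed-length units expressed in UTC epoch milliseconds; `page_timestamp` is a hypothetical helper. For the record above, the page starts at 16:16:00.000 and the record sits 35,339 ms into it.

```python
# Page aggregation units with a fixed length in milliseconds (assumption: UTC
# epoch timestamps, which ignore leap seconds, so every day is 86,400,000 ms).
PAGE_UNIT_MS = {"second": 1_000, "minute": 60_000, "hour": 3_600_000, "day": 86_400_000}


def page_timestamp(timestamp_ms: int, page_unit: str) -> int:
    """Truncate a timestamp to the start of the page's time range."""
    unit = PAGE_UNIT_MS[page_unit]
    return timestamp_ms - timestamp_ms % unit


ts = 1689956195339
start = page_timestamp(ts, "minute")
print(start)       # 1689956160000 (2023-07-21 16:16:00.000 UTC)
print(ts - start)  # 35339 ms into that minute
```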

A page has the following header (8,210 bytes total):

bucket header (2 bytes + 4,096 pointers to the records inside the page, 2 bytes each; a 2-byte pointer limits the page to 64KB, i.e. 65,536 bytes)
page timestamp (long - 8 bytes)
previous page id (int - 4 bytes)
next page id (int - 4 bytes)
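The 8,210-byte total can be checked by laying the fields out with Python's `struct` module. This is a sketch under assumptions: the leading 2 bytes are treated as the record count, the byte order is big-endian with no padding, and `pack_header`/`unpack_header` are hypothetical helpers rather than ArcadeDB's actual page format.

```python
import struct

MAX_RECORDS = 4_096
# Field order follows the list above: 2-byte counter (assumed to be the record
# count), 4,096 two-byte record pointers, page timestamp (long), previous and
# next page ids (int). "!" means big-endian, standard sizes, no padding.
HEADER_FORMAT = f"!H{MAX_RECORDS}Hqii"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def pack_header(pointers, page_ts, prev_page, next_page) -> bytes:
    """Serialize a page header; unused pointer slots are zero-filled."""
    if len(pointers) > MAX_RECORDS:
        raise ValueError("too many records for one page")
    slots = list(pointers) + [0] * (MAX_RECORDS - len(pointers))
    return struct.pack(HEADER_FORMAT, len(pointers), *slots, page_ts, prev_page, next_page)


def unpack_header(raw: bytes):
    """Return (pointers, page_ts, prev_page, next_page) from a packed header."""
    fields = struct.unpack(HEADER_FORMAT, raw[:HEADER_SIZE])
    count = fields[0]
    pointers = list(fields[1:1 + count])
    page_ts, prev_page, next_page = fields[-3:]
    return pointers, page_ts, prev_page, next_page


print(HEADER_SIZE)  # 8210
```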

The record above will be stored in the following way:

the record's timestamp, stored as a varint holding the delta in milliseconds between the actual timestamp and the page timestamp. Small deltas need fewer bytes, so many timestamps can be stored in 1-2 bytes instead of the 8 bytes required by a fixed-size long.
the rest of the record, stored with the normal serializer and stripped of the timestamp property (already stored as above)
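A sketch of the delta encoding, assuming an unsigned base-128 varint (the LEB128-style scheme Protocol Buffers uses for unsigned integers); ArcadeDB's exact varint format is not specified here, and `encode_varint`/`decode_varint` are illustrative names. Each byte carries 7 bits, so deltas below 128 ms take 1 byte, below 16,384 ms take 2 bytes, and the rest of a minute page (up to 59,999 ms) takes 3 bytes.

```python
def encode_varint(value: int) -> bytes:
    """Unsigned varint: 7 payload bits per byte, high bit set when more bytes follow."""
    if value < 0:
        raise ValueError("delta must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, offset: int = 0):
    """Return (value, next_offset) for the varint starting at offset."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


for delta in (90, 9_000, 35_339, 59_999):
    print(delta, len(encode_varint(delta)))
# 90 1
# 9000 2
# 35339 3
# 59999 3
```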

Under the most favorable conditions, the record in the example above could be stored in only 3 bytes. Excluding the 8,210-byte header, a 64K (65,536-byte) page has 57,326 bytes left for content, which means an average of about 13 bytes per record when all 4,096 record slots are used.
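The capacity figures follow from simple arithmetic. This sketch assumes the pointer table stays at 4,096 slots whatever the page size, which is a hypothesis; under it, the 32K page from the Sensor example would leave 24,558 bytes, i.e. an average of at most 5 whole bytes per record if all 4,096 slots are used.

```python
HEADER_SIZE = 8_210    # 2 + 4,096 * 2 + 8 + 4 + 4 bytes, per the page header described above
RECORD_SLOTS = 4_096   # assumption: the pointer table size does not depend on the page size


def content_bytes(page_size: int) -> int:
    """Bytes left for record content once the page header is subtracted."""
    return page_size - HEADER_SIZE


def max_avg_record_size(page_size: int) -> int:
    """Largest whole-byte average record size that still lets every slot be filled."""
    return content_bytes(page_size) // RECORD_SLOTS


for size in (64 * 1024, 32 * 1024):
    print(size, content_bytes(size), max_avg_record_size(size))
# 65536 57326 13
# 32768 24558 5
```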

The other page attributes (previous page id and next page id) work as a linked list. In the ideal scenario where sensor data arrive ordered by timestamp, a dichotomic (binary) search can efficiently locate the right page during a query. If some records arrive late, they are written into the corresponding page as long as it has space; otherwise, a new page is appended and linked to the previous one.
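A toy in-memory model of the lookup and late-arrival behavior, in Python. Everything here is hypothetical: pages live in a list, a sorted list of page timestamps stands in for the ordered pages so `bisect` can perform the dichotomic search, and the previous/next ids chain overflow pages of the same minute. The real on-disk linking may differ.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Page:
    page_id: int
    page_ts: int                                   # start of the page's time range (ms)
    capacity: int                                  # max records per page
    records: List[dict] = field(default_factory=list)
    prev_id: Optional[int] = None                  # previous page in the chain
    next_id: Optional[int] = None                  # next (overflow) page in the chain


class ToyTimeSeriesFile:
    """Toy model of one file: primary pages searchable by page_ts, plus linked overflow pages."""

    def __init__(self, page_ms: int, capacity: int):
        self.page_ms = page_ms
        self.capacity = capacity
        self.pages: List[Page] = []       # physical (append) order
        self.primary_ts: List[int] = []   # sorted page timestamps, for binary search
        self.primary_ids: List[int] = []  # page id for each entry of primary_ts

    def _append_page(self, page_ts: int, prev_id: Optional[int]) -> Page:
        page = Page(len(self.pages), page_ts, self.capacity, prev_id=prev_id)
        self.pages.append(page)
        return page

    def find_page(self, page_ts: int) -> Optional[Page]:
        """Dichotomic (binary) search over the primary pages: O(log n)."""
        i = bisect.bisect_left(self.primary_ts, page_ts)
        if i < len(self.primary_ts) and self.primary_ts[i] == page_ts:
            return self.pages[self.primary_ids[i]]
        return None

    def insert(self, record: dict) -> int:
        """Store a record and return the id of the page that received it."""
        page_ts = record["timestamp"] - record["timestamp"] % self.page_ms
        page = self.find_page(page_ts)
        if page is None:
            page = self._append_page(page_ts, prev_id=None)
            i = bisect.bisect_left(self.primary_ts, page_ts)
            self.primary_ts.insert(i, page_ts)
            self.primary_ids.insert(i, page.page_id)
        # Go to the last page of this range's chain; if it is full,
        # append a new page at the end of the file and link it.
        while page.next_id is not None:
            page = self.pages[page.next_id]
        if len(page.records) >= page.capacity:
            overflow = self._append_page(page_ts, prev_id=page.page_id)
            page.next_id = overflow.page_id
            page = overflow
        page.records.append(record)
        return page.page_id


f = ToyTimeSeriesFile(page_ms=60_000, capacity=2)
base = 1689956160000  # 2023-07-21 16:16:00 UTC
print([f.insert({"timestamp": base + off, "sensorId": 2321}) for off in (1_000, 2_000, 61_000, 3_000)])
# [0, 0, 1, 2]
```

With `capacity=2`, the late record at +3 s finds its minute's page already full, so it lands in page 2, which is reachable from page 0 through `next_id`.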

To look up a sensor's readings in a particular time range, you can issue this query:

SELECT FROM Sensor WHERE timestamp >= 1689956195339 AND timestamp <= 1689956999999 AND sensorId = 2321

ArcadeDB will use this special search to find the records in the range. The layout works as a clustered index, so there is no need to create an index on timestamp for efficient retrieval. It also keeps lookups fast and storage minimal, which translates into fast searches and blazing fast inserts.
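A sketch of how such a range scan could work without a timestamp index, assuming pages are kept sorted by page timestamp (the parallel lists `page_starts` and `page_records`, and `range_query` itself, are hypothetical). The search jumps to the first candidate page with `bisect`, then reads pages in order until one starts after the upper bound; filtering by `sensorId` only touches the records of those pages.

```python
import bisect
from typing import Dict, List

PAGE_MS = 60_000  # minute pages, as in the Sensor example


def range_query(page_starts: List[int], page_records: List[List[Dict]],
                start: int, end: int, sensor_id: int) -> List[Dict]:
    """Return records of sensor_id with start <= timestamp <= end.

    page_starts is sorted (overflow pages of the same minute repeat the value)
    and page_records[i] holds the records of page i.
    """
    first_page_ts = start - start % PAGE_MS
    # Binary search for the first page that can contain `start`: no timestamp index needed.
    i = bisect.bisect_left(page_starts, first_page_ts)
    result = []
    # Pages are ordered, so stop at the first page that starts after `end`.
    while i < len(page_starts) and page_starts[i] <= end:
        for record in page_records[i]:
            if start <= record["timestamp"] <= end and record["sensorId"] == sensor_id:
                result.append(record)
        i += 1
    return result


base = 1689956160000  # 2023-07-21 16:16:00 UTC
page_starts = [base, base + 60_000, base + 120_000]
page_records = [
    [{"timestamp": 1689956195339, "sensorId": 2321, "temperature": 40.5}],
    [{"timestamp": base + 70_000, "sensorId": 9999, "temperature": 21.0}],
    [{"timestamp": base + 130_000, "sensorId": 2321, "temperature": 41.0}],
]
hits = range_query(page_starts, page_records, 1689956195339, 1689956999999, 2321)
print([r["temperature"] for r in hits])  # [40.5, 41.0]
```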

We ran some internal benchmarks simulating this structure with the current buckets and measured more than 3M inserts per second on a 2019 MacBook Pro using 7 parallel threads (!)

Another topic is configurable pre-aggregation of data. In the example above, you could choose to aggregate the temperature by minute using the AVERAGE function. The aggregated value would then be updated during every insertion and be ready to return without any calculation at query time. This is planned for phase 2 of the time series module.
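One way an AVERAGE pre-aggregation could be maintained during insertion is to keep a running sum and count per (sensorId, minute) and derive the average from them at read time. `MinuteAverage` and `on_insert` are hypothetical names for illustration, not a planned ArcadeDB API.

```python
from typing import Dict, Tuple

MINUTE_MS = 60_000


class MinuteAverage:
    """Hypothetical pre-aggregation: running AVERAGE of one property per (sensorId, minute)."""

    def __init__(self, value_property: str):
        self.value_property = value_property
        # (sensorId, minute start) -> [sum, count]; the average is derived from both.
        self._acc: Dict[Tuple[int, int], list] = {}

    def on_insert(self, record: dict) -> None:
        """O(1) update performed as part of every insertion."""
        minute = record["timestamp"] - record["timestamp"] % MINUTE_MS
        acc = self._acc.setdefault((record["sensorId"], minute), [0.0, 0])
        acc[0] += record[self.value_property]
        acc[1] += 1

    def average(self, sensor_id: int, minute_start: int) -> float:
        """Return the precomputed average; raises KeyError if the minute has no data."""
        total, count = self._acc[(sensor_id, minute_start)]
        return total / count


agg = MinuteAverage("temperature")
agg.on_insert({"timestamp": 1689956195339, "sensorId": 2321, "temperature": 40.5})
agg.on_insert({"timestamp": 1689956199000, "sensorId": 2321, "temperature": 41.5})
print(agg.average(2321, 1689956160000))  # 41.0
```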

WDYT? Any feedback about this?

topofocus · 2023-08-15T08:16:07Z

I appreciate your thoughts on establishing effective time series in ArcadeDB.

It is not clear to me how a time series differs from an ordinary embedded Hash with limited update functionality.
If I understood correctly, the enhancement is an optimized index that is applied automatically.
Because it's a time series, the index has to be on the timestamp.

I am asking myself: why not enhance the already implemented embedded List?
Then it might be as simple as implementing indexes on embedded documents, and time series support would just be a special case of document-database usage.


lvca Aug 15, 2023
Maintainer Author

The idea above is to avoid having an index and instead rely on the unique properties of time series data:

ordered by timestamp
mostly append-only (but with support for delayed data)

Instead of creating an index, the data itself could be partitioned so that retrieving a range of data points is O(1) or close to it.
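To illustrate why no index is needed to find the partitions: with daily files, the file holding any timestamp can be computed directly, so a range maps to its files at O(1) cost per file. A minimal sketch, assuming UTC and the Sensor_YYYYMMDD naming from the example; `day_files_for_range` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def day_files_for_range(type_name: str, start_ms: int, end_ms: int) -> List[str]:
    """List the daily files (e.g. Sensor_20230721) that a time range touches.

    Each name is computed from the timestamp itself, so no index lookup is needed.
    """
    day = datetime.fromtimestamp(start_ms // 1000, tz=timezone.utc).date()
    last = datetime.fromtimestamp(end_ms // 1000, tz=timezone.utc).date()
    names = []
    while day <= last:
        names.append(f"{type_name}_{day.strftime('%Y%m%d')}")
        day += timedelta(days=1)
    return names


print(day_files_for_range("Sensor", 1689956195339, 1690070400000))
# ['Sensor_20230721', 'Sensor_20230722', 'Sensor_20230723']
```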

gazillion101 · 2024-05-10T21:14:59Z

Hello! Any update on the time series model?


lvca May 11, 2024
Maintainer Author

Hi @gazillion101, we've been busy with other priorities and nobody has sponsored this feature yet.

reapnow-ionx Dec 5, 2024

Can you tell me more about sponsoring support for the Parquet data format for time series and other data streams that are best suited to Parquet?

lvca · 2026-02-20T21:34:19Z
Maintainer Author

Old discussion, time to give an update. We have a working timeseries model in the branch "timeseries-model" 🎆 Switching to issue #3488.


reapnow-ionx · 2026-02-20T21:51:10Z

Hi Luca, This is great news. When you met with Robert Buchannon, I asked him to give you my position paper (ArcadeDB Auto-Sharding Architecture & Algorithms) on how we are approaching auto-sharding in ArcadeDB. I know he was meeting you on direct business-related matters and may not have had time to give you the paper, so I will attach it now. After reading the first position paper, Justyn Hornor, CEO of I/ONX, responded with two questions regarding the hybrid rules engine in relation to auto-sharding. I responded with the second attachment (Bayesian Shadow Updatin ArcadeDB Auto-Sharding). I hope you find the attached papers helpful. I will get back to you next week with our updated progress. Patrick Gilmartin

-- *CONFIDENTIALITY & SECURITY NOTICE* This email contains Sensitive Security Information, Proprietary Data, and Confidential Communications protected under U.S. state and federal law, including the Gramm-Leach-Bliley Act (GLBA) 15 USC, Subchapter 1, Sections 6801-6809 and other regulations governing Non-Public Personal Information (NPI). Additionally, this message may contain export-controlled technology subject to U.S. Export Administration Regulations (EAR), 15 CFR Parts 730-744. Any transfer or disclosure to non-U.S. persons or entities may constitute a federal violation. If you are not the intended recipient, you are strictly prohibited from reviewing, using, copying, retaining, or distributing this communication. Please notify the sender immediately, then permanently delete and destroy all copies of this email and any attachments. We take data security seriously—please exercise caution when handling sensitive information and adhere to all applicable compliance, cybersecurity, and encryption policies. For additional details regarding privacy protections, visit: FTC GLBA <https://www.ftc.gov/privacy/glbact/glbsub1.htm> Information.


lvca · 2026-02-24T19:29:17Z
Maintainer Author

The TimeSeries model is out!


reapnow-ionx · 2026-02-24T19:45:36Z

Congratulations on the release of the TimeSeries Model. Well Done!!!!!!!!!

On Tue, Feb 24, 2026 at 11:29 AM Luca Garulli wrote: Closed #1180 as resolved.


reapnow-ionx · 2026-02-24T19:55:35Z

Hi Luca, To bring you up to speed, we are completing the Hybrid Rules Engine I have designed, and with it, we have started to develop the ArcadeDB Agentic Substrate. While the full version of the Substrate is based on implementing auto-sharding per the design and documents I have provided you, we are implementing an interim solution that uses the current release of ArcadeDB together with Cassandra. I plan to travel to Florida in March, but I am not going to firm up my plans until the government gets HHS funded, which includes TSA funding. I do not want to get caught up in travel issues due to TSA not being funded. Once the partial government shutdown trickles down to where TSA personnel do not show up due to delayed pay, I am sure the politicians will get their asses back to work and resolve this conflict. I will keep you posted on my travel plans. Patrick Gilmartin

