
Data Warehouse for ClickHouse® features and limitations

This page lists the different features and limitations of Scaleway Data Warehouse for ClickHouse®.

Features

Load Balancer

Every Scaleway Data Warehouse for ClickHouse® deployment automatically includes a Load Balancer, even when the deployment consists of a single node.

This Load Balancer automatically balances the queries over the different nodes of your deployment. It is therefore not possible to know which node will process a query.
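One way to observe this behavior is to ask ClickHouse® which server answered a query. The sketch below uses the built-in `hostName()` function; the column alias `serving_node` is illustrative, and the host names returned depend on your deployment and are not specified on this page.

```sql
-- Returns the host name of the server that executed this query.
-- Behind the Load Balancer, repeated runs over new connections may
-- return different host names, since any node can process a query.
SELECT hostName() AS serving_node;
```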

Data replication

For better performance and ease of use, Data Warehouse for ClickHouse® replicates the data across all nodes of a deployment.

Replication is achieved by aliasing commands:

| Default command | Replaced by |
|---|---|
| CREATE DATABASE <database> | CREATE DATABASE <database> ON CLUSTER <Scaleway cluster> |
| DROP DATABASE <database> | DROP DATABASE <database> ON CLUSTER <Scaleway cluster> |
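As a minimal sketch of what this aliasing means in practice, the statement below is written without any cluster clause. The database name `analytics_demo` is a hypothetical example, and `<Scaleway cluster>` stays a placeholder because the actual cluster name is not given on this page.

```sql
-- What you type (no ON CLUSTER clause needed):
CREATE DATABASE analytics_demo;

-- What the service is described as running instead:
--   CREATE DATABASE analytics_demo ON CLUSTER <Scaleway cluster>;

-- Optional check: the database should be listed whichever node
-- the Load Balancer routes this query to.
SELECT name, engine
FROM system.databases
WHERE name = 'analytics_demo';
```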

Table creation is aliased as well: a table created with an engine from the MergeTree family is created with the corresponding Replicated engine instead:

| Default table | Replaced by |
|---|---|
| MergeTree | ReplicatedMergeTree |
| ReplacingMergeTree | ReplicatedReplacingMergeTree |
| CoalescingMergeTree | ReplicatedCoalescingMergeTree |
| SummingMergeTree | ReplicatedSummingMergeTree |
| AggregatingMergeTree | ReplicatedAggregatingMergeTree |
| CollapsingMergeTree | ReplicatedCollapsingMergeTree |
| VersionedCollapsingMergeTree | ReplicatedVersionedCollapsingMergeTree |
| GraphiteMergeTree | ReplicatedGraphiteMergeTree |

Note

If no table engine is specified at table creation, Data Warehouse for ClickHouse® will use ReplicatedMergeTree by default.

Refer to the official ClickHouse® documentation for more information on the MergeTree table family.
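The engine substitution can be checked after creating a table. The example below is a sketch under assumptions: the table name `events_demo` and its columns are hypothetical, and the reported engine is what the mapping on this page suggests, not a verified output.

```sql
-- Hypothetical table for illustration; the engine is written as plain MergeTree.
CREATE TABLE events_demo
(
    event_time DateTime,
    user_id    UInt64,
    action     String
)
ENGINE = MergeTree
ORDER BY (event_time, user_id);

-- Inspect the engine that was actually applied.
-- Per the mapping described on this page, this is expected to be ReplicatedMergeTree.
SELECT name, engine
FROM system.tables
WHERE database = currentDatabase()
  AND name = 'events_demo';
```

Writing the statement with the plain engine name keeps it portable to a standalone ClickHouse® server, where it would create a regular MergeTree table.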

Limitations

Sharding

Sharding cannot be manually configured in Data Warehouse for ClickHouse®. All nodes in the cluster contain a full copy of the data, meaning the deployment operates in a replicated (or "replica") mode rather than a sharded (or "distributed") architecture.

The total data capacity of the cluster is therefore limited to the storage of a single node, and single queries cannot be parallelized across shards to enhance performance.

Queries are executed on each replica independently, so while high availability and read scalability are improved, compute resources are not horizontally scalable for large analytical workloads that would benefit from data distribution.
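The replicated layout can be inspected from the cluster metadata. This sketch assumes your user is allowed to read the `system.clusters` table, which this page does not confirm.

```sql
-- Lists every node ClickHouse knows about, with its shard and replica number.
-- With the layout described above, all nodes of the deployment would be
-- expected to share one shard_num and differ only by replica_num.
SELECT cluster, shard_num, replica_num, host_name
FROM system.clusters
ORDER BY cluster, shard_num, replica_num;
```

As an illustration with made-up figures, a deployment of three nodes with 100 GB of storage each can hold at most 100 GB of data under this model, not 300 GB, because every node stores a full copy.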

Distributed table engine

Because deployments are not sharded, the Distributed table engine has no effect in a Data Warehouse for ClickHouse® deployment.
