1

I currently have a SQL Server, 20 column fact table that could grow even wider into the hundreds. The current row count is about 9 million records. The indexing strategy is currently row-based. The query joins to every lookup table totaling about a dozen joins (most columns in the table are FKs). This is an analytical table that is generated nightly. Query filters are ad hoc so row-indexing most permutations is not feasible. I would like to explore removing these joins and using a columnstore index, since for the most part, only a handful of filter columns will be specified per query request.

Given I have three tables:

Table 1: SomeFactTable (~9 million records)

| Column | Description | Cardinality | | -------- | -------------- | -------------- | | Id | int32 PK | Unique | | Timestamp | datetime, non-null | medium | | LookupTableA_ID| int32, non-null, FK | Medium-high (40k records) | | LookupTableB_ID| int32, non-null, FK | Low (dozens records) | | NumberA | int32, non-null | Very high (8 million records) | 

Table 2: LookupTableA (~40,000 records)

| Column | Description | Cardinality | | -------- | -------------- | -------------- | | Id | int32 PK | Unique | | Description | varchar(255), non-null | high | 

Table 3: LookupTableB (~50 records)

| Column | Description | Cardinality | | -------- | -------------- | -------------- | | Id | int32 PK | Unique | | Description | varchar(255), non-null | high | 

I have two questions.

  1. How can I combine the benefits of a columnstore index with also grouping on the high cardinality NumberA column?

For example, given this query, it performs in about 100ms (which is great!). If I uncomment the group by NumberA, then the performance degrades to ~900ms (good, but I'd like to do better).

SELECT max(Id) Id ,[Timestamp] FROM [dbo].[SomeFactTable] group by [Timestamp] --,NumberA ORDER BY [Timestamp] ASC OFFSET 500 ROWS FETCH NEXT 20 ROWS ONLY 
  1. Most of the time I need to order by one or more of the lookup tables which requires a join. Sometimes these tables have many records (e.g. LookupTableA) and sometimes they only have a handful (e.g. LookupTableB). When there are many records, then the performance degrades. When there are only a handful, then the performance does not degrade. How can I maintain high performance when the table has many records?

As an example, this is a join with medium cardinality. It also has the high cardinality column from question 1. The query returns in about 2,900 ms.

SELECT max(SomeFactTable.Id) Id, [Timestamp], [Description] FROM dbo.SomeFactTable INNER JOIN dbo.LookupTableA ON SomeFactTable.LookupTableA_ID = LookupTableA.Id group by [Timestamp], [Description], NumberA ORDER BY [Description] ASC, [DateAbsent] ASC OFFSET 500 ROWS FETCH NEXT 20 ROWS ONLY 

This is a join with low cardinality. It also has the high cardinality column from question 1. The query returns in about 1,000 ms (and returns even much faster if NumberA wasn't grouped).

SELECT max(SomeFactTable.Id) Id, [Timestamp], [Description] FROM dbo.SomeFactTable INNER JOIN dbo.LookupTableB ON SomeFactTable.LookupTableB_ID = LookupTableB.Id group by [Timestamp], [Description], NumberA ORDER BY [Description] ASC, [DateAbsent] ASC OFFSET 500 ROWS FETCH NEXT 20 ROWS ONLY 

1 Answer 1

1

I only have a little bit of experience with columnstore indexing, but my experience has been awesomely positive so far. I'm assuming when you say the cardinality of the SomeFactTable.NumberA column is ~8 million of the ~9 million records, that you mean they share the same value. If so, then I think a columnstore index would work great for you on it, because it compresses the data of the columns it encompasses, and ~8 million / ~9 million records would be about a 90% compression ratio.

Columnstore Indexes also enable the execution plan to utilize Batch Mode processing, which operates on multiple rows at a time, generally resulting in significantly faster execution.

Again I'm no expert but I'd recommend testing a columnstore index on just the (NumberA) column vs one on all the fields you might group on, i.e. (Timestamp, Description, NumberA).

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.