
One of the issues I am encountering at present is that we have certain very large tables (>10 million rows). When we reference these large tables or create joins, queries are extremely slow.

One hypothesis for solving the issue is to create pre-computed tables, where the computation for the use cases is done in advance; instead of referencing the raw data, we would query the pre-computed table.
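
For concreteness, here is a minimal sketch of what such a pre-computed table could look like in MySQL. The table and column names (an orders fact table, a daily per-customer summary) are made up for illustration and are not our actual schema:

    -- Hypothetical raw table: orders(id, customer_id, order_date, amount)

    -- Pre-computed (summary) table: one row per customer per day
    CREATE TABLE orders_daily_summary (
        order_date    DATE          NOT NULL,
        customer_id   INT UNSIGNED  NOT NULL,
        order_count   INT UNSIGNED  NOT NULL,
        total_amount  DECIMAL(12,2) NOT NULL,
        PRIMARY KEY (order_date, customer_id)
    );

    -- Populate (or refresh) the summary from the raw data
    INSERT INTO orders_daily_summary (order_date, customer_id, order_count, total_amount)
    SELECT order_date, customer_id, COUNT(*), SUM(amount)
    FROM orders
    GROUP BY order_date, customer_id
    ON DUPLICATE KEY UPDATE
        order_count  = VALUES(order_count),
        total_amount = VALUES(total_amount);

    -- Reports would then hit the small summary table instead of the 10M+ row raw table
    SELECT order_date, SUM(total_amount) AS revenue
    FROM orders_daily_summary
    GROUP BY order_date;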

Are there any resources on how to implement this? Should we use only MySQL, or can we also use Pandas or other such modules to accomplish the same thing?

Which is the optimal way?

  • I don't use ClickHouse, but typically indexes are a good way to optimize joins. Have you considered creating indexes for the join lookups? (A sketch follows these comments.) Commented Sep 7, 2022 at 14:59
  • The problem with pre-computed tables, which are commonly called summary tables, is that you're never sure if the table needs to be re-computed. Checking it is at least as costly as doing the query against the raw data. So it's unsuitable if you need the summary table to be up to date, and your raw data changes frequently. Commented Sep 7, 2022 at 15:01
  • Agreed with Bill^. You should understand why your queries are slow, don't just assume it's because of the number of rows in your tables - that's rarely the reason. More often queries run slow with bigger tables because the queries themselves aren't designed as efficiently as they can be, or there's an architecture problem like missing indexes. Commented Sep 7, 2022 at 16:58
  • I agree with that point. The present design is not optimal and tends to cause large delays when these tables are referenced. Commented Sep 8, 2022 at 9:39
  • However, considering the scale of things, we may soon move to NoSQL or otherwise change the underlying storage architecture. Given the current state of things, we have to make do with what we have at present. Commented Sep 8, 2022 at 9:41
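
A minimal sketch of the index suggestion from the first comment, reusing the hypothetical orders schema above plus an invented order_items table; all names are illustrative:

    -- Hypothetical join that is slow because order_items.order_id has no index
    SELECT o.id, SUM(oi.quantity * oi.unit_price) AS order_total
    FROM orders AS o
    JOIN order_items AS oi ON oi.order_id = o.id
    GROUP BY o.id;

    -- An index on the join column lets MySQL look up matching rows directly
    -- instead of scanning all of order_items for every order
    ALTER TABLE order_items ADD INDEX idx_order_id (order_id);

    -- EXPLAIN shows whether the index is actually used
    EXPLAIN SELECT o.id, SUM(oi.quantity * oi.unit_price)
    FROM orders AS o
    JOIN order_items AS oi ON oi.order_id = o.id
    GROUP BY o.id;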

1 Answer


Yes.

See my blog on Summary Tables. It discusses their purpose (similar to what you describe), how to build them, some metrics on properly sizing them, etc.

Often I see upwards of a 10-fold speedup.

A well-designed Data Warehouse uses the "Fact" table only when you need to fetch individual entries, which is rare. Most queries can be done against the Summary table(s).

And, by using PARTITIONing, you can efficiently toss "old" Fact rows, while keeping the Summary data "forever". This makes disk space more manageable.
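
A minimal sketch of that pattern, assuming a hypothetical fact table partitioned by month (all names and date ranges are illustrative only):

    -- Fact table partitioned by month so old raw data can be dropped cheaply
    CREATE TABLE fact_orders (
        id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
        order_date DATE NOT NULL,
        amount     DECIMAL(12,2) NOT NULL,
        PRIMARY KEY (id, order_date)   -- partitioning column must be in every unique key
    )
    PARTITION BY RANGE (TO_DAYS(order_date)) (
        PARTITION p2022_07 VALUES LESS THAN (TO_DAYS('2022-08-01')),
        PARTITION p2022_08 VALUES LESS THAN (TO_DAYS('2022-09-01')),
        PARTITION p2022_09 VALUES LESS THAN (TO_DAYS('2022-10-01')),
        PARTITION pfuture  VALUES LESS THAN MAXVALUE
    );

    -- Tossing a month of raw Fact rows is a quick partition drop, not a huge DELETE;
    -- the Summary table built from those rows is kept untouched
    ALTER TABLE fact_orders DROP PARTITION p2022_07;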

It is usually good to heavily 'normalize' the Fact table, saving disk space. Meanwhile, the Summary tables can be 'denormalized', improving speed.
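
As a rough illustration of that split (the schema is hypothetical): the Fact table stores compact ids, while the Summary table repeats the human-readable values so reports need no joins at all:

    -- Normalized Fact table: compact foreign keys keep the 10M+ rows small
    CREATE TABLE fact_sales (
        product_id  INT UNSIGNED      NOT NULL,  -- FK into a products lookup table
        store_id    SMALLINT UNSIGNED NOT NULL,  -- FK into a stores lookup table
        sale_date   DATE              NOT NULL,
        amount      DECIMAL(10,2)     NOT NULL
    );

    -- Denormalized Summary table: readable names repeated per row
    CREATE TABLE sales_daily_summary (
        sale_date     DATE          NOT NULL,
        product_name  VARCHAR(100)  NOT NULL,
        store_name    VARCHAR(100)  NOT NULL,
        total_amount  DECIMAL(12,2) NOT NULL,
        sale_count    INT UNSIGNED  NOT NULL,
        PRIMARY KEY (sale_date, product_name, store_name)
    );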

If you want more specifics, please divulge more info.

  • Thank you. Can you shed more light with some examples of the MySQL queries that would be run to extract summary tables out of large tables (>10 million rows)? Commented Sep 8, 2022 at 10:28
  • @databasequestion - I would need to see SHOW CREATE TABLE for the existing tables. And some hints of what the "reports" need to present. Commented Sep 9, 2022 at 23:36
