Timeline for Which columns should be indexed when all may be used in different search queries?
Current License: CC BY-SA 4.0
13 events
| when | what | action | by | license | comment |
|---|---|---|---|---|---|
| Feb 9, 2019 at 9:31 | comment | added | Visual Vincent | | Thanks for all the feedback! These discussions and my tests have given me much better insight into how MySQL databases work, and I think indexing is going to be a lot easier in the future! |
| Feb 9, 2019 at 9:25 | comment | added | Visual Vincent | | Also, I'm not limited to one index; as the others suggested, I use a composite primary key and a composite secondary index. It proved quite effective and still not that much disk cost: 398k rows took 44.5 MB indexed vs. ~12 MB not indexed. For 14.5M rows that's 1.5 GB vs. 0.5 GB, which I think is quite alright considering how much data that actually is. :) *(A sketch of this composite-key setup follows the table.)* |
| Feb 9, 2019 at 9:21 | comment | added | Visual Vincent | | I did test with a full table scan; that's how I got the results. Unfortunately, though, even in the more realistic scenarios with less data, the indexed table beat a full table scan by quite a lot, most notably when looking up all showtimes for a cinema. |
| Feb 9, 2019 at 0:10 | comment | added | jmoreno | | @VisualVincent: also, if you aren't worried about space (and my answer was indicating that you shouldn't be), you shouldn't limit yourself to just one index if you feel that you have to keep everything in one table in one file and find that performance is still not where you want it. |
| Feb 9, 2019 at 0:03 | comment | added | jmoreno | | @VisualVincent: just to be clear, my suggestion was to TEST with a full table scan. My suggestion for maximum performance was partitioning / an audit table for any data more than a week old; that should keep your typical lookup as fast as possible. P.S. I got the same ~200 ms for 1M records using SQL Server (not using any key, just the values). *(A partitioning sketch follows the table.)* |
| Feb 8, 2019 at 12:00 | comment | added | Visual Vincent | | And I tested with only 398k records in the table, meaning if I'd had 14 million it could have taken up to 7 seconds to find a single record. |
| Feb 8, 2019 at 11:51 | comment | added | Visual Vincent | | I just compared the indexing described in the first two answers with your suggestion of doing a full table scan (both with no index whatsoever and with a single, separate primary key). With indexes it took approximately 1.5 ms to find a single record and 15-17 ms to find every record for a specific cinema (~3400 records returned), whereas a full table scan took approximately 200 ms to locate a single record (with a separate primary key). *(An EXPLAIN comparison follows the table.)* |
| Feb 7, 2019 at 11:44 | comment | added | jmoreno | | @VisualVincent: I've added to the answer, explaining why you seem to be optimizing for size. As for data type size: not really. 4 billion was (and is) enough for a totally different system, a private network; when it went public it was obviously not enough. Really, the fact that it lasted so long shows how over-engineered it was (also note how forward-thinking they were: VoIP was considered). 2 billion movies obviously isn't enough for YouTube, but it is for any possible list of movies shown in a theater. Your cinema ID should really be an INT; that is an obvious place to allow growth. |
| Feb 7, 2019 at 11:21 | history | edited | jmoreno | CC BY-SA 4.0 | added 504 characters in body |
| Feb 7, 2019 at 10:28 | comment | added | Visual Vincent | | I realize I'm being rather pessimistic about this, but I highly value speed and sustainability. And compared to processing power, disk space is relatively cheap, so I'm okay with a couple of extra GBs for the indexes. -- That said, I will of course try out and measure your suggestion and compare it against the others' to see which is the fastest. |
| Feb 7, 2019 at 10:24 | comment | added | Visual Vincent | | Thank you for your answer! I'm not optimizing for size (typo?), I'm optimizing for speed. When designing a system I generally have 2-3 things in mind: speed, sustainability, and somewhat scalability. Even though I might not need an index right now, I want one for all three of those reasons; I want it to be durable in the future. For the same reasons I use large data types. I know an unsigned INT is more than enough right now, but remember when they said 4 billion IP addresses would be enough? |
| Feb 7, 2019 at 6:28 | history | edited | jmoreno | CC BY-SA 4.0 | added 427 characters in body |
| Feb 7, 2019 at 6:21 | history | answered | jmoreno | CC BY-SA 4.0 | |
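
The Feb 9 comments describe the setup Visual Vincent settled on: a composite primary key plus a composite secondary index, as suggested in the answers. A minimal MySQL sketch of that idea follows; the table and column names (`showtime`, `cinema_id`, `movie_id`, `starts_at`, `auditorium`) are illustrative assumptions, not taken from the thread.

```sql
-- Hypothetical showtimes table (names assumed for illustration).
CREATE TABLE showtime (
    cinema_id  INT UNSIGNED      NOT NULL,
    movie_id   INT UNSIGNED      NOT NULL,
    starts_at  DATETIME          NOT NULL,
    auditorium SMALLINT UNSIGNED NOT NULL,
    -- Composite clustered primary key: InnoDB stores rows in this order,
    -- so all showtimes for one cinema sit physically together.
    PRIMARY KEY (cinema_id, starts_at, movie_id, auditorium),
    -- Composite secondary index for queries that start from the movie.
    KEY idx_movie_time (movie_id, starts_at)
) ENGINE=InnoDB;
```

Because InnoDB clusters the table on the primary key, leading with `cinema_id` turns "all showtimes for a cinema" into a single range scan, which is consistent with the 15-17 ms / ~3400-row lookups reported above.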
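
jmoreno's Feb 9 comment recommends partitioning (or an audit table) so that data more than a week old stays out of the typical lookup. Here is a sketch of MySQL `RANGE` partitioning on the same hypothetical table; the weekly boundaries are assumptions chosen for illustration.

```sql
-- MySQL requires the partitioning column to appear in every unique key,
-- which is why starts_at is part of the primary key here.
CREATE TABLE showtime_partitioned (
    cinema_id INT UNSIGNED NOT NULL,
    movie_id  INT UNSIGNED NOT NULL,
    starts_at DATETIME     NOT NULL,
    PRIMARY KEY (cinema_id, starts_at, movie_id)
)
PARTITION BY RANGE (TO_DAYS(starts_at)) (
    PARTITION p2019w05 VALUES LESS THAN (TO_DAYS('2019-02-04')),
    PARTITION p2019w06 VALUES LESS THAN (TO_DAYS('2019-02-11')),
    PARTITION pmax     VALUES LESS THAN MAXVALUE
);
```

Old data can then be archived with `ALTER TABLE ... DROP PARTITION`, which is far cheaper than a bulk `DELETE`.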
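
The Feb 8 comments compare indexed lookups (~1.5 ms) against full table scans (~200 ms for 398k rows). Before timing anything, `EXPLAIN` shows which access path MySQL will pick; the queries below assume the hypothetical `showtime` table from the first sketch.

```sql
-- Uses the composite primary key (EXPLAIN reports type: ref or range).
EXPLAIN SELECT * FROM showtime WHERE cinema_id = 42;

-- No index leads on auditorium, so this is a full table scan (type: ALL).
EXPLAIN SELECT * FROM showtime WHERE auditorium = 3;
```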