Timeline for Which columns should be indexed when all may be used in different search queries?
Current License: CC BY-SA 4.0
13 events
| when | what | action | by | license | comment |
|---|---|---|---|---|---|
| Feb 9, 2019 at 9:31 | comment | added | Visual Vincent | | Thanks for all the feedback! These discussions and my tests have given me much better insight into how MySQL databases work, and I think indexing is going to be a lot easier in the future! |
| Feb 9, 2019 at 9:25 | comment | added | Visual Vincent | | Also, I'm not limited to one index; as the others suggested, I use a composite primary key and a composite secondary index. It proved quite effective and still not that much disk cost: 398k rows took 44.5 MB indexed vs. ~12 MB not indexed. For 14.5M rows that's 1.5 GB vs. 0.5 GB, which I think is quite alright considering how much data that actually is. :) *(A sketch of this composite-key setup follows the table.)* |
| Feb 9, 2019 at 9:21 | comment | added | Visual Vincent | | I did test with a full table scan; that's how I got the results. Unfortunately, though, even in the more realistic scenarios with less data, the indexed table beat a full table scan by quite a lot, most notably when looking up all showtimes for a cinema. |
| Feb 9, 2019 at 0:10 | comment | added | jmoreno | | @VisualVincent: also, if you aren't worried about space (and my answer was indicating that you shouldn't be), you shouldn't limit yourself to just one index if you feel that you have to keep everything in one table in one file and find that performance is still not where you want it. |
| Feb 9, 2019 at 0:03 | comment | added | jmoreno | | @VisualVincent: just to be clear, my suggestion was to TEST with a full table scan. My suggestion for maximum performance was partitioning / an audit table for any data more than a week old; that should keep your typical lookup as fast as possible. P.S. I got the same ~200 ms for 1M records using SQL Server (not using any key, just the values). *(A partitioning sketch follows the table.)* |
| Feb 8, 2019 at 12:00 | comment | added | Visual Vincent | | And I tested with only 398k records in the table, meaning if I'd had 14 million it could have taken up to 7 seconds to find a single record. |
| Feb 8, 2019 at 11:51 | comment | added | Visual Vincent | | I just compared the indexing described in the first two answers with your suggestion of doing a full table scan (both with no index whatsoever and with a single, separate primary key). With indexes it took approximately 1.5 ms to find a single record and 15-17 ms to find every record for a specific cinema (~3400 records returned), whereas a full table scan took approximately 200 ms to locate a single record (with a separate primary key). *(An EXPLAIN comparison follows the table.)* |
| Feb 7, 2019 at 11:44 | comment | added | jmoreno | | @VisualVincent: I've added to the answer, explaining why you seem to be optimizing for size. As for data type size: not really. 4 billion was (and is) enough for a totally different system, a private network; when it went public it was obviously not enough. Really, the fact that it lasted so long shows how over-engineered it was (also note how forward-thinking they were: VoIP was considered). 2 billion movies obviously isn't enough for YouTube, but it is for any possible list of movies shown in a theater. Your cinema ID should really be an INT; that is an obvious place to allow growth. |
| Feb 7, 2019 at 11:21 | history | edited | jmoreno | CC BY-SA 4.0 | added 504 characters in body |
| Feb 7, 2019 at 10:28 | comment | added | Visual Vincent | | I realize I'm being rather pessimistic about this, but I highly value speed and sustainability. And compared to processing power, disk space is relatively cheap, so I'm okay with a couple of extra GBs for the indexes. -- That said, I will of course try out and measure your suggestion and compare it against the others' to see which is the fastest. |
| Feb 7, 2019 at 10:24 | comment | added | Visual Vincent | | Thank you for your answer! I'm not optimizing for size (typo?), I'm optimizing for speed. When designing a system I generally have 2-3 things in mind: speed, sustainability, and somewhat scalability. Even though I might not need an index right now, I want one for all three of those reasons; I want it to be durable in the future. For the same reasons I use large data types. I know an unsigned INT is more than enough right now, but remember when they said 4 billion IP addresses would be enough? |
| Feb 7, 2019 at 6:28 | history | edited | jmoreno | CC BY-SA 4.0 | added 427 characters in body |
| Feb 7, 2019 at 6:21 | history | answered | jmoreno | CC BY-SA 4.0 | |
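
The Feb 9 comments describe the setup Visual Vincent settled on: a composite primary key plus a composite secondary index, as suggested in the answers. A minimal MySQL sketch of that idea follows; the table and column names (`showtime`, `cinema_id`, `movie_id`, `starts_at`, `auditorium`) are illustrative assumptions, not taken from the thread.

```sql
-- Hypothetical showtimes table (names assumed for illustration).
CREATE TABLE showtime (
    cinema_id  INT UNSIGNED      NOT NULL,
    movie_id   INT UNSIGNED      NOT NULL,
    starts_at  DATETIME          NOT NULL,
    auditorium SMALLINT UNSIGNED NOT NULL,
    -- Composite clustered primary key: InnoDB stores rows in this order,
    -- so all showtimes for one cinema sit physically together.
    PRIMARY KEY (cinema_id, starts_at, movie_id, auditorium),
    -- Composite secondary index for queries that start from the movie.
    KEY idx_movie_time (movie_id, starts_at)
) ENGINE=InnoDB;
```

Because InnoDB clusters the table on the primary key, leading with `cinema_id` turns "all showtimes for a cinema" into a single range scan, which is consistent with the 15-17 ms / ~3400-row lookups reported above.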
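
jmoreno's Feb 9 comment recommends partitioning (or an audit table) so that data more than a week old stays out of the typical lookup. Here is a sketch of MySQL `RANGE` partitioning on the same hypothetical table; the weekly boundaries are assumptions chosen for illustration.

```sql
-- MySQL requires the partitioning column to appear in every unique key,
-- which is why starts_at is part of the primary key here.
CREATE TABLE showtime_partitioned (
    cinema_id INT UNSIGNED NOT NULL,
    movie_id  INT UNSIGNED NOT NULL,
    starts_at DATETIME     NOT NULL,
    PRIMARY KEY (cinema_id, starts_at, movie_id)
)
PARTITION BY RANGE (TO_DAYS(starts_at)) (
    PARTITION p2019w05 VALUES LESS THAN (TO_DAYS('2019-02-04')),
    PARTITION p2019w06 VALUES LESS THAN (TO_DAYS('2019-02-11')),
    PARTITION pmax     VALUES LESS THAN MAXVALUE
);
```

Old data can then be archived with `ALTER TABLE ... DROP PARTITION`, which is far cheaper than a bulk `DELETE`.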
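
The Feb 8 comments compare indexed lookups (~1.5 ms) against full table scans (~200 ms for 398k rows). Before timing anything, `EXPLAIN` shows which access path MySQL will pick; the queries below assume the hypothetical `showtime` table from the first sketch.

```sql
-- Uses the composite primary key (EXPLAIN reports type: ref or range).
EXPLAIN SELECT * FROM showtime WHERE cinema_id = 42;

-- No index leads on auditorium, so this is a full table scan (type: ALL).
EXPLAIN SELECT * FROM showtime WHERE auditorium = 3;
```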