Skip to main content
26 events
when toggle format what by license comment
Jun 27, 2023 at 17:10 comment added candied_orange "My ugly car wont start" "You need to paint it" "How's that going to help?" "It wont be so ugly".
Jun 6, 2023 at 15:21 comment added JimmyJames "There is arguably a redundancy in even storing weekday number for each date, since weekday can be algorithmically determined from the date." Don't get hung up on the example, it's not literally what they are doing but a very good approximation of the 'approach'. There's an instance of one of these tables that has a PK and an int column consisting of only positive single digit ints. There are hundreds of thousands of rows with the value 3. There's no reason that the child tables couldn't have used the value 3 directly instead holding the key (e.g. 955529). This is OLTP BTW.
Jun 6, 2023 at 14:58 comment added JimmyJames This isn't helping performance in any way. It's objectively and empirically a major performance drag. Consider how this affects caching. On the client there is a huge cache (high memory use) which accomplishes almost nothing for these tables (because no rows are reused across lookups). On the DB, it causes the LRU cache to constantly churn.
Jun 6, 2023 at 14:54 comment added JimmyJames "Unless other tables in the database have a foreign key to the second table, the partitioning does not seem useful." There are foreign keys to these tables from other tables. Some of these have a dozen or more tables referencing them but AFAICT, they never share any rows. That is, if there were twelve tables using the same data values (e.g. Saturday of week), there are separate rows each usage from each table i.e. child A and child B never have an FK to the same row.
Jun 4, 2023 at 17:14 history edited JacquesB CC BY-SA 4.0
added 76 characters in body
Jun 4, 2023 at 16:39 history edited JacquesB CC BY-SA 4.0
added 313 characters in body
Jun 4, 2023 at 12:03 history edited JacquesB CC BY-SA 4.0
added 1395 characters in body
Jun 2, 2023 at 15:43 comment added JacquesB Let us continue this discussion in chat.
Jun 2, 2023 at 15:25 comment added JimmyJames I don't thing you are following. Thanks for trying.
Jun 1, 2023 at 22:09 comment added JacquesB @JimmyJames if the two tables are 1:1 then it is not a lookup table, it is a vertical partition. Lookup tables and vertical portions are independent concerns - you can have both, a vertical partition and a lookup table for the unique values. Or you can have either, or none. Perhaps you think the vertical partition was supposed to be a lookup table but just badly designed? This is of course possible but probably more likely it is deliberate use of vertical partitioning.
Jun 1, 2023 at 18:19 comment added JimmyJames I don't know what else to tell you to get you to understand that the 700K rows are in the lookup table.
Jun 1, 2023 at 17:53 comment added JimmyJames So, you think it would be normal to have 700K rows in a days of the week table with only 7 distinct rows (aside from the primary key)? Is that how you normalize data?
Jun 1, 2023 at 17:50 comment added JacquesB @JimmyJames So the be clear, it is not a question of either 700 K values or just 7 values. If you need to add information to 700 K entities, you need a column of 700 K rows either way, regardless of the size of the data type, The 7 rows in a lookup table would be in addition to that.
Jun 1, 2023 at 17:42 comment added JacquesB @JimmyJames If you need to associate a day-of-the-week with 700K entities (and there are no non-key functional dependencies otherwise), then you would need a column with 700K values. If you add a lookup table for the seven possible values, then you would need 700K + 7 rows in all.
Jun 1, 2023 at 17:36 comment added JimmyJames So 7 rows, not 700K?
Jun 1, 2023 at 17:29 comment added JacquesB @JimmyJames Let's say you have a table with a million persons, a row for each, with a column indicating the day of the week they were born. OK so you have lots of duplicates in this row, since there are only seven days of the week. Now you extract the day of the week to a separate table with seven rows, and instead use the column as foreign key to this new table. You still have exactly the same number of duplicates in that column.
Jun 1, 2023 at 17:17 comment added JimmyJames I'm sorry but have you actually read the question? "Duplicate values are not problematic as long as the data is otherwise normalized." OK, so if you normalized some columns into a reference table, how many duplicate values would you expect in the reference table? For example, let's say I have a 'days of the week' table, how many rows should it have?
Jun 1, 2023 at 17:03 comment added JacquesB @JimmyJames OK, if I understand correctly, you are not really concerned about the vertical partitioning of the table into multiple 1:1 tables, you are concerned about the duplicate values in some columns? Duplicate values are not problematic as long as the data is otherwise normalized. If you changed the duplicate values to foreign keys to a smaller lookup table you would just get duplicate foreign keys in the main table.
Jun 1, 2023 at 16:52 comment added JimmyJames I am nof conflating the issue, you are referring to something irrelevant to the situation at hand. "But if you are talking about extracting duplicate values in a column to a lookup table, that would just introduce a join." Right, which is what this is all about.
Jun 1, 2023 at 16:49 comment added JacquesB @JimmyJames: I doubt it would be faster. But I think you are conflating different issues with this question. Vertical partitioning does not increase or decrease the number of rows, it increases the number of tables. But if you are talking about extracting duplicate values in a column to a lookup table, that would just introduce a join. E.g. imagine you have a boolean column and then extract the two values to lookup-table with two rows - you would still need a FK to the lookup-table for each row in the main table. This would introduce a join you may not need.
Jun 1, 2023 at 16:29 comment added JimmyJames Can you address the question I asked? E.g.: How is joining a thousand rows to a thousand rows going to be better/faster than joining 1000 rows to 6 rows? The results will be identical.
Jun 1, 2023 at 15:58 history edited JacquesB CC BY-SA 4.0
added 99 characters in body
Jun 1, 2023 at 15:26 comment added JacquesB @JimmyJames: It is called vertical partitioning or row splitting or 6th normal form, if you want to look further into it. It can in some cases increase performance e.g if queries often scan and select from some columns significantly much more than others. But it very much depends on the particular database and workload if there is any benefit.
Jun 1, 2023 at 13:41 comment added JimmyJames Explain to me how having hundreds of millions of rows with the exact same data values improves performance for joins. Consider indexes and inserts. Why would joining the child rows to a single parent row be less performant?
Jun 1, 2023 at 6:21 history edited JacquesB CC BY-SA 4.0
added 289 characters in body
Jun 1, 2023 at 6:14 history answered JacquesB CC BY-SA 4.0