Background: I'm doing some analysis on a complex database defined by a third party. I keep finding an odd pattern (in the general sense) in the way the parent-child relationships are populated with data. From a schema perspective, the design is pretty orthodox. Here is a concrete example of this style:

    +------------------+
    | date             |
    +------------------+      +-------------+
    | PK               |      | day_of_week |
    | date             |      +-------------+
    | FK day_of_week   | ---> | PK          |
    | data ...         |      | number      |
    +------------------+      | name        |
                              | weekend     |
                              +-------------+

All well and good. It looks like a relatively typical normalized structure. But when I look at the data in the tables, I find something puzzling:

    date (main)                       day_of_week (reference)
    +----+----------+-----+------+    +----+-----+-------+-------+
    | PK | date     | dow | data |    | PK | num | name  | wkend |
    +----+----------+-----+------+    +----+-----+-------+-------+
    | 1  | 23-05-28 | 1   | bcd  |    | 1  | 1   | sun   | true  |
    | 2  | 23-05-29 | 2   | cde  |    | 2  | 2   | mon   | false |
    | 3  | 23-05-30 | 3   | def  |    | 3  | 3   | tue   | false |
    | 4  | 23-05-31 | 4   | efg  |    | 4  | 4   | wed   | false |
    | 5  | 23-06-01 | 5   | fgh  |    | 5  | 5   | thurs | false |
    | 6  | 23-06-02 | 6   | ghi  |    | 6  | 6   | fri   | false |
    | 7  | 23-06-03 | 7   | hij  |    | 7  | 7   | sat   | true  |
    | 8  | 23-06-04 | 8   | ijk  |    | 8  | 1   | sun   | true  |
    | 9  | 23-06-05 | 9   | jkl  |    | 9  | 2   | mon   | false |
    | 10 | 23-06-06 | 10  | klm  |    | 10 | 3   | tue   | false |
    | 11 | 23-06-07 | 11  | lmn  |    | 11 | 4   | wed   | false |
    | 12 | 23-06-08 | 12  | mno  |    | 12 | 5   | thurs | false |
    | 13 | 23-06-09 | 13  | nop  |    | 13 | 6   | fri   | false |
    | 14 | 23-06-10 | 14  | opq  |    | 14 | 7   | sat   | true  |
    | 15 | 23-06-11 | 15  | pqr  |    | 15 | 1   | sun   | true  |
    | ...| ...      | ... | ...  |    | ...| ... | ...   | ...   |
    +----+----------+-----+------+    +----+-----+-------+-------+

In some cases that I've dug into, a small set of unique data values makes up half of the rows in the reference table, or more. The repeated data values are literally identical. I am seeing this in many places in the DB. Sometimes multiple tables point to the same reference table, and again there is a separate entry for each dependent row, regardless of whether identical reference data already exists.
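If it helps to quantify how widespread this is, the pattern can be measured with plain GROUP BY queries. Below is a minimal sketch in Python using the standard-library `sqlite3` module: it loads the 15 example rows above into an in-memory database and checks (1) total versus distinct reference rows, (2) which value combinations are stored more than once, and (3) whether any reference row is shared by more than one parent row. The table and column names (`dates`, `day_of_week`, `num`, `weekend`) are hypothetical stand-ins for the third-party schema, and the real database's SQL dialect may differ.

```python
import sqlite3
import string
from datetime import date, timedelta

DAY_NAMES = ["sun", "mon", "tue", "wed", "thurs", "fri", "sat"]


def build_example() -> sqlite3.Connection:
    """Load the 15-row example into an in-memory SQLite database.

    Table and column names are hypothetical stand-ins for the real schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE day_of_week (
            id      INTEGER PRIMARY KEY,
            num     INTEGER NOT NULL,
            name    TEXT    NOT NULL,
            weekend INTEGER NOT NULL  -- SQLite has no BOOLEAN: 1 = true, 0 = false
        );
        CREATE TABLE dates (
            id   INTEGER PRIMARY KEY,
            date TEXT NOT NULL,
            dow  INTEGER NOT NULL REFERENCES day_of_week(id),
            data TEXT
        );
        """
    )
    start = date(2023, 5, 28)  # a Sunday
    for i in range(15):
        idx = i % 7
        # Mirror the observed pattern: every parent row gets its own freshly
        # inserted reference row, even when identical values already exist.
        conn.execute(
            "INSERT INTO day_of_week VALUES (?, ?, ?, ?)",
            (i + 1, idx + 1, DAY_NAMES[idx], int(idx in (0, 6))),
        )
        conn.execute(
            "INSERT INTO dates VALUES (?, ?, ?, ?)",
            (i + 1, (start + timedelta(days=i)).isoformat(), i + 1,
             string.ascii_lowercase[i + 1:i + 4]),
        )
    return conn


def ref_counts(conn):
    """Return (total reference rows, distinct (num, name, weekend) combinations)."""
    total = conn.execute("SELECT COUNT(*) FROM day_of_week").fetchone()[0]
    distinct = conn.execute(
        "SELECT COUNT(*) FROM (SELECT DISTINCT num, name, weekend FROM day_of_week)"
    ).fetchone()[0]
    return total, distinct


def duplicate_groups(conn):
    """Every value combination stored more than once, most-copied first."""
    return conn.execute(
        """
        SELECT num, name, weekend, COUNT(*) AS copies
        FROM day_of_week
        GROUP BY num, name, weekend
        HAVING COUNT(*) > 1
        ORDER BY copies DESC, num
        """
    ).fetchall()


def shared_references(conn):
    """Reference rows pointed to by more than one parent row."""
    return conn.execute(
        "SELECT dow, COUNT(*) FROM dates GROUP BY dow HAVING COUNT(*) > 1"
    ).fetchall()


if __name__ == "__main__":
    conn = build_example()
    print(ref_counts(conn))           # (15, 7)
    print(duplicate_groups(conn)[0])  # (1, 'sun', 1, 3)
    print(shared_references(conn))    # []
```

On the example data this reports 15 reference rows but only 7 distinct combinations, and no reference row is used by more than one parent row, which is the one-reference-row-per-dependent-row pattern described above. Against the real database, the same three queries (adapted to its table and column names) would show how much of each reference table is duplicated.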