Skip to main content
added 770 characters in body
Source Link
JimmyJames
  • 31.1k
  • 3
  • 59
  • 111

Background: I'm doing some analysis on a complex database defined by a 3rd-party. I keep finding an odd pattern (in the general sense) to the way that the parent-child relationships are populated with data. From

I thought of a schema perspective, the design is pretty orthodoxconcrete example of this style:

--------------- |main| date  |   --------  ------- --------------- | PK |ref  | | day_of_week | | PKdate  | --------------- | FK day_of_week | --> | PK | | data | | data | | data ...  | | ...number  | --------------- | name | | weekend |  --------------- 

All well and goodOK. JustLooks like a basicrelatively typical normalized structure. But when I look at the data in the tables, I find something puzzling:

------------------------------ ----------------------------- | main  date  | | ref day_of_week  | -----------------------------| ----------------------------- | PK | FKdate  | dow | data | | PK | datanum | name | wkend | ------------------------------ ----------------------------- | 1 | 23-05-28 | 1  | bcd | | 1 | 1231 | sun | true  | | 2 | 23-05-29 | 2  | cde | | 2 | 1232 | mon | false | | 3 | 23-05-30 | 3  | def | | 3 | 3453 | tue | false | | 4 | 23-05-31 | 4  | efg | | 4 | 3454  | wed | false | | 5 | 23-06-01 | 5  | fgh | | 5 | 1235 | thurs | false | | 6 | 23-06-02 | 6  | ghi | | 6 | 1236 | fri | false | | 7 | 23-06-03 | 7  | hij | | 7 | 1237  | sat | true | | 8 | 23-06-04 | 8  | ijk | | 8 | 1231 | sun | true  | | 9  | 23-06-05 | 9  | jkl | | 9 | 1232 | mon  | false | | 10 | 23-06-06 | 10  | klm | | 10 | 1233 | tue  | false | | 11 | 23-06-07 | 11  | lmn | | 11 | 1234  | wed | false | | 12 | 23-06-08 | 12  | mno | | 12 | 1235 | thurs | false  | | 13 | 23-06-09 | 13  | nop | | 13 | 3456 | fri  | false | | 14 | 23-06-10 | 14  | opq | | 14 | 4567  | sat | true | | 15 | 23-06-11 | 15  | pqr | | 15 | 1231  | sun | true | |... |...  | |... |  |... | ... | ...  | ...  | 

SuchIn some cases that I've dug into a small set of unique data values in the reference table can make up than half of the rows (or more) in the reference table. The repeated data values are literally identical. I am seeing this in many places in the DB. Sometimes there are multiple tables pointing to the reference table and again, there's a separate entry for each dependent row regardless of whether that reference data already exists.

Background: I'm doing some analysis on a complex database defined by a 3rd-party. I keep finding an odd pattern (in the general sense) to the way that the parent-child relationships are populated with data. From a schema perspective, the design is pretty orthodox:

-------- |main | --------  -------- |ref | | PK | -------- | FK  | --> | PK | | data | | data | | ... | | ... | -------- -------- 

All well and good. Just a basic normalized structure. But when I look at the data in the tables, I find something puzzling:

------------------ ------------- | main | | ref | -----------------| ------------- | PK | FK | data | | PK | data | ------------------ ------------- | 1 | 1 | bcd | | 1 | 123 | | 2 | 2 | cde | | 2 | 123 | | 3 | 3 | def | | 3 | 345 | | 4 | 4 | efg | | 4 | 345 | | 5 | 5 | fgh | | 5 | 123 | | 6 | 6 | ghi | | 6 | 123 | | 7 | 7 | hij | | 7 | 123 | | 8 | 8 | ijk | | 8 | 123 | | 9 | 9 | jkl | | 9 | 123 | | 10 | 10 | klm | | 10 | 123 | | 11 | 11 | lmn | | 11 | 123 | | 12 | 12 | mno | | 12 | 123 | | 13 | 13 | nop | | 13 | 345 | | 14 | 14 | opq | | 14 | 456 | | 15 | 15 | pqr | | 15 | 123 | |... |... |... | |... | ... | 

Such that a small set of unique data values in the reference table can make up than half of the rows (or more) in the reference table. The repeated data values are literally identical. I am seeing this in many places in the DB. Sometimes there are multiple tables pointing to the reference table and again, there's a separate entry for each dependent row regardless of whether that reference data already exists.

Background: I'm doing some analysis on a complex database defined by a 3rd-party. I keep finding an odd pattern (in the general sense) to the way that the parent-child relationships are populated with data.

I thought of a concrete example of this style:

--------------- | date  |   --------------- --------------- | PK   | | day_of_week | | date  | --------------- | day_of_week | --> | PK | | data ...  | | number  | --------------- | name | | weekend |  --------------- 

OK. Looks like a relatively typical normalized structure. But when I look at the data in the tables, I find something puzzling:

------------------------------ ----------------------------- |   date  | | day_of_week  | -----------------------------| ----------------------------- | PK | date  | dow | data | | PK | num | name | wkend | ------------------------------ ----------------------------- | 1 | 23-05-28 | 1  | bcd | | 1 | 1 | sun | true  | | 2 | 23-05-29 | 2  | cde | | 2 | 2 | mon | false | | 3 | 23-05-30 | 3  | def | | 3 | 3 | tue | false | | 4 | 23-05-31 | 4  | efg | | 4 | 4  | wed | false | | 5 | 23-06-01 | 5  | fgh | | 5 | 5 | thurs | false | | 6 | 23-06-02 | 6  | ghi | | 6 | 6 | fri | false | | 7 | 23-06-03 | 7  | hij | | 7 | 7  | sat | true | | 8 | 23-06-04 | 8  | ijk | | 8 | 1 | sun | true  | | 9  | 23-06-05 | 9  | jkl | | 9 | 2 | mon  | false | | 10 | 23-06-06 | 10  | klm | | 10 | 3 | tue  | false | | 11 | 23-06-07 | 11  | lmn | | 11 | 4  | wed | false | | 12 | 23-06-08 | 12  | mno | | 12 | 5 | thurs | false  | | 13 | 23-06-09 | 13  | nop | | 13 | 6 | fri  | false | | 14 | 23-06-10 | 14  | opq | | 14 | 7  | sat | true | | 15 | 23-06-11 | 15  | pqr | | 15 | 1  | sun | true | |... |...  | |... |  |... | ... | ...  | ...  | 

In some cases that I've dug into a small set of unique data values in the reference table can make up than half of the rows (or more) in the reference table. The repeated data values are literally identical. I am seeing this in many places in the DB. Sometimes there are multiple tables pointing to the reference table and again, there's a separate entry for each dependent row regardless of whether that reference data already exists.

added 199 characters in body
Source Link
JimmyJames
  • 31.1k
  • 3
  • 59
  • 111

NOTE: Please don't respond by telling me that I probably don't understand what I am looking at. You can't possibly know that and it's wrong. Just don't answer if that's all you have to say.

I'm looking for a term akin to the "N+1 problem" but for a different issue. I know that the term 'anti-pattern' may raise some hackles but I'm pretty certain that the approach described below is either an implementation error or comes from some fundamental misunderstandings about database normalization. If anyone can explain any legitimate reason for doing this, I'd love to read it.

I'm looking for a term akin to the "N+1 problem" but for a different issue. I know that the term 'anti-pattern' may raise some hackles but I'm pretty certain that the approach described below is either an implementation error or comes from some fundamental misunderstandings about database normalization. If anyone can explain any legitimate reason for doing this, I'd love to read it.

NOTE: Please don't respond by telling me that I probably don't understand what I am looking at. You can't possibly know that and it's wrong. Just don't answer if that's all you have to say.

I'm looking for a term akin to the "N+1 problem" but for a different issue. I know that the term 'anti-pattern' may raise some hackles but I'm pretty certain that the approach described below is either an implementation error or comes from some fundamental misunderstandings about database normalization. If anyone can explain any legitimate reason for doing this, I'd love to read it.

edited body
Source Link
JimmyJames
  • 31.1k
  • 3
  • 59
  • 111
------------------ ------------- | main | | parent ref  | -----------------| ------------- | PK | FK | data | | PK | data | ------------------ ------------- | 1 | 1 | bcd | | 1 | 123 | | 2 | 2 | cde | | 2 | 123 | | 3 | 3 | def | | 3 | 345 | | 4 | 4 | efg | | 4 | 345 | | 5 | 5 | fgh | | 5 | 123 | | 6 | 6 | ghi | | 6 | 123 | | 7 | 7 | hij | | 7 | 123 | | 8 | 8 | ijk | | 8 | 123 | | 9 | 9 | jkl | | 9 | 123 | | 10 | 10 | klm | | 10 | 123 | | 11 | 11 | lmn | | 11 | 123 | | 12 | 12 | mno | | 12 | 123 | | 13 | 13 | nop | | 13 | 345 | | 14 | 14 | opq | | 14 | 456 | | 15 | 15 | pqr | | 15 | 123 | |... |... |... | |... | ... | 
------------------ ------------- | main | | parent | -----------------| ------------- | PK | FK | data | | PK | data | ------------------ ------------- | 1 | 1 | bcd | | 1 | 123 | | 2 | 2 | cde | | 2 | 123 | | 3 | 3 | def | | 3 | 345 | | 4 | 4 | efg | | 4 | 345 | | 5 | 5 | fgh | | 5 | 123 | | 6 | 6 | ghi | | 6 | 123 | | 7 | 7 | hij | | 7 | 123 | | 8 | 8 | ijk | | 8 | 123 | | 9 | 9 | jkl | | 9 | 123 | | 10 | 10 | klm | | 10 | 123 | | 11 | 11 | lmn | | 11 | 123 | | 12 | 12 | mno | | 12 | 123 | | 13 | 13 | nop | | 13 | 345 | | 14 | 14 | opq | | 14 | 456 | | 15 | 15 | pqr | | 15 | 123 | |... |... |... | |... | ... | 
------------------ ------------- | main | |  ref  | -----------------| ------------- | PK | FK | data | | PK | data | ------------------ ------------- | 1 | 1 | bcd | | 1 | 123 | | 2 | 2 | cde | | 2 | 123 | | 3 | 3 | def | | 3 | 345 | | 4 | 4 | efg | | 4 | 345 | | 5 | 5 | fgh | | 5 | 123 | | 6 | 6 | ghi | | 6 | 123 | | 7 | 7 | hij | | 7 | 123 | | 8 | 8 | ijk | | 8 | 123 | | 9 | 9 | jkl | | 9 | 123 | | 10 | 10 | klm | | 10 | 123 | | 11 | 11 | lmn | | 11 | 123 | | 12 | 12 | mno | | 12 | 123 | | 13 | 13 | nop | | 13 | 345 | | 14 | 14 | opq | | 14 | 456 | | 15 | 15 | pqr | | 15 | 123 | |... |... |... | |... | ... | 
added 79 characters in body
Source Link
JimmyJames
  • 31.1k
  • 3
  • 59
  • 111
Loading
Source Link
JimmyJames
  • 31.1k
  • 3
  • 59
  • 111
Loading