Skip to main content
Tweeted twitter.com/StackDBAs/status/1099549574014881792
added info from comments into the question
Source Link
Solomon Rutzky
  • 70.4k
  • 8
  • 162
  • 309

The example I am working with is an invoice data model, where there may be around 30 columns by the time we're done designing. Each row of data would be unique to a single invoice.

Some of the columns are used by multiple parts of the application (invoice number, obviously, charge totals, etc.) but there are some columns that are only used by a single process.

For example, we have three columns that reflect control numbers (varchar(18)) used by external clients that we should track, but only the tracking process ever looks at these columns. The control numbers are usually one to one to the invoice, so there would only ever be one control number per invoice. There are also some invoices that will just not have a control number because they're older invoices being imported in from a system that didn't keep track of control numbers (they'll represent about 25% of the initial data). Even those older invoices may eventually get a control number (although most likely they're old enough that won't happen).

Does it make more sense to take these columns and make a separate table for them, from a data modeling perspective or a performance perspective, or, should we just leave them in the Invoices table? Answers for this specific example is appreciated, but I'm also curious about a more general answer, since we'll certainly come across this sort of scenario again.

Based on how I read this Wikipedia article, I do intend to mean vertical partitioning, at least in a logical sense (the two different tables would be on the same physical storage in this case).

This is part of an OLTP database.

Probably 75% of the rows will have a control number eventually.

The example I am working with is an invoice data model, where there may be around 30 columns by the time we're done designing. Each row of data would be unique to a single invoice.

Some of the columns are used by multiple parts of the application (invoice number, obviously, charge totals, etc.) but there are some columns that are only used by a single process.

For example, we have three columns that reflect control numbers (varchar(18)) used by external clients that we should track, but only the tracking process ever looks at these columns. The control numbers are usually one to one to the invoice, so there would only ever be one control number per invoice. There are also some invoices that will just not have a control number because they're older invoices being imported in from a system that didn't keep track of control numbers (they'll represent about 25% of the initial data). Even those older invoices may eventually get a control number (although most likely they're old enough that won't happen).

Does it make more sense to take these columns and make a separate table for them, from a data modeling perspective or a performance perspective, or, should we just leave them in the Invoices table? Answers for this specific example is appreciated, but I'm also curious about a more general answer, since we'll certainly come across this sort of scenario again.

Based on how I read this Wikipedia article, I do intend to mean vertical partitioning, at least in a logical sense (the two different tables would be on the same physical storage in this case).

The example I am working with is an invoice data model, where there may be around 30 columns by the time we're done designing. Each row of data would be unique to a single invoice.

Some of the columns are used by multiple parts of the application (invoice number, obviously, charge totals, etc.) but there are some columns that are only used by a single process.

For example, we have three columns that reflect control numbers (varchar(18)) used by external clients that we should track, but only the tracking process ever looks at these columns. The control numbers are usually one to one to the invoice, so there would only ever be one control number per invoice. There are also some invoices that will just not have a control number because they're older invoices being imported in from a system that didn't keep track of control numbers (they'll represent about 25% of the initial data). Even those older invoices may eventually get a control number (although most likely they're old enough that won't happen).

Does it make more sense to take these columns and make a separate table for them, from a data modeling perspective or a performance perspective, or, should we just leave them in the Invoices table? Answers for this specific example is appreciated, but I'm also curious about a more general answer, since we'll certainly come across this sort of scenario again.

Based on how I read this Wikipedia article, I do intend to mean vertical partitioning, at least in a logical sense (the two different tables would be on the same physical storage in this case).

This is part of an OLTP database.

Probably 75% of the rows will have a control number eventually.

added 350 characters in body
Source Link

The example I am working with is an invoice data model, where there may be around 30 columns by the time we're done designing. Each row of data would be unique to a single invoice.

Some of the columns are used by multiple parts of the application (invoice number, obviously, charge totals, etc.) but there are some columns that are only used by a single process.

For example, we have three columns that reflect control numbers (varchar(18)) used by external clients that we should track, but only the tracking process ever looks at these columns. The control numbers are alwaysusually one to one to the invoice, so there would only ever be one control number per invoice. There are also some invoices that will just not have a control number because they're older invoices being imported in from a system that didn't keep track of control numbers (they'll represent about 25% of the initial data). Even those older invoices may eventually get a control number (although most likely they're old enough that won't happen).

Does it make more sense to take these columns and make a separate table for them, from a data modeling perspective or a performance perspective, or, should we just leave them in the Invoices table? Answers for this specific example is appreciated, but I'm also curious about a more general answer, since we'll certainly come across this sort of scenario again.

Based on how I read this Wikipedia article, I do intend to mean vertical partitioning, at least in a logical sense (the two different tables would be on the same physical storage in this case).

The example I am working with is an invoice data model, where there may be around 30 columns by the time we're done designing. Each row of data would be unique to a single invoice.

Some of the columns are used by multiple parts of the application (invoice number, obviously, charge totals, etc.) but there are some columns that are only used by a single process.

For example, we have three columns that reflect control numbers (varchar(18)) used by external clients that we should track, but only the tracking process ever looks at these columns. The control numbers are always one to one to the invoice, so there would only ever be one control number per invoice.

Does it make more sense to take these columns and make a separate table for them, from a data modeling perspective or a performance perspective, or, should we just leave them in the Invoices table? Answers for this specific example is appreciated, but I'm also curious about a more general answer, since we'll certainly come across this sort of scenario again.

Based on how I read this Wikipedia article, I do intend to mean vertical partitioning, at least in a logical sense (the two different tables would be on the same physical storage in this case).

The example I am working with is an invoice data model, where there may be around 30 columns by the time we're done designing. Each row of data would be unique to a single invoice.

Some of the columns are used by multiple parts of the application (invoice number, obviously, charge totals, etc.) but there are some columns that are only used by a single process.

For example, we have three columns that reflect control numbers (varchar(18)) used by external clients that we should track, but only the tracking process ever looks at these columns. The control numbers are usually one to one to the invoice, so there would only ever be one control number per invoice. There are also some invoices that will just not have a control number because they're older invoices being imported in from a system that didn't keep track of control numbers (they'll represent about 25% of the initial data). Even those older invoices may eventually get a control number (although most likely they're old enough that won't happen).

Does it make more sense to take these columns and make a separate table for them, from a data modeling perspective or a performance perspective, or, should we just leave them in the Invoices table? Answers for this specific example is appreciated, but I'm also curious about a more general answer, since we'll certainly come across this sort of scenario again.

Based on how I read this Wikipedia article, I do intend to mean vertical partitioning, at least in a logical sense (the two different tables would be on the same physical storage in this case).

Incorporated comments; edited tags
Source Link
Paul White
  • 96.1k
  • 30
  • 442
  • 691

The example I am working with is an invoice data model, where there may be around 30 columns by the time we're done designing. Each row of data would be unique to a single invoice.

Some of the columns are used by multiple parts of the application (invoice number, obviously, charge totals, etc.) but there are some columns that are only used by a single process. 

For example, we have three columns that reflect control numbers (varchar(18)) used by external clients that we should track, but only the tracking process ever looks at these columns. The control numbers are always one to one to the invoice, so there would only ever be one control number per invoice.

Does it make more sense to take these columns and make a separate table for them, from a data modeling perspective or a performance perspective, or, should we just leave them in the Invoices table? Answers for this specific example is appreciated, but I'm also curious about a more general answer, since we'll certainly come across this sort of scenario again.

Based on how I read this Wikipedia article, I do intend to mean vertical partitioning, at least in a logical sense (the two different tables would be on the same physical storage in this case).

The example I am working with is an invoice data model, where there may be around 30 columns by the time we're done designing. Each row of data would be unique to a single invoice.

Some of the columns are used by multiple parts of the application (invoice number, obviously, charge totals, etc.) but there are some columns that are only used by a single process. For example, we have three columns that reflect control numbers (varchar(18)) used by external clients that we should track, but only the tracking process ever looks at these columns. The control numbers are always one to one to the invoice, so there would only ever be one control number per invoice.

Does it make more sense to take these columns and make a separate table for them, from a data modeling perspective or a performance perspective, or, should we just leave them in the Invoices table? Answers for this specific example is appreciated, but I'm also curious about a more general answer, since we'll certainly come across this sort of scenario again.

The example I am working with is an invoice data model, where there may be around 30 columns by the time we're done designing. Each row of data would be unique to a single invoice.

Some of the columns are used by multiple parts of the application (invoice number, obviously, charge totals, etc.) but there are some columns that are only used by a single process. 

For example, we have three columns that reflect control numbers (varchar(18)) used by external clients that we should track, but only the tracking process ever looks at these columns. The control numbers are always one to one to the invoice, so there would only ever be one control number per invoice.

Does it make more sense to take these columns and make a separate table for them, from a data modeling perspective or a performance perspective, or, should we just leave them in the Invoices table? Answers for this specific example is appreciated, but I'm also curious about a more general answer, since we'll certainly come across this sort of scenario again.

Based on how I read this Wikipedia article, I do intend to mean vertical partitioning, at least in a logical sense (the two different tables would be on the same physical storage in this case).

Source Link
Loading