
edited Jul 13, 2024 at 14:28


If the values in the CSV do not exactly conform to the agreed-upon format, then you should fail fast and investigate the issue. It might be an error in the specification or a bug in the software on either end, but in any case you want to find out immediately. You do not want to process invalid data and potentially gloss over bugs or data corruption.
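
As a minimal sketch of "fail fast" using Python's standard `csv` module: each field is compared exactly against the allowed values, and the first mismatch raises an error instead of being trimmed or case-folded. The values `BUY`/`SELL`, the `CsvFormatError` class and the `parse_column` helper are hypothetical placeholders; the real set of values comes from whatever spec you agreed on.

```python
import csv
import io

# Hypothetical allowed values; substitute whatever the agreed spec defines.
ALLOWED_VALUES = {"BUY", "SELL"}


class CsvFormatError(ValueError):
    """Raised when a record does not exactly match the agreed format."""


def parse_column(text: str, column: int = 0) -> list[str]:
    """Return the values of `column`, failing on the first record that does not conform."""
    values = []
    for record_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if column >= len(row):
            raise CsvFormatError(f"record {record_no}: missing column {column}")
        value = row[column]
        # Exact comparison: no trimming, no case folding, no guessing.
        if value not in ALLOWED_VALUES:
            raise CsvFormatError(f"record {record_no}: unexpected value {value!r}")
        values.append(value)
    return values


if __name__ == "__main__":
    print(parse_column("BUY\nSELL\n"))  # ['BUY', 'SELL']
    try:
        parse_column("BUY\nsell\n")
    except CsvFormatError as err:
        print(err)  # record 2: unexpected value 'sell'
```

The error names the offending record, which is what you want when you go back to the producer of the file to find out whether the spec or their software is wrong.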

There are plenty of cases where the meaning of a word depends on its case; the classic example is "Turkey" (a country) versus "turkey" (a large bird). In your particular case, with just two values, ignoring case might seem reasonably safe, but it is still an unnecessary risk that provides no benefit, and as a general rule it is a very bad idea.

There might also be other systems that process the same CSV file but treat case discrepancies differently, leading to inconsistent transactions. You absolutely want to investigate such issues immediately rather than sweep them under the rug.

Of course, if it is specifically agreed that the format is case-insensitive, then converting to upper case is fine, but that is not a question of robustness, just of following the spec. (Case insensitivity is a can of worms once you go outside the ASCII character range, though, so it is not necessarily a good default.)
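
To illustrate the "can of worms" outside ASCII, here is a small Python sketch. Python's `str.upper()`/`str.lower()` apply Unicode's full, locale-independent case mappings, so a single character can become two, and upper-casing followed by lower-casing does not always give back the original. The two helper functions are hypothetical, written only for this demonstration.

```python
def case_mapping_changes_length(s: str) -> bool:
    """True if upper- or lower-casing s produces a string of a different length."""
    return len(s.upper()) != len(s) or len(s.lower()) != len(s)


def upper_lower_roundtrips(s: str) -> bool:
    """True if s.upper().lower() gives back s.lower() unchanged."""
    return s.upper().lower() == s.lower()


SAMPLES = [
    "turkey",  # plain ASCII: behaves as expected
    "straße",  # German sharp s upper-cases to "SS"
    "İ",       # U+0130 (Turkish dotted capital I) lower-cases to "i" + U+0307
    "ﬁ",       # U+FB01 ligature upper-cases to "FI"
]

if __name__ == "__main__":
    for s in SAMPLES:
        print(f"{s!r}: upper={s.upper()!r} lower={s.lower()!r} "
              f"length changes={case_mapping_changes_length(s)} "
              f"round-trips={upper_lower_roundtrips(s)}")
```

Because these methods ignore locale, they also cannot know that Turkish text expects "I" to lower-case to dotless "ı"; "I".lower() is always "i". Whether "straße" and "STRASSE" should count as equal is exactly the kind of question a spec has to answer explicitly.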

The question of whitespace is pretty cut-and-dried. The RFC for CSV (RFC 4180) states that "Spaces are considered part of a field and should not be ignored." So unless you have specifically agreed with the producer of the CSV that whitespace should be ignored, you should not trim fields.
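
For what it's worth, Python's standard `csv` module follows this by default: `skipinitialspace` is `False`, so spaces around a delimiter stay in the field. The sketch below parses with those defaults and, instead of trimming, reports fields that carry leading or trailing whitespace so the discrepancy can be raised with the producer. The helper names are hypothetical.

```python
import csv
import io


def read_rows(text: str) -> list[list[str]]:
    """Parse CSV text with the csv module's defaults, which keep surrounding spaces."""
    # skipinitialspace defaults to False, so " SELL " stays " SELL ".
    return list(csv.reader(io.StringIO(text)))


def whitespace_issues(rows: list[list[str]]) -> list[tuple[int, int, str]]:
    """List (record, field, value) for fields with leading/trailing whitespace.

    These are reported rather than trimmed, so the producer can be asked about them.
    """
    return [
        (r, c, value)
        for r, row in enumerate(rows, start=1)
        for c, value in enumerate(row, start=1)
        if value != value.strip()
    ]


if __name__ == "__main__":
    rows = read_rows("BUY, SELL \n")
    print(rows)                     # [['BUY', ' SELL ']]
    print(whitespace_issues(rows))  # [(1, 2, ' SELL ')]
```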

If you are dealing with end-user input (e.g. someone typing into a search box), it might be a good idea to trim, perform case-insensitive matching, ignore diacritics, accept a certain degree of misspelling, etc. But this is a completely different scenario from machine-to-machine communication like a CSV file.
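
For contrast, a lenient normalization for human-typed search input might look roughly like the Python sketch below: trim, strip diacritics via NFKD decomposition, `casefold()`, and tolerate small typos with the standard library's `difflib`. The function names and the 0.8 similarity cutoff are arbitrary choices for illustration, and `difflib`'s ratio is only a simple heuristic, not a real search ranking.

```python
import difflib
import unicodedata
from typing import Optional


def normalize_query(text: str) -> str:
    """Lenient normalization for human-typed search input (not for CSV fields)."""
    text = text.strip()
    # NFKD decomposes accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Dropping the combining marks ignores diacritics: "Café" becomes "Cafe".
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more aggressive lower() intended for caseless matching.
    return without_marks.casefold()


def best_match(query: str, candidates: list[str], cutoff: float = 0.8) -> Optional[str]:
    """Return the candidate closest to query, tolerating small misspellings, or None."""
    by_key = {normalize_query(c): c for c in candidates}
    hits = difflib.get_close_matches(normalize_query(query), list(by_key), n=1, cutoff=cutoff)
    return by_key[hits[0]] if hits else None


if __name__ == "__main__":
    print(normalize_query("  Café "))                 # cafe
    print(best_match("turky", ["Turkey", "Greece"]))  # Turkey
```

Note that this normalization makes "Turkey" and "turkey" identical, which is helpful in a search box and exactly what you do not want when reading fields from a CSV file.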

edited Jul 13, 2024 at 12:53

edited Jul 13, 2024 at 12:46

answered Jul 13, 2024 at 12:34
