4

Title says it all. If the CSV's system delimiter was " (as opposed to a comma or pipe or other common alternatives), how would anything deal with it?

The crux of the matter is of course that by definition, CSV will surround any values containing the delimiter with quotation marks, and will convert all quotation marks to double quotation marks.

Would the result be parse-able?

(Inspired by an answer in Most common "Y2K-style" bugs today?)

System Delimiter (which drives excel, databases, etc)

5
  • 5
    Seeing as CSV stans for Comma Seperated Values, you'd have to change your spelling of "quotation mark" to "cuotation mark" :P ...or change your file extension to ".qsv"... Commented Mar 30, 2011 at 15:13
  • While CSV traditionally stands for comma separated values, the standard for the format allows for any separator to be used. There is a windows system variable in the registry used for controlling this. It's often programatically manipulated and changed with language settings. Commented Mar 30, 2011 at 19:35
  • I think the general name for such files is "Character-delimited values". Commented Mar 30, 2011 at 19:41
  • The comma-separated values (CSV) file format is a set of file formats used to store tabular data in which numbers and text are stored in plain textual form that can be read in a text editor. Lines in the text file represent rows of a table, and delimiters in a line separate what are fields in the tables row. In general, the choice of delimiter being a comma is a common assumption, not a standard. No system that uses semicolons or pipes as the standard delimiter adopts a different extension. In any event, this is just a epistemological problem. The practical use is what is in question now. Commented Mar 30, 2011 at 19:45
  • Interesting screenshot, I've never seen that dialogue before. I don't know if that applies to all CSV files on your system, or how Windows decides to display lists on the UI. Commented Mar 30, 2011 at 20:01

5 Answers 5

4

Answer: It Breaks the system

I altered my system settings to test this problem out: Altered System Settings

I found out that Microsoft does not know how to handle this.

My original data was:

Original Spreadsheet

After I saved the data, it produced the following ambiguous data file:

This "This"122,342.23""Test""quote" Is"Is"231,123.42""""quote""test" A"A"234,234.23""""something" Test"Test"234.34""something""" 

Sure enough, when I tried to open the file back up, it had screwed it up:

Reloaded Data

This shows that the CSV standard fails in the case that the chosen delimiter is a quotation mark and the actual data contains quotation marks. This means the windows operating system should probably disable the user from selecting this as a quotation mark, or change the CSV standard so that in the sole event that the quotation mark is chosen as the delimiter, it uses replaces the escape character (normally a quotation mark) with some other character.

2
  • IOW, Excel always interprets the " here as a delimiter instead of a quote. Commented Mar 31, 2011 at 0:02
  • 1
    I think the best approach here is "Localize data. Don't localize file formats." Commented Mar 31, 2011 at 0:04
1

You have to consider the actual system implementation. CSV is just a basic standard. If its coming out of Excel, a custom system, or some Linux editor the actual mileage may vary.

That being said, since you are a programmer I assume the system is something you have source code for.

"3\"4\"" 

The problem is obvious. The code is hard for a human being to read. Standard CSV

"3,4" 

is much easier.

What I would do is change the delimiter. If existing output exists, write a script to find and replace \" with , (or another acceptable delimiter that does not affect the data)

2
  • As far as I know, backslash is not used as an escape character for delimiters in the CSV standard. Commented Mar 30, 2011 at 19:34
  • @Alain - It may or may not be part of a published recognized std like of IEEE. I dont think it matters. Like any standard, its just a guide that the implementation may or may not follow. Just look at any browser, they all have HTML and js, but they cant agree on any one way to follow a standard. The answer chosen is also just MS's way to do it. By no means is that a catch all answer. Commented Apr 7, 2011 at 1:58
0

Why not?

The only problem would be if you wrote a parser using a regex and didn't properly escape the search char

0

The only thing you really need to consider is how often you're going to find the character you use as a delimiter in your data fields. I'd worry a bit about using double quotes, simply because double quotes are often used in conjunction with the regular delimiter (e.g. "A","B","C","D","ETC").

0

There would be no difference. You are still using some character to delimit each field and that character would need to be escaped when it occurs in the data. Choosing what that character is should be based on the following:

  • The character is not likely to occur frequently in the data (Reduce overhead)
  • The character should be easy to parse out (Make the job of the person writing the parser easier. If the character has other well defined uses in the context of text manipulation libraries, it leaves room for errors.)
1
  • The CSV standard implemented on the windows system uses the quotation mark to escape the delimiter appearing in the data - therein lies the difference. There's nothing in the CSV standard that accounts for the case where the quotation mark (used to escape the delimiter) IS the delimiter. Commented Mar 30, 2011 at 19:39

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.