8

Possible Duplicate:
Dealing with commas in a CSV file

I wrote myself a CSV parser it works fine until I hit this record: B002VECGTG,B002VECGTG,HAS_17131_spaceshooter,"4,426",0.04%,"4,832",0.03%,0%,1,0.02%,$20.47 ,1 The escaped , in "4,426" and in "4,426" brake my parser.

This is what I am using to parse the line of text:

 char[] comma = { ',' }; string[] words = line.Split(comma); 

How do I prevent my program from breaking?

1
  • 1
    @kd7 That solution uses a library to parse the file, here I am trying to do it with split. Commented Sep 19, 2011 at 18:31

3 Answers 3

13

You can't just split on comma. To implement a proper parser for that case, you need to loop through the string yourself, keeping track of whether you are inside quotes or not. If you are inside a quoted string, you should keep on until you find another quote.

IEnumerable<string> LineSplitter(string line) { int fieldStart = 0; for(int i = 0; i < line.Length; i++) { if(line[i] == ',') { yield return line.SubString(fieldStart, i - fieldStart); fieldStart = i + 1; } if(line[i] == '"') for(i++; line[i] != '"'; i++) {} } } 
Sign up to request clarification or add additional context in comments.

4 Comments

This method loves chopping up words.
I've submitted an edit for the out-by-one error in the SubString, and the missing curly braces on the second for loop. Additionally, this doesn't removed the quotes from quoted fields and has different behaviour from split when called on ""
This example drops the last item out of the list.
IEnumerable<string> LineSplitter(string line) { int fieldStart = 0; for (int i = 0; i < line.Length; i++) { if (line[i] == ',') { yield return line.Substring(fieldStart, i - fieldStart); fieldStart = i + 1; } if (line[i] == '"') for (i++; line[i] != '"'; i++) { } } yield return line.Substring(fieldStart, line.Length - fieldStart); }
5

I suggest using a CSV parser instead of trying to parse by yourself.

There are some nuances to parsing CSV correctly, as you have already found out.

There are many third party ones (and several of these are free), and even one built into the Visual Basic namespace - the TextFieldParser in the Microsoft.VisualBasic.FileIO namespace.

3 Comments

+1 for the TextFieldParser - why is it hidden in the VB namespace?
@Anders - Good question... It is a really useful class. Though, for large CSV files, it is apparently slow.
Love the TextFieldParser here too! And also relevant to this answer for using a third-party utility, check out the FileHelpers library at filehelpers.com (LGPL licensing) for something not only very easy to use, but also as a great example to study and learn from if desired...
1

It is possible to use a Regex:

List<List<String>> rows = new List<List<String>>(); MatchCollection matches = Regex.Matches(input, @"^(?:(?:\s*""(?<value>[^""]*)""\s*|(?<value>[^,]*)),)*?(?:\s*""(?>value>[^""]*)""\s*|(?<value>[^,]*))$", RegexOptions.Multiline); foreach(Match row in matches) { List<String> values = new List<String>(); foreach(Capture value in row.Groups["value"].Captures) { values.Add(value.Value); } rows.Add(values); } 

I do not suggest that it is the best solution, but for small files (a couple of rows) it probably isn't too bad.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.