
I want to compare two CSV files and write the differences to a file. I currently use the code below to remove a row. Can I change this code so that it compares two CSV files, or is there a better way in C# to compare CSV files?

    List<string> lines = new List<string>();
    using (StreamReader reader = new StreamReader(System.IO.File.OpenRead(path)))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Contains(csvseperator))
            {
                string[] split = line.Split(Convert.ToChar(scheidingsteken));
                if (split[selectedRow] == value)
                {
                }
                else
                {
                    line = string.Join(csvseperator, split);
                    lines.Add(line);
                }
            }
        }
    }
    using (StreamWriter writer = new StreamWriter(path, false))
    {
        foreach (string line in lines)
            writer.WriteLine(line);
    }
    }
  • If you want to find out which lines were added, deleted, and changed, please have a look at edit distance: en.wikipedia.org/wiki/Edit_distance Commented Oct 11, 2017 at 12:48
  • I can't use that. Commented Oct 11, 2017 at 13:07
  • Why are you so sad? Why can't you use it? The simplest edit distance (Levenshtein) is easy to implement: en.wikipedia.org/wiki/Levenshtein_distance Commented Oct 11, 2017 at 13:09
  • You really shouldn't use empty if blocks in your code. Changing the condition solves this issue. Commented Oct 11, 2017 at 13:15
  • What do you want your program to output when two CSV files contain exactly the same data, but in a different order? Also, do records need to match 100%? Or is 1,Pete,2 equal to 1,"Pete",2? Commented Oct 11, 2017 at 13:30

2 Answers


Here is another way to find differences between CSV files, using Cinchoo ETL, an open source library.

For the sample CSV files below:

sample1.csv

    id,name
    1,Tom
    2,Mark
    3,Angie

sample2.csv

    id,name
    1,Tom
    2,Mark
    4,Lu

METHOD 1:

Using Cinchoo ETL, the code below finds the rows that differ, comparing all columns:

    var input1 = new ChoCSVReader("sample1.csv").WithFirstLineHeader().ToArray();
    var input2 = new ChoCSVReader("sample2.csv").WithFirstLineHeader().ToArray();

    using (var output = new ChoCSVWriter("sampleDiff.csv").WithFirstLineHeader())
    {
        // Rows that appear only in sample1.csv
        output.Write(input1.OfType<ChoDynamicObject>().Except(input2.OfType<ChoDynamicObject>(), ChoDynamicObjectEqualityComparer.Default));
        // Rows that appear only in sample2.csv
        output.Write(input2.OfType<ChoDynamicObject>().Except(input1.OfType<ChoDynamicObject>(), ChoDynamicObjectEqualityComparer.Default));
    }

sampleDiff.csv

    id,name
    3,Angie
    4,Lu

Sample fiddle: https://dotnetfiddle.net/nwLeJ2
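
For comparison, the same all-columns difference can be sketched with plain LINQ and no extra library. This is only a minimal sketch: the names `CsvRowDiff`, `SymmetricDifference`, and `WriteDiff` are hypothetical, rows are compared as raw text (so `1,Pete` and `1,"Pete"` count as different), both files are assumed to share the same header line, and `Except` is set-based, so duplicate rows collapse into one.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class CsvRowDiff
{
    // Returns the header followed by the rows that occur in only one input.
    // Assumption: the first line of each input is the (identical) header.
    public static List<string> SymmetricDifference(IEnumerable<string> first, IEnumerable<string> second)
    {
        List<string> a = first.ToList();
        List<string> b = second.ToList();

        var result = new List<string>();
        if (a.Count > 0)
        {
            result.Add(a[0]); // keep the header of the first input
        }

        List<string> rowsA = a.Skip(1).ToList();
        List<string> rowsB = b.Skip(1).ToList();

        result.AddRange(rowsA.Except(rowsB)); // rows only in the first input
        result.AddRange(rowsB.Except(rowsA)); // rows only in the second input
        return result;
    }

    // Hypothetical convenience wrapper that reads two files and writes the result.
    public static void WriteDiff(string path1, string path2, string outputPath)
    {
        File.WriteAllLines(outputPath, SymmetricDifference(File.ReadLines(path1), File.ReadLines(path2)));
    }
}
```

With the two sample files above, `CsvRowDiff.WriteDiff("sample1.csv", "sample2.csv", "sampleDiff.csv")` would write the header followed by `3,Angie` and `4,Lu`. Because the comparison ignores position, rows that merely appear in a different order are not reported.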

METHOD 2:

If you want to compare the rows by the id column only:

    var input1 = new ChoCSVReader("sample1.csv").WithFirstLineHeader().ToArray();
    var input2 = new ChoCSVReader("sample2.csv").WithFirstLineHeader().ToArray();

    using (var output = new ChoCSVWriter("sampleDiff.csv").WithFirstLineHeader())
    {
        // Rows whose id appears only in sample1.csv
        output.Write(input1.OfType<ChoDynamicObject>().Except(input2.OfType<ChoDynamicObject>(), new ChoDynamicObjectEqualityComparer(new string[] { "id" })));
        // Rows whose id appears only in sample2.csv
        output.Write(input2.OfType<ChoDynamicObject>().Except(input1.OfType<ChoDynamicObject>(), new ChoDynamicObjectEqualityComparer(new string[] { "id" })));
    }

Sample fiddle: https://dotnetfiddle.net/t6mmJW
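
If a status column is wanted for the id-based comparison (for example, rows deleted from or new relative to a master file), one library-free sketch follows. It is not part of Cinchoo ETL: the names `CsvKeyDiff` and `Compare` are hypothetical, the "changed" status is an extra assumption, header lines are assumed to be stripped beforehand, and every id is assumed to be unique within a file (a duplicate id makes `ToDictionary` throw).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CsvKeyDiff
{
    // Compares data rows (header already removed) by the value in keyColumn
    // and returns each differing row with a status appended as a last column.
    public static List<string> Compare(
        IEnumerable<string> masterRows,
        IEnumerable<string> detailRows,
        char separator = ',',
        int keyColumn = 0)
    {
        Func<string, string> keyOf = row => row.Split(separator)[keyColumn];

        List<string> master = masterRows.ToList();
        List<string> detail = detailRows.ToList();

        // Lookup tables from id to the full row text.
        Dictionary<string, string> masterById = master.ToDictionary(keyOf);
        Dictionary<string, string> detailById = detail.ToDictionary(keyOf);

        var result = new List<string>();

        foreach (string row in master)
        {
            string other;
            if (!detailById.TryGetValue(keyOf(row), out other))
            {
                result.Add(row + separator + "deleted");   // id is missing from detail
            }
            else if (other != row)
            {
                result.Add(other + separator + "changed"); // same id, different content
            }
        }

        foreach (string row in detail)
        {
            if (!masterById.ContainsKey(keyOf(row)))
            {
                result.Add(row + separator + "new");       // id is missing from master
            }
        }

        return result;
    }
}
```

For the sample data above (without the header lines), this yields `3,Angie,deleted` followed by `4,Lu,new`. Splitting on the separator does not handle quoted fields that contain the separator; a CSV parser would be needed for that case.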


4 Comments

This is a great tool. Thank you. In my case I have a "master" CSV file and a "detail" CSV file. How can I do the same as above, where CSV 1 is the master and CSV 2 is the detail, to output a file saying record "3" has been "deleted" and record "4" is new? Obviously this needs a new column to show the status. Could you please add an example to your answer? Oh, and my CSVs have a unique ID column like your samples. Appreciated.
@Chinchoo Where does the result file get stored?
It can be routed to a file, a stream, etc.

If you only want to compare one column, you can use this code:

    List<string> lines = new List<string>();
    List<string> lines2 = new List<string>();

    // The using statements close both files even if an exception is thrown.
    using (StreamReader reader = new StreamReader(System.IO.File.OpenRead(pad)))
    using (StreamReader read = new StreamReader(System.IO.File.OpenRead(pad2)))
    {
        string line;
        string line2;

        // Zero-based indexes of the cells you want to compare (1 = second column)
        int comp1 = 1;
        int comp2 = 1;

        // Stops as soon as either file runs out of lines
        while ((line = reader.ReadLine()) != null && (line2 = read.ReadLine()) != null)
        {
            if (line.Contains(seperator) && line2.Contains(seperator))
            {
                string[] split = line.Split(Convert.ToChar(seperator));
                string[] split2 = line2.Split(Convert.ToChar(seperator));

                if (split[comp1] != split2[comp2])
                {
                    // It is not the same
                }
                else
                {
                    // It is the same
                }
            }
        }
    }

2 Comments

This only checks the 2nd column of each line, and ignores lines if one CSV contains more lines than the other.
How can I fix this?
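
One possible way to address both points is sketched below, still comparing line by line: the column index becomes a parameter, and lines that exist in only one file are reported instead of skipped. The names `CsvColumnCompare` and `Compare` and the report wording are hypothetical, and the sketch assumes the rows of both files are in the same order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CsvColumnCompare
{
    // Compares one column of two CSV inputs position by position and
    // returns a human-readable report of every difference found.
    public static List<string> Compare(
        IList<string> first,
        IList<string> second,
        char separator = ',',
        int column = 1)
    {
        var report = new List<string>();
        int max = Math.Max(first.Count, second.Count);

        for (int i = 0; i < max; i++)
        {
            int lineNumber = i + 1;

            // One input is shorter: report the extra line instead of ignoring it.
            if (i >= first.Count)
            {
                report.Add($"line {lineNumber}: only in second file: {second[i]}");
                continue;
            }
            if (i >= second.Count)
            {
                report.Add($"line {lineNumber}: only in first file: {first[i]}");
                continue;
            }

            string[] a = first[i].Split(separator);
            string[] b = second[i].Split(separator);

            // Guard against rows that do not have the requested column.
            if (a.Length <= column || b.Length <= column)
            {
                report.Add($"line {lineNumber}: column {column} is missing");
                continue;
            }

            if (a[column] != b[column])
            {
                report.Add($"line {lineNumber}: '{a[column]}' != '{b[column]}'");
            }
        }

        return report;
    }
}
```

It could be called as `CsvColumnCompare.Compare(File.ReadAllLines(pad), File.ReadAllLines(pad2))` (with `using System.IO;`), and the returned report written out with `File.WriteAllLines`. If the rows may appear in a different order, a key-based comparison is a better fit than this positional one.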
