
I have a CSV file with 3 columns, but one of these columns includes commas that break the CSV format. My CSV looks like this:

```
id,name,score
1,Black,1
2, Brown,J,0
```

I want to copy only the second column to a different CSV file. My code looks like this:

```python
for row in inpTweets:
    total_score = 0
    name = row[1]
    writer.writerow([row[1], total_score])
```

Is there an appropriate way to handle this format so that I get the whole name field using Python?

  • Are you producing this CSV file, or are you trying to read this broken CSV file? If you are just reading it: well, it's broken, and it's ambiguous where a value ends and where a column ends. For some very limited use cases there may be enough information to fix it, but we can't be sure without a lot more information about the issue. Commented Nov 27, 2015 at 11:53
  • I am reading it. So what kind of info must be added? Commented Nov 27, 2015 at 11:57
  • There is no general fix for broken CSV formats. To come up with a specific fix for your specific data, we need to know exactly what your data looks like. Is the above sample representative? Could each line be described as an integer, followed by some string, ending with an integer? Is it guaranteed that no additional commas appear in any other column? Commented Nov 27, 2015 at 12:29
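If the answers to those last questions are yes (integer id first, integer score last, commas only ever inside the name), then the first comma and the last comma on each line are the only reliable separators. The sketch below assumes exactly that; `split_line` and `extract_names` are hypothetical helper names, and `io.StringIO` stands in for the real input and output files. Unlike re-joining fields afterwards, this keeps the name text exactly as it appears in the file.

```python
import csv
import io

def split_line(line):
    """Split one 'id,name,score' line where only the name may contain commas."""
    row_id, rest = line.rstrip("\r\n").split(",", 1)  # cut at the first comma
    name, score = rest.rsplit(",", 1)                 # cut at the last comma
    return row_id.strip(), name.strip(), score.strip()

def extract_names(src, dst):
    """Copy the name column from src to dst, writing a 0 score next to each name."""
    next(src, None)  # skip the "id,name,score" header line
    # QUOTE_NONNUMERIC quotes the name strings and leaves the integer 0 bare
    writer = csv.writer(dst, quoting=csv.QUOTE_NONNUMERIC)
    for line in src:
        if not line.strip():
            continue  # ignore blank lines
        _, name, _ = split_line(line)
        writer.writerow([name, 0])

# In-memory stand-ins for broken.csv and output.csv
src = io.StringIO("id,name,score\n1,Black,1\n2, Brown,J,0\n")
dst = io.StringIO()
extract_names(src, dst)
print(dst.getvalue())
```

With real files, `src` and `dst` could be objects returned by `open(..., newline='')`. Note that `split_line` raises `ValueError` on any line with fewer than two commas, which is a reasonable signal that the assumption about the data no longer holds.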

1 Answer


Because your source CSV file is malformed, you will get a different number of elements when the CSV reader splits the various lines. For example,

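```python
import csv

# Python 3: open the file in text mode with newline='' when using the csv module
with open(r'C:\Users\Gord\Desktop\broken.csv', newline='') as csv_in:
    inpTweets = csv.reader(csv_in, skipinitialspace=True)
    header_row = True
    for row in inpTweets:
        if header_row:
            header_row = False  # skip the "id,name,score" header
        else:
            print(row)
```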

will print

```
['1', 'Black', '1']
['2', 'Brown', 'J', '0']
```

Notice that the first list contains three (3) elements and the second list contains four (4) elements.
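Because the damage shows up as an unexpected element count, the same observation can be used to locate the broken rows before deciding how to repair them. A minimal sketch, assuming the file should always have three columns; `find_malformed` and `EXPECTED_COLUMNS` are hypothetical names, and `io.StringIO` stands in for the real file:

```python
import csv
import io

EXPECTED_COLUMNS = 3  # id, name, score

def find_malformed(lines):
    """Yield (line_number, row) for every row whose field count is not EXPECTED_COLUMNS."""
    reader = csv.reader(lines, skipinitialspace=True)
    for row in reader:
        if len(row) != EXPECTED_COLUMNS:
            # reader.line_num counts source lines read so far
            yield reader.line_num, row

data = io.StringIO("id,name,score\n1,Black,1\n2, Brown,J,0\n")
for line_num, row in find_malformed(data):
    print(line_num, row)  # 3 ['2', 'Brown', 'J', '0']
```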

If we know that

  • the source file is supposed to contain only three columns, and
  • the first and last columns are "id" and "score"

then we can "glue" the second column back together from the intermediate elements in the list, i.e.,

```
row[1] + ', ' + row[2] + ', ' + ... + row[len(row) - 2]
```

That can be done with a list comprehension over range(1, len(row) - 1) ...

```python
[row[x] for x in range(1, len(row) - 1)]
```

... which we can then pass to ', '.join() to "glue" the individual elements back into a string:

```python
', '.join([row[x] for x in range(1, len(row) - 1)])
```
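For comparison, the slice `row[1:-1]` selects the same middle elements as the list comprehension over `range(1, len(row) - 1)`. A small sketch wrapping it in a hypothetical helper, `glue_name`, applied to the two rows printed earlier:

```python
def glue_name(row):
    """Rejoin every field between the first (id) and the last (score) one."""
    # row[1:-1] is equivalent to [row[x] for x in range(1, len(row) - 1)]
    return ', '.join(row[1:-1])

print(glue_name(['1', 'Black', '1']))       # Black
print(glue_name(['2', 'Brown', 'J', '0']))  # Brown, J
```

One caveat of either form: the separator is always ', ', so the original spacing around the extra comma is not preserved.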

The final code would look something like this:

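```python
import csv

# Python 3: open both files in text mode with newline='' when using the csv module
with open(r'C:\Users\Gord\Desktop\broken.csv', newline='') as csv_in:
    inpTweets = csv.reader(csv_in, skipinitialspace=True)
    with open(r'C:\Users\Gord\Desktop\output.csv', 'w', newline='') as csv_out:
        # QUOTE_NONNUMERIC quotes the names but leaves the 0 score unquoted
        writer = csv.writer(csv_out, quoting=csv.QUOTE_NONNUMERIC)
        header_row = True
        for row in inpTweets:
            if header_row:
                header_row = False  # skip the header line
            else:
                # glue every field between id and score back into one name
                out_row = [', '.join([row[x] for x in range(1, len(row) - 1)]), 0]
                writer.writerow(out_row)
```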

and the resulting output CSV file would be

```
"Black",0
"Brown, J",0
```