A single CSV column includes commas - Python

Question

I have a CSV file with 3 columns, but one of the columns includes commas that break the CSV format. My CSV is as below:

    id,name,score
    1,Black,1
    2, Brown,J,0

I want to copy only the second column into a different CSV file. My code looks like below:

    for row in inpTweets:
        total_score = 0
        name = row[1]
        writer.writerow([row[1], total_score])

Is there an appropriate way to handle this format so that I can select the whole name field using Python?

Are you producing this CSV file, or are you trying to read this broken CSV file? If just reading: well, it's broken and it's ambiguous where the value ends and where the column ends. For some very limited use cases it may be possible to have enough information to fix it, but we can't be sure without a lot more information on the issue.
– deceze ♦, Commented Nov 27, 2015 at 11:53
There is no general fix for broken CSV formats. To come up with a specific fix for your specific data we need to know what your data looks like exactly. Is the above sample representative? Could it be expressed as an integer, followed by some string, ending with an integer? Is it guaranteed that no additional commas are in any other columns?
– deceze ♦, Commented Nov 27, 2015 at 12:29

Gord Thompson · Accepted Answer · 2015-11-27 13:40:13Z

Because your source CSV file is malformed, you will get a different number of elements when the CSV reader splits the various lines. For example,

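    import csv

    # 'rb' is the Python 2 convention for the csv module;
    # on Python 3, open the file in text mode with newline='' instead
    with open(r'C:\Users\Gord\Desktop\broken.csv', 'rb') as csv_in:
        inpTweets = csv.reader(csv_in, skipinitialspace=True)
        header_row = True
        for row in inpTweets:
            if header_row:
                header_row = False  # skip the header line
            else:
                print(row)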

will print

    ['1', 'Black', '1']
    ['2', 'Brown', 'J', '0']

Notice that the first list contains three (3) elements and the second list contains four (4) elements.
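
One way to see which lines are affected is to compare each row's field count against the expected three. The following is only a minimal sketch, not part of the original answer: it reads the sample data from an in-memory string rather than a file, and `EXPECTED_FIELDS` and `find_malformed_rows` are hypothetical names chosen for illustration.

```python
import csv
import io

# Sample data from the question, held in memory instead of a file.
SAMPLE = "id,name,score\n1,Black,1\n2, Brown,J,0\n"

EXPECTED_FIELDS = 3  # assumption: a well-formed row is id, name, score


def find_malformed_rows(text, expected=EXPECTED_FIELDS):
    """Return (line_number, row) pairs whose field count differs from expected."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    next(reader)  # skip the header row
    return [
        (reader.line_num, row)
        for row in reader
        if len(row) != expected
    ]


malformed = find_malformed_rows(SAMPLE)
print(malformed)
```

For the sample data this reports only the "Brown, J" line, which is the one that splits into four elements.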

If we know that

the source file is supposed to contain only three columns, and
the first and last columns are "id" and "score"

then we can "glue" the second column back together from the intermediate elements in the list, i.e.,

row[1] + ', ' + row[2] + ... + row[n-1]

That can be done with a list comprehension over range(1, len(row) - 1) ...

[row[x] for x in range(1, len(row) - 1)]

... which we can then pass to ', '.join() to "glue" the individual elements back into a string:

', '.join([row[x] for x in range(1, len(row) - 1)])
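
As a side note, the same intermediate elements can also be selected with a slice, `row[1:-1]`, which should be equivalent to the list comprehension over `range(1, len(row) - 1)`. A small sketch, where `glue_middle` is a hypothetical helper name used only for illustration:

```python
def glue_middle(row, sep=', '):
    """Join everything between the first and last element of row."""
    # row[1:-1] drops the first element (id) and the last element (score)
    return sep.join(row[1:-1])


print(glue_middle(['1', 'Black', '1']))
print(glue_middle(['2', 'Brown', 'J', '0']))
```

The slice form avoids the explicit index arithmetic but otherwise does the same thing.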

The final code would look something like this:

    import csv

    # Python 2 file modes ('rb'/'wb'); on Python 3 use text mode with newline=''
    with open(r'C:\Users\Gord\Desktop\broken.csv', 'rb') as csv_in:
        inpTweets = csv.reader(csv_in, skipinitialspace=True)
        with open(r'C:\Users\Gord\Desktop\output.csv', 'wb') as csv_out:
            writer = csv.writer(csv_out, quoting=csv.QUOTE_NONNUMERIC)
            header_row = True
            for row in inpTweets:
                if header_row:
                    header_row = False  # skip the header line
                else:
                    # re-join everything between the id and the score into one name field
                    out_row = [', '.join([row[x] for x in range(1, len(row) - 1)]), 0]
                    writer.writerow(out_row)

and the resulting output CSV file would be

    "Black",0
    "Brown, J",0

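The answer's code opens the files with 'rb' and 'wb', which is the Python 2 convention for the csv module. On Python 3 the csv module works with text-mode files opened with newline=''. The following is a rough Python 3 sketch of the same idea, under a few assumptions that are not in the original: it uses in-memory `io.StringIO` objects in place of real files, `extract_names` is a hypothetical function name, and rows with fewer than three fields are simply skipped.

```python
import csv
import io


def extract_names(src, dst):
    """Copy the re-joined name column from src to dst, with a score of 0."""
    reader = csv.reader(src, skipinitialspace=True)
    writer = csv.writer(dst, quoting=csv.QUOTE_NONNUMERIC)
    next(reader, None)  # skip the header row, if there is one
    for row in reader:
        if len(row) < 3:
            continue  # assumption: rows with too few fields are skipped
        # everything between the id (first) and the score (last) is the name
        writer.writerow([', '.join(row[1:-1]), 0])


# In-memory stand-ins for the input and output files.
src = io.StringIO("id,name,score\n1,Black,1\n2, Brown,J,0\n")
dst = io.StringIO()
extract_names(src, dst)
print(dst.getvalue())
```

With real files, both would be opened in text mode, for example `open(path, newline='')` for reading and `open(path, 'w', newline='')` for writing, where `path` stands for whatever file locations apply.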