I have a large CSV file that has floats, ints and strings in it. Using only the csv module, I need to determine the data type of each column, then perform calculations like mode, mean, etc. on the columns with numbers in them. So far I have this:

    import csv

    file = open('adult.csv')
    reader = csv.reader(file)
    filename = 'output.xml'
    text = open(filename, 'w')
    text.write('<?xml version="1.0"?>')
    text.write('<!DOCTYPE summary [')

    def getType2(value):
        try:
            float(value)
            if "." in value:
                print 'float',
                return 'float'
            else:
                print 'int',
                return 'int'
        except ValueError:
            print 'str',

    headers = reader.next()
    length = len(headers)
    print length
    i = 0
    while i < length:
        print '<name>',
        print headers[i],
        print '</name>'
        print '<type>',
        value = reader.next()
        if getType2(value[i]) == 'int':
            data = []
            total = 0
            for row in reader:
                data.append(float(row[i]))
                total += int(row[i])
            print total
        print '</type>\n\n'
        print
        i = i + 1
    print '<!ELEMENT summary\n\n>'

This correctly determines the data type and correctly processes the first column, but then I get an index error and it will not move on to the next column.
I'm pretty sure I am doing this in an extremely convoluted way; there must be an easier way to deal with this problem.
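
For comparison, here is a rough sketch of a simpler shape this could take, not a definitive solution. A `csv.reader` can only be iterated once, so the inner `for row in reader` loop uses it up while handling the first column. The sketch instead reads every row into a list a single time and then works column by column. It assumes the file fits in memory, that the first row holds the headers, and that a column is numeric only if every value in it parses as a number. The names `infer_type` and `summarize` are made up for illustration, and the code is written for Python 3 (the code above is Python 2).

```python
import csv
from collections import Counter


def infer_type(value):
    """Classify one cell as 'int', 'float' or 'str' (hypothetical helper)."""
    try:
        int(value)
        return 'int'
    except ValueError:
        pass
    try:
        float(value)
        return 'float'
    except ValueError:
        return 'str'


def summarize(lines):
    """Return one dict per column with its name, type, mode and (if numeric) mean.

    `lines` is any iterable of CSV text lines, e.g. an open file object.
    """
    reader = csv.reader(lines)
    headers = next(reader)
    # Read the remaining rows exactly once; skip blank lines.
    rows = [row for row in reader if row]

    summary = []
    for i, name in enumerate(headers):
        column = [row[i].strip() for row in rows]
        kinds = set(infer_type(v) for v in column)
        # Assumption: one non-numeric value makes the whole column 'str'.
        if 'str' in kinds:
            kind = 'str'
        elif 'float' in kinds:
            kind = 'float'
        else:
            kind = 'int'

        info = {'name': name, 'type': kind}
        if column:
            # Mode is the most frequent raw value, kept as a string.
            info['mode'] = Counter(column).most_common(1)[0][0]
            if kind != 'str':
                numbers = [float(v) for v in column]
                info['mean'] = sum(numbers) / len(numbers)
        summary.append(info)
    return summary
```

With a real file this would be called as `summarize(open('adult.csv', newline=''))`, and the resulting list of dicts could then be written out as XML in a separate step, which keeps the parsing and the output formatting apart.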