3

I've a csv file and I am trying to calculate the average of each of the columns present in it.

#!/usr/bin/python with open('/home/rnish/Desktop/lbm-reference.dat.ref-2013-01-30-13-00-15big.csv', "rU") as f: columns = f.readline().strip().split(' ') numRows = 0 sums = [0] * len(columns) for line in f: values = line.split(" ") print values for i in xrange(len(values)): sums[i] += float(values[i]) numRows += 1 # for index, summedRowValue in enumerate(sums): # print columns[index], 1.0 * summedRowValue / numRows 

The error I get is:

 File "excel.py", line 15, in <module> sums[i] += float(values[i]) ValueError: invalid literal for float(): 0,536880742,8861743,0,4184866,4448905 

This is how the output from print values is:

['0,256352728,10070198,5079543,5024472,34764\n'] ['0,352618127,10102320,4987654,3082111,1902909\n'] ['0,505838297,9977968,423278,4709666,5041639\n'] ['0,506598469,10083489,0,5032146,5054715\n'] ['0,536869414,7229488,39934,4322290,3607046\n'] 

This is how the csv file looks:

0,256641418,10669052,4803710,4759922,0 0,484517531,9889830,1457230,4084777,4959529 0,506902273,9673699,0,5281012,5293376 

Could some one shed some light and help me understand this issue:

I am assuming after reading a couple of posts that it is due to the new line character. am I correct?

2 Answers 2

3

You are splitting a .cvs file at a space - but there are no spaces in your string. Try splitting at the comma instead:

 columns = f.readline().strip().split(',') 
Sign up to request clarification or add additional context in comments.

5 Comments

columns = f.readline().strip().split(',') numRows = 0 sums = [0] * len(columns) this is how it looks atm. but it doesn't work.
Just realized - you also need to split the line at commas in the ` values = line.split(" ")` line... change it to ` values = line.split(",")`. Is that better?
Thanks!. I did not catch that.
It ignores the first line of the csv file. I don't know why.
It ignores the first line because you're explicitly reading the first line (to get the columns list), but then not using it again. Try replacing the code initializing sums with sums = columns and you should be set.
1

Using numpy:

import numpy as np a = np.loadtxt("data.csv", delimiter=",") mean = np.mean(a, axis=0) print(mean) 

Using csv module:

import csv import sys it = csv.reader(sys.stdin, quoting=csv.QUOTE_NONNUMERIC) avg = next(it, []) count = 1 for count, row in enumerate(it, start=2): for i, value in enumerate(row): avg[i] += value avg = [a/count for a in avg] print(avg) 

Output

[0.0, 431655407.0, 9492692.6, 2106081.8, 4434137.0, 3128214.6] 

4 Comments

mean = np.mean(a, axis=0) It gives out values in [ 0.00000000e+00 5.49352277e+08 1.06912389e+07 7.33264049e+06 6.79808023e+06 6.80590208e+06]. that's an overhead for converting into floating point values
@rnish: check your input data. I've got: [0.00000000e+00 4.31655407e+08 9.49269260e+06 2.10608180e+06 4.43413700e+06 3.12821460e+06] that coincides exactly with the csv results.
I need to check for a way to convert ndarray data from scientific notation to floating point. :-). The data that I gave was just a small part of the csv file
@rnish: the data is the same. You can print it however you like. To save an array in the csv format: np.savetxt(sys.stdout, (mean,), fmt="%.1f", delimiter=", ") You could use a filename or any file object instead of `sys.stdout.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.