Try csvstat

The common CSV toolkits csvkit and xsv include some basic statistics features.

So just pretend that your one-record-per-line input data is a single column of a header-less CSV file.

csvkit is older and better known, so you can usually install it via your package manager of choice. xsv is newer and much faster on big inputs, but you may have to install it manually.

Input:

    $ echo 1 2 9 9 | tr " " "\n"
    1
    2
    9
    9

csvkit's csvstat

csvstat is one of the commands of csvkit.

The default csvstat output is for humans...

    $ echo 1 2 9 9 | tr " " "\n" | csvstat --no-header-row
    /usr/lib/python2.7/site-packages/agate/table/from_csv.py:74: RuntimeWarning: Error sniffing CSV dialect: Could not determine delimiter
      1. "a"

            Type of data:          Number
            Contains null values:  False
            Unique values:         3
            Smallest value:        1
            Largest value:         9
            Sum:                   21
            Mean:                  5.25
            Median:                5.5
            StDev:                 4.349
            Most common values:    9 (2x)
                                   1 (1x)
                                   2 (1x)

    Row count: 4

...but you can also get the output as CSV itself, which is better for further processing:

    $ echo 1 2 9 9 | tr " " "\n" | csvstat --no-header-row --csv
    /usr/lib/python2.7/site-packages/agate/table/from_csv.py:74: RuntimeWarning: Error sniffing CSV dialect: Could not determine delimiter
    column_id,column_name,type,nulls,unique,min,max,sum,mean,median,stdev,len,freq
    1,a,Number,False,3,1,9,21,5.25,5.5,4.349,,"9, 1, 2"

With single-column input like this, csvstat always complains that it cannot determine a delimiter. To get rid of that warning, redirect stderr to /dev/null like so:

    $ echo 1 2 9 9 | tr " " "\n" | csvstat --no-header-row --csv 2>/dev/null
    column_id,column_name,type,nulls,unique,min,max,sum,mean,median,stdev,len,freq
    1,a,Number,False,3,1,9,21,5.25,5.5,4.349,,"9, 1, 2"
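As one example of such further processing, here is a minimal sketch that pulls a single statistic out of the CSV output by column name. The helper name csv_field is made up for illustration and is not part of csvkit or xsv; it splits naively on commas, so it is only reliable for columns that come before any quoted field (here, everything before freq). The sample input is the csvstat output shown above.

```shell
# csv_field NAME: print the value of column NAME from a stats CSV
# (one header line plus one data row) read from stdin.
# Hypothetical helper: naive comma splitting, no CSV quoting support.
csv_field() {
  awk -F, -v name="$1" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) col = i; next }
    NR == 2 { if (col) print $col; else exit 1 }'
}

# Captured csvstat --csv output from above, fed in directly:
printf '%s\n' \
  'column_id,column_name,type,nulls,unique,min,max,sum,mean,median,stdev,len,freq' \
  '1,a,Number,False,3,1,9,21,5.25,5.5,4.349,,"9, 1, 2"' | csv_field median
# prints 5.5

# With csvkit installed, the same helper can sit at the end of the pipeline:
# echo 1 2 9 9 | tr " " "\n" | csvstat --no-header-row --csv 2>/dev/null | csv_field median
```

With the toolkits installed, csvcut -c median (csvkit) or xsv select median should do the same job with proper CSV quoting.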

And if you want a slightly more human-readable version, you can pipe the whole thing through csvlook (its type inference renders the lone 1 values as True; csvlook --no-inference should disable that):

    $ echo 1 2 9 9 | tr " " "\n" | csvstat --no-header-row --csv 2>/dev/null | csvlook
    | column_id | column_name | type   | nulls | unique | min  | max | sum | mean | median | stdev | len | freq    |
    | --------- | ----------- | ------ | ----- | ------ | ---- | --- | --- | ---- | ------ | ----- | --- | ------- |
    | True      | a           | Number | False |      3 | True |   9 |  21 | 5.25 |    5.5 | 4.349 |     | 9, 1, 2 |
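One caveat when comparing the two tools: the StDev of 4.349 that csvstat prints is not the 3.766629793329841 that xsv stats reports for the same input in the next section. The figures are consistent with csvstat using the sample formula (dividing by n-1) and xsv using the population formula (dividing by n). A minimal awk sketch to check that, where stddev_both is a made-up helper name and not part of either toolkit:

```shell
# stddev_both: read one number per line from stdin and print both the
# population (divide by n) and sample (divide by n-1) standard deviation.
# Hypothetical helper for illustration only.
stddev_both() {
  awk '
    { sum += $1; sumsq += $1 * $1; n++ }
    END {
      if (n < 2) exit 1                 # sample stddev needs at least 2 values
      mean = sum / n
      ss = sumsq - n * mean * mean      # sum of squared deviations from the mean
      printf "population=%.3f sample=%.3f\n", sqrt(ss / n), sqrt(ss / (n - 1))
    }'
}

printf '1\n2\n9\n9\n' | stddev_both
# prints: population=3.767 sample=4.349
```

So neither tool is wrong; just make sure you know which of the two definitions you need before mixing their outputs.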

xsv stats

For speed reasons xsv stats does not include median by default...

    $ echo 1 2 9 9 | tr " " "\n" | xsv stats --no-headers
    field,type,sum,min,max,min_length,max_length,mean,stddev
    0,Integer,21,1,9,1,1,5.25,3.766629793329841

    $ echo 1 2 9 9 | tr " " "\n" | xsv stats --no-headers | xsv table
    field  type     sum  min  max  min_length  max_length  mean  stddev
    0      Integer  21   1    9    1           1           5.25  3.766629793329841

...but you can enable it via the --everything switch. This adds three extra columns (median, mode and cardinality):

    $ echo 1 2 9 9 | tr " " "\n" | xsv stats --no-headers --everything
    field,type,sum,min,max,min_length,max_length,mean,stddev,median,mode,cardinality
    0,Integer,21,1,9,1,1,5.25,3.766629793329841,5.5,9,3

    $ echo 1 2 9 9 | tr " " "\n" | xsv stats --no-headers --everything | xsv table
    field  type     sum  min  max  min_length  max_length  mean  stddev             median  mode  cardinality
    0      Integer  21   1    9    1           1           5.25  3.766629793329841  5.5     9     3
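A likely reason median is opt-in: sum, min, max, mean and stddev can be accumulated in one streaming pass with constant memory, whereas an exact median needs all values at hand, sorted or at least held in memory. A rough shell sketch of what that involves, with median_of_stdin as a hypothetical helper name (this is not how xsv implements it):

```shell
# median_of_stdin: read one number per line from stdin and print the median.
# Hypothetical helper; it sorts the whole input and keeps it in an awk array,
# which is exactly the cost the streaming statistics avoid.
median_of_stdin() {
  sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      if (NR % 2) print v[(NR + 1) / 2]                   # odd count: middle value
      else        print (v[NR / 2] + v[NR / 2 + 1]) / 2   # even count: mean of the two middle values
    }'
}

printf '1\n2\n9\n9\n' | median_of_stdin
# prints 5.5
```

For small inputs this is instant, but on files with many millions of rows the sort dominates, which is why it is reasonable for xsv to leave it off unless asked.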