edited Oct 10, 2015 at 15:39

Taking cues from Bruce's code, here is a more efficient implementation in which the awk script does not keep the whole data set in memory. It uses process substitution to pass the number of data lines to the awk code along with the data. It assumes that the file Salaries.csv has a single column.

I start by counting the lines in the file that look like numbers, then concatenate that count with the numerically sorted file and pipe the result to the second awk command.
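As a minimal sketch of that first step, the snippet below prints the stream the second awk would receive for a tiny made-up input. The helper name `count_then_sorted` and the four sample values are hypothetical, chosen only for illustration. It uses a plain command group instead of process substitution so that it also runs under a POSIX sh; the stream it produces should be the same.

```shell
# Hypothetical helper: print the count of numeric lines, then the sorted values.
count_then_sorted() {
    {
        # Count lines whose first field looks like a number.
        awk 'BEGIN { c = 0 } $1 ~ /^[-0-9]*(\.[0-9]*)?$/ { c++ } END { print c }' < "$1"
        # Then emit the values in ascending numeric order.
        sort -n < "$1"
    }
}

# Made-up sample data, one value per line.
sample=$(mktemp)
printf '30\n10\n40\n20\n' > "$sample"
count_then_sorted "$sample"
rm -f "$sample"
```

The first line of the output is the count (4 here), followed by the values in ascending order, which is why the main script treats `NR==1` specially.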

    FILENAME="Salaries.csv"

    cat <(awk 'BEGIN { c = 0 } $1 ~ /^[-0-9]*(\.[0-9]*)?$/ { c = c + 1 } END { print c }' < "$FILENAME") \
        <(sort -n < "$FILENAME") |
    awk '
    BEGIN {
        c = 0; sum = 0
        med1_loc = 0; med2_loc = 0
        med1_val = 0; med2_val = 0
        min = 0; max = 0
    }
    # The first line of the combined stream is the count of numeric lines.
    NR == 1 {
        LINES = $1
        # Check whether the count is even or odd so that we only remember
        # the values at the (zero-based) positions where the median can be.
        if (LINES % 2 == 0) { med1_loc = LINES / 2 - 1; med2_loc = med1_loc + 1 }
        if (LINES % 2 != 0) { med1_loc = (LINES - 1) / 2; med2_loc = med1_loc }
    }
    # Every later line that looks like a number is a data value.
    $1 ~ /^[-0-9]*(\.[0-9]*)?$/ && NR != 1 {
        # The input is sorted, so the first value is the minimum.
        if (c == 0) { min = $1 }
        # Remember the middle one or two values.
        if (c == med1_loc) { med1_val = $1 }
        if (c == med2_loc) { med2_val = $1 }
        c++
        sum += $1
        # The input is sorted, so the last value seen is the maximum.
        max = $1
    }
    END {
        # Avoid dividing by zero when no numeric lines were found.
        if (c == 0) { print "count:0"; exit }
        ave = sum / c
        median = (med1_val + med2_val) / 2
        print "sum:" sum "\n" "count:" c "\n" "mean:" ave "\n" "median:" median "\n" "min:" min "\n" "max:" max
    }
    '
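The median positions in the script are zero-based offsets into the sorted values. The sketch below isolates just that arithmetic so it can be checked by hand; the helper name `median_locs` is hypothetical and not part of the script above.

```shell
# Hypothetical helper: print the two zero-based positions that hold the median
# for a sorted list of n values (the same arithmetic as in the NR==1 block).
median_locs() {
    awk -v n="$1" 'BEGIN {
        if (n % 2 == 0) { lo = n / 2 - 1; hi = lo + 1 }   # even: two middle values
        else            { lo = (n - 1) / 2; hi = lo }     # odd: one middle value
        print lo, hi
    }'
}

median_locs 4   # prints "1 2"
median_locs 5   # prints "2 2"
```

For an odd count both positions coincide, so averaging the two remembered values returns the single middle value unchanged.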

answered Oct 10, 2015 at 1:44

Rahul Agarwal