Taking cues from Bruce's code, here is a more efficient implementation
that does not keep the whole data set in memory.
As stated in the question,
it assumes that the input file has (at most) one number per line.
It counts the lines in the input file that contain a qualifying number
and passes that count to the main `awk` script
as the first line of its input, ahead of the sorted data.
So, for example, if the file contains

    6.0
    4.2
    8.3
    9.5
    1.7

then the input to `awk` is actually

    5
    1.7
    4.2
    6.0
    8.3
    9.5

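
To inspect that intermediate stream, the first half of the pipeline can be run on its own. This is only a minimal sketch: the function name `count_then_sorted`, the temporary file, and the `LC_ALL=C` setting are assumptions made for illustration, and the number pattern is just one reasonable way to recognize a plain decimal value.

```shell
# Hypothetical helper: print the count of numeric lines, then the sorted data.
count_then_sorted() {
    # Count lines whose first field looks like a plain decimal number.
    awk '$1 ~ /^-?([0-9]+\.?[0-9]*|\.[0-9]+)$/ { c++ } END { print c + 0 }' "$1"
    # Emit the data in ascending numeric order
    # (LC_ALL=C so that "." is treated as the decimal point).
    LC_ALL=C sort -n "$1"
}

# Recreate the five-line example file and show what the main script would read.
tmp=$(mktemp)
printf '%s\n' 6.0 4.2 8.3 9.5 1.7 > "$tmp"
count_then_sorted "$tmp"
rm -f "$tmp"
```

Run on the five-line example, this prints `5` followed by the five values in ascending order, one per line.
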
Then the `awk` script captures the data count in the `NR==1` code block
and saves the middle value
(or the two middle values, which are averaged to yield the median)
when it sees them.
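
    FILENAME="Salaries.csv"

    # The first awk counts the lines holding a number; sort then emits the data
    # in ascending order. The pattern must be identical in both awk commands.
    (awk '$1 ~ /^-?([0-9]+\.?[0-9]*|\.[0-9]+)$/ {c++} END {print c+0}' "$FILENAME"; \
     sort -n "$FILENAME") | awk '
    BEGIN {
        c = 0
        sum = 0
        med1_loc = 0
        med2_loc = 0
        med1_val = 0
        med2_val = 0
        min = 0
        max = 0
    }
    NR==1 {
        # The first input line is the count produced by the first awk.
        LINES = $1
        # Check whether LINES is even or odd to find the zero-based
        # position(s) in the sorted data where the median lies.
        if (LINES%2==0) {med1_loc = LINES/2-1; med2_loc = med1_loc+1;}
        if (LINES%2!=0) {med1_loc = med2_loc = (LINES-1)/2;}
    }
    NR!=1 && $1 ~ /^-?([0-9]+\.?[0-9]*|\.[0-9]+)$/ {
        # the first value of the sorted data is the minimum
        if (c==0) {min = $1;}
        # save the middle value(s) as they go by
        if (c==med1_loc) {med1_val = $1;}
        if (c==med2_loc) {med2_val = $1;}
        c++
        sum += $1
        # the data is sorted, so the latest value is the largest so far
        max = $1
    }
    END {
        # avoid dividing by zero when the file holds no numbers
        if (c == 0) {print "no numeric data found"; exit 1}
        ave = sum / c
        median = (med1_val + med2_val) / 2
        print "sum:" sum
        print "count:" c
        print "mean:" ave
        print "median:" median
        print "min:" min
        print "max:" max
    }
    '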
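
The position arithmetic in the `NR==1` block can be checked in isolation. The following is a minimal sketch, not part of the script itself: the helper name `median_locs` and its variables `n`, `lo`, and `hi` are hypothetical, but the formulas are the same ones used above, with positions counted from zero.

```shell
# Hypothetical helper: print the zero-based positions of the middle value(s)
# among n sorted values, using the same arithmetic as the NR==1 block.
median_locs() {
    awk -v n="$1" 'BEGIN {
        if (n % 2 == 0) { lo = n / 2 - 1; hi = lo + 1 }  # even: two middle values
        else            { lo = hi = (n - 1) / 2 }        # odd: one middle value
        print lo, hi
    }'
}

median_locs 5   # prints "2 2"
median_locs 4   # prints "1 2"
```

For the five-value example, position 2 is the third sorted value, `6.0`, so both saved values are `6.0` and their average, the median, is 6.
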