2

I have data structured like this:

    X 43808504 G 1 ^]. <
    X 43808505 C 3 . 4
    X 43808506 T 8 . ?
    X 43808507 G 5 . C

I want to get the max (8), min (1), and mean (4.25) from column 4 and write that to a file.

I've been wrestling with sorting and then cutting data away but that seems really inefficient.

Thanks for any help

4
  • You might want to take a look at csvsql, unless you require a solution without additional software. Commented Dec 20, 2019 at 15:29
  • Why not just use a for loop and do it yourself? I don't know of a way to use sort | cut to get the mean anyway. Commented Dec 20, 2019 at 15:31
  • awk '{print $4}' but you could do the whole lot in awk pretty trivially Commented Dec 20, 2019 at 15:31
  • Related: unix.stackexchange.com/q/13731/117549 Commented Dec 20, 2019 at 15:33

2 Answers

7

Using awk:

    awk '
    NR == 1 { min = $4; max = $4 }
    {
        sum += $4
        if ($4 > max) { max = $4 }
        if ($4 < min) { min = $4 }
    }
    END {
        print max
        print min
        print sum / NR
    }' input

First we set the min and max variables to the value of the 4th column on line 1. Then, for every line, we check whether the value in column 4 is less than the current min or greater than the current max, and update min or max accordingly.

We also keep a running total of column 4 in the sum variable. In the END block, dividing sum by the total number of rows (NR) gives the mean.

At the end we print the max, min, and mean.
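If the input could be empty, NR is 0 in the END block and the division fails. One possible variant, offered only as a sketch: it assumes a POSIX shell and awk, and the `stats` function and the temporary file names are made up for illustration. It checks NR before dividing and redirects the three values to a file.

```shell
#!/bin/sh
# Sample input copied from the question; the paths are placeholders.
tmp=$(mktemp -d)
cat > "$tmp/data" <<'EOF'
X 43808504 G 1 ^]. <
X 43808505 C 3 . 4
X 43808506 T 8 . ?
X 43808507 G 5 . C
EOF

# stats FILE: print max, min and mean of column 4, one per line.
# Exits with status 1 and prints nothing if FILE has no lines.
stats() {
    awk '
        NR == 1 { min = $4; max = $4 }
        {
            sum += $4
            if ($4 > max) max = $4
            if ($4 < min) min = $4
        }
        END {
            if (NR == 0) exit 1   # avoid dividing by zero on empty input
            print max
            print min
            print sum / NR
        }
    ' "$1"
}

# Write the result to a file, then show it.
stats "$tmp/data" > "$tmp/stats.txt"
cat "$tmp/stats.txt"
```

Whether an empty file should be a hard error or produce a placeholder value is a design choice; this sketch picks a non-zero exit status so a calling script can detect it.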

4
  • Empty file gets a fatal divide-by-zero. print (NR ? sum / NR : "NaN") Commented Dec 20, 2019 at 16:15
  • @Paul_Pedant: Don't use it on an empty file. Commented Dec 20, 2019 at 16:16
  • I wouldn't. But users are a whole new thing. I like your initialising from NR==1. I use BEGIN and some arbitrary max and min, and it always looks clumsy. Commented Dec 20, 2019 at 16:19
  • If it's being run against an empty file it should error IMO because something has been done wrong. Commented Dec 20, 2019 at 16:20
6

With Miller

    $ mlr --nidx --repifs stats1 -a 'min,max,mean' -f 4 data
    1 8 4.250000

You can redirect the output to a file in the usual way, by adding > file

With GNU datamash

    $ datamash -W min 4 max 4 mean 4 < data
    1 8 4.25
