I have some data which looks like:

    sampleA ATGC 10 100
    sampleA ATGC 120 230
    sampleA ATGC 200 110

I want to print the min and max using the values in both columns 3 and 4. So my output should look like:

    sampleA 10 230

Thanks in advance.

Short awk solution (GNU awk is required, since asort is a gawk extension):

    awk '{ a[++c]=$3; a[++c]=$4 }END{ asort(a); print $1,a[1],a[length(a)] }' file

The output:

    sampleA 10 230

Short datamash solution (for separate min/max calculation within 3rd/4th columns):

    datamash -W -g1 min 3 max 4 < file

-W - treat whitespace as the field separator
-g1 - group records by 1st column value
min 3 - get minimum value on 3rd column
max 4 - get maximum value on 4th column
The output:

    sampleA 10 230

datamash! Can it find the min/max across both columns, as the OP clarified?

Using awk:

    awk 'BEGIN{getline; min=$3;max=$4} {(min>$3)?min=$3:"";(max>$4)?"":max=$4} END{print min, max}' infile.txt

The output is:

    10 230

But I guess you are looking for something like the command below, which finds the min and max across both columns, rather than only the min of the 3rd column and the max of the 4th column as the command above does.

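The difference only shows up when the smallest value sits in column 4 or the largest in column 3. A minimal sketch of that case, using the same per-column logic as the command above but spelled out with if statements (the function name per_column_minmax is made up for illustration):

```shell
# Per-column logic: min tracks column 3 only, max tracks column 4 only.
# per_column_minmax is a hypothetical helper name, not from the original post.
per_column_minmax() {
    awk '
    NR == 1 { min = $3; max = $4; next }   # seed from the first row
    {
        if ($3 < min) min = $3
        if ($4 > max) max = $4
    }
    END { print min, max }' "$@"
}

printf '%s\n' \
    'sampleA ATGC 10 100' \
    'sampleA ATGC 300 2' | per_column_minmax
# prints "10 100", although the smallest and largest values overall are 2 and 300
```
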
Sample Input:

    sampleA ATGC 10 100
    sampleA ATGC 300 2
    sampleA ATGC 200 1100
    sampleA ATGC 2301 9
    sampleA ATGC 12345 15
    sampleA ATGC 235 7

The command:

    awk 'BEGIN{getline;min=max=$3; ($4>$3)?max=$4:min=$4} { ($3>$4 && min>$4)?min=$4:((min>$3)?min=$3:""); ($3>$4 && $3>max)?max=$3:((max<$4)?max=$4:""); } END{print min, max}' infile.txt

The output would be:

    2 12345

    NF == 4 {
        if (++totalSamples == 1) {
            # first data row: seed both extremes from column 3
            sampleName = $1
            minValue = $3
            maxValue = $3
        } else {
            if ($3 < minValue) minValue = $3
            else if ($3 > maxValue) maxValue = $3
        }
        # column 4 is checked on every row, including the first
        if ($4 < minValue) minValue = $4
        else if ($4 > maxValue) maxValue = $4
    }
    END {
        if (totalSamples) printf("%s %d %d\n", sampleName, minValue, maxValue)
    }

sampleA 1000 10? asort function will do the job
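
Since asort is a GNU awk extension, a single pass that keeps a running min and max is a portable alternative. A minimal sketch, assuming the file may hold several sample names in column 1 and that one min/max pair per name is wanted (an assumption that goes beyond the question, which only shows sampleA; the function name minmax_by_sample and the sampleB rows are made up for illustration):

```shell
# Per-sample min/max over columns 3 and 4, using only POSIX awk features.
# Assumption: column 1 is the grouping key.
minmax_by_sample() {
    awk '
    NF >= 4 {
        for (i = 3; i <= 4; i++) {
            v = $i + 0                                   # force numeric comparison
            if (!($1 in min) || v < min[$1]) min[$1] = v
            if (!($1 in max) || v > max[$1]) max[$1] = v
        }
        if (!($1 in seen)) { seen[$1] = 1; order[++n] = $1 }   # keep first-seen order
    }
    END {
        for (k = 1; k <= n; k++)
            print order[k], min[order[k]], max[order[k]]
    }' "$@"
}

printf '%s\n' \
    'sampleA ATGC 10 100' \
    'sampleA ATGC 120 230' \
    'sampleA ATGC 200 110' \
    'sampleB ATGC 7 3' \
    'sampleB ATGC 50 4' | minmax_by_sample
# prints:
# sampleA 10 230
# sampleB 3 50
```

The function also accepts file names as arguments, for example minmax_by_sample file. Rows with fewer than four fields are skipped.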