
I have some data which looks like:

sampleA ATGC 10 100
sampleA ATGC 120 230
sampleA ATGC 200 110

I want to print the min and max using the values in both columns 3 and 4. So my output should look like:

sampleA 10 230 

Thanks in advance

  • Do you want to print the min value from column 3 only and the max value from column 4 only? Or, if column 3 were 1000, 120, 200 and column 4 were 100, 230, 10, would you want your result to be: sampleA 1000 10? Commented Aug 9, 2017 at 12:18
  • The max or min could be in column 3 or 4. Because this is DNA ORF information, some reads are reversed, like the last line in the above example. Commented Aug 9, 2017 at 14:14
  • So in that case I believe both answers below won't work. NVM, just noticed AFSHIN edited his answer. Commented Aug 9, 2017 at 14:30
  • awk's asort function will do the job Commented Aug 9, 2017 at 20:34
  • It appears that columns 1 and 2 have absolutely nothing to do with the output. Are the column 1 values identical in all rows? If not, I don't see how you can even define what the first field of output should be. Commented Aug 9, 2017 at 20:48
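Given that clarification (reverse-strand reads have the larger coordinate in column 3), one option, sketched here rather than taken from any answer, is to first swap columns 3 and 4 on every row where column 3 is larger. After that, the overall minimum is simply the minimum of column 3 and the overall maximum is the maximum of column 4, so column-wise tools apply unchanged. The function name normalize_orf is hypothetical.

```shell
#!/bin/sh
# normalize_orf FILE (hypothetical helper): swap columns 3 and 4 on rows
# where column 3 > column 4, so every row has its smaller coordinate first.
normalize_orf() {
    awk '{
        if ($3 + 0 > $4 + 0) {   # reverse-strand read: start > end
            tmp = $3; $3 = $4; $4 = tmp
        }
        print                    # assigning a field rebuilds $0 with single spaces
    }' "$1"
}
```

For example, normalize_orf file | datamash -W -g1 min 3 max 4 should then report the min and max across both columns for each sample.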

3 Answers


Short awk solution (requires GNU awk, because asort is a gawk extension):

awk '{ a[++c]=$3; a[++c]=$4 }END{ asort(a); print $1,a[1],a[length(a)] }' file 

The output:

sampleA 10 230 

Short datamash solution (for separate min/max calculation within 3rd/4th columns):

datamash -W -g1 min 3 max 4 < file 
  • -g1 - group records by 1st column value

  • min 3 - get minimum value on 3rd column

  • max 4 - get maximum value on 4th column

The output:

sampleA 10 230 
  • I'm not familiar with datamash! Can it find the min/max across both columns, as the OP clarified? Commented Aug 9, 2017 at 17:52
  • @AFSHIN, see my short update Commented Aug 9, 2017 at 20:35
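As an alternative to datamash for the both-columns case raised in the comments, here is a sketch in plain awk that groups by column 1 the way -g1 does, but takes the min and max across columns 3 and 4 combined. The function name minmax_by_sample is hypothetical. Groups are printed in order of first appearance, and because the results are kept in arrays keyed on column 1, rows for the same sample do not need to be adjacent.

```shell
#!/bin/sh
# minmax_by_sample FILE (hypothetical helper): for each distinct column-1
# value, print the min and max over columns 3 and 4 combined.
minmax_by_sample() {
    awk '{
        for (i = 3; i <= 4; i++) {
            v = $i + 0                       # force numeric comparison
            if (!($1 in min)) {              # first value seen for this sample
                min[$1] = max[$1] = v
                order[++n] = $1              # remember first-appearance order
            }
            if (v < min[$1]) min[$1] = v
            if (v > max[$1]) max[$1] = v
        }
    }
    END {
        for (k = 1; k <= n; k++)
            print order[k], min[order[k]], max[order[k]]
    }' "$1"
}
```

On the question's three rows this prints sampleA 10 230, space-separated rather than tab-separated as datamash output is.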

Using awk:

awk 'BEGIN{getline; min=$3;max=$4} {(min>$3)?min=$3:"";(max>$4)?"":max=$4} END{print min, max}' infile.txt 

The output is:

10 230 

But I guess you are looking for something like the command below, which finds the min/max across both columns rather than the min of the 3rd column and the max of the 4th column only, as the command above does.

Sample Input:

sampleA ATGC 10 100
sampleA ATGC 300 2
sampleA ATGC 200 1100
sampleA ATGC 2301 9
sampleA ATGC 12345 15
sampleA ATGC 235 7

The command:

awk 'BEGIN{getline;min=max=$3; ($4>$3)?max=$4:min=$4} { ($3>$4 && min>$4)?min=$4:((min>$3)?min=$3:""); ($3>$4 && $3>max)?max=$3:((max<$4)?max=$4:""); } END{print min, max}' infile.txt 

The output would be:

2 12345 
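The nested ternaries above are hard to follow. A shorter sketch (my own restructuring, not the answerer's code) seeds min and max from the first row, then loops over fields 3 and 4 and updates the running values. It uses only POSIX awk features, so it does not need gawk. The function name minmax_both is hypothetical.

```shell
#!/bin/sh
# minmax_both FILE (hypothetical helper): print the min and max over
# columns 3 and 4 combined, using only POSIX awk features.
minmax_both() {
    awk 'NR == 1 { min = max = $3 + 0 }   # seed from the first row
    {
        for (i = 3; i <= 4; i++) {
            v = $i + 0                    # force numeric comparison
            if (v < min) min = v
            if (v > max) max = v
        }
    }
    END { if (NR) print min, max }' "$1"
}
```

Running minmax_both infile.txt on the sample input above prints 2 12345, the same result as the longer command.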
NF == 4 {
    if (++totalSamples == 1) {
        sampleName = $1
        minValue = $3
        maxValue = $3
    } else {
        if ($3 < minValue)
            minValue = $3
        else if ($3 > maxValue)
            maxValue = $3
    }
    if ($4 < minValue)
        minValue = $4
    else if ($4 > maxValue)
        maxValue = $4
}
END {
    if (totalSamples)
        printf("%s %d %d\n", sampleName, minValue, maxValue)
}
