2

I have a data file (file.txt) that contains the lines below:

123 pro=tegs, ETA=12:00, team=xyz,user1=tom,dom=dby.com
345 pro=rbs, team=abc,user1=chan,dom=sbc.int,ETA=23:00
456 team=efg, pro=bvy,ETA=22:00,dom=sss.co.uk,user2=lis

I'm expecting to get the first column ($1) only if the ETA= hour is greater than 15; with the data above, only the first column of the 2nd and 3rd lines should be printed:

345
456

I tried cat file.txt | awk -F [,TPF=]' '{print $1}' but it prints the whole line that has ETA at the end.

5
  • $1 is the first field, but you pass the string [,TPF=] as the field separator, and this string does not occur anywhere in your file. Therefore the first field equals the whole line. BTW, the command you posted has an unbalanced single quote, so it would not even execute. Commented Nov 29, 2022 at 12:05
  • @user1934428 FYI [,TPF=] in that context isn't a string, it's a bracket expression containing 5 characters, so the record would be split into fields at every T, =, etc. Commented Nov 29, 2022 at 13:45
  • @EdMorton : Ah, you are right. I forgot that the field separator can be a regular expression. Thank you for reminding me. But in this case, the output should not be the whole line, as the OP claimed, but the part of the line before the first comma (assuming that the input is exactly as posted). Can you explain how print $1 could print the whole line? Commented Nov 30, 2022 at 7:41
  • @user1934428 That script in the question has a syntax error (missing ' before [,TPF=]') so it can't print anything. idk what the script was that printed a whole line for the OP but if you fixed the syntax error, that script wouldn't do that, it'd print up to the first = from the line shown in the example. Commented Nov 30, 2022 at 10:08
  • @EdMorton : Yes, this is what I told the OP too. It might well be this error is not present in the original code; perhaps the OP retyped the command instead of properly copying. The whole question is pretty unclear, and perhaps should be closed. Commented Nov 30, 2022 at 10:28

6 Answers

4

Using awk

$ awk -F"[=, ]" '{for (i=1;i<NF;i++) if ($i=="ETA") if ($(i+1) > 15) print $1}' input_file
345
456
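For readability, here is the same loop spread over multiple lines with comments (my expansion, not part of the answer above; I've also added +0 so the hour is compared numerically rather than by awk's default string/number rules):

awk -F"[=, ]" '
{
    # with -F"[=, ]" each tag name lands in its own field,
    # immediately followed by its value, e.g. ... ETA 23:00 ...
    for (i = 1; i < NF; i++)
        if ($i == "ETA" && $(i+1)+0 > 15)   # the field after "ETA" starts with the hour
            print $1                        # print the leading number of the line
}' input_file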


4

With your shown samples, please try the following GNU awk code. It uses GNU awk's match function with the regex (^[0-9]+).*\<ETA=([0-9]+):[0-9]+, which creates 2 capturing groups and saves their values into the array arr. Then, if the 2nd element of arr is greater than 15, the 1st element of arr is printed, as per the requirement.

awk ' match($0,/(^[0-9]+).*\<ETA=([0-9]+):[0-9]+/,arr) && arr[2]+0>15{ print arr[1] } ' Input_file 
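With the sample file from the question this should print 345 and 456. For readability, here is the same program spread over several lines with a comment (my expansion, not part of the answer above; the three-argument match() is a GNU awk extension):

awk '
match($0, /(^[0-9]+).*\<ETA=([0-9]+):[0-9]+/, arr) && arr[2]+0 > 15 {
    # arr[1] = the number at the start of the line, arr[2] = the ETA hour
    print arr[1]
}' Input_file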


3

I would harness GNU AWK for this task in the following way. Let file.txt content be

123 pro=tegs, ETA=12:00, team=xyz,user1=tom,dom=dby.com
345 pro=rbs, team=abc,user1=chan,dom=sbc.int,ETA=23:00
456 team=efg, pro=bvy,ETA=02:00,dom=sss.co.uk,user2=lis

then

awk 'substr($0,index($0,"ETA=")+4,2)+0>15{print $1}' file.txt 

gives output

345 

Explanation: I use string functions: index to find where ETA= is, then substr to get the 2 characters after ETA= (the offset 4 is used because ETA= is 4 characters long and index gives the start position). I use +0 to convert the result to an integer and then compare it with 15. Disclaimer: this solution assumes every row has ETA= followed by exactly 2 digits.

(tested in GNU Awk 5.0.1)
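To see what the index/substr pair extracts on one of the sample lines, here is a quick check (my illustration, not part of the answer above):

$ echo '345 pro=rbs, team=abc,user1=chan,dom=sbc.int,ETA=23:00' |
  awk '{ i = index($0, "ETA=")          # position where ETA= starts
         print substr($0, i+4, 2) + 0   # the 2 characters after ETA=, as a number
       }'
23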

1 Comment

index($0,"ETA=") would fail if the input could contain PETA= or similar before the ETA= string.
3

Whenever input contains tag=value pairs as yours does, it's best to first create an array of those mappings (v[] below), and then you can just access the values by their tags (names):

$ cat tst.awk
BEGIN {
    FS = "[, =]+"
    OFS = ","
}
{
    delete v
    for ( i=2; i<NF; i+=2 ) {
        v[$i] = $(i+1)
    }
}
v["ETA"]+0 > 15 { print $1 }

$ awk -f tst.awk file
345
456

With that approach you can trivially enhance the script in future to access whatever values you like by their names, test them in whatever combinations you like, output them in whatever order you like, etc. For example:

$ cat tst.awk
BEGIN {
    FS = "[, =]+"
    OFS = ","
}
{
    delete v
    for ( i=2; i<NF; i+=2 ) {
        v[$i] = $(i+1)
    }
}
(v["pro"] ~ /b/) && (v["ETA"]+0 > 15) { print $1, v["team"], v["dom"] }

$ awk -f tst.awk file
345,abc,sbc.int
456,efg,sss.co.uk

Think about how you'd enhance any other solution to do the above or anything remotely similar.


2

It's unclear why you think your attempt would do anything of the sort. Your attempt uses a completely different field separator and does not compare anything against the number 15.

You'll also want to get rid of the useless use of cat.

When you specify a column separator with -F, that changes what the first column $1 actually means; it is then everything before the first occurrence of the separator. You probably want to split the line separately to obtain the first space-separated column.

awk -F 'ETA=' '$2 > 15 { split($0, n, /[ \t]+/); print n[1] }' file.txt 

The value in $2 will be the data after the first separator (here, everything from after ETA= to the end of the line, since ETA= appears only once per line). Because the field contains non-numeric text, awk actually compares it against 15 as a string rather than a number, but since the field starts with the two-digit hour this gives the intended result for this data: on the first line we are literally checking whether 12:00, team=xyz,user1=tom,dom=dby.com is larger than 15, which effectively checks whether 12 is larger than 15 (and that is obviously false).

When the condition is true, we split the original line $0 into the array n on sequences of whitespace, and then print the first element of this array.
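To see what lands in $2 and how the comparison behaves with this field separator, here is a quick check on two of the sample lines (my illustration, not part of the answer above):

$ echo '123 pro=tegs, ETA=12:00, team=xyz,user1=tom,dom=dby.com' | awk -F 'ETA=' '{ print $2, ($2 > 15) }'
12:00, team=xyz,user1=tom,dom=dby.com 0
$ echo '345 pro=rbs, team=abc,user1=chan,dom=sbc.int,ETA=23:00' | awk -F 'ETA=' '{ print $2, ($2 > 15) }'
23:00 1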

1 Comment

-F 'ETA=' would fail if the input could contain PETA= or similar.
2

Using awk you could match ETA= followed by 1 or more digits, then take the matched part without the ETA= prefix, check whether that number is greater than 15, and print the first field.

awk 'match($0, /ETA=[0-9]+/) { if(substr($0, RSTART+4, RLENGTH-4)+0 > 15) print $1 }' file 

Output

345
456

If the first field should start with a number:

awk '/^[0-9]/ && match($0, /ETA=[0-9]+/) { if(substr($0, RSTART+4, RLENGTH-4)+0 > 15) print $1 }' file 
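For reference, this is what the substr(RSTART+4, RLENGTH-4) part extracts on one of the matching lines (my illustration, not part of the answer above):

$ echo '456 team=efg, pro=bvy,ETA=22:00,dom=sss.co.uk,user2=lis' |
  awk 'match($0, /ETA=[0-9]+/) { print substr($0, RSTART+4, RLENGTH-4) }'
22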

2 Comments

@EdMorton Good point, I have added +0 to make it a numeric comparison. Perhaps using PETA and Useless use of cat on the same page will give us some interesting comments :-)
hadn't noticed that - lol ;-)
