I'm working with huge log files that accumulate over days. I can't truncate or rotate them, but I need to parse new entries hourly.
I've been using grep to grab entries containing a specific string, counting how many I get, and tossing the first N, where N is the number of entries I've already ingested on all prior loops (a rough sketch of the loop is below). Of course, this means inefficiently grepping the whole file on every loop. I'm relatively Unix-naive, but I feel like there's a more efficient way to do this. I don't think tail alone would work, because I won't know how many new lines have been written since the last parse. This post talks about skipping, but it uses a search string to determine how many lines to skip, whereas I'd want to supply the skip count as an argument. This one deals with skipping a specified number of characters on each line, but I'd be looking to skip a specified number of lines.
Any suggestions?