Reverse grepping

Question

Let's say I have a really big text file (about 10,000,000 lines). I need to grep it from the end and save the result to a file. What's the most efficient way to accomplish this?

In addition to the excellent solutions posted, GNU grep has a --max-count (number) switch that stops after a given number of matches, which might be of interest to you.
– Ulrich Schwarz, Commented Jul 23, 2014 at 13:28
Do you know how many hits you will have? If you expect grep to find only about 3 lines, grep first and reverse the result afterwards.
– Walter A, Commented May 1, 2015 at 19:24

chaos · Accepted Answer · 2014-07-23 20:20:03Z

47

tac/grep Solution

tac file | grep whatever

Or, slightly more efficient:

grep whatever < <(tac file)

Time with a 500MB file:

real 0m1.225s
user 0m1.164s
sys 0m0.516s

sed/grep Solution:

sed '1!G;h;$!d' file | grep whatever

Time with a 500MB file: Aborted after 10+ minutes.

awk/grep Solution:

awk '{x[NR]=$0}END{while (NR) print x[NR--]}' file | grep whatever

Time with a 500MB file:

real 0m5.626s
user 0m4.964s
sys 0m1.420s

perl/grep Solution:

perl -e 'print reverse <>' file | grep whatever

Time with a 500MB file:

real 0m3.551s
user 0m3.104s
sys 0m1.036s
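As a sanity check, the four reversal commands above can be compared on a small sample before timing them on real data. This is a minimal sketch, not a benchmark: the helper name reverse_with and the sample contents are made up for illustration, and any tool that is not installed is simply skipped.

```shell
# reverse_with TOOL FILE: print FILE with its lines in reverse order.
# (Hypothetical helper; the commands themselves come from the answer.)
reverse_with() {
    case $1 in
        tac)  tac "$2" ;;
        sed)  sed '1!G;h;$!d' "$2" ;;
        awk)  awk '{x[NR]=$0} END {while (NR) print x[NR--]}' "$2" ;;
        perl) perl -e 'print reverse <>' "$2" ;;
        *)    echo "unknown tool: $1" >&2; return 1 ;;
    esac
}

# Small made-up sample standing in for the real file.
sample=$(mktemp)
printf '%s\n' 'alpha 1' 'beta 2' 'alpha 3' 'gamma 4' 'alpha 5' > "$sample"

# Run every available tool and show what grep sees after the reversal.
for tool in tac sed awk perl; do
    command -v "$tool" >/dev/null 2>&1 || continue
    echo "== $tool =="
    reverse_with "$tool" "$sample" | grep alpha
done

rm -f "$sample"
```

Each tool should list the matching lines last to first. To save the result, as the question asks, redirect the pipeline to a file, for example reverse_with tac file | grep whatever > result.txt.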

answered Jul 23, 2014 at 12:26 by chaos, edited Jul 23, 2014 at 20:20

@chaos, I think grep "somepattern" < <(tac filename) will be faster.
– Valentin Bajrami, Commented Jul 23, 2014 at 12:43

@val0x00ff The < <(tac filename) should be as fast as a pipe: in both cases, the commands run in parallel.
– vinc17, Commented Jul 23, 2014 at 12:46

If you're going for efficiency, it would be better to put the tac after the grep. If you've got a 10,000,000-line file with only 2 matches, tac will only have to reverse 2 lines, not 10 million. grep still has to go through the whole file either way.
– phemmer, Commented Jul 23, 2014 at 14:10

If you put tac after the grep, it will be reading from a pipe and so can't seek. That will make it less efficient (or fail completely) if the number of found lines is large.
– jjanes, Commented Jul 23, 2014 at 19:45

@Bernhard If you tac a real file, it lseeks backwards through the file to read it in chunks from the end, then reverses the lines within each chunk, keeping track of any line split across a chunk boundary so it can be put back together. When reading from a pipe, it can't do that. It has to read the whole input into memory, write it to a temp file, or fail.
– jjanes, Commented Jul 24, 2014 at 15:52
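The two orderings discussed in these comments can be written side by side. This is only a sketch: the function names and sample data are hypothetical, and it shows that both pipelines print the same lines, not which one is faster (that depends on how many lines match, as the comments point out).

```shell
# Reverse first, then filter: tac reads a regular file, so it can seek.
reverse_then_grep() {
    tac "$2" | grep -e "$1"
}

# Filter first, then reverse: tac reads from a pipe and has to buffer
# its input, but it only ever sees the matching lines.
grep_then_reverse() {
    grep -e "$1" "$2" | tac
}

# Made-up sample log.
log=$(mktemp)
printf '%s\n' 'ok 1' 'error 2' 'ok 3' 'error 4' 'ok 5' > "$log"

reverse_then_grep error "$log"
grep_then_reverse error "$log"

rm -f "$log"
```

The outputs only agree for plain matching. With -m or the context options (-A, -B), the position of tac in the pipeline changes which lines are printed.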


derobert · Accepted Answer · 2014-07-26 18:28:54Z

17

This solution might help:

tac file_name | grep -e expression

answered Jul 23, 2014 at 12:13 by Anshul Patel, edited Jul 26, 2014 at 18:28 by derobert

tac is the GNU command. On most other systems, the equivalent is tail -r.
– Stéphane Chazelas, Commented Jul 23, 2014 at 14:55
@Stéphane: On at least some Unix systems, tail -r is limited to a small number of lines, which might be an issue.
– RedGrittyBrick, Commented Jul 23, 2014 at 16:20

@RedGrittyBrick, do you have any reference for that, or could you please tell which systems have that limitation?
– Stéphane Chazelas, Commented Jul 23, 2014 at 16:50
@StéphaneChazelas, tail -r /etc/passwd fails with tail: invalid option -- 'r'. I'm using coreutils-8.21-21.fc20.x86_64.
– Cristian Ciupitu, Commented Jul 23, 2014 at 20:14
@CristianCiupitu, as I said, GNU has tac (and only GNU has tac); many other Unices have tail -r. GNU tail doesn't support -r.
– Stéphane Chazelas, Commented Jul 23, 2014 at 22:41
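Given the tac versus tail -r split described in these comments, a script meant to run on both kinds of system could pick whichever is available. The following is a rough sketch under that assumption: reverse_lines is a hypothetical name, and the awk fallback (borrowed from the accepted answer) holds the whole file in memory.

```shell
# reverse_lines FILE: print FILE last line first, using whichever
# reversal tool this system provides.
reverse_lines() {
    if command -v tac >/dev/null 2>&1; then
        tac "$1"
    elif tail -r /dev/null >/dev/null 2>&1; then
        # Probe succeeded, so this tail accepts -r.
        tail -r "$1"
    else
        # Last resort: buffer every line in an awk array.
        awk '{x[NR]=$0} END {while (NR) print x[NR--]}' "$1"
    fi
}

# Made-up sample file.
f=$(mktemp)
printf '%s\n' 'first' 'second' 'third' > "$f"

reverse_lines "$f" | grep -e 'ir'

rm -f "$f"
```

The probe against /dev/null only checks that tail accepts the -r option; it does not detect any line limit of the kind RedGrittyBrick mentions.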


zzapper · Accepted Answer · 2014-07-24 11:05:47Z

This one exits as soon as it finds the first match:

 tac hugeproduction.log | grep -m1 WhatImLookingFor

The following gives the 5 lines before and after the first two matches found (that is, the last two matches in the file):

 tac hugeproduction.log | grep -m2 -A 5 -B 5 WhatImLookingFor

Avoid -i (case-insensitive) unless you need it, as it slows grep down.

If you know the exact string you are looking for, consider grep -F (fixed string, historically spelled fgrep):

 tac hugeproduction.log | grep -F -m2 -A 5 -B 5 'ABC1234XYZ'
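The -m approach can be wrapped in a small function that also covers the "save the result to a file" part of the question. This is a sketch only: last_matches is a hypothetical name, the sample data is made up, and -m is assumed to be available (it is in GNU grep).

```shell
# last_matches COUNT STRING FILE: print the last COUNT lines of FILE
# that contain the fixed string STRING, most recent first.
last_matches() {
    tac "$3" | grep -F -m "$1" -e "$2"
}

# Made-up sample log.
log=$(mktemp)
printf '%s\n' 'hit 1' 'miss' 'hit 2' 'miss' 'hit 3' > "$log"

# Save the two most recent matches to a result file, then show it.
result=$(mktemp)
last_matches 2 'hit' "$log" > "$result"
cat "$result"

rm -f "$log" "$result"
```

Because the stream is reversed, -A in such a pipeline selects lines that come before the match in the original file and -B selects lines that come after it. Piping the output through tac once more restores the original order.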

cuonglm · Accepted Answer · 2015-05-07 16:25:20Z

If the file is too big to fit in memory, I would use Perl with the File::ReadBackwards module from CPAN:

$ cat reverse-grep.pl
#!/usr/bin/perl

use strict;
use warnings;
use File::ReadBackwards;

my $pattern = shift;
my $rev = File::ReadBackwards->new(shift) or die "$!";

while (defined($_ = $rev->readline)) {
    print if /$pattern/;
}

$rev->close;

Then:

$ ./reverse-grep.pl pattern file

The advantage of this approach is that you can tweak the Perl to do anything you want.
– zzapper, Commented Jul 24, 2014 at 15:52
@zzapper: It's memory efficient, too, since it reads the file line by line instead of slurping the file into memory like tac.
– cuonglm, Commented Jul 24, 2014 at 15:54
Can anyone add -m support for this? I'd like to test it on real files. See: gist.githubusercontent.com/ychaouche/…
– ychaouche, Commented Nov 5, 2018 at 14:29
