Print everything between two patterns, then delete first and last line of the resulting output [duplicate]

Question

otherdata
otherdata
start_data
one
two
three
four
end_data
otherdata
otherdata

The resulting output should just be:

one
two
three
four

This looked like a job for sed to me:

sed -n '/start_data/,/end_data/{1d;$d;p}' myfile

This did not work: the first line was deleted, but not the last one, and so far I can't find a logical explanation for it.

OK, so let's try the ugly way:

sed -n '/start_data/,/end_data/{/start_data\|end_data/!p}' myfile

Fair enough, this works. But I'd like to make the shorter method work as well, as the resulting output will always contain the two patterns on first and last line, since we're only extracting the data in between.

Why does sed choke at the attempt of combining the 1d and $d statements in curly braces?

Seriously. Define "between". Define "print, then delete". What do you mean, "... the resulting output will always contain the two patterns on first and last line ..."? Do you want start_data and end_data in your output or don't you?
– G-Man Says 'Reinstate Monica', Commented Jun 18, 2015 at 5:44
@G-Man "Resulting output" refers to the output between the two pattern matches A and B (both of which I want to exclude). And from plain logic, what you get out will always have pattern A as first line and pattern B as the last, so a simple sed statement that does a d on first and last line will do.
– syntaxerror, Commented Jun 18, 2015 at 8:47
@Cyrus Oops!! Good catch. Totally forgot to specify my output...
– syntaxerror, Commented Jun 18, 2015 at 8:52

Stéphane Chazelas · Accepted Answer · 2015-06-18 09:59:25Z

You can reverse the logic:

sed '1,/start_data/d;/end_data/,$d'

That assumes start_data is not on the first line. To work around that, if you have GNU sed, you can make it instead:

sed '0,/start_data/d;/end_data/Q'

That 0 and Q are GNU-specific. Q quits sed without printing the pattern space, so that would also make it more efficient as it wouldn't keep reading and discarding the rest of the file as with the first solution.
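To see the limitation of the portable form in practice, here is a minimal sketch. The scratch files `sample.txt` and `edge.txt` are made up for illustration; the first mirrors the question's sample, the second puts start_data on line 1. It only uses the `1,/start_data/` variant, so it should not depend on GNU extensions.

```shell
# Sketch only: sample.txt and edge.txt are hypothetical scratch files.
tmpdir=$(mktemp -d)
printf '%s\n' otherdata otherdata start_data one two three four end_data otherdata otherdata > "$tmpdir/sample.txt"
printf '%s\n' start_data one two three four end_data otherdata > "$tmpdir/edge.txt"

# Delete from line 1 through the start marker, then from the end marker to EOF.
normal=$(sed '1,/start_data/d;/end_data/,$d' "$tmpdir/sample.txt")
printf '%s\n' "$normal"

# With start_data on line 1, the range opens on line 1 and sed only looks
# for its end from line 2 onward, so everything is deleted.
edge=$(sed '1,/start_data/d;/end_data/,$d' "$tmpdir/edge.txt")
printf 'edge case output: [%s]\n' "$edge"

rm -r "$tmpdir"
```

The first command should print the four data lines; the second should print nothing between the brackets, which is the case the GNU-only `0,/start_data/` address is meant to handle.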

edited Jun 18, 2015 at 9:59 by Stéphane Chazelas
answered Jun 18, 2015 at 4:37 by jimmij

Though this solution is very elegant, it gives a null string. :(( Have you actually tested your line?
– syntaxerror, Commented Jun 18, 2015 at 8:43
@syntaxerror works fine on your example with GNU sed version 4.2.1.
– jimmij, Commented Jun 18, 2015 at 9:26
@syntaxerror. It certainly works on the sample you provided. It would only give an empty output if start_data was not found in the input or it was only found on the first line (with GNU sed you can replace 1 with 0 to work around that), or if there's end_data on the next line after the first one containing start_data.
– Stéphane Chazelas, Commented Jun 18, 2015 at 9:27
YES! This works. Thank you very much. I do have GNU sed here and the 1 must be replaced by 0. So what you're saying is, for the 1 case, pattern #1 must be preceded by some other data, or sed will fail. Of course, these things may happen :) Gladly the 0 variant will also work in both cases, just tried with one of my very large lists.
– syntaxerror, Commented Jun 18, 2015 at 9:36
Congrats, you've just gained a 50 percent clarity boost with the update of your answer. :) Looks near-perfect now, well done. (Those freaking GNU-isms every time, grrrr. :-@)
– syntaxerror, Commented Jun 18, 2015 at 14:31

cuonglm · Accepted Answer · 2015-06-18 01:22:55Z

awk seems to be a good fit to this problem:

$ awk '/end_data/{f=0;};f{print;};/start_data/{f=1;}' myfile
one
two
three
four

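The above uses the flag f to decide whether a line should be printed. When start_data is found, the flag is set to true (1). When end_data is found, the flag is set to false (0). While f is true, the line is printed. The order of the three rules matters: the flag is cleared before the print rule and set after it, so neither marker line is printed.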
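A small sketch of how the rule order affects the result, using a scratch file that mirrors the question's sample (the variable names are made up for this illustration):

```shell
# Sketch only: the temp file stands in for "myfile" from the question.
tmpfile=$(mktemp)
printf '%s\n' otherdata otherdata start_data one two three four end_data otherdata otherdata > "$tmpfile"

# Clear the flag, print, then set the flag: both marker lines are skipped.
exclusive=$(awk '/end_data/{f=0} f{print} /start_data/{f=1}' "$tmpfile")

# Set the flag, print, then clear the flag: both marker lines are printed.
inclusive=$(awk '/start_data/{f=1} f{print} /end_data/{f=0}' "$tmpfile")

printf 'exclusive:\n%s\n\ninclusive:\n%s\n' "$exclusive" "$inclusive"
rm -f "$tmpfile"
```

Swapping the first and last rule is all it takes to switch between excluding and including the start_data and end_data lines.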

Why does sed choke at the attempt of combining the 1d and $d statements in curly braces?

It is not "choking." It is just that 1d and $d refer to the first and last lines of the input file, not to the first and last lines of the range selected by the two patterns.
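One way to check this, as a rough sketch: sed's `=` command prints the current input line number, so running it inside the range shows which numbers sed associates with the selected lines. The temp file below mirrors the question's sample.

```shell
# Sketch only: the temp file stands in for "myfile" from the question.
tmpfile=$(mktemp)
printf '%s\n' otherdata otherdata start_data one two three four end_data otherdata otherdata > "$tmpfile"

# Print the line number of every line inside the range.
numbers=$(sed -n '/start_data/,/end_data/=' "$tmpfile")
printf '%s\n' "$numbers"

rm -f "$tmpfile"
```

The numbers run from 3 to 8, not from 1 to 6: lines inside the range keep their position in the file, so neither 1 nor $ (line 10 here) ever falls inside it.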

Yes, that's what I had assumed! That is, when I match between two patterns, that the 1d and $d will refer to the resulting output after I "filtered" the data by restricting the content between start_data and end_data, not the source file as-is.
– syntaxerror, Commented Jun 18, 2015 at 0:53

mikeserv · Accepted Answer · 2015-06-19 00:06:13Z

Well, this works:

sed -ne/start_data/!d\;:n -e'n;/end_data/q;p;bn' <in

It doesn't even attempt to print until it encounters /start_data/. From that address on through to the last line, it will replace the current line with the next one, quit input entirely if the new line pulled in matches /end_data/, or else print it. And that's all. The output is, given your sample data:

one
two
three
four

It won't recognize a line as an end_data match if it also matches the first start_data line which occurs in input.
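For readers who find the one-liner dense, here is the same loop spread over several lines with comments. This is only a sketch of my reading of the command above; the label name `next` and the temp file are made up for illustration.

```shell
# Sketch only: the temp file stands in for the "in" file used above.
tmpfile=$(mktemp)
printf '%s\n' otherdata otherdata start_data one two three four end_data otherdata otherdata > "$tmpfile"

loop_out=$(sed -n '
  # Before the first start_data line: discard and start the next cycle.
  /start_data/!d
  :next
  # Replace the pattern space with the next input line.
  n
  # Stop reading as soon as the end marker arrives.
  /end_data/q
  # Otherwise print the line and go around again.
  p
  b next
' "$tmpfile")
printf '%s\n' "$loop_out"

rm -f "$tmpfile"
```

Because the start_data line is replaced by `n` before any test for end_data runs, the start line itself is never checked against the end pattern.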

edited Jun 19, 2015 at 0:06
answered Jun 18, 2015 at 4:52 by mikeserv

@StéphaneChazelas Right, just noticed that by trial-and-error. :)
– syntaxerror, Commented Jun 18, 2015 at 11:38
Great update, Mike. Good work.
– syntaxerror, Commented Jun 18, 2015 at 23:55
@syntaxerror - I really couldn't follow what you said before, but I read it right after I woke up, so, in fairness, I don't think I was all there when I did.
– mikeserv, Commented Jun 19, 2015 at 0:03
Heh, never do that. A cup of coffee always works wonders (at least for me). :P
– syntaxerror, Commented Jun 19, 2015 at 0:48
Thanks. :) Upvote done now, since you've begged for it. See, you got 20k rep, so my upvote is just like a water drop in a big sea. You can't be serious to require my upvote to be happy. ;-) With 20k, you've achieved everything imaginable that can be achieved on this site. I would accept tens of downvotes per week should I ever get to enter this rep zone...I'd just not care anymore.
– syntaxerror, Commented Jun 19, 2015 at 8:33

kos · Accepted Answer · 2015-06-18 01:46:39Z

You have an answer to your question already; I'll throw in another way of doing this using Perl.

< inputfile perl -0777 -pe 's/^(.*\n)*?start_data.*\n((.*\n)*?)end_data(.*\n)*/$2/'

-0777: slurps the whole file at once instead of reading one line at a time
-p: places a while (<>) {[...]} loop around the script and prints the processed file
-e: reads the script from the arguments

Perl command breakdown:

s: asserts to perform a substitution
/: starts the pattern
^: matches the start of the file
(.*\n)*?: matches a whole line (any number of any characters, greedily, followed by a newline), zero or more times, lazily (i.e. it matches as few lines as possible, stopping as soon as the following pattern can match)
start_data.*\n: matches a start_data string, then any number of any characters greedily up to and including the newline that ends the current line
((.*\n)*?): groups and matches a whole line (any number of any characters, greedily, followed by a newline), zero or more times, lazily (i.e. it matches as few lines as possible, stopping as soon as the following pattern can match)
end_data: matches an end_data string
(.*\n)*: matches a whole line (any number of any characters, greedily, followed by a newline), zero or more times, greedily (i.e. it matches as many lines as possible)
/: stops the pattern / starts the replacement string
$2: replaces with the second captured group
/: stops the replacement string / starts the modifiers

+1 because of your great effort in explaining perl crypto-lingo to rookies ;-)
– syntaxerror, Commented Jun 18, 2015 at 8:36

Scott - Слава Україні · Accepted Answer · 2015-06-18 18:02:20Z

Here, let me make a trivial, cosmetic modification to the input file provided in the question:

% cat myfile
red
orange
start_data
one
two
three
four
end_data
yellow
green

I have simply replaced the otherdata lines with distinct other data, so we can refer to every line in the input file uniquely, by content, without having to say “the first line”, since that is apparently subject to misinterpretation, or “the first otherdata line”, which is a little verbose (and, for all I know, also maybe subject to misinterpretation).

Now, probably the closest thing you're going to find to your first attempt is

% sed -n '/start_data/,/end_data/p' myfile | sed '1d;$d'
one
two
three
four

Your first attempt (sed -n '/start_data/,/end_data/{1d;$d;p}' myfile) "chokes" because (as John1024 said) line 1 is the red line^* and line $ is the green line^**. The 1d;$d; has no effect because those lines (along with, in fact, all of the otherdata/colordata lines) are already excluded by the /start_data/,/end_data/ range.
__________
^* i.e., the first line in the entire input file, not just the matched range
^** i.e., the last line in the entire input file, not just the matched range

By the way, are you saying that your command produced the following output?

one
two
three
four
end_data

Because that doesn't make sense, unless start_data was line 1 (i.e., if red and orange were absent).
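That scenario can be reproduced with a short sketch. The temp file below is a made-up variant of myfile with the red and orange lines removed, so start_data sits on line 1; the second command is the two-stage pipeline from this answer, run on the same file for comparison.

```shell
# Sketch only: a hypothetical variant of myfile with start_data on line 1.
tmpfile=$(mktemp)
printf '%s\n' start_data one two three four end_data yellow green > "$tmpfile"

# Single sed: 1d hits start_data (it is line 1 of the file), but $d can only
# hit "green", which is outside the range, so end_data is still printed.
single=$(sed -n '/start_data/,/end_data/{1d;$d;p;}' "$tmpfile")

# Two seds: the second one numbers the extracted lines afresh,
# so 1 and $ really are the two marker lines.
piped=$(sed -n '/start_data/,/end_data/p' "$tmpfile" | sed '1d;$d')

printf 'single sed:\n%s\n\npiped:\n%s\n' "$single" "$piped"
rm -f "$tmpfile"
```

The single command leaves end_data in the output, while the pipeline removes both markers, because the second sed sees only the extracted range as its input.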

"because those lines are already excluded by the /start_data/,/end_data/ range" Nope, in fact they are NOT. sed -n '/start_data/,/end_data/p' myfile WILL print both start_data and end_data patterns, each on their own line (first/last). If you don't believe me, try it out. :)
– syntaxerror, Commented Jun 18, 2015 at 11:41
@syntaxerror: You're not reading what John and I are saying!!! As long as you do a single sed command (in contrast to my answer, which does sed … | sed …), line 1 is the red line and line $ is line 10 which is the green line. And the /start_data/,/end_data/ range excludes the color lines (a.k.a. the otherdata lines in your question). If you don't believe me, try sed -n '/start_data/,/end_data/{4d;p}' myfile (but first, guess what the output will be).
– Scott - Слава Україні, Commented Jun 18, 2015 at 11:56
To begin with, I have no idea why you're always referring to red and green...is this some allusion to sports which I don't get perhaps? ;) Or to a traffic light? Symbolism is great, but it's always hard to grasp without explaining the symbols first... P.S. Nevertheless, the downvote is not from me.
– syntaxerror, Commented Jun 18, 2015 at 14:33
(0) @syntaxerror: I have edited my answer to clarify the use of the spectrum. Also, would somebody care to explain the downvote? My answer (1) solves the problem, in sed (as the OP requested), with a command that, as far as I can tell, works correctly for all reasonable variations of the input data (e.g., start_data on the first line or end_data on the last) without requiring any GNUisms, and also (2) takes another crack at answering the (explicit) question, “Why does sed choke” on the OP’s first attempt (sed -n '/start_data/,/end_data/{1d;$d;p}' myfile)?
– Scott - Слава Україні, Commented Jun 18, 2015 at 18:05
