otherdata
otherdata
start_data
one
two
three
four
end_data
otherdata
otherdata

The resulting output should just be:

one
two
three
four

This looked like a job for sed to me:

sed -n '/start_data/,/end_data/{1d;$d;p}' myfile 

Did not work. First line was deleted, but not the last line! (for no reason that I could explain by logic so far)

OK, so let's try the ugly way:

sed -n '/start_data/,/end_data/{/start_data\|end_data/!p}' myfile 

Fair enough, this works. But I'd like to make the shorter method work as well, as the resulting output will always contain the two patterns on first and last line, since we're only extracting the data in between.

Why does sed choke at the attempt of combining the 1d and $d statements in curly braces?

  • Show desired output for that sample input. Commented Jun 18, 2015 at 3:15
  • Seriously.  Define "between".  Define "print, then delete".  What do you mean, "... the resulting output will always contain the two patterns on first and last line ..."?  Do you want start_data and end_data in your output or don't you? Commented Jun 18, 2015 at 5:44
  • @G-Man "Resulting output" refers to the output between the two pattern matches A and B (both of which I want to exclude). And from plain logic, what you get out will always have pattern A as first line and pattern B as the last, so a simple sed statement that does a d on first and last line will do. Commented Jun 18, 2015 at 8:47
  • @Cyrus Oops!! Good catch. Totally forgot to specify my output... Commented Jun 18, 2015 at 8:52
  • That's sed FAQ 4.24 Commented Jun 18, 2015 at 9:22

5 Answers


You can reverse the logic:

sed '1,/start_data/d;/end_data/,$d' 

That assumes start_data is not on the first line. To work around that, if you have GNU sed, you can make it instead:

sed '0,/start_data/d;/end_data/Q' 

That 0 and Q are GNU-specific. Q quits sed without printing the pattern space, so that would also make it more efficient as it wouldn't keep reading and discarding the rest of the file as with the first solution.
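As a rough sketch of how the portable form behaves, including the first-line caveat, the following rebuilds the question's sample input in a temporary file and runs the command on it. The variable names (sample, edge, between, edge_out) are made up for the demonstration.

```shell
# Hypothetical sample file reproducing the input from the question.
sample=$(mktemp)
printf '%s\n' otherdata otherdata start_data one two three four end_data otherdata otherdata > "$sample"

# Delete everything up to and including start_data,
# then everything from end_data to the end of the file.
between=$(sed '1,/start_data/d;/end_data/,$d' "$sample")
printf '%s\n' "$between"

# Edge case: start_data sits on line 1. The end of a range is only
# searched for from the line after its start, so 1,/start_data/ never
# closes and every line is deleted.
edge=$(mktemp)
printf '%s\n' start_data one two end_data > "$edge"
edge_out=$(sed '1,/start_data/d;/end_data/,$d' "$edge")
printf 'edge case output: [%s]\n' "$edge_out"

rm -f "$sample" "$edge"
```

The first command should print the four data lines; the second prints nothing between the brackets, which is the empty result the 0,/start_data/ variant is meant to avoid.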

  • Though this solution is very elegant, it gives a null string. :(( Have you actually tested your line? Commented Jun 18, 2015 at 8:43
  • @syntaxerror works fine on your example with GNU sed version 4.2.1. Commented Jun 18, 2015 at 9:26
  • @syntaxerror. It certainly works on the sample you provided. It would only give an empty output if start_data was not found in the input or it was only found on the first line (with GNU sed you can replace 1 with 0 to work around that), or if there's end_data on the next line after the first one containing start_data. Commented Jun 18, 2015 at 9:27
  • YES! This works. Thank you very much. I do have GNU sed here and the 1 must be replaced by 0. So what you're saying is, for the 1 case, pattern #1 must be preceded by some other data, or sed will fail. Of course, these things may happen :) Gladly the 0 variant will also work in both cases, just tried with one of my very large lists. Commented Jun 18, 2015 at 9:36
  • Congrats, you've just gained a 50 percent clarity boost with the update of your answer. :) Looks near-perfect now, well done. (Those freaking GNU-isms every time, grrrr. :-@) Commented Jun 18, 2015 at 14:31

awk seems to be a good fit to this problem:

$ awk '/end_data/{f=0;};f{print;};/start_data/{f=1;}' myfile
one
two
three
four

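The above uses the flag f to decide whether a line should be printed. When start_data is found, the flag is set to true (1). When end_data is found, the flag is set to false (0). While f is true, the line is printed. Because the end_data test comes before the print and the start_data test comes after it, neither marker line is printed.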

Why does sed choke at the attempt of combining the 1d and $d statements in curly braces?

It is not "choking." It is just that 1d and $d refer to the first and last lines of the file, not the first and last lines of the matched range.
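A quick way to check this, assuming the ten-line sample from the question, where start_data is line 3 and end_data is line 8 of the file. The file and variable names here are only illustrative.

```shell
# Hypothetical sample file reproducing the input from the question.
sample=$(mktemp)
printf '%s\n' otherdata otherdata start_data one two three four end_data otherdata otherdata > "$sample"

# 1 and $ are file-relative: inside the range (file lines 3-8) they
# never match, so both marker lines are still printed.
out_attempt=$(sed -n '/start_data/,/end_data/{1d;$d;p;}' "$sample")
printf '%s\n' "$out_attempt"
printf '%s\n' '---'

# Using the markers' real line numbers in the file drops them.
# This only works because the positions are known in advance.
out_numbered=$(sed -n '/start_data/,/end_data/{3d;8d;p;}' "$sample")
printf '%s\n' "$out_numbered"

rm -f "$sample"
```

The hard-coded 3d;8d is not a general solution, of course; it is only there to show which numbering sed uses inside the braces.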

  • Yes, that's what I had assumed! That is, when I match between two patterns, that the 1d and $d will refer to the resulting output after I "filtered" the data by restricting the content between start_data and end_data, not the source file as-is. Commented Jun 18, 2015 at 0:53

Well, this works:

sed -ne/start_data/!d\;:n -e'n;/end_data/q;p;bn' <in 

It doesn't even attempt to print until it encounters /start_data/. From that line on through to the last line, it replaces the current line with the next one, quits input entirely if the new line pulled in matches /end_data/, or else prints it. And that's all. The output is, given your sample data:

one
two
three
four

It won't recognize a line as an end_data match if it also matches the first start_data line which occurs in input.
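Spelled out one command per line, the same script might look like the sketch below. The comments are an interpretation of the one-liner above, and the sample file and the loop_out variable are just for the demonstration.

```shell
# Hypothetical sample file reproducing the input from the question.
sample=$(mktemp)
printf '%s\n' otherdata otherdata start_data one two three four end_data otherdata otherdata > "$sample"

loop_out=$(sed -n '
# Skip every line until the first one matching start_data.
/start_data/!d
:n
# Replace the pattern space with the next input line.
n
# Stop reading as soon as that line matches end_data.
/end_data/q
# Otherwise print it and loop back to the label.
p
bn
' "$sample")
printf '%s\n' "$loop_out"

rm -f "$sample"
```

Since q ends the run at end_data, the rest of the file is never read, which is the same efficiency argument made for Q in the GNU variant.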

  • @StéphaneChazelas Right, just noticed that by trial-and-error. :) Commented Jun 18, 2015 at 11:38
  • Great update, Mike. Good work. Commented Jun 18, 2015 at 23:55
  • @syntaxerror - I really couldn't follow what you said before, but I read it right after I woke up, so, in fairness, I don't think I was all there when I did. Commented Jun 19, 2015 at 0:03
  • Heh, never do that. A cup of coffee always works wonders (at least for me). :P Commented Jun 19, 2015 at 0:48
  • Thanks. :) Upvote done now, since you've begged for it. See, you got 20k rep, so my upvote is just like a water drop in a big sea. You can't be serious to require my upvote to be happy. ;-) With 20k, you've achieved everything imaginable that can be achieved on this site. I would accept tens of downvotes per week should I ever get to enter this rep zone...I'd just not care anymore. Commented Jun 19, 2015 at 8:33

You have an answer to your question already; I'll throw in another way of doing this using Perl.

< inputfile perl -0777 -pe 's/^(.*\n)*?start_data.*\n((.*\n)*?)end_data(.*\n)*/$2/' 
  • -0777: slurps the whole file at once instead of one line at a time
  • -p: places a while (<>) {[...]} loop around the script and prints the processed file
  • -e: reads the script from the arguments

Perl command breakdown:

  • s: asserts to perform a substitution
  • /: starts the pattern
  • ^: matches the start of the file
  • (.*\n)*?: matches any number of any character greedily within the current line and a newline, zero or more times lazily within the current file (i.e. it matches as few times as possible, stopping when the following pattern starts to match)
  • start_data.*\n: matches a start_data string, any number of any character greedily within the current line and a newline
  • ((.*\n)*?): groups and matches any number of any character greedily within the current line and a newline, zero or more times lazily within the current file (i.e. it matches as few times as possible, stopping when the following pattern starts to match)
  • end_data: matches an end_data string
  • (.*\n)*: matches any number of any character greedily within the current line and a newline, zero or more times greedily within the current file (i.e. it matches as many times as possible)
  • /: stops the pattern / starts the replacement string
  • $2: replaces with the second captured group
  • /: stops the replacement string / starts the modifiers
  • +1 because of your great effort in explaining perl crypto-lingo to rookies ;-) Commented Jun 18, 2015 at 8:36

Here, let me make a trivial, cosmetic modification to the input file provided in the question:

% cat myfile
red
orange
start_data
one
two
three
four
end_data
yellow
green

I have simply replaced the otherdata lines with distinct other data, so we can refer to every line in the input file uniquely, by content, without having to say “the first line”, since that is apparently subject to misinterpretation, or “the first otherdata line”, which is a little verbose (and, for all I know, also maybe subject to misinterpretation).

Now, probably the closest thing you're going to find to your first attempt is

% sed -n '/start_data/,/end_data/p' myfile | sed '1d;$d'
one
two
three
four

Your first attempt (sed -n '/start_data/,/end_data/{1d;$d;p}' myfile) "chokes" because (as John1024 said) line 1 is the red line* and line $ is the green line**. The 1d;$d; has no effect because those lines (along with, in fact, all of the otherdata/colordata lines) are already excluded by the /start_data/,/end_data/ range.
__________
*  i.e., the first line in the entire input file, not just the matched range
** i.e., the last line in the entire input file, not just the matched range


By the way, are you saying that your command produced the following output?

one
two
three
four
end_data

Because that doesn't make sense, unless start_data was line 1 (i.e., if red and orange were absent).
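For what it's worth, here is a small sketch of the two-stage pipeline on both layouts: the colored file, and a variant with start_data on the very first line and end_data on the very last. The temporary files and variable names are invented for the illustration.

```shell
# Hypothetical copy of the modified input file from this answer.
colors=$(mktemp)
printf '%s\n' red orange start_data one two three four end_data yellow green > "$colors"

# Hypothetical variant: the markers are the first and last lines of the file.
edge=$(mktemp)
printf '%s\n' start_data one two three four end_data > "$edge"

# Stage 1 prints the range including both markers; stage 2 sees only
# that output, so there 1 and $ really are the two marker lines.
piped_colors=$(sed -n '/start_data/,/end_data/p' "$colors" | sed '1d;$d')
piped_edge=$(sed -n '/start_data/,/end_data/p' "$edge" | sed '1d;$d')

printf '%s\n' "$piped_colors"
printf '%s\n' '---'
printf '%s\n' "$piped_edge"

rm -f "$colors" "$edge"
```

One caveat: if the input contains more than one start_data ... end_data block, the second sed only strips the very first and very last line of the combined output, so the markers of the inner blocks would remain.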

  • "because those lines are already excluded by the /start_data/,/end_data/ range" Nope, in fact they are NOT. sed -n '/start_data/,/end_data/p' myfile WILL print both start_data and end_data patterns, each on their own line (first/last). If you don't believe me, try it out. :) Commented Jun 18, 2015 at 11:41
  • @syntaxerror: You're not reading what John and I are saying!!!  As long as you do a single sed command (in contrast to my answer, which does sed … | sed …), line 1 is the red line and line $ is line 10 which is the green line.  And the /start_data/,/end_data/ range excludes the color lines (a.k.a. the otherdata lines in your question).  If you don't believe me, try sed -n '/start_data/,/end_data/{4d;p}' myfile (but first, guess what the output will be). Commented Jun 18, 2015 at 11:56
  • To begin with, I have no idea why you're always referring to red and green...is this some allusion to sports which I don't get perhaps? ;) Or to a traffic light? Symbolism is great, but it's always hard to grasp without explaining the symbols first... P.S. Nevertheless, the downvote is not from me. Commented Jun 18, 2015 at 14:33
  • (0) @syntaxerror: I have edited my answer to clarify the use of the spectrum.  … … … … … … … … …  Also, would somebody care to explain the downvote?   My answer (1) solves the problem, in sed (as the OP requested), with a command that, as far as I can tell, works correctly for all reasonable variations of the input data (e.g., start_data on the first line or end_data on the last) without requiring any GNUisms, and also (2) takes another crack at answering the (explicit) question, “Why does sed choke” on the OP’s first attempt (sed -n '/start_data/,/end_data/{1d;$d;p}' myfile)? Commented Jun 18, 2015 at 18:05
