How to append some line matched to previous line matched with sed?

Question

For example, transform the input below

    00:00:10.730
    this presentation is delivered by the

    00:00:13.230
    Stanford center for professional

    00:00:14.610
    development okay so let's get started

    00:00:25.500
    with today's material so um welcome back

    00:00:32.399
    to the second lecture what I want to do

to

    00:00:10.730 --> 00:00:13.230
    this presentation is delivered by the

    00:00:13.230 --> 00:00:14.610
    Stanford center for professional

    00:00:14.610 --> 00:00:25.500
    development okay so let's get started

    00:00:25.500 --> 00:00:32.399
    with today's material so um welcome back

    00:00:32.399
    to the second lecture what I want to do

Use something like gaupol for munging subtitles?

– Satō Katsura, Commented Jun 8, 2017 at 10:58

user218374 · Accepted Answer · 2017-06-08 14:39:14Z

For the sake of code clarity, this answer uses GNU sed:

    sed -nE '
       /^([0-9][0-9]:){2}[0-9]+[.][0-9]+/!{p;d;}
       h;:a
       $bb;n;H
       /^([0-9][0-9]:){2}[0-9]+[.][0-9]+/!ba
       :b
       x
       y/\n_/_\n/
       s/^([^_]*)_(.*)_([^_]*)$/\1 ---> \3_\2/
       y/\n_/_\n/
       p;g;$!s/^/\n/;D
    ' yourfile

Results

    00:00:10.730 ---> 00:00:13.230
    this presentation is delivered by the

    00:00:13.230 ---> 00:00:14.610
    Stanford center for professional

    00:00:14.610 ---> 00:00:25.500
    development okay so let's get started

    00:00:25.500 ---> 00:00:32.399
    with today's material so um welcome back

    00:00:32.399
    to the second lecture what I want to do

Explanation

We collect the range of lines from one timestamp to the next.
At the end of the range, the closing timestamp is brought forward next to the opening one and the range is printed. The pattern space is then cleared and refilled with the end of the range. With that value in the pattern space, control is transferred back to the top of the sed code, and the cycle starts all over again from the current end of range until the next timestamp or until we hit the end of the file.
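The range-by-range idea can also be restated outside sed. What follows is a minimal awk sketch of that idea, not the sed script above: it buffers the lines from one timestamp up to the next, and prints the buffer once the closing timestamp is known. The function name `pair_timestamps` is hypothetical, and the regex assumes each timestamp sits alone on its line in `HH:MM:SS.fraction` form.

```shell
# Hypothetical helper: pair each timestamp with the one that follows it.
pair_timestamps() {
    awk '
        # A line made up solely of a timestamp closes the current range.
        /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9][.][0-9]+$/ {
            if (start != "") printf "%s --> %s\n%s", start, $0, body
            start = $0; body = ""
            next
        }
        # Lines before the first timestamp pass through unchanged.
        start == "" { print; next }
        # Everything else belongs to the range that is still open.
        { body = body $0 "\n" }
        # The last range has no closing timestamp; print it as it is.
        END { if (start != "") printf "%s\n%s", start, body }
    ' "$@"
}

pair_timestamps <<'EOF'
00:00:10.730
this presentation is delivered by the

00:00:13.230
Stanford center for professional
EOF
```

Unlike the sed version, this sketch writes `-->` (two dashes), matching the format asked for in the question.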

RomanPerekhrest · Accepted Answer · 2017-06-08 12:00:59Z

A single-gawk approach (it relies on gawk's arrays of arrays), suited to relatively "small" files because every record is held in memory until the end:

    gawk 'BEGIN { RS = ""; FS = "[[:space:]]+" }
    {
        c++
        a[c]["t"] = $1
        a[c]["s"] = substr($0, length($1) + 2)
    }
    END {
        len = length(a)
        for (i = 1; i <= len; i++) {
            if ((i + 1) <= len) {
                printf("%s --> %s\n%s\n\n", a[i]["t"], a[i+1]["t"], a[i]["s"])
            } else {
                printf("%s\n%s\n", a[i]["t"], a[i]["s"])
            }
        }
    }' file

The output:

    00:00:10.730 --> 00:00:13.230
    this presentation is delivered by the

    00:00:13.230 --> 00:00:14.610
    Stanford center for professional

    00:00:14.610 --> 00:00:25.500
    development okay so let's get started

    00:00:25.500 --> 00:00:32.399
    with today's material so um welcome back

    00:00:32.399
    to the second lecture what I want to do

@SatoKatsura, added a note to my answer. Might be used for "small" files
– RomanPerekhrest, Commented Jun 8, 2017 at 12:02

Satō Katsura · Accepted Answer · 2017-06-09 07:48:17Z

With GNU sed and tac:

    tac file | \
        sed -E '/^[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}$/ { H; x; s/^\n//; s/(.*)\n(.*)/\2 --> \1/; }' | \
        tac

The same could be written with traditional sed (i.e. without -E), but it would be more verbose.
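For comparison, here is a sketch of one possible basic-regex spelling. It is an illustration, not part of the original answer: the function name `add_end_times_bre` is hypothetical, `tac` is assumed to be available (GNU coreutils), and the final substitution puts the earlier timestamp first, so the output reads start --> end as in the question.

```shell
# Hypothetical helper; assumes tac (GNU coreutils) is on the PATH.
add_end_times_bre() {
    # Without -E, intervals are written \{n\} and groups \( \).
    # H;x leaves "later\nearlier" in the pattern space, so the
    # final substitution swaps the two halves around the arrow.
    tac "$@" |
        sed '/^[0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}\.[0-9]\{3\}$/ { H; x; s/^\n//; s/\(.*\)\n\(.*\)/\2 --> \1/; }' |
        tac
}

printf '%s\n' '00:00:10.730' 'first caption' '' '00:00:13.230' 'second caption' |
    add_end_times_bre
```

The extra backslashes are where the added verbosity comes from; the logic itself is unchanged.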

With GNU awk and tac:

    tac file | \
        gawk --re-interval '
            /^[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3} --> / { old = $1 }
            /^[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}$/ { if(old != "") $0 = $0 " --> " old; old = $1 }
            1' | \
        tac

Please note that the awk version can handle time intervals such as 00:00:14.610 --> 00:00:25.500 in the input file, while the sed version is fooled by them.
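A small demonstration of that difference, as a sketch under a few assumptions: the function name `add_end_times` is hypothetical, `tac` is assumed to be available, and the interval expressions are spelled out as repeated bracket expressions so that the example does not depend on `--re-interval` support.

```shell
# Hypothetical helper; assumes tac (GNU coreutils) is on the PATH.
add_end_times() {
    tac "$@" | awk '
        # An interval that is already complete: only remember its start time.
        /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9][.][0-9][0-9][0-9] --> / { old = $1 }
        # A bare timestamp: append the start time of the block that follows it.
        /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9][.][0-9][0-9][0-9]$/ {
            if (old != "") $0 = $0 " --> " old
            old = $1
        }
        1' | tac
}

# The second block already carries an interval in the input.
add_end_times <<'EOF'
00:00:10.730
first caption

00:00:13.230 --> 00:00:14.610
second caption

00:00:14.610
third caption
EOF
```

The existing interval line passes through untouched, and the first block still picks up `00:00:13.230` as its end time.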

Note also that tac can be emulated with sed:

sed -n '1!G; $p; h'

or like this:

sed '1!G; h; $!d'

However, both forms load the entire input file into memory, so they aren't very efficient.
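A quick way to convince yourself that the two forms behave alike is to wrap them in functions and run them on a few lines. The function names below are hypothetical.

```shell
# Both helpers build the reversed text in the hold space, one line at a time.
reverse_lines_a() { sed -n '1!G; $p; h' "$@"; }   # print only at the last line
reverse_lines_b() { sed '1!G; h; $!d' "$@"; }     # delete everything but the last cycle

printf '%s\n' one two three | reverse_lines_a
printf '%s\n' one two three | reverse_lines_b
```

Each call prints `three`, `two`, `one` on separate lines. By the last line the hold space contains the whole input, which is the memory cost mentioned above.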

Result:

    00:00:10.730 --> 00:00:13.230
    this presentation is delivered by the

    00:00:13.230 --> 00:00:14.610
    Stanford center for professional

    00:00:14.610 --> 00:00:25.500
    development okay so let's get started

    00:00:25.500 --> 00:00:32.399
    with today's material so um welcome back

    00:00:32.399
    to the second lecture what I want to do

Philippos · Accepted Answer · 2019-08-28 18:10:46Z

I see loops or piping to other tools in the given answers, and I don't like that if it is not necessary. I like one-liners:

sed -E '/^[0-9:.]+$/{x;G;s/(.*)\n(.*)\n(\n)(.*)/\1 --> \4\3\2\3/p;d;};H;$!d;x'

But let's go step by step:

I use ^[0-9:.]+$ as the extended regular expression for the timestamp line. This should be sufficient in the real world, but feel free to make it more precise. I use this pattern as an address, so everything inside the {} pair is executed for the timestamp lines only.
Obviously we need to keep everything in mind until the next timestamp comes. Keeping in mind means appending to the hold space in sed.
Thus, each time we meet a timestamp, we assume everything since the last timestamp resides in the hold space. So we exchange pattern and hold space (x) and then append the hold space, which now holds the current timestamp, to the pattern space (G). This way the current timestamp is already saved in the hold space for the next cycle, while everything we need is in the pattern space.
We just need to reorganize it with a substitution: s/(.*)\n(.*)\n(\n)(.*)/\1 --> \4\3\2\3/ -- \1 is the starting timestamp, \2 is the text line, \3 is a newline (we need that in the replacement, but POSIX doesn't define \n in the replacement) and \4 is the ending timestamp. It looks more complicated than it is.
Adding the p flag to the substitution and then deleting the pattern space (d) keeps us from unwanted output for the first line, when the hold space was still empty.
Now all that's left is to append every other line to the hold space (H), suppress its output unless it is the last line ($!d), and on the last line exchange the buffers again (x), so that the lines collected in the hold space get printed even without a closing timestamp.
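The steps above can be laid out as a commented, multi-line script. This is a sketch that uses the same commands as the one-liner, spread over several lines. The function name `join_with_next_timestamp` is hypothetical, and a sed that supports -E and # comments (GNU sed, for instance) is assumed.

```shell
# Hypothetical helper; same commands as the one-liner, one per line.
join_with_next_timestamp() {
    sed -E '
        # Timestamp lines only.
        /^[0-9:.]+$/ {
            # Swap: the pattern space gets the collected block,
            # the hold space keeps the current timestamp.
            x
            # Append the hold space (the current timestamp) to the block.
            G
            # start \n text \n \n end   becomes   start --> end \n text \n
            s/(.*)\n(.*)\n(\n)(.*)/\1 --> \4\3\2\3/p
            # Nothing else to print in this cycle.
            d
        }
        # Any other line: collect it in the hold space.
        H
        # Not the last line yet: print nothing and read on.
        $!d
        # Last line: bring the collected block back so that it gets printed.
        x
    ' "$@"
}

join_with_next_timestamp <<'EOF'
00:00:10.730
this presentation is delivered by the

00:00:13.230
Stanford center for professional
EOF
```

On the first timestamp the substitution finds nothing to match, so its p flag stays silent and nothing is printed for the still-empty hold space.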

If someone still feels sed is not elegant, I can't help.
