sed match either beginning of line or character

Question

I have a string of colon separated assignments, each of which is of the form a=b. I need to parse it to extract foo, where foo is ...:di=foo:.... The assignment di=foo could happen at the beginning, in the middle, or at the end of the string.

My idea was to match either a beginning of line or a colon, then the string di=, then every character except a colon, then a colon or an end of line.

I've only managed to get the "every character except a colon" part to work.

Some tests:

echo "di=a;b:*.di=c;d:ddi=e;f" | sed "s/.*di=\([^:]*\):.*/\1/"
echo "ddi=a;b:di=c;d:*.di=e;f" | sed "s/.*di=\([^:]*\):.*/\1/"
echo "*.di=a;b:ddi=c;d:di=e;f" | sed "s/.*di=\([^:]*\):.*/\1/"

The first one should return a;b, the second one c;d and the third one e;f, but for now they all return c;d.

Miller might be a good alternative here, e.g. mlr --fs : --onidx cut -f di
– steeldriver, Commented Dec 1, 2020 at 1:14

Freddy · Accepted Answer · 2020-12-01 01:24:00Z

My idea was to match either a beginning of line or a colon, then the string di=, then every character except a colon, then a colon or an end of line.

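You don't need to match "then a colon or an end of line" (as in your example): [^:]* already stops at the next colon or at the end of the line, and the trailing .* consumes whatever is left.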

{
  echo "di=a;b:*.di=c;d:ddi=e;f"
  echo "ddi=a;b:di=c;d:*.di=e;f"
  echo "*.di=a;b:ddi=c;d:di=e;f"
} | sed 's/\(^\|.*:\)di=\([^:]*\).*/\2/'

Outputs:

a;b
c;d
e;f

\(^\|.*:\) matches the beginning of the line or any characters followed by a colon
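A note on portability: \| inside a basic regular expression is a GNU extension, so the command above may behave differently in other sed implementations. Below is a minimal sketch of the same idea using extended regular expressions, assuming a sed that accepts -E (GNU and BSD sed do). The function name extract_di is made up for illustration. It also adds -n and the p flag, so that a line without a di= entry prints nothing instead of being echoed unchanged.

```shell
# extract_di is a hypothetical helper name; it reads lines on stdin.
extract_di() {
  # (^|.*:)   start of line, or anything ending in a colon
  # di=       the key we are looking for
  # ([^:]*)   the value: everything up to the next colon or end of line
  # .*        whatever follows, so the whole line is replaced by \2
  sed -n -E 's/(^|.*:)di=([^:]*).*/\2/p'
}

printf '%s\n' 'di=a;b:*.di=c;d:ddi=e;f' | extract_di   # a;b
printf '%s\n' 'ddi=a;b:di=c;d:*.di=e;f' | extract_di   # c;d
printf '%s\n' '*.di=a;b:ddi=c;d:di=e;f' | extract_di   # e;f
```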

Stephen Harris · Accepted Answer · 2020-12-01 01:21:32Z

For cases like this, I tend to cheat and add a : to the front and end to remove the special cases; the match is then always against :a=foo:

So:

sed -e 's/^/:/' -e 's/$/:/' -e 's/.*:di=\([^:]*\):.*/\1/'

It can be optimised:

sed -e 's/^\(.*\)$/:\1:/' -e 's/.*:di=\([^:]*\):.*/\1/'

The results:

% echo "di=a;b:*.di=c;d:ddi=e;f" | sed -e 's/^/:/' -e 's/$/:/' -e 's/.*:di=\([^:]*\):.*/\1/'
a;b
% echo "ddi=a;b:di=c;d:*.di=e;f" | sed -e 's/^/:/' -e 's/$/:/' -e 's/.*:di=\([^:]*\):.*/\1/'
c;d
% echo "*.di=a;b:ddi=c;d:di=e;f" | sed -e 's/^/:/' -e 's/$/:/' -e 's/.*:di=\([^:]*\):.*/\1/'
e;f

Another cheat might be to convert each : to a newline, so that the match is always a=foo at the start of a line, without any : involved:

tr : '\012' | sed -n 's/^di=//p'
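To try this against one of the test strings from the question, something like the following should do. The variable name s is only for illustration, and '\n' is used in place of '\012', which tr treats the same way.

```shell
s='ddi=a;b:di=c;d:*.di=e;f'

# Put one assignment per line, then keep only the line that starts with
# di= and strip the key while printing it.
printf '%s\n' "$s" | tr ':' '\n' | sed -n 's/^di=//p'   # c;d
```

If a string contains more than one di= assignment, this prints every value, one per line.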

guest_7 · Accepted Answer · 2020-12-01 08:33:30Z

POSIXly, it can be done as shown: transliterate all colons to newlines, then keep chopping off the leading key-value pair until one starting with di= comes up.

{
  echo "di=a;b:*.di=c;d:ddi=e;f"
  echo "ddi=a;b:di=c;d:*.di=e;f"
  echo "*.di=a;b:ddi=c;d:di=e;f"
} \
| sed -n 'y/:/\n/;/^di=/!D;P'

di=a;b
di=c;d
di=e;f
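For readers less familiar with D and P, here is the same script spelled out one command per line with comments. It is a sketch with one variation that is not in the command above: an extra s/^di=// removes the key, so only the value is printed, as the question asks. The function name strip_di is hypothetical.

```shell
# strip_di is a hypothetical helper name; it reads lines on stdin.
strip_di() {
  sed -n '
    # Turn every colon into a newline, so each assignment sits on its own
    # line inside the pattern space.
    y/:/\n/
    # If the pattern space does not start with di=, delete its first line
    # and start over with what is left (without reading new input).
    /^di=/!D
    # Variation: drop the key so that only the value remains.
    s/^di=//
    # Print the first line of the pattern space only.
    P
  '
}

printf '%s\n' 'di=a;b:*.di=c;d:ddi=e;f' | strip_di   # a;b
printf '%s\n' 'ddi=a;b:di=c;d:*.di=e;f' | strip_di   # c;d
printf '%s\n' '*.di=a;b:ddi=c;d:di=e;f' | strip_di   # e;f
```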

Kusalananda · Accepted Answer · 2020-12-01 08:43:51Z

Using awk instead of sed, with : and = as field delimiters, walking through each record and printing the next field if a field is found that is di:

$ awk -F '[=:]' '{ for (i = 1; i < NF; ++i) if ($i == "di") { print $(i+1); next } }' file
a;b
c;d
e;f

Similarly, but instead using :, = and newlines as record separators:

$ awk -v RS='[=:\n]' '$0 == "di" { getline; print }' file
a;b
c;d
e;f

This would only work if your awk treats a multi-character value in RS as a regular expression (POSIX leaves the behaviour unspecified). This last variation would also print every di value found on an original line if there is more than one such value (the first variation avoids this by calling next).
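If a regular expression in RS is not available, a sketch that relies only on a single-character field separator might look like this. It splits on : alone and then tests each field for the di= prefix, so it should also cope with values that themselves contain =. The function name di_value is invented for this example.

```shell
# di_value is a hypothetical helper name; it reads the named files,
# or stdin when called without arguments.
di_value() {
  awk -F: '{
    for (i = 1; i <= NF; i++)
      if (index($i, "di=") == 1) {   # field starts with the literal di=
        print substr($i, 4)          # everything after the 3-character key
        next                         # only the first match on each line
      }
  }' "$@"
}

printf '%s\n' '*.di=a;b:ddi=c;d:di=e;f' | di_value   # e;f
```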
