
I have a string of colon-separated assignments, each of the form a=b. I need to parse it to extract foo from ...:di=foo:.... The assignment di=foo could appear at the beginning, in the middle, or at the end of the string.

My idea was to match either the beginning of the line or a colon, then the string di=, then every character except a colon, then a colon or the end of the line.

I've only managed to get the "every character except a colon" part to work.

Some tests:

echo "di=a;b:*.di=c;d:ddi=e;f" | sed "s/.*di=\([^:]*\):.*/\1/"
echo "ddi=a;b:di=c;d:*.di=e;f" | sed "s/.*di=\([^:]*\):.*/\1/"
echo "*.di=a;b:ddi=c;d:di=e;f" | sed "s/.*di=\([^:]*\):.*/\1/"

The first one should return a;b, the second c;d, and the third e;f, but for now they all return c;d.

1
  • Miller might be a good alternative here, e.g. mlr --fs : --onidx cut -f di (Commented Dec 1, 2020 at 1:14)

4 Answers

3

My idea was to match either a beginning of line or a colon, then the string di=, then every character except a colon, then a colon or an end of line.

You don't need to match "then a colon or an end of line" (as in your example): [^:]* is greedy, so it already stops at the next colon or at the end of the line.

{
  echo "di=a;b:*.di=c;d:ddi=e;f"
  echo "ddi=a;b:di=c;d:*.di=e;f"
  echo "*.di=a;b:ddi=c;d:di=e;f"
} | sed 's/\(^\|.*:\)di=\([^:]*\).*/\2/'

Outputs:

a;b
c;d
e;f

  • \(^\|.*:\) matches the beginning of the line or any characters followed by a colon
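
The \| alternation inside a basic regular expression is a GNU extension, so the command above may not work with every sed. A minimal sketch of the same idea in extended regular expression syntax follows, assuming a sed that accepts -E (GNU and BSD sed both do); the function name extract_di is hypothetical and used only for illustration.

```shell
# extract_di is a hypothetical helper name; it reads lines on stdin.
extract_di() {
  # (^|.*:) anchors "di=" at the start of the line or right after a colon;
  # ([^:]*) captures the value up to the next colon or the end of the line.
  sed -E 's/(^|.*:)di=([^:]*).*/\2/'
}

printf '%s\n' \
  'di=a;b:*.di=c;d:ddi=e;f' \
  'ddi=a;b:di=c;d:*.di=e;f' \
  '*.di=a;b:ddi=c;d:di=e;f' | extract_di
```

As with the original command, a line that contains no di= assignment is printed unchanged, because the substitution simply does not apply to it.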
1

For cases like this, I tend to cheat and add a : to the front and the end of the string, which removes the special cases; the match is now always against :a=foo:

So:

sed -e 's/^/:/' -e 's/$/:/' -e 's/.*:di=\([^:]*\):.*/\1/' 

The two wrapping substitutions can be combined into one:

sed -e 's/^\(.*\)$/:\1:/' -e 's/.*:di=\([^:]*\):.*/\1/' 

The results:

% echo "di=a;b:*.di=c;d:ddi=e;f" | sed -e 's/^/:/' -e 's/$/:/' -e 's/.*:di=\([^:]*\):.*/\1/'
a;b
% echo "ddi=a;b:di=c;d:*.di=e;f" | sed -e 's/^/:/' -e 's/$/:/' -e 's/.*:di=\([^:]*\):.*/\1/'
c;d
% echo "*.di=a;b:ddi=c;d:di=e;f" | sed -e 's/^/:/' -e 's/$/:/' -e 's/.*:di=\([^:]*\):.*/\1/'
e;f

Another cheat might be to convert each : to a newline, so that every assignment sits on its own line and the match is always against a=foo at the start of a line, without any :

tr : '\012' | sed -n 's/^di=//p' 
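
The same pipeline can be wrapped so that the key is a parameter. This is only a sketch: get_value is a hypothetical name, and it assumes the key contains no regular expression metacharacters and no slash, since the key is pasted directly into the sed expression.

```shell
# get_value KEY: hypothetical helper that reads the colon-separated list on stdin.
# Assumption: KEY has no regex metacharacters and no "/".
get_value() {
  # One assignment per line, then print only the lines that start with KEY=,
  # with that prefix removed.
  tr ':' '\n' | sed -n "s/^$1=//p"
}

echo 'ddi=a;b:di=c;d:*.di=e;f' | get_value di
```

Because the match is anchored at the start of each line, ddi= and *.di= are not mistaken for di=.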
1

With POSIX sed, it can be done as shown: transliterate all colons to newlines, then keep chopping off the leading key-value pair until one that starts with di= comes up, and print it.

{
  echo "di=a;b:*.di=c;d:ddi=e;f"
  echo "ddi=a;b:di=c;d:*.di=e;f"
  echo "*.di=a;b:ddi=c;d:di=e;f"
} \
| sed -n 'y/:/\n/;/^di=/!D;P'

di=a;b
di=c;d
di=e;f
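
The command above prints the whole assignment, including the di= prefix. If only the value is wanted, one possible variation is to strip the prefix just before printing. This is a sketch rather than part of the original answer, and di_value is a hypothetical name.

```shell
# di_value is a hypothetical helper name; it reads lines on stdin.
di_value() {
  # y/:/\n/     put each assignment on its own line within the pattern space
  # /^di=/!D    drop the leading assignment until one starts with "di="
  # s/^di=//    remove the key, leaving the value at the front
  # P           print up to the first newline, i.e. the value only
  sed -n 'y/:/\n/;/^di=/!D;s/^di=//;P'
}

printf '%s\n' \
  'di=a;b:*.di=c;d:ddi=e;f' \
  'ddi=a;b:di=c;d:*.di=e;f' \
  '*.di=a;b:ddi=c;d:di=e;f' | di_value
```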
1

Using awk instead of sed, with : and = as field delimiters, walking through the fields of each record and printing the field that follows the first one equal to di:

$ awk -F '[=:]' '{ for (i = 1; i < NF; ++i) if ($i == "di") { print $(i+1); next } }' file
a;b
c;d
e;f

Similarly, but instead using :, = and newlines as record separators:

$ awk -v RS='[=:\n]' '$0 == "di" { getline; print }' file
a;b
c;d
e;f

This only works if your awk treats a multi-character value in RS as a regular expression. This last variation also prints every di value found on an original line if there is more than one (the first variation avoids this by calling next).
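
Both variations split on = as well as :, so a value that itself contains = would be cut short. A sketch of a variant that splits on colons only and compares the literal di= prefix is given below; it is an alternative under that assumption, not part of the original answer, and di_by_colon is a hypothetical name.

```shell
# di_by_colon is a hypothetical helper name; it reads files or stdin like awk.
di_by_colon() {
  awk -F ':' '{
    for (i = 1; i <= NF; ++i)
      if (index($i, "di=") == 1) {   # field starts with the literal key
        print substr($i, 4)          # drop the 3-character "di=" prefix
        next                         # only the first match per line
      }
  }' "$@"
}

printf '%s\n' 'ddi=a;b:di=c=d:*.di=e;f' | di_by_colon
```

Here the field di=c=d yields c=d, whereas splitting on both delimiters would yield only c.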
