sed regular expression failure

Question

I have a sample file with the contents:

Filesystem 512-blocks Used Available Capacity iused ifree %iused Mounted on
/dev/disk0s2 467182912 419318824 47352088 90% 52478851 5919011 90% /
devfs 419 419 0 100% 727 0 100% /dev
/dev/disk1s2 975093952 673515008 301578944 70% 84189374 37697368 69% /Volumes/Local_Storage
map -hosts 0 0 0 100% 0 0 100% /net
map auto_home 0 0 0 100% 0 0 100% /home
localhost:/l3ZTI82fIEDeEEIvUkf44A 467182912 467182912 0 100% 0 0 100% /Volumes/MobileBackups
/dev/disk2s2 1952853344 1925763856 27089488 99% 240720480 3386186 99% /Volumes/SK Backup
/dev/disk3s2 199328216 88909928 110418288 45% 11113739 13802286 45% /Volumes/Secure_Storage
/dev/disk4s2 59328216 51456432 7871784 87% 6432052 983973 87% /Volumes/Secure
/dev/disk5s2 60000000 12713448 47286552 22% 1589179 5910819 21% /Volumes/Secure_Personal
//[email protected]/Storage 4294701048 1128302984 3166398064 27% 141037871 395799758 26% /Volumes/Storage
/dev/disk6s2 200000 9952 190048 5% 1242 23756 5% /Volumes/VAULT
//[email protected]/chris.s 467182912 437521864 29661048 94% 54690231 3707631 94% /Volumes/chris.schmitz
//chris@hq-srv03/NET 167563256 50264576 117298680 30% 0 18446744073709551615 0% /Volumes/NETLOGON

And I'm working on pulling out just the ip addresses and host names from the file. Right now I'm working on grabbing the ips using the following pattern:

cat dfsample.txt | awk '/@/' | sed -E 's/.*([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*/\1/g'

With unexpected results:

//[email protected]/Storage 4294701048 1128302984 3166398064 27% 141037871 395799758 26% /Volumes/Storage
2.20.1.76
//chris@hq-srv03/NET 167563256 50264576 117298680 30% 0 18446744073709551615 0% /Volumes/NETLOGON

My expectation for the sed section was that the .* before and after the parenthesized pattern would match the rest of the line, so that substituting \1 would replace the entire line with the captured pattern, leaving only the IP address.

For some reason the first two digits of my IP address are getting cut off. When I try the parenthesized pattern in Sublime, it finds the IP without an issue. What am I missing?

devnull · Accepted Answer · 2014-03-27 03:08:36Z

The problem is that sed, by default, prints every line whether or not the pattern matched. Use -n to disable automatic printing of the pattern space and the p flag to print it only when the substitution succeeds:

sed -En '/@/{s/.*([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*/\1/p;}' inputfile

This would produce 2.20.1.76 for your input. Also note that you don't need the awk pipeline in order to filter the data.
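The truncated 2.20.1.76 itself has a separate cause: the leading .* is greedy, so it consumes as much of the line as it can while still leaving the group something to match, and [0-9]{1,3} is satisfied by the single digit 2. A minimal sketch of the effect and one possible fix follows. The user name `user` is a placeholder (the real one is obscured in the sample), and anchoring on the @ is an assumption that suits this particular input.

```shell
# Sample line modeled on the question's df output; "user" is a made-up user name.
line='//user@172.20.1.76/chris.s 467182912 437521864 29661048 94% 54690231 3707631 94% /Volumes/chris.schmitz'

# Greedy leading .* swallows "17", leaving only "2" for the first octet.
greedy=$(printf '%s\n' "$line" | sed -En 's/.*([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*/\1/p')

# Requiring the "@" directly before the first octet stops .* from eating its digits.
anchored=$(printf '%s\n' "$line" | sed -En 's/.*@([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*/\1/p')

printf 'greedy:   %s\nanchored: %s\n' "$greedy" "$anchored"
```

Sublime finds the full address because a bare search has no greedy prefix competing for the leading digits.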

+1; to make it work on OSX, a ; must go before the closing } (alternatively, don't enclose the s command in {...} at all).
:) The OP doesn't say so explicitly, but the use of -E suggests OSX / BSD.

jthill · Accepted Answer · 2014-03-27 18:54:41Z

sed -nr 's,^//[^@/]*@([^/]*)/.*,\1,p'

gets both hostnames and IP addresses and won't be fooled by "interesting" volume names.

If your sed doesn't have the -r flag, the escaping isn't too ugly on this one; I probably should have given it this way to begin with:

sed -n 's,^//[^@/]*@\([^/]*\)/.*,\1,p'

(edit: [^@] -> [^@/] safety play)
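A quick check of the pattern against lines shaped like the question's network mounts. The user name `user` is a placeholder, since the original is obscured in the sample, and the pairing of hosts with shares is inferred from the order of the output in the awk answer. The last line is an invented local volume whose name contains an @ and an IP-like string, to show the "won't be fooled" part.

```shell
# Three network mounts plus one local volume with a deliberately awkward name.
# "user" and the /Volumes/me@10.0.0.1 volume are made up for illustration.
sample='//user@SK-HQ-SRV05.internal.com/Storage 4294701048 1128302984 3166398064 27% 141037871 395799758 26% /Volumes/Storage
//user@172.20.1.76/chris.s 467182912 437521864 29661048 94% 54690231 3707631 94% /Volumes/chris.schmitz
//chris@hq-srv03/NET 167563256 50264576 117298680 30% 0 18446744073709551615 0% /Volumes/NETLOGON
/dev/disk7s2 200000 9952 190048 5% 1242 23756 5% /Volumes/me@10.0.0.1'

# ^//          only lines that start with a double slash (network mounts)
# [^@/]*@      the user name, up to and including the "@"
# \([^/]*\)    capture everything up to the next slash: the host name or IP
# /.*          the rest of the line, discarded by the substitution
hosts=$(printf '%s\n' "$sample" | sed -n 's,^//[^@/]*@\([^/]*\)/.*,\1,p')
printf '%s\n' "$hosts"
```

The local volume line does not start with //, so the substitution never fires on it and -n keeps it out of the output.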

edited Mar 27, 2014 at 18:54

answered Mar 27, 2014 at 3:07

6 Comments

mklement0 Over a year ago

+1 for providing a complete solution; using -E instead of -r should work for the OP (incidentally, -E works with GNU sed too (as an alias for -r), but it's not documented).

Chris Schmitz Over a year ago

So if I understand this correctly, the regular expression in the first sed section is basically saying "find all lines beginning with a double forward slash, 0 or more characters that are not "@" until you get to "@", then zero or more characters that are not a forward slash (remember this pattern) until you get to a forward slash, then zero or more of any characters not including a new line, and substitute that with the stored pattern and print it", Right? It makes sense, I just want to make sure I fully understand the why behind the pattern.

Chris Schmitz Over a year ago

One quick additional question. When I put this into my actual bash script, set the result to a variable, and then check it by echoing the variable it returns the result all on one line. If I fire df | sed -nE 's|//[^@]*@([^/]*)/.*|\1|p' directly from the command line it returns the results on separate lines. I've tried inserting a new line after each \1 to force a new line but it doesn't seem to do it either. Is there a reason the variable gets everything inline?

jthill Over a year ago

You set the variable to the entire result, so the newlines are just field separators. How newlines are treated is context-dependent and idiosyncratic with just about every tool; you have to get a feel for when they'll be treated as something more significant than ordinary whitespace. Think of it like English spelling: it's the way it is because it made sense to somebody, somewhere, and it's too late to fix it now.
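The effect is easy to reproduce. In this sketch, `result` is a hypothetical stand-in for the script's variable and holds three hosts on separate lines. Expanding it unquoted lets the shell split the value on whitespace, and echo rejoins the words with single spaces; quoting the expansion keeps the newlines.

```shell
# Stand-in for the captured sed output: one host per line.
result=$(printf '%s\n' SK-HQ-SRV05.internal.com 172.20.1.76 hq-srv03)

# Unquoted: the shell splits on whitespace (IFS) and echo joins the words with spaces.
unquoted=$(echo $result)

# Quoted: the value is passed as a single word, newlines intact.
quoted=$(echo "$result")

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:\n%s\n' "$quoted"
```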

Chris Schmitz Over a year ago

Ah, understood. I read up on how the internal field separator works and also on how I would pass the results of my command into an array. It turns out I didn't need to alter the IFS value since host names and ip addresses inherently wouldn't have spaces in between them, but it was still good to read about the IFS. I altered my variable declaration to externalMounts=($( df | sed -nE 's|//[^@]*@([^/]*)/.*[^$]|\1|p' )) and now it stacks the results neatly into an array that I can then walk over with the rest of my script. Thanks again!
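Where arrays are unavailable, or where values might one day contain spaces, reading the output line by line is an alternative to the array assignment above. A minimal sketch, with `hosts` as a hypothetical variable holding the sed output and a fixed list standing in for the live df call:

```shell
# Stand-in for: hosts=$(df | sed -n 's,^//[^@/]*@\([^/]*\)/.*,\1,p')
hosts=$(printf '%s\n' SK-HQ-SRV05.internal.com 172.20.1.76 hq-srv03)

count=0
report=''
# The here-document feeds the loop without a pipeline, so the loop runs in the
# current shell and the variables it sets are still visible afterwards.
while IFS= read -r host; do
  count=$((count + 1))
  report="${report}${count}:${host} "
done <<EOF
$hosts
EOF

printf '%s\n' "$report"
```

IFS= and -r keep read from trimming whitespace or interpreting backslashes, so each line arrives unchanged.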

Jotne · Accepted Answer · 2014-03-27 06:15:28Z

Here is how to do it with awk:

awk '/@/ {split($1,a,"[@/]");print a[4]}' file

SK-HQ-SRV05.internal.com
172.20.1.76
hq-srv03

This finds all lines containing @, then splits the first field on @ or /.
It then prints the fourth piece of the split.
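To see why the host lands in the fourth slot, it helps to print every piece of the split. In this sketch the user name `user` is a placeholder, since the real one is obscured in the sample. The two leading slashes each produce an empty piece, which pushes the user name to index 3 and the host to index 4.

```shell
# "user" is a made-up user name standing in for the obscured original.
mount='//user@172.20.1.76/chris.s'

# Print each piece of the split with its index; empty pieces show up as [].
pieces=$(printf '%s\n' "$mount" | awk '{
  n = split($1, a, "[@/]")
  for (i = 1; i <= n; i++) printf "a[%d]=[%s]\n", i, a[i]
}')
printf '%s\n' "$pieces"
```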
