4

I have tried this:

sed -i '' 's/[0-9]*<>/g' 

But it didn't work.

Example file:

<Number1> </Number8> 

Desired output:

<Number> </Number> 
4
  • Do you want to remove the number or the whole line? Your question is unclear. Commented Oct 28, 2014 at 13:32
  • 1
    I think the title makes it fairly clear. "How do i remove every number thats surrounded by <>" I even made an example. Commented Oct 28, 2014 at 13:38
  • Can we assume that the < and > follow XML formatting, e.g., no nesting <<>> and no unmatched < or >? Commented Oct 29, 2014 at 14:48
  • Yeah, they are XML files. Commented Oct 29, 2014 at 14:52

4 Answers

3

This is really easy to do with sed, actually. You just get as many as you can in one go, then try, try again:

sed -e :t -e 's/\(<[^<>]*\)[0-9]\{1,\}\([^>]*>\)/\1\2/g;tt' 

I tried it with the following random bits of input:

<Number1> 234234 </Nu994845mb6er8>' 234234 <000000000000000000000000000000000000>> <a1> 2 <34b5c> 6 7 def 

And the results were:

<Number> 234234 </Number>' 234234 <>> <a> 2 <bc> 6 7 def 

The regex matches at least one digit between a < and a >. The substitution removes that digit sequence, and the loop repeats until no further substitution succeeds. This is the purpose of the t (test) command: it branches back to the :t label whenever the preceding s command made a replacement.
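The loop can be easier to follow when each sed command sits on its own line. The following is only a sketch: the function name strip_tag_digits is hypothetical, and the first group is written as [^<>]* on the assumption that a match should never run past a closing >.

```shell
# strip_tag_digits: hypothetical helper name, not part of the original answer.
# Reads standard input and removes digits that sit between a < and the next >.
strip_tag_digits() {
  # :t  defines a label
  # s   removes one run of digits per <...> group (g covers every group on the line)
  # tt  jumps back to :t if the s command replaced anything
  sed -e ':t' \
      -e 's/\(<[^<>]*\)[0-9]\{1,\}\([^>]*>\)/\1\2/g' \
      -e 'tt'
}

# Each pass removes one digit run per group, so <a1b2> needs two passes.
printf '%s\n' '<a1b2> 3 <c45>' | strip_tag_digits
# prints: <ab> 3 <c>
```

Splitting the script across several -e options keeps the label on its own, which also lets the same command run on sed implementations that do not accept a semicolon after a label.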

Alternatively, you can do it without a loop:

sed 's/^/>/;s/\(>[^<>]*\)*[0-9]*/\1/g;s/.//' <<\INPUT
<Number1> 234234 </Nu994845mb6er8>' 234234 <000000000000000000000000000000000000>> <a1> 2 <34b5c> 6 7 def
INPUT

OUTPUT

<Number> 234234 </Number>' 234234 <>> <a> 2 <bc> 6 7 def 

It will always skip any > until it encounters a < - so it only affects <[^<>]*> groups. See this if you're interested in why.

0
2

The following works:

sed -i 's/\(<[^0-9>]*\)[0-9]*\([^0-9]*>\)/\1\2/g' filename 
3
  • Your code has a problem; test this: sed 's/\(<[^0-9]*\)[0-9]*\([^0-9]*>\)/\1\2/g' <<< "<sss>asa1</sss>" Commented Oct 28, 2014 at 13:45
  • 1
    Note that this works only if there is at most one sequence of digits within <...>. For <a1b>, it works, for <a1b2>, it doesn't. You need a loop within sed, if you want to handle the latter case. Commented Oct 28, 2014 at 14:13
  • @Uwe - you don't need a loop with sed - that's just one way. See my edited answer for an example of doing it another way. Commented Oct 31, 2014 at 21:14
2

You either need a loop around a substitution command (possible in both sed and perl), or a nested substitution command (perl only). I prefer the latter approach; it's a bit more general:

perl -pe 's/\<([^>]*)\>/do{$a = $1; $a =~ s,\d,,g; "\<" . $a . "\>"}/ge;' 

Example input:

<a1> 2 <34b5c> 6 7 def 

Output:

<a> 2 <bc> 6 7 def 

Explanation: The -p option says that we want to read the file line by line, execute the script for each line, and print the result (like in sed); -e means that the next argument is the script to be executed.

Essentially, the script is just a substitution command: We look for <, followed by any number of non->-characters, followed by >. The e modifier after the trailing / indicates a special feature of the substitution command: Its replacement part is not a string to be printed, but again a command sequence to be executed. In this command sequence, we first assign the string between < and > (i.e., $1) to a new variable $a, then execute another substitution command on $a that simply replaces every digit (\d) by nothing, and finally return <, followed by the modified string, followed by >. The g modifier (both after the trailing / and the trailing ,) means that the substitution commands should be executed for every matching string, not just for the first one.

If the opening < and the corresponding > can be on different lines, say,

<abc1
opt="def">

add the option -0777 (i.e., perl -0777 -pe '...'), so that perl reads the entire file before processing it instead of working line-by-line (slurp mode).

8
  • Do I save the script as script.pl and then run script.pl inputfile? I am not that experienced with Perl. Commented Oct 28, 2014 at 14:42
  • If you want to modify the input file, you can use the -i option as in sed, i.e., perl -i -pe 'PERLCOMMANDS' inputfile. Without the -i option, the modified contents are written to standard output. Commented Oct 28, 2014 at 14:47
  • It might be better to slurp the input, OP doesn't state that the < and > have to be on the same line. Commented Oct 28, 2014 at 20:23
  • That looks more difficult than sed... Commented Oct 29, 2014 at 6:32
  • @mikeserv You're right, in this particular case, a simple loop around the substitution is sufficient. I hadn't thought about this approach when I wrote the (former) first sentence. For other "replace this by that, but only in the following context" problems, the loop method can get very nasty. Commented Oct 29, 2014 at 11:38
1

A short sed way:

sed 's/<\([^>]\+\)[0-9]\+>/<\1>/g' file 
3
  • @mikeserv This answer does work perfectly for the OP's example data. Maybe saying it would only work for one number would be a better comment. Commented Oct 29, 2014 at 7:52
  • 1
    @mikeserv Oki doke Commented Oct 29, 2014 at 16:12
  • You know what - my answer doesn't work either if < > span line boundaries, in fact. I guess, on second thought, you make a very good point. Commented Oct 29, 2014 at 16:22
