How do I remove every number that's surrounded by <>

Question

I have tried this

sed -i '' 's/[0-9]*<>/g'

But it didn't work.

Example file:

<Number1> </Number8>

output:

<Number> </Number>

I think the title makes it fairly clear: "How do I remove every number that's surrounded by <>". I even made an example.
– DisplayName, Commented Oct 28, 2014 at 13:38
Can we assume that the < and > follow XML formatting, e.g. no nesting (<<>>) and no unmatched < or >?
– Floegipoky, Commented Oct 29, 2014 at 14:48

Community · Accepted Answer · 2017-04-13 12:36:37Z

This is really easy to do with sed, actually. You just get as many as you can in one go, then try, try again:

sed -e :t -e 's/\(<[^<>]*\)[0-9]\{1,\}\([^<>]*>\)/\1\2/g;tt'

I tried it with the following random bits of input:

<Number1> 234234 </Nu994845mb6er8>' 234234 <000000000000000000000000000000000000>> <a1> 2 <34b5c> 6 7 def

And the results were:

<Number> 234234 </Number>' 234234 <>> <a> 2 <bc> 6 7 def

The regex matches a run of one or more digits between a < and a >. Each pass replaces that digit sequence with nothing at all. The t (test) command branches back to the :t label whenever a substitution succeeded, so the substitution repeats until it can no longer match.
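
As a rough sketch of that loop, the snippet below wraps the same kind of command in a shell function. The function name strip_tag_digits is made up for illustration, and both bracket classes are written as [^<>] here (an assumption of this sketch) so that a match can never run past the end of one <...> group.

```shell
# Hypothetical helper (name is illustrative): strip digits inside <...> only.
strip_tag_digits() {
    # :t defines a label; tt jumps back to it if the s command changed anything.
    # Each pass removes one run of digits per <...> group, so the loop
    # repeats until no group contains a digit.
    sed -e ':t' \
        -e 's/\(<[^<>]*\)[0-9]\{1,\}\([^<>]*>\)/\1\2/g' \
        -e 'tt'
}

printf '%s\n' '<a1b2> 3 <c4>' | strip_tag_digits
```

The group <a1b2> needs two passes, one per digit run, which is why the t branch matters; the 3 outside the brackets should be left alone.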

Alternatively, you can do it without a loop:

sed 's/^/>/;s/\(>[^<>]*\)*[0-9]*/\1/g;s/.//' <<\INPUT
<Number1> 234234 </Nu994845mb6er8>' 234234 <000000000000000000000000000000000000>> <a1> 2 <34b5c> 6 7 def
INPUT

OUTPUT

<Number> 234234 </Number>' 234234 <>> <a> 2 <bc> 6 7 def

It will always skip any > until it encounters a < - so it only affects <[^<>]*> groups. See this if you're interested in why.

unxnut · Accepted Answer · 2014-10-28 14:11:13Z

The following works:

sed -i 's/\(<[^0-9>]*\)[0-9]*\([^0-9]*>\)/\1\2/g' filename

edited Oct 28, 2014 at 14:11

answered Oct 28, 2014 at 13:40


Your code has a problem. Test this: sed 's/\(<[^0-9]*\)[0-9]*\([^0-9]*>\)/\1\2/g' <<< "<sss>asa1</sss>"
– Baba, Commented Oct 28, 2014 at 13:45
Note that this works only if there is at most one sequence of digits within <...>. For <a1b> it works; for <a1b2> it doesn't. You need a loop within sed if you want to handle the latter case.
– Uwe, Commented Oct 28, 2014 at 14:13
@Uwe - you don't need a loop with sed; that's just one way. See my edited answer for an example of doing it another way.
– mikeserv, Commented Oct 31, 2014 at 21:14
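
The limitation Uwe describes can be checked directly. The sketch below runs the substitution from this answer (without -i, reading standard input) on the two inputs mentioned in the comment. The function name single_pass is only illustrative.

```shell
# Hypothetical wrapper around the single-pass substitution from this answer.
single_pass() {
    sed 's/\(<[^0-9>]*\)[0-9]*\([^0-9]*>\)/\1\2/g'
}

printf '%s\n' '<a1b>'  | single_pass   # one digit run: the digit is removed
printf '%s\n' '<a1b2>' | single_pass   # two digit runs: the line is left unchanged
```

With two digit runs the pattern cannot match at all, because only one [0-9]* is allowed between the two non-digit stretches.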


Uwe · Accepted Answer · 2014-10-29 11:30:47Z

You either need a loop around a substitution command (possible in both sed and perl), or a nested substitution command (perl only). I prefer the latter approach; it's a bit more general:

perl -pe 's/\<([^>]*)\>/do{$a = $1; $a =~ s,\d,,g; "\<" . $a . "\>"}/ge;'

Example input:

<a1> 2 <34b5c> 6 7 def

Output:

<a> 2 <bc> 6 7 def

Explanation: The -p option says that we want to read the file line by line, execute the script for each line, and print the result (like in sed); -e means that the next argument is the script to be executed.

Essentially, the script is just a substitution command: We look for <, followed by any number of non->-characters, followed by >. The e modifier after the trailing / indicates a special feature of the substitution command: Its replacement part is not a string to be printed, but again a command sequence to be executed. In this command sequence, we first assign the string between < and > (i.e., $1) to a new variable $a, then execute another substitution command on $a that simply replaces every digit (\d) by nothing, and finally return <, followed by the modified string, followed by >. The g modifier (both after the trailing / and the trailing ,) means that the substitution commands should be executed for every matching string, not just for the first one.

If the opening < and the corresponding > can be on different lines, say,

<abc1 opt="def">

add the option -0777 (i.e., perl -0777 -pe '...'), so that perl reads the entire file before processing it instead of working line-by-line (slurp mode).

Do I save the script as script.pl and then run script.pl inputfile? I am not that experienced with perl.
– DisplayName, Commented Oct 28, 2014 at 14:42
If you want to modify the input file, you can use the -i option as in sed, i.e., perl -i -pe 'PERLCOMMANDS' inputfile. Without the -i option, the modified contents are written to standard output.
– Uwe, Commented Oct 28, 2014 at 14:47
It might be better to slurp the input; the OP doesn't state that the < and > have to be on the same line.
– Floegipoky, Commented Oct 28, 2014 at 20:23
@mikeserv You're right, in this particular case, a simple loop around the substitution is sufficient. I hadn't thought about this approach when I wrote the (former) first sentence. For other "replace this by that, but only in the following context" problems, the loop method can get very nasty.
– Uwe, Commented Oct 29, 2014 at 11:38

user78605 · Accepted Answer · 2014-10-28 15:38:36Z

A short sed way:

sed 's/<\([^>]\+\)[0-9]\+>/<\1>/g' file


@mikeserv This answer does work perfectly for the OP's example data. Maybe saying it would only work for one number would be a better comment.
– user78605, Commented Oct 29, 2014 at 7:52
@mikeserv Oki doke
– user78605, Commented Oct 29, 2014 at 16:12
You know what - my answer doesn't work either if < > span line boundaries, in fact. I guess, on second thought, you make a very good point.
– mikeserv, Commented Oct 29, 2014 at 16:22
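
For the line-boundary case raised in these comments, one possible workaround is to collect the whole input into sed's pattern space before looping. This is only a sketch and not taken from any of the answers: the function name strip_tag_digits_all_lines is hypothetical, the [^<>] classes are an assumption of the sketch, and the whole input is held in memory at once.

```shell
# Hypothetical helper: join every input line into the pattern space,
# then loop the digit-stripping substitution over the joined text.
strip_tag_digits_all_lines() {
    # :a ... ba  appends the next line (N) until the last line is reached.
    # :t ... tt  repeats the substitution until nothing more changes.
    sed -e ':a' -e '$!{' -e 'N' -e 'ba' -e '}' \
        -e ':t' \
        -e 's/\(<[^<>]*\)[0-9]\{1,\}\([^<>]*>\)/\1\2/g' \
        -e 'tt'
}

printf '<abc1\nopt="def2">\n9 <x3>\n' | strip_tag_digits_all_lines
```

Because the newline sits inside the pattern space, [^<>]* can match across it, so a tag that opens on one line and closes on the next is handled like any other.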

