Substitute values with ascii chars using sed

Question

We have files in which some characters are represented by their decimal (!) ASCII values wrapped in (cid:#), e.g. (cid:104) for h. The string hello is thus represented as (cid:104)(cid:101)(cid:108)(cid:108)(cid:111).

How can I replace these with the corresponding ASCII characters using sed?

Here is an example file:

$ cat input.txt
first line
pre (cid:104)(cid:101)(cid:108)(cid:108)(cid:111) post
last line

What I've tried so far is:

$ x="(cid:104)(cid:101)(cid:108)(cid:108)(cid:111)"
$ echo $x | sed 's/(cid:\([^\)]*\))/\1/g'
104101108108111

But we need the output to be hello:

$ cat output.txt
first line
pre hello post
last line

I'm trying to use printf inside sed, but I cannot work out how to pass the backreference \1 to printf:

sed 's/(cid:\([^\)]*\))/'`printf "\x$(printf %x \1)"`'/g'

Given your updated question, what is the exact desired output? Note that it is important to provide a minimal reproducible example from the very beginning, since your update invalidates our current answers.
– fedorqui, Commented Jul 25, 2016 at 9:28
You might need to explain why 'using sed' is a requirement. That is much, much more difficult than using a more suitable tool such as awk or perl...
– Toby Speight, Commented Jul 25, 2016 at 9:42

Community · Accepted Answer · 2017-05-23 12:22:36Z

$ cat input.txt
first line
pre (cid:104)(cid:101)(cid:108)(cid:108)(cid:111) post
last line
$ perl -pe 's/\(cid:(\d+)\)/chr($1)/ge' input.txt > output.txt
$ cat output.txt
first line
pre hello post
last line

Thanks to @123 for suggesting chr($1) instead of sprintf "%c", $1. See chr for documentation.

Reference: Integer ASCII value to character in BASH using printf
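The printf approach from that reference can be sketched in plain shell as follows. The inner printf converts the decimal code to a three-digit octal string, and the outer printf interprets the resulting backslash escape. The variable names and the hard-coded list of codes are placeholders for illustration.

```shell
# Build "hello" from its decimal ASCII codes using only printf.
decoded=""
for code in 104 101 108 108 111; do
  # printf '%03o' 104 yields "150"; the outer printf then expands "\150" to "h".
  decoded="$decoded$(printf "\\$(printf '%03o' "$code")")"
done
printf '%s\n' "$decoded"
```

This works one code at a time, which is why it is awkward to combine with a sed backreference: the shell expands the command substitution before sed ever sees the pattern.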

answered Jul 25, 2016 at 9:11 by Sundeep (edited May 23, 2017 at 12:22 by Community Bot)

8 Comments

wolfrevo Over a year ago

In our special case there are also "normal" characters, i.e. not all characters are represented as (cid:#), only some of them. I have edited my original question to show an example file.

123 Over a year ago

You can use chr instead of sprintf, i.e. perl -pe 's/\(cid:(\d+)\)/chr($1)/ge'

Sundeep Over a year ago

@123 thanks :) ... didn't know about that function.. will edit the answer after OP clarifies his requirement

123 Over a year ago

@wolfrevo That isn't going to happen.

Sundeep Over a year ago

@wolfrevo , I don't think that would be possible.. see stackoverflow.com/questions/22544044/…

Community · Accepted Answer · 2017-05-23 12:14:52Z

Using %c you can convert an ASCII code into its corresponding character:

$ awk 'BEGIN {printf "%c", 104}'
h

So it is a matter of extracting the numbers from within (cid:XX). This I do by setting the FS to ( and looping through the fields:

awk -v FS='(' '{for (i=2; i<=NF; i++) { r=gensub(/cid:([0-9]+)\)/, "\\1", "g", $i); printf "%c", r+0 } }' file

This uses gensub() and accesses the captured groups as described in GNU awk: accessing captured groups in replacement text. It therefore depends on GNU awk.

For your given input it returns:

$ awk -v FS='(' '{for (i=2; i<=NF; i++) {r=gensub(/cid:([0-9]+)\)/, "\\1", "g", $i); printf "%c", r+0}}' file
hello
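The one-liner above prints only the decoded characters. For input that mixes ordinary text with (cid:N) tokens, as the example file in the question does, a variant along these lines keeps the rest of each line. It is a sketch rather than a tested drop-in: it relies only on match(), substr() and sprintf(), so it should not need gensub(), and the input is fed through a pipe here purely for illustration.

```shell
# Replace every (cid:N) token with its character, keeping surrounding text.
result=$(printf '%s\n' 'first line' \
  'pre (cid:104)(cid:101)(cid:108)(cid:108)(cid:111) post' \
  'last line' |
awk '{
  out = ""
  # Repeatedly find the next (cid:N) token in the remaining text.
  while (match($0, /\(cid:[0-9]+\)/)) {
    code = substr($0, RSTART + 5, RLENGTH - 6)   # digits between "(cid:" and ")"
    out = out substr($0, 1, RSTART - 1) sprintf("%c", code + 0)
    $0 = substr($0, RSTART + RLENGTH)            # continue after the token
  }
  print out $0                                   # append any trailing plain text
}')
printf '%s\n' "$result"
```

Adding 0 to the extracted digits forces a numeric value, so that %c emits the character with that code instead of the first character of the string.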
