Files created with roff and other "old-school" tools (for example, man pages on many Unix systems) produce bold and underlined text on minimalistic terminals by overstriking, a trick that relies on the non-printable ASCII backspace character ^H, for example:
b^Hbo^Hol^Hld^Hd and _^Hu_^Hn_^Hd_^He_^Hr_^Hl_^Hi_^Hn_^He_^Hd
If I want to convert this into the human-readable plain text "bold and underlined" (dropping the formatting), I can easily do so in vim with something like :%s:\(.\)\b\1:\1:ge | %s:_\b\(.\):\1:ge.
I can also pipe the text through tr -dc and use some of perl's regex magic to look for words that are built entirely of pairs of repeated characters.
However, this looks like something that plain sed should be able to handle, which would make it much cleaner to use in scripts.
Question: Is it possible to do this translation using only POSIX sed, i.e. without using GNU or BSD extensions?
What's giving me trouble here is only the non-printable character ^H (ASCII #8). There's a trick mentioned in Bruce Barnett's Sed - An Introduction, but somehow I was unable to get it to work.
Use col -b to remove overstruck characters.
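For the sed-only route asked about above, one possible approach is to let printf produce the literal backspace and splice it into the sed script from a shell variable, so that sed itself never needs an escape such as \x08. This is a minimal sketch, not necessarily the trick from the tutorial mentioned in the question: the variable name bs and the function name strip_overstrike are made up for illustration, and it assumes input that uses only the two overstrike patterns shown in the question.

```sh
#!/bin/sh
# Hypothetical helper: strip roff-style overstrikes with plain sh + sed.

# Literal backspace (ASCII 8). Command substitution strips only trailing
# newlines, so the control character survives in the variable.
bs=$(printf '\b')

strip_overstrike() {
    # 1st expression: X ^H X  ->  X   (bold: same character struck twice)
    # 2nd expression: _ ^H X  ->  X   (underline: underscore, then character)
    # Both use only basic regular expressions with a back-reference.
    sed -e "s/\(.\)${bs}\1/\1/g" \
        -e "s/_${bs}\(.\)/\1/g"
}

# Usage: the sample line from the question, written with printf escapes.
printf 'b\bbo\bol\bld\bd and _\bu_\bn_\bd_\be_\br_\bl_\bi_\bn_\be_\bd\n' |
    strip_overstrike
# prints: bold and underlined
```

The double quotes matter: they let the shell expand ${bs} while leaving \( \) and \1 untouched for sed. Running the bold expression first means an overstruck underscore (_^H_) is treated as bold rather than as an underlined underscore, which gives the same plain-text result.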