3

I have the following text:

Since the 1-93 fragment contains additional residues 84–93. 

The first hyphen, in "1-93", is fine when I process the text, but I am not sure whether the second one is a hyphen or some other character, and it is causing me problems. I need to replace this "–" in the text so that I end up with:

84 to 93 instead. How can I do that?

1
  • printf %s – | recode ..dump tells you what character it is (U+2013, en dash here) Commented Mar 20, 2014 at 14:58

2 Answers

2

You can always use a tool such as octal dump (od) or hexdump to see the exact bytes behind a given character.

Example

$ echo 'Since the 1-93 fragment contains additional
residues 84–93.' | hexdump -C
00000000  53 69 6e 63 65 20 74 68  65 20 31 2d 39 33 20 66  |Since the 1-93 f|
00000010  72 61 67 6d 65 6e 74 20  63 6f 6e 74 61 69 6e 73  |ragment contains|
00000020  20 61 64 64 69 74 69 6f  6e 61 6c 0a 72 65 73 69  | additional.resi|
00000030  64 75 65 73 20 38 34 e2  80 93 39 33 2e 0a        |dues 84...93..|
0000003e

So the first - is the ASCII hyphen-minus, byte 2d, while the second is not an ASCII character at all, so the two dashes are clearly different. The second is the three-byte UTF-8 sequence e2 80 93, which encodes U+2013, an EN DASH. (Thanks to @casey for clarifying this!)
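As a quick check of the byte values above, here is a minimal sketch that dumps both characters with od. The variable names (en_dash, hyphen, hyphen_bytes, dash_bytes) are made up for illustration, and the en dash is built from octal escapes so that the script does not depend on the encoding of the file it is saved in.

```shell
# The en dash as explicit UTF-8 bytes (octal 342 200 223 = hex e2 80 93).
en_dash=$(printf '\342\200\223')
hyphen='-'

# od -An drops the offset column and -tx1 prints one hex byte per column;
# tr then strips the padding so only the hex digits remain.
hyphen_bytes=$(printf '%s' "$hyphen" | od -An -tx1 | tr -d ' \n')
dash_bytes=$(printf '%s' "$en_dash" | od -An -tx1 | tr -d ' \n')

echo "hyphen-minus: $hyphen_bytes"
echo "en dash:      $dash_bytes"
```

The hyphen-minus comes out as a single byte and the en dash as three, which is why a tool that only expects ASCII trips over the second one.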

To replace a character in a string such as this you can use either sed or put the string in a variable and do a search and replace on the string for this one character.

sed
$ var='Since the 1-93 fragment contains additional\nresidues 84–93.'
$ echo -e "$var" | sed 's/–/-/g'
Since the 1-93 fragment contains additional
residues 84-93.

bash
$ var='Since the 1-93 fragment contains additional\nresidues 84–93.'
$ echo -e "${var//–/-}"
Since the 1-93 fragment contains additional
residues 84-93.
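If the text also contains en dashes that are not part of a numeric range, a narrower substitution may be safer. The following is only a sketch under that assumption: the function name range_dash_to_word and the sample sentence are invented for illustration, and the pattern rewrites an en dash to " to " only when it sits directly between two digits.

```shell
# The en dash as explicit UTF-8 bytes (hex e2 80 93).
en_dash=$(printf '\342\200\223')

# Hypothetical helper: reads text on stdin and rewrites digit<en dash>digit
# as "digit to digit". En dashes elsewhere and ASCII hyphens are untouched.
range_dash_to_word() {
    sed "s/\([0-9]\)${en_dash}\([0-9]\)/\1 to \2/g"
}

text="residues 84${en_dash}93 ${en_dash} see the 1-93 fragment"
result=$(printf '%s\n' "$text" | range_dash_to_word)
printf '%s\n' "$result"
```

One limitation of this pattern: in a chained range such as 1–2–3 the middle digit is consumed by the first match, so the second dash would need another pass.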
2
  • 2
    Note that the second dash is not ASCII e2, it is UTF-8 0xe28093, which is the EN-DASH character. Commented Mar 20, 2014 at 15:05
  • @casey - thanks, that makes a lot more sense, I wasn't sure what od was saying to me last night 8-). I've included your note in the A. Commented Mar 20, 2014 at 15:54
1

You can use sed:

$ echo "Since the 1-93 fragment contains additional
> residues 84–93." | sed 's/–/ to /g'
Since the 1-93 fragment contains additional
residues 84 to 93.

To edit multiple files in place with GNU sed, you can run:

sed -i 's/–/ to /g' ./*.txt 
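Since an in-place edit overwrites the originals, a more cautious variant might keep a backup and skip files that contain no en dash at all. This is a sketch, not part of the answer above: the function name replace_en_dash_in_files and the .bak suffix are assumptions, and it deliberately avoids -i so that it does not rely on GNU sed.

```shell
# The en dash as explicit UTF-8 bytes (hex e2 80 93).
en_dash=$(printf '\342\200\223')

# Hypothetical helper: for each file given, keep a copy as FILE.bak and
# rewrite every en dash in FILE as " to ".
replace_en_dash_in_files() {
    for f in "$@"; do
        # Skip anything that is not a regular file (e.g. an unmatched glob).
        [ -f "$f" ] || continue
        # Only touch files that actually contain an en dash.
        if grep -q "$en_dash" "$f"; then
            cp -p "$f" "$f.bak" &&
            sed "s/${en_dash}/ to /g" "$f.bak" > "$f"
        fi
    done
}

# Usage (commented out so that nothing is modified by accident):
# replace_en_dash_in_files ./*.txt
```

Files without an en dash are left alone and get no backup, so the .bak files double as a list of what was changed.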
2
  • I need all of cases in text files not just in one sentence. I have many text files (*.txt) Commented Mar 20, 2014 at 14:22
  • Updated answer! Commented Mar 20, 2014 at 14:51
