7

I have a string containing a sentence such as: I don't want it, there'll be others

But the text shows up like this: I don\'t want it, there\'ll be other

For some reason a \ appears in the text before each '. The text was read in from another source. I want to remove the backslash, but can't. I've tried:

sentence.replace("\'","'")

sentence.replace(r"\'","'")

sentence.replace("\\","")

sentence.replace(r"\\","")

sentence.replace(r"\\\\","")

I know the \ is used to escape something, so I'm not sure how to handle it next to the quotes.

5
  • Do you have the actual text 'I don\'t want it, there\'ll be other' in the source code? Or do you read the text from some file or input from the user? Commented Oct 16, 2015 at 11:51
  • How do you write text? Backslashes are automatically removed on print. Commented Oct 16, 2015 at 11:51
  • @JoachimPileborg It was read in from some file, not inputted Commented Oct 16, 2015 at 11:53
  • And the backslashes are actually in the text file? Commented Oct 16, 2015 at 11:54
  • crap, when I print that variable the backslash doesn't show up, so it is an nltk problem then? It is splitting don\'t, all I see is don Commented Oct 16, 2015 at 12:00

4 Answers 4

10

The \ is just there to escape the ' character. It is only visible in the representation (repr) of the string; it's not actually a character in the string. See the following demo:

>>> repr("I don't want it, there'll be others")
'"I don\'t want it, there\'ll be others"'
>>> print("I don't want it, there'll be others")
I don't want it, there'll be others
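One way to check whether a backslash is really stored in the string is to test for it directly rather than relying on how the value is echoed. A minimal sketch, assuming the variable name `sentence` from the question and a literal written the way the question shows it:

```python
# In a normal (non-raw) literal, \' is only an escape for the apostrophe,
# so no backslash ends up in the string.
sentence = "I don\'t want it, there\'ll be others"

# "\\" is a single backslash character.
has_backslash = "\\" in sentence
print(has_backslash)       # False

# "don't" is five characters; a stored backslash would make it six.
print(len("don\'t"))       # 5

# print shows the actual characters, without repr-style escaping.
print(sentence)
```

If `has_backslash` comes out `True` for your own data, the backslashes really are in the text and need to be removed; if it is `False`, there is nothing to replace.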

3 Comments

this doesn't help me, because I feed the string through nltk and it thinks don is a separate word, cutting off the word don't
i think this is a nltk problem then, thanks for the help
It's not an nltk "problem". The backslashes are how python is showing you that the string doesn't end at the apostrophe, as everyone has said. The usual NLTK tokenization intentionally breaks up words at the apostrophe; this has nothing to do with the backslashes.
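If the goal is to keep contractions such as don't as a single token, a tokenizer that does not split at the apostrophe is needed. The sketch below uses only the standard library, not NLTK; the pattern name `CONTRACTION_AWARE` and the pattern itself are illustrative assumptions, not part of any library.

```python
import re

sentence = "I don't want it, there'll be others"

# Whitespace splitting never cuts inside a word, so contractions survive,
# but punctuation stays attached to the neighbouring word ("it,").
whitespace_tokens = sentence.split()

# Hypothetical pattern: a word with an optional apostrophe suffix,
# or a single punctuation character.
CONTRACTION_AWARE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")
regex_tokens = CONTRACTION_AWARE.findall(sentence)

print(whitespace_tokens)
print(regex_tokens)
```

Neither approach involves backslashes at any point, which matches the comment above: the splitting comes from the tokenizer's rules, not from the escape character.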
2

Try to use:

sentence.replace("\\", "") 

You need two backslashes because the first one acts as the escape character and the second is the character you actually want to replace.
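This only has an effect when the string really contains backslashes. A minimal sketch, assuming such text is simulated with a raw string literal and using the hypothetical name `cleaned` for the result:

```python
# A raw string keeps the backslashes, simulating text that really contains them.
sentence = r"I don\'t want it, there\'ll be other"

# "\\'" is two characters: one backslash followed by an apostrophe.
# str.replace returns a new string and leaves the original unchanged,
# so the result has to be assigned.
cleaned = sentence.replace("\\'", "'")

print(sentence)  # still contains the backslashes
print(cleaned)

# For comparison: in a normal literal "\'" is just an apostrophe,
# so replace("\'", "'") swaps an apostrophe for an apostrophe.
print("\'" == "'")  # True
```

Replacing only the backslash-apostrophe pair, rather than every backslash, leaves any other backslashes in the text alone.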


2

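You can also use a regular expression to remove the backslash before each apostrophe. The input is written as a raw string here so that it really contains backslashes:

>>> import re
>>> re.sub(r"\\'", "'", r"I don\'t want it, there\'ll be other")
"I don't want it, there'll be other"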


0

If your text comes from crawled web pages and you didn't unescape it before processing it with NLP tools, you can unescape the HTML entities, e.g.:

In Python 2.x:

>>> import sys; sys.version
'2.7.6 (default, Jun 22 2015, 17:58:13) \n[GCC 4.8.2]'
>>> import HTMLParser
>>> txt = """I don\'t want it, there\'ll be other"""
>>> HTMLParser.HTMLParser().unescape(txt)
"I don't want it, there'll be other"

In Python 3:

>>> import sys; sys.version
'3.4.0 (default, Jun 19 2015, 14:20:21) \n[GCC 4.8.2]'
>>> import html
>>> txt = """I don\'t want it, there\'ll be other"""
>>> html.unescape(txt)
"I don't want it, there'll be other"

See also: How do I unescape HTML entities in a string in Python 3.1?
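To see what unescaping actually changes, it helps to feed it text that contains HTML entities. A minimal sketch for Python 3.4 or later; the sample text with `&#39;` and `&apos;` is an assumed input for illustration, not the asker's data:

```python
import html

# Assumed sample: crawled text where apostrophes arrived as HTML entities.
txt = "I don&#39;t want it, there&apos;ll be other"

# html.unescape converts character references back to the characters they name.
clean = html.unescape(txt)
print(clean)

# A literal backslash is not an HTML entity, so it passes through unchanged.
untouched = html.unescape(r"don\'t")
print(untouched)
```

So this approach applies when the apostrophes were encoded as entities; it does nothing for backslashes that are genuinely stored in the string.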

