
The following matches in IDLE, but does not match when run in a method in a module file:

import re
re.search('\\bשלום\\b', 'שלום עולם', re.UNICODE)

while the following matches in both cases:

import re
re.search('שלום', 'שלום עולם', re.UNICODE)

(Note that Stack Overflow erroneously swaps the first and second arguments in the line above when displaying it, because Hebrew is a right-to-left language.)

How can I make the first code match inside a py file?

Update: What I should have written for the first snippet is that it matches in IDLE, but does not match when run in the Eclipse console with PyDev.

  • The first re.search() doesn't work for me in IDLE or a module. Commented Jun 15, 2010 at 15:30
  • Did you try re.LOCALE instead of UNICODE? I'd install locale-he but am kind of afraid I'd never get it switched back. Off-topic: some say Google Translate goes too far: "apt-get install user-he; hebrew-settings" becomes "apt-get install user-en; english-settings", which is impressive, but wrong ;) Commented Jun 15, 2010 at 15:37
  • @Lee, nor does the first for me, although the second does. I mention this as we are probably both in a non-he locale and soooo many things depend upon it. Oddly, it got the paste order correct. Commented Jun 15, 2010 at 15:46
  • Thanks for checking, guys. I don't think I have anything set specifically to Hebrew, just to Unicode. Idle\Options\Configure IDLE\General\Default Source Encoding is set to UTF-8, and in C:\Python26\Lib\site.py I have encoding = "utf-8" instead of encoding = "ascii". Also make sure you're using a Unicode-supporting font such as Courier or Courier New. Commented Jun 17, 2010 at 14:07

1 Answer


Seems to work for me when I'm using unicode strings:

# -*- coding: utf-8 -*-
import re
match = re.search(u'\\bשלום\\b', u'שלום עולם', re.U)

See it in action: http://codepad.org/xWz5cZj5
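A possible explanation for the difference, sketched here in Python 3 (an assumption; the question uses Python 2.6): `\b` is defined in terms of `\w`, and for a byte pattern `\w` only covers ASCII, so the UTF-8 bytes of the Hebrew letters never sit on a word boundary. The variable names below are illustrative only.

```python
import re

text = 'שלום עולם'
word = 'שלום'

# Unicode (str) pattern: Hebrew letters count as word characters,
# so \b finds a boundary on each side of the word.
unicode_match = re.search(r'\b' + word + r'\b', text)

# Byte pattern over the UTF-8 encoding: \w is ASCII-only for bytes,
# so the Hebrew bytes are non-word characters and there is no boundary
# between them and the string start or the following space.
bytes_match = re.search(rb'\b' + word.encode('utf-8') + rb'\b',
                        text.encode('utf-8'))

print(unicode_match.span())  # (0, 4)
print(bytes_match)           # None
```

If this is what happens in the failing environment, passing Unicode strings for both the pattern and the text, as in the answer above, should be enough.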


5 Comments

Is the # coding=utf-8 notation the same as # -*- coding: utf-8 -*-? I'm asking because this is the first time I've seen it written like this. If not, please correct it.
@ΤΖΩΤΖΙΟΥ - sorry to disappoint you, but I don't know. :| I don't know any Python, in fact, and learned every bit from Google and the documentation. I did that odd thing because I want to learn Python (one day), and I know Hebrew.
No disappointment here, don't worry; you did fine for someone not knowing Python :) It was possible that there was an alternative notation that I didn't know. I corrected it for you.
@ΤΖΩΤΖΙΟΥ - No problem. The code I posted doesn't work without it, but I guess codepad.org isn't an accurate representation. Thanks!
@ΤΖΩΤΖΙΟΥ: See docs.python.org/reference/…
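On the notation question: as far as I know, Python only looks for `coding:` or `coding=` followed by the encoding name in a comment on the first or second line, so both spellings should be equivalent. A small check using the standard library's `tokenize.detect_encoding` (Python 3; the sample lines are made up for illustration):

```python
import io
import tokenize

# Two spellings of the source-encoding declaration discussed in the comments.
declarations = [b"# coding=utf-8\n", b"# -*- coding: utf-8 -*-\n"]

# detect_encoding() takes a readline callable that yields bytes and returns
# (encoding_name, lines_read); only the encoding name is kept here.
detected = [tokenize.detect_encoding(io.BytesIO(line).readline)[0]
            for line in declarations]

print(detected)  # ['utf-8', 'utf-8']
```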
