
Python3:

import re
k = "X"
s = "X测试一Q测试二XQ测试三"
print(re.split((r"\b" + k + r"\b"), s))

Output:

['X测试一Q测试二XQ测试三'] 

Expected:

['', '测试一Q测试二XQ测试三'] 

1 Answer

The 测 character is a letter belonging to the \p{Lo} (Letter, other) class, so there is no word boundary between X and 测.
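As a quick check of the claim above, the following sketch inspects the character's Unicode category and confirms that the default (Unicode-aware) `\w` treats it as a word character. The variable names are illustrative only.

```python
import re
import unicodedata

ch = "测"

# General Unicode category of the character ("Lo" = Letter, other).
category = unicodedata.category(ch)

# With str patterns, \w is Unicode-aware by default, so it matches 测.
is_word_char = re.fullmatch(r"\w", ch) is not None

# Both X and 测 are word characters, so no \b exists after X.
boundary_match = re.search(r"\bX\b", "X" + ch)

print(category, is_word_char, boundary_match)  # Lo True None
```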

The \b word boundary construct is Unicode-aware by default in Python 3.x re patterns compiled from str, so you can switch this behavior off with the re.ASCII / re.A option, or with the inline (?a) flag:

import re
k = "X"
print( re.split(fr"(?a)\b{k}\b", "X测试一Q测试二XQ测试三") )


If you need to make sure there is no ASCII letter before and after X, use (?<![a-zA-Z])X(?![a-zA-Z]). Or, including digits, (?<![a-zA-Z0-9])X(?![a-zA-Z0-9]).
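A minimal sketch of the lookaround variant, applied to the string from the question. The helper name `split_on_token` and its `include_digits` parameter are hypothetical, introduced only for illustration; `re.escape` is added on the assumption that the token may contain regex metacharacters.

```python
import re

def split_on_token(token, text, include_digits=False):
    """Hypothetical helper: split text on token when it is not
    adjacent to an ASCII letter (or ASCII letter/digit)."""
    cls = "a-zA-Z0-9" if include_digits else "a-zA-Z"
    # Negative lookbehind/lookahead reject ASCII neighbours only,
    # so CJK characters next to the token do not block a match.
    pattern = rf"(?<![{cls}]){re.escape(token)}(?![{cls}])"
    return re.split(pattern, text)

parts = split_on_token("X", "X测试一Q测试二XQ测试三")
print(parts)  # ['', '测试一Q测试二XQ测试三']
```

The second X is followed by Q, an ASCII letter, so the lookahead fails there and only the leading X acts as a separator.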
