How to split a string with many delimiters in Python? [duplicate]

Question

I want to split a string by removing everything except alphabetical characters.

By default, split only splits on whitespace between words. But I want to split on everything except alphabetical characters. How can I add multiple delimiters to split?

For example:

    word1 = input().lower().split()
    # If you input " has 15 science@and^engineering--departments, affiliated centers, Bandar Abbas&&and Mahshahr."
    # the result will be:
    # ['has', '15', 'science@and^engineering--departments,', 'affiliated', 'centers,', 'bandar', 'abbas&&and', 'mahshahr.']

But I am looking for this kind of result:

['has', '15', 'science', 'and', 'engineering', 'departments', 'affiliated', 'centers', 'bandar', 'abbas', 'and', 'mahshahr']

You could do import re and words = re.findall(r"\w+", input().lower()).
– trincot, Commented Jul 15, 2018 at 14:40
@jonrsharpe, I think this is a different question. I believe OP is trying to split by all alphanumerical characters. Not split by selected characters only. There may be another dup but I couldn't find it.
– jpp, Commented Jul 15, 2018 at 14:41
@jpp, if problem is to split on alphanumeric, wouldn't there be non-alphanumeric characters in the result? It seems that splitting on multiple delimiters is a duplicate regardless of which set of delimiters are used for the split - the only difference in a regex solution would be the pattern used.
– wwii, Commented Jul 15, 2018 at 14:58
@wwii, See my answer, seems to solve the problem without being an answer to the proposed duplicate. Although everyone seems to prefer regex. Possibly the question needs more clarity, but then it's unclear / too broad rather than a dup.
– jpp, Commented Jul 15, 2018 at 15:00

jpp · Accepted Answer · 2018-07-15 15:27:55Z

For performance, you should use regex as per the marked duplicate. See benchmarking below.

groupby + str.isalnum

You can use itertools.groupby with str.isalnum to group by characters which are alphanumeric.

With this solution you do not have to worry about splitting by explicitly specified characters.

    from itertools import groupby

    x = " has 15 science@and^engineering--departments, affiliated centers, Bandar Abbas&&and Mahshahr."
    res = [''.join(j) for i, j in groupby(x, key=str.isalnum) if i]
    print(res)
    # ['has', '15', 'science', 'and', 'engineering', 'departments', 'affiliated', 'centers', 'Bandar', 'Abbas', 'and', 'Mahshahr']
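To get the lowercase output requested in the question, the same idea can be wrapped in a small helper. This is only a minimal sketch: the function name `split_alnum` is hypothetical and not part of the original answer, and lowercasing before grouping is an assumption based on the question's expected result.

```python
from itertools import groupby


def split_alnum(text):
    """Return the runs of alphanumeric characters in text, lowercased."""
    # groupby yields (key, group) pairs for consecutive characters that share
    # the same str.isalnum result; keep only the groups where it is True.
    return [
        "".join(group)
        for is_alnum, group in groupby(text.lower(), key=str.isalnum)
        if is_alnum
    ]


x = " has 15 science@and^engineering--departments, affiliated centers, Bandar Abbas&&and Mahshahr."
print(split_alnum(x))
```

Swapping `str.isalnum` for `str.isalpha` would keep letters only, so '15' would be dropped from the result.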

Benchmarking vs regex

Some performance benchmarking versus regex solutions (tested on Python 3.6.5):

    from itertools import groupby
    import re

    x = " has 15 science@and^engineering--departments, affiliated centers, Bandar Abbas&&and Mahshahr."
    z = x*10000

    # %timeit is an IPython / Jupyter magic command
    %timeit [''.join(j) for i, j in groupby(z, key=str.isalnum) if i]   # 184 ms
    %timeit list(filter(None, re.sub(r'\W+', ',', z).split(',')))       # 82.1 ms
    %timeit list(filter(None, re.split(r'\W+', z)))                     # 63.6 ms
    %timeit [_ for _ in re.split(r'\W', z) if _]                        # 62.9 ms
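Outside IPython, a comparison along the same lines can be run with the standard `timeit` module. The following is a rough sketch rather than a reproduction of the numbers above: the function names are hypothetical, the input is repeated 1000 times instead of 10000 to keep the run short, and the timings will differ by machine and Python version.

```python
import re
import timeit
from itertools import groupby

x = " has 15 science@and^engineering--departments, affiliated centers, Bandar Abbas&&and Mahshahr."
z = x * 1000  # assumption: smaller input than the original benchmark


def with_groupby():
    return ["".join(j) for i, j in groupby(z, key=str.isalnum) if i]


def with_sub_split():
    return list(filter(None, re.sub(r"\W+", ",", z).split(",")))


def with_split():
    return list(filter(None, re.split(r"\W+", z)))


candidates = [with_groupby, with_sub_split, with_split]

# The approaches should agree on this input before being compared for speed.
assert all(func() == with_groupby() for func in candidates)

for func in candidates:
    # Best of 3 repeats, 10 calls each; report the time per call.
    best = min(timeit.repeat(func, number=10, repeat=3))
    print(f"{func.__name__}: {best / 10 * 1000:.2f} ms per call")
```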

Ankit Jaiswal · Accepted Answer · 2018-07-15 14:47:31Z

You can replace all the non-alphanumeric characters with a single character (I'm using a comma):

    import re

    s = 'has15science@and^engineering--departments,affiliatedcenters,bandarabbas&&andmahshahr.'
    alphanumeric = re.sub(r'\W+', ',', s)

and then split it on comma:

splitted = alphanumeric.split(',')

Edit:

As suggested by @DeepSpace, this can be done in a single statement:

    splitted = re.split(r'\W+', s)
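One detail to watch for: when the string starts or ends with a delimiter, `re.split` returns empty strings at the edges, so they need to be filtered out. A short sketch using the sentence from the question (the variable names are illustrative only), alongside the `re.findall` alternative mentioned in the comments:

```python
import re

x = " has 15 science@and^engineering--departments, affiliated centers, Bandar Abbas&&and Mahshahr."
s = x.lower()

parts = re.split(r"\W+", s)
# The input starts with a space and ends with ".", so the split result
# has an empty string at each end; drop the empty entries.
words_from_split = [p for p in parts if p]

# re.findall collects the runs of word characters directly,
# so no empty strings appear in the first place.
words_from_findall = re.findall(r"\w+", s)

print(words_from_split)
print(words_from_split == words_from_findall)
```

Note that `\w` also matches the underscore, so `_` is not treated as a delimiter by either pattern.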
