How to split a string by any characters but letters? [duplicate]

Question

How to split a string by any characters but letters? In other words, I want only words from the text, nothing else.

s = "(This# is an5example!)"
what_i_want = ['This', 'is', 'an', 'example']

The duplicate question's only difference is that the output is a space-separated string rather than a list. Your case is even simpler: there is no need for the join. Simply use re.findall(r"[a-zA-Z]+", s).
– Tomerikoo, Commented Mar 17, 2024 at 15:05
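The question literally asks to split the string, so it may help to see Tomerikoo's re.findall one-liner next to a re.split version. This is a minimal sketch that, like the comment, assumes "letters" means the ASCII set [a-zA-Z]; the variable names are illustrative.

```python
import re

s = "(This# is an5example!)"

# findall returns every maximal run of ASCII letters directly
words = re.findall(r"[a-zA-Z]+", s)
print(words)  # ['This', 'is', 'an', 'example']

# split on runs of non-letters: because the string starts and ends with
# non-letters, re.split produces empty strings at both ends
parts = re.split(r"[^a-zA-Z]+", s)
print(parts)  # ['', 'This', 'is', 'an', 'example', '']

# filter out the empty strings to get the same result as findall
split_words = [p for p in parts if p]
print(split_words == words)  # True
```

re.findall is usually the simpler choice here, because re.split needs the extra filtering step whenever the string starts or ends with a non-letter.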
@Matthias we assume he is talking about alphabetic characters in the Basic Latin block, i.e. the set [A-Za-z], or, as a Unicode set in POSIX-style notation, [[:alpha:]&[:ASCII:]]. Technically, though, letters could be anything in [\p{L}], or perhaps more importantly [\p{Alphabetic}], although that raises the question of whether to include ideographs, etc. in the mix. In the strictest sense it would be [\p{L}]. But then, I tend to expect a high level of imprecision in terminology from Python developers, since Python itself tends towards the same imprecision.
– Andj, Commented Mar 17, 2024 at 22:38
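To make the difference Andj describes concrete, here is a stdlib-only sketch that groups consecutive characters with itertools.groupby under two hypothetical predicates: ascii_letter for the [A-Za-z] reading, and unicode_letter, which uses str.isalpha(). According to the Python documentation, str.isalpha() accepts exactly the general categories Lu, Ll, Lt, Lm and Lo, which is the same set as \p{L}. The helper names and the sample string are assumptions for illustration.

```python
from itertools import groupby


def words(text, is_letter):
    # Group consecutive characters by whether is_letter() accepts them,
    # then keep only the runs that consist of letters
    return ["".join(run) for keep, run in groupby(text, key=is_letter) if keep]


def ascii_letter(ch):
    # The [A-Za-z] (Basic Latin) reading of "letter"
    return ch.isascii() and ch.isalpha()


def unicode_letter(ch):
    # str.isalpha() is True for general categories Lu, Ll, Lt, Lm, Lo,
    # i.e. the same set as the regex class \p{L}
    return ch.isalpha()


# "(Ça# va5très bien!)" written with escapes for the accented letters
s = "(\u00c7a# va5tr\u00e8s bien!)"

print(words(s, ascii_letter))    # ['a', 'va', 'tr', 's', 'bien']
print(words(s, unicode_letter))  # ['Ça', 'va', 'très', 'bien']
```

With the ASCII reading, accented letters act as separators and break words apart; with the Unicode reading they are kept inside the words.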

Edo Akse · Accepted Answer · 2024-03-18 09:38:52Z

Please take the comment from @Andj below into account. I've copied the relevant part here:

"français".isalpha() will return True while "franc\u0327ais".isalpha() will return False. For many languages, it will always return False for longer strings. Python's definition of Alphabetic differs from Unicode's definition.
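A minimal sketch of the problem in the quote, together with one common mitigation that is not from the original thread: normalizing to NFC with unicodedata.normalize() first, which turns a base letter plus a combining mark into a single precomposed letter when Unicode defines one. The variable names are illustrative.

```python
import unicodedata

decomposed = "franc\u0327ais"  # 'c' followed by U+0327 COMBINING CEDILLA
precomposed = "fran\u00e7ais"  # single code point U+00E7 'ç'

print(decomposed.isalpha())   # False: U+0327 is a mark (Mn), not a letter
print(precomposed.isalpha())  # True

# NFC composes 'c' + U+0327 into the precomposed letter U+00E7
normalized = unicodedata.normalize("NFC", decomposed)
print(normalized == precomposed)  # True
print(normalized.isalpha())       # True
```

This only helps when a precomposed character exists; it does not fix scripts whose words rely on combining marks that have no precomposed form.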

Original answer:

You can iterate over the characters in the string and test if they are alphabetical characters. Add some logic so you don't put empty strings in the result list and you're done:

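s = "(This# is an5example!)"
word = ""
word_list = []
for character in s:
    if character.isalpha():
        # still inside a word: extend it
        word += character
    elif len(word) > 0:
        # a non-letter ends the current word: store it and start a new one
        word_list.append(word)
        word = ""
# the string may end with a letter, so store any word still being built
if word:
    word_list.append(word)
print(word_list)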

output

['This', 'is', 'an', 'example']

@EdoAkse str.isalpha() is fine for English, but can be dangerous for many other languages. "français".isalpha() will return True while "franc\u0327ais".isalpha() will return False. For many languages, it will always return False for longer strings. Python's definition of Alphabetic differs from Unicode's definition.
I did not know that; I'll update the answer to reflect it.
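For languages where str.isalpha() fails even after normalization, one option is to treat both letters (general category L*) and combining marks (M*) as word characters via unicodedata.category(). This is offered as a heuristic, an assumption rather than anything proposed in the thread; is_word_char, words and the sample text are hypothetical names and data. The example uses the Devanagari word for "Hindi", whose vowel signs and virama are marks.

```python
import unicodedata
from itertools import groupby


def is_word_char(ch):
    # Heuristic (assumption): accept letters (L*) and combining marks (M*),
    # so that vowel signs and viramas stay attached to their letters
    return unicodedata.category(ch)[0] in ("L", "M")


def words(text):
    # Keep only the runs of consecutive word characters
    return ["".join(run) for keep, run in groupby(text, key=is_word_char) if keep]


hindi = "\u0939\u093f\u0928\u094d\u0926\u0940"  # "Hindi" in Devanagari script
s = "(franc\u0327ais# " + hindi + "5ok!)"

print(hindi.isalpha())  # False: contains vowel signs (Mc) and a virama (Mn)
print(words(s) == ["franc\u0327ais", hindi, "ok"])  # True
```

This is still only an approximation: it would, for example, accept a stray combining mark that follows punctuation as a one-character "word".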
