How to split a string by any characters but letters? In other words, I want only words from the text, nothing else.

    s = "(This# is an5example!)"
    what_i_want = ['This', 'is', 'an', 'example']
  • The duplicate question's only difference is that the output is a space separated string and not a list. Your case is even simpler - no need for the join. Simply re.findall("[a-zA-Z]+", s) Commented Mar 17, 2024 at 15:05
  • ''.join([c if c.isalpha() else '\n' for c in s]).split() Commented Mar 17, 2024 at 15:26
  • Please define "letters". Commented Mar 17, 2024 at 15:27
  • @Matthias we assume he is talking about alphabetic characters in the Basic Latin block, i.e. the set [A-Za-z], or in POSIX-style notation as a Unicode set, [[:alpha:]&[:ASCII:]]. Technically, "letters" could be anything in [\p{L}] or, maybe more importantly, [\p{Alphabetic}], which raises the question of whether ideographs etc. are included in the mix. In the strictest sense it would be [\p{L}]. But then, I tend to expect a high level of imprecision in terminology from Python developers, since Python itself tends towards the same imprecision. Commented Mar 17, 2024 at 22:38
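
The regex suggestion from the comments can be sketched as follows. The first pattern is the one given in the first comment and covers ASCII letters only. The second pattern, `[^\W\d_]+`, is not from the thread; it is a common idiom that only approximates "any Unicode letter" in Python's built-in `re` module (which has no `\p{L}`), so treat it as an assumption rather than an exact equivalent.

```python
import re

s = "(This# is an5example!)"

# ASCII letters only: every run of A-Z / a-z becomes one word.
ascii_words = re.findall(r"[a-zA-Z]+", s)

# Approximation of "Unicode letters": word characters (\w)
# with decimal digits and the underscore excluded.
unicode_words = re.findall(r"[^\W\d_]+", s)

print(ascii_words)    # ['This', 'is', 'an', 'example']
print(unicode_words)  # ['This', 'is', 'an', 'example']
```

For plain English input both patterns give the same list; they only diverge once the text contains non-ASCII letters.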

1 Answer

Please take the comment from @Andj below into account. I've copied the relevant part here:


"français".isalpha() will return True while "franc\u0327ais".isalpha() will return False. For many languages for longer strings it will always return False. Python's definition of Alphabetic differs from Unicode's definition.

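The caveat quoted above can be reproduced directly, and one possible workaround is sketched below. The helper names `is_word_char` and `split_words` are hypothetical, and the choice to count combining marks (Unicode general categories starting with `M`) as part of a word is an assumption that may not suit every script or use case.

```python
import unicodedata

composed = "fran\u00e7ais"     # c with cedilla as one code point (U+00E7)
decomposed = "franc\u0327ais"  # "c" followed by COMBINING CEDILLA (U+0327)

print(composed.isalpha())    # True
print(decomposed.isalpha())  # False


def is_word_char(ch):
    """Hypothetical helper: accept letters (L*) and combining marks (M*)."""
    return unicodedata.category(ch)[0] in ("L", "M")


def split_words(text):
    """Hypothetical helper: split text into runs of word characters."""
    words, current = [], []
    for ch in text:
        if is_word_char(ch):
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:  # keep a word that runs to the end of the text
        words.append("".join(current))
    return words


print(split_words("(This# is an5example!)"))  # ['This', 'is', 'an', 'example']
```

Normalizing the input first with `unicodedata.normalize("NFC", text)` is another option: it recomposes sequences such as `c` + U+0327 into a single letter where a precomposed form exists, but it does not help for combinations that have no precomposed code point.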

Original answer:

You can iterate over the characters in the string and test if they are alphabetical characters. Add some logic so you don't put empty strings in the result list and you're done:

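    s = "(This# is an5example!)"
    word = ""
    word_list = []
    for character in s:
        if character.isalpha():
            word += character
        elif word:
            # A non-letter ends the current word, if there is one.
            word_list.append(word)
            word = ""
    # Keep the last word when the string ends with a letter.
    if word:
        word_list.append(word)
    print(word_list)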

output

['This', 'is', 'an', 'example'] 

4 Comments

Of course, but is this the most pythonic way?
Not sure, but no imports and a single pass over the data
@EdoAkse str.isalpha() is fine for English, but can be dangerous for many other languages. "français".isalpha() will return True while "franc\u0327ais".isalpha() will return False. For many languages for longer strings it will always return False. Python's definition of Alphabetic differs from Unicode's definition.
That I did not know, I'll update the answer to reflect this.
