In Python, I want to extract only the characters from a string.

Suppose I have the following string:

st = "{('players',): 24, ('year',): 28, ('money',): 19, ('ipod',): 36, ('case',): 23, ('mini',): 46}"

I want the result to be:

output = "players year money ipod case mini" 

I tried to split the string so that only the letters are kept:

word1 = st.split("[a-zA-Z]+") 

But the string does not get split.
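A quick check shows what actually happens: `str.split()` treats its argument as a literal separator, so it searches for the exact text `[a-zA-Z]+`, which never occurs in the input. A minimal sketch, assuming the input is stored in `st` as in the attempt above:

```python
st = "{('players',): 24, ('year',): 28, ('money',): 19, ('ipod',): 36, ('case',): 23, ('mini',): 46}"

# str.split() does not interpret regular expressions: it looks for the
# literal text "[a-zA-Z]+", which never appears in st
word1 = st.split("[a-zA-Z]+")

# So the result is a one-element list holding the unchanged string
print(word1 == [st])  # True
```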

  • Split does the opposite of what you are trying to do: it removes delimiters, and you've specified [a-zA-Z]+ as the delimiter, so it is removed. Commented Nov 20, 2011 at 4:18
  • Where are you getting this silly data format from? Commented Nov 20, 2011 at 4:22
  • Although you have picked chown's answer, take a look at sbery2A's answer below. Where do you get this input data? It looks like a Python dictionary except that it is quoted to make it a string. Commented Nov 20, 2011 at 13:42

7 Answers

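You could do it with re, but the str.split method doesn't take a regex; it takes a literal separator string.

Here's one way to do it with re:

import re
# Find every run of one or more ASCII letters and join the runs with single spaces
word1 = " ".join(re.findall("[a-zA-Z]+", st))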
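Applied to the string from the question (stored in `st`), `re.findall()` returns the runs of letters in the order they appear, and joining them reproduces the requested output exactly:

```python
import re

st = "{('players',): 24, ('year',): 28, ('money',): 19, ('ipod',): 36, ('case',): 23, ('mini',): 46}"

# Every maximal run of ASCII letters, in the order it appears in st
words = re.findall("[a-zA-Z]+", st)
print(words)  # ['players', 'year', 'money', 'ipod', 'case', 'mini']

word1 = " ".join(words)
print(word1)  # players year money ipod case mini
```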

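str.split() doesn't take regular expressions. You want re.split() with a pattern that matches one or more non-letters:

re.split("[^a-zA-Z]+", "your string")

and to get a string (dropping the empty pieces that appear when the input starts or ends with a non-letter):

" ".join(p for p in re.split("[^a-zA-Z]+", "your string") if p)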
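Two details matter with `re.split()` here. A `*` quantifier lets the pattern match the empty string, and since Python 3.7 `re.split()` also splits at empty matches, which breaks words into single letters; `+` avoids that. And because the question's string begins with `{('` and ends with `}`, splitting leaves an empty string at each end, so empty pieces are filtered out before joining. A small sketch assuming Python 3.7+:

```python
import re

st = "{('players',): 24, ('year',): 28, ('money',): 19, ('ipod',): 36, ('case',): 23, ('mini',): 46}"

# With "*" the separator may be empty, so the words "ab" and "cd" get split apart
star_pieces = re.split("[^a-zA-Z]*", "ab cd")
print(star_pieces)

# With "+" only real runs of non-letters act as separators
pieces = re.split("[^a-zA-Z]+", st)
print(pieces)  # ['', 'players', 'year', 'money', 'ipod', 'case', 'mini', '']

# Drop the empty strings produced by the leading "{('" and the trailing "',): 46}"
result = " ".join(p for p in pieces if p)
print(result)  # players year money ipod case mini
```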

I think that you want all words, not characters.

import re
result = re.findall(r"(?i)\b[a-z]+\b", subject)

Explanation:

(?i)    # Make the match case-insensitive, so [a-z] also matches A-Z
\b      # Assert position at a word boundary
[a-z]   # Match a single character in the range between “a” and “z”
+       # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\b      # Assert position at a word boundary
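The same pattern can be written with `re.VERBOSE`, which lets the explanation above live inside the regex as comments; `re.IGNORECASE` is the flag form of the inline `(?i)`. The sketch below runs it on the question's string and also shows one limitation: a word glued to digits, such as `Some57`, has no `\b` between `e` and `5`, so `\b[a-z]+\b` skips it, whereas a plain `[a-zA-Z]+` still returns `Some`.

```python
import re

st = "{('players',): 24, ('year',): 28, ('money',): 19, ('ipod',): 36, ('case',): 23, ('mini',): 46}"

# Equivalent to r"(?i)\b[a-z]+\b"; whitespace and # comments are ignored under re.VERBOSE
word_re = re.compile(
    r"""
    \b      # assert position at a word boundary
    [a-z]+  # one or more letters (A-Z too, because of re.IGNORECASE)
    \b      # assert position at a word boundary
    """,
    re.IGNORECASE | re.VERBOSE,
)

print(word_re.findall(st))  # ['players', 'year', 'money', 'ipod', 'case', 'mini']

# Letters directly followed by digits do not form a whole word, so nothing matches
print(word_re.findall("Some57"))          # []
print(re.findall("[a-zA-Z]+", "Some57"))  # ['Some']
```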

3 Comments

@julio.alegria Don't you see the (?i) in front of the regex?
I didn't know anything about (?i), that's why I asked :)
This is a beautiful solution!

What about doing this?

>>> import ast
>>> " ".join([k[0] for k in ast.literal_eval("{('players',): 24, ('year',): 28, ('money',): 19, ('ipod',): 36, ('case',): 23, ('mini',): 46}").keys()])
'case mini year money ipod players'

4 Comments

Why does it change the ordering of the keys? What is it based on? (not value, not alphabetical...)?
Evaluating strings to parse them? Expensive and unsafe.
Unsafe? Do you understand what ast.literal_eval() does?
This answer seems the most thoughtful to me. The original data is a dictionary that has been quoted, which is kind of strange; I wonder how it got that way. But the answer here processes the dictionary to get the first element of each key tuple. It would be nice if the OP described where the data came from.
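On the ordering question in the comments: the order shown above reflects the hash-based dict ordering of older Python versions. Since Python 3.7, dicts preserve insertion order, so iterating over the keys yields the words in the order they appear in the string. On safety, `ast.literal_eval()` only evaluates Python literal structures (strings, numbers, tuples, lists, dicts, sets, booleans, `None`) and raises `ValueError` for anything else, such as a function call. A sketch assuming Python 3.7+:

```python
import ast

st = "{('players',): 24, ('year',): 28, ('money',): 19, ('ipod',): 36, ('case',): 23, ('mini',): 46}"

# Parse the literal: the keys are one-element tuples such as ('players',)
d = ast.literal_eval(st)

# k[0] pulls the word out of each key tuple; dicts keep insertion order in 3.7+
words = " ".join(k[0] for k in d)
print(words)  # players year money ipod case mini

# Anything that is not a plain literal, e.g. a function call, is rejected
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError as exc:
    print("rejected:", exc)
```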

You can iterate through the string and use the str.isalpha() method to determine whether each character is alphabetic. If it is, append it to the output string.

a = "Some57 996S/tr::--!!ing"
q = ""
for i in a:
    if i.isalpha():
        q += i
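Building the result one character at a time works; `"".join()` over a generator does the same filtering in a single expression. A minimal sketch using this answer's sample string. It also shows one difference from the regex answers: `str.isalpha()` accepts any Unicode letter, while the `[a-zA-Z]` class only covers ASCII letters.

```python
import re

a = "Some57 996S/tr::--!!ing"

# Keep only the characters for which str.isalpha() is True, joined in one pass
q = "".join(ch for ch in a if ch.isalpha())
print(q)  # SomeString

# str.isalpha() is Unicode-aware, whereas [a-zA-Z] only covers ASCII letters
sample = "Caf\u00e9 42"
print("".join(ch for ch in sample if ch.isalpha()))  # Café
print(re.findall("[a-zA-Z]+", sample))               # ['Caf']
```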

Or, if you want all the letters regardless of word boundaries or spaces:

a = "Some57 996S/tr::--!!ing"
q = ""
for i in a:
    if i.isalpha():
        q += i

print(q)  # SomeString

import re
# string holds the input text. r'[\w +/.]' matches single characters: letters, digits,
# underscores, spaces, '+', '/' and '.'; the isalpha() test then keeps only the letters,
# and join() glues them into the result. Spaces are removed too.
string = ''.join([i for i in re.findall(r'[\w +/.]', string) if i.isalpha()])
