0

I have a data frame where numeric data is stored in String with some Prefix character which I need to remove. On top of this it has double quotes inside the quotes i.e. ' "" '.

dict_1 = {"Col1" : [1001, 1002, 1003, 1004, 1005], "Col2" : ['"Rs. 5131"', '"Rs. 0"', '"Rs 351157"', '"Rs 535391"', '"Rs. 6513"']} a = pd.DataFrame(dict_1) a.head(6) | | Col1 | Col2 | |----|----------|-------------| | 0 |1001 |"Rs. 5131" | | 1 |1002 |"Rs. 0" | | 2 |1003 |"Rs 351157" | | 3 |1004 |"Rs 535391" | | 4 |1005 |"Rs. 6513" | 

As you can see I want to remove Quotes defined inside Col2 and along with this I have to remove Rs.

I tried following code to subset

b = a['Col2'][0] b = b[5:] b = b[:-1] b 

But the issue in some observation it is defined as Rs. and in some Rs without period.

The result should be a column of integers.

1
  • All of the existing answers are to focused on prefix / suffix. The easiest solution is to extract the digits, and convert to int: a['Col2'] = a['Col2'].str.extract('(\d+)').astype(int) Commented Apr 21, 2022 at 17:17

3 Answers 3

2

You can simply use removeprefix and removesuffix methods for string after you get the value of the particular columns For a complete answer as comments are demanding

col3=[] lis = dic['col2'] for b in lis: b=b.removeprefix('"').removesuffix('"').removeprefix("Rs.").removeprefix("Rs ") col3.append(int(b)) dic['col2']=col3 

By this even if there will be Rs. with a period or without period both will be removed without any error. Edit: Change suggested by @Jhanzaib Humayun. I found an easier answer out there on this link for whole of the series alltogether extract number from string

Sign up to request clarification or add additional context in comments.

9 Comments

This is not correctly implemented. Check your code. (1) These are .str methods, (2) [0] should not be on b = a['Col2'][0], (3) the strings are not correct, (4) 'Rs ' and 'Rs. ' are both prefixes, and (5) this should be assigned back to the same column in the dataframe or a new dataframe column, not a separate Series.
The current implementation results in AttributeError: 'Series' object has no attribute 'removeprefix'
@TrentonMcKinney All the code I have written is after the person asking the question gets the particular cell and hence its value would be a string. As 'Rs ' and 'Rs. ' are both prefixes it will still work as removeprefix only removes and there is a prefix and doesn't written an error for clarity i have edited my answer
Test your code, it does not work. Also after the person asking the question gets the particular cell is not the correct way. It should be applied to the Series, not the cell.
@TrentonMcKinney any of the answers prescribed will not work on a series this is because you will have to implement a for loop to go in every cell of the column 1 by 1 and than remove its prefix
|
1

Given the sample data in the OP, use .replace

a['Col2'] = a['Col2'].replace({'"': ''}, regex=True) a['Col2'] = a['Col2'].replace({'Rs.': ''}, regex=True) a['Col2'] = a['Col2'].replace({'Rs': ''}, regex=True) a['Col2'] = a['Col2'].replace({' ': ''}, regex=True) 

Comments

1

Or use .str.replace():

a["Col2"] = a["Col2"].str.replace('Rs. ', '').replace('"', '') 

Update use replace:

a["Col2"].replace(r"Rs\.?\s+", '', regex=True, inplace=True).astype(int) 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.