How to remove constant prefix and suffix character [duplicate]

Question

I have a data frame where numeric data is stored as strings with a prefix that I need to remove. On top of this, each value has double quotes inside the string quotes, i.e. '"..."'.

    import pandas as pd

    dict_1 = {"Col1": [1001, 1002, 1003, 1004, 1005],
              "Col2": ['"Rs. 5131"', '"Rs. 0"', '"Rs 351157"', '"Rs 535391"', '"Rs. 6513"']}
    a = pd.DataFrame(dict_1)
    a.head(6)

|   | Col1 | Col2        |
|---|------|-------------|
| 0 | 1001 | "Rs. 5131"  |
| 1 | 1002 | "Rs. 0"     |
| 2 | 1003 | "Rs 351157" |
| 3 | 1004 | "Rs 535391" |
| 4 | 1005 | "Rs. 6513"  |

As you can see, I want to remove the quotes embedded in Col2, and I also have to remove the Rs prefix.

I tried the following code to slice a single value:

    b = a['Col2'][0]
    b = b[5:]
    b = b[:-1]
    b

But the issue is that in some observations the prefix is written as Rs. and in others as Rs without the period, so a fixed slice does not work for every row.

The result should be a column of integers.

All of the existing answers are too focused on prefix / suffix. The easiest solution is to extract the digits and convert to int: a['Col2'] = a['Col2'].str.extract(r'(\d+)').astype(int)
– Trenton McKinney, Commented Apr 21, 2022 at 17:17
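
A minimal sketch of the digit-extraction idea from this comment, run against the sample data from the question. The `expand=False` argument is an addition here (it makes `str.extract` return a Series rather than a one-column DataFrame), and the sketch assumes every row contains at least one run of digits.

```python
import pandas as pd

a = pd.DataFrame({
    "Col1": [1001, 1002, 1003, 1004, 1005],
    "Col2": ['"Rs. 5131"', '"Rs. 0"', '"Rs 351157"', '"Rs 535391"', '"Rs. 6513"'],
})

# Capture the first run of digits in each string, then convert to integers.
# Quotes, "Rs", the optional period and the space are simply never captured.
a["Col2"] = a["Col2"].str.extract(r"(\d+)", expand=False).astype(int)

print(a)
print(a["Col2"].dtype)
```

Because the pattern only looks for digits, it does not matter whether the prefix is written as Rs. or Rs.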

ROOP AMBER · Accepted Answer · 2022-04-21 18:23:02Z

You can simply use the string methods removeprefix and removesuffix (available from Python 3.9) once you have the values of the particular column. Here is a complete answer, as the comments are asking for one:

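    col3 = []
    lis = dict_1['Col2']  # the list of strings from the question's dict
    for b in lis:
        # strip the embedded quotes, then whichever form of the prefix is present
        b = b.removeprefix('"').removesuffix('"').removeprefix("Rs.").removeprefix("Rs ")
        col3.append(int(b))  # int() ignores the leftover leading space
    dict_1['Col2'] = col3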

This way, Rs. with a period and Rs without one are both removed without any error. Edit: change suggested by @Jhanzaib Humayun. I also found an easier answer that handles the whole Series at once in the linked question "extract number from string".

edited Apr 21, 2022 at 18:23

answered Apr 21, 2022 at 13:03


9 Comments

Trenton McKinney Over a year ago

This is not correctly implemented. Check your code. (1) These are .str methods, (2) [0] should not be on b = a['Col2'][0], (3) the strings are not correct, (4) 'Rs ' and 'Rs. ' are both prefixes, and (5) this should be assigned back to the same column in the dataframe or a new dataframe column, not a separate Series.

Trenton McKinney Over a year ago

The current implementation results in AttributeError: 'Series' object has no attribute 'removeprefix'

ROOP AMBER Over a year ago

@TrentonMcKinney All the code I have written applies after the person asking the question gets the particular cell, so its value would be a string. Although 'Rs ' and 'Rs. ' are both prefixes, it will still work, because removeprefix only removes a prefix when it is present and doesn't raise an error otherwise. For clarity, I have edited my answer.

Trenton McKinney Over a year ago

Test your code; it does not work. Also, "after the person asking the question gets the particular cell" is not the correct way. It should be applied to the Series, not the cell.

ROOP AMBER Over a year ago

@TrentonMcKinney None of the answers given will work on a Series, because you have to implement a for loop that goes through every cell of the column one by one and then removes its prefix.
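
For what it is worth, the per-string cleanup discussed in these comments can be applied to the whole column without a hand-written loop. The sketch below is one way to do that with `Series.map`; the helper name `strip_rs` is hypothetical, and `str.removeprefix` assumes Python 3.9 or later.

```python
import pandas as pd


def strip_rs(value: str) -> int:
    """Turn a string like '"Rs. 5131"' or '"Rs 351157"' into an int."""
    text = value.strip('"')                              # drop the embedded quotes
    text = text.removeprefix("Rs.").removeprefix("Rs")   # either prefix form
    return int(text)                                     # int() tolerates the leading space


a = pd.DataFrame({
    "Col1": [1001, 1002, 1003, 1004, 1005],
    "Col2": ['"Rs. 5131"', '"Rs. 0"', '"Rs 351157"', '"Rs 535391"', '"Rs. 6513"'],
})

# Apply the helper to every element and assign back to the same column.
a["Col2"] = a["Col2"].map(strip_rs)
print(a)
```

The helper still works on one string at a time; `map` just does the iteration and returns a new Series that can be assigned back to the DataFrame.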


Trenton McKinney · Accepted Answer · 2022-04-21 17:26:02Z

Given the sample data in the OP, use .replace

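    a['Col2'] = a['Col2'].replace({'"': ''}, regex=True)      # embedded quotes
    a['Col2'] = a['Col2'].replace({r'Rs\.': ''}, regex=True)  # prefix with a literal period
    a['Col2'] = a['Col2'].replace({'Rs': ''}, regex=True)     # prefix without a period
    a['Col2'] = a['Col2'].replace({' ': ''}, regex=True)      # leftover spaces
    a['Col2'] = a['Col2'].astype(int)                         # the question asks for integers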

Jonathan · Accepted Answer · 2022-04-25 12:08:28Z

Or use .str.replace():

    a["Col2"] = a["Col2"].str.replace(r'Rs\.?\s*', '', regex=True).str.replace('"', '', regex=False)

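Update: use replace with a single pattern, assigning the result back (with inplace=True, replace returns None, so nothing can be chained after it):

    a["Col2"] = a["Col2"].replace(r'"|Rs\.?\s*', '', regex=True).astype(int)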
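
All of the one-liners above end in `astype(int)`, which raises if any row has no number in it. The question's sample data has no such row, so the following is only a hedged sketch for messier input: the third value in `raw` is a hypothetical placeholder that is not from the question, and the nullable `Int64` dtype is one possible choice for keeping integers alongside missing values.

```python
import pandas as pd

# First two values are from the question; '"N/A"' is a made-up bad row.
raw = pd.Series(['"Rs. 5131"', '"Rs 351157"', '"N/A"'])

# Rows without digits come back as missing instead of raising.
digits = raw.str.extract(r"(\d+)", expand=False)

# Convert to numbers, then to the nullable integer dtype so the
# missing row stays missing and the rest stay integers.
nums = pd.to_numeric(digits, errors="coerce").astype("Int64")
print(nums)
```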
