I have a data frame where numeric data is stored as strings with a prefix that I need to remove. On top of this, each value has double quotes inside the string quotes, e.g. '"Rs. 5131"'.

    import pandas as pd

    dict_1 = {"Col1": [1001, 1002, 1003, 1004, 1005],
              "Col2": ['"Rs. 5131"', '"Rs. 0"', '"Rs 351157"', '"Rs 535391"', '"Rs. 6513"']}
    a = pd.DataFrame(dict_1)
    a.head(6)

|    | Col1 | Col2        |
|----|------|-------------|
| 0  | 1001 | "Rs. 5131"  |
| 1  | 1002 | "Rs. 0"     |
| 2  | 1003 | "Rs 351157" |
| 3  | 1004 | "Rs 535391" |
| 4  | 1005 | "Rs. 6513"  |

As you can see, I want to remove the quotes embedded in Col2, and along with this I have to remove the Rs prefix.
I tried the following code to slice a single value:

    b = a['Col2'][0]
    b = b[5:]
    b = b[:-1]
    b

But the issue is that in some observations the prefix is written as Rs. and in others as Rs without the period, so a fixed slice position does not work for every row.
The result should be a column of integers.

    a['Col2'] = a['Col2'].str.extract(r'(\d+)', expand=False).astype(int)
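
For reference, here is a minimal, self-contained sketch of that approach applied to the sample data from the question. It assumes every value in Col2 contains exactly one run of digits; a row with no digits would produce NaN and make the final integer cast fail. The regex only captures the digits, so the embedded quotes and both spellings of the prefix (Rs. and Rs) are dropped without needing fixed slice positions.

```python
import pandas as pd

# Sample data copied from the question
a = pd.DataFrame({
    "Col1": [1001, 1002, 1003, 1004, 1005],
    "Col2": ['"Rs. 5131"', '"Rs. 0"', '"Rs 351157"', '"Rs 535391"', '"Rs. 6513"'],
})

# Capture the first run of digits in each string.
# expand=False returns a Series rather than a one-column DataFrame.
digits = a["Col2"].str.extract(r"(\d+)", expand=False)

# Assumption: every row matched, so there are no NaN values to block the cast.
a["Col2"] = digits.astype(int)

print(a)
print(a.dtypes)
```

If some rows might not contain digits, one option is to pass the extracted Series through pd.to_numeric(..., errors="coerce") and decide how to handle the resulting missing values before casting.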