Pandas Extract Number from String

Question

Given the following data frame:

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({'A': ['1a', np.nan, '10a', '100b', '0b']})
    df

          A
    0    1a
    1   NaN
    2   10a
    3  100b
    4    0b

I'd like to extract the numbers from each cell (where they exist). The desired result is:

         A
    0    1
    1  NaN
    2   10
    3  100
    4    0

I know it can be done with str.extract, but I'm not sure how.

cs95 · Accepted Answer · 2023-04-28 04:49:02Z

105

Give it a regex capture group:

    df.A.str.extract(r'(\d+)')

Gives you:

    0      1
    1    NaN
    2     10
    3    100
    4      0
    Name: A, dtype: object

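(\d+) is a regex capturing group, and \d+ matches one or more consecutive digits, so the first run of digits in each string is returned. Note that this only works for whole numbers, not floats. In current pandas, str.extract returns a one-column DataFrame by default; pass expand=False to get a Series like the one shown above.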
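For values with a decimal part, one option is to widen the pattern. This is only a minimal sketch: the sample Series `s` and the pattern `(\d+(?:\.\d+)?)` are my own assumptions, not part of the original answer, and it does not handle signs, exponents, or thousands separators.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mixing integers, a decimal with units, and a missing value.
s = pd.Series(['1a', np.nan, '10a', '0.7 mg', '100b'])

# One or more digits, optionally followed by a decimal point and more digits.
# expand=False returns a Series rather than a one-column DataFrame.
extracted = s.str.extract(r'(\d+(?:\.\d+)?)', expand=False)

# The extracted values are still strings; convert them to floats.
numbers = pd.to_numeric(extracted)
print(numbers.tolist())
```

Rows with no digits stay NaN, so the conversion with pd.to_numeric goes through without extra handling.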

edited Apr 28, 2023 at 4:49

cs95

answered Jun 7, 2016 at 15:39

Jon Clements

6 Comments

Steven G Over a year ago

how could I do it when there is a comma like : 6,000 a

Jon Clements Over a year ago

@StevenG strip out commas first?

lebelinoz Over a year ago

As of 2020, this code gives a FutureWarning. You can get around it by adding the parameter expand=False to the extract call.

Upasana Mittal Over a year ago

This doesn't work if there is number after alphabets

mLstudent33 Over a year ago

This does not work for my column with number and units: 0.7 mg

Taming · Accepted Answer · 2017-07-07 00:32:28Z

To answer @Steven G 's question in the comment above, this should work:

    df.A.str.extract(r'(^\d*)')
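Note that the pattern above stops at the first non-digit, so for a value like 6,000 a it returns only 6. A rough sketch of the "strip out commas first" suggestion from the comments could look like this; the sample Series `s` is made up for illustration.

```python
import pandas as pd

# Hypothetical values with thousands separators, as in the comment.
s = pd.Series(['6,000 a', '1,234,567 b', '10a'])

# Remove the commas as literal text (regex=False), then extract the first run of digits.
digits = s.str.replace(',', '', regex=False).str.extract(r'(\d+)', expand=False)
print(digits.tolist())  # ['6000', '1234567', '10']
```

This assumes every comma in the column is a thousands separator; if commas can also separate distinct numbers, the result will merge them.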

Mehdi Golzadeh · Accepted Answer · 2020-10-30 00:06:41Z

You can replace the column with the extracted result using the assign function:

    df = df.assign(A=lambda x: x['A'].str.extract(r'(\d+)', expand=False))

Rostan · Accepted Answer · 2022-09-28 08:15:26Z

If you have cases where you have multiple disjoint sets of digits, as in 1a2b3c, in which you would like to extract 123, you can do it with Series.str.replace:

    >>> df
            A
    0      1a
    1      b2
    2    a1b2
    3  1a2b3c
    >>> df['A'] = df['A'].str.replace(r'\D+', '', regex=True)
    >>> df['A']
    0      1
    1      2
    2     12
    3    123

You could also work this around with Series.str.extractall and groupby but I think that this one is easier.
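For completeness, the extractall and groupby route might look roughly like this. It is a sketch under my own assumptions (the Series `s` mirrors the example column, and joining with an empty string is just one way to combine the matches), not necessarily what was meant above.

```python
import pandas as pd

s = pd.Series(['1a', 'b2', 'a1b2', '1a2b3c'])

# extractall returns one row per match, indexed by (original index, match number).
matches = s.str.extractall(r'(\d+)')

# Group on the original index (level 0) and concatenate the matches of each row.
joined = matches[0].groupby(level=0).agg(''.join)
print(joined.tolist())  # ['1', '2', '12', '123']
```

Rows with no digits at all drop out of the extractall result; joined.reindex(s.index) would bring them back as NaN.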

Hope this helps!
