I'm not sure where to start with this one.

I have a list of obsolete items, each with a new item code listed somewhere in the Desc (description) column. Item codes are always between 8 and 12 characters long, so all other numbers in the description should be ignored.

import pandas as pd

df1 = pd.DataFrame({'Item_Code': ['00001234', '00012345', '00123456', '01234567'],
                    'Desc': ['Widget1 - Obsolete Use Alternative 56789100',
                             'Obsolete Widget 2 - Use Alternative 56789100 - Blah Blah Blah',
                             'Alternative Use 9999999910 - Blah Blah Blah',
                             'Obsolete use 99999999911']},
                   index=[0, 1, 3, 4])
print(df1.head(10))

So ideally I'm looking to have the alternative codes in a new column.

1 Answer

You can use Series.str.extract like so:

df1["Alternative"] = df1["Desc"].str.extract(r"(\d{8,12})")

This applies the regex r"(\d{8,12})" to each row and captures the first run of 8 to 12 consecutive digits; rows with no match get NaN. The values in the resulting column will be strings unless you convert them to integers.
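As a rough sketch of how this plays out on the sample data from the question: the snippet below rebuilds the frame, runs the extraction, and also tries a stricter pattern. The stricter pattern, the `strict` name, and the `sample` row are assumptions added for illustration, not part of the question or the answer; they cover the case where a description contains a number longer than 12 digits, which the plain pattern would cut down to its first 12 digits.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Item_Code": ["00001234", "00012345", "00123456", "01234567"],
    "Desc": [
        "Widget1 - Obsolete Use Alternative 56789100",
        "Obsolete Widget 2 - Use Alternative 56789100 - Blah Blah Blah",
        "Alternative Use 9999999910 - Blah Blah Blah",
        "Obsolete use 99999999911",
    ],
}, index=[0, 1, 3, 4])

# expand=False returns a Series instead of a one-column DataFrame.
df1["Alternative"] = df1["Desc"].str.extract(r"(\d{8,12})", expand=False)
print(df1[["Item_Code", "Alternative"]])

# Assumed stricter variant: the lookarounds reject digit runs longer than 12,
# which the plain pattern would otherwise truncate to their first 12 digits.
strict = r"(?<!\d)(\d{8,12})(?!\d)"
sample = pd.Series(["Ref 1234567890123 - use 56789100"])  # hypothetical row
print(sample.str.extract(strict, expand=False))
```

Whether the lookarounds are worth it depends on whether longer numbers can actually appear in the descriptions; for the data shown in the question the plain pattern gives the same result.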

2 Comments

This is awesome, but how would I deal with multiple codes in the same cell? For example: "Widget1 - Obsolete Use Alternative 56789100 or 56789101". Ideally, I'd want a new row for each replacement item if possible.
You would need to use Series.str.extractall, but this returns a DataFrame with a MultiIndex, so it needs more work to join the rows back together.
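
One possible way to do that joining, as a minimal sketch rather than the only approach: drop the "match" level that extractall adds to the index, then join the matches back onto the original frame so each replacement code gets its own row. The two-row `df` and the names `matches`, `alternatives`, and `result` are made up for illustration.

```python
import pandas as pd

# Hypothetical two-row sample; the first row uses the example from the comment.
df = pd.DataFrame({
    "Item_Code": ["00001234", "00012345"],
    "Desc": [
        "Widget1 - Obsolete Use Alternative 56789100 or 56789101",
        "Obsolete Widget 2 - Use Alternative 56789100 - Blah Blah Blah",
    ],
})

# extractall returns one row per match, indexed by (original index, match number).
matches = df["Desc"].str.extractall(r"(\d{8,12})")

# Drop the "match" level so the remaining index lines up with df's index.
alternatives = matches[0].droplevel("match").rename("Alternative")

# A left join repeats each original row once per match found in its description.
result = df.join(alternatives).reset_index(drop=True)
print(result)
```

Because this is a left join, rows whose description contains no code are kept, with NaN in the Alternative column.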
