I have a pandas dataframe, df:

            foo       bar
    0  Supplies  Sample X
    1       xyz         A
    2       xyz         B
    3  Supplies  Sample Y
    4       xyz         C
    5  Supplies  Sample Z
    6       xyz         D
    7       xyz         E
    8       xyz         F

I want to create a new df that looks something like this:

                bar
    0  Sample X - A
    1  Sample X - B
    2  Sample Y - C
    3  Sample Z - D
    4  Sample Z - E
    5  Sample Z - F

I am new to Pandas so I don't know how to achieve this. Could someone please help?

I tried DataFrame.iterrows, but had no luck.

  • What is the exact logic? Are the "Supplies" rows the start of each chunk? Commented Dec 7, 2022 at 21:28
  • Apologies for the confusion. Basically, I want to combine each "Sample" value in bar with its "Sub-sample" values as strings, based on the 'foo' data. The combined strings should be displayed in the new column. For example, 'Sample X' and its sub-samples will be:
    ```
                bar
    0  Sample X - A
    1  Sample X - B
    ```
    Commented Dec 8, 2022 at 9:29

3 Answers

4

You can use boolean indexing and ffill:

    m = df['foo'].ne('Supplies')
    out = (df['bar'].mask(m).ffill()[m]
           .add(' - '+df.loc[m, 'bar'])
           .to_frame().reset_index(drop=True)
          )

Output:

                bar
    0  Sample X - A
    1  Sample X - B
    2  Sample Y - C
    3  Sample Z - D
    4  Sample Z - E
    5  Sample Z - F
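
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's frame and splits the chain into named steps. The intermediate name `parent` is only illustrative and not part of the answer above, and the sketch assumes the first row is always a "Supplies" row, so the forward fill never starts from a missing value.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "foo": ["Supplies", "xyz", "xyz", "Supplies", "xyz", "Supplies", "xyz", "xyz", "xyz"],
    "bar": ["Sample X", "A", "B", "Sample Y", "C", "Sample Z", "D", "E", "F"],
})

# True for sub-sample rows, False for the "Supplies" header rows.
m = df["foo"].ne("Supplies")

# Blank out the sub-sample rows, then forward-fill so every row
# carries the sample name of the block it belongs to.
parent = df["bar"].mask(m).ffill()

# Keep only the sub-sample rows and join parent and child labels.
out = (parent[m] + " - " + df.loc[m, "bar"]).to_frame().reset_index(drop=True)
print(out)
```

Printing `parent` on its own is a quick way to check that each sub-sample row picked up the right sample name before the strings are joined.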

0

You can do:

    s = (df["bar"].mask(df.foo == "xyz").ffill() + "-" + df["bar"]).reindex(
        df.loc[df.foo == "xyz"].index
    )
    df = s.to_frame()

print(df):

              bar
    1  Sample X-A
    2  Sample X-B
    4  Sample Y-C
    6  Sample Z-D
    7  Sample Z-E
    8  Sample Z-F
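
Note that this output keeps the original row labels (1, 2, 4, ...) and joins with "-" rather than " - ". If the exact layout from the question is wanted, a small variation along these lines should do it. The names `is_sub` and `result` are illustrative, and writing to `result` instead of overwriting `df` is just a choice to keep the input intact.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "foo": ["Supplies", "xyz", "xyz", "Supplies", "xyz", "Supplies", "xyz", "xyz", "xyz"],
    "bar": ["Sample X", "A", "B", "Sample Y", "C", "Sample Z", "D", "E", "F"],
})

# Rows that hold sub-samples rather than sample headers.
is_sub = df["foo"].eq("xyz")

# Forward-fill the sample name, join with " - ", and keep only sub-sample rows.
s = (df["bar"].mask(is_sub).ffill() + " - " + df["bar"])[is_sub]

# Drop the old row labels so the result is numbered 0..n-1.
result = s.reset_index(drop=True).to_frame()
print(result)
```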

0

Another possible solution:

    g = df.groupby(np.cumsum(df.bar.str.startswith('Sample')))
    pd.DataFrame([x[1].bar.values[0] + ' - ' + y
                  for x in g for y in x[1].bar.values[1:]],
                 columns=['bar'])

Output:

                bar
    0  Sample X - A
    1  Sample X - B
    2  Sample Y - C
    3  Sample Z - D
    4  Sample Z - E
    5  Sample Z - F
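
The snippet above needs `numpy` imported as `np` and `pandas` as `pd`, and it detects block starts from the "Sample" prefix in bar. As a possible variation (not the answerer's code), the same grouping idea can avoid the Python-level loop by using `groupby(...).transform("first")`. This sketch assumes instead that every block starts with a row where foo is "Supplies"; the names `is_header`, `group_id`, `parent` and `combined` are illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "foo": ["Supplies", "xyz", "xyz", "Supplies", "xyz", "Supplies", "xyz", "xyz", "xyz"],
    "bar": ["Sample X", "A", "B", "Sample Y", "C", "Sample Z", "D", "E", "F"],
})

# Each "Supplies" row opens a new block; the running count labels the blocks 1, 2, 3.
is_header = df["foo"].eq("Supplies")
group_id = is_header.cumsum()

# Broadcast the first bar value of each block (the sample name) to all of its rows.
parent = df.groupby(group_id)["bar"].transform("first")

# Join parent and child labels, then drop the header rows themselves.
combined = (parent + " - " + df["bar"])[~is_header].reset_index(drop=True).to_frame()
print(combined)
```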
