I have two dataframes for different periods of time - df_period_a with

    Vendor   Market
    VendorA  MarketA
    VendorA  MarketB
    VendorX  MarketB
    VendorZ  MarketB
    VendorC  MarketX
    VendorB  MarketX
    VendorB  MarketA
    VendorD  MarketA

and df_period_b as -

    Vendor   Market
    VendorA  MarketB
    VendorX  MarketB
    VendorZ  MarketB
    VendorC  MarketB
    VendorB  MarketX
    VendorD  MarketX
    VendorE  MarketB
    VendorF  MarketC

which means MarketA has closed down and a new market MarketC has come up, along with a couple of new vendors E and F. I want to show this and the movement (if any) of vendors among markets with a df_diff like -

    Source    Destination  Value
    MarketX1  MarketX2     1
    MarketA1  MarketX2     1
    MarketB1  MarketX2     0
    MarketX1  MarketB2     1
    MarketB1  MarketB2     3
    -         MarketC2     1
    -         MarketB2     1

The Value here equals the number of vendors who have moved from source market in period a to destination market in period b.

Something that I tried doesn't work very accurately -

    def get_vendor_displacement_count(market_list, df_before, df_after):
        for market in market_list:
            df_moved_vendors = pd.merge(df_before, df_after, on=['Vendor'], how='inner')
            df_moved_vendors.rename(columns={'Market_x':'Source', 'Market_y':'Target'}, inplace=True)
            df_moved_vendors['Source'] = dict_periods[len(market_list)+1] + " " + df_moved_vendors['Source'].astype(str)
            df_moved_vendors['Target'] = dict_periods[len(market_list)] + " " + df_moved_vendors['Target'].astype(str)
        return df_moved_vendors

Also, would a Sankey diagram (ipysankeywidget) be the most appropriate diagram to show this displacement, or can I also look at some other visualizations for this? Thanks!

  • What are those values in the 'Value' column indicating? I guess source and destination are for vendors in period_a and period_b respectively? And what is 'MarketX1' as opposed to 'MarketX2'? Commented Dec 7, 2019 at 0:29
  • Try writing some code for this, even if it seems bad or slow or whatever. Post that code so we can help you with it. The question "can I also look at some other visualizations" is off topic on this site, as it is opinion-based and a poll and a request for recommendations. Commented Dec 7, 2019 at 0:45
  • Thanks @JohnZwinck and jeremy_rutman. I've made some edits accordingly. Commented Dec 7, 2019 at 2:05

1 Answer

You could do something like this:

    dfa1 = df_period_a.assign(Value=1).set_index(['Vendor','Market'])
    dfb1 = df_period_b.assign(Value=1).set_index(['Vendor','Market'])
    diff = dfa1.join(dfb1, how='outer', lsuffix='a', rsuffix='b').fillna(0).astype(int)
    res = (diff.Valueb - diff.Valuea).rename('Change').reset_index().query('Change != 0')

Result:

        Vendor   Market  Change
    0  VendorA  MarketA      -1
    2  VendorB  MarketA      -1
    4  VendorC  MarketB       1
    5  VendorC  MarketX      -1
    6  VendorD  MarketA      -1
    7  VendorD  MarketX       1
    8  VendorE  MarketB       1
    9  VendorF  MarketC       1

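-1 means the vendor left that market, 1 means the vendor entered it. Vendor/market pairs that exist in both periods get a Change of 0 and are filtered out by the query. Depending on where you want to put the focus, you could further sort the result by any of the three columns.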
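If you want something closer to the Source/Destination/Value table from the question, one option is to pair each vendor's period-a markets with its period-b markets and count the pairs. The following is only a sketch under a few assumptions: the names `pairs` and `df_flows` are made up for illustration, a missing source or destination is shown as "-", and a vendor that is active in several markets contributes one row per (source, destination) combination, so the counts can differ from the hand-written table in the question.

```python
import pandas as pd

# Example data from the question
df_period_a = pd.DataFrame({
    "Vendor": [f"Vendor{c}" for c in "AAXZCBBD"],
    "Market": [f"Market{c}" for c in "ABBBXXAA"],
})
df_period_b = pd.DataFrame({
    "Vendor": [f"Vendor{c}" for c in "AXZCBDEF"],
    "Market": [f"Market{c}" for c in "BBBBXXBC"],
})

# Outer merge on Vendor: every period-a market of a vendor is paired with
# every period-b market of the same vendor; vendors present in only one
# period get NaN on the other side.
pairs = df_period_a.merge(df_period_b, on="Vendor", how="outer",
                          suffixes=("_a", "_b"))
pairs = pairs.rename(columns={"Market_a": "Source", "Market_b": "Destination"})

# Show "no market in that period" as "-" (assumed convention, as in the question)
pairs[["Source", "Destination"]] = pairs[["Source", "Destination"]].fillna("-")

# Count vendors per (Source, Destination) pair
df_flows = (pairs.groupby(["Source", "Destination"])
                 .size()
                 .rename("Value")
                 .reset_index())
print(df_flows)
```

Pairs with no vendors (such as MarketB to MarketX) simply do not appear in `df_flows`; they would have to be added separately if explicit zero rows are needed.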


Update: simple visualization as heatmap (green = vendor entered market; yellow = no change, vendor stayed in market; red = vendor left market; white (background) = no data (vendor not active in this market, neither in period a nor in period b)):

    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    from matplotlib.colors import ListedColormap

    # Example data from the question (built as lists rather than lazy map objects)
    df_period_a = pd.DataFrame({
        'Vendor': ['Vendor{}'.format(c) for c in 'AAXZCBBD'],
        'Market': ['Market{}'.format(c) for c in 'ABBBXXAA']})
    df_period_b = pd.DataFrame({
        'Vendor': ['Vendor{}'.format(c) for c in 'AXZCBDEF'],
        'Market': ['Market{}'.format(c) for c in 'BBBBXXBC']})

    # Value = 1 marks a vendor/market pair as present in that period
    dfa1 = df_period_a.assign(Value=1).set_index(['Vendor','Market'])
    dfb1 = df_period_b.assign(Value=1).set_index(['Vendor','Market'])
    diff = dfa1.join(dfb1, how='outer', lsuffix='a', rsuffix='b').fillna(0).astype(int)
    # Change: -1 = left, 0 = stayed, 1 = entered
    res = (diff.Valueb - diff.Valuea).rename('Change').reset_index()

    cmap = ListedColormap(['red','yellow','green'])
    ax = sns.heatmap(res.pivot_table('Change', 'Vendor', 'Market'), cmap=cmap)
    cb = ax.collections[0].colorbar
    cb.set_ticks([-.67, 0, .67])
    cb.set_ticklabels(['left', 'stayed', 'entered'])
    sns.despine(left=False, bottom=False, top=False, right=False)
    plt.show()
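Since the question mentions a Sankey diagram, the same data could also be reshaped into a list of links. This is a rough sketch, not a tested widget integration: it assumes the diagram library takes links as dicts with `source`, `target` and `value` keys (ipysankeywidget's `SankeyWidget(links=...)` is expected to, but check its documentation), and it borrows the question's convention of suffixing market names with the period number so that each market shows up as two separate nodes. The names `pairs` and `links` are illustrative.

```python
import pandas as pd

# Example data from the question
df_period_a = pd.DataFrame({
    "Vendor": [f"Vendor{c}" for c in "AAXZCBBD"],
    "Market": [f"Market{c}" for c in "ABBBXXAA"],
})
df_period_b = pd.DataFrame({
    "Vendor": [f"Vendor{c}" for c in "AXZCBDEF"],
    "Market": [f"Market{c}" for c in "BBBBXXBC"],
})

# Pair each vendor's markets across the two periods
pairs = df_period_a.merge(df_period_b, on="Vendor", how="outer",
                          suffixes=("_a", "_b"))

# "MarketX1" = MarketX in period a, "MarketX2" = MarketX in period b;
# vendors missing from a period get the placeholder node "-"
pairs["source"] = (pairs["Market_a"] + "1").fillna("-")
pairs["target"] = (pairs["Market_b"] + "2").fillna("-")

# One link per (source, target) pair, weighted by the number of vendors
links = (pairs.groupby(["source", "target"])
              .size()
              .rename("value")
              .reset_index()
              .to_dict("records"))
print(links)

# Assumed usage in a notebook (not executed here):
# from ipysankeywidget import SankeyWidget
# SankeyWidget(links=links)
```

Because source and target labels carry different suffixes, a vendor staying in the same market becomes a link from one node to a different node rather than a self-loop, which Sankey layouts generally cannot draw.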

2 Comments

As for visualization, this survey might be useful.
Any advice on which of the visualization options might be useful to visualize the result above?