I have a dataframe with LDA topic distribution outputs along with other demographic information as below:

    import pandas as pd

    single_df = pd.DataFrame([
        {"department": 'marketing', 'LDA_1': 0.252, 'LDA_2': 0.002, 'LDA_3': 0.50},
        {"department": 'engineering', 'LDA_1': 0.478, 'LDA_2': 0.152, 'LDA_3': 0.492},
        {"department": 'cooperate', 'LDA_1': 0.52, 'LDA_2': 0.780, 'LDA_3': 0.50},
        {"department": "marketing", 'LDA_1': 0.352, 'LDA_2': 0.052, 'LDA_3': 0.20},
    ])

I would like to get to the final dataframe below. How do I write a function that calculates the Jensen-Shannon distance between each pair of rows (using the columns whose names contain "LDA_") and returns the dataframe below?

    i  j  same_department  distance_LDA
    0  1  0                0.23
    0  2  0                0.43
    0  3  1                0.26
    1  2  0                0.24
    1  3  0                0.11
    2  3  0                0.29

I've written code to calculate the JS distance for an individual pair, as below. How do I turn it into a function?

    from scipy.spatial import distance

    array = single_df.filter(regex='LDA').to_numpy()
    distance.jensenshannon(array[0], array[1])

Then, to determine whether two people share a department, I have the code below:

    def same_department(i, j):
        if i['department'] == j['department']:
            return 1
        else:
            return 0
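
For reference, here is a rough sketch of the kind of function I have in mind, combining the two pieces above. The name `pairwise_lda_distance` and its `prefix` and `group_col` parameters are made up for illustration. It assumes `i` and `j` are positional row numbers, which match the index only because the example dataframe uses the default integer index. As far as I understand, `scipy.spatial.distance.jensenshannon` normalizes each vector to sum to 1 before comparing, so the raw LDA columns are passed in unchanged.

```python
from itertools import combinations

import pandas as pd
from scipy.spatial import distance


def pairwise_lda_distance(df, prefix="LDA_", group_col="department"):
    """Return one row per unordered pair of rows (i < j).

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Topic columns are selected by name prefix, e.g. LDA_1, LDA_2, ...
    topics = df.filter(regex=f"^{prefix}").to_numpy()
    groups = df[group_col].to_numpy()

    records = []
    # combinations(range(n), 2) yields every pair (i, j) with i < j, once.
    for i, j in combinations(range(len(df)), 2):
        records.append({
            "i": i,
            "j": j,
            # 1 if both rows share the same department, else 0
            "same_department": int(groups[i] == groups[j]),
            # Jensen-Shannon distance (natural log base by default)
            "distance_LDA": distance.jensenshannon(topics[i], topics[j]),
        })
    return pd.DataFrame(records)


# Example data from the question
single_df = pd.DataFrame([
    {"department": "marketing", "LDA_1": 0.252, "LDA_2": 0.002, "LDA_3": 0.50},
    {"department": "engineering", "LDA_1": 0.478, "LDA_2": 0.152, "LDA_3": 0.492},
    {"department": "cooperate", "LDA_1": 0.52, "LDA_2": 0.780, "LDA_3": 0.50},
    {"department": "marketing", "LDA_1": 0.352, "LDA_2": 0.052, "LDA_3": 0.20},
])

result = pairwise_lda_distance(single_df)
print(result)
```

This loops over pairs in plain Python, which is fine for a few hundred rows; for a much larger dataframe a vectorized approach would probably be needed, since the number of pairs grows quadratically.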