The problem:
I am given a DataFrame. Somewhere in it there are 3*N columns that I need to modify based on a condition. The columns of interest look like this:
| names_1 | address_1 | description_1 | names_2 | address_2 | ... |
|---|---|---|---|---|---|
| Joe | joe_address | ... | George | ... | ... |
| Kate | kate_address | ... | Daphne | ... | ... |
| Bob | bob_address | ... | Jake | ... | ... |
I can generate this with the following code:

    import pandas as pd

    names_dict = {'names_1': ['Joe', 'Kate', 'Bob'],
                  'address_1': ['a1', 'a2', 'a3'],
                  'description_1': ['d1', 'd2', 'd3'],
                  'names_2': ['George', 'Daphne', 'Jake'],
                  'address_2': ['a4', 'a5', 'a6'],
                  'description_2': ['d4', 'd5', 'd6']}
    df = pd.DataFrame(data=names_dict)

There is also a dictionary that I need to use. The keys to that dictionary are names of some companies. Each key has a list of names attached. It looks like this:

    companies_dict = {'company1': ['Kate', 'Mark', 'Ben'],
                      'company2': ['Jacob', 'Michael', 'Ken'],
                      'company3': ['Jake', 'Don', 'Joe']}

I need to go over all names_k columns. If I encounter a name that is in one of the companies' lists, I swap the name of that person with the name of that company. Moreover, I swap the address and description of that person with the address and the description of that company.
Here are dictionaries to use for this purpose:

    companies_descriptions = {'company1': 'company1_desc',
                              'company2': 'company2_desc',
                              'company3': 'company3_desc'}
    companies_addresses = {'company1': 'company1_address',
                           'company2': 'company2_address',
                           'company3': 'company3_address'}

Note: The columns are somewhere in the dataframe, but they are next to each other. That is, the names_1 all the way to description_N are next to each other.
My solution:
I wrote the following Python code.

    N = 2
    number_of_columns = N

    for k in range(1, number_of_columns + 1):
        # enumerate() yields positions, so this assumes the default 0..n-1 index
        for index, name in enumerate(df[f'names_{k}']):
            for company, name_list in companies_dict.items():
                if name in name_list:
                    # Replace the person's details with the company's details
                    df.loc[index, f'names_{k}'] = company
                    df.loc[index, f'address_{k}'] = companies_addresses.get(company)
                    df.loc[index, f'description_{k}'] = companies_descriptions.get(company)

Note:
- We can safely assume that each person's name is unique, so no two companies have the same employee.
- N = 2 is an arbitrary value. The code should work for any integer N >= 1. N dictates how many columns named names_k there are and is defined by a separate process. N = 2 is given here as an example.
My solution is ugly, but it solves the problem. How can I write it better?
Here is the whole code to copy:

    import pandas as pd

    names_dict = {'names_1': ['Joe', 'Kate', 'Bob'],
                  'address_1': ['a1', 'a2', 'a3'],
                  'description_1': ['d1', 'd2', 'd3'],
                  'names_2': ['George', 'Daphne', 'Jake'],
                  'address_2': ['a4', 'a5', 'a6'],
                  'description_2': ['d4', 'd5', 'd6']}
    df = pd.DataFrame(data=names_dict)

    companies_dict = {'company1': ['Kate', 'Mark', 'Ben'],
                      'company2': ['Jacob', 'Michael', 'Ken'],
                      'company3': ['Jake', 'Don', 'Joe']}
    companies_descriptions = {'company1': 'company1_desc',
                              'company2': 'company2_desc',
                              'company3': 'company3_desc'}
    companies_addresses = {'company1': 'company1_address',
                           'company2': 'company2_address',
                           'company3': 'company3_address'}

    N = 2
    number_of_columns = N

    for k in range(1, number_of_columns + 1):
        # enumerate() yields positions, so this assumes the default 0..n-1 index
        for index, name in enumerate(df[f'names_{k}']):
            for company, name_list in companies_dict.items():
                if name in name_list:
                    # Replace the person's details with the company's details
                    df.loc[index, f'names_{k}'] = company
                    df.loc[index, f'address_{k}'] = companies_addresses.get(company)
                    df.loc[index, f'description_{k}'] = companies_descriptions.get(company)
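For comparison, here is a rough sketch of one direction that might be cleaner, not a definitive solution. It relies on the assumption stated above that every person's name is unique: the company dictionary is inverted into a person-to-company lookup, and `Series.map` plus `fillna` replace the two inner loops. The name `name_to_company` is a helper made up for this illustration, and the sketch has only been reasoned through on the toy data above.

```python
import pandas as pd

df = pd.DataFrame({
    'names_1': ['Joe', 'Kate', 'Bob'],
    'address_1': ['a1', 'a2', 'a3'],
    'description_1': ['d1', 'd2', 'd3'],
    'names_2': ['George', 'Daphne', 'Jake'],
    'address_2': ['a4', 'a5', 'a6'],
    'description_2': ['d4', 'd5', 'd6'],
})
companies_dict = {'company1': ['Kate', 'Mark', 'Ben'],
                  'company2': ['Jacob', 'Michael', 'Ken'],
                  'company3': ['Jake', 'Don', 'Joe']}
companies_descriptions = {'company1': 'company1_desc',
                          'company2': 'company2_desc',
                          'company3': 'company3_desc'}
companies_addresses = {'company1': 'company1_address',
                       'company2': 'company2_address',
                       'company3': 'company3_address'}
N = 2

# Hypothetical helper: invert company -> [names] into name -> company.
# This is only valid because each name belongs to at most one company.
name_to_company = {name: company
                   for company, names in companies_dict.items()
                   for name in names}

for k in range(1, N + 1):
    # Company for each row, NaN where the person is in no company list.
    company = df[f'names_{k}'].map(name_to_company)

    # Look up the company's values, then fall back to the original
    # value wherever there was no match.
    df[f'address_{k}'] = company.map(companies_addresses).fillna(df[f'address_{k}'])
    df[f'description_{k}'] = company.map(companies_descriptions).fillna(df[f'description_{k}'])
    df[f'names_{k}'] = company.fillna(df[f'names_{k}'])
```

The loop over `k` remains because N is supplied from outside, but the per-row and per-company loops are gone. Since the updates are aligned on the index rather than on row position, this version should not depend on the DataFrame having the default 0..n-1 index.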