I am getting started with data analysis in pandas, and I have just written this Python script to calculate the average hotel review score for each country. The dataset contains an individual average score for each customer review, such as 8.86 or 7.95. My goal is to average all of these individual scores for a particular country.
For example, if the hotels in the United Kingdom received the review scores 8.65, 7.89, 4.35, and 6.98, I would average these four scores. The result should be a dataframe whose first column is "Country" and whose second column is the "Overall Average Score" for that country.
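To make the target concrete, here is a minimal sketch of the output I am after for that example. The four scores are the illustrative ones from the paragraph above, not rows from the real dataset, and the names `uk_scores`, `overall`, and `target` are only for this illustration.

```python
import pandas as pd

# Illustrative scores from the example above, not real dataset rows.
uk_scores = [8.65, 7.89, 4.35, 6.98]

# Plain arithmetic mean, rounded to two decimals.
overall = round(sum(uk_scores) / len(uk_scores), 2)

# One row per country, with the two column names I want in the final result.
target = pd.DataFrame(
    {"Country": ["United Kingdom"], "Overall Average Score": [overall]}
)
print(target)
```

Here the sum is 27.87, so the mean is 6.9675, which rounds to 6.97.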
I tried to keep the code as concise as I could. Would you mind sharing your opinions and recommendations? I will be adding this to my portfolio, so I would like to know what should be kept and what should be avoided in a professional, real-world setting.
Script:
    # Average all scores that belong to a particular country.

    import pandas as pd

    # Reading original hotel reviews dataset.
    df = pd.read_csv(DATASET_PATH)

    # Getting a dataframe with two columns: 'Hotel_Address' and 'Average_Score'.
    df = df.loc[:, ["Hotel_Address", "Average_Score"]]

    # List of tuples.
    countries_w_avg_list = []

    for _, row in df.iterrows():
        address = row[0].split()
        country_name = address[len(address) - 1]
        countries_w_avg_list.append( (country_name, row[1]) )

    # Getting the sum of all 'Average_Score' values for each country.
    d = {}  # Empty dictionary.
    # It will be a dictionary with list values, like: {"Netherlands": [sum, counter]}
    counter = 0

    for country, individual_average in countries_w_avg_list:
        if country not in d:
            d[country] = [0, 0]
        d[country][0] += individual_average
        d[country][1] += 1

    # Getting the average of all 'Average_Score' values for each country.
    for key, value in d.items():
        d[key] = round((d[key][0] / d[key][1]), 2)

    # print(d)

    # Now, I believe there are two ways to transform this dictionary in the df I want.
    # 1 - Transform d in a df, and then transpose it. Then rename the columns.
    # 2 - Create a dataframe with the column names "Country" and "Overall Average Score"
    #     and their values as d's keys as the value for the first column and d's values as the
    #     values for the second column.
    df = pd.DataFrame({"Country": list(d.keys()), "Overall Average Score": list(d.values())})
    print(df)
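For comparison, here is a rough sketch of how the same computation might be written with `groupby` instead of explicit loops. It is not a drop-in replacement for the script above. The sample rows are hypothetical stand-ins for `pd.read_csv(DATASET_PATH)`, and the sketch assumes, as the loop does, that the country is the last whitespace-separated word of `Hotel_Address`.

```python
import pandas as pd

# Hypothetical sample rows standing in for pd.read_csv(DATASET_PATH).
df = pd.DataFrame(
    {
        "Hotel_Address": [
            "1 Example Street Amsterdam Netherlands",
            "2 Example Street Amsterdam Netherlands",
            "3 Example Road Paris France",
        ],
        "Average_Score": [8.0, 9.0, 7.5],
    }
)

# Same rule as the loop: take the last word of the address as the country.
country = df["Hotel_Address"].str.split().str[-1]

# Group the scores by that derived country, average them, round to two
# decimals, and shape the result into the two-column dataframe.
result = (
    df.groupby(country)["Average_Score"]
    .mean()
    .round(2)
    .rename_axis("Country")
    .reset_index(name="Overall Average Score")
)
print(result)
```

Two differences from the loop are worth keeping in mind. `groupby` sorts the countries alphabetically by default, whereas the dictionary keeps first-seen order; passing `sort=False` to `groupby` should preserve that order. Also, under the last-word rule, an address ending in "United Kingdom" is labelled "Kingdom" in both versions.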