Getting the average score for hotel scores in different countries using pandas

Question

I am diving into data analysis with pandas, and I have just written this Python script to calculate the average hotel review score for each country. The dataset contains an individual average score for each customer review, such as 8.86 or 7.95. My goal is to average all these individual scores for a particular country.

For example, if the hotels in the United Kingdom got the following review scores: 8.65, 7.89, 4.35, and 6.98, I would average these four scores and create a dataframe where the first column is "Country" and the second column is the "Overall Average Score" for that country.

I tried to make the code as concise as I could. Would you mind giving your opinions and recommendations about it? I'll be adding this to my portfolio. What should be kept or avoided in a professional, real-world setting?

Script:

    # Average all scores that belong to a particular country.

    import pandas as pd

    # Reading original hotel reviews dataset.
    df = pd.read_csv(DATASET_PATH)

    # Getting a dataframe with two columns: 'Hotel_Address' and 'Average_Score'.
    df = df.loc[:, ["Hotel_Address", "Average_Score"]]

    # List of tuples.
    countries_w_avg_list = []

    for _, row in df.iterrows():
        address = row[0].split()
        country_name = address[len(address) - 1]
        countries_w_avg_list.append( (country_name, row[1]) )

    # Getting the sum of all 'Average_Score' values for each country.
    d = {} # Empty dictionary. It will be a dictionary with list values, like: {"Netherlands": [sum, counter]}
    counter = 0

    for country, individual_average in countries_w_avg_list:
        if country not in d:
            d[country] = [0, 0]
        d[country][0] += individual_average
        d[country][1] += 1

    # Getting the average of all 'Average_Score' values for each country.
    for key, value in d.items():
        d[key] = round((d[key][0] / d[key][1]), 2)

    # print(d)

    # Now, I believe there are two ways to transform this dictionary in the df I want.
    # 1 - Transform d in a df, and then transpose it. Then rename the columns.
    # 2 - Create a dataframe with the column names "Country" and "Overall Average Score"
    # and their values as d's keys as the value for the first column and d's values as the
    # values for the second column.

    df = pd.DataFrame({"Country": list(d.keys()), "Overall Average Score": list(d.values())})

    print(df)

Graipher · Accepted Answer · 2018-06-26 14:07:21Z

You should probably use pandas.DataFrame.groupby.

The string manipulations to extract the countries can also be simplified using the pandas.Series.str methods.

    import pandas as pd

    # Reading original hotel reviews dataset.
    # Reduce dataframe to two columns: 'Hotel_Address' and 'Average_Score'.
    df = pd.read_csv(DATASET_PATH).loc[:, ["Hotel_Address", "Average_Score"]]

    # Extract country from address
    df["Country"] = df.Hotel_Address.str.split().str[-1]
    df.drop(columns=["Hotel_Address"], inplace=True)

    # Get average average score per country, rounded to two decimal places
    print(df.groupby("Country").mean().round(2))
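To try the same idea without the CSV file, here is a minimal sketch that runs on a small in-memory dataframe. The addresses and scores are made up for illustration and are not taken from the real dataset; the output column name "Overall Average Score" follows the question. Note that taking the last whitespace-separated token turns "United Kingdom" into "Kingdom", which is also what the loop in the question produces.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real CSV (values are made up).
df = pd.DataFrame(
    {
        "Hotel_Address": [
            "1 Example Street London United Kingdom",
            "2 Sample Road London United Kingdom",
            "3 Demo Straat Amsterdam Netherlands",
            "4 Test Laan Amsterdam Netherlands",
        ],
        "Average_Score": [8.0, 7.0, 9.0, 8.5],
    }
)

# Last whitespace-separated token of each address, as in the answer above.
df["Country"] = df["Hotel_Address"].str.split().str[-1]

# Select the score column before aggregating so that only numeric data
# reaches mean(), then turn the result back into a two-column dataframe.
result = (
    df.groupby("Country")["Average_Score"]
    .mean()
    .round(2)
    .reset_index(name="Overall Average Score")
)
print(result)
```

If the full country name matters, the extraction step would need something more specific than the last token, for example a lookup against a list of known country names.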

That helps a lot. I wrote most of the algorithms in my original code from scratch. In general, do you think I wrote them well? (I know that the pandas version is a lot more efficient :) )
– maufcost, Commented Jun 26, 2018 at 19:18
@maufcost Some small improvements would have been possible, like using a collections.defaultdict when summing the average scores (no need to special-case "if country not in d"). Other than that it looks good, and it is always good practice.
– Graipher, Commented Jun 26, 2018 at 19:21
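The defaultdict suggestion from the comment could look roughly like the sketch below. The list of pairs is hypothetical sample data shaped like countries_w_avg_list in the question, and the names totals and averages are chosen here for illustration.

```python
from collections import defaultdict

# Hypothetical (country, score) pairs, shaped like countries_w_avg_list.
pairs = [
    ("Netherlands", 9.0),
    ("Kingdom", 8.0),
    ("Netherlands", 8.5),
    ("Kingdom", 7.0),
]

# A missing key is created on first access as [running sum, count],
# so no explicit membership check is needed.
totals = defaultdict(lambda: [0.0, 0])
for country, score in pairs:
    totals[country][0] += score
    totals[country][1] += 1

# Divide each sum by its count and round to two decimal places.
averages = {
    country: round(total / count, 2)
    for country, (total, count) in totals.items()
}
print(averages)
```

Building the averages in a new dictionary, instead of overwriting the values of the one being iterated, also avoids mixing lists and floats in the same dictionary.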
