| 1st Most Common Value | 2nd Most Common Value | 3rd Most Common Value | 4th Most Common Value | 5th Most Common Value |
|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|
| Grocery Store         | Pub                   | Coffee Shop           | Clothing Store        | Park                  |
| Pub                   | Grocery Store         | Clothing Store        | Park                  | Coffee Shop           |
| Hotel                 | Theatre               | Bookstore             | Plaza                 | Park                  |
| Supermarket           | Coffee Shop           | Pub                   | Park                  | Cafe                  |
| Pub                   | Supermarket           | Coffee Shop           | Cafe                  | Park                  |

The name of the dataframe is df0. As you can see, many values repeat across all the columns, so I want to create a dataframe that holds every unique value together with its total frequency across all the columns. Can someone please help with the code? I want to create a bar plot of the result.

The Output should be as follows:

| Venues         | Count |
|----------------|-------|
| Bookstore      | 1     |
| Cafe           | 2     |
| Coffee Shop    | 4     |
| Clothing Store | 2     |
| Grocery Store  | 2     |
| Hotel          | 1     |
| Park           | 5     |
| Plaza          | 1     |
| Pub            | 4     |
| Supermarket    | 2     |
| Theatre        | 1     |
  • What's your expected output? It would also be nice if you could paste the data as text rather than as an image. Commented Jun 2, 2020 at 21:16
  • Start by running `df0.describe()` Commented Jun 2, 2020 at 21:20
  • So basically you want `.value_counts()` for each column? Commented Jun 2, 2020 at 21:24
  • @NYCCoder I have modified my code, please do check and let me know. Thank you. Commented Jun 3, 2020 at 7:20
  • @CeliusStingher I have modified my code, please do check and let me know. Thank you. Commented Jun 3, 2020 at 7:21

3 Answers


EDIT: I got ahead of myself in my original answer (thanks also to the OP for adding the edit/expected output). You want this post; I think the simplest answer is:

```python
new_df = pd.DataFrame(df0.stack().value_counts())
```

If you don't care about which column the values come from and you just want their counts, then use `value_counts()` (as @Celius Stingher said in the comments), following this post.
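Since the end goal is a bar plot, here is a minimal sketch of plotting the stacked counts (assuming matplotlib is available; the small df0 below is a hypothetical stand-in for the question's data):

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless use
import matplotlib.pyplot as plt

# Toy stand-in for df0 -- same layout as in the question
df0 = pd.DataFrame({
    '1st Most Common Value': ['Grocery Store', 'Pub', 'Hotel'],
    '2nd Most Common Value': ['Pub', 'Grocery Store', 'Theatre'],
    '3rd Most Common Value': ['Coffee Shop', 'Clothing Store', 'Bookstore'],
})

# stack() flattens every cell into one Series; value_counts() tallies it
counts = df0.stack().value_counts()

ax = counts.plot.bar()
ax.set_xlabel('Venues')
ax.set_ylabel('Count')
plt.tight_layout()
plt.savefig('venues.png')
```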

If you do want to report the frequency of each value per column, you can call `value_counts()` on each column, but you may end up with Series of uneven length (to get back into a DataFrame, you could do some sort of join).
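For the per-column case, one way to line those uneven Series up again (a sketch on a toy frame, not the original answer's code) is `pd.concat` along `axis=1`, which aligns them on the union of values:

```python
import pandas as pd

df = pd.DataFrame({'a': ['x', 'y', 'x'], 'b': ['y', 'y', 'z']})

# value_counts() per column gives Series of different lengths;
# concat aligns them on the union of index values, leaving NaN
# where a value never appears in that column.
per_col = pd.concat({c: df[c].value_counts() for c in df.columns}, axis=1)
print(per_col.fillna(0).astype(int))
```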

I instead made a little function to count the occurrences of values in a df, and return a new one:

```python
import pandas as pd
import numpy as np

def counted_entries(df, array):
    output = pd.DataFrame(columns=df.columns, index=array)
    for i in array:
        output.loc[i] = (df == i).sum()
    return output
```

This works for a df filled with random animal value names. You just have to pass the unique entries in the df by getting the set of its values:

```python
columns = ['Column ' + str(i+1) for i in range(10)]
index = ['Row ' + str(i+1) for i in range(5)]
df = pd.DataFrame(np.random.choice(['pig','cow','sheep','horse','dog'], size=(5, 10)),
                  columns=columns, index=index)
unique_vals = list(set(df.stack()))  # this is all the possible entries in the df
df2 = counted_entries(df, unique_vals)
```

df before:

```
      Column 1 Column 2 Column 3 Column 4 ... Column 7 Column 8 Column 9 Column 10
Row 1      pig      pig      cow      cow ...      cow      pig      dog       pig
Row 2    sheep      cow      pig    sheep ...      dog      pig      pig       cow
Row 3      cow      cow      cow    sheep ...    horse      dog    sheep     sheep
Row 4    sheep      cow    sheep      cow ...      cow    horse      pig       pig
Row 5      dog      pig    sheep    sheep ...    sheep    sheep    horse     horse
```

Output of `counted_entries()`:

```
       Column 1 Column 2 Column 3 ... Column 8 Column 9 Column 10
pig           1        2        1 ...        2        2         2
horse         0        0        0 ...        1        1         1
sheep         2        0        2 ...        1        1         1
dog           1        0        0 ...        1        1         0
cow           1        3        2 ...        0        0         1
```

1 Comment

I think pandas has enough functions so as to not need to define a custom function, but great answer all the same! +1 :)

Thank you for the edit. Maybe this is what you are looking for: run `value_counts` over the full dataframe and then aggregate the output:

```python
df0 = pd.DataFrame({'1st': ['Grocery','Pub','Hotel','Supermarket','Pub'],
                    '2nd': ['Pub','Grocery','Theatre','Coffee','Supermarket'],
                    '3rd': ['Coffee','Clothing','Supermarket','Pub','Coffee'],
                    '4th': ['Clothing','Park','Plaza','Park','Cafe'],
                    '5th': ['Park','Coffee','Park','Cafe','Park']})
df1 = df0.apply(pd.Series.value_counts)
df1['Count'] = df1.sum(axis=1)
df1 = df1.reset_index().rename(columns={'index': 'Venues'}).drop(columns=list(df0))
df1 = df1.sort_values('Count', ascending=False)  # order by frequency, as shown below
print(df1)
```

Output:

```
         Venues  Count
5          Park    5.0
2        Coffee    4.0
7           Pub    4.0
8   Supermarket    3.0
0          Cafe    2.0
1      Clothing    2.0
3       Grocery    2.0
4         Hotel    1.0
6         Plaza    1.0
9       Theatre    1.0
```



You can also do this:

```python
df = pd.read_csv('test.csv', sep=',')
list_of_list = df.values.tolist()
t_list = sum(list_of_list, [])  # flatten the list of rows into one flat list
df = pd.DataFrame(t_list)
df.columns = ['Columns']
df = (df.groupby(by=['Columns'], as_index=False)
        .size().to_frame().reset_index()
        .rename(columns={0: 'Count'}))
print(df)
```

Output:

```
           Columns  Count
0        Bookstore      1
1             Cafe      2
2   Clothing Store      2
3      Coffee Shop      4
4    Grocery Store      2
5            Hotel      1
6             Park      5
7            Plaza      1
8              Pub      4
9      Supermarket      2
10         Theatre      1
```
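The list-flattening steps above can also be collapsed with NumPy's `ravel` (a sketch on a toy frame, since `test.csv` is not reproduced here):

```python
import pandas as pd

df = pd.DataFrame({'c1': ['Pub', 'Park'],
                   'c2': ['Park', 'Pub'],
                   'c3': ['Cafe', 'Park']})

# df.values.ravel() flattens the 2-D array of cells into one 1-D array,
# which value_counts() can tally directly.
out = (pd.Series(df.values.ravel())
         .value_counts()
         .rename_axis('Venues')
         .reset_index(name='Count'))
print(out)
```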

