I'm looking for a way to count the number of occurrences of each value in a column, and it's proving trickier than I originally thought.

         Percentile Percentile1 Percentile2 Percentile3
    0      mediocre   contender   contender    mediocre
    69     mediocre         bad    mediocre    mediocre
    117    mediocre    mediocre    mediocre    mediocre
    144    mediocre        none    mediocre   contender
    171    mediocre    mediocre   contender    mediocre

I'm trying to create something like the following output. It takes the four possible values and counts them per column; it is essentially a pd.value_counts for each column. Any help would be appreciated.

                Percentile  Percentile1  Percentile2  Percentile3
    mediocre:            5            2            3            4
    contender:           0            1            2            1
    bad:                 0            1            0            0
    none:                0            1            0            0

1 Answer

It helps to make your data "tidy" (PDF) first. That means the columns should represent variables and the rows should represent observations.

    In [98]: df
    Out[98]:
         Percentile Percentile1 Percentile2 Percentile3
    0      mediocre   contender   contender    mediocre
    69     mediocre         bad    mediocre    mediocre
    117    mediocre    mediocre    mediocre    mediocre
    144    mediocre        none    mediocre   contender
    171    mediocre    mediocre   contender    mediocre

    [5 rows x 4 columns]

In this case, melting the DataFrame makes it tidy:

    In [125]: melted = pd.melt(df); melted
    Out[125]:
           variable      value
    0    Percentile   mediocre
    1    Percentile   mediocre
    2    Percentile   mediocre
    3    Percentile   mediocre
    4    Percentile   mediocre
    5   Percentile1  contender
    6   Percentile1        bad
    7   Percentile1   mediocre
    8   Percentile1       none
    9   Percentile1   mediocre
    10  Percentile2  contender
    11  Percentile2   mediocre
    12  Percentile2   mediocre
    13  Percentile2   mediocre
    14  Percentile2  contender
    15  Percentile3   mediocre
    16  Percentile3   mediocre
    17  Percentile3   mediocre
    18  Percentile3  contender
    19  Percentile3   mediocre

    [20 rows x 2 columns]

and then make a frequency table using crosstab:

    In [127]: pd.crosstab(index=[melted['value']], columns=[melted['variable']])
    Out[127]:
    variable   Percentile  Percentile1  Percentile2  Percentile3
    value
    bad                 0            1            0            0
    contender           0            1            2            1
    mediocre            5            2            3            4
    none                0            1            0            0

    [4 rows x 4 columns]
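
For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch. It rebuilds the question's DataFrame by hand (the construction code is an assumption, since the question only shows the printed frame), repeats the melt and crosstab steps, and also shows a per-column `value_counts` variant, which is closer to what the question describes but is not part of the answer above. The names `via_crosstab` and `via_value_counts` are only illustrative.

```python
import pandas as pd

# Rebuild the example frame from the question, including its index labels.
df = pd.DataFrame(
    {
        "Percentile": ["mediocre"] * 5,
        "Percentile1": ["contender", "bad", "mediocre", "none", "mediocre"],
        "Percentile2": ["contender", "mediocre", "mediocre", "mediocre", "contender"],
        "Percentile3": ["mediocre", "mediocre", "mediocre", "contender", "mediocre"],
    },
    index=[0, 69, 117, 144, 171],
)

# Approach from the answer: melt to long form, then cross-tabulate
# the values against the original column names.
melted = pd.melt(df)
via_crosstab = pd.crosstab(index=melted["value"], columns=melted["variable"])

# Alternative (not from the answer): count each column separately.
# Labels a column never contains come back as NaN, so fill them with 0
# and cast back to integers.
via_value_counts = (
    df.apply(lambda col: col.value_counts()).fillna(0).astype(int)
)

print(via_crosstab)
print(via_value_counts)
```

Both tables should hold the same counts. The crosstab route keeps the tidy intermediate around for further grouping, while the `apply` route skips the reshaping step when only the counts are needed.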

1 Comment

Update: starting from pandas 0.16.0, use columns instead of cols in pd.crosstab; see the pandas documentation.