I created a pandas DataFrame that holds various summary statistics for several variables in my dataset. I want to name the columns of the dataframe, but every time I try, it deletes all my data. Here is what it looks like without column names:

```python
import pandas as pd

# df is my original dataset (not shown here)
MIN = df.min(axis=0, numeric_only=True)
MAX = df.max(axis=0, numeric_only=True)
RANGE = MAX - MIN
MEAN = df.mean(axis=0, numeric_only=True)
MED = df.median(axis=0, numeric_only=True)
sum_stats = pd.concat([MIN, MAX, RANGE, MEAN, MED], axis=1)
sum_stats = pd.DataFrame(data=sum_stats)
sum_stats
```

My output looks like this: (screenshot not preserved)

But for some reason when I add column names:

```python
sum_stats = pd.concat([MIN, MAX, RANGE, MEAN, MED], axis=1)
columns = ['MIN', 'MAX', 'RANGE', 'MEAN', 'MED']
sum_stats = pd.DataFrame(data=sum_stats, columns=columns)
sum_stats
```

My output becomes this: (screenshot not preserved)

Any idea why this is happening?

Answer

From the documentation for the columns parameter of the pd.DataFrame constructor:

[...] If data contains column labels, will perform column selection instead.

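In other words, if the data you pass already has column labels (for example, because it is already a DataFrame), the columns parameter acts as a list of labels to select from that data. Any requested label that does not exist comes back as a column filled entirely with NaN. Your MIN, MAX, etc. Series are unnamed, so pd.concat labels the resulting columns 0 through 4; none of 'MIN', 'MAX', 'RANGE', 'MEAN', 'MED' is found, and every column ends up as NaN, which is why your data appears to be deleted.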
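A minimal reproduction, using a small hypothetical df in place of the real dataset (the column names a and b are made up). It shows that concatenating unnamed Series gives integer column labels, and that asking the constructor for labels that don't exist produces all-NaN columns while keeping the row index.

```python
import pandas as pd

# Hypothetical sample data standing in for the real dataset
df = pd.DataFrame({"a": [1, 5, 3], "b": [10.0, 20.0, 40.0]})

MIN = df.min(axis=0, numeric_only=True)
MAX = df.max(axis=0, numeric_only=True)
sum_stats = pd.concat([MIN, MAX, MAX - MIN], axis=1)

# The Series have no name, so concat labels the columns 0, 1, 2
print(list(sum_stats.columns))  # [0, 1, 2]

# None of these labels exists in sum_stats, so the constructor selects
# (reindexes to) missing columns, and each one is filled with NaN
renamed = pd.DataFrame(data=sum_stats, columns=["MIN", "MAX", "RANGE"])
print(renamed)
```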

Conversely, if you set columns to a list of labels that already exist in the dataframe you pass in, e.g. columns=[1, 4], the resulting dataframe contains only those two columns, taken from the original dataframe.
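A short sketch of that selection behavior, again assuming a small made-up df (columns a and b are hypothetical). Labels 1 and 4 exist in the concatenated frame, so the constructor keeps just those two columns, which here hold the MAX and MED values.

```python
import pandas as pd

# Hypothetical sample data standing in for the real dataset
df = pd.DataFrame({"a": [1, 5, 3], "b": [10.0, 20.0, 40.0]})
MIN = df.min(numeric_only=True)
MAX = df.max(numeric_only=True)
RANGE = MAX - MIN
MEAN = df.mean(numeric_only=True)
MED = df.median(numeric_only=True)

# Unnamed Series, so the columns are labeled 0..4
sum_stats = pd.concat([MIN, MAX, RANGE, MEAN, MED], axis=1)

# Labels 1 and 4 exist, so this selects the MAX and MED columns
subset = pd.DataFrame(data=sum_stats, columns=[1, 4])
print(subset)
```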


Instead, you can assign the columns after you create the dataframe:

```python
sum_stats.columns = ['MIN', 'MAX', 'RANGE', 'MEAN', 'MED']
```
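A minimal sketch of the fix, with a small hypothetical df standing in for the real data. Besides assigning .columns, it shows two other standard ways to end up with the same labels: the keys argument of pd.concat, which names the columns when concatenating Series along axis=1, and passing a plain NumPy array (which has no column labels) to the constructor, so that columns labels the data instead of selecting from it.

```python
import pandas as pd

# Hypothetical sample data standing in for the real dataset
df = pd.DataFrame({"a": [1, 5, 3], "b": [10.0, 20.0, 40.0]})
MIN = df.min(numeric_only=True)
MAX = df.max(numeric_only=True)
RANGE = MAX - MIN
MEAN = df.mean(numeric_only=True)
MED = df.median(numeric_only=True)
names = ["MIN", "MAX", "RANGE", "MEAN", "MED"]

raw = pd.concat([MIN, MAX, RANGE, MEAN, MED], axis=1)  # columns 0..4

# Option 1: relabel the existing columns (no selection, so no NaN)
sum_stats = raw.copy()
sum_stats.columns = names

# Option 2: let concat name the columns via keys
sum_stats2 = pd.concat([MIN, MAX, RANGE, MEAN, MED], axis=1, keys=names)

# Option 3: pass unlabeled data, so `columns` supplies labels
sum_stats3 = pd.DataFrame(raw.to_numpy(), index=raw.index, columns=names)

print(sum_stats)
```

Option 1 is the smallest change to the original code; option 2 avoids the intermediate integer labels entirely.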