When I name columns in dataframe it deletes my data

Question

I created a pandas DataFrame that holds various summary statistics for several variables in my dataset. I want to name the columns of the dataframe, but every time I try it deletes all my data. Here is what it looks like without column names:

MIN = df.min(axis=0, numeric_only=True)
MAX = df.max(axis=0, numeric_only=True)
RANGE = MAX - MIN
MEAN = df.mean(axis=0, numeric_only=True)
MED = df.median(axis=0, numeric_only=True)
sum_stats = pd.concat([MIN, MAX, RANGE, MEAN, MED], axis=1)
sum_stats = pd.DataFrame(data=sum_stats)
sum_stats

My output looks like this:

But for some reason when I add column names:

sum_stats = pd.concat([MIN, MAX, RANGE, MEAN, MED], axis=1)
columns = ['MIN', 'MAX', 'RANGE', 'MEAN', 'MED']
sum_stats = pd.DataFrame(data=sum_stats, columns=columns)
sum_stats

My output becomes this:

Any idea why this is happening?

score 1 · Accepted Answer · 2021-11-21 22:20:55Z

From the documentation for the columns parameter of the pd.DataFrame constructor:

[...] If data contains column labels, will perform column selection instead.

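That means that if the data you pass is already a dataframe, the columns parameter acts as a list of columns to select from that data rather than a list of new names. In your case, pd.concat labels the unnamed Series 0 through 4, so none of 'MIN', 'MAX', 'RANGE', 'MEAN', 'MED' exist in the data, and every selected column comes back filled with NaN.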

If you change columns to equal a list of some columns that already exist in the dataframe that you're using, e.g. columns=[1, 4], you'll see that the resulting dataframe only contains those two columns, copied from the original dataframe.
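A minimal sketch of that selection behaviour. The small `df` below is made-up sample data standing in for the real dataset, and only two of the statistics are used to keep it short:

```python
import pandas as pd

# Hypothetical sample data; the asker's real df is not shown in the question.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.5, 9.0]})

MIN = df.min(axis=0, numeric_only=True)
MAX = df.max(axis=0, numeric_only=True)
sum_stats = pd.concat([MIN, MAX], axis=1)

# The Series have no names, so concat labels the columns 0, 1, ...
print(list(sum_stats.columns))

# These labels do not exist in sum_stats, so the "selection" yields
# columns that contain only NaN.
renamed = pd.DataFrame(data=sum_stats, columns=["MIN", "MAX"])
print(renamed)

# A label that does exist is selected with its values intact.
selected = pd.DataFrame(data=sum_stats, columns=[1])
print(selected)
```

The row index ("a", "b") survives in both results; only the column lookup differs.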

Instead, you can assign the columns after you create the dataframe:

sum_stats.columns = ['MIN', 'MAX', 'RANGE', 'MEAN', 'MED']
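Putting it together, here is a sketch of two ways to end up with named columns, again using made-up sample data for `df`. The second option, passing `keys=` to `pd.concat`, is an alternative not mentioned above; with `axis=1` the keys become the column labels.

```python
import pandas as pd

# Hypothetical sample data; substitute your own DataFrame.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.5, 9.0]})

MIN = df.min(axis=0, numeric_only=True)
MAX = df.max(axis=0, numeric_only=True)
RANGE = MAX - MIN
MEAN = df.mean(axis=0, numeric_only=True)
MED = df.median(axis=0, numeric_only=True)

names = ["MIN", "MAX", "RANGE", "MEAN", "MED"]

# Option 1: concatenate first, then overwrite the column labels.
sum_stats = pd.concat([MIN, MAX, RANGE, MEAN, MED], axis=1)
sum_stats.columns = names

# Option 2: let concat label the columns directly via keys.
sum_stats_keys = pd.concat([MIN, MAX, RANGE, MEAN, MED], axis=1, keys=names)

print(sum_stats)
```

Either way there is no need to wrap the result in pd.DataFrame again, since pd.concat with axis=1 already returns a DataFrame.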
