I have a dataframe that looks like this:

          a         b         c         d
0  0.418762  0.042369  0.869203  0.972314
1  0.991058  0.510228  0.594784  0.534366
2  0.407472  0.259811  0.396664  0.894202
3  0.726168  0.139531  0.324932  0.906575

How can I get all columns except b?
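(For reference, a frame like this can be built with the snippet below; the values come from np.random.rand, so they will not match the numbers shown above.)

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(4, 4), columns=list('abcd'))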
When the columns are not a MultiIndex, df.columns is just an array of column names so you can do:
df.loc[:, df.columns != 'b']

          a         c         d
0  0.561196  0.013768  0.772827
1  0.882641  0.615396  0.075381
2  0.368824  0.651378  0.397203
3  0.788730  0.568099  0.869127

drop is better IMO. A bit more readable, and it handles MultiIndexes.

drop is better - I do think it's useful to discover that (single-level) columns are arrays you can work with, but specifically for dropping a column, drop is very readable and works well with complex indexes.

Don't use ix. It's deprecated. The most readable and idiomatic way of doing this is df.drop():
>>> df.drop('b', axis=1)
          a         c         d
0  0.418762  0.869203  0.972314
1  0.991058  0.594784  0.534366
2  0.407472  0.396664  0.894202
3  0.726168  0.324932  0.906575

Note that by default, .drop() does not operate in place; despite the ominous name, df is unharmed by this process. If you want to permanently remove b from df, do df.drop('b', inplace=True).
df.drop() also accepts a list of labels, e.g. df.drop(['a', 'b'], axis=1) will drop columns a and b. You can use the columns keyword too, as in df.drop(columns='a') or df.drop(columns=['a', 'b']) (thanks @BallpointBen in the comments).
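Putting those pieces together, a minimal sketch using the df from the question (df_no_b and df_no_ab are just illustrative names):

# returns a new DataFrame without 'b'; df itself is unchanged
df_no_b = df.drop(columns='b')

# drop several columns at once
df_no_ab = df.drop(columns=['a', 'b'])

# or modify df in place instead of returning a copy
df.drop(columns='b', inplace=True)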
df.drop([('l1name', 'l2name'), 'anotherl1name'], axis=1). It seems to use list vs. tuple to decide whether you want multiple columns (list) or are referring to a MultiIndex (tuple).

df.drop(columns='a') or df.drop(columns=['a', 'b']). You can also replace columns= with index=.

Another option:

df[df.columns.difference(['b'])]

Out:
          a         c         d
0  0.427809  0.459807  0.333869
1  0.678031  0.668346  0.645951
2  0.996573  0.673730  0.314911
3  0.786942  0.719665  0.330833

This also works on a DataFrameGroupBy, which is what I was looking for, thanks! I used grouped[df.columns.difference(['b'])]...
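One caveat with columns.difference (this follows from pandas' Index.difference, which sorts its result by default): the surviving columns come back in sorted order, not in their original order. A small sketch:

import numpy as np
import pandas as pd

# columns deliberately out of alphabetical order
df2 = pd.DataFrame(np.random.rand(2, 4), columns=list('dcab'))

print(df2[df2.columns.difference(['b'])].columns.tolist())  # ['a', 'c', 'd'] -- reordered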
You can use df.columns.isin():

df.loc[:, ~df.columns.isin(['b'])]

When you want to drop multiple columns, it is as simple as:

df.loc[:, ~df.columns.isin(['col1', 'col2'])]

You can also drop labels from the column index itself:

df[df.columns.drop('b')]

or

df.loc[:, df.columns.drop('b')]

If you need to drop multiple columns, use a list of labels instead of a single label.
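For example (a small sketch; Index.drop accepts a list of labels and raises a KeyError if any of them is missing):

# all columns except 'b' and 'd'
df[df.columns.drop(['b', 'd'])]

# equivalent, via .loc
df.loc[:, df.columns.drop(['b', 'd'])]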
Here is a one-line lambda (wrapped in .loc so the booleans mask columns rather than rows):

df.loc[:, list(map(lambda x: x not in ['b'], df.columns))]

before:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(4, 4), columns=list('abcd'))
df

          a         b         c         d
0  0.774951  0.079351  0.118437  0.735799
1  0.615547  0.203062  0.437672  0.912781
2  0.804140  0.708514  0.156943  0.104416
3  0.226051  0.641862  0.739839  0.434230

after:

df.loc[:, list(map(lambda x: x not in ['b'], df.columns))]

          a         c         d
0  0.774951  0.118437  0.735799
1  0.615547  0.437672  0.912781
2  0.804140  0.156943  0.104416
3  0.226051  0.739839  0.434230

I think the best way to do this is the way mentioned by @Salvador Dali. Not that the others are wrong.
When you have a data set where you just want to select one column and put it into one variable, and the rest of the columns into another for comparison or computational purposes, dropping the column from the data set might not help. Of course there are use cases for that as well.
x_cols = [x for x in data.columns if x != 'name of column to be excluded']

Then you can put that collection of columns from x_cols into another variable, like x_cols1, for other computation, e.g.:

x_cols1 = data[x_cols]

Similar to @Toms answer, it is also possible to select all columns except "b" without using .loc, like so:
df[df.columns[~df.columns.isin(['b'])]]

I've tested speed and found that, for me, the .loc solution was the fastest:
df_working_1.loc[:, df_working_1.columns != "market_id"]
# 7.19 ms ± 201 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

df_working_1.drop("market_id", axis=1)
# 7.65 ms ± 136 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

df_working_1[df_working_1.columns.difference(['market_id'])]
# 7.58 ms ± 116 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

df_working_1[[i for i in list(df_working_1.columns) if i != 'market_id']]
# 7.57 ms ± 144 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
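(The numbers above come from the answerer's own df_working_1, which isn't shown. A rough sketch of how such a comparison could be reproduced with the standard timeit module follows; the dataframe shape and the column name market_id are assumptions, and absolute timings will differ by machine.)

import timeit

import numpy as np
import pandas as pd

# stand-in for the answerer's df_working_1 (shape is an arbitrary assumption)
df_working_1 = pd.DataFrame(
    np.random.rand(100_000, 10),
    columns=[f"col{i}" for i in range(9)] + ["market_id"],
)

statements = [
    'df_working_1.loc[:, df_working_1.columns != "market_id"]',
    'df_working_1.drop("market_id", axis=1)',
    "df_working_1[df_working_1.columns.difference(['market_id'])]",
    "df_working_1[[i for i in list(df_working_1.columns) if i != 'market_id']]",
]

for stmt in statements:
    seconds = timeit.timeit(stmt, number=100, globals=globals())
    print(f"{seconds / 100 * 1000:.2f} ms per loop  <-  {stmt}")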
I think a nice solution is the filter function of pandas with a regex (match everything except "b"):

df.filter(regex="^(?!b$)")

df.filter(regex='[^b]') shaves off a little more. But even then, this solution isn't very readable...
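Readability aside, the same negative-lookahead idea extends to excluding several exact names (the extra column name 'd' here is just for illustration):

# keep every column whose name is not exactly 'b' or 'd'
df.filter(regex=r"^(?!(b|d)$)")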
You can also pop() a column. It removes a column from a dataframe and returns it as a Series, which you can assign to a variable (y below). If you don't assign it, it's just thrown away. One case where this is quite useful is separating the target variable from the feature set in ML. For example:

X = pd.DataFrame({'feature1': range(5), 'feature2': range(6, 11), 'target': [0, 0, 0, 1, 1]})
y = X.pop('target')

It makes the following transformation:
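(A sketch of the result; since the example data above is fixed, these values follow directly from it.)

X after pop() keeps only the feature columns:

   feature1  feature2
0         0         6
1         1         7
2         2         8
3         3         9
4         4        10

y is the popped 'target' column as a Series:

0    0
1    0
2    0
3    1
4    1
Name: target, dtype: int64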
This allows you to drop multiple columns even if you aren't sure they exist, and works for MultiIndex columns too.
df.drop(columns=[x for x in ('abc', ('foo', 'bar')) if x in df.columns])

In this example (assuming a 2-level MultiIndex) it will drop all columns with abc in the first level, and it will also drop the single column ('foo', 'bar').
I've added this answer as this is the first question that appears even when searching for MultiIndex.
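A self-contained sketch of that behaviour (the level labels 'abc', 'foo', 'bar', etc. are made up for illustration, matching the call above):

import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_tuples(
    [('abc', 'x'), ('abc', 'y'), ('foo', 'bar'), ('foo', 'baz')]
)
df = pd.DataFrame(np.random.rand(3, 4), columns=cols)

# the list comprehension keeps only labels that exist in df.columns,
# so missing labels are skipped instead of raising a KeyError
df2 = df.drop(columns=[x for x in ('abc', ('foo', 'bar')) if x in df.columns])

# 'abc' removes every column under that first level; ('foo', 'bar') removes one column
print(df2.columns.tolist())  # [('foo', 'baz')]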