2

I was passing a pandas.Index object containing the labels of the columns I wanted to drop from my DataFrame, and it worked correctly. It was an Index because I extracted the column names from the DataFrame itself based on a certain condition.

Afterwards, I needed to add another column name to that list, so I converted the Index object to a Python list in order to append the additional label. But when I pass the list as the columns parameter of the DataFrame's drop() method, I keep getting this error:

ValueError: Need to specify at least one of 'labels', 'index' or 'columns'

How can I resolve this error?

The code I use is like this:

    unique_count = df.apply(pd.Series.nunique)
    redundant_columns = unique_count[unique_count == 1].index.values.tolist()
    redundant_columns.append('DESCRIPTION')
    print(redundant_columns)
    df.drop(columns=redundant_columns, inplace=True)

    Out: None

I found out why the error occurs: after the append() statement, redundant_columns becomes None. I don't know why. Can someone explain why this is happening?

6
  • L = ['col1','col2'] and then drop(L) does not work? Can you show how you use drop? Commented Apr 23, 2018 at 10:45
  • Hmmm, unfortunately I cannot reproduce it, so I have no idea how to solve it. The problem should be in the data or in pandas/Python. Commented Apr 24, 2018 at 7:17
  • @jezrael It's a good solution, so why delete it? I am waiting, will mark an answer as accepted later Commented Apr 24, 2018 at 7:48
  • I only thought you were not interested, but OK, I won't remove it. Commented Apr 24, 2018 at 7:49
  • @jezrael I am interested in your solution, it is very elegant. I haven't used it in my code because I am trying to avoid copying as much as possible, to optimize memory so I don't run out of it. So I am doing in-place operations wherever I can. Commented Apr 24, 2018 at 7:58

3 Answers

2

Your solution works fine for me.

Another solution is to remove the columns by boolean indexing:

    df = pd.DataFrame({'A':list('bbbbbb'),
                       'C':[7,8,9,4,2,3],
                       'D':[1,3,5,7,1,0],
                       'E':[5,3,6,9,2,4],
                       'DESCRIPTION':list('aaabbb')})

    print (df)
       A  C  D DESCRIPTION  E
    0  b  7  1           a  5
    1  b  8  3           a  3
    2  b  9  5           a  6
    3  b  4  7           b  9
    4  b  2  1           b  2
    5  b  3  0           b  4

    mask = df.nunique().ne(1)
    mask['DESCRIPTION'] = False

    df = df.loc[:, mask]
    print (df)
       C  D  E
    0  7  1  5
    1  8  3  3
    2  9  5  6
    3  4  7  9
    4  2  1  2
    5  3  0  4

Explanation:

  1. First get the number of unique values per column with nunique, then compare it with 1 using ne (not equal)
  2. Set the DESCRIPTION entry of the boolean mask to False so that this column is always removed
  3. Filter the columns by boolean indexing

Details:

    print (df.nunique())
    A              1
    C              6
    D              5
    DESCRIPTION    2
    E              6
    dtype: int64

    mask = df.nunique().ne(1)
    print (mask)
    A              False
    C               True
    D               True
    DESCRIPTION     True
    E               True

    mask['DESCRIPTION'] = False
    print (mask)
    A              False
    C               True
    D               True
    DESCRIPTION    False
    E               True
    dtype: bool

8 Comments

@SushovanMandal - I just tested your solution and it works fine :(
Can you provide a description for your solution above?
@SushovanMandal - Sure, give me a sec.
Thanks for this. But the code I posted in the question is still not working. I put a print statement in the middle, and redundant_columns becomes None after the .append() statement. Why is that?
@SushovanMandal - It is really weird. Can you check this: L = [1,4]; L.append(7); print (L)?
1

After some experimenting, this got fixed by using a numpy.ndarray instead of a plain Python list, although I don't know why.

In my trials, a plain Python list gave the ValueError, while a pandas.Index or numpy.ndarray containing the labels worked fine. So I went with np.ndarray, since it can be extended via np.append.

Current working code:

    unique_count = df.apply(pd.Series.nunique)
    redundant_columns: np.ndarray = unique_count[unique_count == 1].index.values
    redundant_columns = np.append(redundant_columns, 'DESCRIPTION')
    self.full_data.drop(columns=redundant_columns, inplace=True)
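As a cross-check, here is a minimal sketch using a small made-up DataFrame (not the data from the question). It suggests that drop() accepts a plain list, and that the ValueError quoted in the question is what pandas raises when columns ends up being None:

```python
import pandas as pd

# Hypothetical data: 'A' holds a single unique value, so it counts as redundant.
df = pd.DataFrame({'A': list('bbb'), 'C': [7, 8, 9], 'DESCRIPTION': list('aab')})

unique_count = df.nunique()
redundant_columns = unique_count[unique_count == 1].index.tolist()
redundant_columns.append('DESCRIPTION')  # mutates the list; do not assign the result

# A plain Python list of labels is accepted here.
dropped = df.drop(columns=redundant_columns)
print(list(dropped.columns))

# Passing None (for example the return value of list.append) triggers the error.
try:
    df.drop(columns=None)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

If this holds for your pandas version as well, the np.append variant probably works because np.append returns a new array, whereas list.append returns None.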


0

I had the same error when calling .remove in the same line as the initialization:

    myNewList = [i for i in myOldList].remove('Last Item')

myNewList would become None. Calling .tolist() on a separate line and assigning its result might help you:

    redundant_columns = unique_count[unique_count == 1].index.values
    redundant_columns = redundant_columns.tolist()
    redundant_columns.append('DESCRIPTION')
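The behaviour behind this can be checked in plain Python. List methods that modify the list in place, such as append() and remove(), return None, so assigning their result (or chaining them onto the expression that creates the list) loses the list. A small sketch with made-up labels:

```python
# In-place list methods mutate the list and return None.
labels = ['col1', 'col2']  # hypothetical column labels

result = labels.append('DESCRIPTION')
print(result)   # None
print(labels)   # ['col1', 'col2', 'DESCRIPTION']

# Chaining on the creating expression therefore stores None, not the list.
chained = ['col1', 'col2'].append('DESCRIPTION')
print(chained)  # None

# To build the extended list in a single expression, concatenate instead.
combined = ['col1', 'col2'] + ['DESCRIPTION']
print(combined)  # ['col1', 'col2', 'DESCRIPTION']
```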

