
Let's say we have the following code example, where we create two basic dataframes:

    import pandas as pd

    # Creating Dataframes
    a = [{'Name': 'abc', 'Age': 8, 'Grade': 3},
         {'Name': 'xyz', 'Age': 9, 'Grade': 3}]
    df1 = pd.DataFrame(a)
    b = [{'ID': 1, 'Name': 'abc', 'Age': 8},
         {'ID': 2, 'Name': 'xyz', 'Age': 9}]
    df2 = pd.DataFrame(b)

    # Printing Dataframes
    display(df1)
    display(df2)

We get the following datasets:

      Name  Age  Grade
    0  abc    8      3
    1  xyz    9      3

       ID Name  Age
    0   1  abc    8
    1   2  xyz    9

How can I find the columns that are not shared between these two dataframes, that is, the columns that appear in only one of them? As a result, I want to get the following column names: ['Grade', 'ID']

1 Answer


Use symmetric_difference

    res = df2.columns.symmetric_difference(df1.columns)
    print(res)

Output

Index(['Grade', 'ID'], dtype='object') 

Or as an alternative, use set.symmetric_difference

    res = set(df2.columns).symmetric_difference(df1.columns)
    print(res)

Output

{'Grade', 'ID'} 
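Since the question asks for a plain list, here is a minimal self-contained sketch that converts both results. It rebuilds the two frames from the question; the variable names `cols_from_index` and `cols_from_set` are illustrative and not part of the original answer.

```python
import pandas as pd

# Same frames as in the question
df1 = pd.DataFrame([{'Name': 'abc', 'Age': 8, 'Grade': 3},
                    {'Name': 'xyz', 'Age': 9, 'Grade': 3}])
df2 = pd.DataFrame([{'ID': 1, 'Name': 'abc', 'Age': 8},
                    {'ID': 2, 'Name': 'xyz', 'Age': 9}])

# Index.symmetric_difference returns an Index; tolist() gives a plain list
cols_from_index = df2.columns.symmetric_difference(df1.columns).tolist()

# A set has no guaranteed order, so sort it to get a stable list
cols_from_set = sorted(set(df2.columns).symmetric_difference(df1.columns))

print(cols_from_index)
print(cols_from_set)
```

Both variables should hold `['Grade', 'ID']`. The set version needs the explicit `sorted` call if the order of the names matters.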

A third alternative, suggested by @SashSinha, is to use the shortcut:

res = df2.columns ^ df1.columns 

but as of pandas 1.4.3 this issues a warning:

    FutureWarning: Index.__xor__ operating as a set operation is deprecated, in the future this will be a logical operation matching Series.__xor__. Use index.symmetric_difference(other) instead.
      res = df2.columns ^ df1.columns
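If the appeal of `^` is mainly brevity, one option is to wrap the explicit method call in a small helper so call sites stay short without relying on the deprecated operator. This is only a sketch: `unshared_columns` is a hypothetical name, not a pandas function.

```python
import pandas as pd


def unshared_columns(left: pd.DataFrame, right: pd.DataFrame) -> list:
    """Hypothetical helper: names of columns present in exactly one frame."""
    # Uses the explicit method that the warning recommends instead of `^`
    return left.columns.symmetric_difference(right.columns).tolist()


# Same frames as in the question
df1 = pd.DataFrame([{'Name': 'abc', 'Age': 8, 'Grade': 3},
                    {'Name': 'xyz', 'Age': 9, 'Grade': 3}])
df2 = pd.DataFrame([{'ID': 1, 'Name': 'abc', 'Age': 8},
                    {'ID': 2, 'Name': 'xyz', 'Age': 9}])

res = unshared_columns(df2, df1)
print(res)
```

The argument order does not matter here, because a symmetric difference contains the same elements whichever frame comes first.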


1 Comment

I didn't know about the warning, my bad. In plain Python, the shortcut helps a lot when you are limited to a line length of 80 characters...
