I have two CSV files like the following.

File1

    x1 10.00 a1
    x2 10.00 a2
    x3 11.00 a1
    x4 10.50 a2
    x5 10.00 a3
    x6 12.00 a3

File2

    x1
    x4
    x5

I would like to create a new file that contains

    x2
    x3
    x6

using pandas or Python.
Use Series.isin with ~ to build a boolean mask of the values in the first column of df1 (df1[0]) that do not exist in df2[0], then select those values with DataFrame.loc and boolean indexing:
    import pandas as pd

    #create DataFrame from first file
    df1 = pd.read_csv(file1, sep=";", header=None)
    print (df1)
        0     1   2
    0  x1  10.0  a1
    1  x2  10.0  a2
    2  x3  11.0  a1
    3  x4  10.5  a2
    4  x5  10.0  a3
    5  x6  12.0  a3

    #create DataFrame from second file
    df2 = pd.read_csv(file2, header=None, sep='|')
    print (df2)
        0
    0  x1
    1  x4
    2  x5

    s = df1.loc[~df1[0].isin(df2[0]), 0]
    print (s)
    1    x2
    2    x3
    5    x6
    Name: 0, dtype: object

    #write to file
    s.to_csv('new.csv', index=False, header=False)
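For anyone who wants to try this without creating files first, here is a minimal self-contained sketch of the same filter. It assumes the first file is separated by ";" (as in the read_csv call above); the in-memory strings and the helper name `ids_missing_from` are hypothetical and only there for illustration.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for the two files.
# Assumption: File1 uses ";" as its separator, File2 has one value per line.
FILE1_TEXT = (
    "x1;10.00;a1\n"
    "x2;10.00;a2\n"
    "x3;11.00;a1\n"
    "x4;10.50;a2\n"
    "x5;10.00;a3\n"
    "x6;12.00;a3\n"
)
FILE2_TEXT = "x1\nx4\nx5\n"


def ids_missing_from(first, second, sep=";"):
    """Return values from the first column of `first` that are absent from `second`."""
    df1 = pd.read_csv(first, sep=sep, header=None)
    df2 = pd.read_csv(second, header=None)
    # True where the value in df1's first column does NOT occur in df2's first column
    mask = ~df1[0].isin(df2[0])
    return df1.loc[mask, 0]


missing = ids_missing_from(io.StringIO(FILE1_TEXT), io.StringIO(FILE2_TEXT))

# With no path argument, to_csv returns the CSV text instead of writing a file
result_csv = missing.to_csv(index=False, header=False)
print(result_csv)
```

To keep the complete rows instead of only the first column, drop the column selector and use `df1.loc[mask]`.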