I have two CSV files like the following:

File1

x1 10.00 a1
x2 10.00 a2
x3 11.00 a1
x4 10.50 a2
x5 10.00 a3
x6 12.00 a3

File2

x1
x4
x5

I would like to create a new file that contains the identifiers from File1 that do not appear in File2:

x2
x3
x6

using pandas or plain Python.

1 Answer

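Use Series.isin to test which values in the first column of the first file (df1[0]) also appear in the second file (df2[0]), invert the resulting mask with ~, and select the remaining values with DataFrame.loc and boolean indexing: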

    import pandas as pd

    # create DataFrame from the first file
    df1 = pd.read_csv(file1, sep=";", header=None)
    print(df1)
    #     0     1   2
    # 0  x1  10.0  a1
    # 1  x2  10.0  a2
    # 2  x3  11.0  a1
    # 3  x4  10.5  a2
    # 4  x5  10.0  a3
    # 5  x6  12.0  a3

    # create DataFrame from the second file
    df2 = pd.read_csv(file2, header=None, sep='|')
    print(df2)
    #     0
    # 0  x1
    # 1  x4
    # 2  x5

    # keep values of the first column that are not present in df2
    s = df1.loc[~df1[0].isin(df2[0]), 0]
    print(s)
    # 1    x2
    # 2    x3
    # 5    x6
    # Name: 0, dtype: object

    # write to file
    s.to_csv('new.csv', index=False, header=False)
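For anyone who wants to try the approach without files on disk, here is a minimal self-contained sketch. It reads the sample data from the question out of in-memory strings; the whitespace separator (`sep=r"\s+"`) and the variable names `file1_text`, `file2_text`, and `remaining` are assumptions made for illustration, so adjust `sep` to whatever the real files use.

```python
import io

import pandas as pd

# Sample data copied from the question. The whitespace separator is an
# assumption; the real files may use ";", "," or tabs instead.
file1_text = """x1 10.00 a1
x2 10.00 a2
x3 11.00 a1
x4 10.50 a2
x5 10.00 a3
x6 12.00 a3
"""
file2_text = """x1
x4
x5
"""

# header=None because neither file has a header row.
df1 = pd.read_csv(io.StringIO(file1_text), sep=r"\s+", header=None)
df2 = pd.read_csv(io.StringIO(file2_text), header=None)

# True where the identifier in df1 is NOT listed in df2.
mask = ~df1[0].isin(df2[0])

# Select only the first column for the rows that passed the filter.
remaining = df1.loc[mask, 0]
print(remaining.tolist())
```

To write the result to disk, `remaining.to_csv('new.csv', index=False, header=False)` works the same way as in the answer above.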
20 Comments

Do I have to import something in Python?
I get an error tokenizing data: C error: expected 1 field, saw 2.
I do:

    import pandas as pd
    df1 = pd.read_csv('SKUS.csv', sep="\s+", header=None)
    df2 = pd.read_csv('retails_csv', header=None)
    s = df1.loc[~df1[0].isin(df2[0]), 0]
    s.to_csv('update.csv', index=False)
@Nikos - is the separator in the first file a tab, a space, or a comma?
There are no headers.
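The separator question matters because a wrong `sep` does not always raise an error. The sketch below is only an illustration on made-up in-memory data (the names `text`, `wrong`, `right`, and `keys` are hypothetical), and it assumes a space-separated file. Read with `sep=";"`, each line lands in a single column, so `isin` never matches. Read with `sep=r"\s+"`, the same data splits into three columns and the filter works.

```python
import io

import pandas as pd

# Hypothetical space-separated sample, in the same shape as File1.
text = "x1 10.00 a1\nx2 10.00 a2\nx4 10.50 a2\n"

# Wrong separator: no ";" in the data, so every line becomes one field.
wrong = pd.read_csv(io.StringIO(text), sep=";", header=None)

# Matching separator: one or more whitespace characters.
right = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None)

print(wrong.shape)  # a single column holding the whole line
print(right.shape)  # three columns

# Identifiers to look up, standing in for the second file.
keys = pd.Series(["x1", "x4"])

# With the wrong separator the column holds whole lines, so nothing matches.
print(wrong[0].isin(keys).any())

# With the matching separator the identifiers are found.
print(right[0].isin(keys).tolist())
```

Printing `df1.shape` or `df1.head()` right after `read_csv` is a quick way to check that the file was split into the expected number of columns.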