I am comparing two Excel files: for each value in a column of the first file, I look it up in the second file, and if it is not present there, I write that whole row to a text file.
My Excel files are very large; they contain about 290,000 rows.
Here is what I have tried:
    import sys
    import pandas as pd

    orig_stdout = sys.stdout
    f = open('out.txt', 'w')
    sys.stdout = f

    df0 = pd.ExcelFile('1.xlsx').parse('Sheet1')
    df1 = pd.ExcelFile('v2.xlsx').parse('Sheet1')
    print(df0[~df0['initial_id'].isin(df1['initial_id'])])

    sys.stdout = orig_stdout
    f.close()
    print('Done.')

The goal is to compare each value under the initial_id column and, if it is not present in the second Excel file, print that whole row from the first file to the output file.
Actual Result
    21    EXCLAMATION MARK  A1  INVERTED EXCLAMATION MARK
    22    QUOTATION MARK    A2  CENT SIGN
    23    NUMBER SIGN       A3  POUND SIGN
    24    DOLLAR SIGN       A4  CURRENCY SIGN
    25    PERCENT SIGN      A5  YEN SIGN
    26    AMPERSAND         A6  BROKEN BAR
    27    APOSTROPHE        A7  SECTION SIGN
    ...   ...               ... ...
    3159  DIGIT NINE        B9  SUPERSCRIPT ONE
    3160  COLON             BA  MASCULINE ORDINAL INDICATOR
    3161  SEMICOLON         BB  RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
    3162  LESS-THAN SIGN    BC  VULGAR FRACTION ONE QUARTER
    3163  EQUALS SIGN       BD  VULGAR FRACTION ONE HALF

Expected Result
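The rows omitted after row 27 (shown as "..." in the output) should also be written to the file. If holding the whole result in memory uses too much RAM, writing it out as several part files would also work.
Should I use to_csv or something similar?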
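
For reference, this is roughly what I have in mind with to_csv. It is only a minimal sketch that I have not run against the real files; the helper names rows_missing_from and write_missing are made up for illustration, and I am assuming the tab separator is acceptable for the text output. As far as I understand, printing a DataFrame abbreviates long output, whereas DataFrame.to_csv writes every row.

```python
import pandas as pd


def rows_missing_from(df0, df1, key="initial_id"):
    """Return the rows of df0 whose key value does not appear in df1."""
    return df0[~df0[key].isin(df1[key])]


def write_missing(df0, df1, out_path, key="initial_id"):
    """Write all unmatched rows of df0 to out_path and return how many there were."""
    missing = rows_missing_from(df0, df1, key)
    # to_csv writes the complete frame; it does not shorten it the way print() does.
    missing.to_csv(out_path, sep="\t", index=True)
    return len(missing)


# Intended usage with the files from this question (not run here):
# df0 = pd.read_excel('1.xlsx', sheet_name='Sheet1')
# df1 = pd.read_excel('v2.xlsx', sheet_name='Sheet1')
# write_missing(df0, df1, 'out.txt')
```

If this is the right direction, the sys.stdout redirection would no longer be needed at all.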