I am reading a document from mongodb and a csv file and merging them both to retrieve duplicate records. I have the following code. Now I want to compare the date (LastUpdate) between these records and return a row which has a latest date. can somebody help?
Code:
import json import pandas as pd import xlrd from pymongo import MongoClient from functools import reduce try: client = MongoClient() print("Connected successfully!!!") except: print("Could not connect to MongoDB") # database db = client.conn collection = db.contactReg df = pd.DataFrame(list(collection.find())) print(df) df1 = df[df.duplicated(['name'], keep = False)] print(df1) # reading the csv file df2 = pd.read_csv(r'C:\Users\swetha1\Desktop\rules.csv') print(df2) df3 = pd.merge(df1,df2,on="source") print(df3) print(df3.dtypes) Output:
Connected successfully!!! data from mongo LastUpdate _id name nameId source sourceId 0 10-Oct-2018 5bbc86e5c16a27f1e1bd39f8 swetha 123.0 Blore 10 1 11-Oct-2018 5bbc86e5c16a27f1e1bd39f9 swetha 123.0 Mlore 11 2 9-Oct-2018 5bbc86e5c16a27f1e1bd39fa swathi 124.0 Mlore 11 fetching duplicates LastUpdate _id name nameId source sourceId 0 10-Oct-2018 5bbc86e5c16a27f1e1bd39f8 swetha 123.0 Blore 10 1 11-Oct-2018 5bbc86e5c16a27f1e1bd39f9 swetha 123.0 Mlore 11 reading CSV file source P.weight N.weight Tolerance(days) Durability(Days) 0 Blore 100 -100 0 0 1 Mlore 200 -200 30 365 merging LastUpdate _id name nameId source sourceId P.weight N.weight Tolerance(days) Durability(Days) 0 10-Oct-2018 5bbc86e5c16a27f1e1bd39f8 swetha 123.0 Blore 10 100 -100 0 0 1 11-Oct-2018 5bbc86e5c16a27f1e1bd39f9 swetha 123.0 Mlore 11 200 -200 30 365