1

I am reading a document from mongodb and a csv file and merging them both to retrieve duplicate records. I have the following code. Now I want to compare the date (LastUpdate) between these records and return a row which has a latest date. can somebody help?

Code:

import json import pandas as pd import xlrd from pymongo import MongoClient from functools import reduce try: client = MongoClient() print("Connected successfully!!!") except: print("Could not connect to MongoDB") # database db = client.conn collection = db.contactReg df = pd.DataFrame(list(collection.find())) print(df) df1 = df[df.duplicated(['name'], keep = False)] print(df1) # reading the csv file df2 = pd.read_csv(r'C:\Users\swetha1\Desktop\rules.csv') print(df2) df3 = pd.merge(df1,df2,on="source") print(df3) print(df3.dtypes) 

Output:

Connected successfully!!! data from mongo LastUpdate _id name nameId source sourceId 0 10-Oct-2018 5bbc86e5c16a27f1e1bd39f8 swetha 123.0 Blore 10 1 11-Oct-2018 5bbc86e5c16a27f1e1bd39f9 swetha 123.0 Mlore 11 2 9-Oct-2018 5bbc86e5c16a27f1e1bd39fa swathi 124.0 Mlore 11 fetching duplicates LastUpdate _id name nameId source sourceId 0 10-Oct-2018 5bbc86e5c16a27f1e1bd39f8 swetha 123.0 Blore 10 1 11-Oct-2018 5bbc86e5c16a27f1e1bd39f9 swetha 123.0 Mlore 11 reading CSV file source P.weight N.weight Tolerance(days) Durability(Days) 0 Blore 100 -100 0 0 1 Mlore 200 -200 30 365 merging LastUpdate _id name nameId source sourceId P.weight N.weight Tolerance(days) Durability(Days) 0 10-Oct-2018 5bbc86e5c16a27f1e1bd39f8 swetha 123.0 Blore 10 100 -100 0 0 1 11-Oct-2018 5bbc86e5c16a27f1e1bd39f9 swetha 123.0 Mlore 11 200 -200 30 365 

1 Answer 1

1

First convert column to_datetime and then filter by boolean indexing:

df3['LastUpdate'] = pd.to_datetime(df3['LastUpdate']) df4 = df3[df3['LastUpdate'] == df3['LastUpdate'].max()] 

Or use idxmax :

df3['LastUpdate'] = pd.to_datetime(df3['LastUpdate']) df4 = df3.loc[[df3['LastUpdate'].idxmax()]] 
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks it works. Extending the above question if i want to compare based on some conditions then how to do it? for eg: " if LastUpdate(Blore) older than Lastupdate(Mlore) then -100 in the N.weight. if LastUpdate(Mlore) older than the other then -200 in N.weight. can u help plz?
@swethareddy - Not sure if understand, for second need max_Mlore_val = df3.loc[(df3['source'] == 'Mlore'), 'LastUpdate'].max() df3['N.weight'] = np.where(df3['LastUpdate'] > max_Mlore_val, -200, df3['N.weight'] ) ?

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.