
For a web-scraping application, I'm comparing data from my database with data I scraped from a website. I store the two sets of values in two different columns of my DataFrame.

The comparison works well: I get the number of rows that have the same value in both columns. But as my project keeps growing, I also want to know at which index values the comparison returns True. How can I do this?

Some additional information

My Dataframe:

df_single["Database"]: Schloss Haindorf Hotelbetriebs GmbH 1. Aichfelder Druck Gesellschaft m.b.H. Rössler Elektro Korbel Elektro Schefbänker AWESOME X e.U.
df_single["Scraped"]: Schloss Haindorf Hotelbetriebs GmbH 1. Aichfelder Druck Gesellschaft m.b.H. Elektro Rössler OG Elektro Schefbänker KG AWESOME X e.U.

My comparison with .eq()

same_single = df_single["Database"].str.lower().eq(df_single["Scraped"].str.lower()).sum() 

My Output:

[IN:] print(same_single)
[OUT:] 3

Wanted Output:

[IN:] print(index where comparison = true)
[OUT:] Comparison was true at Index: 3,5 and 7

1 Answer


First build a boolean mask from the comparison, use it to filter the index values into idx, and then join those values with a separator:

mask = df_single["Database"].str.lower().eq(df_single["Scraped"].str.lower())
idx = df_single.index[mask]
print(f"Comparison was true at Index: {', '.join(idx.astype(str))}")

Or:

print("Comparison was true at Index: {}".format(', '.join(idx.astype(str))))
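
For anyone who wants to run this end to end, here is a minimal, self-contained sketch. The sample DataFrame is an assumption: the row boundaries are guessed from the flattened listing in the question, and a default 0-based index is used, so the printed index values will differ from the ones in the question's wanted output.

```python
import pandas as pd

# Hypothetical sample data: row boundaries are an assumption based on the
# question's listing, and the index is the default 0..4 RangeIndex.
df_single = pd.DataFrame(
    {
        "Database": [
            "Schloss Haindorf Hotelbetriebs GmbH",
            "1. Aichfelder Druck Gesellschaft m.b.H.",
            "Rössler Elektro",
            "Korbel Elektro Schefbänker",
            "AWESOME X e.U.",
        ],
        "Scraped": [
            "Schloss Haindorf Hotelbetriebs GmbH",
            "1. Aichfelder Druck Gesellschaft m.b.H.",
            "Elektro Rössler OG",
            "Elektro Schefbänker KG",
            "AWESOME X e.U.",
        ],
    }
)

# Boolean Series: True where both columns match, ignoring case.
mask = df_single["Database"].str.lower().eq(df_single["Scraped"].str.lower())

# Index labels of the matching rows.
idx = df_single.index[mask]

# The count from the question is simply the sum of the mask.
same_single = int(mask.sum())

message = f"Comparison was true at Index: {', '.join(idx.astype(str))}"
print(same_single)
print(message)
```

With this assumed data, the count is 3 and the matches are at index 0, 1 and 4. Keeping the mask around (instead of calling .sum() right away) also lets you inspect the matching rows themselves with df_single[mask].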

1 Comment

Second time you give me a wonderful, easy to understand answer. I appreciate it.
