Scraping Multiple Web Pages using Python

Question

I want to scrape multiple websites with similar URLs, such as https://woollahra.ljhooker.com.au/our-team, https://chinatown.ljhooker.com.au/our-team and https://bondibeach.ljhooker.com.au/our-team.

I have already written a script that works for the first website, but I am unsure how to tell it to scrape the other two.

My code:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = "https://woollahra.ljhooker.com.au/our-team"

page_soup = soup(page_html, "html.parser")
containers = page_soup.findAll("div", {"class":"team-details"})
for container in containers:
    agent_name = container.findAll("div", {"class":"team-name"})
    name = agent_name[0].text
    phone = container.findAll("span", {"class":"phone"})
    mobile = phone[0].text
    print("name: " + name)
    print("mobile: " + mobile)

Is there a way to simply list the parts of the URL that differ (woollahra, chinatown, bondibeach), so that the script loops through each web page using the code I have already written?

Make a list of URLs, iterate through them, and put a few seconds of sleep between requests.
– Rachit kapadia, Commented Aug 4, 2017 at 0:30
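
The suggestion in this comment can be sketched as follows. This is a minimal sketch, not code from the thread: the helper names `build_urls`, `fetch_html` and `fetch_all`, the `delay_seconds` parameter and its two-second default are all illustrative assumptions.

```python
import time
from urllib.request import urlopen


def build_urls(locations):
    """Return one team-page URL per location subdomain."""
    return ["https://{}.ljhooker.com.au/our-team".format(loc) for loc in locations]


def fetch_html(url):
    """Download a page and return its raw bytes."""
    with urlopen(url) as response:
        return response.read()


def fetch_all(urls, fetch=fetch_html, delay_seconds=2.0):
    """Fetch each URL in turn, sleeping before every request except the first."""
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(delay_seconds)  # pause so the requests are spaced out
        pages[url] = fetch(url)
    return pages


# Example use (performs real network requests):
# urls = build_urls(["woollahra", "chinatown", "bondibeach"])
# for url, html in fetch_all(urls).items():
#     print(url, len(html), "bytes")
```

Passing the download function in as `fetch` is only a convenience for trying the loop without touching the network; calling `urlopen` directly inside the loop would work just as well.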
I would suggest using lxml as the parser to improve performance. You can also use SoupStrainer to parse only the relevant segments of the source, which improves performance further.
– Luke, Commented Aug 4, 2017 at 0:37
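
A rough sketch of the SoupStrainer idea, assuming the page structure implied by the question's code (`div.team-details` containing `div.team-name` and `span.phone`). The function name `parse_team` and the `parser` parameter are hypothetical. The default here is the built-in `"html.parser"` so the sketch runs without extra packages; passing `parser="lxml"` should work if lxml is installed.

```python
from bs4 import BeautifulSoup, SoupStrainer

# Restrict parsing to <div class="team-details"> elements and their contents.
TEAM_ONLY = SoupStrainer("div", {"class": "team-details"})


def parse_team(page_html, parser="html.parser"):
    """Return a list of (name, mobile) pairs found in the page."""
    page_soup = BeautifulSoup(page_html, parser, parse_only=TEAM_ONLY)
    agents = []
    for container in page_soup.find_all("div", {"class": "team-details"}):
        name = container.find("div", {"class": "team-name"})
        phone = container.find("span", {"class": "phone"})
        if name is None or phone is None:
            continue  # skip entries missing either field
        agents.append((name.get_text(strip=True), phone.get_text(strip=True)))
    return agents
```

Whether this is noticeably faster depends on the size of the pages; for three small pages the difference is likely to be negligible.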

José Garcia · Accepted Answer · 2017-08-04 00:26:40Z

2

locations = ['woollahra', 'chinatown', 'bondibeach']
for location in locations:
    my_url = 'https://' + location + '.ljhooker.com.au/our-team'

Follow this with the rest of your code, indented inside the loop, so that it runs once for each element of the list. You can add more locations later.
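
Put together with the question's parsing code, the loop might look like the sketch below. It adds a download step, which the posted snippet does not show (there, `page_html` is never assigned). The function names `fetch_html`, `scrape_team` and `main` are illustrative assumptions.

```python
from urllib.request import urlopen
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its raw bytes."""
    with urlopen(url) as client:
        return client.read()


def scrape_team(location, fetch=fetch_html):
    """Return (name, mobile) pairs for one location's team page."""
    my_url = "https://" + location + ".ljhooker.com.au/our-team"
    page_soup = BeautifulSoup(fetch(my_url), "html.parser")
    agents = []
    for container in page_soup.find_all("div", {"class": "team-details"}):
        name = container.find_all("div", {"class": "team-name"})[0].text.strip()
        mobile = container.find_all("span", {"class": "phone"})[0].text.strip()
        agents.append((name, mobile))
    return agents


def main():
    """Print every agent for every location (performs real network requests)."""
    locations = ["woollahra", "chinatown", "bondibeach"]
    for location in locations:
        for name, mobile in scrape_team(location):
            print("name: " + name)
            print("mobile: " + mobile)
```

Calling `main()` runs the whole scrape. Everything that must happen per location sits inside `scrape_team`, so it is repeated on each pass through the loop.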


2 Comments

Oren Over a year ago

Thanks @JoséGarcia for the response. However, my code is only printing the last location in the list (bondibeach). I am not sure why, or how to fix it.

José Garcia Over a year ago

This is not the question you asked. For us to see what is going on with your code, provide the working code, because this one doesn't even use the variable my_url. My guess is that you found a code snippet on the internet and tried to replace things without looking at how it worked. If that is the case, please read the documentation first; if not, please update your question so we can help you solve your problem.

OneCricketeer · Answer · 2017-08-04 00:27:22Z

2

You just want a loop:

for team in ["woollahra", "chinatown", "bondibeach"]:
    my_url = "https://{}.ljhooker.com.au/our-team".format(team)
    page_soup = soup(page_html, "html.parser")
    # make sure you indent the rest of the code


2 Comments

Oren Over a year ago

Thanks @cricket_007 for the response. However, my code is only printing the last location in the list (bondibeach). I am not sure why, or how to fix it.

OneCricketeer Over a year ago

This code is no different than the accepted answer... And a for team in [] will always loop over every team
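
The thread never shows the updated script, so the cause of the "only the last location" symptom is not confirmed. One plausible explanation, assuming the rest of the code was left unindented below the loop, is illustrated here. The list names `processed_once` and `processed_each` are made up for the illustration.

```python
locations = ["woollahra", "chinatown", "bondibeach"]

# Suspected layout: only the assignment is inside the loop, so the work below
# runs once, after the loop has finished, with the final value of my_url.
for location in locations:
    my_url = "https://{}.ljhooker.com.au/our-team".format(location)
processed_once = [my_url]

# Intended layout: the work is indented under the for statement, so it runs
# on every iteration.
processed_each = []
for location in locations:
    my_url = "https://{}.ljhooker.com.au/our-team".format(location)
    processed_each.append(my_url)

print(processed_once)   # one URL, the bondibeach one
print(processed_each)   # three URLs, in list order
```

If that is the situation, indenting the fetching, parsing and printing lines so they sit inside the `for` block should make the script handle all three locations.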
