
I am scraping a website with multiple pages and would very much appreciate your help with the following problem:

I have built a loop around the URL of the web page. However, when I look for the tags in the HTML, only information from page one appears. It seems like the loop is not actually advancing through the pages. Unfortunately, I cannot find my mistake in the following code:

import requests
from bs4 import BeautifulSoup

for pagenumber in range(1, 50):
    url = "http://suchen.mobile.de/fahrzeuge/auto/search.html?zipcodeRadius=100&scopeId=C&ambitCountry=DE&makeModelVariant1.makeId=3500&makeModelVariant1.modelId=115%2C98%2C80%2C99%2C102%2C81%2C100%2C83%2C105%2C82%2C101%2C120%2C121&makeModelVariant1.modelGroupId=53&isSearchRequest=true&pageNumber + str(pageNumber)"
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "lxml")  # parsing the data from the webpage
    carTypeTemp = []
    carTypeWeb = soup.find_all("span", {"class": "h3"})
    # writing the car type/description in a list
    for i in range(0, len(carTypeWeb), 2):
        carTypeTemp.extend(carTypeWeb[i])
  • pagenumber is not the same as pageNumber and your final double quote should come before the plus sign. Commented Jun 6, 2016 at 21:28

3 Answers


In your for loop you are doing:

url = "* + str(pageNumber)" 

This is literally what the URL will be; the page number isn't being concatenated onto the string as you think it is.

>>> "a url + str(pageNumber)" "a url + str(pageNumber)" 

You want:

url = "*" + str(pagenumber) 

Or you could use string formatters, whatever you prefer.
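For example, a minimal sketch using str.format (the long query string is shortened here for readability; substitute your full search URL):

base = "http://suchen.mobile.de/fahrzeuge/auto/search.html?isSearchRequest=true&pageNumber={page}"
for pagenumber in range(1, 50):
    url = base.format(page=pagenumber)  # the page number is substituted into the URL

This avoids quoting mistakes entirely, since the placeholder lives inside the string and the value is passed in separately.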

Edit: I didn't catch the difference in capitalization between the names, as noted in the comment.

You want pagenumber not pageNumber. pageNumber doesn't exist.


1 Comment

Many thanks for your help! This was indeed wrong. Are these two lines correct: r = requests.get(url) and soup = BeautifulSoup(r.content, "lxml")? I still face the problem that my find_all call does not extract data for pages > 1. Does my soup variable save the HTML from all 50 pages? Again, many thanks for your help!
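For what it's worth, soup only ever holds the page fetched in the current iteration, because it is reassigned each time through the loop. A minimal sketch that collects results from every page into one list, assuming the same requests/BeautifulSoup setup as in the question and the shortened base URL from the sketch above:

allCarTypes = []  # lives outside the loop, so it accumulates across all pages
for pagenumber in range(1, 50):
    r = requests.get(base.format(page=pagenumber))
    soup = BeautifulSoup(r.content, "lxml")  # holds only this page's HTML
    for span in soup.find_all("span", {"class": "h3"}):
        allCarTypes.append(span.get_text())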

Try changing the first two lines in your code to this:

for pagenumber in range(1, 50):
    url = "http://suchen.mobile.de/fahrzeuge/auto/search.html?zipcodeRadius=100&scopeId=C&ambitCountry=DE&makeModelVariant1.makeId=3500&makeModelVariant1.modelId=115%2C98%2C80%2C99%2C102%2C81%2C100%2C83%2C105%2C82%2C101%2C120%2C121&makeModelVariant1.modelGroupId=53&isSearchRequest=true&pageNumber={pagenumber}".format(pagenumber=pagenumber)

Right now you're not sending a GET request with a proper URL.
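One quick sanity check, for instance, is to print the first few generated URLs and confirm that the pageNumber parameter actually changes (query string shortened here for the example):

template = "http://suchen.mobile.de/fahrzeuge/auto/search.html?pageNumber={pagenumber}"
for pagenumber in range(1, 4):
    print(template.format(pagenumber=pagenumber))  # ends in pageNumber=1, 2, 3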


It seems like you forgot the capital "N": the loop variable is pagenumber, but str(pageNumber) uses a capital "N". Use one spelling consistently, and change

 url = "https://.................. + str(pageNumber)" 

to

url = ("http://suchen.mobile.de/fahrzeuge..... " + str(pageNumber)) 

With this change, successive iterations of the loop give me

['BMW 430d xDrive Coupé M Sportpaket Head-Up ACC LED', 'BMW 425d Gran Coupé M-Sportpaket Sport-Aut. Navi Pro', 'BMW 420d xDrive Coupé M Sportpaket Navi Apps PDC'] 

and

['BMW 435i xDrive Gran Coupé M Sportpaket Navi Prof. A', 'BMW 420 Gran Coupé M Sportpaket NEUES MODELL Nav LED', 'BMW 435i Coupé Sport Line GSD Navi Speed Limit Info'] 
