0

I am struggling to get this while loop to work in Python.

    urlList = []
    while True:
        for r in range(1, 5000):
            try:
                response = urllib.request.urlopen('www.somewebsite.com/v0/info/' + str(r) + '.json')
                html = response.read().decode('utf-8')
                data = json.loads(html)
                if 'url' in data:
                    urlList.append(data['url'])
                    if len(urlList) == 100:
                        break
            except urllib.error.HTTPError as err:
                print(err)
                continue
    print(urlList)

I currently have an if statement that is meant to break out of the while loop when the list length equals 100, but it throws an odd error of urllib.error.URLError:

I also tried while len(urlList) != 100, which makes the process not run. Also, while len(urlList) < 100 just makes it run through the entire range.

10
  • Sorry, I am unclear what you are asking. Is 'www.somewebsite.com' + str(r) + '.json' a real URL? Commented Aug 4, 2015 at 19:11
  • No, it's an internal URL. somewebsite.com is a placeholder; it stores a bunch of JSON files named 1.json, 2.json, 3.json, etc. Commented Aug 4, 2015 at 19:12
  • 'www.somewebsite.com' + str(r) + '.json' will give you a malformed URL. You need a / between the domain and the file, no? Commented Aug 4, 2015 at 19:12
  • What is the exact issue you are facing? Commented Aug 4, 2015 at 19:13
  • OK, I amended the URL to be more accurate. Commented Aug 4, 2015 at 19:13

2 Answers

4

Your URLs are invalid.

response = urllib.request.urlopen('www.somewebsite.com' + str(r) + '.json') 

This becomes:

    www.somewebsite.com1.json
    www.somewebsite.com2.json
    www.somewebsite.com3.json
    ...

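These invalid URLs raise an error instead of returning a response. Note that urllib.error.HTTPError is a subclass of urllib.error.URLError, so an except urllib.error.HTTPError clause does not catch a plain URLError such as the one you are seeing.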


Now that you've corrected the URL, the above no longer applies. The actual issue is that the break only exits your inner loop (the for) and drops you back into the while loop, which repeats everything again.

Try changing the code to be more like this:

    urlList = []
    for r in range(1, 5000):
        response = ......
        ...
        if 'url' in data:
            urlList.append(data['url'])
            if len(urlList) == 100:
                break

This removes the while loop. It keeps the range, which seems to be important to your URLs. When the list reaches a size of 100, it'll break out of this single loop.
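For reference, here is a fuller sketch of the single-loop version. It rests on a few assumptions that are not in the question: the http:// scheme on the placeholder host, the helper names fetch_json and collect_urls, and the choice to catch urllib.error.URLError (the parent class of HTTPError) so that both kinds of failure are skipped. Treat it as an illustration rather than a drop-in fix.

```python
import json
import urllib.error
import urllib.request

# Placeholder host from the question; the http:// scheme is an assumption.
BASE_URL = 'http://www.somewebsite.com/v0/info/'


def fetch_json(item_id):
    """Download and parse one JSON file (hypothetical helper)."""
    with urllib.request.urlopen(BASE_URL + str(item_id) + '.json') as response:
        return json.loads(response.read().decode('utf-8'))


def collect_urls(fetch=fetch_json, ids=range(1, 5000), limit=100):
    """Collect up to `limit` 'url' values using a single for loop."""
    url_list = []
    for item_id in ids:
        try:
            data = fetch(item_id)
        except urllib.error.URLError as err:
            # URLError is the parent class of HTTPError, so this catches both.
            print(err)
            continue
        if 'url' in data:
            url_list.append(data['url'])
            if len(url_list) == limit:
                break  # there is only one loop, so this ends the collection
    return url_list
```

Calling collect_urls() with no arguments would walk the real range against the placeholder host. Passing a different fetch callable makes it possible to try the loop logic without any network access.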


1 Comment

Perfect! Thanks man, I didn't know you can use a break on range... got a 2 min timer on marking as answer.
0

You have 2 loops but only break once.

You should keep your break as it is to exit the for loop, but the while should also have a condition, as you wrote: while len(urlList) < 100

Both together should exit your loop correctly.
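A rough sketch of that two-loop variant follows. The function name collect_with_outer_while and the fetch callable are hypothetical. The for ... else clause is an addition of mine, not part of the suggestion above: it stops the while loop when the range is exhausted before the limit is reached, which would otherwise repeat the whole range.

```python
def collect_with_outer_while(fetch, ids, limit):
    """Keep both loops: break exits the for, the condition exits the while.

    `fetch` is a hypothetical callable that returns a dict for one id.
    """
    url_list = []
    while len(url_list) < limit:
        for item_id in ids:
            data = fetch(item_id)
            if 'url' in data:
                url_list.append(data['url'])
                if len(url_list) == limit:
                    break  # leaves the for loop; the while condition is now false
        else:
            # The for loop finished without hitting break, so the ids ran out
            # before the limit was reached. Stop instead of starting over.
            break
    return url_list
```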

One more comment: call close() on the response object.
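One way to get that close() call without writing it by hand, assuming Python 3, is to use the response in a with statement, which closes it when the block ends. The read_json helper and its opener parameter are hypothetical names introduced for this sketch.

```python
import json
import urllib.request


def read_json(url, opener=urllib.request.urlopen):
    """Fetch `url`, parse it as JSON, and close the response afterwards.

    `opener` is a hypothetical hook so a stand-in can replace urlopen.
    """
    with opener(url) as response:
        # Leaving the with block calls response.close(), even on errors.
        return json.loads(response.read().decode('utf-8'))
```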
