Data retrieval from Dynamic HTML page with time-out (Web scraping w. Python)

Python beginner (no web dev know-how) here: the HTML page shows the friend network of a person as a list of names, and each name is an anchor (<a>) tag linking to that friend's own network page. Since the page has a timer, I've written Python code that scrapes the link in the mth position (friend) of the current page, follows it, and repeats for n pages, traversing the cycle (m -> n -> m -> n ...). And it works!

Code:

import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = input('Enter URL: ')
position = int(input('Enter position: '))  # which link (name) to follow on each page
count = int(input('Enter count: '))        # how many pages to traverse

print("Retrieving:", url)
for c in range(count):
    html = urllib.request.urlopen(url, context=ctx).read()  # fetch current page
    soup = BeautifulSoup(html, 'html.parser')
    a_tags = soup('a')
    link = a_tags[position - 1].get('href', None)  # href value is the next URL
    content = a_tags[position - 1].contents        # anchor text is the friend's name
    url = link
    print("Retrieving:", url)
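One fragility worth noting about the loop above: the chosen anchor's href is used as the next URL verbatim, which only works while the site emits absolute links. A small variation (my sketch, not part of the assignment) that resolves relative links against the current page and pauses between requests, which also addresses the "delay" part of question 1:

```python
import time
from urllib.parse import urljoin

# Hypothetical helper (next_url and delay_seconds are my names): resolve the
# chosen anchor's href against the page it came from, so relative links keep
# working, and wait a moment before the next fetch to be polite to the server.
def next_url(current_url, href, delay_seconds=1.0):
    time.sleep(delay_seconds)
    return urljoin(current_url, href)  # absolute hrefs pass through unchanged

print(next_url('http://py4e-data.dr-chuck.net/known_by_Kory.html',
               'known_by_Shaurya.html', delay_seconds=0))
# -> http://py4e-data.dr-chuck.net/known_by_Shaurya.html
```

In the loop, `url = link` would become `url = next_url(url, link)`.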

Input:

Enter URL: http://py4e-data.dr-chuck.net/known_by_Kory.html
Enter position: 1
Enter count: 10

Output:

Retrieving: http://py4e-data.dr-chuck.net/known_by_Kory.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Shaurya.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Raigen.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Dougal.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Aonghus.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Daryn.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Pauline.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Laia.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Iagan.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Leanna.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Malakhy.html

Questions:

  1. Is there a better way to approach this? (libraries, or workarounds to deal with the timer)

  2. My goal is to make an exhaustive list of the friends of all unique names here; I don't want any code, just suggestions and approaches will do.
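For context on question 2 (suggestions only, so this is an illustrative sketch rather than a solution): collecting every unique name is a graph-traversal problem. Keep a set of names already seen and a queue of pages still to visit; the set both prevents re-fetching a page and, when the queue empties, is the exhaustive answer. A toy breadth-first version over a hypothetical in-memory graph (the FRIENDS dict is made up; a real crawler would fetch and parse each page over HTTP, with a delay between requests):

```python
from collections import deque

# Hypothetical stand-in for the site: each name maps to the names linked
# from that person's page.
FRIENDS = {
    'Kory':    ['Shaurya', 'Laia'],
    'Shaurya': ['Kory', 'Daryn'],
    'Laia':    ['Daryn'],
    'Daryn':   ['Kory'],
}

def crawl(start):
    """Breadth-first traversal with a visited set: each page is visited once,
    and the walk terminates even though the friend graph contains cycles."""
    seen = {start}
    queue = deque([start])
    while queue:
        name = queue.popleft()
        for friend in FRIENDS[name]:   # in a real crawl: fetch and parse <a> tags here
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    return seen

print(sorted(crawl('Kory')))  # -> ['Daryn', 'Kory', 'Laia', 'Shaurya']
```

The same structure works depth-first with a stack; breadth-first just visits pages closest to the start first.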