Data retrieval from Dynamic HTML page with time-out (Web scraping w. Python)

Python beginner (no web dev know-how) here: the HTML page shows the friend network of a person as a list of names, and each name is an anchor (<a>) tag linking to that friend's own network page. Since the page has a timer, I've written Python code that scrapes the link in the mth position (friend) of the current page, follows it, and repeats for n pages, traversing the cycle (m -> n -> m -> n ...). And it works!

Code:

import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = input('Enter URL: ')
position = int(input('Enter position: '))  # which link (name) to follow on each page
count = int(input('Enter count: '))        # how many pages to traverse

print("Retrieving:", url)
for c in range(count):
    html = urllib.request.urlopen(url, context=ctx).read()  # fetch current page
    soup = BeautifulSoup(html, 'html.parser')
    a_tags = soup('a')
    link = a_tags[position - 1].get('href', None)  # href value is the next URL
    content = a_tags[position - 1].contents        # anchor text is the friend's name
    url = link
    print("Retrieving:", url)
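One fragility worth noting about the loop above: the chosen anchor's href is used as the next URL verbatim, which only works while the site emits absolute links. A small variation (my sketch, not part of the assignment) that resolves relative links against the current page and pauses between requests, which also addresses the "delay" part of question 1:

```python
import time
from urllib.parse import urljoin

# Hypothetical helper (next_url and delay_seconds are my names): resolve the
# chosen anchor's href against the page it came from, so relative links keep
# working, and wait a moment before the next fetch to be polite to the server.
def next_url(current_url, href, delay_seconds=1.0):
    time.sleep(delay_seconds)
    return urljoin(current_url, href)  # absolute hrefs pass through unchanged

print(next_url('http://py4e-data.dr-chuck.net/known_by_Kory.html',
               'known_by_Shaurya.html', delay_seconds=0))
# -> http://py4e-data.dr-chuck.net/known_by_Shaurya.html
```

In the loop, `url = link` would become `url = next_url(url, link)`.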

Input:

Enter URL: http://py4e-data.dr-chuck.net/known_by_Kory.html
Enter position: 1
Enter count: 10

Output:

Retrieving: http://py4e-data.dr-chuck.net/known_by_Kory.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Shaurya.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Raigen.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Dougal.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Aonghus.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Daryn.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Pauline.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Laia.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Iagan.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Leanna.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Malakhy.html

Questions:

  1. Is there a better way to approach this? (libraries, or workarounds to deal with the timer)

  2. My goal is to make an exhaustive list of the friends of all unique names here; I don't want any code, just suggestions and approaches will do.
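For context on question 2 (suggestions only, so this is an illustrative sketch rather than a solution): collecting every unique name is a graph-traversal problem. Keep a set of names already seen and a queue of pages still to visit; the set both prevents re-fetching a page and, when the queue empties, is the exhaustive answer. A toy breadth-first version over a hypothetical in-memory graph (the FRIENDS dict is made up; a real crawler would fetch and parse each page over HTTP, with a delay between requests):

```python
from collections import deque

# Hypothetical stand-in for the site: each name maps to the names linked
# from that person's page.
FRIENDS = {
    'Kory':    ['Shaurya', 'Laia'],
    'Shaurya': ['Kory', 'Daryn'],
    'Laia':    ['Daryn'],
    'Daryn':   ['Kory'],
}

def crawl(start):
    """Breadth-first traversal with a visited set: each page is visited once,
    and the walk terminates even though the friend graph contains cycles."""
    seen = {start}
    queue = deque([start])
    while queue:
        name = queue.popleft()
        for friend in FRIENDS[name]:   # in a real crawl: fetch and parse <a> tags here
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    return seen

print(sorted(crawl('Kory')))  # -> ['Daryn', 'Kory', 'Laia', 'Shaurya']
```

The same structure works depth-first with a stack; breadth-first just visits pages closest to the start first.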