0

I developed a web scraper, which goes through the profiles of a Facebook-like website(Lang-8) and save the required data. However, I do not know how to develop a system so that, in case the PC crashes, the code resumes from the last profile it scanned

 import requests from bs4 import BeautifulSoup profile = 1 while profile <= max_profiles: url = "http://lang-8.com/" + str(profile) source_code = requests.get(url) plain_text = source_code.text soup = BeautifulSoup(plain_text, features="html.parser") for lang in soup.findAll('dd', {'class':'studying_lang_name'}): lang1 = str(lang.string) if lang1 == "\n\nPolish\n": journal = str(url) + "/journals" open_article(journal) profile += 1 def open_article(url2): in_page = 1 while in_page < 5: source_code = requests.get(url2 + "?page=" + str(in_page)) plain_text = source_code.text soup = BeautifulSoup(plain_text, features="html.parser") for link in soup.findAll('h3', {'class':'journal_title'}): href1 = str(link.find('a').get("href")) file_create(href1) in_page += 1 def file_create(linked): source_code = requests.get(linked) plain_text = source_code.text soup = BeautifulSoup(plain_text, features="html.parser") for text in soup.findAll('li', {'class':'corrections_num'}): corrections = text.text for content in soup.findAll('div', {'id':'body_show_ori'}): text1 = content.text fout = open(linked[-1] + linked[-2] + linked[-3] + "_" + corrections + "_.txt", 'w', encoding='utf-8') fout.write(text1) fout.close() 
1

1 Answer 1

0

I would create and update a progress file as you complete a profile scrape.

After your profile += 1 add something like:

fprogress = open("progress.txt","w") fprogress.write("%d" % profile) fprogress.close() 

Then on load where you set profile to 1:

if os.path.isfile('progress.txt'): fprogress = open("progress.txt", "r") profile = int(fprogress.read()) else: profile = 1 
Sign up to request clarification or add additional context in comments.

1 Comment

Thank-you so much. This seems to solve my problem

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.