What I am trying to do here:
I am trying to crawl Yelp and get the reviews from a particular business page. I want to modify this script so that it takes the restaurant name as input instead of having it hard-coded in the URL.
For example:
User Input: dennys-san-jose-5
URL: http://www.yelp.com/biz/**dennys-san-jose-5**

This is the actual script I am using right now:

    from bs4 import BeautifulSoup
    from urllib import urlopen

    queries = 0
    while queries <201:
        stringQ = str(queries)
        page = urlopen('http://www.yelp.com/biz/madison-square-park-new-york?start=' + stringQ)
        soup = BeautifulSoup(page)
        reviews = soup.findAll('p', attrs={'itemprop':'description'})
        authors = soup.findAll('span', attrs={'itemprop':'author'})

        flag = True
        indexOf = 1
        for review in reviews:
            dirtyEntry = str(review)
            while dirtyEntry.index('<') != -1:
                indexOf = dirtyEntry.index('<')
                endOf = dirtyEntry.index('>')
                if flag:
                    dirtyEntry = dirtyEntry[endOf+1:]
                    flag = False
                else:
                    if(endOf+1 == len(dirtyEntry)):
                        cleanEntry = dirtyEntry[0:indexOf]
                        break
                    else:
                        dirtyEntry = dirtyEntry[0:indexOf]+dirtyEntry[endOf+1:]
            f=open("reviews.txt", "a")
            f.write(cleanEntry)
            f.write("\n")
            f.close

        for author in authors:
            dirty = str(author)
            closing = dirty.index('>')
            dirty = dirty[closing+1:]
            opening = dirty.index('<')
            cleanEntry = dirty[0:opening]
            f=open("bla.txt", "a")
            f.write(cleanEntry)
            f.write("\n")
            f.close
        queries = queries + 40

I am trying to read the restaurant name as a parameter, but it does not work.
What I did:

    while queries <201:
        stringQ = str(queries)
        page = urlopen('http://www.yelp.com/biz/' + stringQ)

But it does not work. I am passing dennys-san-jose-5 as input on the command line (`python script.py dennys-san-jose-5`).
Can someone point out what the issue is and how I can fix it?
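
For reference, here is a minimal sketch of the URL construction I am aiming for. One likely reason the attempt above fails is that `stringQ` holds the page offset (`0`, `40`, ...), not the restaurant name, so the command-line argument never reaches the URL. The sketch assumes the name arrives as `sys.argv[1]` and that the `?start=` offset is still needed for paging; `BASE_URL` and `page_urls` are hypothetical names that are not in my script. It only builds and prints the URLs, it does not fetch anything.

```python
import sys

# Assumption: same base URL as in the script above.
BASE_URL = 'http://www.yelp.com/biz/'


def page_urls(name, last_start=200, step=40):
    """Build one URL per review page for the given business name.

    The offsets 0, 40, ..., 200 mirror the `while queries <201` loop.
    """
    return [BASE_URL + name + '?start=' + str(start)
            for start in range(0, last_start + 1, step)]


# Only runs when a name was actually passed on the command line.
if __name__ == '__main__' and len(sys.argv) > 1:
    for url in page_urls(sys.argv[1]):
        print(url)
```

Run as `python script.py dennys-san-jose-5`, this should print six URLs, one per offset, each of which could then be passed to `urlopen` in place of the hard-coded string.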