
What I am trying to do here:

I am trying to crawl Yelp and get the reviews from a particular page. I want to modify this script so that it takes the restaurant name as input.

For example:

User Input: dennys-san-jose-5

URL: http://www.yelp.com/biz/**dennys-san-jose-5** 

This is the actual script I am using right now:

```python
from bs4 import BeautifulSoup
from urllib import urlopen

queries = 0
while queries < 201:
    stringQ = str(queries)
    page = urlopen('http://www.yelp.com/biz/madison-square-park-new-york?start=' + stringQ)

    soup = BeautifulSoup(page)
    reviews = soup.findAll('p', attrs={'itemprop': 'description'})
    authors = soup.findAll('span', attrs={'itemprop': 'author'})

    flag = True
    indexOf = 1
    for review in reviews:
        dirtyEntry = str(review)
        while dirtyEntry.index('<') != -1:
            indexOf = dirtyEntry.index('<')
            endOf = dirtyEntry.index('>')
            if flag:
                dirtyEntry = dirtyEntry[endOf+1:]
                flag = False
            else:
                if endOf+1 == len(dirtyEntry):
                    cleanEntry = dirtyEntry[0:indexOf]
                    break
                else:
                    dirtyEntry = dirtyEntry[0:indexOf] + dirtyEntry[endOf+1:]
        f = open("reviews.txt", "a")
        f.write(cleanEntry)
        f.write("\n")
        f.close()

    for author in authors:
        dirty = str(author)
        closing = dirty.index('>')
        dirty = dirty[closing+1:]
        opening = dirty.index('<')
        cleanEntry = dirty[0:opening]
        f = open("bla.txt", "a")
        f.write(cleanEntry)
        f.write("\n")
        f.close()

    queries = queries + 40
```

I am trying to read the restaurant name as a parameter, but somehow it does not work.

What I did:

```python
while queries < 201:
    stringQ = str(queries)
    page = urlopen('http://www.yelp.com/biz/' + stringQ)
```

But it does not work. I am passing dennys-san-jose-5 as input on the command line (`python script.py dennys-san-jose-5`).

What is the issue here, and how can I fix it?

Regards,
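A note on the attempt above: in `page = urlopen('http://www.yelp.com/biz/' + stringQ)`, `stringQ` is still `str(queries)`, which is the review offset. The script therefore requests http://www.yelp.com/biz/0, http://www.yelp.com/biz/40, and so on, and the restaurant name never reaches the URL. The name and the offset are two separate values. The following is a minimal sketch of building the URLs the original loop was meant to visit. `review_page_urls` is a hypothetical helper, and the step of 40 up to 200 is copied from the question's `while queries < 201` loop.

```python
BASE_URL = "http://www.yelp.com/biz/"


def review_page_urls(restaurant, last_start=200, step=40):
    # Hypothetical helper: one URL per review page, keeping the
    # restaurant name and the ?start= offset as separate values.
    urls = []
    for start in range(0, last_start + 1, step):  # 0, 40, ..., 200
        urls.append(BASE_URL + restaurant + "?start=" + str(start))
    return urls


for url in review_page_urls("dennys-san-jose-5"):
    print(url)
```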


To read arguments from the commandline, you can use argparse.

```python
import argparse

# Define command line arguments
parser = argparse.ArgumentParser(description='Get Yelp reviews.')
parser.add_argument("-p", "--page", dest="page", required=True, help="the page to parse")

# Parse command line arguments
args = parser.parse_args()
```

Your page name will now be in args.page. In this example, you would run the script like this:

>python script.py -p dennys-san-jose-5 

or

>python script.py --page dennys-san-jose-5 
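Putting `args.page` together with the offset loop from the question might look like the sketch below. The parser is the one from this answer, and `page_url` is a hypothetical helper. To keep the demo self-contained, it passes an explicit list to `parse_args`. In the real script, you would call `parse_args()` with no argument, which makes argparse read `sys.argv[1:]`.

```python
import argparse

BASE_URL = "http://www.yelp.com/biz/"


def build_parser():
    # The -p/--page option from the answer; the value ends up in args.page.
    parser = argparse.ArgumentParser(description="Get Yelp reviews.")
    parser.add_argument("-p", "--page", dest="page", required=True,
                        help="the page to parse")
    return parser


def page_url(page, start):
    # Hypothetical helper: the page name first, then the review offset.
    return BASE_URL + page + "?start=" + str(start)


# Demo with an explicit argument list; in script.py call parse_args().
args = build_parser().parse_args(["-p", "dennys-san-jose-5"])
for start in range(0, 201, 40):  # same offsets as `while queries < 201`
    print(page_url(args.page, start))
```

If you would rather run `python script.py dennys-san-jose-5` without `-p`, argparse also supports positional arguments. Calling `parser.add_argument("page", help="the page to parse")` stores the value in `args.page` as well.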


Edit:

  • If you don't need any fancy stuff and just want the raw command-line input (for example, in a program that only you will use, with no need to validate input):

    import sys
    print(sys.argv)
  • If you want to prompt the user for a page name while the program is running, see the question "Python: user input and commandline arguments".
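For the raw `sys.argv` route, here is a minimal sketch. `restaurant_from_argv` is a hypothetical helper. The length check is an addition: without it, a missing argument would fail with an `IndexError` on `argv[1]` rather than printing a usage line. The demo passes an explicit list; in the real script you would call it with no argument so that it reads `sys.argv`.

```python
import sys

BASE_URL = "http://www.yelp.com/biz/"


def restaurant_from_argv(argv=None):
    # Hypothetical helper. sys.argv[0] is the script name, so
    # `python script.py dennys-san-jose-5` gives
    # ['script.py', 'dennys-san-jose-5'] and the name is argv[1].
    if argv is None:
        argv = sys.argv
    if len(argv) < 2:
        # Exit with a usage message instead of an IndexError.
        raise SystemExit("usage: python script.py RESTAURANT-NAME")
    return argv[1]


# Demo with an explicit list; in script.py call restaurant_from_argv().
name = restaurant_from_argv(["script.py", "dennys-san-jose-5"])
print(BASE_URL + name)  # http://www.yelp.com/biz/dennys-san-jose-5
```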


5 Comments

Thanks for your answer. Is there any way I could pass it directly, without using -p or anything? In Java, I can pass "dennys-san-jose-5" and append it to the end of the URL (`"http://www.yelp.com/biz/" + Query`). Can I do the same in Python? Sorry, I am a beginner in Python.
To add to my comment: I used sys (`import sys`, `stringQ = sys.argv[1]`, `page = urlopen('http://www.yelp.com/biz/' + stringQ)`), and it started to work! Is this the right way?
If you don't need all the fancy functionality of argparse, you can get the raw command line arguments from sys.argv.
Thank you so much for enlightening me! I learned something today! :) I have one more question. All the reviews were getting appended every time, but I want to overwrite the file (or create it again) every time I search. I changed `f=open("reviews.txt", "a")` to `f=open("reviews.txt", "w")`, but it doesn't work. Can you suggest why? Any alternatives?
@RockyBalBoa Please post a new question about that (if it hasn't already been answered somewhere), so that others can find it too!
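The question's script opens `reviews.txt` inside `for review in reviews:`, so the file is reopened once per review, and the outer `while` loop repeats that for every page. With `"a"`, every open appends. With `"w"`, every open truncates the file first, so most likely only the last review written survives. Opening the file once, before any loop, avoids this. The sketch below is a minimal illustration that assumes the cleaned review texts are collected into a list before writing. `write_reviews` and the sample strings are hypothetical, and this collect-then-write structure is a change from the asker's script.

```python
def write_reviews(path, reviews):
    # Hypothetical helper. "w" truncates the file once, when it is opened;
    # every write after that goes to the same open file, so all reviews
    # from this run are kept and earlier runs are discarded.
    with open(path, "w") as f:
        for review in reviews:
            f.write(review)
            f.write("\n")


# Hypothetical data standing in for the cleaned review texts.
write_reviews("reviews.txt", ["Great pancakes.", "Slow service."])
write_reviews("reviews.txt", ["Only this run's review."])
```

After the second call, `reviews.txt` contains only the second run's review, which is the overwrite-per-search behaviour asked about.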
