A simple web scraper that fetches the first 10 product URLs and then crawls the relevant data for those products. I have tried to keep the use of existing sources to a minimum, using regular expressions and the lxml library to scrape the data (a minimal sketch of the URL-collection step follows the attribute list below).
Product attributes fetched:
- Product Name
- Product ID
- Product Category
- Product Thumbnail
- Product Selling price
- Product discount
- Product URL
- Product Availability at PIN code
- Product Estimated delivery time at PIN code
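As a rough illustration of the URL-collection step, the sketch below fetches a listing page and keeps the first 10 links that look like product pages. The listing URL, the XPath, and the product-URL regular expression are assumptions for illustration, not the exact values used in app.py.

```python
# Hypothetical sketch of the URL-collection step (Python 2.7): fetch a
# category listing page, pull candidate links with lxml, and keep the first
# 10 that look like product pages.
import json
import re
import urllib2

import lxml.html

LISTING_URL = "http://www.shopclues.com/computers-laptops-monitors.html"   # assumed listing page
PRODUCT_RE = re.compile(r"^http://www\.shopclues\.com/[\w.-]+\.html$")     # matches URLs like the sample below

def first_product_urls(listing_url, limit=10):
    html = urllib2.urlopen(listing_url).read()
    tree = lxml.html.fromstring(html)
    urls = []
    for href in tree.xpath("//a/@href"):          # real code likely uses a narrower XPath
        if PRODUCT_RE.match(href) and href not in urls:
            urls.append(href)
        if len(urls) == limit:
            break
    return urls

if __name__ == "__main__":
    # Same nested-list shape as the sample url.json below.
    with open("url.json", "w") as fp:
        json.dump({"url": [[u] for u in first_product_urls(LISTING_URL)]}, fp, indent=2)
```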
Sample url.json
{ "url": [ [ "http://www.shopclues.com/phillps-lcd-18.5-193v5lsb23-94.html" ] ] } Sample product.json
{ "CATEGORY": "Monitors", "DISCOUNT": "Rs. 901", "EST DELIVERY": { "575001": [ "Tue 13 Dec", "Fri 9 Dec" ], "560070": [ "Tue 13 Dec", "Fri 9 Dec" ], "671551": [ "Fri 16 Dec", "Mon 12 Dec" ] }, "NAME": "Phillps LCD 18.5 193V5LSB23/94", "SELLING_PRICE": "5099.00", "URL": "http://www.shopclues.com/phillps-lcd-18.5-193v5lsb23-94.html", "AVAILABILITY": [ [ 560070, 1 ], [ 575001, 1 ], [ 671551, 1 ] ], "THUMBNAIL": "http://cdn02.shopclues.net/images/thumbnails/16771/160/160/193V5LSB294RTPglobal0011430486062.jpg", "PRODUCT_ID": "76773178" } 1. Python 2.7 2. Works on Linux Distros git clone https://github.com/guptaabhinav206/Web-Scraper.git a. Automate:
Requirements
1. Python 2.7
2. Works on Linux distros

Installation
`git clone https://github.com/guptaabhinav206/Web-Scraper.git`

a. Automate:
1. `chmod +x setup.sh`
2. `source setup.sh`
b. Manual:
1. Install pip (`sudo apt-get/yum install pip`).
2. Install virtualenv (`sudo apt-get/yum install virtualenv`).
3. Create a virtualenv and activate it (`virtualenv env && source env/bin/activate`).
4. Install the required packages (`pip install -r requirements.txt`).

Run
`python app.py`
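After a run, a product.json in the shape shown above can be read back to inspect per-PIN-code availability and estimated delivery, for example:

```python
# Small usage sketch (Python 2.7): read a product.json written by the scraper
# and print availability and estimated delivery per PIN code, using the field
# layout shown in the sample above.
import json

with open("product.json") as fp:
    product = json.load(fp)

print product["NAME"], "-", "Rs.", product["SELLING_PRICE"]
for pincode, available in product["AVAILABILITY"]:
    delivery = product["EST DELIVERY"].get(str(pincode), [])
    status = "available" if available else "not available"
    print "PIN %s: %s, est. delivery %s" % (pincode, status, " / ".join(delivery))
```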