I am trying to parse the following website in order to get all addresses of stores (sorry for my Russian):
http://magnit-info.ru/buyers/adds/1258/14/243795

The addresses for a single city are listed at the bottom of the page, inside the block .b-shops-list. This block is populated dynamically by a POST request. When I tried to fetch the addresses with the requests module, it did not work, because the block is empty in the initial page source.

I am using Selenium right now, but it is really slow. Parsing all cities and regions takes about 2 hours (even with multiprocessing). I also have to use expected_conditions and wait about 4-5 seconds to be sure that the POST requests have completed.

Are there any options to accelerate this process? Can I send the POST requests myself with requests? If so, how do I figure out what kind of POST requests I should send? This question also applies to websites that use Google Maps.

Thank you!

  • 2 hours to get 3 addresses? Commented May 31, 2017 at 13:55
  • See stackoverflow.com/q/22168883/3462319 Commented May 31, 2017 at 14:00
  • @depperm, no :) This link is only for one city. There are actually about 64 regions on the website, with more than 15 cities in each region. Commented May 31, 2017 at 14:05

1 Answer

I had a look at the AJAX request that this page makes to load the addresses and came up with this small code snippet:

    import requests

    data = {
        'op': 'get_shops',
        'SECTION_ID': 1258,
        'RID': 14,
        'CID': 243795,
    }
    res = requests.post('http://magnit-info.ru/functions/bmap/func.php', data=data)
    addresses = res.json()

If you compare the data dictionary with the URL you linked, you can see that it is easy to generate from it: SECTION_ID, RID and CID are the last three path segments of the URL (1258, 14 and 243795).
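To make the URL-to-payload mapping concrete, here is a minimal sketch. The helper names `payload_from_url` and `fetch_shops` are made up for illustration. It assumes that every city page follows the same layout as the one URL in the question (SECTION_ID, RID and CID as the last three path segments) and that the endpoint keeps answering with JSON; neither has been checked beyond that single example.

```python
from urllib.parse import urlparse

# Endpoint taken from the snippet above.
ENDPOINT = "http://magnit-info.ru/functions/bmap/func.php"


def payload_from_url(page_url):
    """Build the POST payload from a city page URL.

    Assumption: the last three path segments are SECTION_ID, RID and CID,
    in that order, as in the single URL from the question.
    """
    segments = [s for s in urlparse(page_url).path.split("/") if s]
    if len(segments) < 3:
        raise ValueError(f"unexpected URL layout: {page_url}")
    # int() raises ValueError if a segment is not numeric.
    section_id, rid, cid = (int(s) for s in segments[-3:])
    return {"op": "get_shops", "SECTION_ID": section_id, "RID": rid, "CID": cid}


def fetch_shops(page_url, session=None, timeout=10):
    """POST the payload for one city page and return the decoded JSON."""
    # Imported here so that payload_from_url works without the dependency.
    import requests

    # A requests.Session exposes the same .post() as the module itself.
    http = session if session is not None else requests
    res = http.post(ENDPOINT, data=payload_from_url(page_url), timeout=timeout)
    res.raise_for_status()
    return res.json()
```

When looping over many city URLs, passing one shared `requests.Session()` as `session` reuses the underlying connection, which is usually faster than opening a new one per request. For example: `fetch_shops('http://magnit-info.ru/buyers/adds/1258/14/243795', session=requests.Session())`.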
