
I'm using Python requests to search the site https://www.investing.com/ for the terms "Durable Goods Orders US".

I checked the "Network" tab of the browser's inspect panel, and the search seems to be submitted simply as the following form data: 'quotes_search_text':'Durable Goods Orders US'

So I tried it in Python:

    URL = 'https://www.investing.com/'
    data = {'quotes_search_text': 'Durable Goods Orders US'}
    resp = requests.post(URL, data=data, headers={
        'User-Agent': 'Mozilla/5.0',
        'X-Requested-With': 'XMLHttpRequest',
    })

However, this doesn't return the results that I can see when doing the search manually. All the search results should have "gs-title" as a class attribute (as per the page inspection), but when I do:

    soup = BeautifulSoup(resp.text, 'html.parser')
    soup.select(".gs-title")

I see no results... Is there some aspect of the POST request that I am not taking into account? (I'm a complete noob here.)

  • I believe your find_all selector is looking for a class attribute when it's expecting an HTML tag. Commented Apr 18, 2017 at 16:45
  • @double_j No, I'm looking for a class attribute... here's what the target element looks like: <a class="gs-title" href="https://www.investing.com/economic-calendar/durable-goods-orders-86" target="_blank" dir="ltr" data-cturl="https://www.google.com/url?q=https://www.investing.com/economic-calendar/durable-goods-orders-86&amp;sa=U&amp;ved=0ahUKEwi28NG5tK7TAhWOa1AKHVhUBncQFggEMAA&amp;client=internal-uds-cse&amp;usg=AFQjCNEuRaJ1WI-VxrmeJ5VISPuraZ_Sug" data-ctorig="https://www.investing.com/economic-calendar/durable-goods-orders-86">United States <b>Durable Goods Orders</b> MoM</a> Commented Apr 18, 2017 at 16:47
  • That's okay, but BeautifulSoup will never find that tag the way you have it right now. You should write it like this: soup.find_all('a', {'class':'gs-title'}) Commented Apr 18, 2017 at 16:48
  • If you want to use CSS selectors then you need to use the select method. Commented Apr 18, 2017 at 16:51
  • @double_j The correct syntax doesn't return anything either... I'll edit my question with it, though. In fact, I have been printing resp.text and searching it manually with Ctrl-F, and I can see that the correct page is not returned. So it's really the request that I need help with. Commented Apr 18, 2017 at 16:51
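
To illustrate the last comment, here is a minimal offline sketch of why a selector can come back empty even when it is written correctly: elements that a script adds in the browser are not part of the HTML the server sends. Both HTML strings are made-up stand-ins (not the real investing.com markup), `loadSearchResults` is a hypothetical function name, and the standard-library `HTMLParser` is used in place of BeautifulSoup only so the sketch runs without extra packages.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the HTML the server sends: the results container
# is empty, and a script is expected to fill it in later, in the browser.
RAW_HTML = """
<html><body>
  <div id="search-results"></div>
  <script>
    // In a browser, this would add the a.gs-title result links.
    loadSearchResults("Durable Goods Orders US");
  </script>
</body></html>
"""

# Hypothetical stand-in for the page after the script has run in a browser.
RENDERED_HTML = """
<html><body>
  <div id="search-results">
    <a class="gs-title" href="https://www.investing.com/economic-calendar/durable-goods-orders-86">United States Durable Goods Orders MoM</a>
  </div>
</body></html>
"""


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_class(html, class_name):
    """Return how many elements in the given HTML carry class_name."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.count


print(count_class(RAW_HTML, "gs-title"))       # what a plain HTTP client sees
print(count_class(RENDERED_HTML, "gs-title"))  # what the browser inspector shows
```

If the real response behaves like the first string, no parser or selector syntax will find the results; the data has to be fetched from wherever the script gets it.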

1 Answer


After going over this in detail in the chat, there are many changes. To retrieve the information you're looking for, you need to reproduce what the JS on their end does: the search results are loaded by a separate request to Google Custom Search, which the code below calls directly. You can change the query variable to whatever you want.

    import requests
    from urllib.parse import quote_plus

    URL = 'https://www.googleapis.com/customsearch/v1element'
    query = 'Durable Goods Orders US'

    data = {
        'key': 'AIzaSyCVAXiUzRYsML1Pv6RwSG1gunmMikTzQqY',
        'num': 10,
        'hl': 'en',
        'prettyPrint': 'true',
        'source': 'gcsc',
        'gss': '.com',
        'cx': '015447872197439536574:fy9sb1kxnp8',
        # Pass the raw query: requests URL-encodes params itself, so a
        # pre-encoded value would be encoded twice.
        'q': query,
        'googlehost': 'www.google.com',
    }
    headers = {
        'User-Agent': 'Mozilla/5.0',
        # The Referer is built by hand, so it does need encoding here.
        'Referer': 'https://www.investing.com/search?q=' + quote_plus(query),
    }

    resp = requests.get(URL, params=data, headers=headers)
    j = resp.json()
    # print(resp.text)
    for r in j['results']:
        print(r['title'], r['url'])
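
One detail worth checking when adapting this: requests URL-encodes the values passed via `params`, so a query that has already been through `quote_plus` gets encoded a second time. The sketch below shows the effect offline; it assumes that the standard-library `urlencode` is a fair stand-in for what requests does in this simple case.

```python
from urllib.parse import quote_plus, urlencode

query = "Durable Goods Orders US"

# Encoded once: spaces become '+'.
once = urlencode({"q": query})

# Encoded twice: the '+' signs produced by quote_plus are themselves escaped.
twice = urlencode({"q": quote_plus(query)})

print(once)   # q=Durable+Goods+Orders+US
print(twice)  # q=Durable%2BGoods%2BOrders%2BUS
```

In the second form the server receives the literal text "Durable+Goods+Orders+US" rather than four separate words, so the raw string is the safer thing to put in `params`; a URL assembled by hand, such as the Referer, is where `quote_plus` belongs.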