Unable to get the expected html element details using Python

Question

I am trying to scrape a website using Python. The page loads without errors, but the text I get back is not the content I expect. I think it has something to do with the JavaScript on the web page.

My code is below:

    import re
    from bs4 import BeautifulSoup

    # driver is an already-created Selenium WebDriver instance
    driver.get("https://my website")
    soup = BeautifulSoup(driver.page_source, 'lxml')
    all_text = soup.text
    ct = all_text.replace('\n', ' ')
    cl_text = ct.replace('\t', ' ')
    cln_text_t = cl_text.replace('\r', ' ')
    cln_text = re.sub(' +', ' ', cln_text_t)
    print(cln_text)

Instead of the website's details, it returns the data below. Any idea how I could fix this?

html, body {height:100%;margin:0;} You have to enable javascript in your browser to use an application built with Vaadin.........

Can you share the URL or HTML you are trying, and mention your expected output as well?
– KunduK, Commented Jul 15, 2019 at 14:50
Have you tried expected conditions to wait for your element to be present?
– QHarr, Commented Jul 15, 2019 at 18:45
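
To illustrate the waiting idea raised in the comment above: Selenium ships its own WebDriverWait and expected_conditions helpers for this, and those are the usual choice. The sketch below is only a dependency-free stand-in that shows the same polling idea. The name `wait_for_text`, its parameters, and the default timeout are hypothetical, and `driver` is assumed to be a Selenium WebDriver.

```python
import time


def wait_for_text(driver, xpath, timeout=10.0, poll=0.5):
    """Poll until the element at `xpath` has non-empty innerText, then return it.

    Hypothetical helper: `driver` is assumed to be a Selenium WebDriver;
    only its find_element(by, value) method is used.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # "xpath" is the locator strategy string behind Selenium's By.XPATH.
            element = driver.find_element("xpath", xpath)
            text = element.get_attribute("innerText")
        except Exception:
            # The element is not in the DOM yet
            # (Selenium raises NoSuchElementException in that case).
            text = None
        if text and text.strip():
            return text
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no text at {xpath!r} after {timeout}s")
        time.sleep(poll)
```

Calling `wait_for_text(driver, "//html")` after `driver.get(...)` would give the page's scripts a chance to render before the text is read. Catching every exception keeps the sketch short; real code would catch only the Selenium exception it expects.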

Dmitri T · Accepted Answer · 2019-07-15 15:26:47Z

Why do you need BeautifulSoup at all? It only parses the markup it is given and does not execute JavaScript.

If you need to get the web page text, you can fetch the document root using a simple XPath selector, //html, and read the innerText property of the resulting WebElement.

Suggested code change:

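    from selenium.webdriver.common.by import By

    driver.get("my website")
    # find_element_by_xpath was removed in Selenium 4.3; use find_element(By.XPATH, ...)
    root = driver.find_element(By.XPATH, "//html")
    all_text = root.get_attribute("innerText")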
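
If the whitespace cleanup from the question is still wanted on the innerText result, the chain of replace calls can probably be collapsed into one regular expression. This is a minimal sketch, not part of the original answer. The function name `normalize_whitespace` is hypothetical. Unlike the question's code, it also trims leading and trailing whitespace, and `\s` matches other whitespace characters such as form feeds and non-breaking spaces.

```python
import re


def normalize_whitespace(text):
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space."""
    # \s+ covers '\n', '\t', '\r' and repeated spaces in a single pass.
    return re.sub(r"\s+", " ", text).strip()
```

With the `all_text` variable from the snippet above, `print(normalize_whitespace(all_text))` would print the page text on a single line.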

Thanks Dmitri T. This worked. A good deal of learning as well for me today. :)
