
I am trying to scrape a website using Python. The scraper runs without errors, but it does not return the content I expect. I think it has something to do with the JavaScript on the web page.

My code is below:

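```python
import re

from bs4 import BeautifulSoup

# `driver` is an existing Selenium WebDriver instance created earlier
driver.get("https://my website")
soup = BeautifulSoup(driver.page_source, 'lxml')
all_text = soup.text
# Replace newlines, tabs and carriage returns with spaces, then collapse repeated spaces
ct = all_text.replace('\n', ' ')
cl_text = ct.replace('\t', ' ')
cln_text_t = cl_text.replace('\r', ' ')
cln_text = re.sub(' +', ' ', cln_text_t)
print(cln_text)
```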

Instead of the website content, it returns the text below. Any idea how I could fix this?

html, body {height:100%;margin:0;} You have to enable javascript in your browser to use an application built with Vaadin......... 
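One likely reason for this output: `soup.text` returns every text node in the parsed HTML, including the message inside the page's `<noscript>` element, which a browser with JavaScript enabled never displays. Older BeautifulSoup releases also returned `<style>` contents, which would account for the `html, body {...}` fragment; newer releases skip style and script text. The sketch below uses a made-up, simplified HTML string (the real page's markup isn't shown), the stdlib `"html.parser"` instead of `lxml`, and hypothetical helper names `visible_text` and `collapse_whitespace`. It drops `<style>`, `<script>` and `<noscript>` before extracting text and collapses all whitespace with one regex instead of the chain of `replace()` calls. It can only help if `driver.page_source` already contains the rendered application markup.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical, simplified markup modelled on the question's output;
# the real page's HTML is not shown in the question.
SAMPLE_HTML = """
<html>
  <head><style>html, body {height:100%;margin:0;}</style></head>
  <body>
    <noscript>You have to enable javascript in your browser.</noscript>
    <div id="app">Hello from the app</div>
  </body>
</html>
"""


def collapse_whitespace(text: str) -> str:
    """Turn every run of spaces, tabs, newlines and carriage returns into one space."""
    return re.sub(r"\s+", " ", text).strip()


def visible_text(markup: str) -> str:
    """Return page text without the contents of <style>, <script> and <noscript>."""
    soup = BeautifulSoup(markup, "html.parser")
    # soup([...]) is shorthand for soup.find_all([...]); it returns a list,
    # so removing elements while looping over it is safe.
    for tag in soup(["style", "script", "noscript"]):
        tag.decompose()
    return collapse_whitespace(soup.get_text(" "))


# Raw .text still contains the <noscript> message:
print(collapse_whitespace(BeautifulSoup(SAMPLE_HTML, "html.parser").text))
print(visible_text(SAMPLE_HTML))  # Hello from the app
```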
  • Can you share the URL or HTML you are trying to scrape, and mention your expected output as well? Commented Jul 15, 2019 at 14:50
  • Have you tried using expected conditions to wait for your element to be present? Commented Jul 15, 2019 at 18:45

1 Answer


Why do you need BeautifulSoup at all? It is only an HTML parser and doesn't execute JavaScript.

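If you need the web page text, you can fetch the document root with the simple XPath selector `//html` and read the `innerText` property of the resulting WebElement. `innerText` is computed by the browser from the rendered page, so it leaves out text that isn't displayed, such as `<style>` rules and the `<noscript>` message (which is hidden when JavaScript is enabled).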

Suggested code change:

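```python
from selenium.webdriver.common.by import By

driver.get("my website")
# Selenium 4 deprecated (and later removed) find_element_by_xpath(); use find_element(By.XPATH, ...)
root = driver.find_element(By.XPATH, "//html")
all_text = root.get_attribute("innerText")
```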

1 Comment

Thanks Dmitri T. This worked. A good deal of learning as well for me today. :)
