1

I want to scrape contact information with Selenium from the website below:

http://buyersguide.recyclingtoday.com/search.

To keep the fields matched up record by record, I want to select the table rows first and then extract the information from each row. My simple code is below. My question is how to select the information from each row, for example the company name and the email.

Code:

    from time import sleep
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait as wait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.common.exceptions import NoSuchElementException
    import pandas as pd

    driver = webdriver.Chrome('D:\chromedriver_win32\chromedriver.exe')
    driver.get('http://buyersguide.recyclingtoday.com/search')
    rows = driver.find_elements_by_xpath('//*[@id="Body_tbl"]/tbody/tr')
    for row in rows:
        email = row.find_element_by_xpath('//*/tr/td[3]/a').text
        company=row.find_element_by_xpath('//*/tr/td[1]').text

I ran the code as suggested in the answers below, but I still have a problem:

    from time import sleep
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait as wait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.common.exceptions import NoSuchElementException
    import pandas as pd

    driver = webdriver.Chrome('D:\chromedriver_win32\chromedriver.exe')
    driver.get('http://buyersguide.recyclingtoday.com/search')
    rows = driver.find_elements_by_xpath('//*[@id="Body_tbl"]/tbody/tr')
    records = []
    for row in rows:
        company=row.find_element_by_xpath('./td[1]').text
        address = row.find_element_by_xpath('./td[2]').text
        contact= row.find_element_by_xpath('./td[3]//a').text
        number= row.find_element_by_xpath('./td[5]').text
        records.append((company,address,contact,number))
    df = pd.DataFrame(records, columns=['company','number','address', 'contact'])

Result: no content is selected.

2 Comments
  • Are you saying that you want the data from all the pages? Commented Sep 7, 2018 at 6:43
  • Yes, I need all the data from all the pages, but the code I wrote does not seem to work. Commented Sep 7, 2018 at 12:54

3 Answers

3

You can get the details as follows.

First locate the data rows in the table, excluding the table header. Then read each cell with an XPath that is relative to the row.

The examples below are based on your HTML.

Example using Python:

    rows = driver.find_elements_by_xpath("//td[@style='font-weight:bold;']//parent::tr")
    for row in rows:
        company=row.find_element_by_xpath('./td[1]').text
        address = row.find_element_by_xpath('./td[2]').text
        contact= row.find_element_by_xpath('./td[3]//a').text
        number= row.find_element_by_xpath('./td[5]').text

Example using Java:

    // findElements takes a By locator, not a raw XPath string
    List<WebElement> findData = driver.findElements(By.xpath("//td[@style='font-weight:bold;']//parent::tr"));
    for (WebElement webElement : findData) {
        String getValueofCompany = webElement.findElement(By.xpath("./td[1]")).getText();
        String getValueofAddress = webElement.findElement(By.xpath("./td[2]")).getText();
        String getValueofContact = webElement.findElement(By.xpath("./td[3]//a")).getText();
        String getValueofPhoneNumber = webElement.findElement(By.xpath("./td[5]")).getText();
    }

4 Comments

Reading the email from the link text will not work, because in many cases the text does not contain the email address directly. We need the value of the href attribute instead, e.g. for '//tbody//tr[4]//td[3]/a'.
@Amit Yes, I missed the <a> for emails. I have updated the solution.
We also get the error below:
    File "<ipython-input-62-94e1b71ee87b>", line 10
        rows = driver.find_elements_by_xpath('//td[@style='font-weight:bold;']//parent::tr')
        ^
    SyntaxError: invalid syntax
Updated to use double quotes around the XPath: ("//td[@style='font-weight:bold;']//parent::tr")
1

The data you want starts at the third row:

tr[3]//td[1] - contains the company name as text

tr[3]//td[3] - contains the email, but in the href attribute of the link

So loop over the tr elements from index 3 up to the length of the rows list:

    rows = driver.find_elements_by_xpath('//*[@id="Body_tbl"]/tbody/tr')
    # XPath positions are 1-based and the data starts at tr[3]
    for index in range(3, len(rows) + 1):
        # find_elements returns an empty list (instead of raising) when nothing matches
        companyName = driver.find_elements_by_xpath("//tr[" + str(index) + "]//td[1]")
        if companyName:
            print(companyName[0].text)
        companyEmail = driver.find_elements_by_xpath("//tr[" + str(index) + "]//td[3]/a")
        if companyEmail:
            # this gives the exact email if there is one
            print(companyEmail[0].get_attribute("href"))

Note: I was not able to test this code, so please take care of the boundary conditions. Thanks.
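Since the email sits in the href attribute rather than in the link text, the value usually needs a little cleanup before it is stored. Below is a minimal sketch, assuming the links are ordinary mailto: links (I have not verified this against the live page). The helper name email_from_href is hypothetical, and the sample addresses are made up.

```python
from urllib.parse import urlparse, unquote


def email_from_href(href):
    """Return the address from a mailto: link, or None for anything else.

    Hypothetical helper: it assumes the site uses plain mailto: hrefs.
    """
    if not href:
        return None
    parsed = urlparse(href.strip())
    # urlparse lowercases the scheme, so "MAILTO:" is handled as well
    if parsed.scheme != "mailto":
        return None
    # The path holds the address; any "?subject=..." part lands in parsed.query
    return unquote(parsed.path) or None


print(email_from_href("mailto:info@acme.example?subject=Hello"))
```

With Selenium, this would be applied to the result of get_attribute("href") on the link element found in each row.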


1

You can use something like this:

    for row in rows:
        email = row.find_element_by_xpath('.//td[3]/a').text
        company = row.find_element_by_xpath('.//td[1]').text
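The leading dot is what matters here: in XPath, a path starting with // is evaluated from the document root even when it is called on a row element, so every iteration returns the first match in the whole page. A path starting with ./ or .// is evaluated relative to the current row. The sketch below shows the row-relative pattern without a browser, using the standard library's xml.etree.ElementTree (which supports only a subset of XPath) in place of Selenium. The table markup is a hypothetical stand-in, not the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the results table; the real page's markup may differ.
SAMPLE = """
<table id="Body_tbl">
  <tbody>
    <tr><td>Acme Recycling</td><td>1 Main St</td><td><a href="mailto:info@acme.example">Email</a></td></tr>
    <tr><td>Beta Metals</td><td>2 Side Rd</td><td><a href="mailto:sales@beta.example">Email</a></td></tr>
  </tbody>
</table>
"""


def extract_records(table_xml):
    """Collect (company, email href) pairs, one per table row."""
    table = ET.fromstring(table_xml)
    records = []
    for row in table.findall("./tbody/tr"):
        # Both lookups start at the current row, so each row yields its own cells
        company = row.find("./td[1]").text
        link = row.find("./td[3]/a")
        records.append((company, link.get("href")))
    return records


records = extract_records(SAMPLE)
print(records)
```

Each tuple pairs a company with the link from the same row, which is the one-to-one matching the question asks for.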

4 Comments

I printed email and company to check, but I get a NameError. Can you tell me why? NameError: name 'company' is not defined
print("{} {}".format(email, company)) works for me inside the for loop. Please share your code if you still face the issue.
OK, I have shared the full code for reference. It still has the problem, so please try to run it.
As @Amit Jain pointed out, if you check the website, the first two and last two rows do not contain any data. So you need to run rows = rows[2:-2] before the for loop. Hope it helps.
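The slicing suggested in the last comment can be sketched as follows. In Selenium, find_element_by_xpath raises NoSuchElementException on a row that lacks the expected cell, so dropping the non-data rows first avoids that. The row contents below are made-up placeholders, and data_rows is a hypothetical helper name. The counts of two leading and two trailing rows come from the comment above and should be checked against the live page.

```python
def data_rows(rows, skip_head=2, skip_tail=2):
    """Drop the leading and trailing rows that carry no data.

    Uses an explicit end index because rows[2:-0] would be an empty list.
    """
    return rows[skip_head:len(rows) - skip_tail]


# Placeholder rows standing in for the WebElements returned by find_elements_by_xpath
rows = [
    ["(title row)"],
    ["(header row)"],
    ["Acme Recycling", "1 Main St", "info@acme.example"],
    ["Beta Metals", "2 Side Rd", "sales@beta.example"],
    ["(pager row)"],
    ["(empty row)"],
]

for row in data_rows(rows):
    print(row[0])
```

The same call works on the list of row elements from the driver, since it only relies on list slicing.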
