
I'm trying to extract the text content of multiple span tags. A snapshot of the HTML page is:

    <div class="secondary-attributes">
        <span class="neighborhood-str-list"> Southeast </span>
        <address> 1234 Python Blvd S<br>Somewhere, NV 98765 </address>
        <span class="biz-phone"> (555) 123-4567 </span>
    </div>

Specifically, I'm trying to extract the phone number between the <span class="biz-phone"></span> tags. I tried the following code:

    import requests
    from bs4 import BeautifulSoup

    # url holds the address of the page being scraped (defined elsewhere)
    res = requests.get(url)
    soup = BeautifulSoup(res.text, "html.parser")
    phone_number_results = [phone_numbers for phone_numbers in soup.find_all('span', 'biz-phone')]

The code ran without errors, but it didn't give me the result I was hoping for:

    ['<span class="biz-phone">\n (702) 476-5050\n </span>',
     '<span class="biz-phone">\n (702) 253-7296\n </span>',
     '<span class="biz-phone">\n (702) 385-7912\n </span>',
     '<span class="biz-phone">\n (702) 776-7061\n </span>',
     '<span class="biz-phone">\n (702) 221-7296\n </span>',
     '<span class="biz-phone">\n (702) 252-7296\n </span>',
     '<span class="biz-phone">\n (702) 659-9101\n </span>',
     '<span class="biz-phone">\n (702) 355-9445\n </span>',
     '<span class="biz-phone">\n (702) 396-3333\n </span>',
     '<span class="biz-phone">\n (702) 643-9851\n </span>',
     '<span class="biz-phone">\n (702) 222-1441\n </span>']
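The behaviour can be reproduced without a network request. The minimal sketch below assumes the snippet above stands in for the downloaded page (res.text); it shows that each item returned by find_all() is a Tag object, and that turning a Tag into a string renders its full markup, tags included.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Assumption: this snippet stands in for the downloaded page (res.text).
html = """
<div class="secondary-attributes">
  <span class="neighborhood-str-list"> Southeast </span>
  <address> 1234 Python Blvd S<br>Somewhere, NV 98765 </address>
  <span class="biz-phone"> (555) 123-4567 </span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
results = soup.find_all("span", "biz-phone")

# Each item is a Tag object, not a plain string.
first = results[0]
print(type(first))  # <class 'bs4.element.Tag'>

# Converting a Tag to a string renders its complete HTML, including the span tags.
print(str(first))  # <span class="biz-phone"> (555) 123-4567 </span>
```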

My question has two parts:

  1. Why are the span tags appearing in the output?
  2. How do I get rid of them? I could strip them out with string manipulation, but I feel like that wouldn't take full advantage of BeautifulSoup. Is there a more elegant way?

NOTE: the page contains more snippets like the one above, i.e., more <span class="biz-phone"> (555) 123-4567 </span> elements (more phone numbers) that need to be extracted, which is why I'm using find_all().

Thank you in advance.

  • use phone_numbers.text or even phone_numbers.text.strip() Commented Oct 30, 2016 at 20:50
  • Thank you @furas, that did the trick! Commented Oct 31, 2016 at 4:07

1 Answer

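  1. find_all() returns a ResultSet (a list subclass) of Tag objects (bs4.element.Tag), not strings. Printing a Tag, or converting it with str(), renders its full HTML markup, which is why the <span> tags appear in your output.

  2. As @furas points out, you want to access the text property of each tag to extract only the text inside it:

    phone_number_results = [phone_numbers.text.strip() for phone_numbers in soup.find_all('span', 'biz-phone')]

(the .strip() call removes the whitespace and newlines surrounding each number)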
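A minimal sketch of the fix, assuming a hypothetical page with two listings (the HTML and the second phone number are made up for illustration). It compares .text, which keeps surrounding whitespace, with get_text(strip=True), an alternative to .text.strip() that BeautifulSoup also provides.

```python
from bs4 import BeautifulSoup

# Hypothetical page content: two listings, each with its own biz-phone span.
html = """
<div class="secondary-attributes">
  <span class="neighborhood-str-list"> Southeast </span>
  <span class="biz-phone">
    (555) 123-4567
  </span>
</div>
<div class="secondary-attributes">
  <span class="biz-phone">
    (555) 765-4321
  </span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
spans = soup.find_all("span", "biz-phone")  # second argument filters by CSS class

# .text returns only the text inside each tag, but keeps newlines and spaces.
raw = [span.text for span in spans]

# get_text(strip=True) strips whitespace from each text fragment.
phone_numbers = [span.get_text(strip=True) for span in spans]
print(phone_numbers)  # ['(555) 123-4567', '(555) 765-4321']
```

get_text(strip=True) strips each text fragment separately and joins them with no separator by default, so for a tag with nested elements it can differ from .text.strip(); for a span holding only a phone number, both give the same result.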


1 Comment

Thank you, .text did the trick! I wasn't aware of that property; I tried a few others (e.g., .contents), but they didn't seem to help. Your solution worked, though.
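As for .contents: it returns a list of the tag's children rather than a string, which is likely why it didn't seem to help. A minimal sketch, assuming a span that contains only the phone number, shows that the single child is a NavigableString (a str subclass) that can be stripped directly, and that .string gives the same text more directly.

```python
from bs4 import BeautifulSoup

html = '<span class="biz-phone"> (555) 123-4567 </span>'
span = BeautifulSoup(html, "html.parser").find("span", "biz-phone")

# .contents is a list of the tag's children; here, a single NavigableString.
children = span.contents
print(len(children))  # 1

# NavigableString subclasses str, so the child can be stripped directly...
print(children[0].strip())  # (555) 123-4567

# ...but .string (or .text) is more direct for a tag that holds only text.
print(span.string.strip())  # (555) 123-4567
```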
