Extracting Content Within Multiple Span Tags in BeautifulSoup

Question

I'm trying to extract string content from within multiple span tags. A snapshot of the HTML page is:

<div class="secondary-attributes">
    <span class="neighborhood-str-list"> Southeast </span>
    <address> 1234 Python Blvd S<br>Somewhere, NV 98765 </address>
    <span class="biz-phone"> (555) 123-4567 </span>
</div>

Specifically, I'm trying to extract the phone number nestled between the <span class="biz-phone"></span> tags. I attempted to do so with the following code:

import requests
from bs4 import BeautifulSoup

res = requests.get(url)
soup = BeautifulSoup(res.text, "html.parser")
phone_number_results = [phone_numbers for phone_numbers in soup.find_all('span', 'biz-phone')]

The code ran without any errors, but it didn't quite give me the result I was hoping for:

['<span class="biz-phone">\n (702) 476-5050\n </span>',
 '<span class="biz-phone">\n (702) 253-7296\n </span>',
 '<span class="biz-phone">\n (702) 385-7912\n </span>',
 '<span class="biz-phone">\n (702) 776-7061\n </span>',
 '<span class="biz-phone">\n (702) 221-7296\n </span>',
 '<span class="biz-phone">\n (702) 252-7296\n </span>',
 '<span class="biz-phone">\n (702) 659-9101\n </span>',
 '<span class="biz-phone">\n (702) 355-9445\n </span>',
 '<span class="biz-phone">\n (702) 396-3333\n </span>',
 '<span class="biz-phone">\n (702) 643-9851\n </span>',
 '<span class="biz-phone">\n (702) 222-1441\n </span>']

My question has two parts:

Why are the span tags appearing when I run the program?
How do I get rid of them? I could just do string editing, but I feel like I wouldn't be taking full advantage of the BeautifulSoup package. Is there a more elegant way?

NOTE: there are more snippets of HTML code like the one shown above throughout the page; there are more instances of the <span class="biz-phone"> (555) 123-4567 </span> code (i.e., more phone numbers) that need to be extracted, hence why I was thinking of using find_all().

Thank you in advance.

use phone_numbers.text or even phone_numbers.text.strip()
– furas, Commented Oct 30, 2016 at 20:50

dmcc · Accepted Answer · 2016-10-30 20:53:58Z

find_all() returns a list of tags (bs4.element.Tag), not strings.
As @furas points out, you want to access the text property on each of the tags to extract the text within the tag:

phone_number_results = [phone_numbers.text.strip() for phone_numbers in soup.find_all('span', 'biz-phone')]

(the strip() call removes the leading and trailing whitespace and newlines around each number)
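To make the difference between tags and their text concrete, here is a minimal, self-contained sketch. It parses an inline HTML string instead of fetching a page, and the second phone number is a made-up sample value added only so that there is more than one match. It uses get_text(strip=True), which is an alternative spelling of .text followed by strip() for a tag that holds a single string:

```python
from bs4 import BeautifulSoup

# Inline sample modeled on the question's HTML. The second block and its
# phone number are hypothetical, added only to show multiple matches.
html = """
<div class="secondary-attributes">
  <span class="neighborhood-str-list"> Southeast </span>
  <address> 1234 Python Blvd S<br>Somewhere, NV 98765 </address>
  <span class="biz-phone"> (555) 123-4567 </span>
</div>
<div class="secondary-attributes">
  <span class="biz-phone"> (555) 987-6543 </span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() yields Tag objects; printing one shows its full markup,
# which is why the <span> tags appeared in the original output.
spans = soup.find_all("span", "biz-phone")
tag_types = [type(span).__name__ for span in spans]

# get_text(strip=True) returns only the text, with surrounding
# whitespace and newlines removed.
phone_numbers = [span.get_text(strip=True) for span in spans]

print(tag_types)
print(phone_numbers)
```

The same list comprehension should work unchanged on a soup built from res.text, as in the question.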

Thank you, the .text did the trick! I wasn't aware of that property. I tried a few others (e.g., .contents), but they didn't seem to help. Your solution worked, though.
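As a side note on the .contents attempt mentioned in this comment: .contents gives a list of a tag's child nodes rather than a single string, so an extra indexing step is needed before the text can be cleaned up. The sketch below is only an illustration on a one-span sample, not code from the thread:

```python
from bs4 import BeautifulSoup

# One-span sample taken from the question's HTML snapshot.
soup = BeautifulSoup('<span class="biz-phone"> (555) 123-4567 </span>', "html.parser")
span = soup.find("span", "biz-phone")

# .contents is a list of child nodes; here it holds one text node.
children = span.contents
from_contents = str(children[0]).strip()

# .text concatenates all text inside the tag into a single string.
from_text = span.text.strip()

print(children)
print(from_contents == from_text)
```

For a tag with mixed children, such as the address element with its br tag, .contents would hold several nodes, which is one reason .text tends to be the simpler choice here.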
