How to find a specific data attribute from an HTML tag in BeautifulSoup4?

Question

Is there a way to find an element using only the data attribute in html, and then grab that value?

For example, with this line inside an html doc:

<ul data-bin="Sdafdo39">

How do I retrieve Sdafdo39 by searching the entire html doc for the element that has the data-bin attribute?

xecgr · Accepted Answer · 2014-06-13 05:26:38Z

A slightly more precise approach:

[item['data-bin'] for item in bs.find_all('ul', attrs={'data-bin' : True})]

This way, the list being iterated contains only the ul elements that have the attribute you want to find.

    from bs4 import BeautifulSoup

    html_doc = """<ul class="foo">foo</ul><ul data-bin="Sdafdo39">"""
    # Parse only after html_doc is defined; name the parser explicitly
    bs = BeautifulSoup(html_doc, "html.parser")
    [item['data-bin'] for item in bs.find_all('ul', attrs={'data-bin': True})]

thefourtheye · Accepted Answer · 2014-06-13 04:56:51Z

You can use the find_all method to get all the tags, then keep only those that have "data-bin" in their attributes. From each matching tag you can then extract the corresponding value, like this:

    from bs4 import BeautifulSoup

    html_doc = """<ul data-bin="Sdafdo39">"""
    bs = BeautifulSoup(html_doc, "html.parser")
    print([item["data-bin"] for item in bs.find_all() if "data-bin" in item.attrs])
    # ['Sdafdo39']

I can't hard-code an html_doc variable, because one of my element's attributes is not always the same: <section class='{random_characters_here}' data-type='word-definition-card'>. I want to get the class by searching on data-type. I also tried section = soup.find_all('section', data_type='word-definition-card'), but it doesn't work.
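A likely reason the keyword form fails is that data_type (with an underscore) is searched for literally and never matches data-type. Names containing a hyphen are not valid Python keyword arguments, so the usual route is the attrs dictionary. The following is a minimal sketch under that assumption; the markup and the class value x7Kq are made up for illustration and stand in for the random characters mentioned in the comment.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: "x7Kq" stands in for the random class characters.
html_doc = """<section class="x7Kq" data-type="word-definition-card">definition</section>
<section class="other" data-type="something-else">other</section>"""

soup = BeautifulSoup(html_doc, "html.parser")

# Hyphenated attribute names go in the attrs dictionary, not in keyword arguments.
sections = soup.find_all("section", attrs={"data-type": "word-definition-card"})

# class is a multi-valued attribute, so Beautiful Soup returns it as a list.
classes = [section["class"] for section in sections]
print(classes)  # [['x7Kq']]
```

Because the search is keyed on data-type alone, the class value does not need to be known in advance.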

emehex · Accepted Answer · 2020-10-09 22:48:10Z

You could solve this with gazpacho in just a couple of lines:

First, import and turn the html into a Soup object:

    from gazpacho import Soup

    html = """<ul data-bin="Sdafdo39">"""
    soup = Soup(html)

Then you can just search for the "ul" tag and extract the data-bin attribute:

soup.find("ul").attrs["data-bin"] # Sdafdo39

Maximosaic · Accepted Answer · 2022-08-15 10:12:00Z

As an alternative, if you prefer CSS selectors, you can use select() instead of find_all():

    from bs4 import BeautifulSoup

    html_doc = """<ul class="foo">foo</ul><ul data-bin="Sdafdo39">"""
    soup = BeautifulSoup(html_doc, "html.parser")

    # Select every ul element that has a data-bin attribute
    soup.select('ul[data-bin]')
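Note that select() returns the matching tags rather than the attribute values, so one more step is needed to get the value the question asks for. A small sketch using the same sample markup; the variable names values, first, and value are illustrative only.

```python
from bs4 import BeautifulSoup

html_doc = """<ul class="foo">foo</ul><ul data-bin="Sdafdo39">"""
soup = BeautifulSoup(html_doc, "html.parser")

# Collect the attribute value from every ul that carries data-bin.
values = [tag["data-bin"] for tag in soup.select("ul[data-bin]")]
print(values)  # ['Sdafdo39']

# select_one returns the first match, or None when nothing matches.
first = soup.select_one("[data-bin]")
value = first["data-bin"] if first is not None else None
print(value)  # Sdafdo39
```

Dropping the tag name from the selector, as in "[data-bin]", searches the whole document for any element with that attribute, which matches the original question most directly.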
