28

Is there a way to find an element using only the data attribute in html, and then grab that value?

For example, with this line inside an html doc:

<ul data-bin="Sdafdo39"> 

How do I retrieve Sdafdo39 by searching the entire html doc for the element that has the data-bin attribute?

4 Answers

43

A little bit more accurate

[item['data-bin'] for item in bs.find_all('ul', attrs={'data-bin' : True})] 


This way, the list comprehension only iterates over the ul elements that have the attribute you are looking for.

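from bs4 import BeautifulSoup

# html_doc must be defined before it is passed to BeautifulSoup
html_doc = """<ul class="foo">foo</ul><ul data-bin="Sdafdo39">"""
bs = BeautifulSoup(html_doc, "html.parser")

# Keep only the <ul> tags that carry a data-bin attribute, then read its value
[item['data-bin'] for item in bs.find_all('ul', attrs={'data-bin' : True})]
# ['Sdafdo39']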
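If only the first matching element is needed, the same attribute filter can be passed to find() without a tag name. This is a minimal sketch, assuming the built-in html.parser and the sample markup from the question; the helper name first_data_bin is hypothetical and only used for illustration.

```python
from bs4 import BeautifulSoup

html_doc = '<ul class="foo">foo</ul><ul data-bin="Sdafdo39"></ul>'
soup = BeautifulSoup(html_doc, "html.parser")


def first_data_bin(soup):
    """Return the data-bin value of the first tag that has one, else None."""
    # attrs={"data-bin": True} matches any tag that has the attribute,
    # whatever its value; omitting the tag name searches every element.
    tag = soup.find(attrs={"data-bin": True})
    return tag["data-bin"] if tag is not None else None


value = first_data_bin(soup)
print(value)
```

Returning None when nothing matches avoids a TypeError from indexing the None that find() returns for a missing element.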



19

You can use the find_all method to get every tag, keep only those whose attributes include "data-bin", and then read the corresponding value, like this:

from bs4 import BeautifulSoup

html_doc = """<ul data-bin="Sdafdo39">"""
bs = BeautifulSoup(html_doc, "html.parser")

# find_all() with no arguments returns every tag; filter on the attribute name
print([item["data-bin"] for item in bs.find_all() if "data-bin" in item.attrs])
# ['Sdafdo39']

1 Comment

I can't create an html_doc variable because one of my element's attributes is not always the same: <section class='{random_characters_here}' data-type='word-definition-card'>. I want to get the class by data-type. I also tried section = soup.find_all('section', data_type='word-definition-card'), but it just doesn't work :P
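The attempt in this comment most likely fails because the keyword argument data_type makes Beautiful Soup look for an attribute literally named data_type, and a hyphenated name such as data-type cannot be written as a Python keyword. Passing the name through the attrs dictionary avoids that. A minimal sketch, assuming a sample section tag whose class value abc123 is made up for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the class value stands in for the random characters
html_doc = "<section class='abc123' data-type='word-definition-card'>text</section>"
soup = BeautifulSoup(html_doc, "html.parser")

# Hyphenated attribute names go in the attrs dict, not in a keyword argument
sections = soup.find_all("section", attrs={"data-type": "word-definition-card"})

# class is a multi-valued attribute, so each entry is a list of class names
classes = [section.get("class") for section in sections]
print(classes)
```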
4

You could solve this with gazpacho in just a couple of lines:

First, import and turn the html into a Soup object:

from gazpacho import Soup

html = """<ul data-bin="Sdafdo39">"""
soup = Soup(html)

Then you can just search for the "ul" tag and extract the data-bin attribute:

soup.find("ul").attrs["data-bin"] # Sdafdo39 


4

As an alternative, if you prefer CSS selectors, you can use select() instead of find_all():

from bs4 import BeautifulSoup

html_doc = """<ul class="foo">foo</ul><ul data-bin="Sdafdo39">"""
soup = BeautifulSoup(html_doc, "html.parser")

# Select every <ul> element that has a data-bin attribute
soup.select('ul[data-bin]')
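Note that select() returns the matching tags rather than the attribute values, so one more step is needed to get the string asked for in the question. A minimal sketch, assuming the built-in html.parser; dropping the tag name from the selector matches any element that has the attribute:

```python
from bs4 import BeautifulSoup

html_doc = '<ul class="foo">foo</ul><ul data-bin="Sdafdo39"></ul>'
soup = BeautifulSoup(html_doc, "html.parser")

# "[data-bin]" matches every element with a data-bin attribute,
# regardless of tag name; index each tag to read the value.
values = [tag["data-bin"] for tag in soup.select("[data-bin]")]
print(values)
```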

