Soup is good for you:
>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup('''<ul class="something">
... <li id="li_id">
... <a href="#" title="myurl">URL Text</a>
... </li>
... </ul>''')
The findAll method accepts many arguments; the Beautiful Soup documentation covers them all. The one-liner below will get you started: it returns a list of every tag whose attributes match the given values (here, a single link).
>>> soup.findAll(href='#', title='myurl')
[<a href="#" title="myurl">URL Text</a>]
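If you are on Python 3, the same lookup can be sketched with the newer bs4 package, where findAll is spelled find_all. This is only a sketch: it assumes bs4 is installed (pip install beautifulsoup4), and the helper name find_by_attrs is made up for illustration, not part of the library.

```python
from bs4 import BeautifulSoup

HTML = '''<ul class="something">
<li id="li_id">
<a href="#" title="myurl">URL Text</a>
</li>
</ul>'''


def find_by_attrs(html, **attrs):
    """Return every tag whose attributes match the keyword filters.

    Hypothetical helper for illustration; it just forwards to find_all.
    """
    # "html.parser" is the parser bundled with Python, so nothing extra is needed.
    soup = BeautifulSoup(html, "html.parser")
    return soup.find_all(**attrs)


links = find_by_attrs(HTML, href="#", title="myurl")
print(links)
```

Keyword arguments are treated as attribute filters, so adding or dropping a filter (say, only title='myurl') widens or narrows the match.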
Edit: based on the OP's comment, here is some additional info:
So let's say you're interested only in the links inside list elements of a certain class, such as <li class="li_class">. You could do something like this:
>>> from pprint import pprint
>>> soup = BeautifulSoup('''<li class="li_class"> <a href="#" title="myurl">URL Text</a> <a href="#" title="myurl2">URL Text2</a></li><li class="foo"> <a href="#" title="myurl3">URL Text3</a></li>''')  # just some sample html
>>> for elem in soup.findAll("li", "li_class"):  # second argument filters on CSS class
...     pprint(elem.findAll('a'))  # links inside this <li> only
...
[<a href="#" title="myurl">URL Text</a>, <a href="#" title="myurl2">URL Text2</a>]
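The loop above translates to bs4 roughly as follows. Again a sketch under assumptions: bs4 must be installed, the function name links_in_items is invented for this example, and class_ is the bs4 keyword for filtering on the class attribute (since class is a reserved word in Python).

```python
from bs4 import BeautifulSoup

# Same sample HTML as above, split across lines only for readability.
HTML = ('<li class="li_class"> <a href="#" title="myurl">URL Text</a> '
        '<a href="#" title="myurl2">URL Text2</a></li>'
        '<li class="foo"> <a href="#" title="myurl3">URL Text3</a></li>')


def links_in_items(html, css_class):
    """Collect the <a> tags found inside <li> elements of the given class.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")
    found = []
    for item in soup.find_all("li", class_=css_class):
        # Search only within this <li>, not the whole document.
        found.extend(item.find_all("a"))
    return found


titles = [a["title"] for a in links_in_items(HTML, "li_class")]
print(titles)
```

Calling find_all on each matched element rather than on the soup is what keeps the link from the "foo" item out of the result.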
Soup recipe:
- Download the one file required.
- Place the downloaded file in your site-packages directory or somewhere else on your Python path.
- Enjoy your soup.