Extracting specific data with BeautifulSoup

Question

I want to extract a bit of data from this snippet:

<div id="information_content"> <b>Name:</b> file.rar <br> <b>Date Modified:</b> 2 days ago <br> <b>Size:</b> 212.19 MB <br> <b>Type:</b> Archive <br> <b>Permissions:</b> Public </div> </div>

I want to extract only 212.19 MB.

I have extracted the snippet using soup.find('div', attrs={'id': 'information_content'}) but I can't figure out how to drill further down to get what I need.

Can anybody help?

You can find the answer here: stackoverflow.com/questions/21750979/…
– WKPlus, commented Feb 13, 2014 at 10:47

l3aronsansgland · Accepted Answer · 2014-02-13 11:15:35Z


As BeautifulSoup doesn't support XPath, the best way would be to use lxml, which does.
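The answer gives no code, so the following is only a minimal sketch of what an lxml/XPath lookup might look like, assuming lxml is installed. The helper name `size_via_xpath` and the constant `SNIPPET` (a copy of the question's div) are made up for illustration. The XPath finds the `<b>` element whose text is `Size:` and takes the first text node that follows it.

```python
from lxml import html

# Copy of the div from the question (hypothetical constant name).
SNIPPET = (
    '<div id="information_content"> <b>Name:</b> file.rar <br> '
    '<b>Date Modified:</b> 2 days ago <br> <b>Size:</b> 212.19 MB <br> '
    '<b>Type:</b> Archive <br> <b>Permissions:</b> Public </div>'
)


def size_via_xpath(markup):
    """Return the text that follows the <b>Size:</b> label, or None."""
    tree = html.fromstring(markup)
    # First text node that is a following sibling of the "Size:" label.
    nodes = tree.xpath('//b[text()="Size:"]/following-sibling::text()[1]')
    return nodes[0].strip() if nodes else None


print(size_via_xpath(SNIPPET))
```

Anchoring on the label text rather than on a position means the lookup should keep working if the order of the fields changes, though it still assumes the label is spelled exactly `Size:`.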


combuilder · Accepted Answer · 2014-02-13 11:57:12Z

If the DIV always has the same structure, you can follow these instructions using BeautifulSoup. Once you have extracted the DIV, create a new list from its text, split on '\n'. Then just select the right element of the list.

I've done something similar and explained everything I did here: Python and BeautifulSoup: extracting prizes from Quiniela - http://www.manejandodatos.es/2014/2/python-beautifulsoup-extracting-prizes-quiniela
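A rough sketch of this list-based idea, assuming the `bs4` package is installed. It swaps the split on '\n' for BeautifulSoup's `stripped_strings`, because the snippet as posted contains no newlines. The function name `size_from_div` and the constant `SNIPPET` are hypothetical.

```python
from bs4 import BeautifulSoup

# Copy of the div from the question (hypothetical constant name).
SNIPPET = (
    '<div id="information_content"> <b>Name:</b> file.rar <br> '
    '<b>Date Modified:</b> 2 days ago <br> <b>Size:</b> 212.19 MB <br> '
    '<b>Type:</b> Archive <br> <b>Permissions:</b> Public </div>'
)


def size_from_div(markup):
    """Return the value listed after the 'Size:' label, or None."""
    soup = BeautifulSoup(markup, "html.parser")
    div = soup.find("div", attrs={"id": "information_content"})
    if div is None:
        return None
    # All non-empty text pieces in document order, whitespace stripped:
    # ['Name:', 'file.rar', 'Date Modified:', '2 days ago', 'Size:', ...]
    parts = list(div.stripped_strings)
    if "Size:" in parts:
        idx = parts.index("Size:")
        if idx + 1 < len(parts):
            return parts[idx + 1]
    return None


print(size_from_div(SNIPPET))
```

If the structure really is fixed, indexing the list directly (`parts[5]` here) would also work; looking up the label first is just a little more tolerant of reordered fields.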

I hope it helps!

dekkerr · Accepted Answer · 2014-02-13 12:22:32Z

As said previously, if the structure of these divs is always the same, the size will be in the third string if you split on '<br>'.

    >>> x = '<div id="information_content"> <b>Name:</b> file.rar <br> <b>Date Modified:</b> 2 days ago <br> <b>Size:</b> 212.19 MB <br> <b>Type:</b> Archive <br> <b>Permissions:</b> Public </div> </div>'
    >>> x.split('<br>')[2]
    ' <b>Size:</b> 212.19 MB '

From there you can use a regular expression to get just the part you need. For example, this pattern matches all values with this kind of formatting:

\d+\.\d\d\s.B

It matches 10.00 kB as well as 1000.34 TB.
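Putting the split and the pattern together, a small sketch using only the standard `re` module might look like this. The decimal point is escaped as `\.` so that it matches a literal dot rather than any character. The names `SIZE_RE` and `find_size` are made up for the example.

```python
import re

# digits, a literal dot, two digits, whitespace, any unit letter, then "B"
SIZE_RE = re.compile(r"\d+\.\d\d\s.B")


def find_size(text):
    """Return the first size-like value in text, or None if there is none."""
    match = SIZE_RE.search(text)
    return match.group(0) if match else None


x = ('<div id="information_content"> <b>Name:</b> file.rar <br> '
     '<b>Date Modified:</b> 2 days ago <br> <b>Size:</b> 212.19 MB <br> '
     '<b>Type:</b> Archive <br> <b>Permissions:</b> Public </div>')

third = x.split('<br>')[2]   # ' <b>Size:</b> 212.19 MB '
print(find_size(third))      # 212.19 MB
```

Note that the pattern expects exactly two decimal places and a two-character unit ending in `B`, so a value such as `5 MB` or `12.5 bytes` would not match.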
