I have an HTML document with the following structure:

    <!DOCTYPE html>
    <html>
    <body>
    <p>One</p>
    <p>Two</p>
    <p>Three</p>
    </body>
    </html>

Please recommend a Python module that would let me write something like:

    var = ModuleName.html.body.p2
    print(var)  # desired output: Two

Use BeautifulSoup with CSS selectors, or lxml. Commented Nov 24, 2015 at 16:00

2 Answers

2

BeautifulSoup would make it quite close to what you are asking about:

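    from bs4 import BeautifulSoup

    # The document from the question, as a string
    data = "<!DOCTYPE html> <html> <body> <p>One</p> <p>Two</p> <p>Three</p> </body> </html>"
    soup = BeautifulSoup(data, "html.parser")
    print(soup.html.body("p")[1].text)  # prints Two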

In other words, attribute access (the dot) is a shortcut for find(), and calling a tag (the parentheses) is a shortcut for find_all().
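
To make the two shortcuts concrete, here is a minimal sketch that spells out each shorthand next to its explicit equivalent. The variable names are only illustrative, and the built-in "html.parser" backend is an assumption; any parser supported by BeautifulSoup should behave the same way on this document.

```python
from bs4 import BeautifulSoup

data = """<!DOCTYPE html>
<html> <body> <p>One</p> <p>Two</p> <p>Three</p> </body> </html>"""

soup = BeautifulSoup(data, "html.parser")

# Attribute access returns the first matching descendant, like find().
first_via_dot = soup.html.body.p
first_via_find = soup.find("html").find("body").find("p")

# Calling a tag returns every matching descendant, like find_all().
all_via_call = soup.html.body("p")
all_via_find_all = soup.html.body.find_all("p")

# Indexing is zero-based, so [1] is the second paragraph.
second_text = all_via_call[1].text
print(second_text)
```

Because the dot only ever returns the first match, reaching the second paragraph needs the call form plus an index.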

1

I would recommend using BeautifulSoup to parse your HTML and extracting the content you want with CSS selectors.

You can find an example very similar to what you want to do in the documentation: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#css-selectors

Edit: Here is a code snippet, since the documentation has a typo and omits the ":" in the selector string.

    from bs4 import BeautifulSoup

    data = "<!DOCTYPE html> <html> <body><p>One</p><p>Two</p><p>Three</p></body></html>"
    soup = BeautifulSoup(data, "html.parser")
    # select() returns a list of matching tags
    print(soup.body.select("p:nth-of-type(2)"))  # prints [<p>Two</p>]
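
Note that select() returns a list of tags rather than the text itself. If only the text of the single match is wanted, something along these lines should work; it is a sketch that assumes a BeautifulSoup version providing select_one(), and the variable names are illustrative.

```python
from bs4 import BeautifulSoup

data = "<!DOCTYPE html> <html> <body><p>One</p><p>Two</p><p>Three</p></body></html>"
soup = BeautifulSoup(data, "html.parser")

# select() always returns a list, even when only one element matches.
matches = soup.body.select("p:nth-of-type(2)")

# select_one() returns the first matching tag, or None if nothing matches.
second = soup.body.select_one("p:nth-of-type(2)")

# Guard against None so a missing paragraph does not raise AttributeError.
text = second.get_text() if second is not None else None
print(text)
```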
