Is there a way to extract the source of an image in an HTML file using only one struct (with encoding/xml)? Right now I have something like this:

type XML struct {
    A Image `xml:"div>img"`
}

type Image struct {
    I string `xml:"src,attr"`
}

It would be great to declare only something like this:

type Image struct {
    I string `xml:"div>img,src,attr"`
}

This is the HTML:

<div><div><img src="hello.png"/></div></div> 
  • Practically, no: HTML is not XML, and a real HTML parser must cope with malformed HTML, whereas XML does not tolerate errors. <img src="hello.png"> is not valid XML, because there is no closing </img> tag; it would have to be written as <img src="hello.png" />. Commented Sep 20, 2012 at 21:55
  • Well, the truth is that was a typo; it's corrected now (thank you for pointing this out). The question still remains the same. Commented Sep 22, 2012 at 13:26
  • This issue discusses exactly that for Go 1.2: code.google.com/p/go/issues/detail?id=3633 Commented Aug 29, 2013 at 19:02

1 Answer

It seems that a good approach is to use the exp/html package, like this:

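package main

import (
    "exp/html"
    "strings"
)

func main() {
    // Parse returns the document root; the error is ignored here for brevity.
    a, _ := html.Parse(strings.NewReader(testString))
    // Path: document > html > head, then its sibling body > div > div > img.
    println(a.FirstChild.FirstChild.NextSibling.FirstChild.FirstChild.FirstChild.Attr[0].Val)
}

var testString = `<div><div><img src="hello.png"/></div></div>`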

All these FirstChild and NextSibling steps are needed because exp/html constructs a "correct" HTML5 tree, so this code is actually parsing this:

<html>
  <head></head>
  <body>
    <div>
      <div>
        <img src="hello.png"/>
      </div>
    </div>
  </body>
</html>
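If the input is well-formed XML, as the snippet in the question is, it may also be possible to stay with encoding/xml and a single type declaration by nesting an anonymous struct. As far as I know, encoding/xml does not accept a `>` path combined with the `attr` flag, so the `div>img,src,attr` tag from the question would not work as written. The following is only a minimal sketch under that assumption; the names `Doc`, `Img`, `Src`, and the helper `imageSrc` are hypothetical and not from the question.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Doc is a hypothetical name. The anonymous inner struct takes the place of
// the separate Image type, so only one named type is declared.
type Doc struct {
	Img struct {
		Src string `xml:"src,attr"`
	} `xml:"div>img"`
}

// imageSrc is a hypothetical helper: it unmarshals doc and returns the src
// attribute of the <img> found at root > div > img.
func imageSrc(doc string) (string, error) {
	var d Doc
	if err := xml.Unmarshal([]byte(doc), &d); err != nil {
		return "", err
	}
	return d.Img.Src, nil
}

func main() {
	src, err := imageSrc(`<div><div><img src="hello.png"/></div></div>`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(src)
}
```

This only suits input that is valid XML; for real-world HTML with unclosed tags, an HTML parser such as the one shown above is the safer choice.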