I need to parse/extract information from an HTML page. Basically, I'm loading the page as a string using System.Net.WebClient and using HTML Agility Pack to get the content inside HTML tags (forms, labels, inputs and so on).

However, some content is inside a JavaScript script tag, like this:

    <script type="text/javascript">
    //<![CDATA[
    var itemCol = new Array();
    itemCol[0] = { pid: "01010101", Desc: "Some desc", avail: "Available", price: "$10.00" };
    itemCol[1] = { pid: "01010101", Desc: "Some desc", avail: "Available", price: "$10.00" };
    //]]>
    </script>

So, how could I parse it into a collection in .NET? Can HTML Agility Pack help with that? I'd really appreciate any help.

Thanks in advance.

3 Answers

HAP will not parse the JavaScript for you. The best it can do is hand you the raw text content of the script element.

To evaluate that script from .NET, javascript.net may fit the bill.

1 Comment

For some reason I was unable to install javascript.net (I got some errors), but I was able to do the same thing with Jint. Thanks.
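The comment above does not show the Jint code that was used, so the following is only a minimal sketch of one way it could look. It assumes the Jint 3.x NuGet package (`Engine.Execute`, `Engine.Evaluate`) and `System.Text.Json`. The class name `JintItemColReader` and method name `Read` are hypothetical. The idea is to run the script text, ask the engine to serialize `itemCol` to JSON, and deserialize that into a .NET collection.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using Jint; // assumption: Jint 3.x; older releases expose a different API

public static class JintItemColReader
{
    // Hypothetical helper: runs the script text and returns itemCol
    // as a list of property-name -> value dictionaries.
    public static List<Dictionary<string, string>> Read(string script)
    {
        var engine = new Engine();

        // Execute the script so that the global variable itemCol is defined.
        engine.Execute(script);

        // Serialize the array inside the engine, then read the JSON text back.
        string json = engine.Evaluate("JSON.stringify(itemCol)").AsString();

        // Every value in the sample is a string, so a string dictionary is enough.
        return JsonSerializer.Deserialize<List<Dictionary<string, string>>>(json)
               ?? new List<Dictionary<string, string>>();
    }
}
```

The script text must keep its original line breaks: the `//<![CDATA[` marker is a JavaScript line comment, so on a single flattened line it would comment out everything after it. Bear in mind that this executes code taken from a downloaded page, so it is only appropriate for sources you are prepared to run.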

Which part of the content inside the script tag do you want, and what kind of collection are you expecting? You can always select the script tags like this:

    // Requires: using HtmlAgilityPack; using System.Xml.XPath;
    HtmlDocument document = new HtmlDocument();
    // Assumes downloadedHtml holds the page source as a string;
    // Load(string) expects a file path, so use LoadHtml for markup text.
    document.LoadHtml(downloadedHtml);
    XPathNavigator n = document.CreateNavigator();
    XPathNodeIterator scriptTags = n.Select("//script");
    foreach (XPathNavigator nav in scriptTags)
    {
        string innerXml = nav.InnerXml;
        // Parse innerXml using a regex
    }
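The regex step is left open above, so here is a rough sketch of what it might look like for the `itemCol` sample in the question. The class name `ItemColParser` and both patterns are illustrative assumptions, not part of the original answer. They only handle flat objects whose values are double-quoted strings containing no `}` or `"` characters.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ItemColParser
{
    // Matches one "itemCol[n] = { ... }" assignment and captures the object body.
    private static readonly Regex EntryPattern =
        new Regex(@"itemCol\[\d+\]\s*=\s*\{(?<body>[^}]*)\}", RegexOptions.Compiled);

    // Matches one key: "value" pair inside a captured body.
    private static readonly Regex PairPattern =
        new Regex(@"(?<key>\w+)\s*:\s*""(?<value>[^""]*)""", RegexOptions.Compiled);

    // Returns one dictionary per itemCol entry, keyed by property name.
    public static List<Dictionary<string, string>> Parse(string script)
    {
        var items = new List<Dictionary<string, string>>();
        foreach (Match entry in EntryPattern.Matches(script))
        {
            var item = new Dictionary<string, string>();
            foreach (Match pair in PairPattern.Matches(entry.Groups["body"].Value))
            {
                item[pair.Groups["key"].Value] = pair.Groups["value"].Value;
            }
            items.Add(item);
        }
        return items;
    }
}
```

Inside the loop above, `ItemColParser.Parse(innerXml)` would then give one dictionary per item, for example `items[0]["price"]`. A regex like this is brittle: if the page's script changes shape, evaluating it with a JavaScript engine is the more robust route.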

1 Comment

Using javascript.net:

    using (JavascriptContext context = new JavascriptContext())
    {
        context.SetParameter("data", new MyObject());
        StringBuilder s = new StringBuilder();
        foreach (XPathNavigator nav in scriptTags)
        {
            s.Append(nav.InnerXml);
        }
        s.Append(";data.item = itemCol;");
        context.Run(s.ToString());
        MyObject o = context.GetParameter("data") as MyObject;
    }

Then just have a data structure like:

    class MyObject { public object item { get; set; } }

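Using the javascript.net library, you can get a collection:

    // Requires: using Noesis.Javascript; using System.Text; using System.Xml.XPath;
    // scriptTags is the XPathNodeIterator over the page's //script nodes.
    using (JavascriptContext context = new JavascriptContext())
    {
        // Expose a .NET object to the script under the name "data".
        context.SetParameter("data", new MyObject());

        // Concatenate the text of every script tag.
        StringBuilder s = new StringBuilder();
        foreach (XPathNavigator nav in scriptTags)
        {
            s.Append(nav.InnerXml);
        }

        // Copy the script's itemCol array onto the exposed object, then run it all.
        s.Append(";data.item = itemCol;");
        context.Run(s.ToString());

        MyObject o = context.GetParameter("data") as MyObject;
    }

Then just define a data structure like this:

    class MyObject
    {
        public object item { get; set; }
    }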
