I want to get text and data from a web page. When the page finishes loading inside a WebBrowser control, I want to extract text from the page by element id. How can I achieve this with Html Agility Pack and C#? Sorry for my poor English.
- Are alternative (more modern) libraries like CsQuery allowed? Also, if you just want the whole text of everything, you don't need any library. – Benjamin Gruenbaum, Jan 1, 2014 at 10:39
- I just need a few pieces of text by HTML id. For example, given <div id="getid">ID00123</div>, I want to know how to get "ID00123" from my program. I prefer to use a C# Windows app. – mbdAli, Jan 1, 2014 at 10:43
1 Answer
You could use the GetElementbyId method on the HtmlDocument, which lets you retrieve a specific DOM element by its identifier:
string html = ...; // Read the HTML here
var htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.OptionFixNestedTags = true;
htmlDoc.LoadHtml(html);
var element = htmlDoc.GetElementbyId("someId");
if (element != null)
{
    string data = element.InnerText;
}

6 Comments
mbdAli
Thanks. That works for one element, but I need to get around 10 elements from one page URL.
Darin Dimitrov
How about using a loop? If there's some pattern for the element ids you could simply loop through them.
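A minimal sketch of the loop idea, assuming the ids follow a numbered scheme such as field1, field2, and so on. The scheme, the sample HTML, and the helper name ReadByIdPattern are made up for illustration; only GetElementbyId and InnerText come from the answer above.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

class IdPatternDemo
{
    // Collects the text of elements whose ids are prefix1..prefixN.
    // The numbered naming scheme is an assumption, not something the page guarantees.
    public static Dictionary<string, string> ReadByIdPattern(string html, string prefix, int count)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var values = new Dictionary<string, string>();
        for (int i = 1; i <= count; i++)
        {
            string id = prefix + i;
            HtmlNode node = doc.GetElementbyId(id);
            if (node != null) // skip ids that are not present on the page
            {
                values[id] = node.InnerText.Trim();
            }
        }
        return values;
    }

    static void Main()
    {
        // Hypothetical sample markup; field3 is deliberately missing.
        string html = "<div id=\"field1\">ID00123</div><div id=\"field2\">Alice</div>";
        foreach (var pair in ReadByIdPattern(html, "field", 3))
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
}
```

Ids that are missing from the page are skipped rather than treated as errors, which keeps the loop usable when not every numbered element is present.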
mbdAli
I can see the element ids by viewing the page source, but there is no pattern; the ids look completely different from one another. Can you please provide an example?
Darin Dimitrov
In this case you cannot use the element id to retrieve the values. You should use some other information that doesn't change, for example class values or even the DOM structure itself. It's impossible to say without having more details about the DOM structure you are dealing with.
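As a rough illustration of selecting by class or by DOM structure instead of by id, here is a sketch using Html Agility Pack's XPath support (SelectSingleNode on DocumentNode). The markup, the class names order and order-id, and the helper ReadText are all hypothetical; the real expressions depend on the page you are dealing with.

```csharp
using System;
using HtmlAgilityPack;

class XPathDemo
{
    // Returns the trimmed text of the first node matching the XPath,
    // or null when nothing matches.
    public static string ReadText(HtmlDocument doc, string xpath)
    {
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node == null ? null : node.InnerText.Trim();
    }

    static void Main()
    {
        // Hypothetical markup standing in for the real page.
        string html =
            "<div class=\"order\">" +
            "<span class=\"order-id\">ID00123</span>" +
            "<table><tr><td>Name</td><td>Alice</td></tr></table>" +
            "</div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Select by class value.
        Console.WriteLine(ReadText(doc, "//span[@class='order-id']"));

        // Select by structure: second cell of the first row inside the order div.
        Console.WriteLine(ReadText(doc, "//div[@class='order']//tr[1]/td[2]"));
    }
}
```

Note that @class='order-id' only matches when the attribute is exactly that string. If the element carries several classes, something like contains(@class, 'order-id') is a common workaround, at the cost of also matching class names that merely contain that text.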