
I'm trying to create a code snippet to remove all style attributes regardless of tag using HtmlAgilityPack.

Here's my code:

    var elements = htmlDoc.DocumentNode.SelectNodes("//*");
    if (elements != null)
    {
        foreach (var element in elements)
        {
            element.Attributes.Remove("style");
        }
    }

However, I can't get it to stick. If I look at the element object immediately after Remove("style"), I can see that the style attribute has been removed, but it still appears in the DocumentNode object. :/

I'm feeling a bit stupid, but this seems off to me. Has anyone done this using HtmlAgilityPack? Thanks!

Update

I changed my code to the following, and it works properly:

    public static void RemoveStyleAttributes(this HtmlDocument html)
    {
        var elementsWithStyleAttribute = html.DocumentNode.SelectNodes("//@style");
        if (elementsWithStyleAttribute != null)
        {
            foreach (var element in elementsWithStyleAttribute)
            {
                element.Attributes["style"].Remove();
            }
        }
    }
  • Can you add code that reproduces the problem? I tested this HTML, <html style='style1'><body style='style2'></body></html>, and it works. Commented May 2, 2011 at 6:47
  • Are you using the InnerHtml property? At the time of writing it has a bug; use the WriteContentTo method instead. Commented Jul 16, 2011 at 9:10

2 Answers


Your code snippet seems to be correct: it does remove the attributes. The thing is, DocumentNode.InnerHtml (I assume this is the property you were monitoring) is a complex property. It may only get updated under certain circumstances, so you shouldn't rely on it to get the document as a string. Use the HtmlDocument.Save method instead:

    string result = null;
    using (StringWriter writer = new StringWriter())
    {
        htmlDoc.Save(writer);
        result = writer.ToString();
    }

Now the result variable holds the string representation of your document.

One more thing: your code can be improved by changing the XPath expression to "//*[@style]", which selects only the elements that have a style attribute.
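Putting the two suggestions together, here is a minimal sketch of how this might look end to end. The class name StyleStripper and the string-in, string-out signature are just my own choices for illustration; the HtmlAgilityPack calls are the same ones already used in this thread (LoadHtml, SelectNodes, Attributes.Remove, Save).

```csharp
using System.IO;
using HtmlAgilityPack;

// Hypothetical helper for illustration; not part of HtmlAgilityPack.
public static class StyleStripper
{
    public static string RemoveStyleAttributes(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Select only elements that carry a style attribute.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var styledElements = doc.DocumentNode.SelectNodes("//*[@style]");
        if (styledElements != null)
        {
            foreach (var element in styledElements)
            {
                element.Attributes.Remove("style");
            }
        }

        // Serialize through Save rather than reading InnerHtml.
        using (var writer = new StringWriter())
        {
            doc.Save(writer);
            return writer.ToString();
        }
    }
}
```

Calling StyleStripper.RemoveStyleAttributes on the HTML from the comment above should give back the same markup without the style attributes, assuming the parser keeps the rest of the document unchanged.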


2 Comments

Thanks for replying! Yeah, I had changed my code to the following to make it "stick": 'public static void RemoveStyleAttributes(this HtmlDocument html) { var elementsWithStyleAttribute = html.DocumentNode.SelectNodes("//@style"); if (elementsWithStyleAttribute!=null) { foreach (var element in elementsWithStyleAttribute) { element.Attributes["style"].Remove(); } } }' Not sure why my original code didn't work, but I think you're right in your guess. Thanks!
Wow, code formatting in comments isn't great. :) Updated my question with the modified code snippet. Thanks again!

Here is a very simple solution:

VB.NET

element.Attributes.Remove(element.Attributes("style")) 

C#

element.Attributes.Remove(element.Attributes["style"]);

2 Comments

Thanks, one correction: element.Attributes("style") should be element.Attributes["style"].
You are right, I didn't make it clear: my code is for VB.NET.
