
I'll start from the end:
In my C# program I have a string containing HTML, and I'd like to remove all inline style attributes (style="...") and all classes beginning with 'abc' from the elements in this string.
I'm willing to use regular expressions for this, even though some people bitch about it :).

(An explanation, for those wishing to berate me for parsing HTML strings:
I'm forced to use a less-than-optimal web control in my project. The control is designed to be used server-side (i.e. with postbacks and all that stuff), but I'm required to use it in AJAX calls.
This means I have to configure it in code, call its Render() method to get the HTML string, and pass that string to the client side, where it's inserted into the DOM at the appropriate place. Unfortunately, I couldn't find a configuration that stops the control from rendering itself with these useless styles and classes, so I'm forced to remove them by hand. Please don't hate me.)

2 Answers


Try this:

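```csharp
using System.Text.RegularExpressions;

string html = /* the HTML string returned by Render() */ "";

// Remove every style="..." attribute (double-quoted values only)
string cleaned = new Regex("style=\"[^\"]*\"").Replace(html, "");

// Remove a class beginning with "abc" from each class="..." attribute.
// Note: only one such class is removed per attribute, and a double space may remain.
cleaned = new Regex("(?<=class=\")([^\"]*)\\babc\\w*\\b([^\"]*)(?=\")").Replace(cleaned, "$1$2");
```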
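A variant of the regex approach, offered as a sketch under assumptions rather than a drop-in fix: a MatchEvaluator rebuilds each class list, so every class starting with the prefix is removed (not just one per attribute), and single-quoted values and whitespace around = are tolerated. The class name HtmlRegexCleaner and the prefix parameter are hypothetical. It is still a regex over HTML, so text that merely looks like an attribute (inside scripts, comments, or content) can be touched too.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper; not part of any library.
public static class HtmlRegexCleaner
{
    // Matches style="..." or style='...' preceded by whitespace, so the
    // leading space is removed too and attributes like data-style are not hit.
    private static readonly Regex StyleAttr =
        new Regex(@"\s+style\s*=\s*(""[^""]*""|'[^']*')", RegexOptions.IgnoreCase);

    // Captures the quote character and the value of each class attribute.
    private static readonly Regex ClassAttr =
        new Regex(@"(?<=\s)class\s*=\s*(?<q>[""'])(?<value>.*?)\k<q>", RegexOptions.IgnoreCase);

    public static string Clean(string html, string prefix = "abc")
    {
        string noStyles = StyleAttr.Replace(html, "");

        // Rebuild each class list, dropping every class that starts with the prefix.
        return ClassAttr.Replace(noStyles, m =>
        {
            string q = m.Groups["q"].Value;
            var kept = m.Groups["value"].Value
                .Split((char[])null, StringSplitOptions.RemoveEmptyEntries) // split on whitespace
                .Where(c => !c.StartsWith(prefix, StringComparison.Ordinal));
            return "class=" + q + string.Join(" ", kept) + q;
        });
    }
}
```

If every class on an element starts with the prefix, the attribute is left as an empty class="" rather than removed.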

1 Comment

Not working for me: the source and the result are the same, no effect.

To anyone interested: I've solved this without using regex.
Instead, I used XDocument to parse the HTML:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

private string MakeHtmlGood(string html)
{
    // Throws XmlException unless html is well-formed XML with a single root element
    var xmlDoc = XDocument.Parse(html);

    // Remove all inline styles
    xmlDoc.Descendants().Attributes("style").Remove();

    // Remove all classes inserted by 3rd party, without removing our own lovely classes
    foreach (var node in xmlDoc.Descendants())
    {
        var classAttribute = node.Attributes("class").SingleOrDefault();
        if (classAttribute == null)
        {
            continue;
        }

        // RemoveEmptyEntries avoids empty entries when classes are separated by several spaces
        var classesThatShouldStay = classAttribute.Value
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(className => !className.StartsWith("abc"));
        classAttribute.SetValue(string.Join(" ", classesThatShouldStay));
    }

    return xmlDoc.ToString();
}
```

3 Comments

MakeHtmlGood: I got a great laugh out of that one. Thanks for the humor.
error: There are multiple root elements. Line 1, position 126.
You have to wrap the markup in a dummy root element for this to work, and even then the HTML has to be well-formed XML or it won't parse at all. HtmlAgilityPack can parse malformed HTML (99.99% of the HTML on the web!).
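A minimal sketch of the dummy-root workaround using only System.Xml.Linq, assuming the rendered markup is otherwise well-formed XML (self-closed void tags such as <br/>, no HTML-only entities like &nbsp;). It swaps in XElement.Parse for XDocument.Parse, and the class name XmlFragmentCleaner and the prefix parameter are hypothetical. For genuinely malformed HTML, a real HTML parser such as HtmlAgilityPack is the safer route.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper; not part of any library.
public static class XmlFragmentCleaner
{
    public static string Clean(string htmlFragment, string prefix = "abc")
    {
        // The dummy <root> lets several top-level elements parse; the markup
        // must still be well-formed XML or XElement.Parse throws XmlException.
        var root = XElement.Parse("<root>" + htmlFragment + "</root>");

        // Remove every inline style attribute.
        root.Descendants().Attributes("style").Remove();

        foreach (var element in root.Descendants())
        {
            var classAttribute = element.Attribute("class");
            if (classAttribute == null)
            {
                continue;
            }

            var kept = classAttribute.Value
                .Split((char[])null, StringSplitOptions.RemoveEmptyEntries) // split on whitespace
                .Where(c => !c.StartsWith(prefix, StringComparison.Ordinal));
            classAttribute.Value = string.Join(" ", kept);
        }

        // Serialize only the children, dropping the dummy root again.
        return string.Concat(root.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }
}
```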
