25

I have a string from which I have to remove the following characters: '\r', '\n', and '\t'. I tried three different ways of removing them and benchmarked each one to find the fastest solution.

Here are the methods and their execution times when I ran each of them 1,000,000 times:

String.Replace: This should be the fastest solution if I have only 1 or 2 characters to remove, but as I add more characters it takes more time.

str = str.Replace("\r", string.Empty).Replace("\n", string.Empty).Replace("\t", string.Empty); 

Execution time = 1695

String.Split with Aggregate: For 1 or 2 characters this was slower than String.Replace, but for 3 characters it performed better.

    string[] split = str.Split(new char[] { '\t', '\r', '\n' }, StringSplitOptions.None);
    str = split.Aggregate<string>((str1, str2) => str1 + str2);

Execution time = 1030

Regex.Replace: The slowest of all, even with 1 character. Maybe my regular expression is not the best.

str = Regex.Replace(str, "[\r\n\t]", string.Empty, RegexOptions.Compiled); 

Execution time = 3500

These are the three solutions I came up with. Does anyone here know a better, faster solution, or any improvements I can make to this code?

String that I used for benchmarking:

    StringBuilder builder = new StringBuilder();
    builder.AppendFormat("{0}\r\n{1}\t\t\t\r\n{2}\t\r\n{3}\r\n{4}\t\t\r\n{5}\r\n{6}\r\n{7}\r\n{8}\r\n{9}",
        "SELECT ",
        "[Extent1].[CustomerID] AS [CustomerID], ",
        "[Extent1].[NameStyle] AS [NameStyle], ",
        "[Extent1].[Title] AS [Title], ",
        "[Extent1].[FirstName] AS [FirstName], ",
        "[Extent1].[MiddleName] AS [MiddleName], ",
        "[Extent1].[LastName] AS [LastName], ",
        "[Extent1].[Suffix] AS [Suffix], ",
        "[Extent1].[CompanyName] AS [CompanyName], ",
        "[Extent1].[SalesPerson] AS [SalesPerson], ");
    string str = builder.ToString();
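For anyone who wants to reproduce timings like the ones above, here is a minimal sketch of a harness built on System.Diagnostics.Stopwatch. The class name StripBenchmark and the helpers Measure and ReplaceChain are hypothetical names made up for this illustration; this is not necessarily how the numbers in the question were measured.

```csharp
using System;
using System.Diagnostics;

public static class StripBenchmark
{
    // Hypothetical helper: runs `strip` on `input` `iterations` times
    // and returns the elapsed wall-clock time in milliseconds.
    public static long Measure(Func<string, string> strip, string input, int iterations)
    {
        strip(input); // warm-up call so JIT compilation is not included in the timing

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            strip(input);
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    // The chained String.Replace approach from the question, wrapped as a method.
    public static string ReplaceChain(string s)
    {
        return s.Replace("\r", string.Empty)
                .Replace("\n", string.Empty)
                .Replace("\t", string.Empty);
    }
}
```

A call such as StripBenchmark.Measure(StripBenchmark.ReplaceChain, str, 1000000) would then time one candidate. Results depend heavily on the machine, the runtime, and whether the build is Release or Debug, so numbers from different setups are not directly comparable.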

7 Answers

22

Here's the uber-fast unsafe version, version 2.

    public static unsafe string StripTabsAndNewlines(string s)
    {
        int len = s.Length;
        // Scratch buffer on the stack, sized for the worst case (nothing removed).
        char* newChars = stackalloc char[len];
        char* currentChar = newChars;
        for (int i = 0; i < len; ++i)
        {
            char c = s[i];
            switch (c)
            {
                case '\r':
                case '\n':
                case '\t':
                    continue; // skip the unwanted character
                default:
                    *currentChar++ = c;
                    break;
            }
        }
        return new string(newChars, 0, (int)(currentChar - newChars));
    }

And here are the benchmarks (time to strip 1000000 strings, in ms):

    cornerback84's String.Replace:  9433
    Andy West's String.Concat:      4756
    AviJ's char array:              1374
    Matt Howells' char pointers:    1163

4 Comments

Yes it is. Execution time = 195
btw, you need a new machine :P
It's a recent Xeon - probably our benchmarks are just set up differently.
I am surprised how long this has sat here without someone mentioning you can easily get a stack overflow exception when using this on a large string. I really like the feature of not allocating on the heap for something that could get called a lot (people looking for the fastest way are probably calling it a lot...), and if you consider the heap cleanup the performance difference is probably a bit greater than the benchmarks are showing. This needs to conditionally use the heap for large strings so that it can be both fast and reliable.
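One way to act on the comment above is to keep the stack buffer for short inputs and fall back to a heap array for long ones. The following is a sketch only, assuming C# 7.2 or later and a runtime that provides Span<char> (for example .NET Core 2.1+). The class name Stripper and the 256-character cutoff StackLimit are assumptions chosen for illustration, not measured values, and this is not the answer author's code.

```csharp
using System;

public static class Stripper
{
    // Assumed cutoff, not a tuned value: inputs up to this many chars use the stack.
    private const int StackLimit = 256;

    public static string StripTabsAndNewlines(string s)
    {
        int len = s.Length;

        // Short inputs get a fixed-size stack buffer; long inputs get a heap array,
        // so a very large string cannot overflow the stack.
        Span<char> buffer = len <= StackLimit
            ? stackalloc char[StackLimit]
            : new char[len];

        int written = 0;
        for (int i = 0; i < len; i++)
        {
            char c = s[i];
            if (c != '\r' && c != '\n' && c != '\t')
            {
                buffer[written++] = c;
            }
        }

        // Copy only the characters that were kept into the result string.
        return new string(buffer.Slice(0, written));
    }
}
```

Because Span<char> is used instead of raw pointers, this variant does not need the unsafe keyword. Whether it matches the speed of the pointer version would have to be measured.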
10

I believe you'll get the best possible performance by composing the new string as a char array and only convert it to a string when you're done, like so:

    string s = "abc";
    int len = s.Length;
    char[] s2 = new char[len];
    int i2 = 0; // number of characters kept so far
    for (int i = 0; i < len; i++)
    {
        char c = s[i];
        if (c != '\r' && c != '\n' && c != '\t')
            s2[i2++] = c;
    }
    return new String(s2, 0, i2);

EDIT: using String(s2, 0, i2) instead of Trim(), per suggestion

2 Comments

One correction: you have to do return new String(s2).TrimEnd('\0'); and the execution time = 309. Great.
In fact, I made a small modification. You already keep the length of the new array in i2, so rather than trimming you can use return new String(s2, 0, i2); That brings the execution time down to 255.
6
String.Join(null, str.Split(new char[] { '\t', '\r', '\n' }, StringSplitOptions.None)); 

might give you a performance increase over using Aggregate() since Join() is designed for strings.

EDIT:

Actually, this might be even better:

String.Concat(str.Split(new char[] { '\t', '\r', '\n' }, StringSplitOptions.None)); 

2 Comments

Nice! I updated my answer to use Concat() instead. Might be worth a try.
There was slight improvement when using String.Concat. Now, Execution time = 734
2

Looping through the string and using (just one) StringBuilder (with the proper constructor argument, to avoid unnecessary memory allocations) to create a new string could be faster.
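A minimal sketch of what this suggestion might look like, assuming the "proper constructor argument" means passing the input length as the initial capacity. The names BuilderStripper and StripWithBuilder are hypothetical, and no timing is claimed for it here.

```csharp
using System.Text;

public static class BuilderStripper
{
    public static string StripWithBuilder(string s)
    {
        // Pre-size the builder to the input length: the result can never be longer,
        // so the internal buffer does not need to grow while appending.
        StringBuilder builder = new StringBuilder(s.Length);

        foreach (char c in s)
        {
            if (c != '\r' && c != '\n' && c != '\t')
            {
                builder.Append(c);
            }
        }

        return builder.ToString();
    }
}
```

The single loop touches each character once, in contrast to the chained Replace calls, which scan the string once per character being removed.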


2

Even faster:

    public static string RemoveMultipleWhiteSpaces(string s)
    {
        char[] sResultChars = new char[s.Length];
        bool isWhiteSpace = false; // true while inside a run of spaces
        int sResultCharsIndex = 0;
        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] == ' ')
            {
                // Keep only the first space of each run.
                if (!isWhiteSpace)
                {
                    sResultChars[sResultCharsIndex] = s[i];
                    sResultCharsIndex++;
                    isWhiteSpace = true;
                }
            }
            else
            {
                sResultChars[sResultCharsIndex] = s[i];
                sResultCharsIndex++;
                isWhiteSpace = false;
            }
        }
        return new string(sResultChars, 0, sResultCharsIndex);
    }


1

Try this:

    string str = "something \tis \nbetter than nothing";
    string removeChars = new String(new Char[] { '\n', '\t' });
    string newStr = new string(str.ToCharArray().Where(c => !removeChars.Contains(c)).ToArray());


0
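    // str is the input string from the question.
    // Note: Environment.NewLine is "\r\n" on Windows, so a lone '\r' or '\n' is not removed.
    str = str.Replace(Environment.NewLine, string.Empty).Replace("\t", string.Empty);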

1 Comment

This is no different than the SLOW version in the accepted answer. The OP is asking for the fastest.
