28

I'm trying to fetch multiple email addresses, separated by "," within a single string, from a database table, but the result also contains whitespace, and I want to remove it quickly.

The following code does remove the whitespace, but it becomes slow whenever I fetch a large number of email addresses in one string (around 30,000) and then try to remove the whitespace between them. It takes more than four to five minutes to remove those spaces.

    Regex MultipleSpaces = new Regex(@"\s+", RegexOptions.Compiled);
    txtEmailID.Text = MultipleSpaces.Replace(emailaddress, "");

Could anyone please tell me how I can remove the whitespace within a second, even for a large number of email addresses?

5
  • Whitespace != spaces (the former is broader and includes e.g. line breaks). Commented Mar 5, 2011 at 11:49
  • stackoverflow.com/q/1120198/102112 Commented Mar 5, 2011 at 12:37
  • I have a question: are you removing spaces from the whole string (i.e. the one containing the comma-separated emails), or from each address one by one? Commented Mar 5, 2011 at 13:18
  • stackoverflow.com/questions/3501721/… Commented Oct 26, 2016 at 8:52
  • You can find the best answer here; visit it to see the solution. Commented Oct 26, 2016 at 8:54
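Regarding the question in the comments (whole string vs. each address): if the goal is only to clean up the list rather than strip every whitespace character, a minimal sketch could split on commas, trim each address and join them again. The class and method names are hypothetical, and the sketch assumes the addresses themselves never contain inner whitespace.

```csharp
using System;
using System.Linq;

public static class EmailListCleaner
{
    // Hypothetical helper: treats the input as a comma-separated list,
    // trims whitespace around each address, drops empty entries,
    // and joins the result back together with commas.
    public static string NormalizeEmailList(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        string[] parts = input.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(",", parts
            .Select(p => p.Trim())      // Trim only touches the ends of each address
            .Where(p => p.Length > 0)); // skip entries that were pure whitespace
    }
}
```

For example, NormalizeEmailList(" a@x.com , b@y.com ") gives "a@x.com,b@y.com".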

13 Answers

48

I would build a custom extension method using StringBuilder, like:

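    using System.Collections.Generic;
    using System.Linq; // Enumerable.Contains
    using System.Text;

    // Must be declared inside a non-nested static class.
    public static string ExceptChars(this string str, IEnumerable<char> toExclude)
    {
        StringBuilder sb = new StringBuilder(str.Length);
        for (int i = 0; i < str.Length; i++)
        {
            char c = str[i];
            // Enumerable.Contains defers to ICollection<char>.Contains when available,
            // so passing a HashSet<char> gives constant-time lookups.
            if (!toExclude.Contains(c))
                sb.Append(c);
        }
        return sb.ToString();
    }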

Usage:

    var str = s.ExceptChars(new[] { ' ', '\t', '\n', '\r' });

or to be even faster:

    var str = s.ExceptChars(new HashSet<char>(new[] { ' ', '\t', '\n', '\r' }));

With the HashSet version, a string of 11 million characters takes less than 700 ms (and that's in debug mode).

EDIT:

The previous code is generic and lets you exclude any character, but if you just want to remove blanks as fast as possible, you can use:

    public static string ExceptBlanks(this string str)
    {
        StringBuilder sb = new StringBuilder(str.Length);
        for (int i = 0; i < str.Length; i++)
        {
            char c = str[i];
            switch (c)
            {
                case '\r':
                case '\n':
                case '\t':
                case ' ':
                    continue; // skip blanks
                default:
                    sb.Append(c);
                    break;
            }
        }
        return sb.ToString();
    }

EDIT 2:

As correctly pointed out in the comments, the right way to remove all whitespace is to use the char.IsWhiteSpace method:

    public static string ExceptBlanks(this string str)
    {
        StringBuilder sb = new StringBuilder(str.Length);
        for (int i = 0; i < str.Length; i++)
        {
            char c = str[i];
            if (!char.IsWhiteSpace(c))
                sb.Append(c);
        }
        return sb.ToString();
    }

11 Comments

You can create a lightning-fast lookup table for this solution: byte[] hash = new byte[256]; if you want to exclude \t, set hash['\t'] = 1 and then check it the same way. But it will only work for ASCII :)
Yes, that would be really fast. Otherwise, if you just want to remove blanks from a string, you can use a switch directly in the function and skip IEnumerable.Contains :)
Hmmm...nice solution, better than mine :)
Even better would be to use StringBuilder sb = new StringBuilder(str.Length);
Another thing to consider: if the given string does not contain any of those whitespace characters, there is no need to append each char to a StringBuilder! We can use a flag to detect when the first whitespace char is found and only then start adding to a StringBuilder; otherwise we can just return the input string itself. This can improve performance, especially when the given strings usually don't contain the characters being searched for.
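The two optimizations suggested in these comments (a lookup table indexed by character code, and returning the input untouched when there is nothing to remove) can be combined. This is a sketch under stated assumptions: the class and method names are hypothetical, the table covers only codes 0 to 255 (so wider Unicode whitespace such as U+2003 is kept), and it drops only space, tab, CR and LF.

```csharp
using System.Text;

public static class LookupTableStripper
{
    // Lookup table indexed by char code: true means "drop this char".
    // It only covers codes 0-255, so anything above (e.g. U+2003 EM SPACE) is kept.
    private static readonly bool[] Excluded = BuildTable();

    private static bool[] BuildTable()
    {
        var table = new bool[256];
        table[' '] = true;
        table['\t'] = true;
        table['\n'] = true;
        table['\r'] = true;
        return table;
    }

    private static bool IsExcluded(char c) => c < Excluded.Length && Excluded[c];

    public static string ExceptTableChars(this string str)
    {
        // Find the first character to drop; if there is none, return the input as-is.
        int first = 0;
        while (first < str.Length && !IsExcluded(str[first]))
            first++;
        if (first == str.Length)
            return str;

        var sb = new StringBuilder(str.Length);
        sb.Append(str, 0, first); // this prefix is known to be clean
        for (int i = first + 1; i < str.Length; i++)
        {
            char c = str[i];
            if (!IsExcluded(c))
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

Returning the same instance for already-clean input means no allocation at all in that case, which matters most when most strings contain nothing to strip.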
15

Given that string.Replace is implemented in C++ as part of the CLR runtime, I'm willing to bet that

    email.Replace(" ", "").Replace("\t", "").Replace("\n", "").Replace("\r", "");

will be the fastest implementation. If you need every type of whitespace, you can supply the hex value of the Unicode equivalent.
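A sketch of the chained Replace approach, extended with one non-ASCII whitespace character written as a Unicode escape (\u00A0, the no-break space, picked here only as an example; other Unicode spaces would each need their own Replace call). The class and method names are hypothetical.

```csharp
public static class ReplaceChainStripper
{
    // Chained string.Replace calls: each call scans the whole string and may
    // allocate a new one, so this can create several intermediate strings.
    // "\u00A0" (no-break space) is a Unicode whitespace character that the
    // four ASCII replacements would miss.
    public static string StripCommonWhitespace(string email)
    {
        return email
            .Replace(" ", "")
            .Replace("\t", "")
            .Replace("\n", "")
            .Replace("\r", "")
            .Replace("\u00A0", "");
    }
}
```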

3 Comments

Yes, that's really fast, but it creates 4 strings instead of 1. That slows things down a bit for long strings; a custom implementation using StringBuilder is faster than this.
@digEmAll It's an email address though, so not really memory-intensive. I'd agree if it were a large 1k text file
As far as I understood it, it's a single string with a lot of comma-separated emails... but to be honest, I'm not sure...
5

With LINQ you can do it simply:

    emailaddress = new String(emailaddress
        .Where(x => x != ' ' && x != '\r' && x != '\n')
        .ToArray());

I didn't compare it with the StringBuilder approaches, but it is much faster than string-based approaches, because it does not create many copies of the string (strings are immutable, so manipulating them directly causes dramatic memory and speed problems). So it won't use much memory and won't slow things down (apart from one extra pass through the string at first).

7 Comments

I really doubt that it is fast
@Andrey: It should have linear running time, and construct the array only once. Common Regex problems involve non-linear running time, and common string replacement problems involve repeatedly copying the string. Why wouldn't this solution be fast, compared to a Regex w/ string replace? The only thing I can think of would be function call overhead. Without profiling both, it's speculation.
@digEmAll, yes, I've fixed it :) Funny mistake.
@Andrey: yes, it's really fast. The only little problem is that it needs to pass through a throw-away array.
As mentioned elsewhere, x => !Char.IsWhiteSpace(x) is preferred. This LINQ command is my chosen solution for a similar problem. Thanks!
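The comments above note that the LINQ version goes through a throw-away array before building the string. Assuming .NET Core 2.1 or later (string.Create does not exist in .NET Framework), a sketch that avoids that intermediate array at the cost of a second counting pass could look like this. The class and method names are hypothetical, and whether the extra pass pays off is something to measure rather than assume.

```csharp
using System;

public static class CreateStripper
{
    // Requires .NET Core 2.1+ / .NET 5+ for string.Create.
    public static string StripWhitespace(string input)
    {
        // First pass: count the characters that will survive.
        int kept = 0;
        foreach (char c in input)
            if (!char.IsWhiteSpace(c)) kept++;

        if (kept == input.Length)
            return input; // nothing to remove, no allocation

        // Second pass: write the surviving characters straight into the new string.
        return string.Create(kept, input, (span, src) =>
        {
            int pos = 0;
            foreach (char c in src)
                if (!char.IsWhiteSpace(c)) span[pos++] = c;
        });
    }
}
```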
4

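You should try String.Trim(). It trims whitespace from the start and the end of a string (not from the middle).

Or you can try this method from the linked topic: [link]

    // Requires compiling with unsafe code enabled (/unsafe or AllowUnsafeBlocks).
    // Caution: stackalloc reserves len chars on the stack, so a very long input can overflow it.
    // As its name says, it strips tabs and line breaks but keeps ordinary spaces.
    public static unsafe string StripTabsAndNewlines(string s)
    {
        int len = s.Length;
        char* newChars = stackalloc char[len];
        char* currentChar = newChars;
        for (int i = 0; i < len; ++i)
        {
            char c = s[i];
            switch (c)
            {
                case '\r':
                case '\n':
                case '\t':
                    continue;
                default:
                    *currentChar++ = c;
                    break;
            }
        }
        return new string(newChars, 0, (int)(currentChar - newChars));
    }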

3 Comments

Well, you should have very serious reasons for introducing unsafe code into safe code. Cleaning a string is definitely not one of them.
I think that 4-5 minutes for a simple action is unacceptable. It can be much faster.
There's no need to use pointers; also, Char.IsWhiteSpace(c) is a better solution than a switch.
4
    emailaddress = emailaddress.Replace(" ", string.Empty);


2

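There are many different ways, some faster than others:

    using System;
    using System.Linq;
    using System.Text;
    using System.Text.RegularExpressions;

    // Declare these inside a static class.
    // StringBuilder (fast)
    public static string StripWhitespaceWithBuilder(this string str)
    {
        StringBuilder sb = new StringBuilder(str.Length);
        for (int i = 0; i < str.Length; i++)
        {
            if (!Char.IsWhiteSpace(str[i]))
                sb.Append(str[i]);
        }
        return sb.ToString();
    }

    // LINQ (faster?)
    public static string StripWhitespaceWithLinq(this string str)
    {
        return new string(str.Where(c => !Char.IsWhiteSpace(c)).ToArray());
    }

    // Regex (slow)
    public static string StripWhitespaceWithRegex(this string str)
    {
        return Regex.Replace(str, @"\s+", "");
    }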


2

Please use the TrimEnd() method of the String class. You can find a great example here.

1 Comment

The Trim* methods of the String class only trim off the beginning or the end.
2

You should consider replacing spaces in the record set within your stored procedure or query using the REPLACE() function if possible, and, even better, fix your DB records, since a space in an email address is invalid anyway.

As mentioned by others, you would need to profile the different approaches. If you are using Regex, you should at least make it a class-level static variable:

    public static readonly Regex MultipleSpaces = new Regex(@"\s+", RegexOptions.Compiled);
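A minimal sketch of the static-field advice, assuming the pattern from the question. The class and method names are hypothetical; the point is that the Regex instance is built once and reused instead of being constructed (and, with RegexOptions.Compiled, compiled) again on every call.

```csharp
using System.Text.RegularExpressions;

public static class RegexStripper
{
    // Built once when the class is initialized and then reused;
    // RegexOptions.Compiled pays a one-time compilation cost, which only
    // makes sense when the same instance is used many times.
    private static readonly Regex MultipleSpaces = new Regex(@"\s+", RegexOptions.Compiled);

    public static string StripWhitespace(string input) => MultipleSpaces.Replace(input, string.Empty);
}
```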

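new string(emailAddress.Where(x => x != ' ').ToArray()) is likely to have some function-call overhead, although Microsoft could optimize it by inlining; again, profiling will give you the answer. (Calling .ToString() directly on the Where(...) result would not work: it returns the enumerable's type name rather than the filtered text.)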

The most efficient method would be to allocate a buffer, copy the characters into it one by one, and skip the spaces as you go. C# does support pointers, so you could use unsafe code, allocate a raw buffer and use pointer arithmetic to copy just like in C, which is as fast as this can possibly be done. REPLACE() in SQL will handle it like that for you.
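A safe variant of the buffer idea, without pointers or unsafe code: one char[] sized to the input, one pass, one final string. This is a sketch rather than the answerer's code; the class and method names are hypothetical, and whether it matches the unsafe version's speed is something to profile rather than assume.

```csharp
public static class BufferStripper
{
    // Copy non-whitespace characters into a preallocated buffer,
    // then build the result from the filled part only.
    public static string StripWhitespace(string input)
    {
        char[] buffer = new char[input.Length]; // worst case: nothing is removed
        int count = 0;
        foreach (char c in input)
        {
            if (!char.IsWhiteSpace(c))
                buffer[count++] = c;
        }
        return new string(buffer, 0, count);
    }
}
```

Unlike the stackalloc version above, the buffer lives on the heap, so a very long input cannot overflow the stack.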


1
    string str = "Hi!! this is a bunch of text with spaces";
    MessageBox.Show(new String(str.Where(c => c != ' ').ToArray()));


1

I haven't done any performance testing on this, but it's simpler than most of the other answers. Calling Split() with no arguments splits on whitespace, so joining the pieces back together drops it:

    var s1 = "\tstring \r with \t\t \nwhitespace\r\n";
    var s2 = string.Join("", s1.Split());

The result is

    stringwithwhitespace
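A variant of the same idea, assuming you also want to skip the empty pieces that Split() produces between adjacent whitespace characters: passing a null separator array still means "split on whitespace", and StringSplitOptions.RemoveEmptyEntries drops the empty entries. Like the answer above, this is unmeasured; the class and method names are hypothetical.

```csharp
using System;

public static class SplitJoinStripper
{
    // A null separator array makes Split use whitespace characters as delimiters;
    // RemoveEmptyEntries drops the empty pieces between adjacent whitespace.
    public static string StripWhitespace(string input)
        => string.Concat(input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries));
}
```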


0
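    string input = yourInputString; // placeholder for your own input
    string newline = "";
    string[] strings = input.Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);
    foreach (string value in strings)
    {
        // Trims each line and skips blank ones; spaces inside a line are kept.
        string newv = value.Trim();
        if (newv.Length > 0)
            newline += newv + "\r\n";
    }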


0
    string s = " Your Text ";
    string result = s.Replace(" ", string.Empty);
    // Output:
    // "YourText"


0

The fastest and most general way to do this (line terminators and tabs are handled as well). Regex's powerful features aren't really needed for this problem, and Regex can decrease performance.

    new string(stringToRemoveWhiteSpaces
        .Where(c => !char.IsWhiteSpace(c))
        .ToArray<char>())
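Since several answers above rank the approaches ("fast", "faster?", "slow") without numbers, here is a rough sketch for comparing three of them on your own data. The sample generator, class and method names are hypothetical, and a single Stopwatch run without warm-up is only a crude indication; a dedicated benchmarking tool would give more reliable numbers.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public static class WhitespaceBenchmark
{
    // Builds a synthetic comma-separated list (hypothetical data)
    // with whitespace around each address.
    public static string BuildSample(int count)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            if (i > 0) sb.Append(", ");
            sb.Append("user").Append(i).Append("@example.com \r\n");
        }
        return sb.ToString();
    }

    public static string ViaRegex(string s) => Regex.Replace(s, @"\s+", string.Empty);

    public static string ViaLinq(string s) => new string(s.Where(c => !char.IsWhiteSpace(c)).ToArray());

    public static string ViaBuilder(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
            if (!char.IsWhiteSpace(c)) sb.Append(c);
        return sb.ToString();
    }

    // Times a single call of the given method: a crude measurement
    // (no warm-up, no repetitions), not a proper benchmark.
    public static TimeSpan Time(Func<string, string> method, string input, out string result)
    {
        var sw = Stopwatch.StartNew();
        result = method(input);
        sw.Stop();
        return sw.Elapsed;
    }
}
```

For example, WhitespaceBenchmark.Time(WhitespaceBenchmark.ViaRegex, WhitespaceBenchmark.BuildSample(30000), out string cleaned) returns the elapsed time of the regex variant. If none of the variants is slow on your real data, the minutes are probably being spent somewhere other than the whitespace removal, which is worth checking with a profiler.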

