5

I'm looking for an elegant way to transform an std::string from something like:

std::string text = " a\t very \t ugly \t\t\t\t string "; 

To:

std::string text = "a very ugly string"; 

I've already trimmed the external whitespace with boost::trim(text);

[edit] That is, any run of spaces and tabs should be reduced to a single space. [/edit]

Removing the external whitespace is trivial. But is there an elegant way of removing the internal whitespace that doesn't involve manual iteration and comparison of previous and next characters? Perhaps something in boost I have missed?

3
  • Just a note, I've not really used boost::split and boost::join, but the obvious way to write this in Python is ' '.join(text.split()), and something similar should be possible. It's not necessarily as efficient as something that copies the bytes straight to their final location, but it's concise and clear. Commented Feb 19, 2012 at 19:01
  • Yeah; split and join work great if you don't mind copying; if you are worried about efficiency (in this case), writing your own loop is probably best. Commented Feb 19, 2012 at 19:29
  • @Marshall: I'm working on the basis that the question says, "elegant", not "fast but ugly" ;-) Commented Feb 20, 2012 at 9:50
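For reference, the hand-written loop mentioned in the comments above might look roughly like the sketch below. It is only an illustration under a few assumptions: the helper name `collapse_whitespace` is made up here, "whitespace" means whatever `std::isspace` reports in the current C locale, and leading and trailing whitespace is dropped in the same pass instead of by a separate `boost::trim` call.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not from the thread): collapses every run of
// whitespace to one space and drops leading/trailing whitespace,
// in place and in a single pass.
void collapse_whitespace(std::string& text) {
    std::string::size_type out = 0;  // next write position
    bool pending_space = false;      // whitespace seen since the last word character

    for (char ch : text) {
        if (std::isspace(static_cast<unsigned char>(ch))) {
            pending_space = true;
        } else {
            // Emit one separator, but never before the first word.
            if (pending_space && out > 0) text[out++] = ' ';
            text[out++] = ch;
            pending_space = false;
        }
    }
    text.resize(out);  // cut off the leftover tail
}
```

The write position never overtakes the read position, so no second buffer is needed.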

6 Answers

8

You can use std::unique with std::remove along with ::isspace to compress multiple whitespace characters into single spaces:

std::remove(std::unique(std::begin(text), std::end(text), [](char c, char c2) { return ::isspace(c) && ::isspace(c2); }), std::end(text)); 

5 Comments

It will not solve his problem. text also contains '\t', which is not equal to ' '.
Won't this also do things like "letting" -> "leting" and skip over ` \t` pairs?
Whoops fixed it again, previously it wouldn't combine, for instance, a space and a tab next to each other, but now it does.
Doesn't this result in "a\tvery ugly string" for the sample input, which is wrong? You could add a pass of transform (or maybe a boost::transform_iterator?) to replace all whitespace with space characters, but sometimes it's OK to give up and write a loop ;-)
Why std::remove? You need std::replace_if after std::unique to replace \t characters with ' ' and it still wouldn't remove the leading and trailing whitespaces. This answer doesn't do what the OP asked.
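Putting the suggestions from these comments together, one possible shape is sketched below: `std::replace_if` first turns every whitespace character into a plain space, `std::unique` then collapses adjacent spaces, and `erase` (not `std::remove`) shortens the string. The function name `squeeze_spaces` is an assumption made for this example, and the final trimming step stands in for the `boost::trim` call from the question.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper combining the suggestions from the comments above.
std::string squeeze_spaces(std::string text) {
    auto is_space = [](unsigned char c) { return std::isspace(c) != 0; };

    // 1. Turn every whitespace character (tab, newline, ...) into ' '.
    std::replace_if(text.begin(), text.end(), is_space, ' ');

    // 2. Collapse adjacent spaces. std::unique only shifts elements forward,
    //    so erase() is what actually shortens the string.
    text.erase(std::unique(text.begin(), text.end(),
                           [](char a, char b) { return a == ' ' && b == ' '; }),
               text.end());

    // 3. At most one space is left at each end; drop it.
    if (!text.empty() && text.front() == ' ') text.erase(text.begin());
    if (!text.empty() && text.back() == ' ') text.pop_back();
    return text;
}
```

Because the predicate only fires when both neighbours are spaces, doubled letters such as the "tt" in "letting" are left alone.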
7
std::istringstream iss(text);
text = "";
std::string s;
while (iss >> s) {
    if (text != "")
        text += " " + s;
    else
        text = s;
}
// use text, extra whitespaces are removed from it

6 Comments

Ah, interesting way of doing it, +1, though I've no idea which is more efficient between yours and mine (or that it matters for small strings or "cold" areas of code)
I think text.append(" " + s); in the else-block would be a little bit faster.
That wouldn't do the same thing would it? (Right now it overwrites what was there before with operator= but append would be like changing it to +=; I think it might be a typo in the original code)
@SethCarnegie: But that is what we want. Sorry, it was supposed to be +=, rather than +. I don't know why people voted it when it was not entirely correct :P
Also a pedantic note, it'd probably be better to do if (!text.empty()) than if (text != "")
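With the points from these comments folded in (`empty()` instead of comparing against `""`, and appending in both branches), the stream-based approach could be wrapped up roughly as follows. The function name `normalize_spaces` is invented for this sketch; the technique itself is the one from the answer above.

```cpp
#include <sstream>
#include <string>

// Hypothetical wrapper around the istringstream approach from the answer.
std::string normalize_spaces(const std::string& input) {
    std::istringstream iss(input);
    std::string result;
    std::string word;

    // operator>> skips any amount of leading whitespace before each word,
    // so tabs, runs of spaces and the outer padding all disappear here.
    while (iss >> word) {
        if (!result.empty()) result += ' ';  // separator between words only
        result += word;
    }
    return result;
}
```

Since extraction already skips leading and trailing whitespace, no separate trim step should be needed with this variant.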
5
#include <boost/algorithm/string/trim_all.hpp>

std::string s;
boost::algorithm::trim_all(s);


4

Most of what I'd do is similar to what @Nawaz already posted -- read strings from an istringstream to get the data without whitespace, and then insert a single space between each of those strings. However, I'd use an infix_ostream_iterator from a previous answer to get (IMO) slightly cleaner/clearer code.

std::istringstream buffer(input);
std::copy(std::istream_iterator<std::string>(buffer),
          std::istream_iterator<std::string>(),
          infix_ostream_iterator<std::string>(result, " "));
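`infix_ostream_iterator` is not part of the standard library, so the snippet above does not compile on its own. The following is a minimal stand-in written for this example; it is an assumption about what such an iterator might look like, not the implementation from the referenced answer. The wrapper `join_words` is likewise a made-up name, and it uses a `std::ostringstream` as the `result` stream.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>

// Minimal illustrative stand-in for infix_ostream_iterator: writes the
// delimiter between items rather than after each one.
template <class T>
class infix_ostream_iterator {
public:
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    infix_ostream_iterator(std::ostream& os, const char* delimiter)
        : os_(&os), delimiter_(delimiter) {}

    // Write the delimiter before every item except the first.
    infix_ostream_iterator& operator=(const T& item) {
        if (!first_) *os_ << delimiter_;
        *os_ << item;
        first_ = false;
        return *this;
    }

    // Dereference and increment are no-ops, as with std::ostream_iterator.
    infix_ostream_iterator& operator*() { return *this; }
    infix_ostream_iterator& operator++() { return *this; }
    infix_ostream_iterator& operator++(int) { return *this; }

private:
    std::ostream* os_;
    const char* delimiter_;
    bool first_ = true;
};

// Hypothetical wrapper showing the answer's std::copy call in context.
std::string join_words(const std::string& input) {
    std::istringstream buffer(input);
    std::ostringstream result;
    std::copy(std::istream_iterator<std::string>(buffer),
              std::istream_iterator<std::string>(),
              infix_ostream_iterator<std::string>(result, " "));
    return result.str();
}
```

A plain `std::ostream_iterator` would also work here but leaves a trailing delimiter after the last word, which is the problem the infix variant avoids.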


1

If you check out https://svn.boost.org/trac/boost/ticket/1808, you'll see a request for (almost) this exact functionality, and a suggested implementation:

std::string trim_all(const std::string& str) {
    return boost::algorithm::find_format_all_copy(
        boost::trim_copy(str),
        boost::algorithm::token_finder(boost::is_space(), boost::algorithm::token_compress_on),
        boost::algorithm::const_formatter(" "));
}

1 Comment

Tried adding a code block but no luck.. adding an answer, but this is the right track I think.
0

Here is a possible version using regular expressions. My GCC 4.6 doesn't have regex_replace yet, but Boost.Regex can serve as a drop-in replacement:

#include <string>
#include <iostream>
// #include <regex>
#include <boost/regex.hpp>
#include <boost/algorithm/string/trim.hpp>

int main() {
    using namespace std;
    using namespace boost;
    string text = " a\t very \t ugly \t\t\t\t string ";
    trim(text);
    regex pattern{"[[:space:]]+", regex_constants::egrep};
    string result = regex_replace(text, pattern, " ");
    cout << result << endl;
}
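On a standard library that ships a working `<regex>`, roughly the same idea can be written without Boost. The sketch below makes a few assumptions: the function name `collapse_with_regex` is invented, the default ECMAScript grammar with `\s+` is used instead of the `egrep` pattern above, and the `boost::trim` call is replaced by stripping the single space that can remain at each end.

```cpp
#include <regex>
#include <string>

// Hypothetical std::regex counterpart of the Boost.Regex example above.
std::string collapse_with_regex(const std::string& input) {
    // One or more whitespace characters (ECMAScript grammar, the default).
    static const std::regex runs("\\s+");

    // Every run of whitespace becomes exactly one space.
    std::string result = std::regex_replace(input, runs, " ");

    // Leading/trailing runs have become single spaces; remove them.
    if (!result.empty() && result.front() == ' ') result.erase(0, 1);
    if (!result.empty() && result.back() == ' ') result.pop_back();
    return result;
}
```

Constructing a `std::regex` is comparatively expensive, which is why the pattern is kept in a function-local static here.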

