19

I have a series of strings stored in a single array, separated by nulls (for example ['f', 'o', 'o', '\0', 'b', 'a', 'r', '\0'...]), and I need to split this into a std::vector<std::string> or similar.

I could just write a 10-line loop to do this using std::find or strlen (in fact I just did), but I'm wondering if there is a simpler/more elegant way to do it, for example some STL algorithm I've overlooked, which can be coaxed into doing this.

It is a fairly simple task, and it wouldn't surprise me if there's some clever STL trickery that can be applied to make it even simpler.

Any takers?

4
  • some of the answers found here (stackoverflow.com/questions/236129/how-to-split-a-string-in-c) can be applied to your problem, be sure to have a look. Commented Aug 30, 2011 at 13:17
  • This isn't really a duplicate since these strings are null-terminated, and all c-string algorithms apply (calling constructors on raw pointers in the buffer, using strlen, etc.) Commented Aug 30, 2011 at 13:18
  • Problem: how do you know when to stop? Or do you know the length of the array / have a sentinel (e.g. two consecutive zero chars)? Commented Aug 30, 2011 at 13:22
  • @Konrad: I know where the last string ends. You can assume that it is the end of the buffer, or two consecutive nulls. Both can be arranged trivially. :) Commented Aug 30, 2011 at 13:27

7 Answers

37

My two cents:

    const char* p = str;
    std::vector<std::string> vector;
    do {
        vector.push_back(std::string(p));
        p += vector.back().size() + 1;
    } while ( // whatever condition applies
            );

2 Comments

That's really nice! Just the choice of vector as a variable name is, IMO, not so clever.
Since the list is double-NUL terminated, it can't contain empty strings (an empty string would look like the end of the list, e.g. "abc\0\0def\0ghi\0"). So you can just put the while at the top to deal correctly with an empty list: while (*p) { ... }
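The `while (*p)` suggestion from the comment above can be sketched roughly as follows. This is only a sketch: `split_double_nul` is a hypothetical name, and it assumes the buffer really does end with an empty string (two consecutive nulls), as the asker said could be arranged.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): splits a double-NUL-terminated
// list such as "abc\0def\0\0". Assumes the list really ends with an empty string.
std::vector<std::string> split_double_nul(const char* p) {
    std::vector<std::string> out;
    while (*p) {                        // an empty string marks the end of the list
        out.push_back(std::string(p));  // copies up to the next '\0'
        p += out.back().size() + 1;     // skip the string and its terminator
    }
    return out;
}
```

Because the test sits at the top of the loop, an empty list (a buffer that starts with '\0') yields an empty vector instead of one empty string.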
9

Boost solution:

    #include <boost/algorithm/string.hpp>

    std::vector<std::string> strs;
    // input_array must be a Range containing the input.
    boost::split(strs, input_array, boost::is_any_of(boost::as_array("\0")));

6 Comments

Yeah, that should work, but relies on Boost. For something as simple as this, I'd prefer a standard-library-only solution
In cases where a pistol would be enough, a bazooka would likely get you both killed.
@Xaade - A lot of projects are using boost already, so it is not a big deal. A solution with similar code complexity that just uses Standard C++ components would obviously be better. (As a side note, this was actually quite difficult to get right, due to the requirement for boost::as_array.)
@Xaade: If a Leopard 2 tank is approaching you with an open hatch, a pistol would be enough, too. But it would require a lot of skill to do it right, the bazooka is more safe.
@phresnel Obviously a pistol isn't enough if the user isn't skilled. However, this is code, and if the code works every time, then no need for something bigger.
6

The following relies on std::string having an implicit constructor taking a const char*, making the loop a very simple two-liner:

    #include <iostream>
    #include <string>
    #include <vector>

    template< std::size_t N >
    std::vector<std::string> split_buffer(const char (&buf)[N])
    {
        std::vector<std::string> result;
        for(const char* p=buf; p!=buf+sizeof(buf); p+=result.back().size()+1)
            result.push_back(p);
        return result;
    }

    int main()
    {
        std::vector<std::string> test = split_buffer("wrgl\0brgl\0frgl\0srgl\0zrgl");
        for (auto it = test.begin(); it != test.end(); ++it)
            std::cout << '"' << *it << "\"\n";
        return 0;
    }

This solution assumes that the buffer's size is known and that the end of the buffer marks the end of the list of strings. If the list is terminated by "\0\0" instead, the condition in the loop needs to be changed from p!=buf+sizeof(buf) to *p.


2

A more elegant, real solution (compared to my other answer) uses getline. It boils down to two lines using only C++03, and requires no manual loop bookkeeping or loop conditions:

    #include <iostream>
    #include <sstream>
    #include <string>

    int main()
    {
        const char foo[] = "meh\0heh\0foo\0bar\0frob";

        std::istringstream ss (std::string(foo, foo + sizeof foo));
        std::string str;

        while (getline (ss, str, '\0'))
            std::cout << str << '\n';
    }

However, note how the range-based string constructor already points to an inherent problem with splitting at '\0's: you must know the exact size, or find some other char combination for the Ultimate Terminator.
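To make the size problem concrete, here is a small sketch contrasting the two std::string constructors involved. The helper names `length_as_c_string` and `length_as_range` are made up for illustration and do not appear in the answer.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers, for illustration only.

// Length seen by the const char* constructor: it stops at the first '\0'.
std::size_t length_as_c_string(const char* data) {
    return std::string(data).size();
}

// Length seen by the range constructor: it keeps every byte in [data, data + n),
// embedded nulls included, which is why the exact size must be known.
std::size_t length_as_range(const char* data, std::size_t n) {
    return std::string(data, data + n).size();
}
```

For `const char foo[] = "meh\0heh";`, the first helper sees only "meh", while the second, given `sizeof foo`, sees the whole array including both nulls.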

3 Comments

Yep, fortunately I do know the exact size. The data comes from a Microsoft API, so I'm stuck with the null-separated format. :)
I don't like the fact that this initializes and reads from a stream, but I like the simplicity of the loop. I wish we'd find a loop as simple as that that operates directly on the data, rather than copying into a stream.
True :) On the other hand, you could see the stream as a (slightly overblown) holder of the state we had to maintain manually.
2

Here's the solution I came up with myself, assuming the buffer ends immediately after the last string:

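    #include <algorithm>
    #include <string>
    #include <vector>

    std::vector<std::string> split(const std::vector<char>& buf)
    {
        std::vector<std::string> drives;  // result container
        auto cur = buf.begin();
        while (cur != buf.end()) {
            // find the terminator of the current string
            auto next = std::find(cur, buf.end(), '\0');
            drives.push_back(std::string(cur, next));
            cur = next + 1;  // relies on the buffer ending with a '\0'
        }
        return drives;
    }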
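The loop relies on the stated assumption that the buffer ends right after the last string's terminator; otherwise stepping one past the position returned by std::find would go beyond end(). A hedged variant that tolerates a missing final '\0' might look like this (`split_tolerant` is a hypothetical name, not the asker's code):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical variant: same std::find approach, but it does not step past
// end() when the last string lacks its terminating '\0'.
std::vector<std::string> split_tolerant(const std::vector<char>& buf) {
    std::vector<std::string> out;
    auto cur = buf.begin();
    while (cur != buf.end()) {
        auto next = std::find(cur, buf.end(), '\0');
        out.push_back(std::string(cur, next));
        if (next == buf.end()) break;  // unterminated final string: stop here
        cur = next + 1;                // skip the terminator
    }
    return out;
}
```

The extra check costs one comparison per string and only matters when the input format is not fully under your control.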


1

A bad answer, actually, but I doubted your claim of a 10-line loop for manual splitting. Four lines do it for me:

    #include <vector>
    #include <iostream>

    int main()
    {
        using std::vector;
        const char foo[] = "meh\0heh\0foo\0bar\0frob";

        vector<vector<char> > strings(1);
        for (const char *it=foo, *end=foo+sizeof(foo); it!=end; ++it) {
            strings.back().push_back(*it);
            if (*it == '\0') strings.push_back(vector<char>());
        }
        // drop the empty vector opened after the final '\0';
        // printing its data() would not be safe
        if (strings.back().empty()) strings.pop_back();

        std::cout << "number of strings: " << strings.size() << '\n';
        for (vector<vector<char> >::iterator it=strings.begin(), end=strings.end();
             it!=end; ++it)
            std::cout << it->data() << '\n';
    }


-9

In C, string.h has this guy:

char * strtok ( char * str, const char * delimiters ); 

The example on cplusplus.com:

    /* strtok example */
    #include <stdio.h>
    #include <string.h>

    int main ()
    {
        char str[] ="- This, a sample string.";
        char * pch;
        printf ("Splitting string \"%s\" into tokens:\n",str);
        pch = strtok (str," ,.-");
        while (pch != NULL)
        {
            printf ("%s\n",pch);
            pch = strtok (NULL, " ,.-");
        }
        return 0;
    }

It's not C++, but it will work

3 Comments

strtok is really, really bad. Don't ever use it.
It will not work because the strings are null-separated. strtok only works on strings without null separation.
How can you tell strtok to split at \0's ?
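The objection in these comments can be checked with a short sketch. The delimiter argument of strtok is itself a null-terminated string, so passing "\0" is the same as passing an empty delimiter set. `count_strtok_tokens` below is a hypothetical helper written only to illustrate this.

```cpp
#include <cstring>

// Hypothetical helper: counts the tokens strtok finds when asked to split at "\0".
// The delimiter argument is a C string, so "\0" is read as an empty delimiter set.
int count_strtok_tokens(char* buf) {
    int count = 0;
    for (char* tok = std::strtok(buf, "\0"); tok != nullptr;
         tok = std::strtok(nullptr, "\0")) {
        ++count;  // with no delimiters, the token runs to the first '\0'
    }
    return count;
}
```

Given a buffer such as `char buf[] = "foo\0bar\0baz";`, strtok treats the first null as the end of its input, so only "foo" is ever returned and the remaining strings are never visited.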
