91

I would simply like to split a string into an array using a character as the split delimiter (much like C#'s famous .Split() function). I can of course apply the brute-force approach, but I wonder if there is anything better than that.

So far I've searched, and probably the closest solution is strtok(); however, due to its inconvenience (converting your string to a char array etc.) I do not like using it. Is there an easier way to implement this?

Note: I wanted to emphasize this because people might ask "How come brute-force doesn't work". My brute-force solution was to create a loop and use the substr() function inside. However, since it requires the starting point and the length, it fails when I want to split a date, because the user might enter it as 7/12/2012 or 07/3/2011, where I can't really tell the length before calculating the next location of the '/' delimiter.

3
  • possible duplicate of Splitting String C++ Commented Apr 8, 2012 at 6:56
  • Does this answer your question? How do I iterate over the words of a string? Commented Jul 28, 2021 at 10:00
  • A bit late but: Don't focus much on your own way of parsing dates as it fails when you want to internationalize your product. Written 2024-12-03. Commented Dec 3, 2024 at 14:32

18 Answers

181

Using vectors, strings and stringstream. A tad cumbersome but it does the trick.

    #include <string>
    #include <vector>
    #include <sstream>

    std::stringstream test("this_is_a_test_string");
    std::string segment;
    std::vector<std::string> seglist;

    while(std::getline(test, segment, '_'))
    {
        seglist.push_back(segment);
    }

Which results in a vector with the same contents as

std::vector<std::string> seglist{ "this", "is", "a", "test", "string" }; 

5 Comments

Actually this kind of approach is exactly what I'm looking for. Quite easy to understand, no usage of external libraries, just very straightforward. Thanks @thelazydeveloper!
If you want to improve performance, you can add seglist.reserve(std::count_if(str.begin(), str.end(), [&](char c) { return c == splitChar; }) + (str.empty() ? 0 : 1)); if the original string to split is stored in str.
Instead of while (std::getline(test, segment, '_')) it might be better to do while (!std::getline(test, segment, '_').eof()).
If I put "__", it adds an empty string to the vector. That's how I wanted it. Thanks.
This does not play well with empty sections. Splitting the empty string like this results in an empty vector, i.e. "" -> {} instead of {""}. If the string ends with a delimiter, the last empty string won't be part of the result, i.e. "a_b_" -> {"a", "b"} instead of {"a", "b", ""}. Both of those things may or may not be what you want, but they are definitely unexpected.
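
For readers who do want the empty sections described in the comment above, one possible approach is to search with std::string::find instead of std::getline. This is only a sketch under that assumption; the name split_keep_empty is made up for illustration and does not come from the answer.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of the answer above): splits on a single
// character and keeps every empty field, including a trailing one.
std::vector<std::string> split_keep_empty(const std::string& str, char delim)
{
    std::vector<std::string> out;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type pos = str.find(delim, start);
        if (pos == std::string::npos) {
            // Last field; this may be an empty string.
            out.push_back(str.substr(start));
            break;
        }
        out.push_back(str.substr(start, pos - start));
        start = pos + 1; // continue right after the delimiter
    }
    return out;
}
```

With this variant, "" yields one empty field and "a_b_" yields "a", "b" and a final empty field, which matches the behaviour the comment expected.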
30

Boost has the split() you are seeking in algorithm/string.hpp:

    #include <string>
    #include <vector>
    #include <boost/algorithm/string.hpp>

    std::string sample = "07/3/2011";
    std::vector<std::string> strs;
    boost::split(strs, sample, boost::is_any_of("/"));


18

Another way (C++11/boost) for people who like RegEx. Personally I'm a big fan of RegEx for this kind of data. IMO it's far more powerful than simply splitting strings using a delimiter, since you can choose to be a lot smarter about what constitutes "valid" data if you wish.

    #include <string>
    #include <algorithm>    // copy
    #include <iterator>     // back_inserter
    #include <regex>        // regex, sregex_token_iterator
    #include <vector>

    int main()
    {
        std::string str = "08/04/2012";
        std::vector<std::string> tokens;
        std::regex re("\\d+");

        // start/end points of tokens in str
        std::sregex_token_iterator begin(str.begin(), str.end(), re), end;

        std::copy(begin, end, std::back_inserter(tokens));
    }

7 Comments

So you're including the entirety of a regex matcher in your code just to split a string. Sad...
@Dev No, including a regex matcher to be more intelligent about what constitutes valid data - e.g. select numbers, and also allowing other separators like dots or hyphens
This is bad both in terms of binary size and overall efficiency, but since those both aren't concerns whatsoever in this case I won't go on.
@Dev If one has such extreme constraints over binary size, then they should reconsider even using C++ at all, or at least its standard libraries like string/vector/etc because they will all have a similar effect. As for efficiency, the best advice would be from Donald Knuth -- "Premature optimisation is the root of all evil"; in other words, before making optimisations, the first task is to identify whether a problem even exists, and then identify the cause by objective means such as profiling rather than wasting time trying to hunt down every possible micro-optimisation.
@Dev Then I have to wonder what the purpose is in even bringing them up.
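
To illustrate the "more intelligent about what constitutes valid data" point from this thread, here is a rough sketch that accepts '/', '.' or '-' as the separator and rejects anything that does not look like a date. The function name parse_date_fields and the exact pattern are assumptions made for this example, not something the answer prescribes.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the three numeric fields in the order they
// were written, or an empty vector if the text does not match the pattern.
std::vector<std::string> parse_date_fields(const std::string& text)
{
    // 1-2 digits, a separator (/ . or -), 1-2 digits,
    // the same separator again (\2), then exactly 4 digits.
    static const std::regex re(R"((\d{1,2})([/.-])(\d{1,2})\2(\d{4}))");

    std::smatch m;
    if (!std::regex_match(text, m, re))
        return {};

    // Group 2 is the separator, so the fields are groups 1, 3 and 4.
    return {m[1].str(), m[3].str(), m[4].str()};
}
```

The pattern only checks the shape of the input; it does not verify that the numbers form a real calendar date.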
11

Since nobody has posted this yet: The solution is very simple using ranges. You can use a std::ranges::views::split to break up the input, and then transform the input into std::string or std::string_view elements.

    #include <ranges>
    ...
    // The input to transform
    const auto str = std::string{"Hello World"};

    // Function to transform a range into a std::string
    // Replace this with 'std::string_view' to make it a view instead.
    auto to_string = [](auto&& r) -> std::string {
        const auto data = &*r.begin();
        const auto size = static_cast<std::size_t>(std::ranges::distance(r));

        return std::string{data, size};
    };

    const auto range = str |
                       std::ranges::views::split(' ') |
                       std::ranges::views::transform(to_string);

    for (auto&& token : str | range) {
        // each 'token' is the split string
    }

This approach can realistically compose into just about anything, even a simple split function that returns a std::vector<std::string>:

    auto split(const std::string& str, char delimiter) -> std::vector<std::string>
    {
        const auto range = str |
                           std::ranges::views::split(delimiter) |
                           std::ranges::views::transform(to_string);

        return {std::ranges::begin(range), std::ranges::end(range)};
    }


4 Comments

1. Why do you use str | range instead of range? 2. Is transform with to_string necessary? It seems token can be declared as string_view so that transform is unnecessary. 3. split_view's begin and end functions are non-const, so it seems the program is ill-formed as the range for loop uses a const range.
Oh, for 2 I see, constructing a string_view from a range is a C++23 feature.
This is somewhat hard to read, not clear at all compared to the other answers
This is UB if the range is empty: &*r.begin(). Better to use std::string(r.data(), r.size()).
6

I inherently dislike stringstream, although I'm not sure why. Today, I wrote this function to allow splitting a std::string by any arbitrary character or string into a vector. I know this question is old, but I wanted to share an alternative way of splitting std::string.

This code omits the part of the string you split by from the results altogether, although it could be easily modified to include them.

    #include <string>
    #include <vector>

    void split(std::string str, std::string splitBy, std::vector<std::string>& tokens)
    {
        /* Store the original string in the array, so we can loop the rest
         * of the algorithm. */
        tokens.push_back(str);

        // Store the split index in a 'size_t' (unsigned integer) type.
        size_t splitAt;
        // Store the size of what we're splicing out.
        size_t splitLen = splitBy.size();
        // Create a string for temporarily storing the fragment we're processing.
        std::string frag;
        // Loop infinitely - break is internal.
        while(true)
        {
            /* Store the last string in the vector, which is the only logical
             * candidate for processing. */
            frag = tokens.back();
            /* The index where the split is. */
            splitAt = frag.find(splitBy);
            // If we didn't find a new split point...
            if(splitAt == std::string::npos)
            {
                // Break the loop and (implicitly) return.
                break;
            }
            /* Put everything from the left side of the split where the string
             * being processed used to be. */
            tokens.back() = frag.substr(0, splitAt);
            /* Push everything from the right side of the split to the next empty
             * index in the vector. */
            tokens.push_back(frag.substr(splitAt+splitLen, frag.size()-(splitAt+splitLen)));
        }
    }

To use, just call like so...

    std::string foo = "This is some string I want to split by spaces.";
    std::vector<std::string> results;
    split(foo, " ", results);

You can now access all the results in the vector at will. Simple as that - no stringstream, no third-party libraries, no dropping back to C!

2 Comments

Do you have any argument for why this would be better?
I'm not a big fan of some things in standard C++ either (such as the hideously verbose streams, but they're being replaced with fmtlib so I'm happy). But I tend to put those feelings aside when I can write far fewer lines of code: the chance of bugs is greatly reduced for a start.
5

Another possibility is to imbue a stream with a locale that uses a special ctype facet. A stream uses the ctype facet to determine what's "whitespace", which it treats as separators. With a ctype facet that classifies your separator character as whitespace, the reading can be pretty trivial. Here's one way to implement the facet:

    #include <locale>
    #include <vector>

    struct field_reader: std::ctype<char> {

        field_reader(): std::ctype<char>(get_table()) {}

        static std::ctype_base::mask const* get_table() {
            static std::vector<std::ctype_base::mask>
                rc(table_size, std::ctype_base::mask());

            // we'll assume dates are either a/b/c or a-b-c:
            rc['/'] = std::ctype_base::space;
            rc['-'] = std::ctype_base::space;
            return &rc[0];
        }
    };

We use that by using imbue to tell a stream to use a locale that includes it, then read the data from that stream:

    std::istringstream in("07/3/2011");
    in.imbue(std::locale(std::locale(), new field_reader));

With that in place, the splitting becomes almost trivial -- just initialize a vector using a couple of istream_iterators to read the pieces from the string (that's embedded in the istringstream):

    // 'fields' is just an illustrative name for the resulting vector
    std::vector<std::string> fields((std::istream_iterator<std::string>(in)),
                                    std::istream_iterator<std::string>());

Obviously this tends toward overkill if you only use it in one place. If you use it much, however, it can go a long ways toward keeping the rest of the code quite clean.
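
Put together, the pieces above might look roughly like the following self-contained sketch. The wrapper function split_date is a name invented for this example; the facet itself is the one described in the answer.

```cpp
#include <iterator>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Facet that classifies '/' and '-' (and nothing else) as whitespace.
struct field_reader : std::ctype<char> {
    field_reader() : std::ctype<char>(get_table()) {}

    static std::ctype_base::mask const* get_table() {
        static std::vector<std::ctype_base::mask>
            rc(table_size, std::ctype_base::mask());
        rc['/'] = std::ctype_base::space;
        rc['-'] = std::ctype_base::space;
        return &rc[0];
    }
};

// Hypothetical wrapper: reads the separated fields of `text` into a vector.
std::vector<std::string> split_date(const std::string& text)
{
    std::istringstream in(text);
    // The locale takes ownership of the facet, so it is not deleted here.
    in.imbue(std::locale(std::locale(), new field_reader));

    return std::vector<std::string>(std::istream_iterator<std::string>(in),
                                    std::istream_iterator<std::string>());
}
```

Because the separators are treated as whitespace, consecutive separators collapse and no empty fields are produced, and an ordinary space is no longer a separator with this table.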


4

Take a look at boost::tokenizer

If you'd like to roll up your own method, you can use std::string::find() to determine the splitting points.

1 Comment

Thank you for the string find tip. Always love hearing std solutions!
3

This code works for me and is easy to understand; it uses only a vector and std::string operations. The method uses find() and substr() to split the string. find() searches for the delimiter and returns the position of its first occurrence. substr() extracts a substring given a start position and a length. We loop over the input, finding each occurrence of the delimiter and extracting the substring from the start of the input up to the delimiter. This substring is pushed into a vector of strings, and the consumed part is erased from the input. Whatever remains after the last delimiter becomes the final token, and the function returns the vector.

    #include <iostream>
    #include <vector>
    #include <string>

    using namespace std;

    vector<string> split(string input, string delimiter){
        vector<string> tokens;
        size_t pos = 0;
        string token;

        while((pos = input.find(delimiter)) != string::npos){
            token = input.substr(0, pos);
            tokens.push_back(token);
            input.erase(0, pos + 1);
        }

        tokens.push_back(input);

        return tokens;
    }

2 Comments

I like the simplicity of this solution. However, it has a bug. To determine how many characters to erase from the string input, it takes pos + 1. Why the plus one? Because it also has to erase the found delimiter. And here's the rub: the delimiter is not necessarily one character long. To fix this, erase pos + delimiter.size() characters instead.
With just one more size_t you can avoid costly erase operations and also avoid copying input string altogether. You can just jump around the input string and take parts of the input to put into vector. I'll try posting modified version if thread is still open
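
The two comments above can be combined in one sketch: advance by delimiter.size() instead of 1, and track a start index rather than erasing from the input. This is only an illustration of those suggestions; the name split_no_erase is invented here, and the empty-delimiter guard is an added assumption about the desired behaviour.

```cpp
#include <string>
#include <vector>

// Hypothetical variant of the answer's split(): no erase(), no copy of the
// input, and delimiters longer than one character are skipped completely.
std::vector<std::string> split_no_erase(const std::string& input,
                                        const std::string& delimiter)
{
    std::vector<std::string> tokens;

    // Assumption: an empty delimiter means "do not split at all".
    // Without this guard, find("") would match at every position forever.
    if (delimiter.empty()) {
        tokens.push_back(input);
        return tokens;
    }

    std::string::size_type start = 0;
    std::string::size_type pos;
    while ((pos = input.find(delimiter, start)) != std::string::npos) {
        tokens.push_back(input.substr(start, pos - start));
        start = pos + delimiter.size(); // skip the whole delimiter
    }

    // Remainder after the last delimiter (possibly empty).
    tokens.push_back(input.substr(start));
    return tokens;
}
```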
1

One solution I have been using quite a while is a split that can be used with vectors and lists alike

    #include <vector>
    #include <string>
    #include <list>

    template< template<typename,typename> class Container, typename Separator >
    Container<std::string,std::allocator<std::string> > split( const std::string& line, Separator sep ) {
        std::size_t pos = 0;
        std::size_t next = 0;
        Container<std::string,std::allocator<std::string> > fields;
        while ( next != std::string::npos ) {
            next = line.find_first_of( sep, pos );
            std::string field = next == std::string::npos ? line.substr(pos) : line.substr(pos,next-pos);
            fields.push_back( field );
            pos = next + 1;
        }
        return fields;
    }

    int main() {
        auto res1 = split<std::vector>( "abc,def", ",:" );
        auto res2 = split<std::list>( "abc,def", ',' );
    }


1
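    #include <string>
    #include <vector>

    std::string str = "some_string_from_my_head";
    std::vector<std::string> Lines;
    std::string line = "";

    // Collect characters until a '_' is seen, then store the finished token.
    for (std::size_t i = 0; i < str.length(); i++) {
        if (str[i] == '_') {
            Lines.push_back(line);
            line = "";
            continue;
        }
        line += str[i];
    }
    // The text after the last '_' is still in 'line'; store it as well.
    Lines.push_back(line);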

1 Comment

As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.
1

Wasn't really satisfied with how much logically unnecessary copying some other answers involved, and locale-aware stringstreams seem like a bit of an overkill for what should be a pretty quick task, so here is an implementation using string views.

We can scan through the string view once while keeping track of the current segment between two delimiters, so no unnecessary work is done. The only copy happens when a token is emplaced into the vector, where it is unavoidable.

    #include <string>
    #include <string_view>
    #include <vector>

    [[nodiscard]] std::vector<std::string> split_by_delimiter(std::string_view str, std::string_view delimiter) {
        if (delimiter.empty()) return {std::string(str)};
        // handle empty delimiters explicitly so we can't fall into an infinite loop

        std::vector<std::string> tokens;
        std::size_t              cursor        = 0;
        std::size_t              segment_start = cursor;

        while ((cursor = str.find(delimiter, cursor)) != std::string_view::npos) {
            if (segment_start != cursor) tokens.emplace_back(str.substr(segment_start, cursor - segment_start));
            // don't emplace empty tokens in case of leading/trailing/repeated delimiters
            cursor += delimiter.size();
            segment_start = cursor;
        }

        if (segment_start != str.size()) tokens.emplace_back(str.substr(segment_start));
        // 'cursor' is now at 'npos', so we compare to the size instead

        return tokens;
    }

Requires C++17. It should handle the nasty cases such as leading, trailing, or repeated delimiters, as well as empty strings and empty delimiters.


0

Is there a reason you don't want to convert a string to a character array (char*) ? It's rather easy to call .c_str(). You can also use a loop and the .find() function.

string class
string .find()
string .c_str()
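
As a rough sketch of the strtok route mentioned above: strtok modifies the buffer it scans, so the pointer returned by .c_str() cannot be passed to it directly; the example below works on a mutable copy instead. The function name split_with_strtok is an assumption made for illustration.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenizes a copy of `str` with std::strtok.
// `delims` is a C string listing every character that acts as a separator.
std::vector<std::string> split_with_strtok(const std::string& str, const char* delims)
{
    // strtok writes '\0' into its input, so use a private, mutable,
    // null-terminated copy rather than the string's own storage.
    std::vector<char> buffer(str.begin(), str.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buffer.data(), delims); tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        tokens.emplace_back(tok);
    }
    return tokens;
}
```

Note that strtok skips empty fields and keeps hidden state between calls, so it is not suitable for nested or concurrent tokenizing.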


0

For those who don't have (want, need) C++20 this C++11 solution might be an option.

It is templated on an output iterator so you can supply your own destination where the split items should be appended to and provides a choice of how to handle multiple consecutive separation characters.

Yes it uses std::regex but well, if you're already in C++11 happy land why not use it.

    #include <algorithm>  // copy
    #include <iterator>   // begin, end
    #include <regex>      // regex, sregex_token_iterator
    #include <string>

    ////////////////////////////////////////////////////////////////////////////
    //
    // Split string "s" into substrings delimited by the character "sep"
    // skip_empty indicates what to do with multiple consecutive separation
    // characters:
    //
    // Given s="aap,,noot,,,mies"
    //       sep=','
    //
    // then output gets the following written into it:
    //      skip_empty=true  => "aap" "noot" "mies"
    //      skip_empty=false => "aap" "" "noot" "" "" "mies"
    //
    ////////////////////////////////////////////////////////////////////////////
    template <typename OutputIterator>
    void string_split(std::string const& s, char sep, OutputIterator output, bool skip_empty=true) {
        std::regex rxSplit( std::string("\\")+sep+(skip_empty ? "+" : "") );

        std::copy(std::sregex_token_iterator(std::begin(s), std::end(s), rxSplit, -1),
                  std::sregex_token_iterator(), output);
    }


0

I know this solution is not rational, but it is effective. This method is provided here in order to be a variant of the solution of the current problem.

    #include <iostream>
    #include <vector>
    #include <string>
    using namespace std;

    const int maximumSize=40;
    vector<int> visited(maximumSize, 0);
    string word;

    void showContentVectorString(vector<string>& input)
    {
        for(int i=0; i<input.size(); ++i)
        {
            cout<<input[i]<<", ";
        }
        return;
    }

    void dfs(int current, int previous, string& input, vector<string>& output, char symbol)
    {
        if(visited[current]==1)
        {
            return;
        }
        visited[current]=1;
        string stringSymbol;
        stringSymbol.push_back(symbol);
        if(input[current]!=stringSymbol[0])
        {
            word.push_back(input[current]);
        }
        else
        {
            output.push_back(word);
            word.clear();
        }
        if(current==(input.size()-1))
        {
            output.push_back(word);
            word.clear();
        }
        for(int next=(current+1); next<input.size(); ++next)
        {
            if(next==previous)
            {
                continue;
            }
            dfs(next, current, input, output, symbol);
        }
        return;
    }

    void solve()
    {
        string testString="this_is_a_test_string";
        vector<string> vectorOfStrings;
        dfs(0, -1, testString, vectorOfStrings, '_');
        cout<<"vectorOfStrings <- ";
        showContentVectorString(vectorOfStrings);
        return;
    }

    int main()
    {
        solve();
        return 0;
    }

Here is the result:

vectorOfStrings <- this, is, a, test, string, 


0

This is another way to split a string in C++, in this case working with a wstring and only using the find and substr functions.

    #include <iostream>
    #include <string>
    #include <vector>

    std::vector<std::wstring> SplitWstring(const std::wstring& text, const std::wstring& subText)
    {
        std::vector<std::wstring> result;
        size_t left = 0;
        size_t right = text.find(subText);
        size_t textSize = text.size();
        size_t subTextSize = subText.size();

        while (right != std::wstring::npos)
        {
            if (right > left)
            {
                size_t size = right - left;
                result.push_back(text.substr(left, size));
                left = right + subTextSize;
            }
            else
                left += subTextSize;

            right = text.find(subText, left);
        }

        if (left < textSize)
            result.push_back(text.substr(left));

        return result;
    }

    int main()
    {
        //std::wstring text = L"";                          // Result: {}
        //std::wstring text = L"-";                         // Result: {"-"}
        //std::wstring text = L"ONE";                       // Result: {"ONE"}
        //std::wstring text = L"ONE---TWO---THREE";         // Result: {"ONE", "TWO", "THREE"}
        std::wstring text = L"---ONE---TWO---THREE---";     // Result: {"ONE", "TWO", "THREE"}
        std::wstring subText = L"---";
        std::vector<std::wstring> splitted = SplitWstring(text, subText);

        if (splitted.size() > 0)
            return 1;

        return 0;
    }


0
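    #include <string>
    #include <vector>
    using namespace std;

    vector<string> split(string s, string delimiter) {
        vector<string> vec;
        if (s.length() == 0) {
            return vec;
        }
        string tmp = s;
        while (tmp.length() > 0) {
            // find() returns a size_t; keep that type so the comparison
            // with npos does not rely on an int conversion
            size_t pos = tmp.find(delimiter);
            if (pos == string::npos) {
                // no more delimiters: the rest is the last token
                vec.push_back(tmp);
                break;
            }
            string out = tmp.substr(0, pos);
            vec.push_back(out);
            // drop the token and the delimiter that followed it
            tmp = tmp.substr(pos + delimiter.length());
        }
        return vec;
    }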

1 Comment

Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.
0

A quite elegant solution posted by @Andfernan, with a small optimization. I traded the costly input string copy and (what I thought was a costly erase*) for one more size_t and some position calculations.

    #include <string>
    #include <vector>

    std::vector<std::string> split (std::string const& input, char const delimiter) {
        std::vector<std::string> tokens;
        size_t prev_pos = 0;
        size_t pos = 0;
        std::string token;
        while ((pos = input.find(delimiter, prev_pos)) != std::string::npos){
            // substr in C++ takes position and length as parameters,
            // so we need to calculate the length from the positions
            token = input.substr(prev_pos, pos - prev_pos);
            tokens.push_back(token);
            prev_pos = pos+1;
        }
        // last token after last delimiter
        tokens.push_back(input.substr(prev_pos, std::string::npos));
        return tokens;
    }

*I now realize that erase just moves internal pointer in a string so there is little cost to it. Still, this version does not alter the input string and allows for parameters to be const so there is a benefit to it.


-2

What about the erase() function? If you know the exact position in the string where to split, then you can "extract" fields from the string with erase().

    #include <string>

    std::string date("01/02/2019");
    std::string day(date);
    std::string month(date);
    std::string year(date);

    day.erase(2, std::string::npos); // "01"
    month.erase(0, 3).erase(2);      // "02"
    year.erase(0,6);                 // "2019"

1 Comment

This worked in my case. I just wanted to extract characters after certain number of positions every time and this did the job!
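
When the positions are fixed like this, substr() can express the same slicing without making three full copies first. The following is a sketch under the assumption that the input always has the layout "DD/MM/YYYY"; the DateParts struct and the function name are invented for the example.

```cpp
#include <string>

// Hypothetical container for the three fields.
struct DateParts {
    std::string day;
    std::string month;
    std::string year;
};

// Assumes the fixed layout "DD/MM/YYYY". substr() throws std::out_of_range
// if the start position lies beyond the end of the string.
DateParts slice_fixed_date(const std::string& date)
{
    return DateParts{
        date.substr(0, 2), // characters 0-1: day
        date.substr(3, 2), // characters 3-4: month
        date.substr(6)     // character 6 to the end: year
    };
}
```

As with the erase() version, this does not help with variable-width input such as 7/12/2012, which is the case the question asks about.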
