2

I'm trying to get all the words in a string using Boost.Regex in C++.

Here's my input:

"Hello there | network - bla bla hoho"

I'm using this code:

    regex rgx("[a-z]+", boost::regex::perl | boost::regex::icase);
    regex_search(input, result, rgx);
    for (unsigned int j = 0; j < result.size(); ++j) {
        cout << result[j] << endl;
    }

I only get the first word, "Hello". What's wrong with my code? result.size() returns 1.

Thank you.

6 Answers

5

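regex_search only finds the first match. result.size() is 1 because the pattern has no marked groups, so the only sub-match is the whole match. To iterate over all matches, use regex_iterator (for a std::string, boost::sregex_iterator).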
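Here is a minimal sketch of the iterator approach. It uses std::regex from the C++ standard library rather than Boost, purely so that it compiles without extra dependencies; the Boost.Regex names (boost::regex, boost::sregex_iterator, header <boost/regex.hpp>) follow the same pattern. The function name find_words is made up for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collect every run of letters in `input`.
// Assumption: std::regex stands in for Boost.Regex here; with Boost the
// analogous types would be boost::regex and boost::sregex_iterator.
std::vector<std::string> find_words(const std::string& input) {
    static const std::regex rgx("[a-z]+",
                                std::regex::ECMAScript | std::regex::icase);
    std::vector<std::string> words;
    // A default-constructed iterator marks the end of the sequence.
    for (std::sregex_iterator it(input.begin(), input.end(), rgx), end;
         it != end; ++it) {
        words.push_back(it->str());  // it->str() is the whole match
    }
    return words;
}
```

Called with the input from the question, this should return the six words Hello, there, network, bla, bla and hoho.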


1

Try rgx("(?:(\\w+)\\W+)+"); as your regex. Its parts are:

(?: starts a non-marking group, closed by the matching )+, which matches the words in the string one or more times.

(\\w+) matches letters, digits and underscores one or more times as a marked group, i.e. typical word-like characters.

\\W+ matches one or more contiguous non-word characters, i.e. whitespace, |, - etc.

Be aware that a repeated marked group normally keeps only its last match, so result[1] holds a single word rather than every word. Also, because each word must be followed by \\W+, a word at the very end of the string is not captured.
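One thing worth checking with a repeated marked group is what a single regex_search actually leaves in result[1]. The sketch below does that with std::regex as a stand-in for Boost.Regex (an assumption, chosen so the snippet has no external dependencies); last_repeated_capture is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Returns what marked group 1 holds after one regex_search with a
// repeated group. Assumption: std::regex stands in for Boost.Regex.
std::string last_repeated_capture(const std::string& input) {
    static const std::regex rgx("(?:(\\w+)\\W+)+");
    std::smatch result;
    if (!std::regex_search(input, result, rgx)) {
        return std::string();
    }
    // result.size() is 2 here: the whole match plus one marked group,
    // no matter how many times the group repeated.
    return result[1].str();
}
```

For the input in the question this should return the second "bla": the group only remembers its final repetition, and "hoho" is left out because nothing non-word follows it. Getting every word this way needs a different mechanism, such as iterating over matches.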


0

You're only searching for alphabetic characters, not spaces, pipes or hyphens, and regex_search() stops at the first match, which is why you only see "Hello".


0

Perhaps you could try using repeated captures with the following regex "(?:([a-z]+)\\b\\s*)+".


0

To match words, try this regex:

regex rgx("\\<[a-z]+\\>", boost::regex::perl | boost::regex::icase);

According to the docs, \< denotes the start of a word and \> denotes the end of a word in the Perl variety of Boost regex matching.

I'm afraid someone else has to explain how to iterate the matches. The Boost documentation makes my brain hurt.
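For the iteration part, here is a rough sketch using a token iterator. Two assumptions to flag: it uses std::regex instead of Boost.Regex so it builds with just the standard library, and it uses \b in place of \< and \>, since as far as I know the default ECMAScript grammar of std::regex does not treat those as word anchors. With Boost, boost::sregex_token_iterator plays the same role. The name whole_words is invented for the example.

```cpp
#include <regex>
#include <string>
#include <vector>

// Whole words made only of letters, collected via a token iterator.
// Assumptions: std::regex replaces Boost.Regex, and \b replaces \< and \>.
std::vector<std::string> whole_words(const std::string& input) {
    static const std::regex rgx("\\b[a-z]+\\b",
                                std::regex::ECMAScript | std::regex::icase);
    // Submatch index 0 selects the whole match for each hit.
    std::sregex_token_iterator first(input.begin(), input.end(), rgx, 0);
    std::sregex_token_iterator last;  // default-constructed end marker
    return std::vector<std::string>(first, last);
}
```

The word anchors matter for input like "foo2 bar": the plain [a-z]+ pattern would pick "foo" out of "foo2", while the anchored version should skip it and return only "bar".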

1 Comment

Agreed that the Boost.Regex documentation was fairly bad.

0

You would need to capture any run of [a-z]+ (or some other regex for matching "words") bounded by spaces or string boundaries. You could try something like this:

^(\s*.+\s*)+$ 

In any event, this isn't really a boost::regex problem; it's just a regex problem. Use Perl, the bash shell, or any number of web tools to get your regex figured out, then use it in your code.
