How can I capture multiple repeated groups?

Question

I need to capture multiple groups of the same pattern. Suppose I have the following string:

HELLO,THERE,WORLD

And I've written the following pattern:

^(?:([A-Z]+),?)+$

I want it to capture every single word, so that Group 1 is: "HELLO", Group 2 is "THERE" and Group 3 is "WORLD". My regex is actually capturing only the last one, which is "WORLD".

I'm testing my regular expression here, and I want to use it with Swift (maybe there's a way in Swift to get intermediate results somehow, so that I can use them?)

I don't want to use split. I just need to know how to capture all the groups that match the pattern, not only the last one.

Either match all the required patterns in one group (to first ensure a valid match) and then use split() to split them, or, if the regex implementation allows it, supply a function and stash each matching group as a side effect (e.g. JavaScript's String.replace()). – Jason S, Commented Aug 22, 2022 at 22:19
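
Both suggestions in the comment above can be sketched in Python (used here only for illustration; the question itself is about Swift). The names `words`, `stashed` and `stash` are hypothetical, and the validation pattern is an assumption about what counts as a valid string.

```python
import re

text = "HELLO,THERE,WORLD"

# Option 1: validate the whole string first, then split it.
words = []
if re.fullmatch(r"[A-Z]+(?:,[A-Z]+)*", text):
    words = text.split(",")

# Option 2: stash each captured word as a side effect of a replacement callback.
stashed = []

def stash(match):
    stashed.append(match.group(1))
    return ""  # the replacement text itself is irrelevant here

re.sub(r"([A-Z]+),?", stash, text)

print(words)
print(stashed)
```

Option 2 does not check that the string as a whole is well formed, so it is probably best combined with a validation step like the one in option 1.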

InSync · Accepted Answer · 2023-10-03 18:09:12Z

127

With one group in the pattern, you can only get one exact result in that group. If your capture group gets repeated by the pattern (you used the + quantifier on the surrounding non-capturing group), only the last value that matches it gets stored.
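
A minimal sketch of this behaviour, using Python's `re` module as a stand-in for whichever engine you use (most engines behave the same way here):

```python
import re

text = "HELLO,THERE,WORLD"

# The question's pattern: one capturing group inside a repeated non-capturing group.
match = re.match(r"^(?:([A-Z]+),?)+$", text)

print(match.group(0))  # the whole string was matched
print(match.groups())  # but group 1 only holds its last iteration
```

The match covers the entire string, yet `groups()` contains a single value, because each iteration of the group overwrites the previous one.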

To get every word, you have to use the function your language's regex implementation provides for finding all matches of a pattern. For that to work, remove the anchors and the quantifier of the non-capturing group (you can omit the non-capturing group itself as well).
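
As an illustration of the find-all-matches approach, here is a short Python sketch. It assumes the reduced pattern `([A-Z]+),?`, which is the question's pattern without its anchors and outer quantifier; the variable names are hypothetical.

```python
import re

text = "HELLO,THERE,WORLD"

# Anchors and the outer quantifier are gone; the pattern now describes one word.
word = re.compile(r"([A-Z]+),?")

# finditer yields one match object per word; group 1 holds the word itself.
words = [m.group(1) for m in word.finditer(text)]
print(words)
```

Other languages expose the same idea under different names (a global flag, a "find all" or "matches" function), so the exact call depends on your regex library.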

Alternatively, expand your regex and let the pattern contain one capturing group per group you want to get in the result:

^([A-Z]+),([A-Z]+),([A-Z]+)$

edited Oct 3, 2023 at 18:09

InSync


answered May 3, 2016 at 12:31

Byte Commander


4 Comments

Chris Over a year ago

How would this be adjusted to account for a varying number of strings? e.g. HELLO,WORLD and HELLO,THERE,MY,WORLD. I'm looking for just one expression to handle both examples and with flexibility built in for even longer string arrays

Barmar Over a year ago

@Chris It can't be generalized. As the answer states, a capture group can only capture one thing, and there's no way to create a dynamic number of capture groups.

zdim Over a year ago

Re "How would this be adjusted to account for a varying number of strings?" -- For those who still come to this page -- build it dynamically using the tools of the language at hand. Take the subpattern (([A-Z]+) here) as a string or as a regex pattern (depending on the language) and join N of them (with commas in this case), and then turn that into a regex pattern or just use it in regex (again, depending on the language). It's usually fairly simple. (I assumed this answer to take that for granted, that one can build it dynamically.)
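
The dynamic construction described in the comment above might look like this in Python. This is only a sketch: `capture_all` and its parameters are hypothetical names, and it assumes the number of items can be derived by counting separators.

```python
import re

def capture_all(text, sub=r"[A-Z]+", sep=","):
    # One capturing group per expected item, joined by the (escaped) separator.
    n = text.count(sep) + 1
    pattern = "^" + re.escape(sep).join([f"({sub})"] * n) + "$"
    match = re.match(pattern, text)
    return match.groups() if match else None

print(capture_all("HELLO,WORLD"))
print(capture_all("HELLO,THERE,MY,WORLD"))
```

Each call builds a pattern with exactly as many groups as the input has items, so the result still comes from a single match.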

dumbledad Over a year ago

It's a real shame you only list the code for the Alternatively and not for the answer itself.

InSync · Accepted Answer · 2023-10-03 18:08:59Z

54

The key distinction is repeating a captured group instead of capturing a repeated group.

As you have already found out, the difference is that repeating a captured group captures only the last iteration. Capturing a repeated group captures all iterations.

In PCRE (PHP):

((?:\w+)+),?

Match 1, Group 1. 0-5 HELLO
Match 2, Group 1. 6-11 THERE
Match 3, Group 1. 12-20 BRUTALLY
Match 4, Group 1. 21-26 CRUEL
Match 5, Group 1. 27-32 WORLD

Since all captures are in Group 1, you only need $1 for substitution.

I used the following general form of this regular expression:

((?:{{RE}})+)

Example at regex101

edited Oct 3, 2023 at 18:08

InSync


answered Dec 11, 2020 at 2:27

ssent1


9 Comments

Thomas LAURENT Over a year ago

"Capturing a repeated group captures all iterations." In your regex101 try to replace your regex with (\w+),? and it will give you the same result. The key here is the g flag which repeats your pattern to match into multiple groups.

Pierre Over a year ago

This is so wrong. "Capturing a repeated group captures all iterations": yes but it will capture ALL of them in only ONE match (containing them all). Your example should be ((?:\w,?)+) . You have multiple matches here only because of the g flag as @thomas-laurent stated. There is no way to have multiple matches from one capturing group. You have to extract and preg_match_all (or equivalent function) the repeating group.

Pierre Over a year ago

@ssent1 Your ((?:\w+)+),? is equivalent to (\w+),?. Your enclosing anonymous group is never repeated. This is misleading; there is nothing like "capturing a repeated group [in multiple matches]". Unfortunately, nothing in regexp can match the same group multiple times. There is only the g flag and preg_match_all, which execute the regexp iteratively on the remaining unmatched string.

ssent1 Over a year ago

@Pierre You're correct. And yet it seems like there is still a distinction to be made between Repeating a Capturing Group vs. Capturing a Repeated Group (regular-expressions.info/captureall.html). On a practical level, it could be part of a functional solution. Ultimately, if a 'bulletproof' solution is needed, it's probably better to do it programmatically.

Pierre Over a year ago

@ssent1 please fix your answer. Replace ((?:\w+)+),? with (\w+),? which is strictly equivalent, since the anonymous group is never repeated. Your general solution ((?:{{RE}})+) is obviously wrong too. THIS. DOES. NOT. WORK. AS. INTENDED. This is misleading. How can you say I'm correct and yet not fix your answer?
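
The point made in these comments can be checked with a short Python sketch (Python's `re.findall` plays the role of the g flag or preg_match_all here; the variable names are hypothetical):

```python
import re

text = "HELLO,THERE,BRUTALLY,CRUEL,WORLD"

# The answer's pattern and the simpler one give identical per-match results.
nested = re.findall(r"((?:\w+)+),?", text)
plain = re.findall(r"(\w+),?", text)
print(nested == plain)

# Capturing a repeated group within a single match yields one string, not a list.
whole = re.match(r"((?:\w+,?)+)", text).group(1)
print(whole)
```

The separate words come from running the pattern repeatedly over the string, not from the nesting of the groups.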


bad_coder · Accepted Answer · 2023-09-26 08:13:14Z

12

I think you need something like this:

import re

b = "HELLO,THERE,WORLD"
re.findall(r'[\w]+', b)

Which in Python 3 will return:

['HELLO', 'THERE', 'WORLD']

edited Sep 26, 2023 at 8:13

bad_coder


answered Dec 13, 2018 at 1:46

Tim Seed


3 Comments

Jean-François Fabre Over a year ago

re.findall('\w+',b) is 2 characters shorter. No need for a character class since you have only one expression

pythonian29033 Over a year ago

question doesn't have a python tag though

Frozenfrank Over a year ago

As a matter of fact, the question is specifically tagged with #swift! (That's not python)

InSync · Accepted Answer · 2023-10-03 18:08:30Z

After reading Byte Commander's answer, I want to introduce a tiny possible improvement:

You can generate a regexp that matches up to n words, as long as n is predetermined. For instance, if I want to match between 1 and 3 words, the regexp:

^([A-Z]+)(?:,([A-Z]+))?(?:,([A-Z]+))?$

will match the following strings, filling one, two or three capturing groups:

HELLO,LITTLE,WORLD
HELLO,WORLD
HELLO
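
To see what the optional groups contain for each of these inputs, here is a small Python sketch. In Python, groups that did not take part in the match are reported as `None`; other engines may report an empty string or an unset group instead.

```python
import re

pattern = re.compile(r"^([A-Z]+)(?:,([A-Z]+))?(?:,([A-Z]+))?$")

results = {}
for text in ("HELLO,LITTLE,WORLD", "HELLO,WORLD", "HELLO"):
    groups = pattern.match(text).groups()
    # Drop the groups that did not take part in the match (reported as None).
    results[text] = [g for g in groups if g is not None]
    print(text, "->", results[text])
```

A string with more than three words does not match at all, which is the price of fixing the maximum count in advance.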

You can see a fully detailed explanation about this regular expression on Regex101.

As I said, it is pretty easy to generate this regexp for any number of groups you want using your favorite language. Since I'm not much of a Swift guy, here's a Ruby example:

def make_regexp(group_regexp, count: 3, delimiter: ",")
  regexp_str = "^(#{group_regexp})"
  (count - 1).times do
    regexp_str += "(?:#{delimiter}(#{group_regexp}))?"
  end
  regexp_str += "$"
  regexp_str
end

puts make_regexp("[A-Z]+")

That being said, I'd suggest not using a regular expression in this case. There are many other great tools, from a simple split to tokenization patterns, depending on your needs. IMHO, a regular expression is not one of them. For instance, in Ruby I'd use something like str.split(",") or str.scan(/[A-Z]+/).

what about this: ([A-Z]+)((?:,([A-Z]+))?)+ ... I just checked it ^this one would be perfect for n captures
@pythonian29033 this doesn't capture every word as OP asked, it would capture the first and last group only. At least using PCRE

zdim · Accepted Answer · 2025-05-28 18:32:46Z

The problem with the attempted code, as discussed, is that there is one capture group matching repeatedly so in the end only the last match can be kept.

Instead, instruct the regex to match (and capture) all pattern instances in the string, which can be done in any regex implementation (language). So come up with the regex pattern for this.

The defining property of the shown sample data is that the patterns of interest are separated by commas so we can match anything-but-a-comma, using a negated character class

[^,]+

and match (capture) globally, to get all matches in the string.

If your pattern needs to be more restrictive, adjust the exclusion list. For example, one can also exclude spaces with [^,\s]+. Or, to capture words separated by any of the listed punctuation or spaces:

[^,.!\s-]+

This extracts all words from hello-again, Mrs. Y!, without any punctuation or spaces. (The - itself must be given first or last in a character class, unless it's used in a range like a-z or 0-9.) Note that this pattern, with -, breaks up hyphenated words, which may not be desired. Further discussion would bring us to the difficulties and subtleties of natural language parsing.
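
A quick check of that pattern on the sample sentence, sketched in Python (the variable names are hypothetical):

```python
import re

text = "hello-again, Mrs. Y!"

# Everything except commas, periods, exclamation marks, whitespace and hyphens.
words = re.findall(r"[^,.!\s-]+", text)
print(words)  # ['hello', 'again', 'Mrs', 'Y']
```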

In Python

import re

string = "HELLO,THERE,WORLD"
pattern = r"([^,]+)"
matches = re.findall(pattern, string)
print(matches)

In Perl (and many other compatible systems)

use warnings;
use strict;
use feature 'say';

my $string = 'HELLO,THERE,WORLD';
my @matches = $string =~ /([^,]+)/g;
say "@matches";

(In this specific example, the capturing () in fact aren't needed since we collect everything that is matched. But they don't hurt and in general they are needed.)

Just in case, I'd like to add: The specific text shown in the question is in the CSV (comma-separated values) format. When all of the text is indeed like that, one is far better off using a library for CSV parsing.
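
For instance, with Python's standard `csv` module, a sketch could look like the following. The second input, with a quoted field, is an invented example to show what a CSV parser handles that a simple pattern does not.

```python
import csv

line = "HELLO,THERE,WORLD"

# csv.reader accepts any iterable of lines; here a one-element list.
fields = next(csv.reader([line]))
print(fields)

# A quoted field containing a comma stays in one piece.
quoted = next(csv.reader(['HELLO,"THERE, YOU",WORLD']))
print(quoted)
```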

The approach above works as it stands for other patterns as well, including the one attempted in the question (as long as you remove the anchors, which make it too specific). The most common one is to capture all words (usually meaning [a-zA-Z0-9_]), with the pattern \w+. Or, as in the question, get only the substrings of uppercase ASCII letters, [A-Z]+.

Peter Mortensen · Accepted Answer · 2025-05-23 21:14:31Z

Just to provide an additional example of paragraph 2 in the answer. I'm not sure how critical it is for you to get three groups in one match rather than three matches using one group. E.g., in Groovy:

def subject = "HELLO,THERE,WORLD"
def pat = "([A-Z]+)"
def m = (subject =~ pat)
m.eachWithIndex { g, i ->
    println "Match #$i: ${g[1]}"
}

Match #0: HELLO
Match #1: THERE
Match #2: WORLD

Peter Mortensen · Accepted Answer · 2025-05-23 21:15:17Z

It happened to me today, and I solved it with the following approach:

^(([A-Z]+),)+([A-Z]+)$

The first group, (([A-Z]+),)+, matches all the repeated patterns except the final one, and ([A-Z]+) matches the final one. This is dynamic, no matter how many repeated groups there are in the string.

This is not a solution to the problem. The question is not about matching the string, but about capturing all the groups. This regex still only captures the last match for the first, repeating group (with comma), plus the match in the final group (without comma).
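
The objection above can be verified with a short Python sketch (Python's `re` is used for illustration; the group numbering is the same in other engines):

```python
import re

text = "HELLO,THERE,WORLD"
match = re.match(r"^(([A-Z]+),)+([A-Z]+)$", text)

# Groups 1 and 2 keep only their last iteration; HELLO is lost.
print(match.groups())  # ('THERE,', 'THERE', 'WORLD')
```

The string is validated correctly, but the first word is not available in any group.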

Peter Mortensen · Accepted Answer · 2025-05-23 21:17:16Z

You actually have one capture group that will match multiple times. Not multiple capture groups.

JavaScript solution:

let string = "HI,THERE,TOM";
let myRegexp = /([A-Z]+),?/g; // Modify as you like
let match = myRegexp.exec(string); // JavaScript function. The output is described below
while (match != null) { // Loops through matches
    console.log(match[1]); // Do whatever you want with each match
    match = myRegexp.exec(string); // Find next match
}

Syntax:

// Matched text: match[0]
// Match start: match.index
// Capturing group n: match[n]

As you can see, this will work for any number of matches.

Peter Mortensen · Accepted Answer · 2025-05-23 21:18:11Z

Sorry, not Swift, just a proof of concept in the closest language at hand (JavaScript).

// JavaScript POC. Output:
// Matches: ["GOODBYE","CRUEL","WORLD","IM","LEAVING","U","TODAY"]
let str = `GOODBYE,CRUEL,WORLD,IM,LEAVING,U,TODAY`
let matches = [];

function recurse(str, matches) {
    let regex = /^((,?([A-Z]+))+)$/gm
    let m
    while ((m = regex.exec(str)) !== null) {
        matches.unshift(m[3])
        return str.replace(m[2], '')
    }
    return "bzzt!"
}

while ((str = recurse(str, matches)) != "bzzt!") ;
console.log("Matches: ", JSON.stringify(matches))

Note: If you were really going to use this, you would use the position of the match as given by the regex match function, not a string replace.
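
A position-based variant of the same idea could be sketched as follows, here in Python rather than JavaScript. This is only one possible reading of the note above: `LAST_WORD` and `peel_words` are hypothetical names, and the sketch relies on the `endpos` argument of `Pattern.match`, which makes the pattern treat the string as if it ended at that position.

```python
import re

# Matches a whole comma-separated list and captures only its last word.
LAST_WORD = re.compile(r"^(?:[A-Z]+,)*([A-Z]+)$")

def peel_words(text):
    words = []
    end = len(text)
    while end > 0:
        # endpos makes the pattern treat text[:end] as the whole string.
        m = LAST_WORD.match(text, 0, end)
        if m is None:
            raise ValueError("not a comma-separated list of uppercase words")
        words.insert(0, m.group(1))
        end = m.start(1) - 1  # step back over the word and its leading comma
    return words

print(peel_words("GOODBYE,CRUEL,WORLD,IM,LEAVING,U,TODAY"))
```

Because the string is never modified, repeated words cannot confuse the loop the way a plain string replace could.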

Peter Mortensen · Accepted Answer · 2025-05-23 21:20:20Z

Design a regex that matches each particular element of the list rather than the list as a whole. Apply it with /g.
Iterate through the matches, cleaning them of any garbage such as list separators that got mixed in. You may need another regex, or you can get by with a simple substring replace method.

The sample code is in JavaScript, sorry :) The idea must be clear enough.

const string = 'HELLO,THERE,WORLD';
// First, use the following regex, which matches each of the list items separately:
const captureListElement = /^[^,]+|,\w+/g;
const matches = string.match(captureListElement);
// Some of the matches may include the separator, so we have to clean them:
const cleanMatches = matches.map(match => match.replace(',', ''));
console.log(cleanMatches);

Peter Mortensen · Accepted Answer · 2025-05-23 21:21:11Z

Repeat the A-Z pattern in the group for the regular expression.

import re

data = "HELLO,THERE,WORLD"
pattern = r"([a-zA-Z]+)"
matches = re.findall(pattern, data)
print(matches)

Output

['HELLO', 'THERE', 'WORLD']

Peter Mortensen · Accepted Answer · 2025-05-23 21:27:59Z

I had the same problem. No answers to this question helped me. I created a regex that works for me.

(\w+)\W*(\w+)\W*(\w+)\W*(\w+)\W*(\w+)\W*(\w+)*\W*(\w+)*

Example

HELLO,THERE,BRUTALLY,CRUEL,WORLD

$1=HELLO $2=THERE $3=BRUTALLY $4=CRUEL $5=WORLD ($6="") ($7="")

Peter Mortensen · Accepted Answer · 2025-05-23 21:31:25Z

The groups are created within the match, so to make each word a group you have to make that number of matches. I tried the below RegEx with RegexBuddy .NET flavour and I got the expected result.

However, with this approach, you will get multiple matches and within each match, Group 1 will hold the value of the captured word.

([A-Z]+),?
