$\begingroup$

I have a list of strings called mylist:

mylist = {"[a]", "a", "a", "[b]", "b", "b", "[ c ]", "c", "c"}; 

I would like to split mylist by "section headers." Strings that begin with the character [ are section headers in my application. Thus, I would like to split mylist in such a way as to obtain this output:

{{"[a]", "a", "a"}, {"[b]", "b", "b"}, {"[ c ]", "c", "c"}} 

(The a's, b's, and c's represent arbitrary characters; the string inside a section header does not necessarily match the strings that follow it in that section. Also, the number of strings in each section can vary.)

I have tried:

SplitBy[mylist, StringMatchQ[#, "[" ~~ ___] &] 

But this is not correct; I obtain:

{{"[a]"}, {"a", "a"}, {"[b]"}, {"b", "b"}, {"[ c ]"}, {"c", "c"}} 
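SplitBy starts a new group whenever the value of its function changes, so a Boolean header test can only produce runs of headers and runs of non-headers. A sketch of a key that would group correctly (an assumption of this aside, not taken from the thread) is the running count of headers seen so far; the names isHeader and sectionIds are illustrative:

```mathematica
mylist = {"[a]", "a", "a", "[b]", "b", "b", "[ c ]", "c", "c"};

(* the Boolean values SplitBy groups on in the attempt above *)
isHeader = StringMatchQ[mylist, "[" ~~ ___]
(* {True, False, False, True, False, False, True, False, False} *)

(* running count of headers: every element of section k gets the value k *)
sectionIds = Accumulate[Boole /@ isHeader]
(* {1, 1, 1, 2, 2, 2, 3, 3, 3} *)

(* pair each string with its section id, split on the id, then drop the ids *)
SplitBy[Transpose[{sectionIds, mylist}], First][[All, All, 2]]
(* {{"[a]", "a", "a"}, {"[b]", "b", "b"}, {"[ c ]", "c", "c"}} *)
```

Any elements before the first header would get the id 0 and end up in a group of their own.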

Likewise, using Split (since it applies the test function only to adjacent elements) does not work. The command:

Split[mylist, StringMatchQ[#, "[" ~~ ___] &] 

yields:

{{"[a]", "a"}, {"a"}, {"[b]", "b"}, {"b"}, {"[ c ]", "c"}, {"c"}} 
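Split[list, test] calls test on each pair of adjacent elements and starts a new group wherever the result is not True. Listing those pairs shows that the test above only inspects the first element of each pair (# is #1), so a pair such as {"a", "a"} always causes a break. The names pairs and results are illustrative:

```mathematica
mylist = {"[a]", "a", "a", "[b]", "b", "b", "[ c ]", "c", "c"};

(* the adjacent pairs that Split hands to its test function *)
pairs = Partition[mylist, 2, 1];

(* the question's test applied to each pair; # refers to the first element only *)
results = StringMatchQ[#, "[" ~~ ___] & @@@ pairs;

Transpose[{pairs, results}] // Grid
```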

Do you have any advice? Thanks.

$\endgroup$

9 Answers

$\begingroup$

Here's my suggestion:

mylist = {"[a]", "a", "a", "[b]", "b", "b", "[ c ]", "c", "c"};
Split[mylist, ! StringMatchQ[#2, "[*"] &]

and we get:

{{"[a]", "a", "a"}, {"[b]", "b", "b"}, {"[ c ]", "c", "c"}} 
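Split passes each adjacent pair as #1 and #2, and ! StringMatchQ[#2, "[*"] keeps the pair together unless the second element starts a new section. A minimal sketch packaging this as a reusable function; the name splitAtHeaders and its argument order are hypothetical:

```mathematica
(* hypothetical helper: a new group starts whenever the next element is a header *)
splitAtHeaders[list_List, headerQ_] := Split[list, ! TrueQ[headerQ[#2]] &]

mylist = {"[a]", "a", "a", "[b]", "b", "b", "[ c ]", "c", "c"};
splitAtHeaders[mylist, StringMatchQ[#, "[*"] &]
(* {{"[a]", "a", "a"}, {"[b]", "b", "b"}, {"[ c ]", "c", "c"}} *)

(* elements before the first header, and empty sections, get groups of their own *)
splitAtHeaders[{"x", "[a]", "[b]", "b"}, StringMatchQ[#, "[*"] &]
(* {{"x"}, {"[a]"}, {"[b]", "b"}} *)
```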
$\endgroup$
  • $\begingroup$ Looks like the right way to me. +1 $\endgroup$ Commented Oct 12, 2012 at 20:21
$\begingroup$

At the risk of being annoying, I will pitch the linked lists again. Here is the code using linked lists:

ClearAll[split];
split[{}] = {};
split[l_List] := Reap[split[{}, Fold[{#2, #1} &, {}, Reverse@l]]][[2, 1]];
split[accum_, {h_, tail : {_?sectionQ, _} | {}}] :=
  split[Sow[Flatten[{accum, h}]]; {}, tail];
split[accum_, {h_, tail_}] := split[{accum, h}, tail];

The function sectionQ is borrowed from @rm-rf's answer, where it is defined as sectionQ := ! StringFreeQ[#, "["] &, so that definition must be evaluated first. The usage is

split[mylist] (* {{[a],a,a},{[b],b,b},{[ c ],c,c}} *) 

The advantages I see in using linked lists are that they let one write solutions that are

  • Easily generalizable to more complex problems
  • Straightforward to implement
  • Easy to argue about (in terms of algorithmic complexity etc)

They may not be the fastest, though, so they are not always suitable for performance-critical applications.
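The Fold inside split[l_List] turns the flat list into nested two-element cells {head, tail} terminated by {}. A small sketch of just that conversion, under the hypothetical name toLinkedList, makes the structure visible; the rule with tail : {_?sectionQ, _} | {} then only has to peek at the first element of tail to decide whether the current section ends:

```mathematica
(* hypothetical helper: the same Fold that split[l_List] uses *)
toLinkedList[l_List] := Fold[{#2, #1} &, {}, Reverse@l]

ll = toLinkedList[{"[a]", "a", "[b]"}]
(* {"[a]", {"a", {"[b]", {}}}} *)

(* Flatten discards the nesting and the terminating {},
   the same trick split applies to its accumulator before Sow *)
Flatten[ll]
(* {"[a]", "a", "[b]"} *)
```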

$\endgroup$
$\begingroup$

Here's one method, using a slightly modified example:

mylist = {"[a]", "a", "[b]", "b", "b", "b", "[ c ]", "c", "c"};
pos = Append[Flatten[Position[mylist, s_String /; StringMatchQ[s, "[" ~~ ___]]], Length[mylist] + 1]
(* {1, 3, 7, 10} *)
Take[mylist, {#1, #2 - 1}] & @@@ Partition[pos, 2, 1]
(* {{"[a]", "a"}, {"[b]", "b", "b", "b"}, {"[ c ]", "c", "c"}} *)
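The key step is Partition[pos, 2, 1], which turns the header positions (with Length[mylist] + 1 appended as a sentinel) into overlapping {start, nextStart} pairs. A sketch of the same idea using Span instead of Take, with the positions hard-coded from the output above and the name bounds chosen for illustration:

```mathematica
mylist = {"[a]", "a", "[b]", "b", "b", "b", "[ c ]", "c", "c"};
pos = {1, 3, 7, 10}; (* header positions plus the sentinel Length[mylist] + 1 *)

(* overlapping pairs {start, nextStart} *)
bounds = Partition[pos, 2, 1]
(* {{1, 3}, {3, 7}, {7, 10}} *)

(* each section runs from its header up to just before the next header *)
mylist[[#1 ;; #2 - 1]] & @@@ bounds
(* {{"[a]", "a"}, {"[b]", "b", "b", "b"}, {"[ c ]", "c", "c"}} *)
```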
$\endgroup$
$\begingroup$

Here's one approach using FixedPoint and Replace:

sectionQ := ! StringFreeQ[#, "["] &;
FixedPoint[
  Replace[#, {h___, sec_?sectionQ, Longest[x___?(! sectionQ@# &)], t___} :>
     {h, t, {sec, x}}] &,
  mylist]
(* {{"[a]", "a", "a"}, {"[b]", "b", "b"}, {"[ c ]", "c", "c"}} *)
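FixedPointList records every intermediate list, which shows how the iteration converges. Each Replace moves the first section still made of bare strings, wrapped in its own list, to the end; since sectionQ does not return True for a list, wrapped sections are never matched again, and the iteration stops once no bare header remains. The name steps is illustrative:

```mathematica
sectionQ := ! StringFreeQ[#, "["] &;
mylist = {"[a]", "a", "a", "[b]", "b", "b", "[ c ]", "c", "c"};

steps = FixedPointList[
   Replace[#, {h___, sec_?sectionQ, Longest[x___?(! sectionQ@# &)], t___} :>
      {h, t, {sec, x}}] &,
   mylist];

Column[steps]
(* {"[a]", "a", "a", "[b]", "b", "b", "[ c ]", "c", "c"}
   {"[b]", "b", "b", "[ c ]", "c", "c", {"[a]", "a", "a"}}
   {"[ c ]", "c", "c", {"[a]", "a", "a"}, {"[b]", "b", "b"}}
   {{"[a]", "a", "a"}, {"[b]", "b", "b"}, {"[ c ]", "c", "c"}}
   {{"[a]", "a", "a"}, {"[b]", "b", "b"}, {"[ c ]", "c", "c"}} *)
```

The last entry appears twice because FixedPointList stops only after two successive results are identical.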
$\endgroup$
  • $\begingroup$ One can also define sectionQ = StringMatchQ[#, "[" ~~ ___] &, as in the question, which allows the strings following the section headers to contain [ anywhere except in the first position. $\endgroup$ Commented Oct 12, 2012 at 18:55
$\begingroup$

Here's an answer, based on Murta's solution, that recursively parses a list using several delimiters, each of which can be a pattern or a string pattern. This can be useful, for example, to parse debug output that involves loops.

splitByPattern[l_List, p_?System`Dump`validStringExpressionQ] :=
  splitByPattern[l, _String?(StringMatchQ[#, p] &)];
splitByPattern[l_List, p_] := Split[l, ! MatchQ[#2, p] &];

splitByPatternFold[l_, {}, True | False] := l;
splitByPatternFold[l_, {p_}, False] := splitByPattern[l, p];
splitByPatternFold[l_, {p_}, True] := Join[{First@l}, splitByPattern[Rest@l, p]];
splitByPatternFold[l_, {p_, rest__}, False] :=
  splitByPatternFold[#, {rest}, True] & /@ splitByPattern[l, p];
splitByPatternFold[l_, {p_, rest__}, True] :=
  Join[{First@l}, splitByPatternFold[#, {rest}, True] & /@ splitByPattern[Rest@l, p]];
splitByPatternFold[l_List, patterns_List, hasHeader_ : False] :=
  splitByPatternFold[l, patterns, hasHeader];

To access the split elements, you can use this function:

splitAccess[l_, indices_] :=
 Module[{offsets = Table[1, {Length@indices}]},
  offsets[[1]] = 0;
  l[[Sequence @@ (indices + offsets)]]
 ]

Example

l = {a, b, c, d, e, f, a, b, c, d, e, f};
x = splitByPatternFold[l, {a, b, c, d, e}]
(* {{a, {b, {c, {d, {e, f}}}}}, {a, {b, {c, {d, {e, f}}}}}} *)
splitAccess[x, {2, 1}]
(* {b, {c, {d, {e, f}}}} *)

The answer to the question would be written as

mylist = {"[a]", a, "a", "[b]", b, "b", "[ c ]", c, "c"};
splitByPattern[mylist, "[*"]

Note that not all elements need to be strings when a string pattern is given as the argument.

$\endgroup$
$\begingroup$

Here's my version based on Position.

mylist = {"[a]", "a", "a", "[b]", "b", "b", "[ c ]", "c", "c"};
split[lst_List, pat_String] :=
 Module[{len, pos},
  len = Length[lst];
  pos = Partition[
    Flatten[{Position[lst, _String?(StringMatchQ[#, pat ~~ __] &)], len + 1}], 2, 1];
  lst[[#[[1]] ;; #[[2]] - 1]] & /@ pos]

Usage:

split[mylist, "["] 

Out

{{"[a]", "a", "a"}, {"[b]", "b", "b"}, {"[ c ]", "c", "c"}}
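Because the first span starts at the first header, split silently drops any elements that precede it (for example, "x" in {"x", "[a]", "y"}). A hypothetical variant, with the illustrative name splitKeepLeading, always treats position 1 as a group start so those elements are kept:

```mathematica
(* hypothetical variant of split: position 1 is always a group start,
   so elements before the first header form a group of their own *)
splitKeepLeading[lst_List, pat_String] :=
 Module[{starts},
  starts = Union[{1},
    Flatten[Position[lst, _String?(StringMatchQ[#, pat ~~ __] &)]]];
  lst[[#1 ;; #2 - 1]] & @@@ Partition[Append[starts, Length[lst] + 1], 2, 1]]

splitKeepLeading[{"x", "[a]", "y", "[b]"}, "["]
(* {{"x"}, {"[a]", "y"}, {"[b]"}} *)
```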

$\endgroup$
$\begingroup$
Split[mylist, StringFreeQ["["] @ #2 &] 

{{"[a]", "a", "a"}, {"[b]", "b", "b"}, {"[ c ]", "c", "c"}}

SequenceCases[mylist, a:{_?(!StringFreeQ[ "["]@#&),__?(StringFreeQ[ "["])}:> {a}] 

{{"[a]", "a", "a"}, {"[b]", "b", "b"}, {"[ c ]", "c", "c"}}

$\endgroup$
  • $\begingroup$ +1 - I think you should remove the curly brackets around ... :>{a} $\endgroup$ Commented Apr 18, 2024 at 15:14
$\begingroup$
Clear["Global`*"];
mylist = {"[a]", "a", "a", "[b]", "b", "b", "[ c ]", "c", "c"};
patt = StartOfString ~~ Whitespace ... ~~ "[" ~~ Whitespace ... ~~ _ ~~
   Whitespace ... ~~ "]" ~~ Whitespace ... ~~ EndOfString;

Testing:

StringMatchQ[patt] /@ mylist 

{True, False, False, True, False, False, True, False, False}

Finally:

Split[mylist, StringMatchQ[#1, patt] || StringFreeQ[#2, patt] &] 

This solution tolerates any combination of additional whitespace around "[" and "]".


Result

{{"[a]", "a", "a"}, {"[b]", "b", "b"}, {"[ c ]", "c", "c"}}
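The pattern is whitespace-tolerant, but the single _ between the brackets matches exactly one character, so a multi-character header such as "[ab]" is not recognized. A quick check, plus a hypothetical looser variant patt2 (not part of the answer) that accepts any non-empty content between the brackets:

```mathematica
patt = StartOfString ~~ Whitespace ... ~~ "[" ~~ Whitespace ... ~~ _ ~~
   Whitespace ... ~~ "]" ~~ Whitespace ... ~~ EndOfString;

StringMatchQ[{"[a]", "  [ a ]  ", "[ab]", "a"}, patt]
(* {True, True, False, False} *)

(* hypothetical variant: any non-empty content between the brackets *)
patt2 = StartOfString ~~ Whitespace ... ~~ "[" ~~ __ ~~ "]" ~~
   Whitespace ... ~~ EndOfString;

StringMatchQ[{"[a]", "  [ a ]  ", "[ab]", "a"}, patt2]
(* {True, True, True, False} *)
```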

$\endgroup$
$\begingroup$

A variant of kglr's answer using SequenceSplit (new in 11.3) and StringStartsQ (new in 10.1)

SequenceSplit[mylist, x : {_?(StringStartsQ["["]), __?(StringFreeQ["["])} :> x]

{{"[a]", "a", "a"}, {"[b]", "b", "b"}, {"[ c ]", "c", "c"}}

$\endgroup$
