6
$\begingroup$

I cannot figure out which StringPattern to use to remove markers from a string.

This is the input.

lst = {{1, 2, "this is a test", 4}, {Pi, 20, xy, 10}}; buf = ToString@TeXForm@lst 

which gives

\left( \begin{array}{cccc} 1 & 2 & \text{this is a test} & 4 \\ \pi & 20 & \text{xy} & 10 \\ \end{array} \right) 

I need to find every place where the pattern \text{.....} shows up and replace it with just what is inside, ....., i.e. strip out the \text{ and the closing } on the other side, for each such instance in the input.

So the above should become

\left( \begin{array}{cccc} 1 & 2 & this is a test & 4 \\ \pi & 20 & xy & 10 \\ \end{array} \right) 

I tried many things, including RegularExpression.

One attempt:

StringReplace[buf, "\\text{" ~~ x___ ~~ "}" .. :> x] 

But this has a problem. It does not stop at the first closing }, but goes all the way to the ending } in the string, ending up with

\left( \begin{array}{cccc} 1 & 2 & this is a test} & 4 \\ \pi & 20 & \text{xy} & 10 \\ \end{array \right) 

Notice, it went all the way to the end, and removed the } after {array.

I did not know how to tell it to stop at the first } it sees after \text{, and that is what I am struggling with. I know I wrote x___, but I needed that so I could pick out x.

Any idea how to do this? Either StringPattern or RegularExpression will work.

$\endgroup$

2 Answers

9
$\begingroup$

With both StringPattern and RegularExpression the problem is greediness: wildcards will try to match as much as possible. With StringPattern this can be fixed using Shortest:

StringReplace[buf, "\\text{" ~~ Shortest[x___] ~~ "}" :> x] 

With a regular expression, a quantifier can be made ungreedy with ? (e.g. {(.*?)}), but if you're going that way, you can write a safer regular expression using a negated character class:

StringReplace[buf, RegularExpression["\\\\text{([^}]*)}"] :> "$1"] 

Which gives the same result.
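
To see the greediness issue in isolation, here is a minimal sketch on a short made-up string (the name s and its contents are my own illustration, not taken from the question). It compares the greedy pattern, the Shortest version, and a negated character class:

```mathematica
(* Hypothetical test string: two \text{...} groups on one line *)
s = "\\text{a} + \\text{b}";

(* Greedy: x___ runs up to the LAST closing brace in the string *)
StringReplace[s, "\\text{" ~~ x___ ~~ "}" :> x]
(* displays as: a} + \text{b *)

(* Shortest makes x___ stop at the FIRST closing brace *)
StringReplace[s, "\\text{" ~~ Shortest[x___] ~~ "}" :> x]
(* displays as: a + b *)

(* Negated character class: the group can never run across a closing brace *)
StringReplace[s, RegularExpression["\\\\text{([^}]*)}"] :> "$1"]
(* displays as: a + b *)
```

The negated class does not rely on backtracking to find the end of the group, which is why it tends to be the more predictable choice of the two regex forms.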

Both of these have one issue though: they're not entirely safe. If the text itself contains a }, they will stop at it. Consider:

lst = {"abc", "x}y", "123"}; buf = ToString@TeXForm@lst 

This gives:

\{\text{abc},\text{x$\}$y},123\} 

And using either solution will turn it into:

\{abc,x$\$y},123\} 

I think the only viable fix is a regular expression that knows exactly which characters (or combinations) are allowed within the {...}, namely either a backslash followed by any character, or any single character other than a backslash or }:

StringReplace[buf, RegularExpression["\\\\text{((?:\\\\.|[^\\\\}])*)}"] :> "$1"] 

Which gives

\{abc,x$\}$y,123\} 

as expected.
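
If the stacked escapes are hard to read, one option is to assemble the same regular expression from named pieces. This is only a sketch: the names escaped, plain, body and re are hypothetical helpers of mine, and buf is written out as a literal copy of the TeXForm output shown above rather than recomputed.

```mathematica
(* Literal copy of the TeXForm output from the example above *)
buf = "\\{\\text{abc},\\text{x$\\}$y},123\\}";

(* A backslash followed by any single character, e.g. an escaped brace *)
escaped = "\\\\.";

(* Any single character that is neither a backslash nor a closing brace *)
plain = "[^\\\\}]";

(* Capture group: zero or more escaped pairs or plain characters *)
body = "((?:" <> escaped <> "|" <> plain <> ")*)";

(* Joined together, this is the same pattern string as the one-line regex above *)
re = RegularExpression["\\\\text{" <> body <> "}"];

StringReplace[buf, re :> "$1"]
(* displays as: \{abc,x$\}$y,123\} *)
```

Each piece still needs its backslashes doubled twice (once for the Wolfram string, once for the regex), but keeping the pieces short makes it easier to check them one at a time.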

$\endgroup$
3
  • $\begingroup$ Thanks. It is the Shortest which I did not know to use. $\endgroup$ Commented Apr 6, 2016 at 10:25
  • 1
    $\begingroup$ +1, but all those escapes in the penultimate snippet are making me dizzy. :) $\endgroup$ Commented Apr 6, 2016 at 10:29
  • 2
    $\begingroup$ @J.M. hm yeah, it's definitely annoying me that there is no sort of verbatim string syntax in Mathematica whenever working with regex. :/ $\endgroup$ Commented Apr 6, 2016 at 10:53
0
$\begingroup$

A form without RegularExpression that I believe works on Martin's second example:

lst = {"abc", "x}y", "123"}; buf = ToString @ TeXForm @ lst; StringReplace[buf, "\\text{" ~~ Shortest[a___] ~~ b : Except["\\"] ~~ "}" :> a <> b] 
"\\{abc,x$\\}$y,123\\}" 
$\endgroup$
