I have a string like this: String str = "la$le\\$li$lo";

I want to split it on $ to get the following output: "la", "le\\$li", "lo". The \$ is an escaped $, so it should be left in the output.

But when I do str.split("[^\\\\]\\$") I get "l", "le\\$l", "lo".

As far as I can tell, my regex is matching a$ and i$ and removing them. Any idea how to get my characters back?

Thanks

  • String str = "la$le\$li$lo"? Do you mean String str = "la$le\\$li$lo"? Commented May 12, 2010 at 14:56
  • Can the escapes be escaped as well? If so, regex will not do (regex-es can't count!). Commented May 12, 2010 at 14:58
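
If escapes can themselves be escaped, one option is to skip the regex and scan the string by hand. The following is only a sketch under an assumption the question does not state: a backslash escapes whatever single character follows it, so `\\$` is an escaped backslash followed by a real delimiter. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class EscapeAwareSplitter {
    // Hypothetical helper: splits on '$' unless it is escaped by a backslash.
    // Assumption: a backslash escapes the single character that follows it.
    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean escaped = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (escaped) {
                current.append(c);      // escaped character is kept as-is
                escaped = false;
            } else if (c == '\\') {
                current.append(c);      // keep the backslash in the output
                escaped = true;
            } else if (c == '$') {
                parts.add(current.toString());
                current.setLength(0);   // start the next part
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString());
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("la$le\\$li$lo")); // [la, le\$li, lo]
        System.out.println(split("a\\\\$b"));       // [a\\, b]
    }
}
```

Unlike String.split, this sketch keeps trailing empty parts, so "a$" yields two parts rather than one.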

4 Answers

Use zero-width matching assertions:

    String str = "la$le\\$li$lo";
    System.out.println(java.util.Arrays.toString(
        str.split("(?<!\\\\)\\$")
    )); // prints "[la, le\$li, lo]"

The regex is essentially

(?<!\\)\$ 

It uses negative lookbehind to assert that there is not a preceding \.

More examples of splitting on assertions

Simple sentence splitting, keeping punctuation marks:

    String str = "Really?Wow!This.Is.Awesome!";
    System.out.println(java.util.Arrays.toString(
        str.split("(?<=[.!?])")
    )); // prints "[Really?, Wow!, This., Is., Awesome!]"

Splitting a long string into fixed-length parts, using \G:

    String str = "012345678901234567890";
    System.out.println(java.util.Arrays.toString(
        str.split("(?<=\\G.{4})")
    )); // prints "[0123, 4567, 8901, 2345, 6789, 0]"

Using a lookbehind/lookahead combo:

    String str = "HelloThereHowAreYou";
    System.out.println(java.util.Arrays.toString(
        str.split("(?<=[a-z])(?=[A-Z])")
    )); // prints "[Hello, There, How, Are, You]"

1 Comment

@Fenris: the difference is that if $ is the first character, my regex can still split on it, and yours can't, because it insists that there is a character preceding it (one that is not a backslash).
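
To make the difference in that comment concrete, here is a small sketch that runs both patterns on an input starting with $. The class and method names are invented for the example, and the input string is a variation of the one in the question.

```java
import java.util.Arrays;

class LeadingDelimiterDemo {
    // Splits with the negative lookbehind from the answer above.
    static String[] splitLookbehind(String s) {
        return s.split("(?<!\\\\)\\$");
    }

    // Splits with the character-class pattern from the question.
    static String[] splitCharClass(String s) {
        return s.split("[^\\\\]\\$");
    }

    public static void main(String[] args) {
        String str = "$la$le\\$li"; // the characters $la$le\$li

        // The lookbehind matches the leading $, producing an empty first element.
        System.out.println(Arrays.toString(splitLookbehind(str))); // [, la, le\$li]

        // The character class needs a preceding character, so the leading $
        // survives, and the 'a' before the second $ is consumed.
        System.out.println(Arrays.toString(splitCharClass(str)));  // [$l, le\$li]
    }
}
```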
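The reason a$ and i$ are getting removed is that the regexp [^\\]\$ matches any character other than '\', followed by '$', and split discards everything the regexp matches. You need to use zero-width assertions, which check the surrounding characters without consuming them.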

This is the same problem people have trying to find q not followed by u.
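
A minimal sketch of that "q not followed by u" case, for comparison. The class name, method names, and sample words are chosen only for illustration. The character class q[^u] demands some character after the q, so it misses a q at the end of the input, while the lookahead q(?!u) only asserts that no u follows.

```java
import java.util.regex.Pattern;

class QNotFollowedByU {
    private static final Pattern CHAR_CLASS = Pattern.compile("q[^u]");
    private static final Pattern LOOKAHEAD = Pattern.compile("q(?!u)");

    // True if the character-class pattern finds a match anywhere in the word.
    static boolean charClassFinds(String word) {
        return CHAR_CLASS.matcher(word).find();
    }

    // True if the negative-lookahead pattern finds a match anywhere in the word.
    static boolean lookaheadFinds(String word) {
        return LOOKAHEAD.matcher(word).find();
    }

    public static void main(String[] args) {
        for (String word : new String[] {"Iraq", "qat", "queen"}) {
            System.out.println(word
                + " q[^u]=" + charClassFinds(word)
                + " q(?!u)=" + lookaheadFinds(word));
        }
        // Iraq q[^u]=false q(?!u)=true
        // qat q[^u]=true q(?!u)=true
        // queen q[^u]=false q(?!u)=false
    }
}
```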

A first cut at the proper regexp is /(?<!\\)\$/ ( "(?<!\\\\)\\$" in java )

    class Test {
        public static void main(String[] args) {
            String regexp = "(?<!\\\\)\\$";
            System.out.println(java.util.Arrays.toString("1a$1e\\$li$lo".split(regexp)));
        }
    }

Yields:
[1a, 1e\$li, lo]

You can try first replacing "\$" with another string, such as the URL Encoding for $ ("%24"), and then splitting:

    String[] splits = str.replace("\\$", "%24").split("\\$");
    for (int i = 0; i < splits.length; i++) {
        // restore the escaped dollar signs in each part
        splits[i] = splits[i].replace("%24", "\\$");
    }

More generally, if str is constructed by something like

str = a + "$" + b + "$" + c 

Then you can URLEncode a, b and c before appending them together

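    import static java.net.URLEncoder.encode;
    ...
    // encode(String, String) throws the checked UnsupportedEncodingException
    str = encode(a, "UTF-8") + "$" + encode(b, "UTF-8") + "$" + encode(c, "UTF-8");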
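
A fuller round trip of this idea might look like the sketch below. It assumes you control how the string is built, which the question does not confirm, and the class and helper names are hypothetical. URLEncoder turns a literal $ inside a field into %24 and a literal % into %25, so after joining, every remaining $ is a delimiter and a plain split is safe.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

class DollarJoin {
    // Hypothetical helper: URL-encodes each field, then joins with '$'.
    static String join(String... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) sb.append('$');
            sb.append(encode(fields[i]));
        }
        return sb.toString();
    }

    // Hypothetical helper: splits on every '$', then decodes each field.
    static List<String> split(String joined) {
        List<String> fields = new ArrayList<>();
        for (String part : joined.split("\\$", -1)) { // -1 keeps trailing empty fields
            fields.add(decode(part));
        }
        return fields;
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String joined = join("la", "le$li", "lo");
        System.out.println(joined);        // la$le%24li$lo
        System.out.println(split(joined)); // [la, le$li, lo]
    }
}
```

Because % is itself encoded, a field that already contains the text "%24" still survives the round trip, which the simple replace-based version above cannot guarantee.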

1 Comment

Good point. I updated my response for a more general solution that assumes that you are splitting str because it really consists of three strings you were appending together in the first place.
    import java.util.regex.*;

    public class Test {
        public static void main(String... args) {
            String str = "la$le\\$li$lo";
            Pattern p = Pattern.compile("(.+?)([^\\\\]\\$)");
            Matcher m = p.matcher(str);
            while (m.find()) {
                System.out.println(m.group(1));
                System.out.println(m.group(2));
            }
        }
    }

gives

    l
    a$
    le\$l
    i$
