2

In Java:

  • If I print "123\u202e987\u202c456abc" then the result is 123‮987‬456abc

  • If I print "123\u202e987\u202cxyzabc" then the result is 123‮987‬xyzabc

Notice that when "456" is changed to "xyz" in the string to be printed, the order of the output is different.

How does this work?

1
  • In Java, if I print "123\u202e987\u202c456abc" then the result is 123‮987‬456abc. If I print "123\u202e987\u202cxyzabc" then the result is 123‮987‬xyzabc. How does this work? When "456" is changed to "xyz", the output order is different. Commented Apr 29, 2022 at 1:32

2 Answers

5

TLDR: The effect you are seeing arises because digits and alphabetic characters are treated differently by the Unicode algorithm that determines the rendering of text containing format control characters.

For the texts you are displaying:

  • \u202e is the RIGHT-TO-LEFT OVERRIDE (RLO) character.
  • \u202c is the POP DIRECTIONAL FORMATTING (PDF) character.
  • Both are formatting control characters in Unicode, and their sole effect is to impact the appearance of output text.
  • In your examples, the RLO character specifies that the text which follows is to be displayed from right to left, and the PDF character cancels ("pops") the effect of the RLO.
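The character names in the list above can be confirmed from Java itself. This is only a minimal sketch: the class name ControlCharNames and the describe helper are made up for illustration, while Character.getName is a standard JDK method.

```java
public class ControlCharNames {
    // Hypothetical helper: formats a code point as "U+XXXX NAME".
    static String describe(int codePoint) {
        return String.format("U+%04X %s", codePoint, Character.getName(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(describe(0x202E)); // U+202E RIGHT-TO-LEFT OVERRIDE
        System.out.println(describe(0x202C)); // U+202C POP DIRECTIONAL FORMATTING
    }
}
```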

That explains why the text 123\u202e987\u202cxyzabc in your example is rendered as 123‮987‬xyzabc. The RLO (\u202e) causes the text that follows to be rendered in right to left order (so 987 is displayed as 789), and the PDF (\u202c) terminates reversal for the subsequent text.

But it does not explain why 123\u202e987\u202c456abc is rendered as 123456789abc. By that argument, the expected output should be 123789456abc instead.

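The algorithm used to determine the output in scenarios like this is very complex, but one factor is the directionality of the characters being rendered. Alphabetic characters have strong directionality, but numbers (i.e. digit characters) have weak directionality. Because weak characters take their direction from their context, the digits 456 that immediately follow the right-to-left run are resolved as part of it and so are displayed before 789, whereas the strongly left-to-right xyz stays in place. For full details see the Unicode document Unicode® Standard Annex #9 UNICODE BIDIRECTIONAL ALGORITHM, and especially section 3.3.4 Resolving Weak Types.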
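The strong/weak distinction can be inspected with Character.getDirectionality, which reports a character's bidirectional class. The sketch below is illustrative only: the class DirectionalityDemo and the isStrong helper are hypothetical names, and "strong" is taken here to mean the classes L, R and AL.

```java
public class DirectionalityDemo {
    // Hypothetical helper: true if the character's bidi class is a strong type (L, R or AL).
    static boolean isStrong(char c) {
        byte d = Character.getDirectionality(c);
        return d == Character.DIRECTIONALITY_LEFT_TO_RIGHT
            || d == Character.DIRECTIONALITY_RIGHT_TO_LEFT
            || d == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC;
    }

    public static void main(String[] args) {
        // 'x' is a letter (strong); '4', '$' and '.' belong to weak classes.
        for (char c : "x4$.".toCharArray()) {
            System.out.println("'" + c + "' strong=" + isStrong(c));
        }
    }
}
```

Here the digit is reported as a European number, the dollar sign as a European number terminator, and the period as a common number separator, all of which are weak types.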

That document provides an example similar to yours, with text containing a RIGHT-TO-LEFT EMBEDDING (RLE) character (rather than an RLO), later followed by a PDF and some trailing text containing digits:

Memory: it is called "[RLE]AN INTRODUCTION TO java[PDF]" - $19.95 in hardcover.

Display: it is called "$19.95 - "java OT NOITCUDORTNI NA in hardcover.

Note that in their example it wasn't just the digits that were moved. The dollar sign and the period were as well, because all six of the characters in the text $19.95 have weak directionality.
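To experiment with the two strings from the question without relying on a terminal, the JDK's java.text.Bidi class can compute the embedding levels, and Bidi.reorderVisually can apply the resulting reordering. The following is a rough sketch rather than a definitive implementation: BidiOrderDemo and visualOrder are hypothetical names, a left-to-right base direction is assumed, and the two control characters are simply dropped from the result. What a real console or browser shows still depends on that renderer.

```java
import java.text.Bidi;

public class BidiOrderDemo {
    static final char RLO = 0x202E; // RIGHT-TO-LEFT OVERRIDE
    static final char PDF = 0x202C; // POP DIRECTIONAL FORMATTING

    // Hypothetical helper: returns the visible characters of `text` in left-to-right
    // screen order. Assumes a left-to-right base direction (an assumption of this sketch).
    static String visualOrder(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_LEFT_TO_RIGHT);
        byte[] levels = new byte[text.length()];
        String[] chars = new String[text.length()];
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == RLO || c == PDF) {
                continue; // control characters are invisible, so skip them
            }
            levels[count] = (byte) bidi.getLevelAt(i);
            chars[count] = String.valueOf(c);
            count++;
        }
        // Reorders the first `count` entries of `chars` according to their levels.
        Bidi.reorderVisually(levels, 0, chars, 0, count);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append(chars[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(visualOrder("123\u202e987\u202c456abc")); // 123456789abc
        System.out.println(visualOrder("123\u202e987\u202cxyzabc")); // 123789xyzabc
    }
}
```

In the first string the digits 456 are given a higher embedding level than the surrounding left-to-right text, so they are reordered together with the overridden 987; in the second, xyz stays at the base level and is left where it is.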



2 Comments

Thank you. I have looked at the document you listed, and I think the problem is pretty complicated.
@jackma Yes, your question is deceptively simple, and the answer really is complicated. It's impossible to summarize the rules concisely, which is why it took a 49-page document for Unicode to specify them! I found the best way to get insight into how it all works is just to play with different strings and see the results. Try "123\u202e987\u202c$4?5 6abc" as a variation of your examples. That's complicated enough, but things get even worse when mixing characters from left-to-right languages with characters from right-to-left languages.
0

Unicode is doing that: both control characters act on the text that follows them and change how it is displayed.

  • \u202e: RIGHT-TO-LEFT OVERRIDE (reverses text)
  • \u202c: POP DIRECTIONAL FORMATTING

In your question, 123\u202e987\u202cxyzabc outputs 123‮987‬xyzabc. The \u202e causes 987 to be output reversed, as 789, and the \u202c ends the right-to-left override.

In the other case, the \u202c is followed by digits, which have weak directionality. So the Unicode bidirectional algorithm moves only those digits, displaying them before the reversed 789.

EDIT: @skomisa's answer is better.

2 Comments

Your answer doesn't really explain the effect that the OP is seeing.
I know that, but the question is why the two texts behave differently when \u202e and \u202c are at the same positions in both.
