I am working on a Java bytecode project and need to identify ternary operators and nested ternary operators.

I have two questions:

  1. How can I determine whether an if instruction comes from a ternary operator? Based on the stack variables?
  2. How can I determine whether the operand-stack value consumed by an if instruction is the result of a ternary operator?

Example:

    ((a > b ? 0 : 1) > (a > c ? 10 : 20)) ? 100 : 101

Here is a full example of a ternary operator chain.

Source code:

    public void ddddd() {
        int m, k, n, z;
        Random r = new Random();
        boolean p = (r.nextInt() > 100
                || (30 > r.nextInt()
                        ? (r.nextInt() > 4000 ? 1 : 0)
                        : (r.nextInt() > 2000 ? 100 : 20))
                    < (r.nextInt() > 1000 ? 0 : 1));
    }

Structured bytecode:

    [ 7 astore] java/util/Random var1(1) = <init>(v0)
    [ 9 aload] java/util/Random var1(1) = (stack_var)var1
    [ 11 invokevirtual] I v11 = var1.nextInt()
    [ 14 bipush] (I)v14 = 100
    if { // block_id: 16 16 => 89 parent_id: 0
        [ 16 if_icmpgt] v11 > v14 : goto => 85
        if_false_block { // block_id: 15 19 => 89 parent_id: 16
            [ 19 bipush] (I)v19 = 30
            [ 21 aload] java/util/Random var1(1) = (stack_var)var1
            [ 23 invokevirtual] I v23 = var1.nextInt()
            if { // block_id: 9 26 => 64 parent_id: 15
                [ 26 if_icmple] v19 <= v23 : goto => 48
                if_false_block { // block_id: 8 29 => 45 parent_id: 9
                    [ 29 aload] java/util/Random var1(1) = (stack_var)var1
                    [ 31 invokevirtual] I v31 = var1.nextInt()
                    [ 34 sipush] (I)v34 = 4000
                    if { // block_id: 6 37 => 45 parent_id: 8
                        [ 37 if_icmple] v31 <= v34 : goto => 44
                        if_false_block { // block_id: 5 40 => 41 parent_id: 6
                            [ 40 iconst_1] (I)v40 = 1
                            [ 41 goto] goto: 66
                        }
                        if_true_block { // block_id: 4 44 => 45 parent_id: 6
                            [ 44 iconst_0] (I)v44 = 0
                            [ 45 goto] goto: 66
                        }
                    }
                }
                if_true_block { // block_id: 7 48 => 64 parent_id: 9
                    [ 48 aload] java/util/Random var1(1) = (stack_var)var1
                    [ 50 invokevirtual] I v50 = var1.nextInt()
                    [ 53 sipush] (I)v53 = 2000
                    if { // block_id: 3 56 => 64 parent_id: 7
                        [ 56 if_icmple] v50 <= v53 : goto => 64
                        if_false_block { // block_id: 2 59 => 61 parent_id: 3
                            [ 59 bipush] (I)v59 = 100
                            [ 61 goto] goto: 66
                        }
                        if_true_block { // block_id: 1 64 => 64 parent_id: 3
                            [ 64 bipush] (I)v64 = 20
                        }
                    }
                }
            }
            [ 66 aload] java/util/Random var1(1) = (stack_var)var1
            [ 68 invokevirtual] I v68 = var1.nextInt()
            [ 71 sipush] (I)v71 = 1000
            if { // block_id: 12 74 => 81 parent_id: 15
                [ 74 if_icmple] v68 <= v71 : goto => 81
                if_false_block { // block_id: 11 77 => 78 parent_id: 12
                    [ 77 iconst_0] (I)v77 = 0
                    [ 78 goto] goto: 82
                }
                if_true_block { // block_id: 10 81 => 81 parent_id: 12
                    [ 81 iconst_1] (I)v81 = 1
                }
            }
            if { // block_id: 14 82 => 89 parent_id: 15
                [ 82 if_icmpge] v40 >= v77 : goto => 89
                [ 85 iconst_1] (I)v85 = 1
                [ 86 goto] goto: 90
                if_true_block { // block_id: 13 89 => 89 parent_id: 14
                    [ 89 iconst_0] (I)v89 = 0
                }
            }
        }
    }
    [ 90 istore] I var2(1) = (stack_var)var2
    [ 92 return] return
  • 2
    Why do you need to do that? The bytecode doesn't really contain information on how the source code was generated. Also, what exactly generated the "structured bytecode"? Commented Apr 21, 2024 at 6:09
  • 8
    There is no reliable way to do this. Ternary operator is a language feature. The compiler can compile it into any sequence of instructions that behaves correctly, which could be the same as what an if statement would be compiled to. And different compilers may compile it differently too. This seems like an XY Problem at best. Commented Apr 21, 2024 at 6:11
  • 1
    and don't forget that with newer Java versions we also got switch expressions... Commented Apr 21, 2024 at 7:08
  • 1
    Taking the question literally, it’s actually very simple. Ternary operators normally have a value on the stack at the last branch merge point whereas if statements have not. The link given by Bohemian actually demonstrates the typical compilation strategy. I don’t know why you say “it’s not relevant” when it is exactly the answer to the question. In fact, I don’t see how your own answer is relevant to the question. There, you discuss whether you can “shrink a nested type of ternary operator into one line” but that was never asked for. Commented Oct 24, 2024 at 14:23
  • 1
    No need to feel ashamed, it’s a valid question and not necessarily an easy one. A lot of commenters dodged it by saying “it’s compiler dependent” which is correct, but still, there are things, typical compilers have in common. I think, the tough challenge is not the expression form of the two alternative values but the compound condition, as compilers typically compile it to an optimized form with no intermediate boolean value on the stack but rather directly jumping to either, the final target or the next conditional. Having assignments in-between is adding fuel to the flames. Commented Oct 25, 2024 at 15:07

1 Answer

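For a ternary operator, the Java compiler generates bytecode shaped like if/else blocks. For example:

int a = m > 1 ? 0 : 1 

After compilation, the bytecode looks roughly like the following. The listing is simplified and the offsets are fake; real javac output for this expression would use an iload and if_icmple, but the block shape is the same.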

    0. aload_1   // load local variable m
    1. ifeq #4
    2. iconst_1
    3. goto #5
    4. iconst_0
    5. istore    // store local variable a

The control flow graph:

    block_0:
        0. aload_1
        1. ifeq #4
    block_1:
        2. iconst_1
        3. goto #5
    block_3:
        4. iconst_0
    block_4:
        5. istore

This suggests a simple rule:

  1. block_4 starts with a store/return, and the operand stack still holds a value when control enters it.
  2. (block_4->prev->prev) == block_0, and block_0 ends with an if instruction.
  3. Bingo! block_0 through block_4 can be a ternary operator.
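
The rule above can be sketched in Java as follows. This is only an illustration under assumptions: the `Block` type, its fields (`endsWithIf`, `stackDepthOut`), and `findDiamondHead` are hypothetical names, not part of any real bytecode library, and the check only recognizes the simplest diamond shape.

```java
import java.util.ArrayList;
import java.util.List;

final class NaiveTernaryCheck {
    /** Hypothetical basic block; not taken from any real bytecode library. */
    static final class Block {
        final String name;
        final boolean endsWithIf;   // last instruction is a conditional jump (if*)
        final int stackDepthOut;    // operand stack depth when control leaves the block
        final List<Block> predecessors = new ArrayList<>();

        Block(String name, boolean endsWithIf, int stackDepthOut) {
            this.name = name;
            this.endsWithIf = endsWithIf;
            this.stackDepthOut = stackDepthOut;
        }

        /** Records a control-flow edge from this block to {@code to}. */
        void flowsTo(Block to) {
            to.predecessors.add(this);
        }
    }

    /**
     * Returns the branching block if {@code merge} closes a simple diamond whose
     * two arms each leave one extra value on the stack; otherwise returns null.
     */
    static Block findDiamondHead(Block merge) {
        if (merge.predecessors.size() != 2) return null;
        Block left = merge.predecessors.get(0);
        Block right = merge.predecessors.get(1);
        if (left.predecessors.size() != 1 || right.predecessors.size() != 1) return null;
        Block head = left.predecessors.get(0);
        if (head != right.predecessors.get(0) || !head.endsWithIf) return null;
        int expected = head.stackDepthOut + 1;
        boolean bothPush = left.stackDepthOut == expected && right.stackDepthOut == expected;
        return bothPush ? head : null;
    }

    public static void main(String[] args) {
        // The four blocks of the small example: block_0 branches, block_1 and
        // block_3 each push one constant, block_4 consumes the value.
        Block b0 = new Block("block_0", true, 0);
        Block b1 = new Block("block_1", false, 1);
        Block b3 = new Block("block_3", false, 1);
        Block b4 = new Block("block_4", false, 0);
        b0.flowsTo(b1);
        b0.flowsTo(b3);
        b1.flowsTo(b4);
        b3.flowsTo(b4);
        Block head = findDiamondHead(b4);
        System.out.println(head == null ? "no ternary" : "ternary headed by " + head.name);
    }
}
```

For a plain if/else statement the two arms leave the stack depth unchanged, so the same check returns null. Compound conditions and nested ternaries produce arms with more than one predecessor, which this shape test rejects.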

That was my naive first idea, but it does not hold up on real-world code.

For the following example, the method above does not work:

    boolean dddd = (m = 20) > 100
        || (30 > ((r.nextInt() > 100 && r.nextInt() != 1000) ? (100 & r.nextInt()) : 20)
                ? (Or.a1 = this.b1 = m = k = n = 100 > r.nextInt() ? r.nextInt() : 0)
                : (r.nextInt() > 2000 ? 100 : 20))
            < (z = r.nextInt() > 1000 ? 0 : 1);

I got stuck, which is why I asked the question.

After many days, I finally found a solution.

The control flow graph should be built as follows, with every value pushed on the operand stack given a name:

    block_0:
        0. v0 = aload_1
        1. ifeq v0 == 0 goto #4
    block_1:
        2. v1 = iconst_1
        3. goto #5
    block_3:
        4. v2 = iconst_0
    block_4:
        5. var1 = v3

Because my code walks every path of the CFG, v3 is easily identified as the value where v1 and v2 merge, so the CFG becomes:

    block_0:
        0. v0 = aload_1
        1. ifeq v0 == 0 goto #4
    block_1:
        2. v1 = iconst_1
           v3 = v1
        3. goto #5
    block_3:
        4. v2 = iconst_0
           v3 = v2
    block_4:
        5. var1 = v3

After copying v3 into block_1 and block_3 and inlining the stack variables:

    block_0:
        0. v0 = aload_1
        1. ifeq v0 == 0 goto #4
    block_1:
        2. v3 = iconst_1
        3. goto #5
    block_3:
        4. v3 = iconst_0
    block_4:
        5. var1 = v3

OK, the stack variable v3 is the result of a ternary.
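
A minimal sketch of the path-walking idea, under assumptions: each block is reduced to a list of pushes and pops, and the names `Node`, `POP`, and `entryStacks` are hypothetical. It propagates the symbolic stack along every edge and records, for each slot at a block's entry, the set of values that can arrive there. A slot with more than one source at a merge block is a candidate ternary result.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class StackMergeSketch {
    static final String POP = "<pop>";

    /** Hypothetical block: each op is either POP or the name of a pushed value. */
    static final class Node {
        final List<String> ops;
        final List<String> successors;

        Node(List<String> ops, List<String> successors) {
            this.ops = ops;
            this.successors = successors;
        }
    }

    /** For every reachable block, the possible sources of each stack slot on entry. */
    static Map<String, List<Set<String>>> entryStacks(Map<String, Node> cfg, String entry) {
        Map<String, List<Set<String>>> in = new HashMap<>();
        in.put(entry, new ArrayList<>());
        Deque<String> work = new ArrayDeque<>();
        work.add(entry);
        while (!work.isEmpty()) {
            String name = work.poll();
            // Simulate the block on a copy of its entry stack.
            List<Set<String>> stack = new ArrayList<>();
            for (Set<String> slot : in.get(name)) stack.add(new TreeSet<>(slot));
            for (String op : cfg.get(name).ops) {
                if (op.equals(POP)) stack.remove(stack.size() - 1);
                else stack.add(new TreeSet<>(Set.of(op)));
            }
            // Re-visit a successor only when its entry stack gained new sources.
            for (String succ : cfg.get(name).successors) {
                if (merge(in, succ, stack)) work.add(succ);
            }
        }
        return in;
    }

    /** Unions {@code out} into the entry stack of {@code succ}; true if anything changed. */
    private static boolean merge(Map<String, List<Set<String>>> in, String succ,
                                 List<Set<String>> out) {
        List<Set<String>> cur = in.get(succ);
        if (cur == null) {
            List<Set<String>> copy = new ArrayList<>();
            for (Set<String> slot : out) copy.add(new TreeSet<>(slot));
            in.put(succ, copy);
            return true;
        }
        if (cur.size() != out.size()) {
            throw new IllegalStateException("stack depth mismatch at " + succ);
        }
        boolean changed = false;
        for (int i = 0; i < out.size(); i++) changed |= cur.get(i).addAll(out.get(i));
        return changed;
    }

    public static void main(String[] args) {
        Map<String, Node> cfg = new HashMap<>();
        cfg.put("block_0", new Node(List.of("v0", POP), List.of("block_1", "block_3")));
        cfg.put("block_1", new Node(List.of("v1"), List.of("block_4")));
        cfg.put("block_3", new Node(List.of("v2"), List.of("block_4")));
        cfg.put("block_4", new Node(List.of(POP), List.of()));
        System.out.println(entryStacks(cfg, "block_0").get("block_4")); // prints [[v1, v2]]
    }
}
```

The single slot entering block_4 has the two sources v1 and v2, which is exactly the value called v3 above.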

That is not the full story. Later I learned about Static Single Assignment (SSA) form:

    if block A has a dominance frontier && depth of block A's outgoing stack > 0
        for each block child in block A's successors
            insert a phi node at the head of child

With phi nodes it is much easier to find v3, and there is no need to walk every path of the control flow graph.

I can't write everything here; it's too much. I hope it helps others.
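
As a rough illustration of the phi idea, here is a simplified sketch. It is not a full SSA construction: it does not compute dominance frontiers, it only looks at join blocks (two or more predecessors), and it assumes every block's symbolic outgoing stack is already known. The names `Blk`, `stackOut`, and `insertPhis` are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class JoinPhiSketch {
    /** Hypothetical block with a precomputed symbolic outgoing stack. */
    static final class Blk {
        final String name;
        final List<Blk> preds = new ArrayList<>();
        final List<String> stackOut;                 // value names left on the stack, bottom first
        final List<String> phis = new ArrayList<>(); // phi nodes at the block head, as text

        Blk(String name, String... stackOut) {
            this.name = name;
            this.stackOut = List.of(stackOut);
        }
    }

    /**
     * Adds one phi per stack slot whose incoming values differ between predecessors.
     * Assumes all predecessors of a join leave the same stack depth, as verified
     * bytecode guarantees.
     */
    static void insertPhis(List<Blk> blocks) {
        for (Blk b : blocks) {
            if (b.preds.size() < 2) continue;        // only join points can need a phi
            int depth = b.preds.get(0).stackOut.size();
            for (int slot = 0; slot < depth; slot++) {
                Set<String> operands = new LinkedHashSet<>();
                for (Blk p : b.preds) operands.add(p.stackOut.get(slot));
                if (operands.size() > 1) {
                    b.phis.add("phi_" + b.name + "_" + slot
                            + " = phi(" + String.join(", ", operands) + ")");
                }
            }
        }
    }

    public static void main(String[] args) {
        Blk b0 = new Blk("block_0");
        Blk b1 = new Blk("block_1", "v1");
        Blk b3 = new Blk("block_3", "v2");
        Blk b4 = new Blk("block_4");
        b1.preds.add(b0);
        b3.preds.add(b0);
        b4.preds.add(b1);
        b4.preds.add(b3);
        insertPhis(List.of(b0, b1, b3, b4));
        System.out.println(b4.phis); // prints [phi_block_4_0 = phi(v1, v2)]
    }
}
```

The phi at the head of block_4 plays the role of v3. A slot that carries the same value along both arms, for example an operand pushed before the branch, gets no phi, so only real merges are reported.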
