I'm working on a decompiler for a language for which I only have the bytecode. I have this graph (and several others with similar patterns) for which I can't seem to figure out the actual pseudo-code control flow. Typically, when you have multiple nodes branching towards one common node, that points towards chain of && and || seperated expression inside the condition of an if statement. But in this case, the top node jumos "into the middle" of the bottom condition chain. I don't know what this would look like in regular control flow, without resorting to tricks like code duplication.
2 Answers
Ignore my previous answer. It would be correct if basic blocks were statements in a procedural language (this is exactly what early-return looks like). You said this was a functional language, so I had another look at this and realized these are values. All of the basic blocks are values, and all of that is a single value. There is probably not a single explicit control flow statement in source code for this CFG. It's just an expression that looks something like that:
(and (or // bb 0-5 shambler-standing-explode-line-of-motion-check() // bb 0. If true, short-circuit 1-5 (and // bb 1-5 is-rogue-mode?() // bb 1 is-player?() // bb 2 (or player-is-prone?() // bb 3 player-is-supine?() // bb 4 ) player-in-prone-hiding-region?() // bb 5 ) ) (> melee-fact-get-time-since(player, shambler-explode) 5) // bb 6 (> melee-fact-get-time-since(arg_2, time-since-in-finisher-fail) 2) // bb 7 ) There are two interesting things here. Firstly, a compiler was smart enough to do early return as soon as answer is known, even for deeply nested bb 4 (which is not that surprising given than and and or are short circuiting in Racket so if bb 4 returned false then bb 5-7 must be short circuited anyway). Secondly, all of those early returns use same r4 register for condition, and all of them use negative branch for early return (so that r4 value is reused as return value). I bet top-level or would've had similar early returns with positive branches.
Previous incorrect answer follows.
It is not reducible to just if/else as control flow structure items (without doing shenanigans like injecting extra variables to aid control flow). It also employs early-returns; vertices going to return node do break nested structure. If you remove them you are left with a fairly easy CFG which can be represented by if/elses.
The code structure probably looked something like this:
// bb 0 if (!) { // bb 1 if () return; // bb 2 if () return; // bb 3 if (!) { // bb 4; if () return; } // bb 5 if () return; } // bb 6 if (!) { // bb 7 } Now, faithful/naive representation of this CFG would necessarily include some gotos.
You probably do want to "resort to code duplication" for this special case of branch-to-epilogue.
Early return is not the only language construct that breaks strict control flow structure nesting. E.g. in C there are also continue and break; switch is somewhat complicated. Python adds for/else construct on top of that. Rust has multi-level breaks.
- Alright, that's already interesting. I should've probably mentioned in the question that the source code is heavily inspired by Racket, so everything is sorta functional. Unfortunately I have very little experience with that language.DeepQuantum– DeepQuantum2025-11-24 16:17:40 +00:00Commented 2 days ago
- To your edit: That looks super helpful, thanks a lot! My bad that I didn't mention the functional part earlier. I honestly started writing the decompiler in C-style code because I thought it wouldn't matter which front end I use, but now it seems like I should switch it to the functional version instead. Thanks a lot!DeepQuantum– DeepQuantum2025-11-25 00:16:19 +00:00Commented 2 days ago
Adding to @Andrew Turkin's answer, I believe the full decompilation would look like this:
(and (or shambler-standing-explode-line-of-motion-check() (and is-rogue-mode?() is-player?() (or player-is-prone?() player-is-supine?() ) player-in-prone-hiding-region?() ) (and is-rogue-mode?() is-player?() npc-can-path-to-object?(arg_2, arg_3) (< distance-between-points(get-nav-destination(arg_2), get_object-position(arg_3)) 4.0) ) ) (> melee-fact-get-time-since(player, shambler-explode) 5) (> melee-fact-get-time-since(arg_2, time-since-in-finisher-fail) 2) ) 