6
$\begingroup$

One useful feature of Rust is that loop, the unconditional looping construct, allows a so-called "returning break" (a break that carries a value), like this:

    loop {
        // ...
        if condition {
            break foo;
        }
        // ...
    }

I see no reason not to extend this feature to the if statement. An example in Rust-esque pseudocode:

    'outer: if foo_condition {
        // ...
        if bar_condition {
            break 'outer foo;
        }
        // ...
    } else {
        // ...
        break bar;
    }

Note the usage of a label. Without the label, break foo; would exit only from the inner if statement.

Also note that the value part of break isn't necessary if the if statement doesn't evaluate to a value.
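For comparison, stable Rust already gets close to this: since Rust 1.65 a plain block can carry a label, and a break naming that label can carry a value out of it. The sketch below wraps the if in such a labeled block. The function name classify and the string results are hypothetical, chosen only for illustration.

```rust
// Hypothetical illustration: `classify` and its string results are not from the post.
fn classify(foo_condition: bool, bar_condition: bool) -> &'static str {
    // A labeled block is an expression; `break 'outer value` exits it with `value`.
    'outer: {
        if foo_condition {
            if bar_condition {
                // Early exit from the whole labeled block, skipping the rest of the branch.
                break 'outer "early foo";
            }
            "late foo"
        } else {
            "bar"
        }
    }
}
```

The label sits on the block rather than on the if itself, so this is an approximation of the proposed syntax, not the same thing.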

The idea can be extended to the while statement as well. One consequence is that the while statement needs a way to evaluate to a value when the loop finishes normally, for example:

    'outer: while foo_condition {
        // ...
        if bar_condition {
            break 'outer foo;
        }
        // ...
    } finally {
        // ...
        break bar;
    }

So why don't major C-like languages do this? Is it because the syntax would become too clunky?

$\endgroup$
8
  • 3
    $\begingroup$ en.wikipedia.org/wiki/Goto#Criticism $\endgroup$ Commented Jul 28 at 21:04
  • 2
    $\begingroup$ @NaïmCamilleFavier Yeah, but break is not goto. $\endgroup$ Commented Jul 28 at 21:08
  • 3
    $\begingroup$ Java, as a major C-like language, supports this. $\endgroup$ Commented Jul 28 at 21:13
  • 8
    $\begingroup$ @DannyuNDos of course it is. Every loop, for, while, if, and break is a goto. The language -- and the people who use the language -- just don't want to admit it. (Yes, I'm a greybeard who wrote in line-number BASIC and in assembly.) $\endgroup$ Commented Jul 29 at 6:28
  • 2
    $\begingroup$ Ah, right, I had forgotten about these. The example might be easier to understand if you actually used the return value (let result = …) in the code. $\endgroup$ Commented Jul 29 at 22:23

5 Answers

14
$\begingroup$

Java is a major C-like language, and the Java Language Specification gives this for break:

A break statement with label Identifier attempts to transfer control to the enclosing labeled statement (§14.7) that has the same Identifier as its label; this statement, which is called the break target, then immediately completes normally. In this case, the break target need not be a switch, while, do, or for statement.

(bold added)

That is, you can break from any labeled statement, including an if and also a bare braced block (even foo: break foo; works, as a no-op statement). This code includes both an if and a bare block that can be broken out of:

    foo: {
        System.out.println("1");
        bar: if (true) {
            if (Math.random() < 0.5) break bar;
            System.out.println("2");
        }
        System.out.println("3");
        if (Math.random() < 0.5) break foo;
        System.out.println("4");
    }
    System.out.println("5");

This has been part of the language for a very long time. It has been there since at least Java 6 in 2006 (the oldest JLS on the website), and I think perhaps from the very beginning. Despite that vintage, in practice it gets almost no use outside of generated code, and most Java programmers don't know that it exists.

It's notable that C#, which made an effort to match the Java syntax, doesn't include this (or labelled break at all). I am not aware of any discussion about why. It does have goto, which subsumes this functionality, though.

Java does not have return values from if statements, loops, or blocks, so the "returning break" isn't part of the design. Most major C-like languages are in the same boat there. That is a fairly invasive and less "C-like" feature, so it's possible that languages that have it just aren't considered C-like. Even in languages that do have if expressions, I'm not sure how common wanting to "early return" a value from the expression would be: in most cases either a more precise else-if condition or use of a broader pattern-matching construct probably does a better and clearer job. It's not without utility, though, so it comes down to a matter of complexity budget, software-engineering questions about multiple exits from blocks, and fine-grained design & implementation effort.

I could see this whole package fitting as part of Rust, though I don't know that it's sufficiently motivating to be worth the work. I don't think the while-finally version would be worth its weight above the equivalent infinite loop with if+break at the top.
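To make the "infinite loop with if+break at the top" equivalent concrete, here is a minimal sketch in current Rust. The function sum_until_negative and its behaviour are hypothetical and only illustrate the shape: the check at the top plays the role of the while condition together with the proposed finally arm, and the inner break plays the role of break 'outer foo.

```rust
// Hypothetical example: returns the first negative value, or the sum if none is found.
fn sum_until_negative(values: &[i32]) -> i32 {
    let mut index = 0;
    let mut sum = 0;
    loop {
        // Stands in for `while index < values.len() { ... } finally { break sum; }`.
        if index >= values.len() {
            break sum;
        }
        let value = values[index];
        if value < 0 {
            // Stands in for the early `break 'outer foo` inside the loop body.
            break value;
        }
        sum += value;
        index += 1;
    }
}
```

Both exits produce a value of the same type, which is what lets the loop as a whole be used as an expression.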

As far as break-from-anywhere itself goes, though, we have a widespread example: it doesn't appear to cause any trouble, and it gets very little use.

$\endgroup$
10
  • 3
    $\begingroup$ You're correct about the heritage. The same language about break is present in JLS 1 from 1996. javaalmanac.io/jdk/1.0/langspec.pdf $\endgroup$ Commented Jul 29 at 8:26
  • $\begingroup$ @alexh Beat me by 7 mins. Used Java for maybe 15 years, starting with 1.2, and don't think I ever used that. I did see others use it though. $\endgroup$ Commented Jul 29 at 8:37
  • $\begingroup$ In Python, labelled break was proposed and rejected. $\endgroup$ Commented Jul 29 at 12:51
  • $\begingroup$ IME (many years of Java, and C before that), it's very occasionally useful to be able to break out of (or continue round) multiple nested loops, which needs a label. But I don't think I've ever used break or continue with anything other than a for or while loop. $\endgroup$ Commented Jul 29 at 13:01
  • 1
    $\begingroup$ @gidds, not to a label. The label identifies the block to be broken from. And the situation where I most frequently want that is when I have a switch inside a loop, and I want one of the cases to break from the loop. In Java, a labelled break is a pretty clean solution. In C, you have to use a goto (or rewrite the code in some less natural way to avoid it). Of course, good choice of labels improves clarity. $\endgroup$ Commented Jul 29 at 15:41
5
$\begingroup$

A simple answer from the usage side: if a language lets an unlabelled break exit an if, then common code patterns like this will literally break:

    loop ... {
        if( false condition 1 ) break; // <-- note: this is what a lot of existing code does.
        if( false condition 2 ) {
            loop post processing
            break; // <-- letting `if`s be break targets would cause confusion here.
        }
        do something;
        if( true condition 3 ) continue;
        do something;
        ...
    }
$\endgroup$
3
  • 1
    $\begingroup$ This is why you have a labeled break in those languages which allow it. $\endgroup$ Commented Jul 29 at 17:52
  • $\begingroup$ A huge sacrifice in readability nonetheless. Make it break only loops unless a condition block is explicitly requested through use of a label. $\endgroup$ Commented Jul 29 at 18:23
  • 1
    $\begingroup$ @DannyNiu That's exactly how it works in Java or JavaScript: unlabelled break breaks from the nearest enclosing loop or switch $\endgroup$ Commented Jul 29 at 22:11
4
$\begingroup$

Many programming languages have a fixed, "magically-defined" set of two or three non-local exit targets:

  • break leaves some implicitly-defined block, typically the innermost lexically enclosing loop.
  • return leaves the lexically enclosing method / function / procedure block (the innermost one if the language supports nesting at that level), often combined with the possibility or necessity to provide a return value.
  • continue leaves the loop body of the innermost lexically enclosing loop, thus directly going to the next iteration.

In rare cases, it is useful to exit some block that does not fit into these patterns, e.g. in deeply nested loops, to exit the three innermost ones but continue with the outer loops. Therefore, some languages allow specifying which block to exit, e.g. Java's labelled breaks or LISP's return-from, without requiring these blocks to fulfill any special condition like being tied to a loop.
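As a small illustration of naming the block to exit, here is a sketch using Rust's loop labels. The function count_rows_without_zero is a hypothetical example, not taken from any of the languages' documentation.

```rust
// Hypothetical example: counts the rows that contain no zero.
fn count_rows_without_zero(grid: &[Vec<i32>]) -> usize {
    let mut count = 0;
    'rows: for row in grid {
        for &cell in row {
            if cell == 0 {
                // Leaves the inner loop and moves on to the next row in one step.
                continue 'rows;
            }
        }
        // Only reached when the inner loop ran to completion.
        count += 1;
    }
    count
}
```

Without the label, the inner loop would need a flag variable to tell the outer loop to skip the increment.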

It's a matter of language "symmetry" to allow non-local exits not only from a "magically" given set of enclosing blocks, but from any enclosing block, and some languages support it while others don't.

But is that feature needed?

From a software engineering point of view, if the need arises to exit some block not covered by the "standards", this typically asks for a refactoring.

The situation will inevitably involve deeply nested blocks, which should be refactored for readability and code-complexity reasons: extract some of the inner blocks into methods of their own, so that a return statement can be used to leave the extracted inner block.
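A minimal sketch of that refactoring, written in Rust: the nested search lives in its own function, so a plain return leaves both loops at once and no labelled break is needed. The name find_first_match and the grid representation are assumptions made for the example.

```rust
// Hypothetical example: the nested loops are extracted into their own function.
fn find_first_match(grid: &[Vec<i32>], target: i32) -> Option<(usize, usize)> {
    for (row_index, row) in grid.iter().enumerate() {
        for (col_index, &cell) in row.iter().enumerate() {
            if cell == target {
                // `return` exits both loops, which a bare `break` could not do.
                return Some((row_index, col_index));
            }
        }
    }
    None
}
```

The caller then continues its own outer loops with the returned value, which is the effect the non-local exit was meant to achieve.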

$\endgroup$
5
  • $\begingroup$ Anything that can be represented as a nested Markov chain can be represented with a plain Markov chain. ? $\endgroup$ Commented Jul 30 at 11:24
  • 1
    $\begingroup$ I wish languages would split "continue" into two forms--one of which would branch to the condition check, and one of which would bypass the condition check, since each can be useful in different contexts. $\endgroup$ Commented Jul 30 at 19:08
  • $\begingroup$ In theory, the nested escape situation could be cleanly (but somewhat unnaturally) solved with a "nested break", where the break keyword can be repeated to specify how many levels of nesting to exit. (E.g., in your nested loop situation, break break break; could exit the three innermost loops, and continue the outer loops from there.) I have yet to see a language actually do so, though; languages usually expect either refactoring if possible, or otherwise goto so you're forced to label (and thus document) the exit point. $\endgroup$ Commented Jul 30 at 19:59
  • 2
    $\begingroup$ @JustinTime-ReinstateMonica I'd strongly recommend against break break break;. We humans are quite bad in counting, so finding the correct number of break repetitions will be a challenge. And later, when changing existing code, inserting or removing one level of braces (typically quite an innocent change) will break (pun intended) the program. And regarding goto, a named break is fundamentally simpler than that, as the jump target can only be the exit of a lexically-enclosing block. $\endgroup$ Commented Jul 31 at 15:46
  • $\begingroup$ That's a good point, yeah, @RalfKleberhoff. It's viable in theory, but in practice, we can both think of different problems with it. (For comparison, the biggest issues I saw with it are that it can encourage spaghetti code, and that it can be hard for the programmer to keep track of where it would break to.) [And as a note, I lumped named breaks in with goto, since they're essentially just goto with extra scope restrictions on it.] $\endgroup$ Commented Aug 1 at 1:55
2
$\begingroup$

The reasons against having them are complexity and semantics. Also, Michael Homer's answer describes a Java feature that I have never encountered in my career, and I have never seen a style guide forbid it, so I guess almost nobody even knows that it is allowed. That is a pretty bad trade-off for adding a feature.

Note: A "do continue while false" pattern is essentially what you are looking for. That pattern is very uncommon, but it is used. Paying any real cost to shorten a very uncommon pattern is uneconomic.
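A sketch of that pattern, assuming it refers to the C-style do { ... } while (false) body with an early exit inside it. Rust has no do-while, so the closest analogue used here is a loop that always ends in an unconditional break. The function describe and its checks are hypothetical.

```rust
// Hypothetical example: a run-once loop used purely as an early-exit target.
fn describe(input: &str) -> &'static str {
    let mut verdict = "valid";
    // The body runs exactly once; each `break` skips the remaining checks.
    loop {
        if input.is_empty() {
            verdict = "empty";
            break;
        }
        if !input.chars().all(|c| c.is_ascii_digit()) {
            verdict = "not numeric";
            break;
        }
        // Unconditional exit: the counterpart of `while (false)`.
        break;
    }
    verdict
}
```

Linters tend to flag loops that never iterate, which fits the point that the pattern is unusual.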

Complexity

Note: this section can be skipped if you assume compilation works by just defining labels and jumping to labels, and never optimizes or tracks termination or reachability. Writing code that isn't reachable in the CFG (control-flow graph) is usually a warning or error, to make programmers aware that they are producing something that cannot even be translated into a connected graph.

Break and continue introduce cross edges into your CFG. The ugly part is that, together with optimizations, they can result in arbitrary graph shapes and arbitrarily complex path termination, even before inline optimization. Without them, one operates on a planar graph consisting of forks and loops, which is easy to operate on and also easy to draw for debugging. I.e., when debugging the compiler, drawing the result without cross edges almost always gives a layout that matches your intuition. With cross edges, that is almost never the case, so you lose a lot of time just trying to understand what is in front of you.

In Tyr, it is allowed to return results from loops, also via break/continue. That adds a bit of extra complexity. E.g., how do you track results precisely enough to allow using them directly? (Example: the names in the function signature get joined correctly and can be used; this would not work if you did it in the wrong order, e.g. with labels and replacing labels later.)

Additionally, it is allowed to pass locally bound blocks as parameters into a function that is called inline. Allowing break/continue there actually added a lot of complexity, but it is also a useful feature for aborting Iterator.foreach etc., which are essentially loops, when they are called from a loop (the break/continue would obviously not bind to the loop inside foreach, because the caller would not know that it exists). The reason this adds so much complexity is that the control flow of the inlined function changes while it is being inlined, and care must be taken not to create disconnected parts.

I wrote an article when working on Tyr 0.7 to deal with the pain it caused me. The test suite contains a number of break/continue tests for Tyr 0.7 (see). Because break and continue are restricted there, testing that they interact correctly with loops is more or less all that is needed. If you allowed them anywhere, you would essentially have to check, in arbitrary constellations, that the main exit of the current CFG part is reached, and handle cross edges, resulting in a lot of extra tests.

Semantics

Is something like

def f = label : {continue label}

a valid function body? In essence it's a self-goto and the same as an empty loop, but would you tell the programmer that it doesn't make sense? How would you detect that? For a syntactic loop statement it is really simple.

An extremely tricky part is finally. While that's already tricky with loops, it gets even more confusing if you add more jumps by allowing break/continue to be used anywhere. To explain why this is tricky, let me give an example:

    outer: {
      val is = new Iterator
      try {
        while is.hasNext {
          try {
            // do something
            if(done) return
          } finally {
            break /*outer*/
          }
        }
      } finally {
        delete is
      }
    }

Here, I try to explain several issues in one condensed example.

The first is that try/finally (and its equivalents) is usually used for resource management (deleting memory, releasing connections, closing files). In most cases, it is an extremely bad idea to just skip these blocks. I.e., if you exit the function from within the loop, you expect both surrounding finally blocks to be executed (the return part).

Now, if you allow break/continue/return in a finally block, that block would in general get two goto targets, which is an issue because execution can only continue in one.

For Tyr 0.8, this results in a rule called cross edge suppression. The idea is to align cross edges with the behavior of exceptions. An exception essentially continues with the next enclosing finally block (assuming no catches exist on the path). The same is done with break/continue/return, i.e. the semantics is to try to execute all enclosing finally blocks and then complete the initial jump or return.
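The ordering described here (run every enclosing cleanup, innermost first, then complete the jump) can be observed in Rust, which has no finally. The sketch below uses Drop guards as a stand-in for finally blocks. That substitution, and the names Cleanup and run, are assumptions of this illustration and say nothing about how Tyr implements the rule.

```rust
use std::cell::RefCell;

// Hypothetical guard: records its name when it goes out of scope.
struct Cleanup<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Cleanup<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Returns the order in which the body and the cleanups ran.
fn run(done: bool) -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    'outer: {
        let _outer_guard = Cleanup { name: "outer cleanup", log: &log };
        loop {
            let _inner_guard = Cleanup { name: "inner cleanup", log: &log };
            if done {
                // Jumps past both scopes; both guards are dropped on the way out.
                break 'outer;
            }
            log.borrow_mut().push("body");
            break;
        }
        log.borrow_mut().push("after loop");
    }
    log.into_inner()
}
```

Because a Drop implementation cannot itself contain a break or return that targets the enclosing code, the "two goto targets" problem does not arise in this form.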

Finally, there is the issue of side effects of implicit conversions contributed by control-flow joins (mostly phi nodes after loops that return results). If break/continue with a block were allowed to return values, you would more or less have to keep the entire path structure alive there, just to be able to insert such implicit conversions correctly and get the expected semantics (a surrounding finally could have deleted the entity that you need to read in your implicit conversions).

I haven't written an article on that part yet, because I still haven't got all the corner cases to work and hence do not know whether it can work in general.

$\endgroup$
2
$\begingroup$

It's just less useful in a language that supports local functions (or lambda expressions).

Your example becomes:

    function outer() {
        if foo_condition {
            // ...
            if bar_condition {
                return foo;
            }
            // ...
        } else {
            // ...
            return bar;
        }
    }
    x = outer();

True, this only gives the choice of two control-flow targets (break/continue enclosing loop, return from enclosing local function), but unless you love breaking all the code complexity guidelines, that's enough.
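In Rust, the same idea can be sketched with a closure that is called right away, so that return inside the closure acts as the early exit from the if. The function pick and the string results are hypothetical names used for illustration.

```rust
// Hypothetical example: a local closure makes `return` act as the early exit.
fn pick(foo_condition: bool, bar_condition: bool) -> &'static str {
    let outer = || {
        if foo_condition {
            if bar_condition {
                // Returns from the closure only, not from `pick`.
                return "early foo";
            }
            "late foo"
        } else {
            "bar"
        }
    };
    outer()
}
```

The closure captures the conditions from the enclosing function, so nothing has to be passed in explicitly.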

$\endgroup$
