Is it OK for a function to both do-something and return-something?

Question

Is it OK for a function to both do-something and return-something? I try to avoid it for the most part, but there are situations where trying to avoid it would lead to worse/less clean code.

Ex:

if ( false == read_from_file( file, dest_buffer, num_bytes ) ) //error handling, etc

(I code mostly in C++ but probably applies elsewhere too.)

I could make it to check beforehand whether its possible to read that many bytes from file or not, and make the read function so that it asserts/crashes if it can't read. But then it's a realistic thing to fail to read from a file or from network, so that's not a very good solution. I can also call read and then check afterwards if an error-flag is set or not. I saw some libraries do this. I can give the function an error_value variable as another output parameter but that too is more complicated. But that is at least impossible to ignore, unlike the return-value.

So, is there a standard elegant solution for this? Is the return-bool version fine and acceptable?

What bothers me more is comparing the return value to false, in Yoda style no less. But to stay on topic: this is exactly what exceptions are for. — Bossie
– Bossie, Commented Sep 29, 2024 at 5:51
@Bossie: Exceptions are not intended for returning the status result of a processing operation. — Doc Brown
– Doc Brown, Commented Sep 29, 2024 at 5:54
“check beforehand whether its possible to read that many bytes from file or not, and make the read function so that it asserts/crashes if it can't read” is itself an anti-pattern at best, and a time-of-check-to-time-of-use (TOCTOU) bug at worst. And not even always possible to achieve: sometimes you can’t know if you can until you try. — user3840170
– user3840170, Commented Sep 29, 2024 at 11:18
@Newline true/false is not saying much. This is especially true for the error side. You should pass meaningful, rich errors. It helps so much. The best way would be to have a generic/template Result<TOk, TErr> class and pass around meaningful errors in the form of custom class. — freakish
– freakish, Commented Sep 30, 2024 at 2:14

Doc Brown · Accepted Answer · 2024-09-30 10:06:36Z

Command-Query separation is surely a useful principle, but it should not be taken as a dogma. Fowler once wrote

Meyer likes to use command-query separation absolutely, but there are exceptions. Popping a stack is a good example of a query that modifies state. Meyer correctly says that you can avoid having this method, but it is a useful idiom. So I prefer to follow this principle when I can, but I'm prepared to break it to get my pop.

So yes, when code gets cleaner with returning a status result from a processing operation, go ahead. You won't be arrested for it.

Let me add that in your specific example, if the read_from_file function would be part of an object, it would be easy to separate the state check from the processing:

my_reader.read_from_file( file, dest_buffer, num_bytes ); if ( my_reader.last_read_failed() ) //error handling, etc

But a design like this is not better just because it follows CQS literally - as commenters have pointed out, it bears some risk to evaluate the status of the wrong read operation.

For a free function, an equivalent could be realized by some global state variable, introducing probably more problems than it solves.

Separating the read() action from the is_failed() check is a recipe for bugs, and one of the big defects in the iostream standard library (in my opinion). It's very easy to forget to check for errors. Checks can be enforced e.g. by using a [[nodiscard]] return value, exceptions, or by combining the main result + possible errors in a single return value that must be unpacked first (e.g. std::expected or std::variant). — amon
– amon, Commented Sep 29, 2024 at 10:24
@amon: I agree to any kind of synchronous action. If, however, one tries to design some command based system, maybe with aysnchronous operations like an async read_from_file, separating the is_failed check may be perfectly jutsfified, — Doc Brown
– Doc Brown, Commented Sep 29, 2024 at 20:54
@DocBrown no, it is even worse when the operation is async. It gives an impression that you can run those in parallel, which obviously you can't. Like ever. That's a very bad design, success/error result should always be included in return value (or exception). — freakish
– freakish, Commented Sep 30, 2024 at 2:11
@freakish: you can't issue different read commands for different files in parallel? That's new to me. Anyway, in a command based system, the request/response pattern is unsuitable, the result of any command must be communicated by a separate event. And yes, that makes it more complicated than a synchronous request/response and hence more error prone. Still both designs have their place. — Doc Brown
– Doc Brown, Commented Sep 30, 2024 at 5:24
@freakish: I am not your "dude". Feel free to write a better answer, if you like. But as you may have noted, I am not a fan of this separation between read operation and result checking, too, and I hope my answer expresses that well. — Doc Brown
– Doc Brown, Commented Sep 30, 2024 at 10:08

Jerry Coffin · Accepted Answer · 2024-10-09 15:24:16Z

There are times that it's not only acceptable, but nearly unavoidable.

When you have almost any sort of parallel/concurrent processing involved, it's basically a given that almost any attempt at a pattern like:

if (operation can succeed) do operation

...is broken because it has a race condition--by the time you try to do the operation, the condition you checked may have changed. There are exceptions if only a single thread can affect that particular condition, but those definitely are exceptions.

People frequently see this with simple things like pushing or popping a stack (or a queue). But it's not unique to them. Although it's often less obvious, it's pretty much the norm for almost everything (in part because more complex things are composed of things like stacks and queues).

For your code to work, you nearly always have to try to do the operation, and then you can find out whether it worked. I'd also note that even if your program is single threaded, file I/O always has the potential for concurrency on any reasonably modern system. As such essentially all file I/O has to be written in terms of trying to do something, then seeing whether it succeeded for it to be correct.

You can certainly write low-level code that tries to do the operation, collects information on the success or failure, then just returns that result, and leaves it to higher level code to do the actual error handling.

But that still leaves it to some higher level code to call the lower-level function, and then handle the error if it failed. I suppose if you wanted to badly enough, you could still segregate that by having an error handling function that just returns if the operation succeeded:

void handle_error(ErrorIndication const &e) { if (e.succeeded()) return; // handle error } // ... foo = read_file(...); handle_error(foo);

That doesn't look like an improvement to me though.

I'd also take some issue with idea that you should always be able to query the state of the system without modifying it. Again, precisely the opposite is often true. One of the best known examples is the lowly mutex. At least as originally defined, there's no way to find whether somebody else current holds a mutex or not. The available operations are to take the mutex or release the mutex. Providing an "is the mutex taken" operation is purely an invitation for users to write code with race conditions, because virtually any code of the form if the mutex is taken do X is a guaranteed race condition. So there's really only one way to learn anything about the current state of a mutex: take the mutex, and know that you now own it.

The same thing arises with things like thread-safe queues and stacks. Well designed ones typically won't have any way to tell you how much space is available at any time (or whether space is available), because virtually any attempt at using that information leads to a race condition (the sole sort-of exception I've seen being periodically logging its size to get kind of a warm fuzzy feeling that it's not constantly backed up, or something like that).

Nice answer. "is broken because it has a race condition" yep, and that class of errors has a name: TOCTOU In order to allow for command query separation and avoid these issues (when applicable) one can use locks but that's often problematic for various reasons. — JimmyJames
– JimmyJames, Commented Oct 9, 2024 at 20:49

candied_orange · Accepted Answer · 2024-10-01 10:35:30Z

The absolute sin is to create a system in which I have to change something to learn something. Something that already existed.

If I can’t tell if a door is closed unless I issue a command to close the door and expect it to throw an exception when the door is already closed, well this system is crap. I just wanted to know the state of the door.

Do not let a command become the only way to learn state. In fact, be sure the query is easy to find so people won’t be temped to use a command in its place.

A well designed system provides a query that will let you learn about any state a command would have let you learn so you’re never in a position where you’re issuing state changing commands just to learn something.

Refusing to return anything but void from all commands is a way to ensure a system is designed that way. It’s not fool proof. And it’s not the only way.

Conversely, sometimes you need to hear directly from the command. Asking if a file exists before writing to it can set up a race condition (which is not a good thing). Sometimes it is better to hear what happened from the command. Just don’t make me write to files when I have nothing to write and all I want to do is find out if they exist.

Return is just one of many ways to hear directly from a command. Don't fool yourself into thinking you're compliant simply because you encoded the result in an exception or output parameter rather than using return. Using return is simply a style choice.

What you're really choosing between is a command that the client can fire and forget (see event) and a command that must be managed in some way.

That management represents more coupling. But sometimes it's required. Sometimes it's not. Sometimes you can write the client so it doesn't care if it failed. Sometimes failures are for something else to deal with.

CQS says "asking a question should not change the answer". Which I deeply believe. However, "More formally, methods should return a value only if they are referentially transparent and hence possess no side effects." is naïve. Oh sure you make the functional people feel warm and fuzzy but if you just hide the result in something else that I still have to manage then you haven't given me an event. You're just playing hide and seek with me.

So if we're talking about a command that simply demands management I have no problem seeing it return. I mean, who expects an IO command to not have a side effect? Just make sure the importance of dealing with that return is understood. I've seen plenty of students fail to check for returned errors.

"Do not let a command become the only way to learn state." I agree with the overall point but there is one important caveat: if the query requires the same amount of legwork as the command and doing it twice is just wasteful for performance reasons; then having both a query and a command means you're liable to run into people doing if(exists) then do() logic, which does the legwork twice. It can make more sense to design your interface towards tryDo() which implies feedback about whether the job was done or not, but already having done it if it were possible. — Flater
– Flater, Commented Sep 30, 2024 at 2:03
I'm not saying one approach is universally superior over the other, but it's a contextual consideration as to whether you're reasonably expecting someone to learn state without intending to subsequently trigger the command. Sometimes that's a realistic expectation, sometimes it's not. At the very core, while I support the point this answer makes, the counterresponse to it can often run foul of the "tell don't ask" guideline, so this requires some balanced and contextual consideration. — Flater
– Flater, Commented Sep 30, 2024 at 2:04
@Flater you seem to be making the same point GregBurghardt was which is the same point my last paragraph was trying to make. I'm starting to feel really insecure about it. You know I was saying to avoid the race condition right? — candied_orange
– candied_orange, Commented Sep 30, 2024 at 7:13
@candied_orange My point wasn't so much a race condition as it was about (a) potentially costing more effort when people end up performing a query followed by a command and (b) general concern for this leading to "tell don't ask" violations. — Flater
– Flater, Commented Sep 30, 2024 at 8:52
@Flater So you don't think my last paragraph illustrates (a) at all. Wish I could see how to improve it. As for (b) we're already talking about exposed state. Properly encapsulated state isn't something a command exposes. But wouldn't you agree that exposing state only with commands leads to some whacky client code? — candied_orange
– candied_orange, Commented Sep 30, 2024 at 10:10

Steve · Accepted Answer · 2024-09-29 06:36:29Z

tl;dr

In absolute terms, yes, it is acceptable to both do-something and return-something.

Side effects and impurities

Firstly, I note the question is tagged side-effect and pure-function. Nobody likes "side-effects", do they? And nobody likes "impurities", either.

These terms arise from the perspective that the "main effects" of any method are it's formal results, and that the ideal method is one whose inputs are exclusively its formal arguments.

Unfortunately these terms have arisen in mathematical or rarefied academic circles, and the terms have completely inappropriate and misleading connotations for programmers, where the main effects of a method call are often not the formal results but the writing of data to storage or to a network channel, and where the inputs are very often not confined to the formal arguments but consist of reading data from storage or from a network channel.

And there are, furthermore, very common cases of algorithms in which a mixture of both reads and writes to storage must occur in an atomic and transactional fashion - in other words, where the primary focus of the activity is, in one overall shot, doing both of what some would deem drawing impurities and causing side effects.

Hopefully that sets the overall scene somewhat.

Good program design

It shouldn't need much justification for me to state that designing computer programs is a complicated craft activity that cannot be reduced to a few sentences.

A computer is ultimately a physical machine. A programming language exists to control the activity of that machine.

Part of designing computer software is about creating a conceptualisation of what the software causes (or is supposed to cause) the machine to do, and creating a terminology for that conceptualisation so that it can be talked about.

A great deal of software is designed to be used not by the "developers" of the software but by another group of people, the "users".

The most high-level conceptualisations of software, then, consist of operations ultimately available to the user which allow him to exert enough control over the machine and drive it for his purposes.

Because of the typical internal complexity of the software and the need for the developer to cope with it, there are also often more detailed internal conceptualisations, not necessarily visible to users, but certainly understood by the developers and visible to varying degrees in the source code.

The distinction between reading data (or putting it more abstractly, viewing the state of the machine and the records it holds) and writing data (or more abstractly, causing a change in state) tends to be considered one of the fundamental conceptual distinctions around which a program is devised and its activity sequenced.

This distinction between read and write is a recurring motif that permeates software design, and it doesn't have a single overwhelming cause or justification but instead seems to be found generally convenient for all sorts of reasons.

It is common, especially (but not solely) amongst inexperienced programmers, to do this design work badly.

There are more ways to do the design badly than to do it well, but one of the crucial ways of doing it badly is that concepts and terms end up muddled and conflated, and it becomes unclear to the relevant person (whether developer or user), and for the purpose of exercising their kind of control, whether a particular operation is reading or writing, what it is reading and/or writing, or perhaps how each step of reading or writing is sequenced.

The design can be unclear because the terminology in use doesn't match the activity that actually occurs. It can be unclear because the terminology is vague and doesn't mean anything specific. It can be because the source code is structured badly and it is difficult for the developer to see the details of what it does.

And even sometimes when the facts of what is going on are clear, it may seem that two operations are in an unnatural and unnecessary union relative to their conceptualisation.

Like not being able to ask what is for dinner without simultaneously defecating on the toilet, software might be written so that data cannot be inspected on-screen without triggering an unnecessary printout, or that a sales order cannot be merely viewed without causing the audit stamp/amendment date to change and being prompted to record a reason for change.

In both cases, it should be possible to see the existing state without causing the changes. Even if eating dinner is usually followed by going to the toilet, it shouldn't be an iron law but should be up to the person to control according to the circumstances.

It is from these kinds of situations that the dictum not to read and write in the same operation arises.

But it's not a dictum that can be interpreted slavishly - there could be a perfectly proper reason, for example, why merely reading and viewing data should create an audit record logging access to the data (but not amendment of it). What you end up with, often, is distinguishing between a primary category of record for which the read/write distinction remains crystal clear, and an ancillary category of record like debug/audit logs which can be written even when nominally in a read-only mode in relation to the primary record.

Similarly, when a write operation also involves reads, there can be various excuses.

Conclusion

Keeping reads and writes separate is very much a guideline - a default practice from which deviations are then justified for specific cases.

And methods generally consist of four ways in and out: the formal arguments and results, and the implicit inputs from and outputs to storage. Organising the usage of these ways in and out is the name of the game, not slavishly avoiding any of them.

Stack Exchange Network

Is it OK for a function to both do-something and return-something?

4 Answers 4

Linked

Hot Network Questions

Is it OK for a function to both do-something and return-something?

4 Answers 4

Linked

Related

Hot Network Questions