
In large systems there are often code paths that modify state or produce side effects that other code comes to depend on. This makes it hard to safely change code without understanding the whole system, because changing the order of function calls or operations could change the behavior. Is there a software quality metric that measures or correlates with this kind of issue?

I'll give an example to illustrate.

Example: function modifies its inputs

    function computeTripDistance(trip: Trip): number {
      // This modifies its inputs
      trip.waypoints.sort((w1, w2) => w1.time - w2.time);
      const legDistances = computeLegDistances(trip.waypoints);
      return sum(legDistances);
    }

    function formatWaypointLabels(trip: Trip): string {
      const waypointLabels = trip.waypoints.map(w => w.prettyLabel());
      return `(${waypointLabels.join(',')})`;
    }

    function summarizeTrip(trip: Trip): string {
      const totalDistance = computeTripDistance(trip);
      const waypointsFormatted = formatWaypointLabels(trip);
      return `Trip to waypoints ${waypointsFormatted} takes distance ${totalDistance}`;
    }

In this example, changing the order in which computeTripDistance() and formatWaypointLabels() are called will change the behavior, because computeTripDistance() sorts the waypoints in place and therefore changes the order of the labels that formatWaypointLabels() produces. In order to safely change one part of the code (summarizeTrip), one must understand a larger part of the system than otherwise necessary.

Other ways this issue can manifest:

  1. A class changes its internal state within a method call. Other code depends on this happening.
  2. A function modifies global state that other code depends on (see the sketch after this list).
  3. State is updated via I/O, such as by writing to a database. Other code depends on this happening.
  4. Same problem, larger scale: one system depends on another system changing its state in response to a service call/message.
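For instance, a minimal TypeScript sketch of the second case (the exchange-rate scenario and all names here are invented for illustration): describeRate() only behaves as intended if convertToEuros() has already run.

    // Module-level state mutated as a side effect.
    let lastExchangeRate = 1.0;

    function convertToEuros(amountUsd: number, rate: number): number {
      lastExchangeRate = rate; // hidden write to shared state
      return amountUsd * rate;
    }

    function describeRate(): string {
      // Silently depends on convertToEuros() having run first.
      return `Using exchange rate ${lastExchangeRate}`;
    }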

Obviously a useful system has to change state, since this is usually part of the system's explicit requirements. Users expect their actions to produce changes. But most systems have some amount of "accidental" state changes that aren't necessary, and that make the system harder to understand and to change.

It seems like it would be difficult to define a metric that measures something interesting here, because common patterns like memoization involve changes to state that usually don't matter in practice. Existing code quality metrics like those on CISQ's Coding Rules to Deliver Resilient and Scalable Software don't mention changes to state or side effects. Is there any useful metric out there that can help highlight code that would benefit from removing side effects and state changes?

Edit: It wasn't clear in my original question, but a metric that has evidence that it predicts code quality is much more useful than one without evidence. I believe "pure" code is easier to work with but this is only an intuition based on my own experience. One thing I'd like to get out of learning about the available metrics is a more empirically grounded set of beliefs about coding practices.

  • I once thought that the total count of mutable classes was a good metric. It turns out that depends on the problem space, and mutability can be replaced with functional code without really simplifying it, so it is not a good metric. Otherwise, it would be easy to declare any non-Haskell program bad. Commented Jun 2, 2024 at 1:22

6 Answers


You could look at some functional programming metrics, for example Number Of Lambda Functions With Side Effects Used In A Class (LSE).

A function with no side effects is a pure function; by measuring the percentage of pure functions in your program, you are essentially asking "Am I doing functional programming?"
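To make that concrete, here is a hypothetical TypeScript sketch of the kind of thing a metric like LSE would count (the Cart class is invented for illustration): the first lambda has a side effect because it writes to a field of the enclosing class, while the second is pure.

    class Cart {
      private total = 0;

      // Impure lambda: it writes to this.total, which LSE would count.
      addAll(prices: number[]): void {
        prices.forEach(p => { this.total += p; });
      }

      // Pure lambda: its output depends only on its input; nothing else is touched.
      withTax(prices: number[], rate: number): number[] {
        return prices.map(p => p * (1 + rate));
      }
    }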

If you applied this metric to your code, I think it would raise the question "Why not program in a functional language?". Then you could have the compiler enforce the rules.

This guy did a study on it for his master's thesis: https://research.infosupport.com/wp-content/uploads/Master_thesis_bjorn_jacobs_1.6.1.pdf

RQ: To what extent can functional purity be used as a code quality metric in an object-oriented language like C#?

Within object-oriented languages, there exists impure behaviour by design. However, how impure a function is can be quantified by measuring what kind of and how many impure actions it contains. By using these measurements, a purity metric can be calculated. Moreover, when evaluating the error-proneness predictions, the purity metric scored better than existing object-oriented and functional metrics. Even though these results seem promising, they are derived from a limited data set.

  • For this to work, we need some proof that functional style is "better". Commented Jun 2, 2024 at 15:06
  • Well, I must admit I just skim-read the paper, and it's only a master's thesis. The good thing is Bjorn put his code on GitHub; you can check it out: github.com/bjornjacobs/PurityCodeQualityMetrics Commented Jun 2, 2024 at 15:20
  • This is exactly the kind of metric I was looking for. "For this to work, we need some proof that functional style is 'better'." The paper uses a limited data set, but there's no law against gathering more data. What I was thinking of is measuring different codebases against the metric and comparing with engineers' subjective judgments of code quality. Commented Jun 2, 2024 at 17:41
  • It looks like he measures against a known "errorfulness" of the code, but if you look at the code he's written to implement the measure, it's very functional, so I suspect a bias :) But we know empirically that typed languages get fewer errors, yet there are still people who say that strong typing is bad. Stats is like black magic. Commented Jun 2, 2024 at 18:02
  • @Basilevs "For this to work, we need some proof that functional style is 'better'." This is a very good point. I'm working under the assumption that "pure" code is easier to work with, but there are certainly tradeoffs. A big one is that pure code typically makes more of its behavior explicit, which means more lines of code and more constraints on changing it. This is good and bad (the constraints in particular), because if the code is changed in a way that introduces a bug, the constraint might prevent that. But often the constraints are not important and just extra work. Commented Jun 2, 2024 at 18:14

Side effects cannot be eliminated. Not if your program is going to do useful work.

What you can do is be formal about side effects: create one place that is free of side effects and a second place to keep them. A useful metric would show how compliant that side-effect-free place is.

There is a name for this style. It's called Functional Core, Imperative Shell.
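As a rough sketch of what that split could look like for the question's trip example (assuming the question's Trip, Waypoint, sum and computeLegDistances; the Database interface and saveTripSummary are invented for illustration):

    // Functional core: pure functions that work on the values they are given.
    function computeTripDistance(waypoints: Waypoint[]): number {
      const sorted = [...waypoints].sort((w1, w2) => w1.time - w2.time); // sort a copy
      return sum(computeLegDistances(sorted));
    }

    function summarizeTrip(trip: Trip): string {
      const labels = trip.waypoints.map(w => w.prettyLabel()).join(',');
      return `Trip to waypoints (${labels}) takes distance ${computeTripDistance(trip.waypoints)}`;
    }

    // Imperative shell: the one place that performs I/O.
    interface Database { insertSummary(summary: string): Promise<void>; }

    async function saveTripSummary(trip: Trip, db: Database): Promise<void> {
      await db.insertSummary(summarizeTrip(trip)); // the only side effect
    }

A metric of the kind described here would then check how much of the code lives in the core and whether anything in it performs writes or I/O.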

You would have to measure that compliance against how empty the side-effect-free code is, because you could get a perfect score just by leaving it empty.

The point is to make it easier to reason about code. So none of this works if readers can't tell, and trust, that they are looking at supposedly side-effect-free code. Without that, this is just more noise.

Conversely, you could measure how compliant the side-effect place is at being free of interesting logic. But you'd need a way to detect interesting logic.

What I am absolutely not advocating is simply counting how many functions in the program are side effect free. This metric would be useless. What readers need to know isn't the percentage of side effect functions in the program. It's whether this function, that they're reading right now, is side effect free.

  • "Side effects cannot be eliminated. Not if your program is going to do useful work." – Fully agree. Simon Peyton Jones, one of the creators of Haskell, likes to say: "All a program with no side-effects can do is warm up the CPU". In one talk, an audience member pointed out that warming the CPU is a side-effect, too :-D Commented Jun 2, 2024 at 14:00
  • Part of my motivation for finding a code quality metric related to side effects is to measure the impact on quality of moving to Functional Core Imperative Shell. "You would have to measure that against how empty that side effect free code is because you could get a perfect score just by leaving it empty." If side effect-free code has a lower cost/higher quality per LOC than imperative code (this is the metric I'm looking for and assuming there's supporting evidence on quality), then total quality can be improved by holding LOC equal and moving to FCIS. Commented Jun 2, 2024 at 18:24
  • @Sam the important thing is being able to trust that a bit of code is side effect free based on these metrics. If the metric can't make me feel that way it's just noise. The whole program can't be side effect free. But some bit of code might be. Find a way to help me see that so I can read code faster. Commented Jun 3, 2024 at 11:26

Metrics are things one can easily measure.

The difficulty with measuring side effects is to identify those you want to measure:

  • Of course, it's easy to detect and measure the use of global variables, which necessarily rely on side effects and are a sign of unhealthy design.

  • But when a function gets access to objects by reference, it is difficult to distinguish unwanted side effects from a desired state change. For example, is changing the value of a public property an undesired side effect, given that the property is public? Is calling tree.addChild(x) for a reference parameter tree an (undesired) side effect or a (legitimate) state change?

  • And if a function changes some internal variable (e.g. a dirty flag), how do you tell whether it is a desired change of the object state (in this example, that the object was altered and needs to be saved to the database at some stage), an undesired side effect, or a dangerous design (e.g. using internal state to communicate between functions)? A small sketch of this case follows the list.
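To illustrate that last bullet, a hypothetical TypeScript sketch (the Order class and its fields are invented for this example): the same write to the dirty flag can be read either as a legitimate part of the object's state or as a hidden channel between methods.

    interface Item { name: string; }
    interface OrderRepository { persist(order: Order): void; }

    class Order {
      private items: Item[] = [];
      private dirty = false;

      addItem(item: Item): void {
        this.items.push(item);
        this.dirty = true; // desired state change, or a hidden channel to save()?
      }

      save(repository: OrderRepository): void {
        if (this.dirty) {           // silently depends on addItem()
          repository.persist(this); // having set the flag beforehand
          this.dirty = false;
        }
      }
    }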

So how do you detect and measure the undesired side effects without counting the legitimate state changes (because counting all state changes will just tell us something about the complexity of the problem, not the quality of the code)?

Btw, I'm not an expert on CISQ's rule set and the details of the CWEs (Common Weakness Enumeration) it refers to, but I think the undesired side effects you describe should be covered by the point "Improper Protection of Alternate Path". Indeed, this should catch the case where a different order of calling functions would lead to different results. Without side effects or state changes you don't need to protect anything, but with side effects you should at least make sure that things are called in the right order.

  • In all three of those examples, whether intended or not, the fact that state is being modified makes the code harder to understand and change. And it might be a worthwhile tradeoff! There are other factors that matter too. Measuring state changes would make it easier to make deliberate tradeoffs and compare two possible implementations more objectively. Commented Jun 2, 2024 at 17:59
  • @Sam not every problem can be solved statelessly in an efficient manner. You can write a lot of wrong code with pure functions as well. And I don't think that any rocket, airplane, nuclear power plant or operating system is written only with pure functions. If you ask a question with an answer in mind, and discard the answers that do not match your own belief, isn't the question biased / opinion-based? Commented Jun 2, 2024 at 21:08

There are languages where side effects are absent (or well localized). They are not really any "better" than others. That is strong evidence that side effects per se are not a good metric.

People usually talk about a principle of least surprise instead, but that is subjective and therefore cannot serve as a metric either.

  • Also, to clarify, since your answer was downvoted not long before I posted my answer: I didn't downvote, since your statements are not wrong. I'd suggest, by the way, adding to your answer the comment you posted under it: it makes your position easier to understand. Commented Jun 2, 2024 at 11:16

The problem is twofold. Source code does not explicitly record a lot of the underlying thinking that goes into the design of the system (including how the computer application fits with the wider world outside the computer). And by definition, source code cannot record any thinking about properties which the system has implicitly or latently but which were not intentionally designed in by the developer at the outset.

If there were some kind of analysis or metric, it is unclear how it would work except by analysing source code, which is not a complete record of the design of the software.

Generally speaking, data processing is not merely about automating the calculation of certain values from others, but about automating the storage, retrieval, and movement of data, as part of how a system of doing business is organised and how the activity of a workforce is coordinated. That is, the simple movement of data to and from storage (either quickly on demand, or on the correct schedule), in exactly the form it was originally input by the user, is often just as useful as any calculations and transformations that may occur.

What is recorded by a computer in a business context is often things about the real world - either temporary facts about the state of the wider business operation, or about the events that have occurred or activities performed within the scope of that operation.

It is therefore inherent that there are "state changes" and "side effects" in the machinery which automates that data processing, because the real world consists of changing circumstances which that machinery is supposed to record - there is nothing remarkable or undesirable about these state changes per se.

The process of developing software and computerisation is often not just about fitting the workings of that software to suit what the system of doing business requires in terms of data processing, but also altering the system of doing business to better resemble a simple mechanical activity that is eligible to have records made about it, and eligible for computerisation.

What this all means is that, for the purposes of any analysis of software, the bulk of information that is clearly necessary for a rigorous analysis of what it does and why, is simply not recorded anywhere - insofar as we accept that source code doesn't even touch the sides of describing "how a business works".

Indeed the information is not even necessarily present in the minds of the developers. When you supervise the design of an application over a period of time, you often find that it works in a certain way that was arbitrary at the time those workings were first conceived, but later becomes particularly useful for a new purpose. You can also find the counterpart case, that an originally arbitrary aspect of a design later becomes highly inconvenient.

The point is that the software design clearly has certain properties set into it, the usefulness (or disusefulness) of which might become very clear to us when we try to do certain new things with it, but those properties may be introduced arbitrarily in the first place.

I dare say there are also properties which are useful and necessary right from the outset, but which are introduced only accidentally in the first place, and which the developer may not recognise the existence of unless these properties are upset by a change.

This makes it hard to safely change code without understanding the whole system

Yes. The need for whole-system understanding when designing systems, in order to design them correctly, is unfortunately the nature of the challenge.

A lot of people unfortunately approach this problem in a small-minded way by asking how systems can be broken up into smaller independent pieces that they can presumably more easily grasp. But there often are no smaller independent pieces.

There are actually two other possible strategies. One is to control the complexity of designed systems in the first place, so that they can be understood and readapted more easily.

Another is to accept the complexity but consider how to effectively reproduce whole knowledge and understanding about them. This might often involve teams or communities of people having a collective understanding and capability, rather than a single individual having it. This is traditionally the role of institutions like churches, trade guilds, and universities - the latter when they were organised around the needs of attracting and developing intellectuals, rather than around retailing credentials.

The point is to emphasise that there are known strategies for coping with complex man-made systems. They just don't involve getting the moon on a stick; they involve processes of education which become a significant form of work activity in themselves (for both the teacher and the learner), and a scale-out of brains in which multiple people have to be organised into one mind that works together on a problem.

But most systems have some amount of "accidental" state changes that aren't necessary, and that make the system harder to understand and to change.

Perhaps - I can't immediately think of an example in practice of "accidental" and unnecessary state changes.

It is obviously not that easy to identify them, otherwise you wouldn't need a metric - you'd just review the code. I presume one of the ways they "make the system harder to understand" is that these defects are themselves hard to understand - hard to understand what they do as part of the system, and whether they are necessary for its correct workings or not.

And how easy is it to change the software to drive out these accidental and unnecessary state changes once they are identified as defects?

It may not pay any dividends to identify them more easily through metrics if it is still difficult to drive them out - as difficult as the difficulties they are said to cause for making other changes.

  • I don't get it. We have had the IO monad for many years, and it works just fine to very explicitly document and enforce where state change happens. So code can and will explicitly document mutations. And there is no need to understand other aspects of the system for this. Commented Jun 2, 2024 at 8:31
  • @Basilevs, I imagine something less than 1% of code for business data processing is written using any language that has the concept of the "IO monad". But I read the OP as being concerned with measuring the appropriateness or correctness of mutations - not simply detecting their existence. In many decently written applications, you could detect the site of mutations by searching for the assignment operator, or for the word "insert" or "update" in database calls, etc. Commented Jun 2, 2024 at 9:22
  • Just to clarify, since your answer was downvoted shortly after I posted mine: I didn't downvote, as there are some interesting thoughts in your answer! Commented Jun 2, 2024 at 11:11

Let's unwind this a bit.

Freely being able to change the order of operations is not a default expectation for most logic. The inability to do so is not indicative of an accidental or unnecessary side effect; it's merely proof of some kind of dependency (warranted or not), one that enforces a particular order of operations.

A forced order of operations is not indicative of a mistake. Sometimes this is inherently necessary, e.g. needing to square the numbers before adding those squares together when calculating the hypotenuse using the Pythagorean theorem. You couldn't do that the other way around and still get the right answer.

In this example, changing the order in which computeTripDistance() and formatWaypointLabels() are called will change the behavior.

What you're implying here is that you were expecting this to be freely interchangeable.

From that, we can conclude that the mistake you made here was tying these behaviors together in a way that allowed a dependency to connect them, and thus to enforce an order of operations that you are convinced should not have existed, even though you clearly expected them to be freely interchangeable.

You could have built the code in a way that makes it impossible for these two behaviors to share a dependency: by separating them out, introducing immutability for anything that they both touch, and even straight-up testing whether they can be run in either order to confirm your expectations (see the sketch below). Up until now, you did none of these things, since you find yourself in the position where the code was built in a way that you think it should never have been in the first place.
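For instance, a minimal sketch of that approach on the question's own example (assuming the same Trip, computeLegDistances and sum helpers): sorting a copy of the waypoints removes the shared mutation, so the two calls in summarizeTrip can run in either order.

    function computeTripDistance(trip: Trip): number {
      // Sort a copy instead of mutating the caller's array.
      const sortedWaypoints = [...trip.waypoints].sort((w1, w2) => w1.time - w2.time);
      return sum(computeLegDistances(sortedWaypoints));
    }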

However, no metric is going to be able to base itself on your implicit expectations. Metrics are based on what you write, not what you thought you were writing. And what you did write was a piece of code where these two behaviors did have a dependency and where the order of operations did matter.
Any metric would use that knowledge to infer that you intended this design, and therefore it is not able to second-guess you.

What you're expecting from this metric is for it to be an AI that sees what you're doing, infers what you are actually trying to achieve, and proactively looks for better ways of achieving that. We can't even get people to do that reliably, let alone come up with some kind of ruleset that would do it better.

But you want something to verify that things are the way you expect them to be. So my advice is to write tests that explicitly confirm your expectations of the behavior that you want to see. If a test fails, you'll know that you deviated from your expectations. If you've written tests for all the topics that you care about, and all of them pass, then you know that the things you care about work the way you expect them to, and everything else is irrelevant since you don't care about it.
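As an illustration, such a test for the question's example might look like the following Jest-style sketch (makeTestTrip is an invented helper that returns the same test trip each time it is called):

    test('trip summary does not depend on call order', () => {
      const tripA = makeTestTrip();
      const tripB = makeTestTrip();

      // Same functions, called in opposite orders on identical trips.
      const distanceA = computeTripDistance(tripA);
      const labelsA = formatWaypointLabels(tripA);

      const labelsB = formatWaypointLabels(tripB);
      const distanceB = computeTripDistance(tripB);

      // If computeTripDistance() mutates the waypoints, the labels will differ.
      expect(labelsA).toBe(labelsB);
      expect(distanceA).toBe(distanceB);
    });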
