4

I'm trying to better understand cohesion and coupling. More specifically, I'm trying to use examples to better cement the concepts and one of the most common examples I see goes something like this.

For this "bad" code

public void Process() { string connectionString = getConnectionString(); SqlConnection connection = new SqlConnection(connectionString); DataServer1 server = new DataServer1(connection); int daysOld = 5; using (SqlDataReader reader = server.GetWorkItemData(daysOld)) { while (reader.Read()) { string name = reader.GetString(0); string location = reader.GetString(1); processItem(name, location); } } } 

A common refactoring being something like this

public void DoSomething() { DataServer2 server = new DataServer2(); foreach (DataItem item in server.GetWorkItemData(5)) { doSomeBusinessLogic(item); } } 

Firstly, am I wrong in assuming that just asking about cohesion and coupling with regards to this code is enough to motivate this refactoring, or should I be using other principles, asking other questions?

Second, as far as I understand it the code in the first snippet is cohesive; there's no statement/expression that isn't providing value towards the result of the function - it's all related to the function. Some would say that the SQL statements and ConfigurationManager are different concerns, but I don't understand that logic. Isn't a function always going to have this? It's the intent of the function that drives what cohesion means for the elements of the function.

So if it's not cohesion that motivates this refactoring, then it must be coupling, or like I said in the first sentence, something else entirely. In terms of coupling I hear descriptive terms like global, external, stamp and data which I see can apply to few things in the functions. But as far as I'm aware those terms don't cover things like the data initialised in a function (here looking at any of the SQL deps), which I'm unsure if it's strong or weak coupling because the description I'm used to "A change in module (A) which necessitates a change in module (B)" doesn't seem to cover this example. I would have though the initialisation of any deps from within the function is strong coupling, but not by the definition above, more because to change the particular dep you have to crack open the function, not just pass a different dep. Even though our function doesn't notice, ultimately that dep still has to be changed somewhere, we've just pushed that concern outward.

It seems like any dependency which isn't passed in is going to be considered strong coupling, but continuing that to it's logical conclusion would mean passing every element of a function as a param which surely means the function is leaky abstraction.

Apologies for the length, I've tried to keep this as succinct as possible.

6
  • Your refactored code still instantiates a DataServer2 object and is therefore tightly coupled. There are different types of dependencies. Data sources (file system, network, data base etc.) are one type of dependency where it's valuable to decouple from, replacing it by an interface for which you can inject a concrete implementation. This helps with testing but of course also allows to replace it with little effort later in production. Commented Mar 9, 2024 at 15:17
  • 1
    Cohesion and Coupling are context-dependent. Furthermore, they are are not goals nor objectives in themselves; merely a means to an end. Your end goals are always important; the methods you choose to get there far less so. Ultimately you should focus on delivering solutions that you and your users/stakeholders have sufficient confidence in. With that in mind, the question here is really irrelevant; Focus your effort on testing instead; as the quality and coverage of your tests are a far more clear-cut and effective way of ensuring your code is maintainable for future programmers. Commented Mar 9, 2024 at 18:11
  • 1
    I think one of the most useful questions to ask when thinking about the structure of code would be "How easily can I write automated tests for this?". If developers are having a hard time writing good tests, then it almost certainly means the structure of that code is wrong somehow. On the other hand, if developers are having a really easy time writing good tests, then refactoring is unnecessary. Your users and stakeholders do care about a working, robust, effective solution; which is where tests are important. They don't care about the structure of its source code. Commented Mar 9, 2024 at 18:36
  • @BenCottrell, I find as well that the structure may be wrong if manual tests and debugging are found to be unduly difficult. Other branches of engineering have their inspection ports for taking gas samples and pressure readings, and their test points for applying electrical multimeters. A common error in software design is "stovepiping", where there are too few interception points which allow different parts or stages to be verified separately. This impairs "testability", whether those tests ultimately be manual or automatic. Commented Mar 9, 2024 at 22:48
  • Funnily enough I thought the examples were about business logic, but it's really about noticing the SQL/ConfigurationManager deps isn't it - the business logic is already a module of it's own and could easily be tested. It's the other stuff that's not modularised which at the moment isn't readily reusable. Maybe the example would make a better point if the business processing were also inline to begin with... Commented Mar 10, 2024 at 11:22

5 Answers 5

2

Sure, all of those steps contribute to the ultimate goal of that function, but that's a very weak argument - it applies to all steps of any method as long as they're not redundant.

In SE terms, obtaining data from a repository, transforming them and transmitting them somewhere else are about as different as concerns can get. In any nontrivial system it is almost certain that at some point you will want to retrieve data without doing exactly this transformation, or transform data obtained from somewhere else (perhaps a cache, or user input...). You can only do this if retrieval and processing are in two separate methods, not in one big method. (Proper testing is another aspect that virtually requires you to separate them.)

Look at it this way: eating healthily, learning things and having an active lifestyle all contribute to the higher-level goal of living a fulfilled life. But to govern a country, it's incomparably better to have separate ministries of agriculture, education and sports even though these activities seemingly to occur in haphazardly intermingled ways.

3
  • In your answer you write "it applies to all steps of any method as long as they're not redundant." which I totally agree with, and in that sense my understanding of cohesion was unhelpful. It was helpful to realise it also goes the other way; how closely are the functions elements couched in the terminology of the functions name/params i.e. are they at similar levels of abstraction. Commented Mar 10, 2024 at 2:34
  • Keeping in mind that a functions elements should be at a lower level, but not too low a level. Which is precisely what I'm struggling to determine with my understanding of coupling/cohesion Commented Mar 10, 2024 at 2:36
  • Also, a few replies make mention of the business logic as a module which makes sense, but an equally valid module which I didn't initially notice, but like you mention, is the SQL stuff. It may not be until I reach for it in another business context that I realise it should be elevated to a module of its own. So perhaps in that respect my frustration/difficulty with these concepts is part to do with lack of experience Commented Mar 10, 2024 at 10:43
2

How to properly organise source code is closely related to the proper organisation of concepts which a programmer would use to devise the source code for a particular application.

As such, cohesiveness and coupling are as much properties of an abstract system of concepts and ideas, as they are properties of the source code which is written to represent those things.

The need for this organisation is because it's very rarely possible to represent a useful application as a single procedure.

This might be because it does too many things that it would become difficult to keep them all in mind at once, and once we start to interact with paper to extend our mental faculties and allow us to design things more complicated than we can keep all in mind at once, we then need to ensure that the things represented in paperwork remain navigable and that it assists our minds in a convenient way.

This is essentially the underlying reason for various kinds of "modularisation" - it organises the conceptual content of written material in a way that facilitates our continued interaction with it.

Source code is also unusual amongst human written material in that it also serves to drive the activity of a computer, and can do so independently of any immediate human attention, and this can cause people to forget that source code is a tool whose primary user is the human, and which is primarily tailored for human use.

Most humans have a good facility for understanding and recalling the "function" or "purpose" of things in the world, and less comprehension for the fine details of the steps involved in the things whose overall purpose they might know. A computer, by contrast, has no comprehension of function or purpose at all, and knows only the fine details of steps it is capable of executing.

Source code is a special form of writing which is supposed to appeal to both - it includes natural language elements and visual layouts that facilitate it's human use, whilst still representing in very fine detail the steps which the computer should execute when that code is being used in its capacity to control a computer.

"Cohesion" and "coupling" are basically an attempt to describe the desirable properties of a "module" or a high-level concept. A module must, by definition, consist of more than one concept internally - otherwise the thing would just be a concept, not a module of them.

"Cohesion" concerns the internal relations of a module's concepts. Cohesiveness means that the internal concepts relate to each other "naturally" - that is, they obviously go together, they are all necessary to fulfil the function of the high-level module, and they are not supernumerary or unnecessary in regard to the high-level function of the module.

"Coupling" concerns the relations that the module has with other modules. Here, relationships should be as simple and minimal as possible, whilst still fulfilling the overall function of the system of modules (or the whole computer application).

A lot of this stuff is far easier said than done - easier discussed amongst professionals who already get it, than useful as guidance for learners.

It is not necessarily the case that maximising cohesion or minimising coupling is desirable.

Completely uncoupled modules would be separate applications which have no interactions - so within an application, you certainly don't want to eliminate coupling altogether, only to control it.

And a module that is fully-cohesive conceptually, might require a lot more design work than is really appropriate for the task, or may require a "bag of tools" module in source code to be split off unnecessarily when it otherwise suffices to be understood as a bag of tools that does various minor or common things in the application (which are not complicated enough individually to require any modularisation in source code).

Ultimately, there are all kinds of complicated professional judgements involved which make it more like an art, which are poorly described in literature, which take a long time and a lot of experience to acquire, and which are somewhat context-specific to the industry or size of business which may be commissioning a computer application.

Another problem is that, because modularisation is about organising complicated concepts which even expert programmers cannot grasp without employing additional techniques to manage the complexity, it is difficult to set up scenarios which demonstrate to learners the difficulties which modularisation relieves, or how modules should and shouldn't be designed, using any kind of simple examples.

A procedure of two-dozen lines, as this question starts with, is hardly sufficiently complex to need modularisation, let alone to begin to demonstrate what different levels of coupling and cohesion would look like.

The problem with employing complicated examples, however, is that they are difficult to articulate (in this site's format, for example), they would require real sustained effort from a professional educator to devise significant-sized codebases for conceptually-complicated applications which demonstrate the contrast between good and bad qualities, and the scale of the examples would then require considerable engagement and study for the learner to start to interact properly with the written code and start to properly perceive its ergonomic qualities.

My advice, therefore, is that as a practitioner you will quickly encounter situations where the limits of your mind require you to modularise just to stay in control of the design, and you will then naturally start to appreciate the quality of cohesion and coupling amongst these modules (because appropriate improvements should make the design seem subjectively easier for you to work with), and you won't need to ask anybody else what they are but will instead recognise the words that other professionals are using to describe those qualities.

Asking, as a beginner, how to operate on a simple piece of code to make it more cohesive or less coupled, is completely the wrong question. Because if you have code that needs to be more cohesive or less coupled, your question would be: "I sense I've lost control of this codebase and can no longer grasp what is going on. My colleagues cannot understand it anymore. I'm overwhelmed by the complexity of designing this application. How do I regain my ability to work with it?". And that wasn't your question.

2

Firstly, am I wrong in assuming that just asking about cohesion and coupling with regards to this code is enough to motivate this refactoring, or should I be using other principles, asking other questions?

I would be inclined to say that this example doesn't quite demonstrate the cohesion/coupling concerns specifically; to me it seems to be more about refactoring to an appropriate level of abstraction (as in, the original function deals with a lot of minutia, while the refactored function is composed only out of high-level statements (+ some simple control flow)). In other words, it almost reads like a high-level list of things to be done - you can immediately see the intent, as it is directly expressed (that is, if you had actual names there, instead of "doSomeBusinessLogic"). E.g., the high-level functions could correspond to explanatory "header" comments that might appear in this code, like // Go get configured threshold or // Get things bigger than threshold.

That said, it's hard to see why this type of redesign is a good idea when you only have generic names (like "doSomeBusinessLogic") and no other context (what the operations are matters, how things are expected or observed to change matters, etc). One of the values of keeping all the statements in a function at the same level of abstraction (or same level of detail, if you will) is that now each step has well defined inputs and outputs, so when you dig in, it's easier to think about each step in isolation, and it's also easier to think about the overall flow of the steps without getting into what's happening within each step. That is, if the high-level functions are well designed, it becomes easier to decouple these concerns. And you're right, injecting the data server would be more flexible and it would promote looser coupling, but that's a somewhat separate point.

Second, as far as I understand it the code in the first snippet is cohesive; there's no statement/expression that isn't providing value towards the result of the function - it's all related to the function.

When it comes to cohesion, the problem isn't really apparent if you're just looking at the contents of one function. Sure, a function might have several responsibilities (by some measure) and you may find that undesirable, but in terms of cohesion, that's not the core issue. Code that is not cohesive is spread out across two (or more) places - but you'll often only become aware of this in hindsight. What I mean by that is not that you've split some function in a weird way for funsies, but that your original design (that seemed to organize things neatly and logically) at some point turns out to be flawed in the sense that there are two (or more) places in the codebase (in distinct, supposedly fairly independent classes) that you always need to change together, because otherwise your software won't quite work. E.g., maybe you change a data structure or a parameter list here, or the order of operations there, and this in turn forces you to update the code in these other places. This means that all these affected parts of the codebase implicitly rely on various specific aspects of each other, and are thus coupled - and they cannot avoid this coupling precisely because the code is spread out in several places. That is, the code in question is strongly coupled, but is currently not cohesive (not bundled up together), and that's a problem. The reason why you'd want to try and reshape the code to a more cohesive design has less to do with making the code beautiful or correct or whatever, and more to do with the fact that as you work with this codebase, you're encountering this problem frequently enough that it has become a pain in the a**, excuse my French. (Ideally, you'd want to anticipate it a bit before that, but that requires experience.)

Cohesion is about rethinking the design so that the code that is related in the way described (code that changes together) is brought together to one place, expressed as one concept - e.g. a class (or a component) with a narrowly defined job. Note that, in order to do this redesign, you have to take a good look at what is fundamentally coupled to what and in what way, and what is only incidental - it's not a simple copy-paste thing where you just shuffle the code around. Your new, cohesive component isn't necessarily going to include all of the code that's required to actually get something done. Sometimes, it's going to be a class that requires a number of dependencies plugged into it in order to actually be able to run and do anything. But it encapsulates the rules that were encoded in several places before. Other times, it's going to be a class that only manipulates things internally, and leaves communication with the UI or the DB to its callers. In the redesign process, you have to come up with interfaces and data structures that it exchanges with these dependencies (or callers), such that you end up being able to put all the stuff that's coupled (the high level logic) on one side, and all the code that implements the details of that (code that manipulates general purpose libraries and such) on the other side. You'll also want to design your boundary interfaces and data structures around something that's less likely to change than the code on either side (the abstractions need to be stable), and arrange things in such a way that the more volatile code on either side changes for different reasons (otherwise, you'd still have non-cohesive components, just in a different form).

You'd have some idea of what these interfaces and data structures might be through your experience working on the project, familiarity with previous change requests, the roadmap, project goals, and even due to the fact that there are parts of the codebase that are difficult to work with. That is, it would have to be based to a large degree on your understanding of your specific problem domain. Note also that this is not a magic bullet: this redesign would decouple the code with respect to the kinds of changes that were difficult due to the way code was coupled before, but it might not work so well for other kinds of changes - and it's the job of the designer(s) to figure out if the tradeoff makes sense or not.

Now, a common symptom of non-cohesive code is that some functions end up having several responsibilities, because some of those responsibilities are spread out across several classes/functions that often also deal with other concerns. But again, separation of concerns at the level of one function is only half of the story - you have to take a broader view.

It's hard to come up with a compelling example while keeping the answer length reasonable, but one common situation is when you have a bunch of logic (in different parts of the codebase) that depends on the value of an enum, so you have a bunch of if statements (or switch statements) that typically need to be kept in sync. These might even go across layers. What you then might do is, you first adjust the code a bit so that each case looks kind of the same, so that the logic (or its sequence of high-level operations) becomes easy(er) to generalize into a base class (or an interface) - you then implement each case as a separate subclass, and you pull the aspects that were previously separated (but pertained to the same case) into the hierarchy. You'd then have some sort of factory that selects the appropriate subclass (confining the if statements to one place and to one concern), and from then on you'd work with the class polymorphically. Ideally, the types of changes that previously required editing several files and being careful not to miss a case etc, should now be mostly confined to one subclass. Again, not necessarily easy to get right, and it involves tradeoffs (as any design does).

1
  • 1
    You said - "refactoring to an appropriate level of abstraction". That really resonated. While SQL and ConfigurationManager could be reasoned to be cohesive, they are expressing a higher level concept (that of retrieving data) which is arguably a more cohesive coupling to use in their place. Something like businessRecords.load() perhaps. Commented Mar 10, 2024 at 2:27
2

This answer made more sense prior to the questions 3rd edit.

Coupling and cohesion concerns can get you pretty far here but shouldn’t be considered in a vacuum.

I know this isn't meant as perfect example code. But it would be irresponsible not to call out the issues. As I do I'll try to not let that distract us from the point of your question.

Firstly, am I wrong in assuming that just asking about cohesion and coupling with regards to this code is enough to motivate this refactoring

This isn't a refactoring. A refactoring has the same behavior. The second listing has a magic 5 in it that seems unrelated to the first listing.

I do see an overall theme of applying doSomeBusinessLogic() in a loop over some things. You've changed how those things were expressed as arguments to this function and how they are acquired.

Choosing to express the same idea with fewer arguments could easily be argued as motivated by coupling concerns. Data coupling is a form of coupling (one of many).

Second, as far as I understand it the code in the first snippet is cohesive; there's no statement/expression that isn't providing value towards the result of the function - it's all related to the function. Some would say that the SQL statements and ConfigurationManager are different concerns, but I don't understand that logic. Isn't a function always going to have this? It's the intent of the function that drives what cohesion means for the elements of the function.

Here we're running into a problem with names. Your example has dead poor names. "But it's just as example" doesn't cut it because the names are critical here. Cohesion operates at different levels of abstraction. In any application that works there is a level at which you could argue everything is cohesive because it does what it's supposed to do. Which misses the point.

We want cohesion all the way down. What level we're at is easiest to spot from the name. For example, the exact same toys that are cohesive in a "toy box" are not cohesive in a "doll house". So the name is important.

You gave us, "doSomething()". Sigh.

Maybe we can work with "BusinessLogicClass". A good name tells you what does and doesn't belong inside. So what belongs here is logic that supports business rules. If we assume doSomething() is a poorly named business rule it makes sense to focus on logic from the business and not from the tech decisions. Like "use sql".

That explains much of this change (still no explanation for the 5).

It seems like any dependency which isn't passed in is going to be considered strong coupling

I consider that hidden coupling. Yes eventually decisions have to be made. But parameterizing gives those decisions a clear place to be made. It also clearly expresses needs. It doesn’t eliminate coupling. It makes the coupling obvious.

or should I be using other principles, asking other questions?

Certainly there is more to understand here than simply cohesion and coupling. I've already gone on about naming. The principles are sign posts that warn you of hard to see costs. The problem with trying to focus on one principle and ignore everything else is the one principle becomes nonsense in that vacuum. Sorry, but there is no silver bullet here.

-1

Some would say that the SQL statements and ConfigurationManager are different concerns, but I don't understand that logic

It's a trite example, but what if your SQL licence expired and you switched the database to mongodb?

Are you really sure that having the data on SQL is an integral part of this objects function? Surely its "doing something, to multiple things" do the things have to be stored on an SQL db? It seems to me you could put them anywhere and still do something to them.

Now people will say "but I never change my database", sure, but its not about whether you will ever need to change your database, its an example which shows that this code is not cohesive.

In regards to coupling. You have this line:

int threshold = int.Parse(ConfigurationManager.AppSettings["threshold"]); 

You have coupled your doSomthing logic to the configuration manager. Probably there needs to be an appsettings file or something with a key in it called "threshold" or your app will crash when the function runs.

Is it good to require an xml file with some magic keyword in it for the doSomthing function to work? What if I want to call it more than once with more than one threshold? What if Microsoft retires ConfigurationManager and replaces it with SuperConfigManager?

continuing that to it's logical conclusion would mean passing every element of a function as a param which surely means the function is leaky abstraction.

Welcome to functional programming! Jokes aside, yes there are people who will say that one method per class with just one line of code is the ideal we should be aiming for. Anything more is a compromise.

But, rather than reaching for extremes, in your own judgement don't you prefer the refactored code? Basically you have just split up the SQL from the DoSomething into two functions rather than one. Don't you find it neater? Easier to change? to Test?

If you put threshold in as a construction parameter and read the config in your startup, would it make your code worse? Would you be able to remove some MockConfigManager code from your tests?

5
  • "don't you prefer the refactored code?" - yes I do, but I want to understand why. I think you make a subtle but interesting insight regarding the SQL ("do the things have to be stored on an SQL") ConfigurationManager. Yes they obviously contribute to the function and in that sense they're cohesive but they're an incidental concern, not a essential one. It seems as though these incidental concerns should be replace with the higher-level function they're fulfilling, in this case it's obtaining data. Commented Mar 10, 2024 at 11:00
  • Is that what you were getting at when you asked that question? That from a "narrative" (sorry, maybe a bad analogy but it's all I can manage at the moment) level, reading through the function the SQL and ConfigurationManager parts "dip" down into detail then back up regarding the business function. It's this fluctuating level of detail that makes a function harder to comprehend and therefore comprehension could be considered a trait of cohesion? Commented Mar 10, 2024 at 11:05
  • I guess what I'm getting at is cohesion and coupling are ways of describing what makes "nice" code, rather than rules from which "nice" code arises. If you don't like the "nice" then its natural to question the rules. But if you appreciate the "nice" then the rules become obvious from the practical effects that you like. Commented Mar 10, 2024 at 11:26
  • cohesion describes "if i want to find the bits that do X, i look here and have everything i need", loose coupling describes "if I want to switch a bit out, I can do it easily" Commented Mar 10, 2024 at 11:29
  • "not understanding the rules" often just means, "I havent encountered these problems" Commented Mar 10, 2024 at 11:31

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.