Go: Testing Frameworks and Mini-Languages
Newfoundland, Canada
If you’ve ever read the Go FAQ, you may have noticed that it contains a section on testing frameworks featuring some provocative language, which I’ll quote verbatim below:
A related point is that testing frameworks tend to develop into mini-languages of their own, with conditionals and controls and printing mechanisms, but Go already has all those capabilities; why recreate them? We’d rather write tests in Go; it’s one fewer language to learn and the approach keeps the tests straightforward and easy to understand.
It’s worth reading that text in context, so take a minute to read it in the section it appears in. I’ll wait.
Reflecting
What was Rob Pike getting at with that text? Is there truth to it?
I often write about Go: it’s a language I enjoy a lot for its unique tradeoffs. I enjoy plenty of other languages, too, to be sure. I value them for what they are, so this isn’t really going to be a piece about Go so much as a reflection on my own experiences with many of them.
Contextualizing the Go Case
Having interacted with a wide community of software developers both professionally and informally, I think the proposition in the text above chafes some of them yet delights others. That makes it a point worth exploring.
In Go, we can see that proposition manifest itself in several places.
The crux of this guidance orients itself around the following:
- Create your tests using the ordinary `package testing`, leaving testing to the `Test` functions.
- Do not hesitate to create small abstractions or helpers associated with setup or verification around the topic of testing.
- When you need to create blackbox validation mechanisms (e.g., ensuring outside implementations of an interface you provide are correct), do so in an opinionated way.
Yet, we can find the opposite happening in various projects, including (non-exhaustively and in no particular order):
- Ginkgo: A behavior-driven development (BDD) framework
- Gomega: A collection of assertions (typically used in conjunction with Ginkgo)
- Testify: A collection of assertions, framework for mocks, suite management
- gocheck: A framework for fluent assertions and suite management
- goconvey: A behavior-driven development (BDD) framework
Essentially all of these examples deviate from the guidance found in the FAQ. They introduce a mini-language, or they prevent the test from continuing, or they don’t leave testing to the testing function. Click on any of the links above to see what I mean. While each of these is built on Go the language, the mechanism for evaluating and defining tests is decidedly not the language itself but rather various forms of indirection modeled as a local domain-specific language or similar convention of the library itself.
A talk recently given by Michael Stapelberg on large-scale changes (LSC) provided me a moment to deeply reflect on the cost of mini-languages versus native language features, though Michael never mentioned such things or the conversion costs. What he described in this talk on the new Protocol Buffer Opaque API was an immense undertaking: bringing the Google monorepo into conformance with the new API through programmatic refactoring and conversion of the code base. Were I in his shoes doing this work1, I’d have hated to have dealt with:
- not only doing a difficult change to production code but
- also having to adapt my rewriting tooling to consider peculiarities of testing-specific domain-specific languages (DSLs) versus ordinary Go code.
Relativizing with Other Languages
I’ll freely admit that Rob Pike’s counsel in the FAQ above was something that did not sit well with me when I first read it. That changed after a rather revelatory moment in 2015:
I found myself working on an LSC involving a large corpus of Java servers of varying vintages. The LSC arose from changing the behavior of one of the company’s core types: optimizing its byte-buffering growth strategy. Invariably, I’d need to run the tests of all affected code to make sure I didn’t break anything, because of Hyrum’s Law, of course.
In principle, the change should have been invisible to all of the library’s users, since my change touched only the internals. It wasn’t, however. My change broke about 50 disparate tests across the company’s repo. Off into the mines to see what broke and why …
Now, at this point in my career, I was mostly a Java developer. I used Go recreationally, mostly outside of work, but a lot of the core philosophical aspects of good Go had not yet sunk in deeply. I only mention this to say: I was really cut from the cloth of the (Enterprise) Java ecosystem and its idioms at the time, and my tolerance for ornate, baroque bullshit was high. What I came across among the broken tests triggered table-flipping annoyance. The tests were implemented with a cacophony of three testing frameworks: JUnit 3, JUnit 4, and Truth.
JUnit 3 to 4 was not a terribly significant difference, but the APIs associated with the harnesses were not equivalent. Truth was, well, very different. I laud what Truth does, but its custom validators involve an awful lot of ceremony to write for what little economy they offer (IMO).
Having to investigate foreign code I didn’t write was one thing, and something I could put up with, but dealing with fragmented structures for the test code was another. I kind of lost it: I couldn’t care less what testing library or approach you use; just use one.
And from that point onwards, I was poisoned. Whenever I open the source of a random Go project that I didn’t write and discover it uses one of these non-standard-library frameworks, I wince. It gives me flashbacks to this JUnit-and-Truth episode.
Aside: Years later, I realized this ceremony-versus-economy problem with Truth bothered me because it triggered reminders of using gocheck early in my career as a Go developer.
Case Study Example
If you read my piece on Config Management at Scale: The Gold Standard, you might recall that I posed an example of building a small tool to programmatically manipulate many leaf configuration files to facilitate a LSC.
What if we do something similar with Go code itself, including its tests, to see how varying testing disciplines handle change at scale?
Note: There is nothing specific to Go with this exercise. I’m just using it to demonstrate these ideas.
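As a starting point, here is a minimal sketch of the hypothetical `package metallurgy` that the rest of this example assumes; every name and value in it is invented for illustration:

```go
// Package metallurgy models material properties for simulations.
// This is a hypothetical reconstruction: the names mirror those used in
// this article, and the values are invented.
package metallurgy

// Megapascal is the unit in which pressures are expressed.
const Megapascal = 1.0

// TitaniumEndurance is titanium's endurance (fatigue) limit. The value
// is a placeholder chosen to match the test comments later in this piece.
const TitaniumEndurance = 648 * Megapascal
```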
One day when reviewing the code that is using `package metallurgy`, we notice that `metallurgy.TitaniumEndurance` is being used incorrectly. Something treats the endurance limit as part of a closed safe interval when the interval should be open, thereby potentially constituting a safety error in the simulation:
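The buggy call site might look like the following self-contained sketch, with inline stand-ins for `package metallurgy`; every name and value here is hypothetical:

```go
package main

import "fmt"

// Inline stand-ins for the hypothetical package metallurgy.
const (
	Megapascal        = 1.0
	TitaniumEndurance = 648 * Megapascal
)

// checkPressure uses a strict comparison: it only fails strictly above
// the limit, so a pressure exactly at the endurance limit is treated as
// safe (a closed safe interval). That is the bug.
func checkPressure(pressure float64) error {
	if pressure > TitaniumEndurance {
		return fmt.Errorf("pressure %v exceeds titanium endurance limit %v",
			pressure, TitaniumEndurance)
	}
	return nil
}

func main() {
	// A pressure exactly at the limit wrongly passes: prints <nil>.
	fmt.Println(checkPressure(648 * Megapascal))
}
```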
The correct code should look like this:
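A self-contained sketch of the corrected guard, again with hypothetical inline stand-ins for `package metallurgy`:

```go
package main

import "fmt"

// Inline stand-ins for the hypothetical package metallurgy.
const (
	Megapascal        = 1.0
	TitaniumEndurance = 648 * Megapascal
)

// checkPressure now uses the closed-ended comparison: a pressure exactly
// at the limit also fails, so the safe interval is open at the top.
func checkPressure(pressure float64) error {
	if pressure >= TitaniumEndurance {
		return fmt.Errorf("pressure %v meets or exceeds titanium endurance limit %v",
			pressure, TitaniumEndurance)
	}
	return nil
}

func main() {
	// A pressure exactly at the limit now fails: prints the error.
	fmt.Println(checkPressure(648 * Megapascal))
}
```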
Let’s assume for the sake of argument that all usages of relative comparison (<, >) around metallurgy.TitaniumEndurance need to be amended to include the closed-ended variants (respectively: <=, >=).
We can extract the abstract syntax tree (AST) for the original code body using something like this:
That emits:
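Roughly the following (abridged; the exact field numbering and positions vary by Go version):

```
 0  *ast.BinaryExpr {
 1  .  X: *ast.SelectorExpr {
 2  .  .  X: *ast.Ident {
 3  .  .  .  NamePos: 1
 4  .  .  .  Name: "s"
 5  .  .  }
 6  .  .  Sel: *ast.Ident {
 7  .  .  .  NamePos: 3
 8  .  .  .  Name: "Pressure"
 9  .  .  }
10  .  }
11  .  OpPos: 12
12  .  Op: >
13  .  Y: *ast.SelectorExpr {
14  .  .  X: *ast.Ident {
15  .  .  .  NamePos: 14
16  .  .  .  Name: "metallurgy"
17  .  .  }
18  .  .  Sel: *ast.Ident {
19  .  .  .  NamePos: 25
20  .  .  .  Name: "TitaniumEndurance"
21  .  .  }
22  .  }
23  }
```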
Now, we can build a small program that detects and modifies patterns like this:
Note: This approach is extremely naive. Something more robust might consider data-flow techniques like static single assignment (SSA) to see how the constant `metallurgy.TitaniumEndurance` is propagated through a program and used (e.g., assigned to a local variable). Moreover, a robust version would consider using `package dst` to prevent the mangling of comment bodies as the AST is rewritten. Finally, a very robust version would potentially use `package packages` to load each package that is to be inspected and modified.
So, if you have followed along with this AST rewriter2, you’ll see that it can handle `x > metallurgy.TitaniumEndurance` and `metallurgy.TitaniumEndurance < x`. The tool will just as well rewrite files associated with testing, too.
Now, let’s create a simple test case, for the sake of argument, to demonstrate how easy it is to programmatically convert logic found in non-framework test code compared with code that uses significant frameworks.
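A plausible reconstruction of that plain `package testing` version, assuming the hypothetical `Simulation`, `Megapascal`, and `metallurgy` names used throughout, might read:

```go
import (
	"metallurgy"
	"testing"
)

func TestStress(t *testing.T) {
	s := Simulation{
		Pressure: 1 * Megapascal,
	}
	var errored bool

	// Escalation increases — among other things — pressure by 1 MPa.
	// We should fail around 648 * Megapascal.
	for i := 0; i < 1000; i++ {
		if err := s.EscalateExperiment(); err != nil {
			errored = true
			break
		}
	}

	if !errored {
		t.Errorf("after stress test, failure state = %v, want %v", errored, true)
	}
	if s.Pressure <= metallurgy.TitaniumEndurance {
		t.Errorf("after stress test, simulation pressure = %v (below threshold: %v)", s.Pressure, metallurgy.TitaniumEndurance)
	}
}
```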
Core observations:
- At maximum, two levels of indentation with the prevailing level being zero (counting relative to the inside of the test body itself).
- Four control flow statements.
Now, what happens if we instead express this logic using one of the aforementioned testing frameworks’ fluent assertions? Let’s look.
Note: Even though dot imports are discouraged, I am using them below since many of these domain-specific languages (DSLs) were designed with their use in mind.
```go
import . "github.com/onsi/ginkgo/v2"
import . "github.com/onsi/gomega"

var _ = Describe("Simulation", func() {
	It("fails if pressure exceeds tolerance", func() {
		s := Simulation{
			Pressure: 1 * Megapascal,
		}
		var errored bool

		// Escalation increases — among other things — pressure by 1 MPa.
		// We should fail around 648 * Megapascal.
		for i := 0; i < 1000; i++ {
			if err := s.EscalateExperiment(); err != nil {
				errored = true
				break
			}
		}

		Expect(errored).To(Equal(true))
		Expect(s.Pressure).To(BeNumerically(">", metallurgy.TitaniumEndurance))
	})
})
```

Structural observations of the test code:
- Maximum indentation level is four with the prevailing level being two.
- Two flow control statements.
- Major oddities:
- Core logic anchored in a variable declaration.
- Major logic nested inside of anonymous functions.
- Stringly typed operators present.
Not only must the original code rewriter handle the ordinary Go, but it must also be able to recognize and handle `BeNumerically(">", metallurgy.TitaniumEndurance)` and possibly other expressible forms in the DSL.

With Testify:
```go
import (
	"testing"

	"github.com/stretchr/testify/assert"
)

func TestStress(t *testing.T) {
	s := Simulation{
		Pressure: 1 * Megapascal,
	}
	var errored bool

	// Escalation increases — among other things — pressure by 1 MPa.
	// We should fail around 648 * Megapascal.
	for i := 0; i < 1000; i++ {
		if err := s.EscalateExperiment(); err != nil {
			errored = true
			break
		}
	}

	assert.Truef(t, errored, "after stress test, failure state = %v, want %v", errored, true)
	assert.Greaterf(t, s.Pressure, metallurgy.TitaniumEndurance, "after stress test, simulation pressure = %v (below threshold: %v)", s.Pressure, metallurgy.TitaniumEndurance)
}
```

Structural observations of the test code:
- Maximum indentation level of two, predominantly at zero.
- Two flow control statements.
But to migrate this code, our rewriter would need special handling for `assert.Greaterf` and would need to check how `metallurgy.TitaniumEndurance` is referred to in the various assert statements. Costs are growing …

With gocheck:
```go
import (
	"metallurgy"
	"testing"

	. "gopkg.in/check.v1"
)

var GreaterThan = &greaterThanChecker{
	&CheckerInfo{Name: "GreaterThan", Params: []string{"got", "want"}},
}

type greaterThanChecker struct {
	*CheckerInfo
}

func (*greaterThanChecker) Check(params []interface{}, names []string) (ok bool, failure string) {
	if params[0].(int) > params[1].(int) {
		return true, ""
	}
	return false, "Must I?"
}

func Test(t *testing.T) { TestingT(t) }

type StressSuite struct{}

var _ = Suite(new(StressSuite))

func (*StressSuite) TestStress(c *C) {
	s := Simulation{
		Pressure: 1 * Megapascal,
	}
	var errored bool

	// Escalation increases — among other things — pressure by 1 MPa.
	// We should fail around 648 * Megapascal.
	for i := 0; i < 1000; i++ {
		if err := s.EscalateExperiment(); err != nil {
			errored = true
			break
		}
	}

	c.Assert(errored, Equals, true)
	c.Assert(s.Pressure, GreaterThan, metallurgy.TitaniumEndurance)
}
```

Structural observations of the test code:
- Need to define at least one suite type for each package, potentially for each test file.
- Need to define a type for each assertion form. At least the user-defined assertions are mostly just Go code.
- Indentation level is primarily at zero with a maximum level of two.
- Two flow control statements.
¯\_(ツ)_/¯ This looks pretty hard to migrate. We’d need something aware of the morphologies of this testing framework to identify `GreaterThan` and `greaterThanChecker` and create a semantic analogue to these for closed-interval checks.

With goconvey:
```go
import (
	"metallurgy"
	"testing"

	. "github.com/smartystreets/goconvey/convey"
)

func TestStress(t *testing.T) {
	Convey("Given a simulation within nominal execution parameters", t, func() {
		s := Simulation{
			Pressure: 1 * Megapascal,
		}
		var errored bool

		Convey("When the simulation parameters are brought to the limit", func() {
			// Escalation increases — among other things — pressure by 1 MPa.
			// We should fail around 648 * Megapascal.
			for i := 0; i < 1000; i++ {
				if err := s.EscalateExperiment(); err != nil {
					errored = true
					break
				}
			}

			Convey("The simulation should have failed", func() {
				So(errored, ShouldEqual, true)
			})
			Convey("The pressure should exceed titanium's limit", func() {
				So(s.Pressure, ShouldBeGreaterThan, metallurgy.TitaniumEndurance)
			})
		})
	})
}
```

Structural observations of the test code:
- Overall characteristics seem similar to Gomega.
- Most indentation at level two; most extreme at level four.
- Two flow control statements.
At this point, our project would have declared DSL bankruptcy trying to handle all of these forms. This DSL, too, involves an awful lot of ceremony.
You might look at these hypotheticals I posed and scoff a bit. It is worth acknowledging that static code rewriting does have its place. In Go, it played a pivotal role facilitating forward porting of user code with the gofix tool. Remember: not everyone has the luxury of using an IDE’s find-cross-references feature for a given symbol and walking through and repairing every hit manually. The cross-references may be too numerous for the tool, or they could be in external repositories or code bases you don’t control. I know it sounds hard to believe for the average developer, but there are codebases and projects that are too large for even the beefiest machine and IDE to index, grok, and report on without specialized machinery.
So consider the original statement in the FAQ again:
mini-languages of their own, with conditionals and controls
Is it really reasonable, then, for each static rewriter to have to consider the multiplicity of these various testing DSLs and the API morphologies they expose? I’ll let you be the judge of that. To me, this very heavily underscores:
but Go already has all those capabilities; why recreate them?
My base test case using native Go idioms clocks in at 25 lines; the Ginkgo-with-Gomega and Testify versions clock in at 21. Are the four lines saved worth the complexity they add? When you consider the needs of programmatic maintenance, color me unimpressed. That’s just my preference, though; see below for the discussion of code golf.
Recall the situation I originally posed about JUnit 3, 4, and Truth. Would you have liked to build a static code rewriter to handle that? It’d be no different from what I described here with these various testing frameworks for Go. But given the ecosystem conventions found in the Java world, it seems that testing DSLs are a bit unavoidable.
Returning to the Heart of the Matter
So what really explains the original contention between the text found in the FAQ and the various style guides versus these various testing packages?
Is it dogma? Maybe. Then again, perhaps we should critically examine and admit something: there is no universal truth or set of values, even in engineering, and any attempt to claim that there is would be sophomoric. Let me offer a couple of propositions of my own (again, not exhaustive):
- Some developers like their code to embody code golf; others don’t.
- Some developers like their code to be so highly factorized and DRY that it is parched; others don’t.
- Some developers like clarity and seek to use the least complicated solution required by the problem (see least mechanism guidance); others don’t. This often anchors on how much abstraction to use and when.
- Some developers like things that are verifiably correct to the n-th degree; others don’t.
- Some developers prefer iteration speed over everything else; others don’t.
- Some developers want a high degree of low-level control of the language and its interactions with the operating system and machine; others don’t.
I’ll posit that there is a place for each of these values depending on the circumstance, and one probably seldom wants to maximize for one at the expense of others. The reality probably looks more like a radar chart: a lot of one or two values and a little bit of the rest and potentially none of one.
And the essence here is acknowledging that, beyond requirements, there are multiple different psychographic profiles of software engineers to be found in the wild. The mature approach is to accept the differences in values rather than fight them with tribalism.
To show you what I mean, I took the liberty to model up three psychographic profiles to show you how I understand various developer communities found in the wild:

There’s nothing scientific about it; it’s just my own reductive summary driven from professional experience.
Perhaps it is fair to say that some programming language and library ecosystems embody/nourish the values of one psychographic profile or another; while repulsing others. Or, is it the other way around: the language ecosystem drives and informs the values? I really don’t know. It could be a little bit of each.
Either way, the topic of psychographic profiles is something worth exploring in greater detail at a later time.
Michael and I were on the same team for over six years and did very similar large changes many times before. These LSCs were anything but simple, but they in no way approach the complexity of the one described in the talk Michael gave. The situation he described was extreme. So the main point of my calling out his experience with this code migration: I have my own first-hand experience and knowledge of what this kind of extreme programmatic refactoring work is like, and I can strongly empathize with him over the pain and difficulty he encountered along the way. It strongly leads me to question the value of preemptive over-engineering of systems, precisely for the cost it imposes on the code gardeners around us all. ↩︎
This type of rewrite could probably also be done with an example-based refactoring tool. Don’t let that fool you, however. Example-based refactoring tools are, in my opinion from my own experience, of little utility once a change needs a hair more sophistication than matching a simple reference pattern. I’ll also add that, as you look over the examples where the testing DSL is used, you’ll quickly see that similar example-based approaches are likely to break here due to inflexibility. There’s no free refactoring. ↩︎