Dan Mills

Sussman had much of interest to say about such things in his classic "Structure and Interpretation of Computer Programs", mainly about the code-data duality.

For me the major use of ad hoc code generation is making use of an available compiler to convert some little domain-specific language into something I can link into my programs. Think BNF, think ASN.1 (actually, don't, it is ugly), think data-dictionary spreadsheets.
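As a sketch of the spreadsheet case: a few lines of Python can turn a data-dictionary sheet (exported as CSV; the column names and values here are invented for illustration, not any real tool's format) into a C header you simply #include:

```python
import csv
import io

# Hypothetical data-dictionary export; names, types, and values are made up.
SHEET = """name,type,value
MAX_RPM,uint16_t,6500
IDLE_RPM,uint16_t,850
"""

def generate_header(sheet_text: str) -> str:
    """Emit C constant definitions from the data dictionary.

    The C text produced here is just our intermediate representation;
    the system C compiler does the real work.
    """
    out = ["/* GENERATED FILE - edit the spreadsheet, not this. */"]
    for row in csv.DictReader(io.StringIO(sheet_text)):
        out.append(f"static const {row['type']} {row['name']} = {row['value']};")
    return "\n".join(out) + "\n"

print(generate_header(SHEET))
```

The generated file gets a loud "do not edit" banner for exactly the reason discussed below: the spreadsheet, not the C, is the source.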

Trivial domain-specific languages can be a huge time saver, and outputting something that can be compiled by standard language tools is the way to go when creating such things. Which would you rather edit: a non-trivial hand-hacked parser in whatever native language you are writing in, or the BNF for an auto-generated one?

By outputting text that is then fed to some system compiler, I get all of that compiler's optimisation and system-specific configuration without having to think about it.

I am effectively using the compiler's input language as just another intermediate representation (IR); what is the problem? Text files are not inherently source code: they can be an IR for a compiler, and if they happen to look like C or C++ or Java or whatever, who cares?

Now, if you are hard of thinking, you might edit the OUTPUT of the toy-language parser, which will clearly disappoint you the next time someone edits the input-language files and rebuilds. The answer is to not commit the auto-generated IR to the repo; have it generated by your toolchain (and avoid having such people in your dev group, they are usually happier working in marketing).

This is not so much a failure of expressiveness in our languages as an expression of the fact that sometimes you can get (or massage) parts of the specification into a form that can be automatically converted into code, and that will usually beget far fewer bugs and be far easier to maintain. If I can give our test and configuration people a spreadsheet they can tweak, plus a tool that takes that data and spits out a complete hex file for the flash on my ECU, that is a huge time saving over having someone manually translate the latest setup into a set of constants in the language of the day (complete with typos).
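To make the hex-file step concrete, here is a minimal sketch, assuming Intel HEX output (a common flash image format; the address and data bytes are made up for illustration). Each record carries a length, address, type, payload, and a two's-complement checksum, so getting a tool to emit them is exactly the kind of mechanical translation that should never be done by hand:

```python
def ihex_record(addr: int, data: bytes) -> str:
    """One Intel HEX data record: :LLAAAA00<data>CC.

    CC is the two's complement of the sum of all preceding bytes,
    so every record is self-checking.
    """
    body = bytes([len(data), (addr >> 8) & 0xFF, addr & 0xFF, 0x00]) + data
    checksum = (-sum(body)) & 0xFF
    return ":" + body.hex().upper() + f"{checksum:02X}"

# Made-up calibration bytes; a real tool would read them from the spreadsheet.
print(ihex_record(0x0000, bytes([0x01, 0x02])))
print(":00000001FF")  # standard Intel HEX end-of-file record
```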

Same thing with building models in Simulink, generating C with Real-Time Workshop (RTW), and then compiling to target with whatever tool makes sense. The intermediate C is unreadable; so what? The high-level Matlab RTW stuff only needs to know a subset of C, and the C compiler takes care of the platform details. The only time a human has to grovel through the generated C is when the RTW scripts have a bug, and that sort of thing is far easier to debug with a nominally human-readable IR than with just a binary parse tree.

You can of course write such things to output bytecode or even executable code, but why would you do that? We already have tools for converting an IR to those things.
