Revisions to Is source code generation an anti-pattern?

Post Made Community Wiki by maple_shaft♦

occurred Aug 17, 2020 at 12:55

added 340 characters in body

edited Nov 30, 2017 at 5:05

Generating Code, just once

Not all source code generation follows the pattern of generating some code, never touching it by hand, and regenerating it from the original source whenever it needs updating.

Sometimes you generate code just once, then discard the original source and maintain the new source from then on.

This sometimes happens when porting code from one language to another, particularly if one doesn't expect to later port over new changes from the original (e.g. the old-language code is not going to be maintained, or it is actually complete, as with some math functionality).

One common case is that a code generator written to do this might only translate 90% of the code correctly, and the last 10% then needs to be fixed up by hand. That is still a lot faster than translating 100% by hand.

Such code generators are often very different from full language translators (like Cython or f2c), since the goal is to produce code once and maintain it by hand afterwards. They are often made as a one-off, to do exactly what they have to. In many ways this is the next-level version of using regex/find-replace to port code. "Tool-assisted porting", you could say.
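As a rough illustration of such a one-off porting tool, here is a minimal Python sketch. Everything in it is hypothetical: the toy BASIC-like input, the rewrite rules, and the names `RULES`, `port_line` and `port_source` are invented for the example. It translates the lines it has a rule for and flags the rest, so the remaining share can be fixed up by hand.

```python
import re

# Hypothetical rewrite rules for a one-off port from a toy BASIC-like
# dialect to Python. Each rule is (compiled pattern, replacement).
RULES = [
    (re.compile(r"^LET (\w+) = (.+)$"), r"\1 = \2"),
    (re.compile(r"^PRINT (.+)$"), r"print(\1)"),
    (re.compile(r"^REM (.*)$"), r"# \1"),
]


def port_line(line):
    """Translate one line; return (text, ok).

    Lines that match no rule are kept as a comment and flagged so that
    a human can port them by hand afterwards.
    """
    stripped = line.strip()
    for pattern, replacement in RULES:
        if pattern.match(stripped):
            return pattern.sub(replacement, stripped), True
    return f"# TODO port by hand: {stripped}", False


def port_source(source):
    """Translate a whole source text; return (new_source, todo_count)."""
    results = [port_line(line) for line in source.splitlines() if line.strip()]
    translated = "\n".join(text for text, _ in results)
    todo_count = sum(1 for _, ok in results if not ok)
    return translated, todo_count


# Toy input: four lines the rules cover, one they do not.
OLD_SOURCE = """\
REM compute an area
LET width = 3
LET height = 4
PRINT width * height
GOSUB 200
"""

if __name__ == "__main__":
    new_source, todo_count = port_source(OLD_SOURCE)
    print(new_source)
    print(f"{todo_count} line(s) left to port by hand")
```

The output is meant to be committed and edited directly. The script itself can be thrown away once the port is done.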

Generating Code, just once, from e.g. a website scrape.

Closely related is generating the code from some source you don't want to access again, e.g. if the actions needed to generate the code are not repeatable or consistent, or performing them is expensive. I am working on a pair of projects right now: DataDeps.jl and DataDepsGenerators.jl.

DataDeps.jl helps users download data (like standard ML datasets). To do this it needs what we call a RegistrationBlock. That is some code specifying some metadata, such as where to download the files from, a checksum, and a message explaining to the user any terms and conditions or the licensing status of the data.

Writing those blocks can be annoying, and that information is often available in (structured or unstructured) forms on the websites where the data is hosted. So DataDepsGenerators.jl uses a web scraper to generate the RegistrationBlock code for some sites that host a lot of data.

It might not generate them correctly, so the dev using the generated code can and should check and correct it. Odds are they want to make sure it hasn't mis-scraped the licensing information, for example.
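To make the idea concrete, here is a minimal Python sketch of the generate-once step. It is not how DataDepsGenerators.jl is implemented. The `SCRAPED` dictionary stands in for whatever a scraper would return, and its URL, checksum and license values are placeholders. The emitted `register(DataDep(name, message, url, checksum))` shape is an assumption about DataDeps.jl's positional form and should be checked against its documentation.

```python
def julia_string(text):
    """Escape text for use inside a double-quoted Julia string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("$", "\\$")
    return f'"{escaped}"'


def registration_block(meta):
    """Render scraped metadata as Julia source text for a registration block.

    The register(DataDep(...)) layout is an assumption, not a verified API.
    """
    message = (
        f"Dataset: {meta['title']}\n"
        f"Website: {meta['website']}\n"
        f"License: {meta['license']}"
    )
    return "\n".join([
        "register(DataDep(",
        f"    {julia_string(meta['title'])},",
        f"    {julia_string(message)},",
        f"    {julia_string(meta['url'])},",
        f"    {julia_string(meta['checksum'])}",
        "))",
    ])


# Hypothetical scrape result; every value here is a placeholder.
SCRAPED = {
    "title": "ToyDataset",
    "website": "https://example.org/toy",
    "license": "UNKNOWN (check the website before use)",
    "url": "https://example.org/data/toy.csv",
    "checksum": "0" * 64,
}

if __name__ == "__main__":
    # Run once, paste the output into the package, then review it by hand.
    print(registration_block(SCRAPED))
```

The printed text is what gets checked in. A reviewer can then correct the license line or checksum directly in the generated source, and nobody downstream needs the scraper.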

Importantly, users/devs working with DataDeps.jl do not need to install or use the web scraper to use the RegistrationBlock code that was generated. (And not needing to download and install a web scraper saves a fair bit of time, particularly for the CI runs.)

Generating source code once is not an antipattern, and it normally cannot be replaced with metaprogramming.

added 8 characters in body

edited Nov 30, 2017 at 4:45

Frames Catherine White

Generating Code, just once

Not all source code generation follows the pattern of generating some code, never touching it by hand, and regenerating it from the original source whenever it needs updating.

Sometimes you generate code just once, then discard the original source and maintain the new source from then on.

This sometimes happens when porting code from one language to another, particularly if one doesn't expect to later port over new changes from the original (e.g. the old-language code is not going to be maintained, or it is actually complete, as with some math functionality).

One common case is that a code generator written to do this might only translate 90% of the code correctly, and the last 10% then needs to be fixed up by hand. That is still a lot faster than translating 100% by hand.

Another case is when the actions needed to generate the code are not repeatable or consistent, or performing them is expensive. I am working on a pair of projects right now: DataDeps.jl and DataDepsGenerators.jl.

DataDeps.jl helps users download data (like standard ML datasets). To do this it needs what we call a RegistrationBlock. That is some code specifying some metadata, such as where to download the files from, a checksum, and a message explaining to the user any terms and conditions or the licensing status of the data.

Writing those blocks can be annoying, and that information is often available in (structured or unstructured) forms on the websites where the data is hosted. So DataDepsGenerators.jl uses a web scraper to generate the RegistrationBlock code for some sites that host a lot of data.

It might not generate them correctly, so the dev using the generated code can and should check and correct it.

Importantly, users/devs working with DataDeps.jl do not need to install or use the web scraper to use the RegistrationBlock code that was generated. (And not needing to download and install a web scraper saves a fair bit of time, particularly for the CI runs.)

Generating source code once is not an antipattern, and it normally cannot be replaced with metaprogramming.

Generating Code, just once

Not all source code generation follows the pattern of generating some code, never touching it by hand, and regenerating it from the original source whenever it needs updating.

Sometimes you generate code just once, then discard the original source and maintain the new source from then on.

This sometimes happens when porting code from one language to another, particularly if one doesn't expect to re-port changes from the original (e.g. the old-language code is not going to be maintained, or it is actually complete, as with some math functionality).