May 25th 2023

Code generation for Langium-based DSLs (1/3)

Christian Schneider

Hey there, this is Christian. Today I’m writing about an important aspect of DSLs and DSL tooling: code generation. I guess you share my experience that DSLs on their own are advantageous for formalizing, specifying, and communicating content because of their domain-specific nature. The significant gain in productivity, however, is achieved once you can derive implementation code from your specified content. Starting with this post, I’ll show some specific code generation support we published as part of Langium v1.0. This article is the first in a series of three.

In essence, code generation is concatenating string segments. Some of those segments are fixed and added every time, some of them are variable depending on the input. Unlike many other languages, JavaScript already addresses this aspect with its template literals. That’s nice, but that’s not the end. Template literals may be evaluated by custom functions called tag functions, turning the templates into tagged templates. Such functions receive the fixed segments of a template literal and the values of its substitutions as arguments and can do anything with them. Very cool! 😎
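To make the mechanics concrete, here is a minimal sketch of a tag function. The name markSubstitutions and the sample values are made up for illustration and have nothing to do with Langium; the sketch merely shows that a tag function gets the fixed segments and the substitution values separately and decides itself how to combine them.

```typescript
// A tag function receives the fixed segments and the evaluated substitutions separately.
// 'markSubstitutions' is a hypothetical name used for illustration only.
function markSubstitutions(strings: TemplateStringsArray, ...values: unknown[]): string {
    // 'strings' always holds exactly one more entry than 'values', so we interleave them.
    return strings.reduce(
        (result, segment, i) => result + segment + (i < values.length ? `[${String(values[i])}]` : ''),
        ''
    );
}

const identifier = 'netPrice';
const amount = 400;

// Placing the function name directly before the back tick turns the template into a tagged template.
const text = markSubstitutions`const ${identifier} = ${amount};`;
// text === 'const [netPrice] = [400];'
```

Here the tag function just wraps every substitution in square brackets, but it could equally well re-indent, escape, or drop them.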

Running example

Ok, let’s get to practice. The running example in this post uses Langium’s Arithmetics example language implementation. The language lets you specify arithmetic expressions. That includes assignments to variables and definitions of custom functions, which can then be referenced in subsequent expressions. The example input for our code generator will be the following:

MODULE priceCalculator

DEF materialPerUnit:               100;
DEF laborPerUnit:                  200;
DEF costPerUnit:                   materialPerUnit + laborPerUnit;

DEF expectedNoOfSales:             200;
DEF costOfGoodsSold:               expectedNoOfSales * costPerUnit;

DEF generalExpensesAndSales:       10000;
DEF desiredProfitPerUnit:          50;
DEF netPrice:
    (costOfGoodsSold + generalExpensesAndSales) / expectedNoOfSales + desiredProfitPerUnit;

DEF vat:                           0.15;

DEF calcGrossListPrice(net, tax):
    net / (1 - tax);

calcGrossListPrice(netPrice, vat);

This module describes a very simple calculation of product prices. It consists of assignments of constant values as well as computed values to variables. Finally, a function named calcGrossListPrice, which has been defined previously, is called with the arguments netPrice and vat. The following picture illustrates the abstract syntax tree (AST) created by Langium while parsing the input.

AST parsed from the calculation example
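As a cross-check of what the module is supposed to compute, the calculation can be transcribed by hand into TypeScript. This is a hand-written transcription for verifying the numbers, not output of the generator; the name grossListPrice is introduced here just to hold the final result.

```typescript
// Hand-written transcription of the priceCalculator module, used only to check the arithmetic.
const materialPerUnit = 100;
const laborPerUnit = 200;
const costPerUnit = materialPerUnit + laborPerUnit;           // 300

const expectedNoOfSales = 200;
const costOfGoodsSold = expectedNoOfSales * costPerUnit;      // 60000

const generalExpensesAndSales = 10000;
const desiredProfitPerUnit = 50;
const netPrice =
    (costOfGoodsSold + generalExpensesAndSales) / expectedNoOfSales + desiredProfitPerUnit; // 350 + 50 = 400

const vat = 0.15;
const calcGrossListPrice = (net: number, tax: number): number => net / (1 - tax);

// 'grossListPrice' is a name made up for this sketch.
const grossListPrice = calcGrossListPrice(netPrice, vat);     // 400 / 0.85, roughly 470.59
```

So whatever code the generator produces for this module, evaluating it should yield a gross list price of roughly 470.59.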

Generating code

Now let’s translate this into pure JavaScript code. To compose the desired snippets, the generator needs to visit the AST and inspect the corresponding parts. Let’s define the generator entry function by means of a plain JavaScript template as follows. It contributes some static framing code:

function generateModule(root: Module): string {
    return `
········"use strict";
········(() => {
········  ${generateModuleContent(root)}
········})
    `;
}

Let’s also define generateModuleContent(Module) and implement it as follows, this time with classic string concatenation because of the required loop:

function generateModuleContent(module: Module): string {
    let result = `let ${lastComputableExpressionValueVarName};\n`;
    for (const s of module.statements) {
        result += generateStatement(s) + '\n';
    }
    result += `\n`;
    result += `return ${lastComputableExpressionValueVarName};`;
    return result;
}

Problem 1: With multi-line template literals the generated code will contain the whitespace indicated by ········ in generateModule(). I added that whitespace to make the generator satisfy our formatting rules.
Downside: It clutters our generation result.

Problem 2: When visiting lists we have to insert line breaks after each statement’s generation snippet. Besides, we have to take care of line breaks before and after the for loop. Finally, there’s the issue of \n vs. \r\n.
While this is quite simple here, it can get quite difficult in the presence of conditionally appended segments and/or multiple loops in a row.

Problem 3: The string concatenation in generateModuleContent() does not pay any attention to the indentation preceding the call of this function within generateModule().

The generated code will look as follows, depending on the implementation of generateStatement():

········"use strict";
········(() => {
········  let lastComputableExpressionValue;
const materialPerUnit = lastComputableExpressionValue = 100;
const laborPerUnit = lastComputableExpressionValue = 200;
    .
    .
    .

return lastComputableExpressionValue;
········})
····

This example nicely illustrates how indentation in generated code can go wrong. The surrounding static code is indented but shouldn’t be, while the enclosed statements are not but should be.
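If you want to reproduce the effect in isolation, the following self-contained sketch does so with a plain template literal. The function generateBody is a made-up stand-in for generateModuleContent() that returns two unindented lines.

```typescript
// Hypothetical stand-in for generateModuleContent(): two lines without any indentation.
function generateBody(): string {
    return ['let x;', 'return x;'].join('\n');
}

// Plain (untagged) template literal, indented to fit the surrounding source code.
const plain = `
    (() => {
      ${generateBody()}
    })
`;

const lines = plain.split('\n');
// lines[1] === '    (() => {'   the static code keeps the template's own indentation
// lines[2] === '      let x;'   the first substituted line happens to sit behind the indentation
// lines[3] === 'return x;'      every further line of the substitution starts at offset zero
```

Only the first line of the substitution looks right, and that just by accident.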

Smart tagged templates

Solution A: Langium provides a tag function called expandToString that contributes smart whitespace handling.

Inserting a reference to expandToString directly before the opening back tick in line 2 of generateModule(Module) turns the subsequent template into a tagged template, see generateModule2(Module):

import { expandToString } from 'langium';

function generateModule2(root: Module): string {
    return expandToString`
········"use strict";
········(() => {
········  ${generateModuleContent(root)}
········})
    `;
}

This gets us the following generation result:

"use strict";
(() => {
  let lastComputableExpressionValue;
  const materialPerUnit = lastComputableExpressionValue = 100;
  const laborPerUnit = lastComputableExpressionValue = 200;
    .
    .
    .

  return lastComputableExpressionValue;
})

So what does expandToString do? Several things! 😉 Namely:

  1. identification and trimming of the common leading whitespace among all non-empty lines of the template
  2. identification of the offset of expressions wrapped in ${}
  3. trimming of single leading and trailing line breaks
  4. consolidation of the line breaks within the template

Feature 1 eliminates the whitespace within generateModule2(Module) indicated by ········, which makes the static code start at offset zero, i.e. it is generated without any indentation.

Feature 2 applies the additional indentation of ${generateModuleContent(root)} within its line (␣␣) to each additional line of the replacement string. For our example this yields properly indented statement implementation snippets, while the indentation is specified just once.

Feature 3 drops the initial line break immediately following the opening back tick, as well as the trailing line break including the indentation of the closing back tick. This is less relevant for generator entry functions like generateModule2(Module), but very relevant for generation functions that are called from within other tagged templates, like generateModuleContent(Module), as the surrounding line breaks will be determined by the calling templates.

Last but not least, Feature 4 makes all line breaks match the system’s line break flavor. This is desirable as the generated code is usually persisted to disk and expected to be in line with the platform.
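To get a feeling for how these four features can play together, here is a strongly simplified sketch of such a tag function. It is not Langium’s implementation of expandToString; the names expandSketch, leadingWhitespace, and lineBreak are made up, the line break is hard-coded instead of being taken from the platform, and mixed tabs and spaces are not handled.

```typescript
// Illustrative sketch only; this is NOT Langium's implementation of expandToString.
const lineBreak = '\n'; // assumption: a real implementation would pick the platform's line break

// Returns the leading whitespace of a single line.
function leadingWhitespace(line: string): string {
    return line.slice(0, line.search(/\S|$/));
}

function expandSketch(strings: TemplateStringsArray, ...values: unknown[]): string {
    let text = '';
    strings.forEach((segment, i) => {
        text += segment;
        if (i < values.length) {
            // Feature 2: prefix each additional line of the substitution with the
            // leading whitespace of the template line containing the ${}.
            const currentLine = text.slice(text.lastIndexOf('\n') + 1);
            const indent = leadingWhitespace(currentLine);
            text += String(values[i]).replace(/\r?\n/g, '\n' + indent);
        }
    });

    const lines = text.split('\n');
    // Feature 3: drop the blank line after the opening back tick and the
    // whitespace-only line in front of the closing back tick.
    if (lines.length > 1 && lines[0].trim() === '') lines.shift();
    if (lines.length > 1 && lines[lines.length - 1].trim() === '') lines.pop();

    // Feature 1: determine the common leading whitespace among all non-empty lines.
    const indents = lines
        .filter(line => line.trim() !== '')
        .map(line => leadingWhitespace(line).length);
    const common = indents.length > 0 ? Math.min(...indents) : 0;

    // Feature 4: re-join with one consistent line break, stripping the common indentation.
    return lines.map(line => (line.trim() === '' ? '' : line.slice(common))).join(lineBreak);
}

const body = ['let x;', 'x = 1;', 'return x;'].join('\n');
const generated = expandSketch`
    (() => {
      ${body}
    })
`;
// generated === '(() => {\n  let x;\n  x = 1;\n  return x;\n})'
```

Note that the sketch applies the substitution indentation first and strips the common indentation afterwards, so the substituted lines end up indented relative to the dedented static code.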

Now let’s have another look at generateModuleContent(Module):

function generateModuleContent(module: Module): string {
    let result = `let ${lastComputableExpressionValueVarName};\n`;
    for (const s of module.statements) {
        result += generateStatement(s) + '\n';
    }
    result += `\n`;
    result += `return ${lastComputableExpressionValueVarName};`;
    return result;
}

After rewriting the loop as a map/join expression we can implement the string concatenation with a tagged template and expandToString as follows:

function generateModuleContent2(module: Module): string {
    return expandToString`
        let ${lastComputableExpressionValueVarName};
        ${ module.statements.map(generateStatement).join('\n') }

        return ${lastComputableExpressionValueVarName};
    `;
}

You have surely noticed the \n used as the separator of the join operation. Don’t worry about that! Because of Feature 4, expandToString will take care of it and replace single \n with \r\n if executed on MS Windows machines.

The entire output for our price calculation example from above might look as follows. I skip the missing generator parts here.

"use strict";
(() => {
  let lastComputableExpressionValue;
  const materialPerUnit = lastComputableExpressionValue = 100;
  const laborPerUnit = lastComputableExpressionValue = 200;
  const costPerUnit = lastComputableExpressionValue = materialPerUnit + laborPerUnit;
  const expectedNoOfSales = lastComputableExpressionValue = 200;
  const costOfGoodsSold = lastComputableExpressionValue = expectedNoOfSales * costPerUnit;
  const generalExpensesAndSales = lastComputableExpressionValue = 10000;
  const desiredProfitPerUnit = lastComputableExpressionValue = 50;
  const netPrice = lastComputableExpressionValue = ((costOfGoodsSold + generalExpensesAndSales) / expectedNoOfSales) + desiredProfitPerUnit;
  const vat = lastComputableExpressionValue = 0.15;
  const calcGrossListPrice = (net, tax) => net / (1 - tax);
  lastComputableExpressionValue = calcGrossListPrice(
      netPrice, vat
  );

  return lastComputableExpressionValue;
})

Beyond concatenating plain keywords, identifiers, and operators, my generator inserts canonical parentheses around nested compound expressions, as in the calculation of the value of netPrice. Besides, function calls like that of calcGrossListPrice are generated on multiple lines, which makes the arguments more readable.
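The canonical parentheses can be produced by a small recursive function. The following is a minimal sketch under simplifying assumptions: the Expression type is a hypothetical, stripped-down stand-in for the AST types Langium derives from the grammar, and the function generateExpression shown here is not taken from the example repository.

```typescript
// Hypothetical, simplified AST types; the types Langium derives from the grammar are richer.
type Expression =
    | { kind: 'number'; value: number }
    | { kind: 'reference'; name: string }
    | { kind: 'binary'; operator: '+' | '-' | '*' | '/'; left: Expression; right: Expression };

// Generates code for an expression; binary expressions nested in other ones get parentheses.
function generateExpression(expr: Expression, nested = false): string {
    switch (expr.kind) {
        case 'number':
            return String(expr.value);
        case 'reference':
            return expr.name;
        case 'binary': {
            const code =
                `${generateExpression(expr.left, true)} ${expr.operator} ${generateExpression(expr.right, true)}`;
            return nested ? `(${code})` : code;
        }
    }
}

// Small helper for building reference nodes in this sketch.
const ref = (name: string): Expression => ({ kind: 'reference', name });

// (costOfGoodsSold + generalExpensesAndSales) / expectedNoOfSales + desiredProfitPerUnit
const netPriceExpression: Expression = {
    kind: 'binary',
    operator: '+',
    left: {
        kind: 'binary',
        operator: '/',
        left: { kind: 'binary', operator: '+', left: ref('costOfGoodsSold'), right: ref('generalExpensesAndSales') },
        right: ref('expectedNoOfSales')
    },
    right: ref('desiredProfitPerUnit')
};

const code = generateExpression(netPriceExpression);
// code === '((costOfGoodsSold + generalExpensesAndSales) / expectedNoOfSales) + desiredProfitPerUnit'
```

Since the parentheses are derived from the tree structure rather than from operator precedence, the generated code stays correct even where the source relies on precedence rules.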

Find the entire example at https://github.com/langium/langium-in-browser-codegen-example.

Conclusion: expandToString serves us greatly if we want to implement our code generator with JavaScript template expressions instead of plain string concatenations, and if we want to have properly formatted generated code as well as properly formatted templates.

Remark: It’s important to indent your template lines consistently. In particular, do not mix tabs and spaces! VS Code offers a convenient command for displaying whitespace characters called Toggle Render Whitespace.

The Toggle Render Whitespace command in VS Code

What’s next?

Advanced template-based code generation is nice but has some limitations. Sometimes we want to skip line breaks if the substitution(s) within certain lines resolve to empty strings. Sometimes we need to adjust the previously generated code because of size restrictions or other circumstances. Last but not least we may want to associate the generated code with definitions in the DSL source code, and, e.g., allow back and forth navigation between source code and generated code.
How cool would it be to make a Langium-based DSL debuggable? 😉
I hope you enjoyed this read. In the next part I’m gonna present our Solution B.
Stay tuned!

About the Author

Christian Schneider

Christian is a seasoned DSL enthusiast with a decade of expertise in language tooling and graphical visualizations. While striving for innovative and sustainable solutions, he still pays attention to details. Quite often you’ll find Christian near the sea, with the seagulls providing a pleasant ambience to his meetings.