Langium 1.0: A Mature Language Toolkit

Fri Dec 16 2022 by Dr. Miro Spönemann

Langium is a toolkit for domain-specific languages (DSLs) that is fully built with TypeScript. It provides a text parser with integrated cross-reference resolution and a language server for smart editor support. You can use Langium to build custom text languages that can be automatically processed. For example, you can execute user-defined behavior with an interpreter, or generate executable code, documentation, or other output formats. Such a custom language can

  • empower non-technical persons to provide data, logical constraints, executable behavior, or similar concepts,
  • greatly reduce the manual effort to create artifacts such as source code or documentation,
  • strengthen the communication between stakeholders with a formal language.

We published the first release of Langium in June 2021. The team behind this new language toolkit is proud to announce the availability of version 1.0 today. With this first major version, we want to signal that the project has reached a substantial level of maturity. If you are starting a new domain-specific language, you should consider building it with Langium to make use of its richness of features, simple integration and modern technology stack.

This post highlights the main achievements of the Langium project.

Langium Runs in the Web

The implementation of a Langium-based language is compiled to JavaScript and can be bundled to run in a web browser. This is particularly useful when you want to include a text editor in your website and provide smart support for your custom language in that editor. The text content can be processed by Langium to obtain a JSON data structure, which can then be used in other parts of your web application: visualize data, process it in the backend, show results of code execution / simulation, etc.

Langium Runs in VS Code

Langium has built-in support for the Language Server Protocol and can be easily integrated in VS Code, Eclipse Theia or other tools and IDEs that support this protocol. Such a tool integration facilitates the handling of workspaces with multiple documents (files). Langium can efficiently link references between different documents, empowering users to specify complex information in a well-structured way. Both VS Code and Theia can be used as desktop applications on your own machine or in the cloud with provisioned workspaces.

Declarative and Customizable

The first task when building a language is to define its syntax and the structure of the semantic information. These two aspects are closely coupled with each other and are not always fully clear from the beginning. For these reasons, Langium offers a grammar language to specify syntactic rules along with their semantic structure. The following example describes an entity that starts with the keyword entity followed by an identifier, an optional supertype (extended entity), and a list of features wrapped in curly braces.

Entity:
    'entity' name=ID ('extends' superType=[Entity])? '{'
        (features+=Feature)*
    '}';

The corresponding semantic model (the abstract syntax tree) has properties name, superType and features, which can be accessed in TypeScript to implement validation rules, a code generator and other processing services. Langium generates TypeScript type declarations to use the resulting model in a type-safe manner. This works particularly well for the initial rapid prototyping phase, but a more explicit approach is preferable in more mature projects. This can be done by declaring the types directly in the grammar language:

interface Entity {
    name: string
    superType?: @Entity
    features: Feature[]
}

The example above includes a cross-reference written as superType=[Entity] (and @Entity in the type declaration), which means that we can use the name of another declared entity at this text position. Langium has a sophisticated framework to resolve such cross-references. It includes an index for fast symbol lookup across multiple documents and even supports references across different languages.
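To give a rough feeling for what name-based linking means on the TypeScript side, here is a minimal sketch. It is not Langium's implementation: the Reference, Entity and Feature types are simplified hand-written stand-ins for the generated declarations, and buildIndex, link and allFeatures are hypothetical helpers invented for this illustration.

```typescript
// Simplified stand-ins for generated AST types (assumption: the real
// declarations generated by Langium carry more metadata than shown here).
interface Reference<T> {
    readonly $refText: string; // the name as written in the document
    ref?: T;                   // the resolved target, set once linking succeeds
}

interface Feature {
    name: string;
}

interface Entity {
    name: string;
    superType?: Reference<Entity>;
    features: Feature[];
}

// Hypothetical index: maps entity names to entities from all documents.
function buildIndex(documents: Entity[][]): Map<string, Entity> {
    const index = new Map<string, Entity>();
    for (const entities of documents) {
        for (const entity of entities) {
            index.set(entity.name, entity);
        }
    }
    return index;
}

// Resolves every superType reference by name lookup in the index.
// Returns the names that could not be resolved.
function link(documents: Entity[][], index: Map<string, Entity>): string[] {
    const unresolved: string[] = [];
    for (const entities of documents) {
        for (const entity of entities) {
            const reference = entity.superType;
            if (!reference) {
                continue;
            }
            const target = index.get(reference.$refText);
            if (target) {
                reference.ref = target;
            } else {
                unresolved.push(reference.$refText);
            }
        }
    }
    return unresolved;
}

// Type-safe traversal: own feature names first, then inherited ones.
// The visited set guards against cyclic inheritance.
function allFeatures(entity: Entity): string[] {
    const names: string[] = [];
    const visited = new Set<Entity>();
    let current: Entity | undefined = entity;
    while (current && !visited.has(current)) {
        visited.add(current);
        names.push(...current.features.map(feature => feature.name));
        current = current.superType?.ref;
    }
    return names;
}

// Two "documents", the second one referring to an entity declared in the first.
const person: Entity = { name: 'Person', features: [{ name: 'name' }] };
const student: Entity = {
    name: 'Student',
    superType: { $refText: 'Person' },
    features: [{ name: 'school' }]
};
const documents = [[person], [student]];
const unresolvedNames = link(documents, buildIndex(documents));
console.log(unresolvedNames.length, allFeatures(student));
```

The sketch uses a single flat map, which is only enough to show the idea of an index shared across documents. It ignores scoping, qualified names and duplicate declarations entirely.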

Every aspect of your language can be customized on a fine-grained level through dependency injection. This enables us to ship a large amount of out-of-the-box functionality where all defaults can be overridden to match the expected behavior of your language.
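The idea of overridable defaults can be sketched in a few lines of plain TypeScript. The following is a conceptual illustration only and does not use Langium's actual API: the NameProvider and Validator services, the Module type and the createServices function are hypothetical names chosen for this example.

```typescript
// Hypothetical services for illustration; not Langium's real service types.
interface NamedNode {
    name: string;
}
interface NameProvider {
    getName(node: NamedNode): string;
}
interface Validator {
    validate(node: NamedNode): string[];
}
interface Services {
    nameProvider: NameProvider;
    validator: Validator;
}

// A module maps each service to a factory that receives the full container,
// so a service can depend on other services without knowing their implementation.
type Module = {
    nameProvider: (services: Services) => NameProvider;
    validator: (services: Services) => Validator;
};

// Default behavior shipped "out of the box".
const defaultModule: Module = {
    nameProvider: () => ({ getName: node => node.name }),
    validator: services => ({
        validate: node =>
            services.nameProvider.getName(node).length > 0 ? [] : ['Name must not be empty.']
    })
};

// Merges overrides over the defaults and creates each service lazily on first access.
function createServices(overrides: Partial<Module> = {}): Services {
    const module: Module = { ...defaultModule, ...overrides };
    let nameProvider: NameProvider | undefined;
    let validator: Validator | undefined;
    const services: Services = {
        get nameProvider() {
            if (!nameProvider) {
                nameProvider = module.nameProvider(services);
            }
            return nameProvider;
        },
        get validator() {
            if (!validator) {
                validator = module.validator(services);
            }
            return validator;
        }
    };
    return services;
}

// Override a single service; the default validator picks up the new behavior.
const customServices = createServices({
    nameProvider: () => ({ getName: node => node.name.trim() })
});
console.log(createServices().validator.validate({ name: '  ' }).length);
console.log(customServices.validator.validate({ name: '  ' }).length);
```

In this sketch, replacing only the name provider changes what the unmodified default validator reports, because the validator asks the container for its dependency instead of creating it itself.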

State-of-the-art Parser Technology

Langium uses the Chevrotain library to create a parser for your language. The parser is created in memory from the grammar declaration using the API provided by Chevrotain. This already gives us a powerful and fast parsing facility. We have created an extension of Chevrotain that increases the expressivity of the parser by implementing the ALL(*) algorithm. This is the same lookahead algorithm used by the well-known ANTLR 4 parser generator.

Build your Project with Langium

The documentation of Langium is already quite complete. It includes instructions on how to get started, reference documentation for the grammar language and other aspects, and a collection of guides and tutorials about various topics. A good starting point for your Langium-based project is the Yeoman generator as described in the documentation.

The website also features a grammar language playground with a grammar editor, another editor for the language specified in the grammar, and a view of the resulting AST data structure. This can be very useful to experiment and to understand how Langium creates AST data from a given document of your language.

If you have already implemented a language with Xtext and want to evaluate the migration to Langium, you can use the Xtext2Langium generator fragment in your MWE2 workflow. It generates Langium grammar rules and type declarations from the Xtext grammar rules and EMF metamodels. Unfortunately, the custom language code written in Java or Xtend cannot be migrated automatically, but needs to be rewritten in TypeScript. However, many of the underlying concepts in Xtext and Langium are similar, so large parts of the migration don't require a conceptual redesign. In our experience, some functionality is even easier to realize with Langium and TypeScript than with Xtext and Java.

What's to Come

The 1.0 version of Langium that is now available is a cornerstone for creating toolchains based on custom languages. But it's of course not the end of our investment into this project – our team is full of ideas for additional features and for further simplifying the development of new languages.

  • Enhanced grammar rules: Parsers based on the LL approach do not allow left-recursive rule calls. We want to enhance this by transforming rules similarly to how ANTLR 4 does it. Other ideas about the grammar language are explicit constructs for binary operators and lexer modes.
  • Generator tracing: Langium already includes utilities to generate code in arbitrary text formats. The next step in this regard is to add information to trace source elements (AST nodes) to output text ranges. This will enable the automatic creation of source maps, which are the basis for debugging your DSL code.
  • Serializing: Some use cases require the ability to generate DSL code from an in-memory data structure like a programmatically constructed AST. While this can already be done with a hand-written code generator, it would be very useful to have a generic serializer. This component will create text that is compliant with the grammar of your language.
  • Type system support: It is already possible to create a Langium-based language with a complex type system, for example involving binary operators, lambda expressions and type parameters. But this is not an easy task, as it requires some experience or at least theoretical knowledge about this topic. We will add a component to simplify the implementation of type systems in Langium, with features like type inference and assignability checking.
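To illustrate the left-recursion point from the list above: a rule like Expr: Expr ('+' | '-') Number | Number calls itself before consuming any input, so a top-down parser would recurse forever. The classic manual workaround is to rewrite the rule as Number (('+' | '-') Number)* and fold the results into a left-associative tree. The following hand-written sketch shows that rewrite. It is neither how Langium nor how ANTLR 4 implements it, and all names in it are made up for this example.

```typescript
type Expr =
    | { kind: 'number'; value: number }
    | { kind: 'binary'; operator: '+' | '-'; left: Expr; right: Expr };

// Splits the input into number and operator tokens.
// For brevity, any other characters are silently skipped.
function tokenize(text: string): string[] {
    return text.match(/\d+|[+-]/g) ?? [];
}

// Parses: Number (('+' | '-') Number)*
// The loop replaces the left-recursive call and keeps left associativity.
function parseExpression(tokens: string[]): Expr {
    let position = 0;

    function parseNumber(): Expr {
        const token = tokens[position++];
        if (token === undefined || !/^\d+$/.test(token)) {
            throw new Error(`Expected a number but found '${token ?? 'end of input'}'.`);
        }
        return { kind: 'number', value: Number(token) };
    }

    let left: Expr = parseNumber();
    while (position < tokens.length) {
        const operator = tokens[position];
        if (operator !== '+' && operator !== '-') {
            throw new Error(`Expected an operator but found '${operator}'.`);
        }
        position++;
        // The tree built so far becomes the left operand of the new node.
        left = { kind: 'binary', operator, left, right: parseNumber() };
    }
    return left;
}

// Evaluates the tree to show the effect of associativity.
function evaluate(expr: Expr): number {
    if (expr.kind === 'number') {
        return expr.value;
    }
    const left = evaluate(expr.left);
    const right = evaluate(expr.right);
    return expr.operator === '+' ? left + right : left - right;
}

// (10 - 4) - 3 = 3, whereas a right-associative tree would give 10 - (4 - 3) = 9.
console.log(evaluate(parseExpression(tokenize('10 - 4 - 3'))));
```

Writing such loops by hand for every operator level quickly becomes tedious, which is why letting the toolkit transform left-recursive rules automatically is attractive.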

The next level of language engineering is here. Start your DSL project with Langium today and get in touch with us if you need professional support.
