An architecture for efficient language engineering
We recently introduced Fastbelt as a language engineering framework designed for use cases that require high throughput, low latency and a small memory footprint.
On one hand, this requires rethinking how the framework works and, especially, what it actually does. On the other hand, Fastbelt should offer a similar set of out-of-the-box features as other well-known language engineering frameworks, such as Xtext and Langium, while delivering the same flexibility and customizability needed to implement the full semantics of a given language.
This article outlines the core ideas that shape the architecture of Fastbelt.
Abstracting from text
Language engineering frameworks and parser libraries mostly follow the same general logic:
- Take a piece of text, usually one file on disk or in your editor.
- Tokenize the string into individual, smaller strings, like keywords, identifiers, operators, delimiters, etc.
- Use a parser to turn those tokens into two syntax trees: The abstract syntax tree (AST) and the concrete syntax tree (CST).
The AST holds the semantic information of the parsed input. It allows us to easily and quickly answer questions about our data model. What is the name of a variable? Where does this reference point to? As the name implies, it’s very much abstract. Details that carry no semantic meaning—such as whitespace, comments, most keywords and the exact positions of tokens in the source text—are stripped out. However, any framework that enables adopters to build language servers requires some way to map positional text information to the AST and back.
This is traditionally the job of the CST. Instead of following the semantics of the language, the CST is a simple tree of composite and leaf nodes that hold information about the input text (mainly start and end offsets). More importantly, the CST links this information to the respective AST node and the part of the grammar that was used for parsing of the input.
Fastbelt diverges from the pattern above in two places. The first is tokenization. In most high-level programming languages, breaking a string into smaller substrings is expensive, because each substring is a new string that has to be allocated and copied. Go lets us avoid this entirely: a slice of an existing string shares the same underlying memory, so creating a token costs little more than a string header (a pointer plus a length) and copies no bytes.
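To make the zero-copy claim concrete, here is a minimal sketch of a lexer that produces tokens by slicing the source string. The `Token` type, `Words`, and `SharesMemory` are hypothetical names chosen for illustration, not Fastbelt's API, and the whitespace-only splitting is far simpler than a real lexer.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Token is a hypothetical token type; Fastbelt's real type may differ.
type Token struct {
	Image string // substring of the source; shares its backing memory
	Start int    // byte offset of the first character
	End   int    // byte offset one past the last character
}

// Words splits src on spaces and returns one Token per word.
// Slicing src[start:i] creates a new string header but copies no bytes.
func Words(src string) []Token {
	var tokens []Token
	start := -1
	for i := 0; i <= len(src); i++ {
		atEnd := i == len(src) || src[i] == ' '
		switch {
		case !atEnd && start < 0:
			start = i
		case atEnd && start >= 0:
			tokens = append(tokens, Token{Image: src[start:i], Start: start, End: i})
			start = -1
		}
	}
	return tokens
}

// SharesMemory reports whether the token's bytes live inside src.
// unsafe.StringData requires Go 1.20 or newer.
func SharesMemory(src string, t Token) bool {
	return unsafe.StringData(t.Image) == unsafe.StringData(src[t.Start:])
}

func main() {
	src := "let answer = 42"
	for _, t := range Words(src) {
		fmt.Printf("%-8q [%d,%d) shared=%v\n", t.Image, t.Start, t.End, SharesMemory(src, t))
	}
}
```

The pointer comparison in `SharesMemory` is only there to demonstrate the sharing; production code would not need it.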
The second is what happens during parsing. Unlike other frameworks, Fastbelt does not produce a concrete syntax tree at all. As the parser consumes the token stream, each token is linked to the AST node that consumed it, and that node keeps a pointer back to the token. This gives us the bidirectional positional lookup a language server needs (mapping text offsets to AST nodes and back) without a second tree to maintain. The AST holds these token pointers directly, not the token values; the actual string values are materialized only on demand.
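The linking described here can be sketched roughly as follows. All type and function names (`Node`, `Token`, `consume`, `NodeAt`, `Range`) are assumptions made for this example, and the "parser" is reduced to a few manual `consume` calls; the point is only to show how offsets map to nodes and back without a CST.

```go
package main

import (
	"fmt"
	"sort"
)

// All names below are hypothetical; they only illustrate the linking idea.

// Node is a minimal AST node that remembers the tokens it consumed.
type Node struct {
	Kind   string
	Tokens []*Token
}

// Token records its position in the source and the node that consumed it.
type Token struct {
	Start, End int   // byte offsets, End exclusive
	Owner      *Node // set when the token is consumed
}

// Text materializes the token's string value on demand from the source.
func (t *Token) Text(src string) string { return src[t.Start:t.End] }

// consume links a token and a node in both directions.
func consume(n *Node, t *Token) {
	t.Owner = n
	n.Tokens = append(n.Tokens, t)
}

// NodeAt maps a text offset to the AST node that owns the token there.
// tokens must be sorted by Start, which is the order a lexer emits them in.
func NodeAt(tokens []*Token, offset int) *Node {
	i := sort.Search(len(tokens), func(i int) bool { return tokens[i].End > offset })
	if i < len(tokens) && tokens[i].Start <= offset {
		return tokens[i].Owner
	}
	return nil // offset falls between tokens or past the end
}

// Range maps a node back to the text span covered by its own tokens.
func (n *Node) Range() (start, end int) {
	if len(n.Tokens) == 0 {
		return 0, 0
	}
	return n.Tokens[0].Start, n.Tokens[len(n.Tokens)-1].End
}

func main() {
	src := "let answer = 42"
	tokens := []*Token{{Start: 0, End: 3}, {Start: 4, End: 10}, {Start: 11, End: 12}, {Start: 13, End: 15}}
	decl := &Node{Kind: "VariableDeclaration"}
	lit := &Node{Kind: "NumberLiteral"}
	for _, t := range tokens[:3] {
		consume(decl, t)
	}
	consume(lit, tokens[3])

	n := NodeAt(tokens, 14)
	start, end := n.Range()
	fmt.Println(n.Kind, start, end, tokens[1].Text(src))
}
```

Because the token list is sorted, the offset lookup is a binary search over tokens rather than a walk through a second tree.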
This matters because the CST is where the memory goes. As a file grows—more lines of code, more deeply nested constructs—memory consumption grows in two ways: more AST nodes are produced, but so are more CST nodes. Langium and Xtext compound this by storing additional model metadata on the CST, which inflates it further. Aside from the token metadata described above, Fastbelt stores nothing extra.
The trade-off is deliberate: we’ve designed Fastbelt not to be a modeling framework. What we get in return is significant. In our own application projects, we’ve seen CSTs account for more than 70% of the total memory a workspace occupies. In Fastbelt, that cost is nearly gone. The chart below compares the memory footprint of the same workspace loaded by Fastbelt, Xtext, and Langium. Fastbelt’s footprint is significantly slimmer than that of any other language engineering framework we’ve worked on:
Parallelized primitives
Fastbelt is built from the ground up to be highly parallelized, with most of its features designed to be thread-safe from the start. Keeping everything consistent is the job of a workspace-wide lock. While the workspace build is running, the lock is held for writing, so the build can mutate the model exclusively. Once the build finishes, the lock is downgraded to read access, allowing many readers to inspect the model concurrently without further synchronization.
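As a rough illustration of the locking scheme, the sketch below uses Go's `sync.RWMutex`. Whether Fastbelt uses this exact type is an assumption, and `Workspace`, `Build`, and `Lookup` are hypothetical names. Note that `sync.RWMutex` has no atomic downgrade operation, so this sketch simply releases the write lock at the end of the build and lets readers acquire read locks afterwards.

```go
package main

import (
	"fmt"
	"sync"
)

// Workspace is a hypothetical stand-in for the framework's workspace.
type Workspace struct {
	mu      sync.RWMutex
	symbols map[string]int // toy model: symbol name -> definition offset
}

// Build mutates the model while holding the lock exclusively.
func (w *Workspace) Build(defs map[string]int) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.symbols = make(map[string]int, len(defs))
	for name, offset := range defs {
		w.symbols[name] = offset
	}
}

// Lookup only reads, so any number of callers may run it concurrently.
func (w *Workspace) Lookup(name string) (int, bool) {
	w.mu.RLock()
	defer w.mu.RUnlock()
	offset, ok := w.symbols[name]
	return offset, ok
}

func main() {
	ws := &Workspace{}
	ws.Build(map[string]int{"answer": 4})

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			ws.Lookup("answer") // concurrent readers do not block each other
		}()
	}
	wg.Wait()
	fmt.Println(ws.Lookup("answer"))
}
```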
During that build, work is performed in parallel wherever possible. Each file is tokenized and parsed independently, and the per-file symbol tables are constructed in parallel as well. Reference resolution is always one of the most expensive parts of any build, especially with complicated custom scoping rules. Fastbelt does that in parallel too, even within a single file.
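The per-file fan-out can be sketched with goroutines and a `sync.WaitGroup`. This is an illustrative assumption about the shape of such a build step, not Fastbelt's implementation: `ParseResult`, `parseFile`, and `ParseAll` are made-up names, and a word count stands in for real tokenizing and parsing.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// ParseResult is a hypothetical per-file result; a word count stands in for an AST.
type ParseResult struct {
	Path  string
	Words int
}

// parseFile is a placeholder for tokenizing and parsing one file.
func parseFile(path, content string) ParseResult {
	return ParseResult{Path: path, Words: len(strings.Fields(content))}
}

// ParseAll processes every file in its own goroutine. Each goroutine writes
// only to its own slot of results, so no extra locking is required.
func ParseAll(paths []string, contents map[string]string) []ParseResult {
	results := make([]ParseResult, len(paths))
	var wg sync.WaitGroup
	for i, p := range paths {
		wg.Add(1)
		go func(i int, p string) {
			defer wg.Done()
			results[i] = parseFile(p, contents[p]) // concurrent map reads are safe
		}(i, p)
	}
	wg.Wait()
	return results
}

func main() {
	paths := []string{"a.fb", "b.fb"}
	contents := map[string]string{"a.fb": "let x = 1", "b.fb": "let y"}
	for _, r := range ParseAll(paths, contents) {
		fmt.Println(r.Path, r.Words)
	}
}
```

For large workspaces, a bounded worker pool would likely be preferable to one goroutine per file, but the independence of the per-file work is the same.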
The following chart shows the performance gains we obtain from parallelizing the build process across 16 CPU cores. We go from ~7MB/s on a single core to ~45MB/s on all 16, a speedup of roughly 6.4x. The chart also shows that there is likely still room to improve our efficiency, as the gains flatten out once we reach ~10 cores. Nevertheless, we’re quite happy with the framework’s current performance.
Going beyond the build process: Once the lock switches to read-only, the same parallelism extends to everything that only reads the workspace. LSP requests, validations, and any other process that needs to inspect but not modify the model can all run simultaneously.
An API without escape hatches
Both Langium and Xtext are built around a class-based, template-method style of customization. They expose interfaces, but not every piece of behavior is fully captured by them. When a particular behavior is meant to be customizable, the usual answer is a protected method on some base class that adopters override.
This works, but it blurs the line between what is API and what is implementation detail. In practice, everything becomes API: any method an adopter can reach is something they might override, and therefore something the framework has to keep stable.
That makes the surface area enormous and the contract fuzzy.
Go’s struct-based type system doesn’t lend itself to that pattern, and we consider this a feature rather than a limitation.
Without inheritance and protected overrides, the only way to make behavior customizable is to design an interface for it.
This forces us to be explicit about what is and isn’t API. We’ve adopted a few principles to keep these interfaces sound:
- An interface implementation shouldn’t call its own interface methods. This keeps the behavior of each method self-contained and makes implementations safe to embed and partially override later, without the surprising re-entrancy that plagues template-method designs.
- No protected methods means clearer interfaces. Every behavior that is meant to be overridden gets a dedicated interface method. There’s no ambiguous middle ground between “public API” and “internal helper”—if it’s customizable, it’s in an interface; if it isn’t, it’s unexported.
- Customizable behavior leans on explicit design patterns. Rather than offering a grab-bag of override points, Fastbelt’s LSP features typically let adopters supply a Strategy for a given behavior. The extension points are named and explicitly documented.
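A minimal sketch of what these principles might look like in Go follows. The `HoverStrategy` interface and every type around it are hypothetical and were invented for this example; they do not reflect Fastbelt's actual LSP API. The adopter customizes behavior by supplying a different strategy value, and wraps the default by delegation instead of overriding a protected method.

```go
package main

import "fmt"

// HoverStrategy is a hypothetical extension point: it decides what text
// a hover request shows for a named element.
type HoverStrategy interface {
	HoverText(name, kind string) string
}

// DefaultHover is the stock strategy a framework might ship.
type DefaultHover struct{}

func (DefaultHover) HoverText(name, kind string) string {
	return kind + " " + name
}

// BoldHover is an adopter-supplied strategy. It delegates to another
// strategy and decorates the result; it never calls its own method.
type BoldHover struct{ Inner HoverStrategy }

func (b BoldHover) HoverText(name, kind string) string {
	return "**" + b.Inner.HoverText(name, kind) + "**"
}

// HoverProvider depends only on the interface, never on a concrete type.
type HoverProvider struct{ Strategy HoverStrategy }

func (p HoverProvider) Hover(name, kind string) string {
	return p.Strategy.HoverText(name, kind)
}

func main() {
	stock := HoverProvider{Strategy: DefaultHover{}}
	custom := HoverProvider{Strategy: BoldHover{Inner: DefaultHover{}}}
	fmt.Println(stock.Hover("answer", "variable"))
	fmt.Println(custom.Hover("answer", "variable"))
}
```

Everything an adopter can change is visible in the interface definition, which is the explicitness the principles above aim for.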
This philosophy also extends to the parser. Because Fastbelt doesn’t integrate with a third-party parser library, we control the entire parsing API ourselves. There are no complicated workarounds to reach into parser internals. The interfaces for error recovery, error messages, and lookahead all interact directly with Fastbelt’s own types. Adopters of the framework never have to work around or wrestle with the types of an underlying library. This also means there’s no compromise in features. Every parser feature is genuinely supported and exposed, with the framework itself being built on those very same features.
Benchmark everything
We consider performance a core feature of Fastbelt, and we treat it as such. We test Fastbelt’s features not only for correctness, as you would in any piece of software, but also for speed. Benchmarks run in every CI build, and contributors are notified directly in their pull requests whenever a change introduces a performance regression.
This is less a technical decision than an organizational one, and it pays off on two fronts. It keeps regressions from slipping in unnoticed, and it lets us explore the performance impact of a change with confidence—we can immediately see how a single change ripples across many different parts of the framework.
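For readers unfamiliar with Go's benchmarking support, the following sketch shows the general shape such a check could take using the standard `testing` package. It is not Fastbelt's CI setup: `countTokens` is a placeholder workload, and `MeasureMBps` and `Regressed` (including its tolerance parameter) are hypothetical helpers for this example.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// countTokens is a stand-in for the code under test, for example a lexer.
func countTokens(src string) int { return len(strings.Fields(src)) }

// BenchmarkCountTokens follows the standard Go benchmark shape.
func BenchmarkCountTokens(b *testing.B) {
	src := strings.Repeat("let answer = 42\n", 1024)
	b.SetBytes(int64(len(src))) // lets the result be expressed as throughput
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		countTokens(src)
	}
}

// MeasureMBps runs the benchmark in-process and converts the result to MB/s.
func MeasureMBps() float64 {
	r := testing.Benchmark(BenchmarkCountTokens)
	return float64(r.Bytes) * float64(r.N) / 1e6 / r.T.Seconds()
}

// Regressed reports whether current throughput fell more than tolerance
// (a fraction, e.g. 0.10 for 10%) below the recorded baseline.
func Regressed(current, baseline, tolerance float64) bool {
	return current < baseline*(1-tolerance)
}

func main() {
	current := MeasureMBps()
	baseline := current // a real setup would load this from a stored run
	fmt.Printf("%.1f MB/s, regressed=%v\n", current, Regressed(current, baseline, 0.10))
}
```

In a typical project the benchmark function would live in a `_test.go` file and run via `go test -bench`, with the comparison against a baseline handled by the CI pipeline.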
Here is a concrete example of how this helped us during the development of Fastbelt’s first version. Back when we started work on the framework, we used Go’s native regexp package to tokenize non-keyword terminals. This was predictably slow and could only tokenize ~30MB of text per second. In January, we added a code generator that produces a switch-driven state machine from a given regular expression. This massively improved lexing performance, by ~170% to ~80MB/s. Even more importantly, we had much more room for improvement now that we controlled the entire lexing pipeline. A few changes later (#42, #64, #88), we were able to get this up to 180MB/s. We can probably improve this even further, and we already have a few ideas in mind, but this is good enough for now.
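To give an idea of what a switch-driven state machine for a regular expression looks like, here is a hand-written sketch for the pattern `[0-9]+(\.[0-9]+)?`. It is not output from Fastbelt's generator, whose actual code shape is not shown in this article; the function names are made up for illustration. A regexp-based version is included for comparison.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchNumber returns the length of the longest prefix of s matching
// [0-9]+(\.[0-9]+)?, or 0 if there is no match.
func matchNumber(s string) int {
	state, accepted := 0, 0
	for i := 0; i < len(s); i++ {
		c := s[i]
		digit := c >= '0' && c <= '9'
		switch state {
		case 0: // start: a digit is required
			if !digit {
				return accepted
			}
			state = 1
		case 1: // integer part (accepting)
			switch {
			case digit:
				// stay in state 1
			case c == '.':
				state = 2
			default:
				return accepted
			}
		case 2: // just after '.': a digit is required
			if !digit {
				return accepted
			}
			state = 3
		case 3: // fraction part (accepting)
			if !digit {
				return accepted
			}
		}
		if state == 1 || state == 3 {
			accepted = i + 1 // remember the last accepting position
		}
	}
	return accepted
}

var numberRe = regexp.MustCompile(`^[0-9]+(\.[0-9]+)?`)

// matchNumberRegexp is the regexp-based equivalent, for comparison.
func matchNumberRegexp(s string) int {
	return len(numberRe.FindString(s))
}

func main() {
	for _, in := range []string{"42", "3.14x", "7.", "x1"} {
		fmt.Printf("%q: state machine=%d regexp=%d\n", in, matchNumber(in), matchNumberRegexp(in))
	}
}
```

The state machine touches each byte once and never allocates, which is the kind of property that makes this approach attractive for a lexer.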
Try it out now!
Today, we released our first official version of Fastbelt. We encourage developers to give it a shot and appreciate feedback via a discussion or issue on our GitHub repo.
About the Author
Mark Sujew
Mark is the driving force behind a lot of TypeFox’s open-source engagement. He leads the development of the Eclipse Langium and Theia IDE projects. Away from his day job, he enjoys bartending and music, is an avid Dungeons & Dragons player, and works as a computer science lecturer at a Hamburg University.


