Langium 1.0: A Mature Language Toolkit
posted on Fri Dec 16 2022
Langium is a toolkit for domain-specific languages (DSLs) that is fully built with TypeScript. It provides a text parser with integrated…
Langium has its own playground, where you can now test the conversion of text into a well-defined model. It runs without support from the backend.
Langium is a tool to create languages (and feature-rich editors). And languages are raw texts that are both human-readable and machine-readable. Think of XML, C source code or CSS.
But to transform a text into an explicit structure, there is some work that needs to be done. A text transformation in Langium is described by a grammar.
Let's take the following text as an example (playground link):
grammar HelloWorld
entry Model:
(persons+=Person | greetings+=Greeting)*;
Person:
'person' name=ID;
Greeting:
'Hello' person=[Person:ID] '!';
hidden terminal WS: /\s+/;
terminal ID: /[_a-zA-Z][\w_]*/;
A grammar, in the context of compiler construction, is a set of rules that describes all possible input texts that you want to recognize.
The shape of a rule can be compared with a function definition in your favorite imperative programming language: the left part preceding the colon is the signature of the function, the right part is its body.
The rule body consists of calls to other rules (by rule name) or terminals which represent actual text that gets recognized. Let's have a look at the rule Person.
Person: 'person' name=ID;
Here, Person is the name of a rule, 'person' is a literal text (a keyword), and name=ID is the call of the terminal rule ID whose parsed text is assigned to a property name.
Let's say we have the input person Markus. The result of applying Langium is a model that contains a Person object with the name Markus.
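To make the resulting model more tangible, here is a minimal TypeScript sketch of what it could look like for the input person Markus. The interfaces are written by hand for illustration and are an assumption about the shape, not generated output. Langium AST nodes carry a $type property, and cross-references are wrapped in a reference object, which is simplified away here.

```typescript
// Hand-written, hypothetical types that mirror the HelloWorld grammar.
// They only approximate what Langium derives from the grammar.
interface Person {
    $type: 'Person';
    name: string;
}

interface Greeting {
    $type: 'Greeting';
    // Simplification: Langium wraps cross-references in a reference object.
    person: Person;
}

interface Model {
    $type: 'Model';
    persons: Person[];     // filled by persons+=Person
    greetings: Greeting[]; // filled by greetings+=Greeting
}

// Roughly the model for the input `person Markus`.
const model: Model = {
    $type: 'Model',
    persons: [{ $type: 'Person', name: 'Markus' }],
    greetings: []
};

console.log(model.persons.map(p => p.name)); // prints [ 'Markus' ]
```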
The Langium playground can help you to sketch a text-to-model transformation (parsing). And you can prototype and test Langium immediately without installing anything.
All you need to know is the purpose of each panel inside of the playground: one panel holds the grammar, one holds the content (for example person Markus), and a third one shows the resulting syntax tree.
Every change in the grammar triggers a rebuild of the parsing toolchain. And every content change triggers a new parsing attempt of the content, which populates the syntax tree in the third panel.
So, just drop a sample text of your new language into the content panel. Then, you can start your grammar and improve it until all errors are gone. Fantastic, isn't it?!
You are now able to test and share your ideas. The playground will help you to understand single abstract syntax trees.
Knowing the syntax tree is the key to mastering validators, generators and even interpreters.
Here is the plan for creating a JSON parser:
1. Start with boolean values, true and false.
2. Add a list to hold booleans.
3. Add an object value.
4. Add the remaining values (number, null, undefined).
Let's do it!
boolean values (solution)
Simple character data like false or true can be described as terminals. So, just write:
terminal BOOLEAN: /true|false/;
Every grammar has an entry parser rule, to signal where the language starts from. Please add:
entry Value: BooleanValue;
BooleanValue: value=BOOLEAN;
The assignment value=BOOLEAN takes the content of the boolean value and assigns it to a property value in a surrounding model typed as BooleanValue.
If you want to ignore whitespace, add this line as well. The hidden keyword filters out all whitespace from the stream of terminals during the parsing process, so you do not have to care about it.
hidden terminal WS: /\s+/;
If you enter true in the content panel, you will get a nice syntax tree with your value true in it.
list values (solution)
Our goal now is to add list values. Lists contain arbitrary values (lists and booleans for now). So, we add a definition for the list first:
List:
'[' (elements+=Value (',' elements+=Value)*)? ']';
Mind the elements+=Value assignments. The += creates a list of values for you.
Afterwards, you only need to reference it in the Value rule by adding an alternative with the or operator |:
entry Value: BooleanValue | List;
Now, test it with a list of booleans like [true, false].
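The earlier comparison of rules with functions can be made concrete with a small hand-written recursive-descent parser for the grammar so far. This is only a sketch of the idea and not how Langium works internally. The node shapes, the tokenize helper and the error messages are assumptions made for illustration, and the boolean value is kept as a string because the BOOLEAN terminal declares no other type.

```typescript
// Hand-written sketch, not Langium code. Node shapes mimic the grammar rules.
type Value = BooleanValue | List;
interface BooleanValue { $type: 'BooleanValue'; value: string; }
interface List { $type: 'List'; elements: Value[]; }

// Split the input into tokens. Whitespace is dropped, like the hidden WS terminal.
// Simplification: characters that match no token are silently skipped.
function tokenize(text: string): string[] {
    return text.match(/true|false|\[|\]|,/g) ?? [];
}

function parse(text: string): Value {
    const tokens = tokenize(text);
    let pos = 0;

    // Consume one expected keyword token or fail.
    function expect(token: string): void {
        if (tokens[pos] !== token) {
            throw new Error(`Expected '${token}' but found '${tokens[pos]}'`);
        }
        pos++;
    }

    // Value: BooleanValue | List;
    function value(): Value {
        return tokens[pos] === '[' ? list() : booleanValue();
    }

    // BooleanValue: value=BOOLEAN;
    function booleanValue(): BooleanValue {
        const token = tokens[pos];
        if (token !== 'true' && token !== 'false') {
            throw new Error(`Expected a boolean but found '${token}'`);
        }
        pos++;
        return { $type: 'BooleanValue', value: token };
    }

    // List: '[' (elements+=Value (',' elements+=Value)*)? ']';
    function list(): List {
        const elements: Value[] = [];
        expect('[');
        if (tokens[pos] !== ']') {
            elements.push(value());     // first elements+=Value
            while (tokens[pos] === ',') {
                pos++;                  // consume ','
                elements.push(value()); // each further elements+=Value
            }
        }
        expect(']');
        return { $type: 'List', elements };
    }

    const result = value();
    if (pos !== tokens.length) {
        throw new Error('Unexpected trailing input');
    }
    return result;
}

console.log(JSON.stringify(parse('[true, false]')));
```

Each grammar rule becomes one function, each rule call becomes a function call, and every += assignment becomes a push onto an array.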
Amazing!
object values (solution)
For an object we need a string value to model the name of each entry. At the same time we can introduce a new Value rule alternative.
StringValue: value=STRING;
terminal STRING: /"[^"]*"/;
Now you can add a series of name-value pairs:
NameValuePair: name=STRING ':' value=Value;
And finally, add the Object rule:
entry Value: BooleanValue | List | StringValue | Object;
Object:
'{' (nameValuePairs+=NameValuePair (',' nameValuePairs+=NameValuePair)*)? '}';
Test it!
{
  "boolean": false,
  "string": "abc",
  "LIST": ["", false],
  "object": {"hello": "world"}
}
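Once the syntax tree is known, a small interpreter is within reach. The following TypeScript sketch walks a tree shaped like the grammar above and turns it into plain JavaScript values. The interfaces are hand-written assumptions (ObjectNode is a hypothetical name chosen to avoid clashing with the built-in Object type), and the sketch assumes that boolean values arrive as the strings true or false and that string values arrive without their surrounding quotes.

```typescript
// Hand-written, hypothetical node types that mirror the JSON grammar so far.
type Value = BooleanValue | List | StringValue | ObjectNode;
interface BooleanValue { $type: 'BooleanValue'; value: string; }
interface StringValue { $type: 'StringValue'; value: string; }
interface List { $type: 'List'; elements: Value[]; }
interface NameValuePair { $type: 'NameValuePair'; name: string; value: Value; }
interface ObjectNode { $type: 'Object'; nameValuePairs: NameValuePair[]; }

// Recursively convert a syntax tree node into a plain JavaScript value.
function toJs(node: Value): unknown {
    switch (node.$type) {
        case 'BooleanValue':
            // Assumption: the terminal text is kept as a string.
            return node.value === 'true';
        case 'StringValue':
            // Assumption: quotes were already stripped from the text.
            return node.value;
        case 'List':
            return node.elements.map(element => toJs(element));
        case 'Object': {
            const result: Record<string, unknown> = {};
            for (const pair of node.nameValuePairs) {
                result[pair.name] = toJs(pair.value);
            }
            return result;
        }
    }
}

// A hand-built tree for the input {"boolean": false, "LIST": ["", false]}.
const tree: Value = {
    $type: 'Object',
    nameValuePairs: [
        {
            $type: 'NameValuePair',
            name: 'boolean',
            value: { $type: 'BooleanValue', value: 'false' }
        },
        {
            $type: 'NameValuePair',
            name: 'LIST',
            value: {
                $type: 'List',
                elements: [
                    { $type: 'StringValue', value: '' },
                    { $type: 'BooleanValue', value: 'false' }
                ]
            }
        }
    ]
};

console.log(JSON.stringify(toJs(tree))); // prints {"boolean":false,"LIST":["",false]}
```

The switch on $type is the core of the design: every rule of the grammar gets exactly one case, so adding number or null later means adding one interface and one case.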
Great! Beautiful :).
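number and null values
These values are left as an exercise. As a hint, a terminal for numbers can declare its type:
terminal NUMBER returns number: ...
Here is the solution.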
Happy coding!