May 27th 2024

Native Notebook support for Eclipse Theia

Mark Sujew

With Jupyter Notebooks, data scientists and developers using Python have had a de facto standard way of sharing documentation and executable code samples for quite a while now. In 2021, VSCode rolled out their new notebook experience for everyone. Consequently, developers started using this powerful new API to build more and more notebook extensions, with the Jupyter extension being the most popular.

VSCode with a Jupyter notebook editor

The Eclipse Theia IDE project supports VSCode extensions by emulating the VSCode extension API. The IDE framework project itself aims to be an extensible platform for building web and desktop IDEs. It’s used by adopters such as Arduino, ARM, Smartface and VUEngine for custom-built IDEs.

Even though the initial support for VSCode extensions back in 2019 was a major endeavour, keeping up with newer versions of VSCode required relatively little maintenance work. However, large new features – like the notebook support – are a different story. Even in the VSCode open source project itself, it takes up more than 30,000 lines of code. Therefore, the notebook API was simply stubbed in the VSCode extension emulation code for the longest time. Extensions that wanted to use the API didn’t crash – but they also didn’t really do anything.
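To make the idea of a stubbed API more concrete, here is a minimal sketch of what such a stub could look like. It is not Theia's actual code: `stubbedNotebookApi`, `canOpen` and the `activate` function are hypothetical names, and the two registration functions only loosely mirror the names of their VSCode counterparts. The point is that every call succeeds and returns a valid handle, but nothing is stored or acted upon.

```typescript
// Minimal stand-in for a disposable handle; NOT the real `vscode.Disposable`.
interface Disposable {
    dispose(): void;
}

// Hypothetical stubbed slice of an emulated notebook API.
// Every call is accepted, but nothing is registered behind the scenes.
const stubbedNotebookApi = {
    registerNotebookSerializer(_notebookType: string, _serializer: object): Disposable {
        return { dispose: () => { /* nothing to clean up */ } };
    },
    createNotebookController(id: string, notebookType: string, label: string) {
        return { id, notebookType, label, dispose: () => { /* nothing to clean up */ } };
    },
    // Hypothetical helper: a stub never knows how to open any notebook type.
    canOpen(_notebookType: string): boolean {
        return false;
    }
};

// Hypothetical extension activation code: it runs without errors against the stub.
function activate(api: typeof stubbedNotebookApi): Disposable[] {
    return [
        api.registerNotebookSerializer('my-notebook', {}),
        api.createNotebookController('my-kernel', 'my-notebook', 'My Kernel')
    ];
}
```

Calling `activate(stubbedNotebookApi)` returns two handles and never throws, yet `canOpen('my-notebook')` still reports `false`: the extension loads fine, but no notebook editor ever shows up.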

Wait a second…

Wasn’t there already an integrated solution for Jupyter notebook files in VSCode before 2021? Well yes, before VSCode had native notebook support, there already was a Jupyter extension for VSCode that provided a Jupyter-Notebook-like experience for .ipynb files. Back then, this was implemented using the webview API that provides an iframe to display any website inside of a VSCode editor tab. This already worked really well and had fantastic support in the Theia project, since the webview API is supported in there.

However, this way of implementing this feature came with a few technical challenges:

  1. How do other extensions make use of the same kind of notebook view? For example, there are now quite a few other extensions using the notebook API. Providing a consistent UI/UX experience across multiple extensions proved to be quite challenging.
  2. How is a webview even able to display VSCode-like tiny editors inside of an iframe for every cell? While the monaco-editor that powers VSCode is freely available as its own package, every extension would effectively contribute another version of monaco to VSCode, leading to massively increased extension bundle size. Not to mention the performance impact, which can be quite noticeable when showing hundreds of tiny monaco instances all over a webview.
  3. How do extension developers contribute language support to notebook cell editors? While VSCode has an API to register providers for completion, reference resolution and others, there’s no way for a consumer of the VSCode API to retrieve the registered providers.

Native Notebooks

The solution for these challenges is fairly obvious, especially in hindsight, but still needed multiple iterations to finalize problem areas such as performance and VSCode API surface. After all, extensions should have a good abstraction layer to build new kernels, renderers, serializers and handle events of notebook editors:

  • Serializers are pretty self-explanatory. Notebook files can come in any shape or form. Therefore, extensions that register a new type of notebook editor usually bring a serializer as well. Given a binary file representation, serializers return a notebook VSCode API object that can be used to populate a notebook editor.
  • Kernels are responsible for performing computations on notebook cells. They usually execute the code within a cell and return with a result – in any shape or form. Which is why notebook output renderers are so important.
  • Renderers are basically small, self-contained JavaScript applications that receive data from kernels associated with their respective mime type (e.g. text/plain, text/html or image/png) and are supposed to render a visual representation of that data into a DOM node.
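The interplay of these three parts can be sketched in a few lines of TypeScript. This is an illustration under simplifying assumptions, not the real VSCode API: all types (`CellData`, `NotebookData`, `OutputItem`, `ElementLike`) and functions (`jsonSerializer`, `additionKernel`, `renderOutputs`) are made-up stand-ins, the notebook file is assumed to be ASCII-only JSON, and a plain object with an `innerHTML` property takes the place of a DOM node.

```typescript
// Simplified stand-ins for the notebook concepts; NOT the real `vscode` types.
interface CellData { language: string; source: string; }
interface NotebookData { cells: CellData[]; }
interface OutputItem { mime: string; data: string; }
interface ElementLike { innerHTML: string; } // stand-in for a DOM node

// ASCII-only byte helpers, to keep the sketch free of platform APIs.
function toBytes(text: string): Uint8Array {
    const bytes = new Uint8Array(text.length);
    for (let i = 0; i < text.length; i++) {
        bytes[i] = text.charCodeAt(i);
    }
    return bytes;
}

function fromBytes(bytes: Uint8Array): string {
    let text = '';
    bytes.forEach(byte => { text += String.fromCharCode(byte); });
    return text;
}

// Serializer: converts between the file's bytes and notebook data.
const jsonSerializer = {
    deserialize: (content: Uint8Array): NotebookData => JSON.parse(fromBytes(content)) as NotebookData,
    serialize: (data: NotebookData): Uint8Array => toBytes(JSON.stringify(data))
};

// Kernel: executes a cell and returns outputs tagged with a mime type.
// This hypothetical kernel only sums numbers separated by "+".
function additionKernel(cell: CellData): OutputItem[] {
    const numbers = cell.source.split('+').map(part => Number(part.trim()));
    if (numbers.some(value => isNaN(value))) {
        return [{ mime: 'text/plain', data: 'error: expected numbers separated by +' }];
    }
    const sum = numbers.reduce((total, value) => total + value, 0);
    return [
        { mime: 'text/plain', data: String(sum) },
        { mime: 'text/html', data: `<b>${sum}</b>` }
    ];
}

// Renderers: one per mime type, each writes a visual representation into an element.
const renderers: Record<string, (item: OutputItem, element: ElementLike) => void> = {
    'text/html': (item, element) => { element.innerHTML = item.data; },
    'text/plain': (item, element) => {
        const escaped = item.data.replace(/&/g, '&amp;').replace(/</g, '&lt;');
        element.innerHTML = `<pre>${escaped}</pre>`;
    }
};

// Renders the first preferred mime type that the kernel's outputs provide.
function renderOutputs(outputs: OutputItem[], element: ElementLike): void {
    for (const mime of ['text/html', 'text/plain']) {
        const item = outputs.find(output => output.mime === mime);
        const render = renderers[mime];
        if (item && render) {
            render(item, element);
            return;
        }
    }
}
```

In this sketch, a notebook with a single cell `1 + 2` survives a round trip through the serializer, the kernel returns both a text/plain and a text/html output, and `renderOutputs` picks the HTML variant. Keeping the mime type as the only contract between kernel and renderer is what allows them to live in entirely separate extensions.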

In the end, what the VSCode team came up with is a really well-designed system and a solid editing experience for users. And while extension developers now have a pretty good time writing custom notebook extensions – as shown in our blog post explaining how to build a notebook extension with Langium based editing and execution support – this comes with a lot of effort on the IDE part. In fact, there are over 900 pull requests associated with the notebook support in the VSCode OSS repository.

Now, back to the topic at hand: Supporting this feature in Theia proved to be quite the challenge. We had a few adopters of the project asking for notebook support here and there, but no one was interested in taking a few weeks out of their busy schedule to actually contribute something. Luckily, we had the chance to do sponsored work on this during the summer of 2023, which resulted in the initial notebook contribution pull request that took us more than a month of work. After combing through multiple hundreds of review comments we arrived at a point at which the feature was functional, even if it wasn’t pretty yet.

Getting to a stable experience

Theia is an active open source project with many different parties involved in its development. As part of its development policy, large changes are supposed to be reviewed by at least one other contributor party to ensure that they are beneficial to the project. Since there was at least some interest in supporting that feature, we got some pretty extensive reviews by contributors from Ericsson and Castle Ridge Software. During that review, we fixed accidental regressions, major bugs and user experience issues, and improved code quality.

After that, we still had a bunch of unsupported features, less severe bugs, UI/UX glitches and other problems. Given that it’s effectively impossible to reasonably review an 11,000+ LoC pull request, the development team opted to approve the changes as is, on the condition that those issues would eventually be resolved. Since the initial contribution was merged in late August of 2023, we have been able to continue working on the feature, culminating in 50 further pull requests. By now, we’re fairly happy with how the feature turned out. It’s mostly feature complete, there are no glaring bugs and the user experience has become pretty smooth compared to a few months ago.

Initially we had a bit of trouble figuring out what kind of data VSCode passes to its notebook extensions, since this is a part of VSCode that is barely documented. With a bit of reverse engineering, even features like kernel selection and restart or notebook outlines are now fully supported, as can be seen in this sneak peek of the current version of Theia:

Improved notebook features over the past few months

Conclusion

Tackling huge features can be pretty daunting, especially when working on large and established open source projects. However, not all features need to be supported in one fell swoop. Open source is built on trust, and if you can show that you will follow up on your unfinished changes, it’s fine to take things one by one. In fact, we already used a similar process back when we contributed the internationalization feature to Theia, so I can speak from experience here.

About the Author

Mark Sujew

Mark is the driving force behind a lot of TypeFox’s open-source engagement. He leads the development of the Eclipse Langium and Theia IDE projects. Away from his day job, he enjoys bartending and music, is an avid Dungeons & Dragons player, and works as a computer science lecturer at a Hamburg university.