Saturday, June 22, 2019

Support for Jupyter notebooks has evolved in Cantor


Hello everyone! It's been almost a month since my last post, and a lot has changed since then.

First, what I called the "minimal plan" is already done! Cantor can now load Jupyter notebooks and save the currently opened document in Jupyter format.

Below you can see how one of the Jupyter notebooks I'm using for testing (I mentioned them in my previous post) looks in Jupyter and in Cantor.


As you can see, there aren't many differences in how the content is represented, apart from some minor differences in how the Markdown is rendered.

For comparison, I also prepared previews of the same fragments of the notebooks, opened in Jupyter and in Cantor.
This is a fragment from the "Understanding evolutionary strategies and covariance matrix adaptation" notebook.




As the next example, here is a screenshot of the "A Reaction-Diffusion Equation Solver in Python with Numpy" notebook.



As the final example, here is a screenshot of the "Rigid-body transformations in a plane (2D)" notebook.



To be more concrete about what Cantor currently supports, here is the list of objects that can be imported:
  • Markdown cells
    • With mathematical expressions
    • With attachments
  • Code cells
    • With text (including error messages) and image results
  • Raw NBConvert cells
Cantor is able to handle almost all content specified by the Jupyter notebook format, except for some metadata about the notebook as a whole and about its cells, information about the "kernel" used (support for this will be added soon), and results of other types (for example LaTeX or HTML outputs), which are harder to implement because good and complete documentation for them is lacking.
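To make the list above more tangible, here is a minimal sketch of how an importer might walk the cells of a notebook. It is written in Python rather than Cantor's C++ and assumes the nbformat v4 JSON layout (a top-level "cells" array whose entries have a "cell_type" of markdown, code or raw). The functions `import_cells` and `summarize_outputs` and the entry dictionaries they return are hypothetical illustrations, not Cantor's actual classes. Like Cantor at the moment, the sketch ignores metadata and reports HTML- or LaTeX-only outputs as unsupported.

```python
import json


def join_text(value):
    # nbformat v4 stores multiline text either as one string or as a list of lines
    return value if isinstance(value, str) else "".join(value)


def summarize_outputs(outputs):
    """Reduce a code cell's outputs to (kind, payload) pairs (hypothetical result model)."""
    results = []
    for output in outputs:
        output_type = output.get("output_type")
        if output_type == "stream":
            # stdout/stderr text
            results.append(("text", join_text(output.get("text", ""))))
        elif output_type == "error":
            # Keep the exception name and message of an error output
            results.append(("error", f"{output.get('ename', '')}: {output.get('evalue', '')}"))
        elif output_type in ("execute_result", "display_data"):
            data = output.get("data", {})
            if "image/png" in data:
                results.append(("image", "image/png"))
            elif "text/plain" in data:
                results.append(("text", join_text(data["text/plain"])))
            else:
                # e.g. HTML- or LaTeX-only outputs, which this sketch does not handle
                results.append(("unsupported", ",".join(sorted(data))))
    return results


def import_cells(notebook_json):
    """Map nbformat v4 cells to hypothetical Cantor-like entries."""
    notebook = json.loads(notebook_json)
    entries = []
    for cell in notebook.get("cells", []):
        cell_type = cell.get("cell_type")
        source = join_text(cell.get("source", ""))
        if cell_type == "markdown":
            # Attachments are keyed by file name, e.g. {"plot.png": {"image/png": "<base64>"}}
            entries.append({"kind": "markdown", "text": source,
                            "attachments": sorted(cell.get("attachments", {}))})
        elif cell_type == "code":
            entries.append({"kind": "command", "text": source,
                            "results": summarize_outputs(cell.get("outputs", []))})
        elif cell_type == "raw":
            entries.append({"kind": "raw", "text": source})
        # Unknown cell types are skipped; cell and notebook metadata are ignored
    return entries
```

Dispatching on "cell_type" first and on "output_type" second mirrors the structure of the format itself, so each new kind of result can be added as one more branch.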

When saving a project in Jupyter format, Cantor handles almost all of its native entry types: markdown entries, text entries, code entries and image entries. For the remaining "page break" entry, it is still to be worked out how to map it to Jupyter's structures.
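The export direction can be sketched the same way. The following Python snippet builds an nbformat v4 document from a list of hypothetical entry dictionaries. The mapping is an assumption for illustration only, not necessarily what Cantor does: text entries become markdown cells, code entries become code cells without stored outputs, image entries become markdown cells carrying the PNG as a base64 attachment, and page breaks are skipped because they have no obvious Jupyter counterpart.

```python
import base64
import json


def to_lines(text):
    # nbformat conventionally stores multiline text as a list of lines that keep their "\n"
    return text.splitlines(keepends=True)


def export_notebook(entries):
    """Serialize hypothetical Cantor-like entries as an nbformat v4 JSON string."""
    cells = []
    for entry in entries:
        kind = entry["kind"]
        if kind in ("markdown", "text"):
            # Assumption: plain text entries are stored as markdown cells
            cells.append({"cell_type": "markdown", "metadata": {},
                          "source": to_lines(entry["text"])})
        elif kind == "command":
            # Results are omitted in this sketch, so the cell has no outputs
            cells.append({"cell_type": "code", "execution_count": None, "metadata": {},
                          "outputs": [], "source": to_lines(entry["text"])})
        elif kind == "image":
            name = entry["name"]
            encoded = base64.b64encode(entry["png_bytes"]).decode("ascii")
            # The markdown references the attachment by name
            cells.append({"cell_type": "markdown", "metadata": {},
                          "source": [f"![{name}](attachment:{name})"],
                          "attachments": {name: {"image/png": encoded}}})
        # "pagebreak" and other unmapped kinds are skipped
    notebook = {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 2}
    return json.dumps(notebook, indent=1)
```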

Despite the good progress, there is still a lot of room for improvement. Besides the technical issues that naturally arise when importing another format and mapping its structure to an application's native structures (which I guess is true for all applications), there is currently also a performance problem with the renderer Cantor uses for mathematical expressions. Opening large documents containing many formulas (either in Cantor's native format or as Jupyter notebooks) takes a considerable amount of time because of the inefficient renderer implementation in Cantor. This heavily affects the user experience, and I plan to start working on it soon.

So there is still some work to be done before Cantor supports what I call the "maximum plan". By this I mean the ability to guarantee that conversion between the two formats, when opening or saving projects, happens without any substantial loss of information that is relevant and critical for consuming the project file.

To achieve this, I now want to invest more in testing with additional notebooks and closing the remaining gaps, but also in writing automated tests covering this new functionality in Cantor. The latter are also important to prevent regressions being introduced while fixing bugs in the coming weeks. This is the plan for next week.
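One way such an automated test could check the "maximum plan" is to compare a notebook with the file obtained after opening and saving it again. Below is a hedged Python sketch of only the comparison step; producing the re-saved file is outside the sketch. What counts as "relevant" information is an assumption here: the hypothetical `essential` function keeps each cell's type, its source text, the names of its attachments and the types of its outputs, and deliberately ignores metadata and execution counts, since those are not fully preserved yet.

```python
import json
from itertools import zip_longest


def essential(cell):
    """Keep only the parts of a cell assumed to matter for round-trip fidelity."""
    source = cell.get("source", "")
    if not isinstance(source, str):
        # A list of lines and a single string are equivalent in nbformat v4
        source = "".join(source)
    attachments = tuple(sorted(cell.get("attachments", {})))
    output_types = tuple(o.get("output_type") for o in cell.get("outputs", []))
    # Metadata and execution counts are deliberately ignored
    return (cell.get("cell_type"), source, attachments, output_types)


def content_differences(original_json, roundtripped_json):
    """Return indices of cells whose essential content changed, went missing or was added."""
    original = json.loads(original_json).get("cells", [])
    roundtripped = json.loads(roundtripped_json).get("cells", [])
    differences = []
    for index, (before, after) in enumerate(zip_longest(original, roundtripped)):
        if before is None or after is None or essential(before) != essential(after):
            differences.append(index)
    return differences
```

A regression test would then simply assert that the returned list is empty for every notebook in the test set; a non-empty list points directly at the cells that were damaged.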

In the next post I plan to show a working test system and how Cantor passes its tests.

Tuesday, June 4, 2019

Hello everyone! I'm participating in Google Summer of Code 2019, working on the KDE Cantor project. The GSoC project is mentored by Alexander Semke, one of the core developers of LabPlot, Knights and Cantor. First, let me introduce Cantor and my GSoC project:
Cantor is a KDE application providing a graphical interface to different open-source computer algebra systems and programming languages, like Octave, Maxima, Julia, Python etc. The main idea of this application is to provide a single, common and user-friendly interface instead of a separate GUI for each system. The details specific to the different languages are transparent to the end user and are handled internally in the language-specific parts of Cantor's code.
Another project following this idea is Project Jupyter. Thanks to its great popularity, its large user base and the community around it, a lot of content is available for Jupyter, created and contributed by users from different scientific and educational areas, as documented in the gallery of interesting Jupyter Notebooks.
At the moment, Cantor has its own format for projects. Though this format is good enough to manage Cantor projects, not much content has been created and published by Cantor users, and the user base is not yet as large as this application deserves. Furthermore, sharing content stored in Cantor's native format requires Cantor to be installed on the target system, and Cantor is currently available only for Linux. All this complicates the attempts to make Cantor more popular and known to a broader user base. Adding the ability to import and export Jupyter notebooks in Cantor will address these problems.
If you are interested in a more technical and detailed description of the project, you can check out my proposal.

Actually, this is not my first contribution to Cantor. I have been contributing to this project for roughly a year already. As a developer interested in C++, Qt and scientific applications, I started contributing to Cantor last year with smaller bug fixes. Over time, as I understood Cantor's overall architecture better, I could work on bigger topics like new features, more complicated bug fixes and refactorings, and this year I'm happy to contribute yet another big and very important feature to Cantor as part of GSoC.

To start, I selected a couple of well-structured Jupyter notebooks from the gallery of interesting Jupyter Notebooks. These notebooks were selected based on three criteria:

  • they should be self-sufficient
  • they should contain commands and results of different types
  • they should have a reasonable size sufficient for testing the new code and for demoing the results
Below you can see the screenshots of the notebooks I decided to use:

The notebooks will be used for testing the functionality and for showing the progress of this project, and in the final post I will summarize the results and report on how Cantor successfully processes such files.

In the next post I plan to show a first working version of the Jupyter importer.