## Thursday, August 22, 2019

### Cantor and the support for Jupyter notebooks at the finish line

Hello everyone! It's been almost three weeks since my last post, and this is going to be my final post in this blog. So, I want to summarize all the work done in this GSoC project. As a reminder, the goal of the project was to add support for Jupyter notebooks to Cantor. This format is widely used in science and education, mostly by the application Jupyter, and there is a lot of content available on the internet in this format (for example, here). By adding support for this format to Cantor, we allow Cantor users to access this content. This is a short description; if you are more interested, you can find more details in my proposal.

In the previous post, I described that the "maximum plan" for Jupyter support in Cantor was mostly finished. What this means in practice for Cantor is:
• you can open Jupyter notebooks
• you can modify Jupyter notebooks
• you can save modified Jupyter notebooks without losing any information
• you can save native Cantor worksheets in Jupyter notebook format
To test the implemented code I used a couple of notebooks mentioned in an earlier post. But the Jupyter world doesn't consist of only this small number of notebooks, of course. So, it was interesting to confront the code with more notebooks available in the wild.
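
To make the "without losing any information" point more concrete, here is a minimal Python sketch of the idea. It is only an illustration and not Cantor's actual code (Cantor is written in C++): the helper `edit_cell_source` and the `custom_tag` metadata key are hypothetical, and the notebook is a tiny hand-written example. The point is that editing one cell should leave every other key untouched, including keys the application does not understand.

```python
import copy
import json


def edit_cell_source(notebook, index, new_source):
    """Return a copy of the notebook with one cell's source replaced.

    All other keys (metadata, outputs, unknown extensions) are kept as they are.
    """
    result = copy.deepcopy(notebook)
    result["cells"][index]["source"] = new_source
    return result


# A tiny hand-written notebook in the nbformat 4 layout.
original = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3"}},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {"custom_tag": "keep me"},  # hypothetical extra key
            "source": "# Title",
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 1,
            "source": "print(1)",
            "outputs": [],
        },
    ],
}

# Modify one cell, then simulate "save and reopen" via a JSON round trip.
edited = edit_cell_source(original, 1, "print(2)")
round_tripped = json.loads(json.dumps(edited))
```

After the round trip, the edited cell carries the new source while the notebook metadata and the extra key on the Markdown cell are still present.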

I recently discovered a nice repository of Jupyter notebooks about Biomechanics and Motor Control with 70 notebooks. I hadn't used these notebooks for testing and validation before and didn't know anything about their content. 70 notebooks is quite a number, and my assumption was that these notebooks, even without knowing them in detail, would cover many different parts of the Jupyter notebook format specification and would challenge my implementation to an extent that was not possible during my previous testing activities. So, this new set of notebooks was supposed to be new and good test content for further and stricter validation of Cantor.

I was not disappointed. After the first round of manual testing based on this content, I found issues in 7 notebooks (63 projects functioning correctly!), which I addressed. Now, Cantor handles all 70 notebooks from this repository correctly.

Looking back at what was achieved this summer, the following list summarizes the project:
• the scope for mandatory features described in the project proposal was fully realized
• the biggest part of optional features was finalized
• some other new features needed for the realization of the project were added to Cantor, like new result types, support for embedded mathematical expressions and attachments in Markdown cells, etc.
• the new implementation was tested and considered stable enough to be merged into master and we plan to release this with Cantor 19.12
• new dedicated tests were written to cover the new code and to avoid regressions in the future; the testing framework was extended to handle project load and save steps
I prepared some screenshots of Jupyter notebooks that show the final result in Cantor:

Even though the initial goal of the project was achieved, there are still some problems and limitations in the current implementation:
• for Markdown entries containing text with images where certain alignment properties were set, or after image size manipulations, the visualization of the content is not always correct, which is potentially a bug in Qt
• because of small differences in syntax between MathJax used in Jupyter notebooks and LaTeX used for the actual rendering in Cantor, the rendering of embedded mathematical expressions is not always successful. At the moment Cantor shows an error message in such cases, but this message is often not very clear or helpful for the user
• the Qt classes used by Cantor, which don't involve the full web engine, provide only limited and basic support for HTML. More complex cases like embedded YouTube videos and JavaScript don't work at all.
This is all for the limitations, I think. Let's talk about future plans and perspectives. In my opinion, this project has reached its initial goals, is finished now and will only need maintenance and support in terms of bug fixing and adjustment to potential format changes in future.

More generally, this project is part of the current overall development activities in Cantor to improve the usability and stability of the application and to extend the feature set in order to enable more workflows and reach a bigger audience. See the 19.08 and 18.12 release announcements to read more about the developments in the recent releases of Cantor. Support for the Jupyter notebook format is a big step in this direction, but this is not all. We already have many other items in our backlog going in this direction, like UX improvements and plot integration improvements. Some of these items will be addressed soon. Some of them are maybe something for the next GSoC project next year?

I think, that's all for now. Thank you for reading this blog and thank you for your interest in my project. Working on this project was a very interesting and pleasant period of my life. I am happy that I had this opportunity and was able to contribute to KDE and especially to Cantor with the support of my mentor Alexander Semke.
So, Bye.

## Tuesday, July 30, 2019

### Markdown and support of embedded mathematics

Hello everyone!

In the previous post I mentioned that Cantor now handles embedded mathematical expressions inside of Markdown, like $...$ and $$...$$ in accordance with the Markdown syntax.

For a long time Cantor didn't have any support for Markdown and only had a simple text entry type for comment purposes. The Markdown entry type was added only in 2018 by kqwyf. Internally, this was realized by using the Discount library, which converts Markdown syntax to HTML code, which is then passed to Qt for the final rendering (Qt supports a limited subset of the HTML syntax).

The Discount library actually supports integration with LaTeX: text inside LaTeX expressions like $$...$$, \(...\) and \[...\] is passed to the output HTML string without modifications (except HTML escaping).

As you can see, Discount doesn't support embedded mathematics with the single delimiter $...$, which is used in Jupyter very frequently. Of course, for my Jupyter integration project ignoring this type of statement was not an option. I decided to report this issue in the Discount bug tracker because all the other options for solving this problem purely in Cantor had problems of their own.

Fortunately, the author of Discount reacted very soon (thanks to him for that) and suggested code changes for supporting single-delimited math. Unfortunately, the changes haven't made it into the master branch yet. To proceed further in Cantor I decided to copy the required Discount code, with all the relevant changes, into Cantor's repository as a third-party library.
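
To illustrate what recognizing both delimiter styles involves, here is a small Python sketch. It is not the Discount patch and not Cantor's code; the pattern `MATH_PATTERN` and the function `find_math_spans` are hypothetical names, and the regular expression is deliberately simple: it tries $$...$$ before $...$, skips a dollar sign that is preceded by a backslash, and does not handle every edge case (for example, plain text with two unescaped dollar amounts on one line would be misread as math).

```python
import re

# Display math is tried first so that "$$" is never read as two inline delimiters.
MATH_PATTERN = re.compile(
    r"(?<!\\)\$\$(.+?)\$\$"      # display math: $$...$$ (may span several lines)
    r"|(?<!\\)\$([^$\n]+?)\$",   # inline math: $...$ (single line only)
    re.DOTALL,
)


def find_math_spans(markdown_text):
    """Return a list of (delimiter, expression) tuples found in the text."""
    spans = []
    for match in MATH_PATTERN.finditer(markdown_text):
        if match.group(1) is not None:
            spans.append(("$$", match.group(1).strip()))
        else:
            spans.append(("$", match.group(2).strip()))
    return spans


sample = r"Inline $a^2 + b^2 = c^2$, then display $$E = mc^2$$."
spans = find_math_spans(sample)
```

For the sample string, `spans` contains one inline expression followed by one display expression, each with the delimiters stripped.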

## Wednesday, July 10, 2019

### New unit tests for the new code

Hello everyone,

today I want to present the test system for Cantor's worksheet.
The worksheet is the most central, prominent and important part of the application where the most work is done.

So, it is important to cover this part with enough tests to ensure the quality and stability of this component in future.

At the moment, this system contains only ten tests, and all of them cover only the recently added import of Jupyter notebooks (I have mentioned them in my first post).
However, this test infrastructure is of generic nature and can easily be used for testing Cantor's own Cantor files, too.

The test system checks that a worksheet/notebook file is loaded successfully, tests the backend type and validates the overall worksheet structure and the content of its entries.

Actually, some content is not validated, for example the image content. Validating it would increase the complexity of the tests and slow down their execution without adding much value with respect to quality assurance.
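
The kind of check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Cantor's test code: the function `summarize_notebook` is a hypothetical helper and the notebook is a hand-written example in the nbformat 4 layout. It reads the kernel name, the cell types and the cell sources, and deliberately ignores the image data stored in the outputs.

```python
import json


def summarize_notebook(notebook):
    """Return (kernel name, list of (cell_type, source text)) for a notebook dict."""
    kernel = notebook.get("metadata", {}).get("kernelspec", {}).get("name")
    entries = []
    for cell in notebook["cells"]:
        source = cell["source"]
        # The notebook format allows the source to be a string or a list of strings.
        if isinstance(source, list):
            source = "".join(source)
        entries.append((cell["cell_type"], source))
    return kernel, entries


# Hand-written example notebook, serialized as it would be stored on disk.
notebook_text = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3"}},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Title\n", "Some text"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "source": "print(1)",
         "outputs": [{"output_type": "display_data", "metadata": {},
                      "data": {"image/png": "<base64 data omitted>"}}]},
    ],
})

kernel, entries = summarize_notebook(json.loads(notebook_text))
expected_entries = [("markdown", "# Title\nSome text"), ("code", "print(1)")]
```

A test then only has to compare `kernel` and `entries` with the expected values; the image payload never takes part in the comparison.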

This new infrastructure has already proven to be helpful. When writing the first tests for the worksheet I found a couple of bugs in the implementation of the import of Jupyter notebooks. Having fixed them and having these additional barriers in place, I'm more confident about the implementation and can say with more certainty that the import of Jupyter notebooks works fine.

In the previous post I mentioned some issues with the performance of the renderer used for mathematical expressions in Cantor. It turned out that this problem is not as easy to solve as I first assumed. But now, after having finished a substantial part of the work that was planned as part of this GSoC project, I can give more attention to the remaining problems, including this one with the performance of the renderer.
In the next post I plan to show a better realization of the math renderer in Cantor.

## Saturday, June 22, 2019

### Support for Jupyter notebooks has evolved in Cantor

Hello everyone, it's been almost a month since my last post and there are a lot of changes that have been done since then.

First, what I called the "minimal plan" is already done! Cantor can now load Jupyter notebooks and save the currently opened document in Jupyter format.

Below you can see how one of the Jupyter notebooks I'm using for test purposes (I have mentioned them in the previous post) looks in Jupyter and in Cantor.

As you can see, there aren't many differences in the representation of the content except for some minor differences in the rendering of the Markdown code.

For the comparison, I also prepared some previews of the same fragments of the notebooks, opened in Jupyter and in Cantor.
This is a fragment from Understanding evolutionary strategies and covariance matrix adaptation notebook.

As the next example, we show a screenshot of A Reaction-Diffusion Equation Solver in Python with Numpy notebook.

As the final example, we show a screenshot of Rigid-body transformations in a plane (2D) notebook.

To be more detailed and concrete on what is currently supported in Cantor, below is the list of objects that can be imported:
• Markdown cells (with mathematical expressions and with attachments)
• Code cells (with text results, including error messages, and image results)
• Raw NBConvert cells
Cantor is able to handle almost all content specified by the Jupyter notebook format, except for some metadata about the notebook in general and about its cells, information about the used "kernel" (support for this will be added soon) and results of other types (for example LaTeX or HTML outputs), which are more difficult to implement because of the lack of good and complete documentation for them.
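
As a rough illustration of how the results of a code cell fall into these categories, here is a Python sketch based on the output types of the notebook format (`stream`, `error`, `display_data`, `execute_result`). It is not how Cantor implements the import: the function `classify_output`, the category names and the order in which MIME types are checked are assumptions made for this example only.

```python
def classify_output(output):
    """Map one notebook output to "text", "error", "image" or "unsupported"."""
    kind = output["output_type"]
    if kind == "stream":
        return "text"
    if kind == "error":
        return "error"
    # display_data and execute_result carry a MIME bundle in "data".
    mime_types = output.get("data", {})
    if any(mime.startswith("image/") for mime in mime_types):
        return "image"
    if "text/plain" in mime_types:
        return "text"
    # For example outputs that only provide text/html or text/latex.
    return "unsupported"


# Hand-written sample outputs in the nbformat 4 layout.
sample_outputs = [
    {"output_type": "stream", "name": "stdout", "text": "hello\n"},
    {"output_type": "error", "ename": "ZeroDivisionError",
     "evalue": "division by zero", "traceback": []},
    {"output_type": "display_data", "metadata": {},
     "data": {"image/png": "<base64 data omitted>", "text/plain": "<Figure>"}},
    {"output_type": "execute_result", "execution_count": 1, "metadata": {},
     "data": {"text/plain": "42"}},
    {"output_type": "display_data", "metadata": {},
     "data": {"text/html": "<b>hi</b>"}},
]

kinds = [classify_output(output) for output in sample_outputs]
```

In this sketch only the last sample, which offers nothing but HTML, ends up in the "unsupported" category.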

When saving the project in Jupyter's format, Cantor handles almost all of its native entry types like markdown entries, text entries, code entries and image entries. For the remaining "page break entry" in Cantor it is still to be worked out how to map this element to Jupyter's structures.

Despite the good progress made, there is still a lot of room and potential for improvement. Besides some technical issues that arise when importing another format and mapping its structure to the native structures of your application, which I guess is quite natural for all applications, there is currently also a problem with the performance of the renderer used for mathematical expressions in Cantor. Opening large documents (either in Cantor's native format or Jupyter notebooks) that contain a lot of formulas takes a considerable amount of time because of the bad renderer implementation in Cantor. This heavily influences the user experience and I plan to start working on this soon.

So, there is still some work to be done before Cantor supports what I call the "maximum plan". By this I mean the ability to guarantee that the conversion between the two formats when opening or saving projects happens without any substantial loss of information relevant and critical for the consumption of the project file.

To achieve this, I now want to invest more into testing with more notebooks and closing the remaining gaps, but also into writing automatic tests for Cantor covering this new functionality. The latter are also important to prevent any kind of regressions introduced during bug fixing activities in the next weeks. This is something for the next week.

In the next post I plan to show a working test system and how Cantor is passing its tests.

## Tuesday, June 4, 2019

Hello everyone! I'm participating in Google Summer of Code 2019, and I am working on the KDE Cantor project. The GSoC project is mentored by Alexander Semke - one of the core developers of LabPlot, Knights and Cantor. First, let me introduce you to Cantor and to my GSoC project:
Cantor is a KDE application providing a graphical interface to different open-source computer algebra systems and programming languages, like Octave, Maxima, Julia, Python etc. The main idea of this application is to provide one single, common and user-friendly interface for different systems instead of providing different GUIs for different systems. The details specific to the different languages are transparent to the end-user and are handled internally in the language specific parts of Cantor's code.
There is another project following this idea - the project Jupyter. As a result of its very big popularity, user base and the community around this project, there is a lot of content available for this project created and contributed by users from different scientific and educational areas, as documented in the gallery of interesting Jupyter Notebooks.
At the moment, Cantor has its own format for projects. Though this format is good enough to manage Cantor projects, there is not a lot of content created and published by Cantor users and the user base is still not at the level which this application would deserve. Furthermore, sharing content stored in Cantor's native format requires the availability of Cantor on the target system, and Cantor is available for Linux only at the moment. All this complicates the attempts to make Cantor more popular and known to a broader user base. Adding the possibility to import/export Jupyter Notebook worksheets in Cantor will address the problems described above.
If you are interested in a more technical and detailed description of the project, you can check out my proposal.

Actually, it's not my first contribution to Cantor. I have been contributing to this project for roughly one year already. As a developer interested in C++, Qt and applications relevant for scientific purposes, I started to contribute to Cantor last year by working on smaller bug fixes first. With time and with more understanding of the overall architecture of Cantor I could work on bigger topics like new features, more complicated bug fixes and refactorings in the code, and this year I'm happy to contribute yet another big and very important functionality to Cantor as part of GSoC.

To start, I selected a couple of well-structured Jupyter notebooks from a gallery of interesting Jupyter Notebooks. Those notebooks were selected based on three criteria:

• they should be self-sufficient
• they should contain commands and results of different types
• they should have a reasonable size sufficient for testing the new code and for demoing the results
Below you can see the screenshots of the notebooks I decided to use:
The notebooks will be used for testing the functionality and also for showing the progress of this project. In the final post I will summarize and report on Cantor being able to successfully process such files.

In the next post I plan to already show a working first version of the Jupyter importer.