GSoC 2019: Cantor and the support for Jupyter notebooks at the finish line

Hello everyone! It's been almost three weeks since my last post, and this is going to be my final post in this blog. So I want to summarize all the work done in this GSoC project. As a reminder, the goal of the project was to add support for Jupyter notebooks to Cantor. This format is widely used in science and education, mostly through the Jupyter application, and there is a lot of content available on the internet in this format (for example, here). By adding support for this format to Cantor, we allow Cantor users to access this content. This is only a short description; if you are interested, you can find more details in my proposal.

In the previous post, I described the "maximum plan" for Jupyter support in Cantor as mostly finished. What this means in practice for Cantor is:

you can open Jupyter notebooks
you can modify Jupyter notebooks
you can save modified Jupyter notebooks without losing any information
you can save native Cantor worksheets in Jupyter notebook format
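
The "without losing any information" point can be illustrated with a small round-trip check. This is only a sketch in Python using the standard `json` module, not Cantor's actual test code (Cantor itself is written in C++); the function name `roundtrip_preserves_notebook` and the sample notebook are made up for illustration. The idea is that a notebook file is a JSON document, so a saved copy is lossless if it parses to the same structure as the original, regardless of whitespace or key order.

```python
import json

def roundtrip_preserves_notebook(original_text: str, resaved_text: str) -> bool:
    """Return True if both JSON texts describe exactly the same notebook.

    Hypothetical helper: compares the parsed structures, so differences in
    indentation or key order are ignored, but any dropped field is detected.
    """
    return json.loads(original_text) == json.loads(resaved_text)

# A tiny hand-written notebook in the nbformat 4 layout, used only as sample data.
sample = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {"collapsed": False},
            "outputs": [],
            "source": ["print(1)"],
        },
    ],
    "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"}},
    "nbformat": 4,
    "nbformat_minor": 2,
}

original = json.dumps(sample, indent=1)

# Simulate a save that keeps every field but changes formatting and key order.
resaved = json.dumps(sample, indent=2, sort_keys=True)

# Simulate a lossy save that silently drops per-cell metadata.
lossy = json.loads(original)
lossy["cells"][1]["metadata"] = {}
lossy_text = json.dumps(lossy)

print(roundtrip_preserves_notebook(original, resaved))     # True
print(roundtrip_preserves_notebook(original, lossy_text))  # False
```

In a real check, `original` would be the text of a notebook file and `resaved` the text of the file written after opening and saving it in the application under test.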

To test the implemented code, I used a couple of notebooks mentioned in an earlier post. But the Jupyter world doesn’t consist of only this small number of notebooks, of course. So it was interesting to confront the code with more notebooks found in the wild.

I recently discovered a nice repository of Jupyter notebooks about Biomechanics and Motor Control with 70 notebooks. I hadn’t used these notebooks for testing and validation before and didn’t know anything about their content. 70 notebooks is quite a number, and my assumption was that, even without knowing them in detail, they would cover many different parts of the Jupyter notebook format specification and would challenge my implementation to an extent that was not possible in my previous testing. So this set of notebooks promised to be fresh and good test content for further and stricter validation of Cantor.

I was not disappointed. After the first round of manual testing based on this content, I found issues in 7 notebooks (the other 63 worked correctly right away!), which I addressed. Now Cantor handles all 70 notebooks from this repository correctly.
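
To get a feeling for which parts of the format a set of notebooks actually exercises, one could count the cell types, output types and MIME types that occur in them. The following Python sketch does this for a single parsed notebook; it is not what I used (my testing was manual), and the function name `survey_notebook` and the sample data are made up for illustration. The field names (`cells`, `cell_type`, `outputs`, `output_type`, `data`, `attachments`) are those of the nbformat 4 layout.

```python
from collections import Counter

def survey_notebook(notebook: dict) -> Counter:
    """Count which format features a parsed nbformat 4 notebook uses."""
    features = Counter()
    for cell in notebook.get("cells", []):
        features["cell:" + cell.get("cell_type", "unknown")] += 1
        # Markdown cells may carry embedded files (e.g. images) as attachments.
        if cell.get("attachments"):
            features["attachments"] += 1
        # Only code cells have outputs; other cell types simply have no such key.
        for output in cell.get("outputs", []):
            features["output:" + output.get("output_type", "unknown")] += 1
            # Rich outputs store their representations keyed by MIME type.
            for mime in output.get("data", {}):
                features["mime:" + mime] += 1
    return features

# Hand-written sample notebook; the attachment payload is a placeholder string.
sample = {
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["![plot](attachment:img.png)"],
            "attachments": {"img.png": {"image/png": "BASE64DATA"}},
        },
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["1+1"],
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 1,
                    "metadata": {},
                    "data": {"text/plain": ["2"]},
                },
                {"output_type": "stream", "name": "stdout", "text": ["hi\n"]},
            ],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 2,
}

counts = survey_notebook(sample)
for feature, count in sorted(counts.items()):
    print(feature, count)
```

Summing these counters over all `.ipynb` files of a repository (each loaded with `json.load`) would show, for example, whether attachments or error outputs appear at all in the test content.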

Looking back at what was achieved this summer, the following list summarizes the project:

the mandatory features described in the project proposal were fully implemented
most of the optional features were completed
some other new features that were needed for the project were added to Cantor, like new result types, support for embedded mathematical expressions and attachments in Markdown cells, etc.
the new implementation was tested and considered stable enough to be merged into master, and we plan to release it with Cantor 19.12
new dedicated tests were written to cover the new code and to avoid regressions in the future; the testing framework was extended to handle project load and save steps

I prepared some screenshots of Jupyter notebooks that show the final result in Cantor:

Even though the initial goal of the project was achieved, there are still some problems and limitations in the current implementation:

for Markdown entries containing text with images where certain alignment properties were set, or after image size manipulations, the content is not always displayed correctly, which is potentially a bug in Qt
because of small differences in syntax between MathJax, used in Jupyter notebooks, and LaTeX, used for the actual rendering in Cantor, the rendering of embedded mathematical expressions is not always successful. At the moment Cantor shows an error message in such cases, but this message is often not very clear or helpful for the user
the Qt classes used by Cantor, which don’t involve the full web engine, provide only limited and basic support for HTML. More complex cases like embedded YouTube videos and JavaScript don’t work at all.

This is all for the limitations, I think. Let's talk about future plans and perspectives. In my opinion, this project has reached its initial goals and is now finished; it will only need maintenance in terms of bug fixing and adjustments to potential format changes in the future.

More generally, this project is part of the current overall development activities in Cantor that aim to improve the usability and stability of the application and to extend the feature set in order to enable more workflows and reach a bigger audience. See the 19.08 and 18.12 release announcements to read more about the developments in the recent releases of Cantor. Support for the Jupyter notebook format is a big step in this direction, but this is not all. We already have many other items in our backlog going in this direction, like UX improvements and better integration of plots. Some of these items will be addressed soon. Some of them are maybe something for the next GSoC project next year?

I think that's all for now. Thank you for reading this blog and thank you for your interest in my project. Working on this project was a very interesting and pleasant period of my life. I am happy that I had this opportunity and was able to contribute to KDE, and especially to Cantor, with the support of my mentor Alexander Semke.
So, bye.

Thursday, August 22, 2019
