Tuesday, July 23, 2019

Improved rendering of mathematical expressions in Cantor

Hello everyone!

In the previous post I mentioned that the render of mathematical expressions in Cantor has bad performance. This heavily and negatively influences user experience. With my recent code changes I addressed this problem and it should be solved now. In this blog post I wand to provide some details on what was done.

First, I want to show some numbers proving the performance improvements. For example, loading of the notebook "Rigid-body transformations in a plane (2D)" - one of the notebooks I’m using for testing - took 15.9 seconds (this number and all other numbers mentioned in the following are average values of 5 consequent measurements). With the new implementation it takes only 4.06 seconds. And this acceleration comes without loosing render quality.

This is a example, how modern render looks like compared with Jupyter renderer (as you can see, Cantor doesn't show images from web in Markdown entries, but I will fix it soon).

I did further measurements by executing all the tests I wrote for the Jupiter import which cover several Jupyter notebooks. Here the results:
• Without math rendering - 7.75 seconds.
• New implementation - 14.014 seconds.
• Old implementation - 41.296 seconds.
To quickly summarize, we get an average of 535% performance improvement. This result depends on the number of available cores and I’ ll explain below why.

To get these results I solved two main problems of math rendering in Cantor.
First, I changed the code for the LaTeX renderer. In the old implementation the rendering process consisted of the following steps:
1. create TeX document using a page template and the code provided by the user.
2. run `latex` executable on the produced TEX file to generate the DVI file.
3. run `dvips` executable to convert the DVI file to an EPS file.
4. convert the produced EPS file to a QImage using Libspectre library.
After these four steps the produced QImage is shown Cantor’s worksheet (QGraphicsScene/View). As you see, the overall chain of steps to get the image out of a mathematical expression is quite long - there are several steps where the time is spent. In total, for a usual mathematical expression these operations take ~500 ms where the first three steps take 300 ms and the last step takes 200 ms. The complexity and the size of the mathematical expressions have a negligible impact on the numbers shown above. Also, the time spent in Cantor for expressions of other types is relatively small. So, for example if you have a notebook with 20 different mathematical expressions and some other entries of other types, Cantor will load the project in ca 20*500ms=10s.

I reduced this chain to three elements by merging the steps two and three. This was achieved by using pdflatex as the LaTeX engine which produces a PDF file directly out of the TEX file. Furthermore, I replaced libspectre library with Poppler pdf rendering library. This brought the overall time down to 330 ms with pdflatex process taking 300 ms and with the rendering in Poppler (converting PDF to QImage) taking only 30 ms. With this, for our example notebook with 20 mathematical expressions mentioned above the rendering take only 6.6 seconds. In this new chain, the LaTeX process is the bottle neck and I’m looking for potential acceleration here but until now I didn’t find any "magic" parameters which would help to reduce this time spent in latex rendering.

Despite this progress, loading of somewhat bigger documents will hurt in Cantor. For example, for a project having 100 formulas openning of the file will take ca. 33 seconds.

The problem here is that the rendering process is a blocking and non-parallelized operation - only one mathematical expression is processed simultanuosly and the UI is blocked for the whole processing time. Obviously, this behaviour is unsatisfactory and under-utilizes the modern multi-core CPUs. So I  decided to run the rendering part in many parallel tasks asynchronously without blocking the main GUI thread. Fortunately, Qt helps here a log with it's classes QThreadPool managing the pool of threads and QRunnable providing an interface to define "tasks" that will be executed in parallel.

Now when openning a project, for every Markdown entry containing mathematical expression, Cantor creates a render task for every expression, sends this task to the thread pool and continues with the processing of entries in the document. Once such a task has finished, the result is shown in Cantor's worksheet. With this a further good performance improvement can be achieved. Clearly, the more cores you have the faster the processing will be. Of course, if you have only a small number of physical threads possible on your computer, you won't notice a huge difference. But still, you should see an improvement compared to the old single-threaded implementation in Cantor.

For a notebook comparable in size to  "Rigid-body transformations in a plane (2D)" project which has 109 mathematical expressions, the loading of the notebook takes a reasonable and acceptable time on my hardware (I have 8 physical cores in the CPU, so that is why the render acceleration is so big). And, thanks to the asynchron processing, the user can interact with the notebook even though the rendering of the mathematical expressions is still in process.

Since my previous post, not only math renderer have changed, there is also a huge change in Markdown support - Cantor finally handles embeded math expressions, like `\$...\$` and `\$\$...\$\$` in accordance with the Markdown syntax. In the next blog post I'll describe how it works now.