GSoC 2019: Markdown and support of embedded mathematics

Hello everyone!

In the previous post I mentioned that Cantor now handles embedded mathematical expressions inside of Markdown, like $...$ and $$...$$ in accordance with the Markdown syntax.

In the past Cantor for a long time didn’t have any support for Markdown and only have simple text entry type for comment purposes. Markdown entry type was added only in 2018 by kqwyf. Internally, this was realized by using the Discount library, which converts markdown syntax to the to html code which is then passed to Qt for final the rendering (Qt supports limited set of the html syntax).

Discount library actually supports integration with LaTeX: text inside LaTeX expressions like $$...$$, $...$, \[...]\ is passed to the output html string without modifications (except html escaping).

As you see Discount doesn't support embedded mathematics with single delimiter $...$ that is used in Jupyter very frequently. Of course, for my Jupiter integration projects ignoring this type of statements was not an option. I decided to report this issue in Discount bug tracker because all the other options solve this problem purely in Cantor had other problems.

Fortunately, the author of Discount reacted very soon (thanks to him for that) and suggested code changes for supporting the single-delimited math. Unfortunately, the changes didn't get into master branch yet. To proceed further in Cantor I decided to copy required Discount’s code having all the relevant changed into Cantor’s repository as a third party library.

Independent of the support for the single-delimiter mathematics, there is a big problem with the embedded mathematical expressions - you need to somehow find these mathematical statements in output html string. In the initial implementation I simply searched for $$ in the result string but this could lead to "search collisions".

The dollar sign could be inside of a Markdown code block or inside of a quote block. Here, the dollar signs shouldn't treated as part of the embedded mathematics. After some further testing of couple of other implementations on Cantor sidethe conclusion was obvios - the identification and labeling of positions of embedded mathematics in the html string, produced by Discount, should be done directly inside Discount itself.

At this moment, the version of Discount added to Cantor’s repository had two additional functional fixes on top of the officially released version of this library. First, Discount copies all LaTeX expressions during the processing of markdown syntax to a special string list, which is then used by Cantor to search for LaTeX code. Second, a useful change was to add an ASCII non-text symbol to every math expression. This symbol is used as a search key which greatly reduces the likelihood for a string collision, still theoretically possible, though.

For example, if Discount will find (according Markdown syntax) math expression $\Gamma$ , then it will write the additional symbol and the expression iin the output html string will be $<symbol>\Gamma$ and Cantor will search exactly this text.

I think, that's all. Maybe this doesn’t look like a complex problem but solving this problem was a task that took the most time and it took me two months to fix it. So, I think the problem and its solution deserved a separate blog post.

At this moment, what I called "maximum plan" (I have mentioned this concep in this post) of the Jupyter support in Cantor is mostly finished. So, in the next post I plan to show how Cantor now handles test notebooks and what I’ll plan to do next.

GSoC 2019

Tuesday, July 30, 2019

Markdown and support of embedded mathematics

1 comment: