This month, I pushed a big update to latex2exp, bumping it to version 0.9.

latex2exp is the R package for rendering LaTeX-formatted text in any plotting context. This package lets the user add formatted text (e.g. bold, italic, underline, different font sizes…), mathematical symbols, and equations to plots. Although this capability exists in base R, latex2exp makes it more accessible by exposing it via LaTeX, which is the most common standard for typesetting mathematical formulas.

The 0.9.3 update massively increased the number of LaTeX commands recognized, reworked the parser to work correctly in circumstances where the old parser would fail, improved documentation, and expanded its test suite to include a large number of sample expressions, seen-in-the-wild uses of latex2exp, and edge cases.

In this post, I will outline briefly about how latex2exp came to be, how it works, what changed and what the roadmap for the package looks like.

Typesetting formatted text and formulas in base R

The primary way to add formatted text and mathematical notation in base R is to use plotmath expressions.

plotmath (described in ?plotmath) is a DSL that comprises expressions that obey the syntax of regular R code, but are specially interpreted by a renderer. An expressions such as

expression(lim(frac(f(x+h[i])-f(x), h[i]), h[i] %->% 0))

combines function calls, subscripts, and operators like %->% to produce a valid, unevaluated R expression. Each operand and token is then interpreted by the renderer at plotting time; for instance, the expression above is rendered as

Plotmath expressions can be used as plot annotations, titles, axis and legend labels in base R, lattice, and ggplot2 graphics.

This system has a number of disadvantages:

  1. The syntax is unfamiliar to most R users LaTeX is the de-facto standard for typesetting mathematical equations. Although the plotmath system is somewhat reminiscent of LaTeX, anecdotally, new and experienced R users find the plotmath syntax unnecessarily different and complicated. While it is an interesting application of R’s ability to do metaprogramming, in practice, this DSL does not bring tangible advantages over a string containing LaTeX notation.
  2. Equations have to be valid R syntax. Because a plotmath expression has to be parsable, a number of equations need workarounds in order to be written as valid R expressions. Those equations typically need workarounds combining {} (braces are invisible groups), phantom() (an invisible token) and * (which juxtaposes the operands).

For example, the equation a < b < c is not valid R code; the correct way to typeset the equation would be expression({a < b} * {phantom() < c}). 3. It is not easily extensible. As far as I can tell, there are no hooks for introducing new symbols and functions. 4. The quality of the output is… not ideal. Compared to a full LaTeX typesetter, the rendered output is merely passable. Limitations include the inability to italicize symbols and greek letters; typeset and align multi-line equations; and, in some cases, properly resize symbols in presence of equations containing tall elements, resulting in artifacts.

latex2exp tries to address (1)-(3) by providing an easier-to-use interface to plotmath’s capabilities. Because it is just a translational layer on plotmath, it cannot improve on the quality of the typesetting, although in many situations it can produce a higher-quality plotmath representation than a hand-written expression.

An interactive demo

Here’s an interactive demo! Enter a LaTeX expression into the text box on the right panel to preview how latex2exp will render it.

How latex2exp works

latex2exp, via its main function latex2exp::TeX(), parses an input LaTeX string and tries to convert it to a plotmath expression. It does that by scanning the LaTeX string looking for various types of tokens, and translating them into the plotmath representation that is visually closest to the ideal rendering.

For some expressions, the translation is straightforward. For example,

# is equivalent to

For others, the translation is not so straightforward, and latex2exp translates the LaTeX string into the closest equivalent, which might be a relatively complicated plotmath expression:

TeX(r"($\frac{ih}{2\pi} \frac{d}{dt} \ket{\Psi(t)} = \hat{H}\ket{\Psi(t)}$)")
# is equivalent to
  frac(ih, 2*pi) * phantom(.) *
  frac(d, dt) * phantom(.) *
  group('|', Psi(t), rangle) ==
  hat(H) * group('|', Psi(t), rangle))

A common source of bugs for new users of latex2exp is that the backslash character inside strings is used to start an escape sequence, such as \n (newline), \t (tabs), or \unnnn (a Unicode character). So, a user attempting to write a TeX string such as "\Psi" will be greeted by the error

Error: '\P' is an unrecognized escape in character string starting ""\P"

Prior to R 4.0, the only available option was to escape the backslash character using a double-backslash, e.g. "\\Psi", a surprising first hurdle for new users of latex2exp.

As of R 4.0, it is possible to use unquoted backslashes in a string, if the string is marked as a raw string, e.g. r"(\Psi)". Raw strings are written as r"(...)" with ... being any character sequence (see ?Quotes for a description of raw strings). I recommend using raw strings when using the package (unless using R > 4.0 is not possible).

At any point, it is possible to obtain a quick preview of the output of the call to TeX by calling plot() on the returned value:

e <- TeX(r"($\sum_{i=1}^{N} x_i$)")
plot(e, cex=3)

Finally, the package website has a filterable table of supported LaTeX commands and a preview of how they will be rendered. The same table can also be invoked anytime from the R prompt using the command latex2exp_supported().

What’s next?

As far as I can tell, the current version (0.9.3) of latex2exp supports all the LaTeX commands that can be feasibly rendered via plotmath (and a few more are “emulated” using some trickery).

The goal for the 1.x branch of latex2exp is to improve on workflows that are unnecessarily error prone, add new features that are hard or impossible to achieve with plotmath, and extend the range of examples provided in the documentation.

Specialized annotation geoms for ggplot and ggrepel

I would like to support ggplot more directly. Currently, for ggplot geoms like geom_text, geom_label, and annotate that take a label aesthetic, the input vector for the label aesthetic is expected to be of type character, rather than of type expression. In order to plot formatted text and formulas, the user is expected to pass a character representation of the plotmath expression and remember to specify the parameter parse=TRUE to force parsing of the expression.

This means that in order to use TeX with these functions, it is necessary to use the parameter TeX(..., output="character") to force it to return the character representation of the expression:

ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point() +
  annotate("text", x = 4, y = 25, 
           label = TeX(r"(\alpha+\beta)", output="character"),

This is unintuitive, error-prone, and unnecessarily verbose.

I propose to add the functions: geom_text_TeX, geom_label_TeX, and annotate_TeX. These functions will forward to the underlying ggplot functions, and set the appropriate parameters for the user, such that:

ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point() +
annotate_TeX(x = 4, y = 25, label = r"(\alpha+\beta)")

will automatically parse the TeX input, turn it into an expression, and forward it to annotate with parse=TRUE.

Multi-line equations

There is currently no way to introduce a line break in an equation.

I can see two possibilities for achieving this in a future 1.x version:

  • Break the expression into separate labels, each containining a line of equation, and compute positioning coordinates that achieve the correct layout and alignment;
  • Or, exploiting an undocumented (as far as I know) behavior of plotmath, it appears to be possible to liberal use of the atop() plotmath function to stack multiple equations on top of each other, wrapped in a call to displaysize() to ensure each line is displayed with the correct font size.

The latter produces a passable-in-a-pinch output where each line is center-aligned:

I am accumulating a few examples of usages of latex2exp – both from my own work, and in the wild – that can be used to showcase what the package can do for publication-quality plots.

Would you like to share a plot that you think showcases the ability of latex2exp (or just looks very pretty) for inclusion in this gallery? Please feel free to open an issue in the GitHub repo.