Regression tests and unlatex
- Date: 2 March 2023
- Time: 6:30 to 7:30pm (UK Time)
Summary
Jonathan Fine
Regression tests are a good thing. They check that software still performs as expected after a change, and they are a very important part of test automation. This TeX Hour is about improved regression tests.
And when there are enough automated tests, it becomes much easier to refactor style files to provide an improved implementation and additional functionality, and to do this without unknowingly changing the typesetting of existing documents.
For more information about the TeX Hour, including Zoom URL, see the About page.
Topics
By chance, today Ulrike Fischer of the LaTeX Project reported that an
l3build test broke suddenly. This was because something appeared in
a different location in the pdftex log file. The development of the
tagged PDF functionality of LaTeX will benefit from more detailed and
less noisy regression tests.
TeX produces identical outputs from identical inputs. One of TeX’s
outputs is a dvi file, or for pdfTeX a pdf file. Identical log files
were never part of the TeX specification. Identical dvi output very
much is. But there is another possible output, which is much better
for regression testing.
TeX and pdfTeX produce a dvi or pdf file by applying the \shipout
primitive to a \box (such as a typeset page). With just a little
work, we can instead use \showbox to produce a text representation
of the \box.
The text representation is not useful for human readers, but it is much better for regression tests. It provides both more detail about changes and less noise about irrelevant changes.
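As a minimal sketch of the idea, assuming plain TeX run with tex or pdftex, the output routine below shows each page box in the log file instead of shipping it out. The parameter values and the sample text are illustrative assumptions, not part of any existing test setup.

```tex
% Plain TeX sketch: show each page box in the log instead of shipping it out.
\scrollmode            % \showbox reports like an error; do not pause for input
\showboxbreadth=10000  % show every item in each list
\showboxdepth=10000    % descend into every nested box
\output={%
  \showbox255          % write the text representation of the page to the log
  \setbox0=\box255     % empty \box255, as TeX requires of an output routine
  \deadcycles=0        % no \shipout happened, so reset the counter
}
Hello, regression tests.
\bye
```

No dvi or pdf file is written. The log file instead contains a nested listing of the boxes, glue, and characters that make up the page, and it is this listing that a regression test would compare.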
What is unlatex?
Here’s how I see large-scale refactoring of LaTeX. Because LaTeX is so mature, there are many existing documents, and so many possibilities for regression tests.
So this is what we can do. Create a large inventory of \showbox
outputs. The task then is to unlatex these \showbox outputs. In
other words, produce TeX inputs that will produce exactly these
outputs.
Experience will show how useful this is. The success of lwarp gives
me optimism. It produces quite good HTML from LaTeX, going via dvi.
Going via \showbox will give more information than via dvi, and so
further opportunities to recover the syntax of the original LaTeX
document.
In short, perhaps with unlatex we can both extensively refactor
LaTeX and convert LaTeX source documents to HTML/XML.
There is also a unified-latex (or unlatex) project in the JavaScript world (see the URLs).