TeX Hour

A weekly video meeting

Same inputs via the XML Catalog

Jonathan Fine: I will show work-in-progress that uses an XML Catalog and Resolver to solve the problem (in XML) of ensuring identical inputs. The back-end which stores the inputs will be a Git object store, and the internal names will be Git object identifiers. I want TeX to be able to use the same method. No change would be needed in source documents and style files.

Background: style files

Style files are used when rendering documents. This is true for TeX, for XML to HTML, and for CSS in the browser. Identical inputs are crucial for uniform outputs. In the old days it was the printing press that ensured uniform outputs (except the misprints beloved of stamp collectors).

Famously, Don Knuth’s TeX generates identical outputs from identical inputs. As a program it is also remarkably stable over time. This makes TeX something of an archival format and typesetting program. He wanted TeX to give identical results not forever but for at least 100 years.

This enduring nature is very important in STEM publishing, which represents the accumulation of knowledge. In such areas XML documents and tools are highly valued for being consistent and standards-based. Publishers like them.

The XML catalog

We need identical inputs, but without repeatedly fetching the resource over the network. One approach is to hold a locally cached copy.

To support this need for identical inputs, XML built on the Catalog provided by SGML, and also introduced the XML Resolver. I’ll explain the Catalog now, and the Resolver later.

The XML and SGML catalogs both provide a mapping from names to external entities. An external entity can be a file, but it doesn’t need to be. It could be a URL.

This allows XML documents to refer to external resources (such as stylesheets and schema definitions) without having to specify where they are to be found. The XML Catalog is a bit like an old-fashioned phone book. You look up a name and get not a phone number but a local name for a file.

An example XML Catalog

Most Linux installations have an XML catalog. Here’s mine:

$ head -5 /etc/xml/catalog
<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <system systemId="http://www.w3.org/2001/xml.xsd" uri="file:///usr/share/xml/xml.xsd"/>
  <system systemId="http://www.w3.org/2009/01/xml.xsd" uri="file:///usr/share/xml/xml.xsd"/>
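To make the phone-book idea concrete, here is a minimal sketch in Python that reads the `system` entries of a catalog and looks up a name. It uses only the standard library. The `SAMPLE_CATALOG` text is an assumption: it repeats the two entries shown above and adds a closing tag, because the `head -5` output is cut off. The function names are hypothetical, and the sketch ignores every other kind of catalog entry.

```python
import xml.etree.ElementTree as ET

# Namespace of OASIS XML Catalog elements, as in the catalog shown above.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Assumed sample: the two entries from the excerpt, closed so that it parses.
SAMPLE_CATALOG = """<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <system systemId="http://www.w3.org/2001/xml.xsd" uri="file:///usr/share/xml/xml.xsd"/>
  <system systemId="http://www.w3.org/2009/01/xml.xsd" uri="file:///usr/share/xml/xml.xsd"/>
</catalog>
"""


def load_system_entries(catalog_text):
    """Return a dict mapping each systemId to its uri."""
    root = ET.fromstring(catalog_text)
    entries = {}
    for element in root.iter("{%s}system" % CATALOG_NS):
        entries[element.get("systemId")] = element.get("uri")
    return entries


def lookup(entries, system_id):
    """Return the local URI for a name, or None if the catalog has no entry."""
    return entries.get(system_id)


entries = load_system_entries(SAMPLE_CATALOG)
local_uri = lookup(entries, "http://www.w3.org/2001/xml.xsd")
```

Here both names map to the same local file, so a document that asks for either URL reads the same local copy and the network is never touched.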

The XML Resolver

The Catalog maps resource names in an XML document to file names (or other Uniform Resource Identifiers). The Resolver can do all this and more. For example, it can find a resource by looking it up in a database. Or it can even do a computation on the fly. This is a way to get today’s weather into an XML document.
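The difference between a Catalog and a Resolver can be sketched in plain Python. This is an illustration of the idea only, not the API of any XML library. The `computed:today` name and all function names are assumptions made for the example. A resolver here is any function that takes a resource name and returns content, or `None` to let the next resolver try.

```python
from datetime import date


def catalog_resolver(mapping):
    """Build a resolver that does a static lookup, like a plain catalog."""
    def resolve(name):
        return mapping.get(name)
    return resolve


def computed_resolver(name):
    """Compute content on the fly for the hypothetical name 'computed:today'."""
    if name == "computed:today":
        return "<today>%s</today>" % date.today().isoformat()
    return None


def resolve_with(resolvers, name):
    """Try each resolver in turn and return the first result that is not None."""
    for resolver in resolvers:
        result = resolver(name)
        if result is not None:
            return result
    raise LookupError("no resolver for %r" % name)


resolvers = [
    catalog_resolver({"style.dtd": "<!ENTITY publisher 'Example Press'>"}),
    computed_resolver,
]
static_content = resolve_with(resolvers, "style.dtd")
computed_content = resolve_with(resolvers, "computed:today")
```

The static resolver always returns the same content for a name. The computed one does not, which is why on-the-fly resolution is powerful but works against the goal of identical inputs unless the result is pinned.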

Resolvers in Python

The widely used Python module lxml is a wrapper around the standard libxml2 library. In lxml a Resolver can return any of:

Work in progress

In lxml you can, using Python code, easily write your own resolver. In particular you can use as a Catalog a mapping from names to Git object ids. You can then in the Resolver fetch the corresponding objects from a Git object store. This is what I’ll present at the TeX Hour.
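The idea of a Catalog that maps names to Git object ids can be sketched with the standard library alone. This is not the work-in-progress itself. It is a minimal illustration under stated assumptions: the store is an in-memory dict standing in for a real Git object store, only blob objects are handled, and the class, the function names and the `http://example.org/style.dtd` name are hypothetical. The one fixed fact it relies on is how Git names a blob: the SHA-1 of the header `blob <size>\0` followed by the content.

```python
import hashlib


def git_blob_id(data):
    """Return the Git object id (SHA-1 hex digest) of a blob with this content."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


class ObjectStore:
    """In-memory stand-in for a Git object store, keyed by object id."""

    def __init__(self):
        self._objects = {}

    def put(self, data):
        """Store content and return its object id."""
        object_id = git_blob_id(data)
        self._objects[object_id] = data
        return object_id

    def get(self, object_id):
        """Fetch content by id, checking that it still hashes to that id."""
        data = self._objects[object_id]
        if git_blob_id(data) != object_id:
            raise ValueError("object %s is corrupt" % object_id)
        return data


store = ObjectStore()
style = b"<!ENTITY publisher 'Example Press'>\n"

# The catalog maps the name used in documents to a Git object id.
catalog = {"http://example.org/style.dtd": store.put(style)}


def resolve(name):
    """Look the name up in the catalog, then fetch that object from the store."""
    return store.get(catalog[name])


resolved = resolve("http://example.org/style.dtd")
```

Because the id is computed from the content, two people whose catalogs hold the same id for a name are guaranteed to read the same bytes, and the source document keeps using its usual name.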

You can see this work at:

For preliminary reading I suggest:


For more information about the TeX Hour, including the Zoom URL, see the About page.
