TeX Hour

A weekly video meeting

Unlatex: results and prospects

Date: 13 April 2023
Time: 6:30 to 7:30pm (UK Time)

Summary

Jonathan Fine latex is a program that compiles a source document to a PDF document. I’m writing unlatex as a program that goes in the other direction. I’m doing this with a view to

Intercepting the creation of PDF to produce more accessible outputs.
Adding information just before the creation of the PDF to produce tagged PDF.
Providing high-quality regression tests for changes in macros and other inputs.

Well, next week, on Monday 17 April 1:00-5:00pm Eastern Time, we have the very first arXiv Access Forum. I’m looking forward to that. I’m most grateful to the organisers, and I hope they’re not overwhelmed. They have a massive responsibility. As do the esteemed presenters and panelists, and the many participants.

This TeX Hour is about my emerging unlatex tool for reprocessing TeX documents, to provide more accessible outputs. Here are some arXiv stats (in round numbers):

Total number of submissions: 2.25 million.
Downloads per month: 25 million.
Seconds in a month: 2.6 million.
Registered for arXiv Access Forum: 2,000.

Why seconds in a month? Well, it’s approximately equal to the total number of submissions. So we can make a Fermi estimate as to how long it will take to reprocess the entire arXiv to get accessible outputs (assuming suitable software).

Suppose we have a desktop PC with 12 cores, so 24 threads, so about 20 cores doing useful work. On such a machine, if not bottlenecked, we could do the whole lot in a month provided each item takes only 20 seconds. The download might take a while, and the electricity would be about £150 (or $150).

Harder is to make a Fermi estimate for creating suitable software, and yet harder is writing and testing the software, and its we hope accessible outputs.

For more information about the TeX Hour, including Zoom URL, see the About page.

TeX Hour

A weekly video meeting

Unlatex: results and prospects

Summary

URLs

arXiv Access Forum

Unlatex and Access Tree