Unlatex: results and prospects
- Date: 13 April 2023
- Time: 6:30 to 7:30pm (UK Time)
Summary
Jonathan Fine latex
is a program that compiles a source document
to a PDF document. I’m writing unlatex
as a program that goes in the
other direction. I’m doing this with a view to
- Intercepting the creation of PDF to produce more accessible outputs.
- Adding information just before the creation of the PDF to produce tagged PDF.
- Providing high-quality regression tests for changes in macros and other inputs.
Well, next week, on Monday 17 April 1:00-5:00pm Eastern Time, we have the very first arXiv Access Forum. I’m looking forward to that. I’m most grateful to the organisers, and I hope they’re not overwhelmed. They have a massive responsibility. As do the esteemed presenters and panelists, and the many participants.
This TeX Hour is about my emerging unlatex
tool for reprocessing TeX
documents, to provide more accessible outputs. Here are some arXiv
stats (in round numbers):
- Total number of submissions: 2.25 million.
- Downloads per month: 25 million.
- Seconds in a month: 2.6 million.
- Registered for arXiv Access Forum: 2,000.
Why seconds in a month? Well, it’s approximately equal to the total number of submissions. So we can make a Fermi estimate as to how long it will take to reprocess the entire arXiv to get accessible outputs (assuming suitable software).
Suppose we have a desktop PC with 12 cores, so 24 threads, so about 20 cores doing useful work. On such a machine, if not bottlenecked, we could do the whole lot in a month provided each item takes only 20 seconds. The download might take a while, and the electricity would be about £150 (or $150).
Harder is to make a Fermi estimate for creating suitable software, and yet harder is writing and testing the software, and its we hope accessible outputs.
For more information about the TeX Hour, including Zoom URL, see the About page.