Documents, warnings and errors at scale
- Date: 27 April 2023
- Time: 6:30 to 7:30pm (UK Time)
Summary
Jonathan Fine
The arXiv contains over 2.5 million STEM articles, most written in LaTeX. This TeX Hour is a small contribution to improving the rendering of archived articles. The focus is on the backlog.
For more information about the TeX Hour, including Zoom URL, see the About page.
The arXiv accessibility forum earlier this month was a big step. The arXiv intends to publish the HTML generated by the ar5iv project, alongside the PDF created by LaTeX. This gives a rough parity of esteem between PDF and HTML. And it gives approval to rendering tools outside TeX Live.
I’ve been poking around on ar5iv, hoping to learn and contribute. At the TeX Hour I’ll report on what I’ve learnt, what I’ve done, and how you can contribute. (Few special skills beyond a knowledge of TeX are required.)
A second topic for this TeX Hour will be an outline of the evolving architecture of my unlatex project, if time permits. The focus there is on making the use of the document analysis and conversion tools more accessible.
Some of the videos from the arXiv forum are now available.
URLs
ar5iv and unlatex
- arXiv articles as responsive HTML (ar5iv)
- ar5iv conversion errors and warnings (github)
- unlatex: accessible tools for document analysis (github)