Technical updates

March the 27th 2011 at 12 o’clock am

Abstract

A bit of waffle about my technical prowess. I am not reporting the little bugs I have fixed that you would not be interested in, so CompScis, do read on. This is probably as interesting as the posts about my own blogging system get.

I have done a spot of tinkering with this site this week in the course of writing some more stuff, still at my old tricks of adding a line or two of code for every article. Slightly more ambitiously, this week I enabled neatly generated PDF versions of articles. To do that, I am using PrinceXML. As far as I can find, it is the very best tool available for rendering XML content to pages using CSS styling, which means I can adopt it quickly without duplicating all the formatting I wrote for the main version of the site. Unfortunately, PrinceXML is not extensible enough to do quite all the things I would like (though I have to say, its CSS support is much better than any browser's, perhaps because it has no interactive performance constraints to worry about). What is even more obnoxious is its licensing, which forces me to link to their site on my Colophon page as well as prominently display their logo on all PDFs. Overall, though, I am very happy with the system and can live with its closed-source nature. It was quick and easy to enable.
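For the curious, a build step of this kind might look roughly like the sketch below. It is an illustration rather than the code this site actually runs: the helper `princeArgs` and the file names `article.html`, `site.css` and `print.css` are all made up for the example. The only things it relies on are Prince's `-s` option for adding a stylesheet and `-o` for naming the output file.

```javascript
// Hypothetical helper (not this site's real code): build the argument list
// for the `prince` command so that the PDF reuses the site's own CSS.
function princeArgs(inputHtml, outputPdf, stylesheets = []) {
  const args = [];
  for (const css of stylesheets) {
    args.push("-s", css); // -s applies an additional stylesheet
  }
  args.push(inputHtml, "-o", outputPdf); // -o names the PDF to write
  return args;
}

// Assumed file names, purely for illustration.
const exampleArgs = princeArgs("article.html", "article.pdf", ["site.css", "print.css"]);
console.log(["prince", ...exampleArgs].join(" "));
```

In Node.js the resulting list could be passed to `child_process.execFile("prince", exampleArgs, callback)`, so that the same stylesheets serve both the site and the PDF.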

I have also, while writing more for the site, indulged my web technologies hobby a bit further by planning future stylistic pinnacles for my content. The updated Colophon has the latest sketch, but I’ll helpfully reproduce it here.

The largest thing I am investigating is writing a stonking JavaScript library to handle layout. Browsers can’t be trusted to do nice line-breaking which minimises raggedness; nor do they place floats cleverly; nor do they handle column breaking well (WebKit very buggily; Opera with quite a few features but poor float handling; Firefox stably but with non-existent break handling). Basically, the text layout problem is quite hard, dealing with at least line-breaks, hyphenation, justification, column-breaks, and tables, floats and footnotes in columnar material. This is all very performance sensitive (typically $O((\text{\# words})^2)$), so browsers will never get as pretty here as static renderers like LaTeX or InDesign. If re-flow takes 0.25 s in LaTeX, that is considered pretty good, but algorithms that slow are totally unacceptable in browsers, which have to re-flow interactively.

Where the content is not dynamic, however, a single pass of the appropriate algorithms can be done at page load. JavaScript, with access to excellent selector languages (XPath and CSS) and a clear separation between the content tree on the one hand and rendering, painting and font metrics on the other, is actually an ideal way to go about implementing this sort of thing, and could even outdo TeX, which, for legacy reasons, is forced to accommodate the memory and processing power of 100 MHz computers and so uses first-fit fill for handling page breaks. There are libraries which have made a start on using JavaScript for beautiful text layout, but the sort of highly elegant handling of columns, floats, footnotes, and so on, all together, is still an open problem. Tackling it would take a decent time commitment and resolve, because it’s pointless to write or use until you get to the point where performance, features, and lack of glitches exceed browsers’ native layout.
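To make the difference between first-fit and raggedness-minimising line-breaking concrete, here is a minimal sketch of both. It is a toy under loud assumptions, not the library described above: widths are counted in characters rather than measured from real font metrics, there is no hyphenation or justification, and the function names `firstFit`, `minRaggedness` and `raggedness` are invented for the example. The second function is a plain dynamic programme over break points, a much simplified relative of the total-fit approach TeX uses within paragraphs, and it shows where the quadratic cost comes from.

```javascript
// Assumption of this sketch: a word's width is its character count and a
// space is one unit wide. A real implementation would measure rendered text.

// Greedy first-fit: keep adding words to the current line until one no longer fits.
function firstFit(words, maxWidth) {
  const lines = [];
  let current = [];
  let width = 0;
  for (const word of words) {
    const needed = current.length === 0 ? word.length : width + 1 + word.length;
    if (current.length > 0 && needed > maxWidth) {
      lines.push(current.join(" "));
      current = [word];
      width = word.length;
    } else {
      current.push(word);
      width = needed;
    }
  }
  if (current.length > 0) lines.push(current.join(" "));
  return lines;
}

// Minimum raggedness: choose breaks that minimise the sum of squared trailing
// space over every line except the last. Two nested loops over the words give
// the O(n^2) worst case.
function minRaggedness(words, maxWidth) {
  const n = words.length;
  const best = new Array(n + 1).fill(Infinity); // best[i]: least cost of setting words[i..]
  const next = new Array(n + 1).fill(n);        // next[i]: index where the following line starts
  best[n] = 0;
  for (let i = n - 1; i >= 0; i--) {
    let width = -1;
    for (let j = i; j < n; j++) {
      width += 1 + words[j].length;
      if (width > maxWidth && j > i) break; // an overlong single word still gets its own line
      const slack = Math.max(0, maxWidth - width);
      const cost = (j === n - 1 ? 0 : slack * slack) + best[j + 1];
      if (cost < best[i]) {
        best[i] = cost;
        next[i] = j + 1;
      }
    }
  }
  const lines = [];
  for (let i = 0; i < n; i = next[i]) {
    lines.push(words.slice(i, next[i]).join(" "));
  }
  return lines;
}

// Sum of squared trailing space on all lines but the last.
function raggedness(lines, maxWidth) {
  return lines.slice(0, -1).reduce((sum, line) => sum + (maxWidth - line.length) ** 2, 0);
}

const sample = "aaa bb cc ddddd".split(" ");
console.log(firstFit(sample, 6), raggedness(firstFit(sample, 6), 6));
console.log(minRaggedness(sample, 6), raggedness(minRaggedness(sample, 6), 6));
```

On the four sample words at a width of 6, first-fit sets "aaa bb", "cc", "ddddd" with a raggedness of 16, while the dynamic programme sets "aaa", "bb cc", "ddddd" with a raggedness of 10. In a browser the character counts would be replaced by measured widths, which is where most of the real cost would lie.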

This is a pretty ambitious idea, so I have no idea if I’ll embark on it, or how fully. Hacking in hyphenation and pretty paragraph breaks should be doable, but there’s not much point unless I can properly take over the column flows and get them just right. With the aim of making the site as readable and nice to use as possible, I probably ought to prioritise some updates to the old article listings instead, but I’m loath to take away the minimalist ideal.

As has been remarked before, content is king. I have little to show as a follow-up to my long articles of a few weeks ago, but I can assure you that I do indeed have a lot of thoughts going on. You might see me before you read anything, though.