The other day I tried to serialise some XML with the XMLWriter class in PHP, but in the end I had to abandon it entirely. The class is a good solution to the general problem of outputting XML safely: it uses a simple stack to ensure well-formedness and passes output through methods designed to escape text and restrict it to valid characters. The problem is that it offers no way to control where whitespace is output, and it places that whitespace in a dangerous way. After every tag it pops off the stack, it insists on printing a newline, so
<p>My clever <emph>sentence</emph> , broken.</p>
is the result, plainly broken because a renderer ought to (and will) show a space between the word "sentence" and the comma. The correct way to handle whitespace is clear: whitespace inside a tag, just before the closing >, is ignored by a parser, while whitespace outside tags should never be added spuriously, because it sometimes has meaning. Any line breaks a formatter wants to add therefore belong inside the tags. This leads to the following style:
<p >My clever <emph >sentence</emph >, not broken</p >
or
<p >My clever <emph>sentence</emph>, not broken</p >
if we use a cleverer formatter that breaks lines only where necessary for length.
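To make the idea concrete, here is a minimal sketch of such a formatter. It is written in Python rather than PHP purely to keep it short, and it is not how XMLWriter or any existing library works. The function name `serialise`, the `(tag, children)` tuple representation and the `width` parameter are all made up for illustration, and attributes, comments and namespaces are left out. The only rule it implements is the one above: a line break may go immediately before the closing > of a tag and nowhere else.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def serialise(node, width=40):
    """Serialise a (tag, children) tree, breaking lines only inside tags.

    A node is either a string (text) or a (tag, children) tuple.
    Once the current line has reached `width` characters, the next
    tag is closed with a newline before its '>'.
    """
    out = []   # pieces of output, joined at the end
    col = 0    # length of the current output line

    def emit(text):
        nonlocal col
        out.append(text)
        if "\n" in text:
            col = len(text) - text.rfind("\n") - 1
        else:
            col += len(text)

    def close_tag():
        # Whitespace before '>' is inside the tag, so a parser ignores it.
        emit("\n>" if col >= width else ">")

    def walk(n):
        if isinstance(n, str):
            emit(escape(n))   # escapes &, < and > in text
            return
        tag, children = n
        emit("<" + tag)
        close_tag()
        for child in children:
            walk(child)
        emit("</" + tag)
        close_tag()

    walk(node)
    return "".join(out)


doc = ("p", ["My clever ", ("emph", ["sentence"]), ", not broken"])
xml = serialise(doc, width=20)

# Round-trip check: parsing the output must give back exactly the input text.
assert "".join(ET.fromstring(xml).itertext()) == "My clever sentence, not broken"
```

Calling `serialise(doc, width=0)` breaks inside every tag, which corresponds to the first style; a larger `width` gives the second, where breaks appear only once a line has grown long. Either way the text between the tags is passed through untouched.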
Unfortunately, libxml, the library behind XMLWriter in PHP, is as far as I can tell no longer under active development, so this and a few other bugs will not get fixed. So, if I stick with PHP for my current project, I will have to roll my own formatter. Is it any wonder that good XML support on the web is in such a dire state, when the most popular language mangles data so easily?