One sufficiently good reason to avoid PHP’s XMLWriter
I was trying the other day to serialise some XML using PHP’s XMLWriter class, but in the end had to abandon it entirely. In principle the class is a good solution to the general problem of outputting XML safely: it uses a simple stack to ensure well-formedness and passes output through methods designed to escape text and restrict it to valid characters. The problem is that it gives no control over how whitespace is output, and the whitespace it does output is dangerous. With output formatting enabled, it insists on printing a newline after every tag it pops off the stack, so
<p>My clever <emph>sentence</emph>
, broken.</p>
is the result, plainly broken because a renderer ought to (and will) insert a space between the word “sentence” and the comma. The correct way to handle whitespace is clear: whitespace inside tags is ignored by a parser, while whitespace outside tags should not be added spuriously because it sometimes has meaning. This leads to the following style:
<p
>My clever <emph
>sentence</emph
>, not broken</p
>
or
<p
>My clever <emph>sentence</emph>, not broken</p
>
if we use a cleverer formatter that only breaks lines where necessary for length.
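A minimal sketch of the first style, written in Python rather than PHP purely for illustration. The function name `serialise` is hypothetical, and the sketch assumes an `xml.etree.ElementTree` element with no namespaces, comments or processing instructions. The only whitespace it adds sits inside the tags, just before the closing `>`, so no text node is altered:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def serialise(elem):
    """Serialise elem, putting every added line break inside a tag.

    Hypothetical helper: ignores namespaces, comments and
    processing instructions.
    """
    # quoteattr escapes the value and wraps it in suitable quotes.
    attrs = "".join(f" {name}={quoteattr(value)}"
                    for name, value in elem.attrib.items())
    if not elem.text and len(elem) == 0:
        # Empty element: whitespace is also allowed before "/>".
        return f"<{elem.tag}{attrs}\n/>"
    parts = [f"<{elem.tag}{attrs}\n>", escape(elem.text or "")]
    for child in elem:
        parts.append(serialise(child))
        # Text following the child is written exactly as stored.
        parts.append(escape(child.tail or ""))
    parts.append(f"</{elem.tag}\n>")
    return "".join(parts)


source = "<p>My clever <emph>sentence</emph>, not broken</p>"
print(serialise(ET.fromstring(source)))
```

Parsing the output again gives the same text content as the input, which is the property the newline-after-end-tag style fails to preserve. Breaking lines only where needed for length would require tracking the current column, which this sketch leaves out.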
Unfortunately the libxml library that PHP uses is, as far as I can tell, no longer under active development, so this and a few other bugs are unlikely to be fixed. So, if I stick with PHP for my current project, I will have to roll my own formatter. Is it any wonder that good XML support on the web is in such a dire state, when the most popular language mangles data so easily?
Update: To clarify, this is a complaint about whitespace being added inconsistently when output formatting is enabled. When it is disabled, the class works as expected. I also appreciate that distinguishing relevant from irrelevant whitespace is application specific, so it is impossible to build a perfect formatting serialiser that works by inserting extra text nodes. My point is precisely that users should not be offered an option, whether enabled by default or not, which says, “select me to get unexpected output with potentially harmful extra whitespace”, given that the tag parsing rules allow safe indenting as described above.
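The difference between the two placements of the newline can be checked with any conforming parser. Here is a small sketch using Python's standard `xml.etree.ElementTree`, chosen only for illustration. The helper `text_of` is hypothetical and the strings are the examples from this post:

```python
import xml.etree.ElementTree as ET

# Newline written after the end tag: it becomes part of the text.
after_tag = "<p>My clever <emph>sentence</emph>\n, broken.</p>"
# Newlines written inside the tags: the parser discards them.
inside_tag = "<p\n>My clever <emph\n>sentence</emph\n>, not broken</p\n>"


def text_of(xml):
    """Return the concatenated text content of an XML string."""
    return "".join(ET.fromstring(xml).itertext())


print(repr(text_of(after_tag)))   # 'My clever sentence\n, broken.'
print(repr(text_of(inside_tag)))  # 'My clever sentence, not broken'
```

In the first case the newline survives into the parsed text, where a renderer will typically collapse it to a space. In the second the text comes back exactly as written.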