Blogging an XML-driven DMS, I

Prefatory note

There have been hints over the last few months that I would treat myself to the fun of re-working this site, as I have intended for a while. That has finally happened, and this week I neatened up a few odds and ends that were needed before the site goes live. While I tidy a few things, I will document my code in my little chunks of spare time over the next fortnight and explain the system itself a little. So, my non-CompSci friends will have to bear with me. To open a little window into my mood, I have been very personally challenged recently, and this is somewhat a retreat from that, something to keep my hands busy in between bouts of maths and melancholy. Excessive code here is therefore to some extent a deliberate ploy to distract myself while I do some thinking. I am not very hasty, and I would prefer a bit of industry to lengthy introspection to draw out some decisions.

Introduction

I have had some requests for websites this summer, and as part of my continuing experiments with web technologies, I also ought to build myself a decent system for my own site to keep myself in the game of evolving techniques and ideas. There are various reasons why I have concluded that I should roll my own:

Aims

What then are the defining features of what I am building?

In terms of size, it will be fairly small, smaller than almost every established project, but more over-architected than some personal sites and minimal systems, like Hixie’s or Blosxom.

It also must be able to correctly handle and output XML. At least some of the content will have to be stored in XML, such as MathML, and whether or not the rest is, the whole pipeline must make sure that nothing is mangled.
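To make "nothing is mangled" a little more concrete, here is a minimal sketch of a round-trip check in Python. It is only an illustration, not the code of the system described here: the `entry` fragment and the helper names `roundtrip` and `same_tree` are made up for the example, and a real pipeline would have more stages than a single parse and re-serialise. The point is that the comparison is done on namespace-qualified names, so a serialiser that merely respells a prefix passes, while one that loses the MathML namespace does not.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Hypothetical entry: plain content with an embedded MathML island.
entry = (
    '<entry><title>Euler &amp; friends</title>'
    f'<math xmlns="{MATHML_NS}"><msup><mi>e</mi><mi>x</mi></msup></math>'
    '</entry>'
)


def roundtrip(xml_text: str) -> str:
    """Parse and re-serialise a fragment, standing in for one pipeline stage."""
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode")


def same_tree(a: ET.Element, b: ET.Element) -> bool:
    """Compare two trees by qualified tag, attributes, text, and children.

    ElementTree stores tags as '{namespace-uri}local-name', so this ignores
    how prefixes happen to be spelled in the serialised form.
    """
    return (
        a.tag == b.tag
        and a.attrib == b.attrib
        and (a.text or "") == (b.text or "")
        and (a.tail or "") == (b.tail or "")
        and len(a) == len(b)
        and all(same_tree(x, y) for x, y in zip(a, b))
    )


original = ET.fromstring(entry)
copy = ET.fromstring(roundtrip(entry))
print(same_tree(original, copy))
```

A check along these lines could run over every stored file as a cheap regression test whenever a stage of the pipeline changes.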

The input must be natural and flexible. There should be no requirement to hand-author XML directly, but also no dodgy, inscrutable, or complex transformations taking control away from my power-hungry posting. The data stored must be as richly marked-up as possible, in a language that is interchangeable, widely accepted, clearly defined, and can be mapped to other formats. Nothing too demanding or heavy on buzzwords though. I am also interested in semantic relations between items of data. This is probably best represented using RDF/OWL, but no-one has worked out yet how to make any effective use of this sort of data on the internet. As a long term goal, I want to follow this work and see what could be done with it.

It should integrate well with existing tools, so the preferable way of storing data will be in flat XML files that can be versioned, processed, and searched from the shell. For lightness, I would prefer to have no database at all.
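As a rough illustration of why flat files make a database unnecessary for simple queries, the sketch below searches a directory of XML entries for a term. It is written in Python rather than as a shell one-liner, and everything specific in it is an assumption for the example: the `search_entries` name, the idea that entries live as `*.xml` files under one directory, and the choice to match case-insensitively on text content only.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def search_entries(content_dir: str, term: str) -> list[str]:
    """Return the names of XML files under content_dir whose text mentions term."""
    needle = term.lower()
    hits = []
    for path in sorted(Path(content_dir).rglob("*.xml")):
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError:
            continue  # skip malformed files rather than abort the whole search
        # itertext() walks text nodes only, so tag and attribute names never match.
        text = "".join(root.itertext()).lower()
        if needle in text:
            hits.append(path.name)
    return hits
```

Because the files are ordinary text, the same directory can sit under version control and be handled with the usual shell tools as well; the function above only adds the guarantee that matches come from content rather than markup.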

The information architecture of storing pieces of data is also important, including reasonable structuring in terms of content-types, metadata, taxonomies, and so on. These are small things, but I may as well get them right.

Progress

So far, I have built the guts of the system and am ready to go with it. It meets the aims well, in ways I had not always anticipated. It still needs some styling and frontend work to make it pretty, and I have some concerns about the feeds, which are holding me back from jumping on it right away. Expect more soon.