In praise of GitHub and file-based workflow


Given my sudden need to learn git better recently, I have only praise for GitHub, a superb tool, and feel prompted to go back and refresh my thoughts on local and remote file workflow.

1. GitHub: social version control

I should note to begin that I do not have much history in revision control. A keen hacker should be a lot slicker with the common systems than I am, but I have never quite had the commitment to learn any one system well. I have always been on the edge of every community I have contributed code to: spending a few hours or days with the codebase until I knew it well enough to scratch an itch, learning enough Subversion to make the diffs, then sloping off to another project with a different build system to work out.

In the last few weeks, though, I have become increasingly impressed by git and Mercurial, and have found them an excellent investment of a bit of spare time. Git particularly impresses me because of GitHub, a ‘social coding’ tool which has the slickest and most useful website I have seen for a long time, quite apart from the free hosting they provide, within reason, to any open-source project.

The idea is better explained by their very clear website, but the point is that I had not appreciated until just now quite how good these sorts of service are. VisageFolio did not entice me, but using the very efficient set-up of social software development to form small groups of collaborators looks extremely attractive. As a way of bringing together a few like-minded coders and producing something very much more efficient than they could have managed by themselves, I can see myself taking part in this sort of community much more in future.

2. Impact of light, community-led decentralised VCS

I have been living under a rock, I realise, with my aversion to social networking for the sake of communication driving me away from using it to get a task done. When tools like GitHub really start penetrating even to me, though, it is clear that decentralised, easy-to-use versioning is really starting to take hold among the casual crew like me.

It has even got to the point that GitHub’s clipboard tool has versioning. You know the sort of online code snippet tools, where a user pastes a code fragment to reference in an email or forum or mailing list post? GitHub has its own little take, which supports collaborative editing with git diffs and the works. A little while ago, when setting up Subversion was a nuisance, small projects worked by pushing around lots of diffs and patches, if the whole project was even uniformly version controlled at all. Now we consider it worth setting up collaborative VCS not only for individual files, but even for file fragments. So, there is penetration firstly towards the more casual authors, and also towards more casual and smaller pieces of content.

The main impact of this trend that strikes me is that it is really content and authors who are going to start benefitting soon. No-one that I knew ever had project (as opposed to file) version control for collections of Word documents, but the logical next step of this trend is for proper tools to reach those who really would not have used them before. With increasingly open and parsable formats, there is a gap in the market now for decentralised management to really hit the masses. Wikis have brought the idea to us all as an SVN equivalent, and now there is the chance for good versioning to step into the gap for content and document management.

My punditry has to take a slight pause here, because however carried away I get, I realise I do not actually know the whole state of the present, let alone the future, apart from the fact that it is more commercial and less elegant than I would predict, as well as orange. I will note in closing, though, the three directions we could head.

Firstly, the boring direction. People write more documents in Word on their local systems and get confused with versions sent by email; friends produce content on VisageFolio or WordPress and never get confused because their life is sold to some very cunning centralised service; and wikis carry on as before. GDocs never really takes off the way they want it to, and in a decade office software still exists. This basically covers all the content ordinary people are producing at the moment, and seems to me to be most likely.

There are possible shifts in each area though. GDocs is not really viable for many applications, but will grab more mind-share as time goes on and encourage project-based collaboration. Services like Dropbox help kill the synchronisation and collaboration problem with local working copies (the git approach basically, as best as possible). As file formats become more parsable with convergence to common XML elements (CALS, etc.), better integration of diffs will happen there. So, on the files front some things will move away from the cloud into managed control, just as other collaborative projects move into the cloud (I should note I am biased by experience: moving the CICCU exec. to Dropbox was a huge improvement over GDocs for several reasons). Finally, blogging and scattering of social services is slowly forcing development of the common protocols (Salmon, ActivityStreams, PubSubHubbub) needed to distribute it. Bringing content management closer to the user will probably not catch on much, but will make some little progress. This is the slightly cheery scenario.
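To make the point about parsable formats concrete: once a document is stored as line-oriented text such as XML, an ordinary line diff already says something useful about what changed. The following is only a minimal sketch using Python's standard difflib; the function name, the file name and the sample markup are hypothetical, and a real tool would want to diff the parsed elements rather than raw lines.

```python
import difflib


def diff_documents(old: str, new: str, name: str = "article.xml") -> str:
    """Return a unified diff between two versions of a text document.

    `name` is a hypothetical file name used only for the diff headers.
    """
    old_lines = old.splitlines(keepends=True)
    new_lines = new.splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile=f"a/{name}",
            tofile=f"b/{name}",
        )
    )


if __name__ == "__main__":
    # Hypothetical sample markup: two revisions of one short article.
    before = "<article>\n<para>Old wording.</para>\n</article>\n"
    after = "<article>\n<para>New wording.</para>\n</article>\n"
    print(diff_documents(before, after), end="")
```

The same comparison on an opaque binary format would report nothing more helpful than ‘files differ’, which is the gap that open formats close.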

Finally, there is always the hope that a really surprising service comes along and produces a big shift in habits. Really well-integrated services with automatic syncing and versioning could redeem people’s filesystems as the future again, much as git paradoxically produces the cloud by empowering the working copy. On the other hand, better natural language search and indexing of documents could remove the last vestiges of the filesystem as an interface for working with even local documents. Usability would have to improve much faster than current trends suggest for that to happen quickly, though, as users still consistently clutter their documents folders rather than use indexing or basic organisation.

Which will it be? Who knows. My vote, though, is for the filesystem to thrive as an interface, as the trend continues of versioning and managing smaller and smaller chunks of content by wider and wider pools of users. Whether or not cloud centralisation takes off alongside that is an interesting side thought.

3. Case: blogging

I could and should perhaps end there, but I will instead draw one brief application from it. As the most pertinent example right now in my mind, I have recently come to the conclusion that more than ever this is the time when there is an opportunity for blogging to come back to the users’ systems.

The web began with websites run out of home directories on mainframes and servers, and since then most online content has been steadily moving away from that. First came the stage of writing files locally and copying them over FTP to the remote site (I think Dreamweaver still uses this model). Then came widespread adoption of the CMS, and finally web interfaces for entering and working with content. We think we have made some kind of progress by getting to the point where the only way we can author and work with our online encyclopædias and blogs is by typing in a browser text window. There is a huge amount of truth in that, and there is a usefulness to using the web as a powerful open application framework (see recently this post highlighted on MozDev), but some technologists, it seems to me, over-hype this.

I pick the case of blogging because here I feel there is actually the genuine potential for decentralisation. I deliberately use flat files on this version of my blog, originally because in some nebulous way I thought it was the right thing to do. Now that I have made the switch back, though, I am genuinely enthusiastic about the process. For much of the web’s content, this really is the way to go, because I find to my surprise that I in fact have the optimal workflow.

To produce this scribble, I type it up in XMLMind XML Editor (free beer; Java so Win/Lin). I save the file in the right folder, as I normally would with a document. I can edit it, work with it, and commit it using git. I can track the revisions of groups of articles if working on a series or something. To publish, I then enter git-ftp push to update the live site. That really is it; the best possible outcome. The interface for document editing is richer and more productive than any web tool (and beats out Word, Writer and so on as well); the procedure, with commits that link changes to several files, also beats any wiki or blog history mechanism that I know of. With pervasive, decentralised VCS, the airy standard argument that flat files are better than web storage ‘because you can use all your normal tools on them’ is actually only just reaching the point where it can appeal to a substantial number of people.
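For anyone who wants to script that cycle, here is a rough sketch of the edit, commit and publish steps as a list of commands. It assumes git and the third-party git-ftp tool are installed and that git-ftp has already been configured for the live site; the function name, the file path and the commit message are hypothetical, and the sketch only builds the command lines rather than running them.

```python
import shlex
from typing import List


def publish_commands(paths: List[str], message: str) -> List[List[str]]:
    """Build the git invocations for one edit-commit-publish cycle.

    Nothing is executed here; each entry is an argument list that could
    be handed to subprocess.run(argv, check=True).
    """
    if not paths:
        raise ValueError("nothing to publish: no files given")
    return [
        ["git", "add", "--", *paths],      # stage the edited articles
        ["git", "commit", "-m", message],  # record them as one revision
        ["git", "ftp", "push"],            # git-ftp: upload changed files
    ]


if __name__ == "__main__":
    # Hypothetical article path and message, for illustration only.
    for argv in publish_commands(["posts/in-praise-of-github.xml"],
                                 "Add GitHub post"):
        print(shlex.join(argv))
```

Staging several files in one commit is what gives the linked history across a series of articles; the final step then sends only what has changed since the last push.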

So, I have explained how I see the clutch of decentralised VCS growing from very serious hackers outwards, and will go so far as to predict that it will increasingly hit authors and content producers too. The future is bright, and I conclude we are actually in a situation where for small producers of content this could be a move that pays off and brings back the working copy paradigm from coding to its original context of documents and their sharing.