Friday, January 19, 2007

Off to a bad start: XML sucks

I've been meaning to start a blog for a long time now, and only now do I have the time to do it. Lots of things got in the way. One is school. Another is laziness. The other is that software sucks. Anyway, I've made it. Welcome to my blog.

Topic du jour: XML. XML sucks. Why is that? Isn't XML the next-generation, all-purpose, future-proof data storage/exchange format? Well, no.
  • XML is verbose: you know what? I'm not even going to complain about it anymore; this is a moderate failure that could eventually be fixed without much fuss (e.g. switch to Lisp syntax).
  • XML is too complex: entities, namespaces, DOM, SAX, XSLT, XMLReader, XPath, XQuery, XUpdate, XPointer, DTD, XML Schema, RELAX NG, and these are just off the top of my head.
  • XML is unfinished: None of the above technologies is finished. You know DOM? Level 1, 2, soon to be 3. The specs are huge; no human should be tortured by having to read them.
  • XML will never be finished: Each and every one of the X* specs will die a couple of years after it goes gold, replaced by a newer, possibly incompatible, version. This will go on until the industry decides that, you know, XML is just so unfashionable, and it must move on to the next big thing.
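To make the verbosity point concrete, here is a rough sketch that writes the same record once as XML and once in Lisp-style syntax. The record, the tag names and the to_sexp helper are all made up for illustration, and the S-expression writer does no escaping, so take it as a toy comparison rather than a real serializer.

```python
import xml.etree.ElementTree as ET

# Hypothetical record, invented for this comparison.
record = {"name": "Ada", "email": "ada@example.org", "age": "36"}


def to_xml(tag, fields):
    """Serialize a flat dict as an XML element with one child per field."""
    root = ET.Element(tag)
    for key, value in fields.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


def to_sexp(tag, fields):
    """Serialize the same dict as a Lisp-style S-expression (no escaping)."""
    body = " ".join(f'({key} "{value}")' for key, value in fields.items())
    return f"({tag} {body})"


xml_text = to_xml("person", record)
sexp_text = to_sexp("person", record)

print(xml_text)   # every tag name appears twice
print(sexp_text)  # every tag name appears once
print(len(xml_text), len(sexp_text))
```

The difference comes almost entirely from the closing tags, which is why this particular complaint is the easiest one to fix.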
So just why is the state-of-the-art data storage/exchange format so lame? Well, because the industry is going after a chimera. The idea sounds very nice on paper: a universal interchange format that is human readable and extensible.

Human readable? No way. If you still think XML was made for humans to read and only incidentally for machines to process, go read some XML to yourself (for the best effect, try a SOAP message, generated by your favorite framework).

Extensible? Yes. You can add new tags. Hmm. But, you know what? After you add that new tag, is your software going to work without modifications? In some cases, yes: the cases you had anticipated. In most cases, of course not; at least not as expected. Also, please remember to redo the DTD/XML Schema/RELAX NG or whatever new validation technology gets invented next.
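Here is a small sketch of what "not as expected" can look like. The order format, the unit tag and the read_order function are hypothetical, invented for this example; the parsing calls are from Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET


def read_order(xml_text):
    """Reader written against version 1 of a hypothetical <order> format."""
    root = ET.fromstring(xml_text)
    return {
        "item": root.findtext("item"),
        "quantity": int(root.findtext("quantity")),
    }


old_doc = "<order><item>widget</item><quantity>3</quantity></order>"

# Someone extends the format: quantity is now counted in the given unit.
new_doc = (
    "<order><item>widget</item><quantity>3</quantity>"
    "<unit>dozen</unit></order>"
)

print(read_order(old_doc))  # {'item': 'widget', 'quantity': 3}
print(read_order(new_doc))  # {'item': 'widget', 'quantity': 3}
```

The extended document parses without complaint, and the old reader happily reports 3 widgets instead of 3 dozen. The syntax survived the new tag; the meaning did not.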

So is XML any good? Well, sort of. The best thing about it is setting the record straight on encoding. Another nice thing is the tree structure for holding data. The rest is mostly crap.

Remember extensibility? XML solved only a very, very small part of that problem: the syntax. You still have to determine the semantics of the change. You still have to recode your application to some extent. You still need to worry about incompatibilities between server and client. XML is not the silver bullet. It just solves an accident in the history of computing. It has no essence whatsoever.

So XML is way over-hyped and will never be finished. Good. What are you supposed to use then? Well, it really doesn't matter all that much. If you need to design a new DSL, you can use XML. But more importantly, make sure you don't create essential complexity. Make sure you put in a version number somewhere. Better yet, choose a well-documented text format. Also see The Art of Unix Programming.
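As for the version number, here is a minimal sketch of why it pays off, assuming a made-up settings format that carries a version attribute on its root element. SUPPORTED_VERSIONS, UnsupportedVersion and load_settings are names invented for this example.

```python
import xml.etree.ElementTree as ET

# Versions of the hypothetical settings format this reader understands.
SUPPORTED_VERSIONS = {"1"}


class UnsupportedVersion(ValueError):
    """Raised when a document declares a version this reader cannot handle."""


def load_settings(xml_text):
    """Parse a settings document, refusing versions we do not know."""
    root = ET.fromstring(xml_text)
    version = root.get("version")  # None if the attribute is missing
    if version not in SUPPORTED_VERSIONS:
        raise UnsupportedVersion(f"cannot read format version {version!r}")
    # Flat format: one child element per setting.
    return {child.tag: child.text for child in root}


print(load_settings('<settings version="1"><theme>dark</theme></settings>'))
```

A reader that fails loudly on an unknown version is much easier to live with than one that quietly misreads a newer document.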