Is XML too popular?
Sat, 05/29/2010 - 22:39
One note before I start: If you're ever trying to make a DOM document with Python and then hand it off to another parser, make sure you explicitly convert it to UTF-8 beforehand.
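Here's a minimal sketch of that advice. It assumes the standard library's xml.dom.minidom, since the post doesn't say which DOM implementation was involved. Passing an encoding to toxml() returns UTF-8 bytes with a matching XML declaration, so the next parser doesn't have to guess how a Python str should be encoded. The element and attribute names just mirror the menu example later in the post.

```python
from xml.dom import minidom

def build_menu_document(items):
    """Build a small DOM document; items is a list of (name, icon) pairs."""
    doc = minidom.Document()
    menu = doc.createElement("menu")
    doc.appendChild(menu)
    for name, icon in items:
        item = doc.createElement("item")
        item.setAttribute("name", name)
        item.setAttribute("icon", icon)
        menu.appendChild(item)
    return doc

def to_utf8_bytes(doc):
    """Serialize explicitly as UTF-8 before handing off to another parser."""
    # With an encoding argument, toxml() returns bytes (not str) and
    # writes encoding="utf-8" into the XML declaration.
    return doc.toxml(encoding="utf-8")

# Hypothetical sample data, including a non-ASCII name.
doc = build_menu_document([("Home", "bunny"), ("Café", "cup")])
payload = to_utf8_bytes(doc)
```

The resulting bytes can go straight to any parser that reads the XML declaration.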
Indy and I can get into pretty vigorous discussions about XML. He hates it, and I can understand why. Misguided programmers have built all sorts of crazy stuff on XML, including:
- MXML, a UI markup language for Adobe Flex;
- XUL, a UI markup language for Mozilla Firefox;
- Config files the world over;
- Data interchange formats galore;
- XSLT, a template processor/programming language built exclusively for handling XML.
UI Markup
I was a little unconvinced at first, but once I started writing apps in Flex I really began to love MXML. It pushes you to put most of the stuff that belongs in a structured format in MXML, and most of the stuff that belongs in code in ActionScript. And in the case of XUL, it's a bloody great idea. It's a big part of the reason Firefox works well on a variety of platforms: it's a very good cross-platform, native windowing toolkit.
Config Files
I'm a little bit on the fence about config files. There are good and bad examples of XML-based config files. On the bad side, there are things like Apache's deceptively XML-like config files. On the good side, there are Tomcat's config files.
Data Interchange
Data interchange actually made me laugh out loud one day. I was using a Drupal module to return some XML for my wife's site, and then I made a Flash app to generate a menu based on that XML. Well, I thought about it, and what I was essentially doing was this:
PHP -> XML -> ActionScript
Effectively, I was converting first-class PHP objects into XML, then parsing them into first-class ActionScript objects. So eventually, instead of this:
<menu>
  <item name="Home" icon="bunny"/>
  ...
</menu>
I used PHP to generate JavaScript which interacted with the ActionScript via the ExternalInterface class:
<script type='text/javascript'>
  flashObj.addMenuItem("Home", "bunny");
  ...
</script>
While that might not sound impressive to a layperson, not having to hammer the XML into function calls (which would have been essentially identical to the JavaScript above) saved me about a second of load time. In fact, the menu now loads almost instantly.
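To make the round trip concrete, here's a rough Python stand-in for the extra step the Flash side had to perform: parse the XML, then turn each item back into the same call the generated JavaScript makes directly. The original used ActionScript, not Python; add_menu_item and the calls list are hypothetical stand-ins for flashObj.addMenuItem, and the second menu item is made-up sample data.

```python
import xml.etree.ElementTree as ET

calls = []  # records what the hypothetical add_menu_item receives

def add_menu_item(name, icon):
    """Hypothetical stand-in for flashObj.addMenuItem(name, icon)."""
    calls.append((name, icon))

MENU_XML = """<menu>
  <item name="Home" icon="bunny"/>
  <item name="Blog" icon="pencil"/>
</menu>"""

def load_menu_from_xml(xml_text):
    """The extra work: parse the XML, then hammer it back into calls."""
    root = ET.fromstring(xml_text)
    for item in root.iter("item"):
        add_menu_item(item.get("name"), item.get("icon"))

load_menu_from_xml(MENU_XML)
```

Generating the calls on the server removes load_menu_from_xml entirely; the client simply runs add_menu_item("Home", "bunny") and friends.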
XSLT
Let me just say this: I'd rather process XML in XSLT than in almost any other language. DOM is a pain in the butt to do almost anything in, and very slow to boot. SAX is very fast, but it's even more painful to work in. As a functionally inspired language, XSLT has a bit of a learning curve: you can't reassign variables, and recursion is almost always easier than looping. XPath expressions take some getting used to, too. And it can be slow with large documents. But I use XSLT a lot: it's by far the easiest way to transform raw XML into text, evaluable code, or other forms of XML.
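For comparison, here's the same small job (collecting every item name) done the DOM way and the SAX way with Python's standard library. This is only a sketch: the post's complaints weren't about any particular DOM or SAX library, and the second item is made-up sample data. DOM builds the whole tree in memory and lets you query it; SAX streams events past you and makes you track state yourself, which is where the extra pain comes from.

```python
import xml.sax
from xml.dom import minidom

MENU_XML = b'<menu><item name="Home" icon="bunny"/><item name="About" icon="owl"/></menu>'

def names_with_dom(data):
    # DOM: parse everything into a tree, then query it.
    doc = minidom.parseString(data)
    return [el.getAttribute("name") for el in doc.getElementsByTagName("item")]

class ItemNameHandler(xml.sax.ContentHandler):
    # SAX: no tree; react to each start-tag event as it streams by.
    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        if name == "item":
            self.names.append(attrs.get("name"))

def names_with_sax(data):
    handler = ItemNameHandler()
    xml.sax.parseString(data, handler)
    return handler.names
```

Even for this trivial task, the SAX version needs a handler class and mutable state. Anything that depends on context (parents, siblings, ordering) makes that bookkeeping grow quickly.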
The problem with XSLT is mainly the problem with XML. A lot of the time XML isn't the best choice, but if you're stuck with it, you'll end up writing an XSLT stylesheet to kludge the output into something more usable.
Why is XML So Popular?
Part of the reason XML is so popular, of course, is the buzzword that was on every job posting about three years ago: AJAX. The X in AJAX is XML, so people latched onto it. Writing dynamic web applications is cool. I'm down with it. But as I discussed above (and as I'm sure most programmers realize by now), it's more efficient to simply return JavaScript, all shrink-wrapped and ready to evaluate, or even JSON. So that's why most HR people have moved on to the next buzzword, Web 2.0. Whatever.
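Here's a minimal sketch of why JSON suits this kind of job, written in Python to match the other examples (the post's actual stack was PHP and ActionScript). The menu goes from native lists and dicts to text and straight back, with no tree to walk between parsing and using the data.

```python
import json

# Hypothetical server side: native data straight to text.
menu = [{"name": "Home", "icon": "bunny"}]
payload = json.dumps(menu)

# Client side: text straight back to native data.
received = json.loads(payload)
first = received[0]  # a plain dict, ready to use as-is
```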
Also, using XML lets you abdicate a certain level of responsibility and planning when it comes to writing complex software projects.
First, since XML doesn't require any kind of rigid schema or structure, you can always extend it later, and things probably won't break on the other end the way they would if you were using, say, serialized objects. Second, since DOM or SAX parsers exist in almost every language, and there are even cooler things out there like E4X, any place where a medieval mapmaker would write "Here There Be Dragons", the modern coder can write "Here There Be XML" and be relatively assured that, when it is explored, something can eat whatever garbage the other side of the gulf produces.
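Here's a rough illustration of the "extend it later" point: a consumer that only asks for the attributes it knows about keeps working when the producer starts sending more. This assumes Python's ElementTree as the parser. The tooltip attribute and separator element are hypothetical later additions.

```python
import xml.etree.ElementTree as ET

def read_items(xml_text):
    """An 'old' consumer that only knows about name and icon."""
    root = ET.fromstring(xml_text)
    return [(el.get("name"), el.get("icon")) for el in root.iter("item")]

V1 = '<menu><item name="Home" icon="bunny"/></menu>'
# Later, the producer adds an attribute and an element the consumer never asked for.
V2 = '<menu><item name="Home" icon="bunny" tooltip="Start here"/><separator/></menu>'

# Both versions return [("Home", "bunny")]; the additions are silently ignored.
```

The flip side is that the same leniency hides mistakes. A misspelled attribute comes back as None rather than an error.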
But both of these have downsides. When I got my first real job (in fact, my current job), my boss expressed a strong preference for fail-fast design: if something is going to fail, it's better that it fail when you first put the change in, rather than at some random point afterwards. This is a good argument for static typing. For example, if you write a function in Java:
public int exampleFunction(int input) { ... }
and then try to pass that function anything but an int, the compiler is going to bitch at you right away. But if you do the same thing in JavaScript (where you can't declare parameter types at all):
function exampleFunction(input) { ... }
it might work just fine for months, until somebody with particularly fat fingers types 1w instead of 12, and all of a sudden you're getting weird errors, things are breaking, and there's a null pointer error or a divide-by-zero.
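Python is dynamically typed too, so here's a minimal sketch of fail-fast in that setting: validate input where it enters the program rather than trusting it downstream. All three function names are hypothetical. The lenient parser lets "1w" slip through as None, so the failure surfaces later and far from the typo. The strict parser rejects it immediately.

```python
def lenient_quantity(text):
    """Accepts anything: bad input quietly becomes None and surfaces later."""
    try:
        return int(text)
    except ValueError:
        return None

def strict_quantity(text):
    """Fails fast: bad input is rejected right where it enters the program."""
    try:
        return int(text)
    except ValueError:
        raise ValueError(f"expected a whole number, got {text!r}") from None

def unit_price(total, quantity):
    """Somewhere far downstream, the quantity finally gets used."""
    return total / quantity

q = lenient_quantity("1w")  # no complaint here; q is None
# unit_price(10, q) would raise TypeError much later, far from the typo,
# while strict_quantity("1w") raises ValueError immediately.
```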
The other downside is that, since XML doesn't force you to have a plan, you can end up building a complex project without one. Part of the reason people still do boring stuff like writing design documents and functional specs is that you get fewer surprises as the process goes along. And fewer surprises are always better.
Conclusion
So while I like XML as a data format, and vastly prefer it to the inscrutable, spec-optional binary formats we're still dealing with in many areas, as a middleware tool it is lacking. And that's not what it was supposed to be in the first place, anyhow.