jaxb

2005-10-17

Here’s a simple use case which I imagine happens a lot:

You’re writing a Java application. It requires storing data in xml and retrieving this data from xml. It used to work like this:

  • Write some prototype xml that more or less has the information you need. At this point you expect things to evolve.
  • Parse the thing into a DOM or write a sax handler in Java
  • Write some code to extract data from the xml document and store it in memory as java beans
  • Write some code to turn the beans back into xml
  • Tune parsing code and xml as required while you develop the rest of the application

The above process is tedious. An important reason for this is that neither SAX nor DOM are very javaish. DOM was designed to work well with limited languages such as javascript. Hence everything is a Node. That sort of sucks if you have a language that supports inheritance and interfaces. Instead of using java language features you have to write code to check what kind of node you are dealing with and use tedious calls to extract information. In short, the DOM api sucks big time from an OO perspective.

SAX is worse, it is a very low level API that requires you to write event handlers to get data out of it. The only nice thing about it is that works ‘on the fly’, so you can prevent buffering the entire thing in memory.

Neither are very nice to work with in the above scenario. In most cases all I want to do is something like this:


Document foo = Document.parse("foo.xml", FooInterface.class);
FooInterface fooBean = (FooInterface)foo.getBean();
Bar barBean = (Bar)fooBean.getBar();
barBean.setFooBar("foobar!");
Document.writeXml("foo.xml", fooBean);

In short I want to unserialize some xml, do stuff with it and then serialize it back and not spend to much time thinking about how that should be implemented. There are of course some problems with this:

  • there are multiple possible xml structures to serialize to, which one is correct
  • the interfaces may not fully define the constraints this xml should match
  • there might be other, non java applications that need to do stuff with the xml

Here’s how sun expects people to solve the above problem (using JAXB):

  • Write an xml schema that describes your xml format
  • Compile this schema into java code with interfaces & implementations of beans that match this schema & some parsing and unparsing code
  • Write code that depends on this generated code

This solution has a number of problems:

  • You need to write an xml schema before you write any code. That sort of doesn’t work when you only have a rough idea of what should be in the xml
  • The code that you write to access the generated beans has a necessary level of indirection and is therefore ugly.
  • Any time you edit the schema and regenerate your bean you potentially break this ugly code.
  • XML schema is very complicated, you are likely to iterate multiple times to get it right, even if the xml format does not evolve.
  • The reference implementation of JAXB comes with strings attached. It’s a typical SUN thing bundled with piles of documentation, a shitty installer, a shitty license, lots of dependencies on stuff you don’t really need. There’s not much else. Apache JAXME is a project that intends to fix these issues but at 0.5 it is somewhat a risk to use in a project.

In short, like so many xml based tools, JAXB is an overarchitected solution that doesn’t actually have much advantages over hand parsed and unparsed stuff for the simple use cases that are so common. I’ve written plenty of xml parser code based on the DOM api and I’ve decided to keep doing that. In fact, in the time I used to figure out the above I could have written it for what I am currently building.