maven: good ideas gone wrong

I’ve had plenty of time to reflect on the state of server side Java, technology, and life in general this week. The reason for all this extra ‘quality’ time was because I was stuck in an endless loop waiting for maven to do its thing, me observing it failed in subtly different ways, tweaking some more, and hitting arrow up+enter (repeat last command) and fiddling my thumbs for two more minutes. This is about as tedious as it sounds. Never mind the actual problem, I fixed it eventually. But the key thing to remember here is that I lost a week of my life on stupid book keeping.

On to my observations:

  • I have more xml in my maven pom files than I ever had with my ant build.xml files four years ago, including running unit tests, static code checkers, packaging jars & installers, etc. While maven does a lot of things when I don’t need them to happen, it seems to have an uncanny ability to not do what I want when I need it to or to first do things that are arguably redundant and time consuming.
  • Maven documentation is fragmented over wikis, javadoc of diverse plugins, forum posts, etc. Google can barely make sense of it. Neither can I. Seriously, I don’t care about your particular ideology regarding elegance: just tell me how the fuck I set parameter foo on plugin bar and what its god damn default is and what other parameters I might be interested in exist.
  • For something that is supposed to save me time, I sure as hell am wasting a shit load of time on making it do what I want and watching it do what it does (or not), and fixing the way it works. I had no idea compiling & packaging less than 30 .java files could be so slow.
  • Few people around me dare to touch pom files. It’s like magic and I hate magicians myself. When it doesn’t work they look at me to fix it. I’ve been there before and it was called ant. Maven just moved the problem and didn’t solve a single problem I had five years ago while doing the exact same shit with ant. Nor did it make it any easier.
  • Maven compiling, testing, packaging and deploying defeats the purpose of having incremental compilation and dynamic class (re)loading. It’s just insane how all this application server deployment shit throws you back right to the nineteen seventies. Edit, compile, test, integration-test, package, deploy, restart server, debug. Technically it is possible to do just edit, debug. Maven is part of the problem here, not of the solution. It actually insists on this order of things (euphemistically referred to as a life cycle) and makes you jump through hoops to get your work done in something resembling real time.
  • 9 out of 10 times when maven enters the unit + integration-test phase, I did not actually modify any code. Technically, that’s just a waste of time (which my employer gets to pay for). Maven is not capable of remembering the history of what you did and what has changed since the last time you ran it so like any bureaucrat it basically does maximum damage to compensate for its ignorance.
  • Life used to be simple with a source dir, an editor, a directory of jars and an incremental compiler. Back in 1997, java recompiles took me under 2 seconds on a 486, windows NT 3.51 machine with ‘only’ 32 MB, ultra edit, an IBM incremental java compiler, and a handful of 5 line batch files. Things have gotten slower, more tedious, and definitely not faster since then. It’s not like I have much more lines of code around these days. Sure, I have plenty of dependencies. But those are run-time resolved, just like in 1997, and are a non issue at compile time. However, I can’t just run my code but I have to first download the world, wrap things up in a jar or war, copy it to some other location, launch some application server, etc. before I am in a position to even see if I need to switch back to my editor to fix some minor detail.
  • Your deployment environment requires you to understand the ins and outs of how where stuff needs to be deployed, what directory structures need to be there, etc. Basically if you don’t understand this, writing pom files is going to be hard. If you do understand all this, pom files won’t save you much time and will be tedious instead. You’d be able to write your own bash scripts, .bat files or ant files to achieve the same goals. Really, there’s only so many ways you can zip a directory into a .jar or .war file and copy them over from A to B.
  • Maven is brittle as hell. Few people on your project will understand how to modify a pom file. So they do what they always do, which is copy paste bits and pieces that are known to more or less do what is needed elsewhere. The result is maven hell. You’ll be stuck with no longer needed dependencies, plugins that nobody has a clue about, redundant profiles for phases that are never executed, half broken profiles for stuff that is actually needed, random test failures. It’s ugly. It took me a week to sort out the stinking mess in the project I joined a month ago. I still don’t like how it works. Pom.xml is the new build.xml, nobody gives a shit about what is inside these files and people will happily copy paste fragments until things work well enough for them to move on with their lives. Change one tiny fragment and all hell will break loose because it kicks the shit out of all those wrong assumptions embedded in the pom file.

Enough whining, now on to the solutions.

  • Dependency management is a good idea. However, your build file is the wrong place to express those. OSGI gets this somewhat right, except it still externalizes dependency configuration from the language. Obviously, the solution is to integrate the component model into the language: using something must somehow imply depending on something. Possibly, the specific version of what you depend on is something that you might centrally configure but beyond that: automate the shit out of it, please. Any given component or class should be self describing. Build tools should be able to figure out the dependencies without us writing them down. How hard can it be? That means some none existing language to supersede the existing ones needs to come in existence. No language I know of gets this right.
  • Compilation and packaging are outdated ideas. Basically, the application server is the run-time of your code. Why doesn’t it just take your source code, derive its dependencies and runs it? Every step in between editing and running your code is a potential to introduce mistakes & bugs. Shortening the distance between editor and run-time is good. Compilation is just an optimization. Sure, it’s probably a good idea for the server to cache the results somewhere. But don’t bother us with having to spoon feed it stupid binaries in some weird zip file format. One of the reasons scripting languages are so popular is because it reduces the above mentioned cycle to edit, F5, debug. There’s no technical reason whatsoever why this would not be possible with statically compiled languages like java. Ideally, I would just tell the application server the url of the source repository, give it the necessary credentials and I would just be alt tabbing between my browser and my editor. Everything in between that is stupid work that needs to be automated away.
  • The file system hasn’t evolved since the nineteen seventies. At the intellectual level, you modify a class or lambda function or whatever and that changes some behavior in your program, which you then verify. That’s the conceptual level. In practice you have to worry about how code gets translated into binary (or asciii) blobs on the file system, how to transfer those blobs to some repository (svn, git, whatever), then how to transfer them from wherever they are to wherever they need to be, and how they get picked up by your run-time environment. Eh, that’s just stupid book keeping, can I please have some more modern content management here (version management, rollback, auditing, etc.)? Visual age actually got this (partially) right before it mutated into eclipse: software projects are just databases. There’s no need for software to exist as text files other than nineteen seventies based tool chains.
  • Automated unit, integration and system testing are good ideas. However, squeezing them in between your run-time and your editor is just counter productive. Deploy first, test afterwards, automate & optimize everything in between to take the absolute minimum of time. Inserting automated tests between editing and manual testing is a particularly bad idea. Essentially, it just adds time to your edit debug cycle.
  • XML files are just a fucking tree structures serialized in a particularly tedious way. Pom files are basically arbitrary, schema less xml tree-structures. It’s fine for machine readable data but editing it manually is just a bad idea. The less xml in my projects, the happier I get. The less I need to worry about transforming tree structures into object trees, the happier I get. In short, lets get rid of this shit. Basically the contents of my pom files is everything my programming language could not express. So we need more expressive programming languages, not entirely new ones to complement the existing ones. XML dialects are just programming languages without all of the conveniences of a proper IDE (debuggers, code completion, validation, testing, etc.).

Ultimately, maven is just a stop gap. And not even particularly good at what it does.

update 27 October 2009

Somebody produced a great study on how much time is spent on incremental builds with various build tools. This stuff backs my key argument up really well. The most startling out come:

Java developers spend 1.5 to 6.5 work weeks a year (with an average of 3.8 work weeks, or 152 hours, annually) waiting for builds, unless they are using Eclipse with compile-on-save.

I suspect that where I work, we’re close to 6.5 weeks. Oh yeah, they single out maven as the slowest option here:

It is clear from this chart that Ant and Maven take significantly more time than IDE builds. Both take about 8 minutes an hour, which corresponds to 13% of total development time. There seems to be little difference between the two, perhaps because the projects where you have to use Ant or Maven for incremental builds are large and complex.

So anyone who still doesn’t get what I’m talking about here, build tools like maven are serious time wasters. There exist tools out there that reduce this time to close to 0. I repeat, Pyhton Django = edit, F5, edit F5. No build/restart time whatsoever.

Java Profiling

One of the fun aspects of being in a programmer job is the constant stream of little technical problems that require digging into. This can sometimes be frustrating but it’s pretty cool if you suddenly get it and make the problem go away. Anyway, since starting in my new job in February, I’ve had lots of fun like this. Last week we had a bit of Java that was obviously out of line performance wise. My initial go at the problem was to focus on the part that had been annoying me to begin with: the way xml parsing was handled. There’s many ways to do XML parsing in Java. We use Jaxb. Jaxb is nice if you don’t have enough time to do the job properly with XPath but the trade off is that it can be slow and that there are a few gotchas like for example creating marshallers and unmarshallers is way more expensive than actually using them. So when processing a shitload of XML files, you spent a lot of time creating and destroying marshallers. Especially if you break down the big xml files into little blobs that are parsed individually. Some simple pooling using ThreadLocal improved things quite a bit but it was still slow in a way that I could not explain with just xml parsing. All helpful but it still felt unreasonably slow in one particular class.

So I spent two days setting up a profiler to measure what was going on. Two days? Shouldn’t this be easy? Yes, except there’s a few gotchas.

  1. The Eclipse TPTP project has a nice profiler. Except it doesn’t work with macs, or worse, macs with jdk1.6. That’s really an eclipse problem, the UI is tied to 1.5 due to Apple stopping to support of Cocoa integration in 1.6.
  2. So I fired up vmware, installed the latest Ubuntu 9.04 (nice), spent several hours making that behave nicely (file sharing is broken and needs a patch). Sadly no OpenGL eyecandy in vmware.
  3. Then I installed Java, eclipse, TPTP, and some other stuff
  4. Only to find out that TPTP and JDK 1.6 is basically unusable. First, it comes with some native library compiled against a library that no longer is used. Solution: install it.
  5. Then every turn you take there’s some error about agent controllers. If you search for this you will find plenty of advice telling you to use the right controller but none whatsoever as to how you would go about doing so. Alternatively people tell you to just not use jdk 1.6 I know because I spent several hours before joining the gang of “TPTP just doesn’t work, use netbeans for profiling”.
  6. So, still in ubuntu, I installed Netbeans 6.5, imported my eclipse projects (generated using maven eclipse:eclipse) and to my surprise this actually worked fine (no errors, tests seem to run).
  7. Great so I right clicked a test. and chose “profile file”. Success! After some fiddling with the UI (quite nerdy and full of usability issues) I managed to get exactly what I wanted
  8. Great! So I exit vmware to install Netbeans properly on my mac. Figuring out how to run with JDK 1.6 turned out to be easy.
  9. Since I had used vmware file sharing, all the project files were still there so importing was easy.
  10. I fired up the profiler and it had remembered the settings I last used in linux. Cool.
  11. Then netbeans crashed. Poof! Window gone.
  12. That took some more fiddling to fix. After checking the release notes it indeed mentioned two cases of profiling and crashes which you can fix with some commandline options.
  13. After doing that, I managed to finally get down to analyzing what the hell was going on. It turned out that my little test was somehow triggering 4.5 million calls to String.replaceAll. WTF!
  14. The nice thing with inheriting code that has been around for some time is that you tend to ignore those parts that look ugly and don’t seem to be in need of your immediate attention. This was one of those parts.
  15. Using replaceAll is a huge code smell. Using it in a tripple nested for loop is insane.
  16. So some more pooling, this time of the regular expression objects. Pattern.compile is expensive.
  17. I re-ran the profiler and … problem gone. XML parsing now is the bottleneck as it should be in code like this.

But, shouldn’t this just be easy? It took me two days of running from one problem to the next just to get a profiler running. I had to deal with crashing virtual machines, missing libraries, cryptic error messages about Agent Controllers, and several unrelated issues. I hope somebody in the TPTP project reads this: your stuff is unusable. If there’s a magic combination of settings that makes this shit work as it should: I missed it, your documentation was useless, the most useful suggestion I found was to not use TPTP. No I don’t want to fiddle with cryptic vm commandline parameters, manually compiling C shit, fiddle with well hidden settings pages, etc. All I wanted was right click, profile.

So am I now a Netbeans user? No way! I can’t stand how tedious it is for coding. Run profiler in Netbeans, go ah, alt tab to eclipse and fix it. Works for me.

WSDL Hell and other WS stuff

I’ve been working with web services technology extensively for the past few years. First as a regular software engineer and currently as a software architecture researcher at Nokia.

Right now the market can roughly be divided in a number of overlapping factions:

  • The enterprise service bus people (IBM et al.). These people consider SOAP to be one of the (many) ways to plug software into a so-called enterprise bus: middleware that does the communication and marshalling on behalf of the plugged in components. This notion is particularly popular among businesses with skeleton filled closets (legacy software). If this sounds an awful lot like CORBA, it is probably because these are the same people.
  • JBI (Sun et al.). Sun likes enterprise buses too but sees them more as a way of integrating Java tighter into the enterprise. JBI (Java Business Integration) is a container for java based services running inside an enterprise bus with convenient ways to access, and be accessed through a whole bunch of protocols (SOAP, CORBA, …). The subtle difference with the IBM vision is that JBI is more about exposing and integrating new Java based services than it is about exposing old legacy services to Java.
  • The WS-* (i.e. the whole mess of web service related standards being pushed by W3C, Oasis and others) people. These people base themselves on piles and piles of WSDL (web service description language) descriptions of all sorts of standardized service interfaces. The interfaces cover all sorts of functionality ranging from resource management to security. In theory this is nice, in practice prepare for agony trying to get any of that stuff working.
  • The lets use SOAP as the latest fashion in RPC protocols masses. Confused by the acronyms, most people produce and consume web services using a thick layer of tools that keep them far away from the nasty details. Of course the tools are quite stupid so effectively they are engaging in a really ineffective form of remote procedure calls. They like to think they are still doing distributed objects, but really all they got was a downgrade from good old CORBA.
  • The asynchronous XML guys. These guys realize that RPC over SOAP is a really bad idea. With responsetimes being measured in seconds, doing anything non trivial runs into some hard scalability issues. Not to mention that dealing with all the details ends up getting messy real quick. This is a vocal minority, most web services (including the high profile public ones) continue to be RPC based.
  • The REST (Representative State Transfer) guys. These guys got sick of all of the above and decided to just send (preferably simple) xml documents using HTTP. To them, the medium is not the message. It works surprisingly well for most usecases in the industry. For me setting up a REST based service is generally less work than the equivalent SOAP service, despite the fact that tools are supposed to make my life easy when doing the latter.

In short, it’s an ugly world out there. Few people get the whole picture. As a programmer, I am less than enthousiastic about all of the above. I remember fondly of being amazed with the ease with which two Java applications could talk to each other over RMI about ten years ago, effortlessly throwing entire running programs (aglets) over the network. Things have gotten a lot more difficult since then and a lot less flexible. Somewhere it seems, people forgot that this should be easy.

Let me summarize my concerns:

  • XML is a machine readable format for exchanging structured data that is poorly suited for human consumption. The common textual representation sadly encourages people to believe that they should edit it. Sadly few good xml editors exist. The ones that do exist are standalone, commercial products.
  • Many of the current web service solutions in the market are XML centric. That means they rely on the exchange, automated manipulation and manual editing of vast amounts of XML data. Manual editing is where all of these approaches become nasty.
  • To make things ‘easier’ for developers, tools generally come with their own set of tool specific xml documents in addition to tool specific extensions of the standard ones. The better tools offer alternatives to text editors for some of the documents. Don’t count on those tools to actually work as expected for non trivial usecases.
  • The tools are part of a vertical stack, usually from one vendor. For the vendor, the stack is a tool to keep the customers tied to its services. Vendor interoperability does not extend beyond the standardized xml formats. Forget about migrating a service or service client from tool A to tool B.
  • Standardization attempts to address this problem have resulted in more complex tools. The WS-* collection of standards is a good example.
  • Despite the many standards, such simple and crucial things as how to integrate a web service in a servlet container have not been standardized. Nor are there usable standards for accessing a web service from client applications. The only thing that has been standardized is the syntax and some semantics of communication between client and server. The process of actually making communication happen is not covered by those standards.

And now let me illustrate by an example. Suppose I want to expose this nice little method:

String helloWorld();

What hoops would you have to jump through to expose this as a web service and consume it from some client using off the shelf tools like Axis? Well, quite a few:

  • First you would need to generate a wsdl description. The tool for that is conveniently called java2wsdl The resulting document compared to the single line interface illustrates my earlier point beautifully. Several decisions need to be made:
    • Like what namespace should the package name be mapped to
    • What server address is going to be the endpoint for the service (not kidding you, this is part of the WSDL)
    • What is the name of the service.
  • The next step is to generate a server stub using wsdl2java. That is a bit of generated code that translates incoming messages back to Java.
  • Then you need to edit the generated code to make it do useful things. Yes that’s right, that means some complications if later on you decide that you would want to change the interface.
  • Additionally two wsdd files are generated by wsdl2java. Wsdd files tell axis what to deploy and undeploy.
  • At this point it is time to setup tomcat with the default axis web application that will host the service. Once you have that up and running you need to modify the axis web application to have your own jar files (including the compiled stub) in the classpath so that axis can access them. That’s right, you need to modify the service container to be able to run a service. If your service requires access to jndi resources, you will need to edit the default axis web.xml as well!
  • Now you can start tomcat and deploy the web service. Deploying in this case means using one of the default web services included with the axis web application to tell it that there is a new web service installed. The file used in this process is the earlier generated deploy.wsdd. Now that the service is running, it may be accessed. For that you need a clientstub.
  • To create a clientstub, download the wsdl description from your new web service (technically you could in this case use the earlier generated one. This is not always the case however!).
  • Using wsdl2java with a different set of parameters a few java classes may be generated. Compile them and use them to create a service call.

Now, that IMHO is a lot of work for Helloworld. Too much work in fact. All this stupid bookkeeping should be done automatically (I mean, Java has typechecking, generics and annotations for a good reason!). Be glad if it stays this simple. Unfortunately, it usually gets hairy if:

  • You ‘want’ (usually this means required) to use any of the WS-* stuff. This is the nightmare scenario, you need to edit basically all of the generated artifacts, hope that you don’t make any mistakes in the process and then it may work. A good example is securing the service using WS-Security. This will essentially triple your workload. You will be doing stuff like downloading various jar files to satisfy dependencies, fiddle with axis handlers, wsdd files and lots of other axis specific stuff.
  • You want to use some WS-* stuff not supported by axis (i.e. most of the WS standards). You will need to edit the generated WSDL file to do this.
  • You want to make the service asynchronous. This should be possible by modifying the wsdd files. I’ve never actually tried this. Nor have I ever encountered an asynchronous web service in the wild.
  • You want to change the Java interface and have these changes reflected in the WSDL and the client and server stubs. You need to start from step 1. Tip save some of the generated code you had to edit. You may be able to copy paste some stuff.

Now all of the above would still be doable if there was good documentation to assist programmers. Unfortunately there isn’t. Worse, any mistake you make will be punished with obscure exceptions either serverside or clientside. Obcure exceptions have two problems, they don’t tell you what the problem is and they don’t tell you where the problem is. Consequently, a small typo can cause you to spend hours trying to find out what is going on. I’ve been there multiple times. In several cases I found the solution just looking at the code where the exception came from (a big advantage of OSS software is that you can do that).
That’ in short is the reason I don’t like WSDL/SOAP based web services. Modern IDEs + application servers automate some of the tasks but rarely all. At best they hide the problem.

JQWeb

JQWeb is a software package for creating and running webquestionaires. You can create questionaires with JQWebEdit, save them as an XML file. The XML file can then be read by the JQWebServlet which produces an HTML form. The response of the form is processed by another servlet which simply appends it to a textfile. This textfile can than be read by JQWebEdit for analysis. Currently the only analysis supported is converting to tab separated format for easy importing in spreadsheet programs but more complex analysis strategies are on my to-do list. JQWeb is available under LGPL so you can change it as much as you like. And change you should because I never bothered to finish the program.