Maven: the way forward

A bit longer post today. My previous blog post set me thinking about a couple of things I have pondered before, and they fit together nicely into a potential way forward. In this previous post, and also this post, I spent a lot of words criticizing maven. People would be right to take me to task for blaming maven, but that would be the wrong way to read my criticism. There's nothing wrong with maven; it just annoys the hell out of me that it is needed at all and that I have to spend so much time waiting for it. In my view, maven is a symptom of a much bigger underlying problem: the java server side world (or rather the entire solution space for pretty much all forms of development) is bloated with tools, frameworks, application servers, and other stuff designed to address tiny problems with each other. Together they sort of work, but it isn't pretty. What if we wiped all of that away, very much like the Sun people did when they designed Java 20 years ago? What would be different? What would be the same? I cannot of course see this topic separately from my previous career as a software engineering researcher. In my view, a lot of ongoing developments of the past 20 years are now converging and morphing into something that could radically improve on the existing state of the art. However, I'm not aware of any specific project taking on this issue in full, even though a lot of people are working on parts of the solution. What follows is essentially my thinking on a number of topics centered around taking Java (the platform, not necessarily the language) as a base level and exploring how I would like to see the platform morph into something worthy of the past 40 years of research and practice.

Architecture

Let's start at the architecture level. It is now widely acknowledged that Java packages were a mistake. .Net namespaces are arguably better, and OSGi bundles, with explicit required and provided APIs as well as API versioning, are better still. To scale software into the cloud, where it must coexist with other software, including different (or identical) versions of itself, we need to get a grip on architecture.

The subject has been studied extensively (see here for a nice survey of some description languages) and I see OSGi as the most successful implementation to date that preserves important features most other development platforms currently lack, omit, or half improvise. The main issue with OSGi is that it layers stuff on top of Java without really being part of it. Hence you end up with a mix of manifest files that go into jar files, annotations that go into your source code, and cruft in the form of framework extensions to hook everything up, complete with duplicate functionality for logging, publish/subscribe patterns, and even web service frameworks. The OSGi people are moving towards a more declarative approach. Take this to its ultimate conclusion and you end up with language level support for basically everything OSGi is trying to do: explicit provided and required APIs, API versioning, events, dynamic loading/unloading, and isolation.
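
To illustrate what that metadata looks like today, here is a minimal MANIFEST.MF fragment of the kind OSGi uses (bundle and package names invented for the example), declaring versioned exports and imports:

```
Bundle-ManifestVersion: 2
Bundle-SymbolicName: com.example.orders
Bundle-Version: 1.2.0
Export-Package: com.example.orders.api;version="1.2.0"
Import-Package: com.example.billing.api;version="[2.0,3.0)",
 org.osgi.framework;version="1.4"
```

Language level support would mean expressing exactly this information in the code itself instead of in a side file.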

A nice feature of Java that OSGi relies on is the class loader. When used properly, it allows you to create a class loader, let it load classes, execute the functionality, and then destroy the class loader, at which point everything it loaded can be garbage collected. This is nice both for dynamically loading and unloading functionality and for isolating functionality (for security and stability reasons). OSGi depends heavily on this feature and many application servers try to use it. However, the mechanisms involved are not exactly bulletproof; there are serious problems with, for example, memory leaks, which makes engineers very conservative about relying on these mechanisms in a live environment.
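
A minimal sketch of that load/use/unload pattern (the jar path and class name are hypothetical, and the plugin class is assumed to implement Runnable):

```java
import java.net.URL;
import java.net.URLClassLoader;

public class PluginHost {
    public static void main(String[] args) throws Exception {
        // Create an isolated class loader for the plugin jar.
        URL[] jars = { new URL("file:plugins/reports.jar") }; // hypothetical jar
        URLClassLoader loader = new URLClassLoader(jars, PluginHost.class.getClassLoader());

        // Load a class by name and run it through an interface the host knows.
        Class<?> clazz = loader.loadClass("com.example.ReportPlugin"); // hypothetical class
        Runnable plugin = (Runnable) clazz.newInstance();
        plugin.run();

        // Drop all references: the loader and every class it defined become
        // eligible for garbage collection. Unless, of course, a thread, a static
        // cache, or a registered listener still points at them -- which is
        // exactly the memory leak scenario described above.
        plugin = null;
        clazz = null;
        loader = null;
    }
}
```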

More recently, people have started to use dependency injection, where the need for something is expressed in the code (e.g. with an annotation) or externally (in some configuration file). At run time a dependency injection container then tries to fulfill the dependencies by creating the right objects and injecting them. Dependency injection improves testability and modularization enormously.
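
For example, with JSR-330 style annotations (all types below are invented for the illustration), a class states what it needs and a container supplies it at run time; in a unit test you simply pass in a fake by hand:

```java
import javax.inject.Inject;

interface PaymentGateway { void charge(long cents); } // hypothetical collaborator

public class CheckoutService {
    private final PaymentGateway gateway;

    @Inject // a DI container (Guice, Spring, ...) picks an implementation at run time
    public CheckoutService(PaymentGateway gateway) {
        this.gateway = gateway;
    }

    public void checkout(long totalCents) {
        gateway.charge(totalCents);
    }
}
```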

A feature of maven that people seem to like is its way of dealing with dependencies. You express what you need in the pom file and maven fetches the needed stuff from a repository. The maven, OSGi & Spring combo is about to happen. When it does, you'll be specifying dependencies in four different places: Java imports, annotations, the pom file, and the OSGi manifest. But still, I think the combined feature set is worth having.
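
For reference, a Maven dependency declaration looks like this (real coordinates for Commons HttpClient used as an example); maven resolves it, along with its transitive dependencies, from a repository:

```xml
<dependency>
  <groupId>commons-httpclient</groupId>
  <artifactId>commons-httpclient</artifactId>
  <version>3.1</version>
</dependency>
```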

Language

Twenty years ago, Java was a pretty minimalistic language that took basically the best of the preceding twenty years of OO languages and kept a useful subset. Inevitably, lots got discarded or was never considered at all. Some mistakes were made, and over time the language absorbed some less than perfect versions of the stuff that didn't make it. So, Java has no language support for properties; that was sort of bolted on via the setter/getter convention introduced in JavaBeans. It has inner classes instead of closures and lambda functions. It has no reified generics (parametrizable types), just some complicated syntactic sugar that gets compiled to non-generic code. The initial concurrent programming concepts in the language were complex, broken, and dangerous to use; subsequent versions tweaked the semantics and added some useful things like the java.util.concurrent package. The language is overly verbose, and twenty years after the fact there is now quite a bit of competition from languages that simply don't suffer from all this. The good news is that most of those have implementations on top of the JVM. Let's not let this degenerate into a language war, but clearly the language needs a proper upgrade. IMHO Scala could be a good direction, but it too has some compromises embedded and lacks support for the architectural features discussed above. Message passing and functional programming concepts are now seen as important features for scalability. These are tedious at best in Java, while Scala supports them well and provides a much more concise syntax (see the sketch below). Let's just say a replacement for the Java language is overdue. On the other hand, it would be wrong to pick any single language as the language. Both .Net and the JVM are routinely used as generic runtimes for all sorts of languages. There's also the LLVM project, a compiler tool chain that includes dynamic compilation in a VM as an option for basically anything GCC can compile.
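
To make the verbosity point concrete, here is the kind of ceremony an anonymous inner class imposes where a closure would do. This is just an illustration, not code from any particular project:

```java
import java.util.Arrays;
import java.util.Comparator;

public class Verbose {
    public static void main(String[] args) {
        String[] names = { "Carol", "alice", "Bob" };
        // No closures or lambdas: an anonymous inner class is the closest
        // Java gets, at the cost of several lines of pure ceremony.
        Arrays.sort(names, new Comparator<String>() {
            public int compare(String a, String b) {
                return a.compareToIgnoreCase(b);
            }
        });
        System.out.println(Arrays.toString(names));
        // In Scala, the same sort is a one-liner:
        //   names.sortWith(_.compareToIgnoreCase(_) < 0)
    }
}
```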

Artifacts should be transient

So we now have a hypothetical language with support for all of the above. Let's not linger on the details and move on to deployment and run time. The word compile comes from the early days of computing, when people punched holes into cards and then compiled those into stacks that were hand fed to big, noisy machines. In other words, compilation is a tedious but necessary evil. Java popularized the notions of just in time compilation and partial, dynamic compilation. The main difference is that just in time compilation merely moves the compilation step to the moment the class is loaded, whereas dynamic compilation goes a few steps further and takes run-time context into account to decide if and how to compile. IDEs, meanwhile, compile on the fly while you edit. So why bother with compilation after you finish editing and before you need to load your classes? There is no real technical reason to compile ahead of time beyond saving a minor one-time effort that might affect start up. You might want that as an option, but it should not be the default.
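
The machinery for this has actually shipped with the JDK since Java 6: javax.tools exposes the compiler at run time. A rough sketch of compile-on-demand (paths and class name are hypothetical, error handling omitted):

```java
import java.net.URL;
import java.net.URLClassLoader;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class RunFromSource {
    public static void main(String[] args) throws Exception {
        // Compile a source file at run time with the in-process compiler.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        int status = compiler.run(null, null, null, "src/com/example/Hello.java");
        if (status != 0) throw new IllegalStateException("compile failed");

        // Load the freshly produced class and invoke it; a smarter runtime
        // would cache the result and recompile only when the source changes.
        URLClassLoader loader = new URLClassLoader(new URL[] { new URL("file:src/") });
        Class<?> hello = loader.loadClass("com.example.Hello");
        hello.getMethod("main", String[].class).invoke(null, (Object) new String[0]);
    }
}
```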

So, for most applications, the notion of generating binary artifacts before they are needed is redundant. If nothing needs to be generated, nothing needs to be copied or moved either. This is true for both compiled and interpreted languages. A modern Java system basically uses a binary intermediate format that is generated ahead of run-time. That too is redundant: if you have dynamic compilation, you can just take the source code and execute it, generating any needed artifacts on the fly. You can still compile in the IDE for validation and static analysis purposes. The distinction between interpreted and statically compiled languages has become outdated and, as scripting languages show, not having to juggle binary artifacts simplifies life quite a bit. In other words, development artifacts other than the source code are transient, and with the transformation from code to running code automated and happening at run time, they should no longer be a consideration.

That means no more build tools.

Without the need to transform artifacts ahead of run-time, the need for tools doing and orchestrating this changes as well. Much of what maven does is basically generating, copying, packaging, and gathering artifacts; an artifact in maven is just a euphemism for a file, and shuffling files around is pretty stupid work. With all of those artifacts redundant, why keep maven around at all? The answer, of course, is testing and continuous integration, as well as application life cycle management and other good practices (like generating documentation). Except, lots of other tools are involved with those as well. Your IDE is where you'd ideally review problems and issues. Something like Hudson working together with your version management tooling is where you'd expect continuous integration to take place, and application life cycle management is something that is part of your deployment environment. Architectural features of the language and run-time, combined with good built-in application and component life cycle support, remove much of the need for external tooling here and improve interoperability.

Source files need to go as well

Visual Age and Smalltalk pioneered the notion of non file based program storage, where you modify the artifacts in some kind of database. Intentional programming research is basically about the notion that programs are just interpretations of more abstract things that get transformed (just in time) into executable code or into different views (editable, in some cases). Martin Fowler has long been advocating IP and what he calls the language workbench. In a nutshell, if you stop thinking of development as editing a text file and start thinking of it as manipulating abstract syntax trees with a variety of tools (e.g. rename refactoring), you sort of get what IP and language workbenches are about. Incidentally, concepts such as APIs, API versions, and provided & required interfaces are quite easily implemented in a language workbench like environment.
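
A toy example of the mindset shift (all types here are invented for illustration): once a program is a tree rather than text, a rename refactoring is just a tree transformation applied in one consistent sweep.

```java
import java.util.ArrayList;
import java.util.List;

// A deliberately tiny 'abstract syntax tree': nodes with a kind, a name, children.
class Node {
    final String kind; // e.g. "class", "method", "call"
    String name;
    final List<Node> children = new ArrayList<Node>();

    Node(String kind, String name) {
        this.kind = kind;
        this.name = name;
    }

    // Rename refactoring as a tree transformation: every node referring to
    // oldName is updated consistently, something text-based search & replace
    // can only approximate.
    void rename(String oldName, String newName) {
        if (name.equals(oldName)) {
            name = newName;
        }
        for (Node child : children) {
            child.rename(oldName, newName);
        }
    }
}
```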

Storage, versioning, access control, collaborative editing, etc.

Once you stop thinking in terms of files, you can start thinking about other useful features (beyond tree transformations), like versioning or collaborative editing. There have been some recent advances in software engineering that I see as key enablers here. Number one is that version management systems are becoming decentralized, replicated databases. You don't check out from git; you clone the repository and push back any changes you make. What if your IDE worked straight against your (cloned) repository? Then deployment becomes just a controlled sequence of replicating your local changes somewhere else (push based, pull based, or combinations of the two). A problem, of course, is that version management systems are still about manipulating text files. So they sort of require you to serialize your rich syntax trees to text, and you need tools to deserialize them in your IDE again. So, text files are just another artifact that needs to be discarded.

This brings me to another recent advance: CouchDB. CouchDB is one of the non relational databases currently enjoying lots of (well deserved) attention. It doesn't store tables, it stores structured documents. Trees, in other words. Just what we need. It has some nice properties built in, one of which is replication: it is built from the ground up to replicate all over the globe. The grand vision behind CouchDB is a cloud of all sorts of data where stuff just replicates to the place it is needed. To accomplish this, it builds on REST, map reduce, and a couple of other cool technologies. The point is, CouchDB already implements most of what we need. Building a git like revision control system on top of it for versioning arbitrary trees, or collections of trees, can't be that challenging.
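
To give an idea of how thin that interface is: replication is a single HTTP POST against the _replicate resource. A sketch in plain Java (server and database names invented):

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class Replicate {
    public static void main(String[] args) throws Exception {
        // Ask a local CouchDB instance to replicate a database to a remote one.
        URL url = new URL("http://localhost:5984/_replicate"); // hypothetical server
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);

        String body = "{\"source\":\"projects\"," +
                "\"target\":\"http://mirror.example.com:5984/projects\"}";
        OutputStream out = conn.getOutputStream();
        out.write(body.getBytes("UTF-8"));
        out.close();

        System.out.println("replication triggered: HTTP " + conn.getResponseCode());
    }
}
```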

Imagine the following sequence of events. Developer A modifies his program. Developer B, working on the same part of the software, sees the changes (in real time, of course) and adds some more. Once both are happy, they mark the associated task as done. Somewhere on the other side of the planet, a test server locally replicates the changes related to the task and finds everything is OK. Eventually the change, along with other changes, is tagged off as a new stable release. A user accesses the application on his phone, and at the first opportunity (i.e. when connected) the changes are replicated to his local database. End to end, the word file or artifact appears nowhere. Also note that only the bare minimum of data is transmitted: this is as efficient as it is ever going to get.

Conclusions

Anyway, just some reflections on where we are and where we need to go. Java did a lot of pioneering work in a lot of different domains, but it is time to move on from the way our grandfathers operated computers (well, mine won't touch a computer if he can avoid it, but that's a different story). Most people selling silver bullets in the form of maven, ruby, continuous integration, etc. are stuck in the current way of thinking. These are great tools, but only in the context of what I see as a deeply flawed end to end system. A lot of additional cruft is under construction to support the latest cloud computing trends (which are essentially about managing a lot of files in a distributed environment). My point here is that taking a step back and rethinking things end to end might be worth the trouble. We're close to radically changing the way developers work here. Remove files and source code from the equation and what is left for maven to do? The only right answer is: nothing.

Why do I think this needs to happen? Well, developers are currently wasting enormous amounts of time on what are essentially redundant things, rather than on developing software. The last few weeks were pretty bad for me; I was just handling deployment and build configuration stuff. Tedious, slow, and maven is part of the problem.

Update 26 October 2009

Just around the time I was writing this, some people came up with Play, a framework plus server inspired by Python's Django that brings over a couple of its cool features. The best one: no application server restarts required, just hit F5. This works for Java source changes as well. Clearly, I'm not alone in viewing the Java server side world as old and bloated. Obviously Play lacks a bit in functionality, but that's easily fixed. I wonder how it combines with a decent dependency injection framework. My guess is: not well, because dependency injection frameworks require a context (i.e. state) to be maintained, and Play is designed to be stateless (like Django). Basically, each save potentially invalidates the context, requiring a full reload of that as well (i.e. a server restart). It seems the Play guys have identified the pain point in Java: server side state comes at a price.

maven: good ideas gone wrong

I've had plenty of time to reflect on the state of server side Java, technology, and life in general this week. The reason for all this extra 'quality' time was that I was stuck in an endless loop: waiting for maven to do its thing, observing it fail in subtly different ways, tweaking some more, hitting arrow up + enter (repeat last command), and twiddling my thumbs for two more minutes. This is about as tedious as it sounds. Never mind the actual problem, I fixed it eventually. But the key thing to remember here is that I lost a week of my life on stupid bookkeeping.

On to my observations:

  • I have more xml in my maven pom files than I ever had in my ant build.xml files four years ago, and those included running unit tests, static code checkers, packaging jars & installers, etc. Maven does a lot of things when I don't need them to happen, yet it has an uncanny ability to not do what I want when I need it to, or to first do things that are arguably redundant and time consuming.
  • Maven documentation is fragmented over wikis, javadoc of diverse plugins, forum posts, etc. Google can barely make sense of it. Neither can I. Seriously, I don't care about your particular ideology regarding elegance: just tell me how the fuck I set parameter foo on plugin bar, what its god damn default is, and what other parameters exist that I might be interested in.
  • For something that is supposed to save me time, I sure as hell am wasting a shitload of time making it do what I want, watching it do what it does (or doesn't), and fixing the way it works. I had no idea compiling & packaging fewer than 30 .java files could be so slow.
  • Few people around me dare to touch pom files. It's like magic, and I hate magic myself. When it doesn't work, they look at me to fix it. I've been there before and it was called ant. Maven just moved the problem; it didn't solve a single problem I had five years ago doing the exact same shit with ant, nor did it make anything easier.
  • Maven compiling, testing, packaging and deploying defeats the purpose of having incremental compilation and dynamic class (re)loading. It's just insane how all this application server deployment shit throws you right back to the nineteen seventies: edit, compile, test, integration-test, package, deploy, restart server, debug. Technically, just edit, debug is possible. Maven is part of the problem here, not the solution. It actually insists on this order of things (euphemistically referred to as a life cycle) and makes you jump through hoops to get your work done in anything resembling real time.
  • 9 out of 10 times when maven enters the unit + integration-test phase, I did not actually modify any code. Technically, that's just a waste of time (which my employer gets to pay for). Maven is not capable of remembering what you did and what has changed since the last time you ran it, so like any bureaucrat it does maximum damage to compensate for its ignorance.
  • Life used to be simple with a source dir, an editor, a directory of jars, and an incremental compiler. Back in 1997, Java recompiles took me under 2 seconds on a 486 Windows NT 3.51 machine with 'only' 32 MB, UltraEdit, an IBM incremental java compiler, and a handful of 5 line batch files. Things have gotten slower, more tedious, and definitely not faster since then. It's not as if I have many more lines of code around these days. Sure, I have plenty of dependencies, but those are resolved at run-time, just like in 1997, and are a non issue at compile time. However, I can't just run my code; I first have to download the world, wrap things up in a jar or war, copy it to some other location, launch some application server, etc. before I am in a position to even see whether I need to switch back to my editor to fix some minor detail.
  • Your deployment environment requires you to understand the ins and outs of how and where stuff needs to be deployed, what directory structures need to be there, etc. Basically, if you don't understand this, writing pom files is going to be hard. If you do understand all this, pom files won't save you much time and will be tedious instead. You'd be able to write your own bash scripts, .bat files or ant files to achieve the same goals. Really, there are only so many ways you can zip a directory into a .jar or .war file and copy it over from A to B (see the sketch after this list).
  • Maven is brittle as hell. Few people on your project will understand how to modify a pom file, so they do what they always do: copy paste bits and pieces that are known to more or less do what is needed elsewhere. The result is maven hell. You'll be stuck with no longer needed dependencies, plugins that nobody has a clue about, redundant profiles for phases that are never executed, half broken profiles for stuff that is actually needed, and random test failures. It's ugly. It took me a week to sort out the stinking mess in the project I joined a month ago, and I still don't like how it works. Pom.xml is the new build.xml: nobody gives a shit about what is inside these files, and people will happily copy paste fragments until things work well enough for them to move on with their lives. Change one tiny fragment and all hell breaks loose, because it kicks the shit out of all the wrong assumptions embedded in the pom file.
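
To back up the 'only so many ways' claim from the list above: packaging a directory tree as a jar (a jar is just a zip) is a few lines against java.util.zip. The paths are hypothetical, and a real jar would also want a META-INF/MANIFEST.MF entry:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class Jarify {
    public static void main(String[] args) throws IOException {
        ZipOutputStream out = new ZipOutputStream(new FileOutputStream("target/app.jar"));
        addTree(out, new File("target/classes"), ""); // hypothetical paths
        out.close();
    }

    // Recursively add every file under dir to the archive.
    static void addTree(ZipOutputStream out, File dir, String prefix) throws IOException {
        for (File f : dir.listFiles()) {
            if (f.isDirectory()) {
                addTree(out, f, prefix + f.getName() + "/");
            } else {
                out.putNextEntry(new ZipEntry(prefix + f.getName()));
                InputStream in = new FileInputStream(f);
                byte[] buf = new byte[8192];
                for (int n; (n = in.read(buf)) > 0; ) {
                    out.write(buf, 0, n);
                }
                in.close();
                out.closeEntry();
            }
        }
    }
}
```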

Enough whining, now on to the solutions.

  • Dependency management is a good idea. However, your build file is the wrong place to express dependencies. OSGi gets this somewhat right, except it still externalizes dependency configuration from the language. Obviously, the solution is to integrate the component model into the language: using something must somehow imply depending on it. Possibly, the specific version of what you depend on is something you might configure centrally, but beyond that: automate the shit out of it, please. Any given component or class should be self describing; build tools should be able to figure out the dependencies without us writing them down (a crude sketch of this idea follows the list). How hard can it be? That means some not yet existing language needs to come into existence to supersede the current ones. No language I know of gets this right.
  • Compilation and packaging are outdated ideas. Basically, the application server is the run-time of your code. Why doesn't it just take your source code, derive its dependencies, and run it? Every step between editing and running your code is a potential source of mistakes & bugs; shortening the distance between editor and run-time is good. Compilation is just an optimization. Sure, it's probably a good idea for the server to cache the results somewhere, but don't bother us with having to spoon feed it stupid binaries in some weird zip file format. One of the reasons scripting languages are so popular is that they reduce the above mentioned cycle to edit, F5, debug. There's no technical reason whatsoever why this would not be possible with statically compiled languages like Java. Ideally, I would just tell the application server the url of the source repository, give it the necessary credentials, and then alt-tab between my browser and my editor. Everything in between is stupid work that needs to be automated away.
  • The file system hasn't evolved since the nineteen seventies. At the intellectual level, you modify a class or lambda function or whatever, which changes some behavior in your program, which you then verify. That's the conceptual level. In practice you have to worry about how code gets translated into binary (or ascii) blobs on the file system, how to transfer those blobs to some repository (svn, git, whatever), then how to transfer them from wherever they are to wherever they need to be, and how they get picked up by your run-time environment. That's just stupid bookkeeping; can I please have some more modern content management here (version management, rollback, auditing, etc.)? Visual Age actually got this (partially) right before it mutated into Eclipse: software projects are just databases. There's no need for software to exist as text files, other than to accommodate nineteen seventies based tool chains.
  • Automated unit, integration and system testing are good ideas. However, squeezing them in between your run-time and your editor is counterproductive. Deploy first, test afterwards, and automate & optimize everything in between to take the absolute minimum of time. Inserting automated tests between editing and manual testing is a particularly bad idea, because it just adds time to your edit debug cycle.
  • XML files are just fucking tree structures serialized in a particularly tedious way. Pom files are basically arbitrary, schema-less xml tree structures. That's fine for machine readable data, but editing it manually is just a bad idea. The less xml in my projects, the happier I get. The less I need to worry about transforming tree structures into object trees, the happier I get. In short, let's get rid of this shit. Basically, the contents of my pom files are everything my programming language could not express. So we need more expressive programming languages, not entirely new XML dialects to complement the existing languages. XML dialects are just programming languages without all of the conveniences of a proper IDE (debuggers, code completion, validation, testing, etc.).
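
As promised above, a crude sketch of deriving dependencies from the source itself: the import statements already contain the information, so a tool can harvest it. The hard part, mapping packages to components, is exactly what a self describing component model would provide; here it is left as a printout (the source root path is hypothetical):

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImportScanner {
    private static final Pattern IMPORT = Pattern.compile("^import\\s+([\\w.]+)\\.[\\w*]+;");

    public static void main(String[] args) throws IOException {
        Set<String> packages = new TreeSet<String>();
        scan(new File("src"), packages); // hypothetical source root
        // A real tool would now map packages to artifacts/bundles; that
        // registry is what a self describing component model should provide
        // instead of a hand-written pom.
        for (String p : packages) {
            System.out.println("depends on: " + p);
        }
    }

    static void scan(File f, Set<String> packages) throws IOException {
        if (f.isDirectory()) {
            for (File child : f.listFiles()) scan(child, packages);
        } else if (f.getName().endsWith(".java")) {
            BufferedReader r = new BufferedReader(new FileReader(f));
            for (String line; (line = r.readLine()) != null; ) {
                Matcher m = IMPORT.matcher(line.trim());
                if (m.matches()) packages.add(m.group(1));
            }
            r.close();
        }
    }
}
```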

Ultimately, maven is just a stop gap. And not even particularly good at what it does.

Update 27 October 2009

Somebody produced a great study on how much time is spent on incremental builds with various build tools. This backs my key argument up really well. The most startling outcome:

Java developers spend 1.5 to 6.5 work weeks a year (with an average of 3.8 work weeks, or 152 hours, annually) waiting for builds, unless they are using Eclipse with compile-on-save.

I suspect that where I work, we’re close to 6.5 weeks. Oh yeah, they single out maven as the slowest option here:

It is clear from this chart that Ant and Maven take significantly more time than IDE builds. Both take about 8 minutes an hour, which corresponds to 13% of total development time. There seems to be little difference between the two, perhaps because the projects where you have to use Ant or Maven for incremental builds are large and complex.

So for anyone who still doesn't get what I'm talking about here: build tools like maven are serious time wasters. Tools exist that reduce this time to close to zero. I repeat: Python Django = edit, F5; edit, F5. No build/restart time whatsoever.

server side osgi, a myth?

Two years ago, I started using OSGi, the popular Java component standard, for an internal project. Fast forward to now, and I have a nice set of bundles that depend on, amongst others, the OSGi HTTP service.
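
For context, this is the core of what those bundles do: an activator grabbing the OSGi HTTP service and registering a servlet with it. The servlet here is a made-up placeholder, and a robust bundle would use a ServiceTracker to cope with the service coming and going; this is the minimal happy path:

```java
import javax.servlet.http.HttpServlet;

import org.osgi.framework.BundleActivator;
import org.osgi.framework.BundleContext;
import org.osgi.framework.ServiceReference;
import org.osgi.service.http.HttpService;

class OrdersServlet extends HttpServlet { } // hypothetical; would override doGet etc.

public class Activator implements BundleActivator {
    public void start(BundleContext context) throws Exception {
        // Look up the HTTP service and hang a servlet off it.
        ServiceReference ref = context.getServiceReference(HttpService.class.getName());
        HttpService http = (HttpService) context.getService(ref);
        http.registerServlet("/orders", new OrdersServlet(), null, null);
    }

    public void stop(BundleContext context) throws Exception {
        // The framework cleans up registrations when the bundle stops.
    }
}
```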

All along, I've been reading how great OSGi is, how flexible it is, and how it is the future of server side Java. I was ready to believe it. But to cut to the meat of this blog post: server side OSGi is vaporware. It doesn't exist. None of the vendors actually support it. Support as in: a production quality, well documented, widely used product available right now. I've looked at Felix, Tomcat, Equinox, Jetty, Glassfish, JBoss, etc. and came up with nothing but a few obscure, unsupported, undocumented components. The default HTTP service implementation is not my idea of scalable & production quality. And the connections between existing production quality OSGi containers and existing production quality application servers are sketchy at best.

Frankly, I'm very surprised at this. I know lots of people who claim to use OSGi server side, and there are lots of announcements of vendor X endorsing OSGi bla bla bla fully modularized bla bla bla dependency injection bla bla bla. That's great, but after two years of OSGi hacking I was hoping for something a little more substantial than what I have found so far:

The best option I came up with is the HTTP servlet bridge from Equinox. The documentation for this is either hopelessly out of date or this is a case of abandonware. Basically, all the page says is: download this bridge.war, and good luck. Problem #1: this bridge.war is from 1997 .. eh, 2007 :-). Problem #2: I'd like to use a somewhat newer version of Equinox. Does this work at all? Problem #3: this page hasn't changed substantially since I started using OSGi. Is anyone still working on this, or is this a dead project? Are there any users?

Option #2 is to use Apache Felix, which apparently can embed Jetty. That's great, but I'm a Tomcat guy and am more interested in using Tomcat as the outer container than Jetty. Neither the Jetty nor the Tomcat option is documented properly, and I'm not even sure the Tomcat option is possible or advisable, though some people hint at it being possible. A particular concern for me is that I need to cluster the damn thing, potentially on a large scale. Is this possible at all? I'm pretty sure people have done this, but in terms of production quality code and documentation they have not left much of a trail. The Felix people don't seem to do much documentation in general. There's of course the gratuitous OSGi tutorial and some hints of how you could use it, but that's it.

This situation is not something I can sell here at Nokia. I need something more substantial, preferably Tomcat or JBoss based, that is 1) scalable in a cluster, 2) production quality, and 3) well documented. I'm now pretty much convinced that what I'm looking for doesn't exist. If I don't find something soon, I'm just going to have to rip out all the OSGi stuff and switch to a proper dependency injection container. Spring 3.0 is looking pretty neat, for example, but a bit heavyweight in my opinion.

Anyway, comments are open, so please point out how wrong I am and what information I overlooked :-). My main gripe here is that I have very little to base a decision on: sketchy documentation, bits and pieces on blogs and mailing lists, but nothing solid. Either OSGi is a genuine server side option or it is just an urban legend (some people have heard of other people that have done this). Everything I've seen so far hints at the latter.

I know JBoss 4, Glassfish 3, and the Spring application server are all going to be OSGi based, of course. These are far from vaporware, but also not exactly production ready. Additionally, being OSGi based is one thing; letting application developers deploy servlets from OSGi bundles is another. Most things I've read suggest that these servers are not really designed to allow application developers to interact with the OSGi container directly (i.e. deploying bundles, using the http service instead of WAR files, etc.).

Java & Toys

After a few months of doing python development (which to me still feels like a straitjacket), I had some Java coding to do last week and promptly wasted a few hours checking out the latest toys:

  • Eclipse 3.4 M7
  • Hudson
  • Findbugs for Hudson

Eclipse 3.4 M7 is the first milestone I've tried for the upcoming Eclipse release, which is due to me not coding much Java lately; nothing wrong with it otherwise. Normally I'd probably have switched around M4 already (at least I did so during the 3.2 and 3.3 cycles). It is in fact a great improvement, with several nice productivity enhancements included. My favorite is the problem hover that now includes links to quick fixes. So instead of point, click, ctrl+1, arrow down (1..*), enter, you can now lean back and point & hover + click. Brilliant. I promptly used it to convert some 1.4, non generics based code into nice generics based code, simply by tackling all the generics related warnings one by one, essentially touching the keyboard only to suggest a few type parameters Eclipse couldn't figure out. The 'introduce generic type parameter' and 'infer generics' refactorings are very helpful here. The code of course compiled and executed as expected: no bugs introduced, and the test suite still runs fine. Around 5000 lines of code refactored in under 20 minutes. I still have some work to do to remove some redundant casts and to replace some while/for loops with foreach.
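
For those who haven't done this exercise, the kind of transformation involved looks roughly like this (a made-up example, not the actual code):

```java
import java.util.ArrayList;
import java.util.List;

public class GenericsExample {
    // Before: 1.4 style raw types; a cast at every use site, and nothing
    // stops the wrong element type from sneaking into the list.
    static String firstRaw(List names) {
        return (String) names.get(0);
    }

    // After: the generified version; the cast disappears and the compiler
    // checks the element type at every call site.
    static String first(List<String> names) {
        return names.get(0);
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<String>();
        names.add("Alice");
        System.out.println(firstRaw(names) + " / " + first(names));
    }
}
```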

Other nice features are the new breadcrumbs bar (brilliant!) and a new refactoring to create parameter classes for overly long parameter lists on methods. Also nice is a refactoring that converts String concatenation into StringBuffer.append calls, although StringBuilder is slightly faster for cases where you don't need thread safe code (i.e. most of the time). The rest is the usual amount of major and minor refinements that I care less about but are nice to have around anyway. One I imagine I might end up using a lot is the quick fixes for sorting out OSGi bundle dependencies. You might recall me complaining about this some time ago; be sure to read Peter Kriens' reply to that as well. Bnd is indeed nice, but tools don't fix what is in my view a kludge. Both the new Eclipse feature and bnd are workarounds for the problem that what OSGi is trying to do does not exist at (and is somewhat at odds with) the Java type level.

Anyway, the second thing I looked into was Hudson, a nice server for continuous integration. It can check out anything from a wide range of version control systems (subversion supported out of the box, several others through plugins) and run any script you like. It also understands maven and how to launch ant. With the right plugins you can then let it do quite useful things like compiling, running static code analyzers, deploying to a staging server, and running test suites. Unlike some stuff I evaluated a few years ago, this actually worked right out of the box and was so easy to set up that I promptly did so for the project I'm working on. Together with loads of plugins that add all sorts of cool functionality, you have just run out of excuses not to do continuous integration.

One of the plugins I've installed so far is an old favorite, Findbugs, which promptly drew my attention to two potentially dangerous bugs and a minor performance bug in my code, reminding me that running it and making sure it doesn't complain is actually quite important. Of all the code checkers, findbugs provides the best mix: it finds loads of stuff, it isn't obnoxious without a lot of configuration (unlike e.g. checkstyle and pmd, which require plenty of configuration to shut the fuck up about stupid stuff I don't care about), and it actually finds things that need fixing.

While Hudson is of course Java centric, you can teach it other tricks as well. So, next on my agenda is creating a job for our python code and hooking that up to pylint and possibly our django unit tests. There are plugins around for both tasks.

GX WebManager

Before joining Nokia, I worked for a small web startup in the Netherlands called <GX> Creative Online Development during 2004 and 2005. When I started there, I was employee number forty something (I like to think it was 42, but I'm not sure anymore). When I left, they had grown to close to a hundred employees, and judging from what I've heard since, they've continued to grow roughly following Moore's law in terms of number of employees. They also seem to have executed the strategy that took shape while I was still their release manager.

When I joined GX, GX WebManager was a pretty advanced in house developed CMS that had already gone through several years of field use and evolution, and it enjoyed a rapidly growing number of deployments, including many big name Dutch institutions such as KPN, Ajax, and ABN AMRO. At that time it was very much an in house developed thing that nobody from outside the company ever touched, except through the provided UI of course, which was fully AJAX based before the term became fashionable. By the time I left, we had upgraded release processes to push out regular technology releases, first internally and later also to a growing number of partners that implemented GX WebManager for their customers.

I regularly check the GX website to see what they have been up to, and recently noticed that they pushed out a community edition of GX WebManager. They've spent the last few years rearchitecting what was already a pretty cool CMS, refitting it with a standardized content repository (JSR 170) based on Apache Jackrabbit and an OSGi container based on Apache Felix. This architecture has been designed to allow easy creation of extensions by third parties. Martijn van Berkum and Arthur Meyer (product manager and lead architect) were already musing about how to do this while I was still there and had gotten pretty far with initial designs and prototyping. Last year they pushed out GX WebManager 9.0 based on the new architecture to their partners, and now 9.4 to the internet community. They seem to have pretty big ambitions to grow internationally and, in my experience, the technology and know-how to do it.

So congratulations to them on completing this. If you are in the market for a CMS, go check out their products and portfolio.

Google Android

Update: a slightly updated version of this article has been published in the Javalobby weekly newsletter and on the Javalobby site itself, after Matthew Schmidt invited me to do so.

Update 2: TheServerSide has linked here as well. For readers coming from there: the version on Javalobby linked above is the latest and also has some discussion attached.

About an hour ago, Google released some additional information on the SDK for Android, its new mobile platform. Since I work for Nokia (whom I of course do not represent when writing things on my personal blog; the usual disclaimers apply), I'm naturally interested in new software platforms for mobile phones. Additionally, since I'm a Java developer, I'm particularly interested in this one.

I spent the past half hour glancing through the API documentation, just to see what is there. This does not provide me with enough information for a really detailed review, but it does allow me to extract some highlights that in my view will matter enormously for platform adoption:

  • The SDK is Java based. No surprise, since they announced as much, but it is nice to see that this doesn't mean they are doing J2ME; instead, Java is the core implementation platform for all applications on the platform.
  • The Linux kernel and native libraries are just there to run applications on top of Google's custom VM, Dalvik, which is optimized for running on embedded hardware.
  • There is no mention of any native applications or of the ability to write and install native applications.
  • Particularly, there's no mention of a browser application. Given Google's involvement in Firefox and their recent announcement of a mobile Firefox, this is somewhat surprising. Browsers are increasingly important for high end phones; without a good, modern browser, Android is doomed to competing with low end feature phones. The browser engine seems to be webkit, the same engine that powers the iPhone browser and the S60 browser.
  • Google has chosen not to implement full Java or any of the ME variants. This is in my view very bad and unnecessary.
  • Instead, a small subset of the Java API is implemented. Probably the closest match is the J2ME CDC profile (so why not go all the way and save us developers a few headaches?).
  • Additionally, Google has bundled a few external libraries (httpclient, junit, and a few others). That's nice, since they are quite good libraries. I'm especially fond of httpclient, which I miss very much when doing J2ME CLDC development.
  • The bulk of the library consists of android.* packages that control everything from power management and SMS to the user interface.
  • I did not spot any OSGi implementation in the package; Google seems to intend to reinvent components and package management. This is disappointing, since OSGi is very popular across the Java spectrum, including J2ME, where it is already shipping in some products (e.g. the Nokia E90).

In my opinion this is all a bit disappointing. Not aligning with an existing profile of Java is a regrettable design choice. It makes Android incompatible with everything else out there, which is unnecessary in my view. Additionally, Android seems to duplicate a lot of existing functionality from full Java, J2ME, and various open source projects. I'm sure that in each case there is some reason for it, but the net result seems to be the reinvention of a lot of wheels. Overall, I doubt that the Android APIs are significantly faster, more flexible, or more usable than what is already out there.

On the other hand, the platform seems to be open, so not all is lost. This openness comes, however, with a few Strings attached. Basically, it relies on Java's security system. You know, the same one that is used by operators and phone vendors to completely lock down J2ME and restrict access to interesting features (e.g. making phone calls, installing applications). I'm not saying that Google will do this, but they certainly enable operators and phone vendors to do it for them. This is not surprising, since in the current market operators insist on it, especially in the US. The likely result is that Android application developers will have to deal with locked down phones, just like J2ME developers have to today.

The choice of the Apache 2.0 license is very wise: it is a liberal license that will make it easy for telecom companies to integrate Android with their existing products. Provided that the Android APIs are reasonably well designed, it may be possible to port some or all of them to other platforms. The Apache license ensures that doing so minimizes risk for underlying proprietary platforms.

Additionally, the Apache license allows for some other interesting things to happen. For example, there's the Apache Harmony project, which is still working on a full implementation of Java; reusing this work might of course make much of android.* redundant. There is also a lot of interesting mobile Java code under Eclipse's EPL, which is similar to the Apache license. This includes eSWT, a mobile version of SWT, the Eclipse user interface framework. Eclipse also provides a popular OSGi implementation called Equinox. Again, the lack of OSGi is a missed opportunity, and I don't care what they put in its place.

Frankly, I don't understand why Google intends to ignore the vast amount of existing implementation work out there. It seems like a bad case of 'not invented here' to me, and ultimately it will slow adoption. There are already too many Java platforms for the mobile world, and this is yet another one. The opportunity was to align with mainstream Java, as Sun is planning to do over the next few years; instead, Google has chosen to reinvent the wheel. We'll just have to see how good a job they did. Luckily, the Apache license will allow people to rip this thing apart and do something more productive with it. OpenMoko plus some Apache licensed Java code might be nice. Our Nokia Maemo platform can probably benefit from some components as well; especially the lower level stuff they've done with the VM and kernel might be interesting.

OSGi: some criticism

Over the past few weeks, I've dived into the wonderful world called OSGi. OSGi is a standardized (by a consortium, and soon also the JCP) set of Java interfaces and specifications that effectively layer a component model on top of Java. By component I don't mean that it replaces JavaBeans with something else, but that it provides a much improved way of modularizing Java software into blobs that can be deployed independently.

OSGi is currently the closest thing to having support for the concepts that are commonly modeled using architecture description languages (ADLs). ADLs have been used in industry to manage large software bases. Many ADLs are homegrown systems (e.g. Philips' Koala) or experimental tools created in a university context (e.g. xADL). OSGi is similar to these languages because:

  • It has a notion of components (bundles)
  • It models dependencies between components
  • It supports provided and required APIs
  • It supports API versioning

Making such things explicit first class citizens of a software system is a good thing: it improves manageability and consistency. Over the past few weeks I've certainly enjoyed exploring the OSGi framework and its concepts while working on actual code. However, it struck me that a lot of things are needlessly complicated or difficult.

Managing dependencies on other bundles is more cumbersome than handling imports in Java. Normally when I want to use some library, I download it, put it on the classpath, type a few letters, and then ctrl+space myself through whatever API it exposes. In OSGi it's more difficult: you download the bundle (presuming there is one) and then need to decide which of the packages it exposes you want to use.

I'm a big fan of the organize imports feature in Eclipse, which seems to not understand OSGi imports and exports at all. That means that for a single library bundle you may find yourself going back and forth to your bundle's manifest file to manually add the packages you need. Eclipse PDE doesn't seem to be that clever. For me that is a step backwards.

Also, most libraries don't actually ship as bundles. Bundles are a new concept that is not backwards compatible with good old jar files (the form most 3rd party libraries come in). This is an unnecessary limitation. A more reasonable default would be to treat non OSGi jar files as bundles that simply export everything in them and put everything they import on the import path; it can't be that hard to fish that information out of a jar file. At the very least, I'd like a tool to do that for me (see the sketch below). Alternatively, and this is the solution I would prefer, it should be possible to add the library to the OSGi boot classpath. This would allow all bundles to access non OSGi libraries without requiring any modifications to those libraries at all.
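
Concretely, treating a plain jar as a bundle amounts to generating a manifest like the following for it (names and versions invented; generating exactly this kind of thing is what tools like bnd automate). The DynamicImport-Package: * header tells the framework to wire up whatever the jar turns out to need at run time:

```
Bundle-ManifestVersion: 2
Bundle-SymbolicName: org.example.legacylib
Bundle-Version: 1.0.0
Export-Package: org.example.legacylib;version="1.0.0",
 org.example.legacylib.util;version="1.0.0"
DynamicImport-Package: *
```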

Finally, I just hate having to deal with this retarded manifest file concept. I noticed that the bug requiring the manifest to end with an empty line still exists (weird stuff happens if it is missing). This is as annoying as the notion of having to use tabs instead of spaces in makefiles; I was banging my head against the wall over newline stuff back in 1997. The PDE adds a nice frontend for editing the manifest (including a friendly warning about the bug if you accidentally remove the newline), but the fact remains that it is a kludge to superimpose stuff on Java that is not part of Java.

Of course with version 1.5 there is now a nicer way to do this using annotations. Understandably, OSGi needs to be backwards compatible with older versions (hence the kludge is excused) but the way forward is obviously to deprecate this mechanism on newer editions of Java. Basically, I want to be able to specify at class and method level constraints with respect to import & export. An additional problem is that packages don’t really have first class representation in java. They are just referred to by name in classes (in the package declaration) but don’t have their own specification. That means it is difficult to add package level annotations (you can work around this using a package-info.java file).