Enforcing code conventions in Java

After many years of working with Java, I finally got around to enforcing code conventions in our project. The problem with code conventions is not agreeing on them (actually this is hard since everybody seems to have their own preferences but that’s beside the point) but enforcing them. For the purpose of enforcing conventions you can choose from a wide variety of code checkers such as checkstyle, pmd, and others. My problem with this approach is that checkers usually end up being a combination of too strict, too verbose, or too annoying. In any case nobody ever checks their output and you need to have the discipline to fix things yourself for any issues detected. Most projects I’ve tried checkstyle on, it finds thousands of stupid issues using the out of the box configuration. Pretty much every Java project I’ve ever been involved with had somewhat vague guidelines on code conventions and a very loose attitude to enforcing these. So, you end up with loads of variation in whitespace, bracket placement, etc. Eventually people stop caring. It’s not a problem worthy of a lot of brain cycles and we are all busy.

Anyway, I finally found a solution to this problem that is completely unintrusive: format source code as part of your build. Simply add the following blurb to your maven build section and save some formatter settings in XML format in your source tree. It won’t fix all your issues but formatting related diffs should be a thing of the past. Either your code is fine, in which case it will pass the formatter unmodified or you messed up, in which case the formatter will fix it for you.

<plugin><!-- mvn java-formatter:format -->
    <groupId>com.googlecode.maven-java-formatter-plugin</groupId>
    <artifactId>maven-java-formatter-plugin</artifactId>
    <version>0.4</version>
    <configuration>
        <configFile>${project.basedir}/inbot-formatter.xml</configFile>
    </configuration>

    <executions>
        <execution>
            <goals>
                <goal>format</goal>
            </goals>
        </execution>
    </executions>
</plugin>

This plugin formats the code using the specified formatting settings XML file and it executes every build before compilation. You can create the settings file by exporting the Eclipse code formatter settings. Intellij users can use these settings as well since recent versions support the eclipse formatter settings file format. The only thing you need to take care off is the organize imports settings in both IDEs. Eclipse comes with a default configuration that is very different from what Intellij does and it is a bit of a pain to fix on the Intellij side. Eclipse has a notion of import groups that are each sorted alphabetically. It comes with four of these groups that represent imports with different prefixes so, javax.* and java.*, etc. are different groups. This behavior is very tedious to emulate in Intellij and out of the scope of the exported formatter settings. For that reason, you may want to consider modifying things on the Eclipse side and simply remove all groups and simply sort all imports alphabetically. This behavior is easy to emulate on Intellij and you can configure both IDEs to organize imports on save, which is good practice. Also, make sure to not allow .* imports and only import what you actually use (why load classes you don’t need?). If everybody does this, the only people causing problems will be those with poorly configured IDEs and their code will get fixed automatically over time.

Anyone doing a mvn clean install to build the project will automatically fix any formatting issues that they or others introduced. Also, the formatter can be configured conservatively and if you set it up right, it won’t mess up things like manually added new lines and other manual formatting that you typically want to keep. But it will fix the small issues like using the right number of spaces (or tabs, depending on your preferences), having whitespace around brackets, braces, etc. The best part: it only adds about 1 second to your build time. So, you can set this up and it basically just works in a way that is completely unintrusive.

Compliance problems introduced by people with poor IDE configuration skills/a relaxed attitude to code conventions (you know who you are) will automatically get fixed this way. Win win. There’s always the odd developer out there who insists on using vi, emacs, notepad, or something similarly archaic that most IDE users would consider cruel and unusual punishment. Not a problem anymore, let them. These masochists will notice that whatever they think is correctly formatted Java might cause the build to create a few diffs on their edits. Ideally, this happens before they commit. And if not, you can yell at them for committing untested code: no excuses for not building your project before a commit.

Versioneye

During our recent acquisition, we had to do a bit of due diligence as well to cover various things related to the financials, legal strucuturing, etc. of Localstream. Part of this process was also doing a license review of our technical assets.

Doing license reviews is one of those chores that software architects are forced to do once in a while. If you are somewhat knowledgable about open source licensing, you’ll know that there are plenty of ways that companies can get themselves in trouble by e.g. inadvertently licensing their code base under GPLv3 simply by using similarly licensed libraries, violating the terms of licenses, reusing inappropriately licensed Github projects, etc. This is a big deal in the corporate world because it exposes you to nasty legal surprises. Doing a license review basically means going through the entire list of dependencies and transitive dependencies (i.e. dependencies of the dependencies) and reviewing the way these are licensed. Basically everything that gets bundled with your software is in scope for this review.

I’ve done similar reviews in Nokia where the legal risks were large enough to justify having a very large legal department that concerned themselves with doing such reviews. They had built tools on top of Lotus Notes to support that job, and there was no small amount of process involved in getting software past them. So, this wasn’t exactly my favorite part of the job. A big problem with these reviews is that software changes constantly and that the review is only valid for the specific combination of versions that you reviewed. Software dependencies change all the time and keeping track of the legal stuff is a hard problem and requires a lot of bookkeeping. This is tedious and big companies get themselves into trouble all the time. E.g. Microsoft has had to withdraw products from the market on several occasions, Oracle and Google have been bickering over Android for ages, and famously Sco ended up suing world + dog over code they thought they owned the copyright of (Linux).

Luckily there’s a new Berlin based company called Versioneye that makes keeping track of dependencies very easy. Versioneye is basically a social network for software. What it does is genius: it connects to your public or private source repositories (Bitbucket and Github are fully supported currently) and then picks apart your projects to look for dependencies in maven pom files, bundler Gemfiles, npm, bower and many other files that basically list all the dependencies for your software. It then builds lists of dependencies and transitive dependencies and provides details on the licenses as well. It does all this automatically. Even better, it also alerts you of outdated dependencies, allows you to follow specific dependencies, and generally solves a lot of headaches when it comes to keeping track of dependencies.

I’ve had the pleasure of drinking more than a few beers with founder Robert Reiz of Versioneye and gave him some feedback early on. I was very impressed with how responsive he and his co-founder Timo were. Basically they delivered all the features I asked for (and more) and they are constantly adding new features. Currently they already support most dependency management tooling out there so chances are very good that whatever you are using is already supported. If not, give them some feedback and chances are that they add it if it makes sense.

So, when the time came to do the Localstream due diligence, using their product was a no-brainer and it got the job done quickly. Versioneye gave me a very detailed overview of all Localstream dependencies across our Java, ruby, and javascript components and made it trivially easy to export a complete list of all our dependencies, versions, and licenses for the Localstream due diligence.

Versioneye is a revolutionary tool that should be very high on the wish list of any software architect responsible for keeping track of software dependencies. This is useful for legal reasons but also a very practical way to stay on top of the tons of dependencies that your software has. If you are responsible for any kind of commercial software development involving open source components, you should take a look at this tool. Signup, import all your Github projects and play with this. It’s free to use for open source projects or to upload dependency files manually. They charge a very reasonable fee for connecting private repositories.

Jruby and Java at Localstream

Update. I’ve uploaded a skeleton project about the stuff in this post to Github. The presentation that I gave on this at Berlin Startup Culture on May 21st can be found here.

The server side code in the Localstream platform is a mix of Jruby and Java code. Over the past few months, I’ve gained a lot of experience using the two together and making the most of idioms and patterns in both worlds.

Ruby purist might wonder why you’d want to use Java at all. Likewise, Java purists might wonder why you’d waste your time doing Jruby at all instead of more hipster friendly languages such as Scala, Clojure, or Kotlin. In this article I want to steer clear of that particular topic and instead focus on more productive things such as what we use for deployment, dependency management, dependency injection, configuration, and logging. Also it is an opportunity to introduce two of my new Github projects:

The Java ecosystem provides a lot of good, very well supported technology. This includes the jvm itself but also libraries such as Google’s guava, misc. Apache frameworks such as httpclient, commons-lang, commons-io, commons-compress, the Spring framework, icu4j, and many others. Equivalents exist for Ruby, but mostly those equivalents leave a lot to be desired in terms of features, performance, design, etc. It didn’t take me long to conclude that a lot of the ruby stuff out there is sub-standard and not quite up to my level of expectations. That’s why I use Jruby instead of Ruby: it allows me to get the best of both worlds. The value of ruby is in its simplicity and the language. The value of Java is access to an enormous amount of good software. Jruby allows me to have both.

Continue reading “Jruby and Java at Localstream”

Maven and my GitHub projects

I have several Java projects on github. Generally these projects require maven to build and some of them require each other.

Deploying to maven central would be the nice way to provide binaries to my users. However, the process that Sonatype imposes for pushing out binaries via their free repository is somewhat tedious (involves jira fiddling, pgp signing), hard to automate, and given the time it takes and the frequency with which I want to push out binaries, it’s just not worth my time.

Basically I want to code, check that it builds, commit, release, deploy and move on with the minimum amount of fuss.

So, I set up my own maven repository. This is good enough for what we do in LocalStre.am. If you wish to use some of my github projects, you may utilize this repository by adding the following snippet to your maven settings.xml.

<!-- this bit goes into the profiles section -->
<profile>
    <id>jillesvangurp</id>
    <repositories>
        <repository>
            <id>release.repo</id>
            <url>http://repo.jillesvangurp.com/releases/</url>
            <releases>
                <enabled>true</enabled>
            </releases>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>
        <repository>
            <id>snapshot.repo</id>
            <url>http://repo.jillesvangurp.com/snapshots/</url>
            <releases>
                <enabled>false</enabled>
            </releases>
            <snapshots>
                <enabled>true</enabled>
            </snapshots>
        </repository>
    </repositories>
</profile>
....
<activeProfiles>
    <activeProfile>sonatype</activeProfile>
    <activeProfile>jillesvangurp</activeProfile>
</activeProfiles>

Disclaimer. This repository is provided as is. Particularly the snapshot part is basically there to support my own builds. I may clean it out at any time and there is no guarantee that snapshots of my github projects are there or up to date. Likewise, the releases repository is there to support my own builds. I will likely only keep recent releases there and you shouldn’t use older releases of my projects in any case. Binaries in either maven repository may broken and do untold amounts of damage (which the LICENSE tells you is your problem).

If this worries you, you can always build from source of course. Simply check out the github project (and any dependencies) and maven clean install it or deploy it to your own maven repository.

If not, using my repository may be convenient for you and I generally deploy source and javadoc jars as well.

If any of my projects become popular enough to warrant me putting in some more energy, I can deploy to maven central. I’ve done this for the jsonj project up to 0.7 for example. If this would help you, simply drop me an email and I’ll look into it.

I’ll try to keep this post up to date if things should change.

Puppet

I recently started preparing for deploying our localstre.am codebase to an actual server. So, that means I’m currently having a lot of fun picking up some new skills and playing system administrator (and being aware of my short comings there). World plus dog seems to be recommending using either Chef or Puppet for this  and since I know few good people that are into puppet in a big way and since I’ve seen it used in Nokia, I chose the latter.

After getting things to a usable state, I have a few observations that come from my background of having engineered systems for some time and having a general gut feeling about stuff being acceptable or not that I wanted to share.

  • The puppet syntax is weird. It’s a so-called ruby DSL that tries to look a bit like json and only partially succeeds. So you end up with a strange mix of : and => depending on the context. I might be nit picking here but I don’t think this a particularly well designed DSL. It feels more like a collection of evolved conventions for naming directories and files that happen to be backed by ruby. The conventions, naming, etc. are mostly non sensical. For example the puppet notion of a class is misleading. It’s not a class. It doesn’t have state and you don’t instantiate it. No objects are involved either. It’s more close to a ruby module. But in puppet, a module is actually a directory of puppet stuff (so more like a package). Ruby leaks through in lots of places anyway so why not stick with the conventions of that language? For example by using gems instead of coming up with your own convoluted notion of a package (aka module in puppet). It feels improvised and gobbled together. Almost like the people who built this had no clue what they were doing and changed their minds several times. Apologies for being harsh if that offended you BTW ;-).
  • The default setup of puppet (and chef) assumes a lot of middleware that doesn’t make any sense whatsoever for most smaller deployments (or big deployments IMNSHO). Seriously, I don’t want a message broker anywhere near my servers any time soon, especially not ActiveMQ. The so-called masterless (puppet) or solo (chef) setups are actually much more appropriate for most people.  They are more simple and have less moving parts. That’s a good thing when it comes to deployments.
  • It tries to be declarative. This is mostly a good thing but sometimes it is just nice to have an implicit order of things following from the order in which you specify things. Puppet forces you to be explicit about order and thus ends up being very verbose about this. Most of that verbosity is actually quite pointless. Sometimes A really comes before B if I specify it in that order in one file.
  • It’s actually quite verbose compared to the equivalent bash script when it comes to basic stuff like for example starting a service, copying a file from a to b, etc. Sometimes a “cp foo bar; chmod 644 bar” just kinda does the trick. It kind of stinks that you end up with these five line blobs for doing simple things like that. Why make that so tedious?
  • Like maven and ant in the Java world it, it tries to be platform agnostic but only partially succeeds. A lot of platform dependencies creep in any way and generally puppet templates are not very portable. Things like package names, file locations, service names, etc. end up being platform specific anyway.
  • Speaking of which, puppet is more like ant than like maven. Like ant, all puppet does is provide the means to do different things. It doesn’t actually provide a notion of a sensible default way that things are done that you then customize, which is what maven does instead. Not that I’m a big fan of maven but with puppet you basically have to baby sit the deployment and worry about all the silly little details that are (or should be) bog standard between deployments: creating users, setting up & managing ssh keys, ensuring processes run with the appropriate restrictions, etc. This is a lot of work and like a lot of IT stuff it feels repetitive and thus is a good candidate for automation. Wait … wasn’t puppet supposed to be that solution? The puppet module community provides some reusable stuff but its just bits and pieces really and not nearly enough for having a sensible production ready setup for even the simplest of applications. It doesn’t look like I could get much value out of that community.

So, I think puppet at this stage is a bit of a mixed bag and I still have to do a lot of work to actually produce a production ready system. Much more than I think is justified by the simplicity of real world setups that I’ve seen in the wild. Mostly running a ruby or java application is not exactly rocket science. So, why exactly does this stuff continue to be so hard & tedious despite a multi billion dollar industry trying to fix this for the last 20 years or so?

I don’t think puppet is the final solution in devops automation. It is simply too hard to do things with puppet and way too easy to get it wrong as well. There’s too much choice, a lack of sensible policy, and way too many pitfalls. It being an improvement at all merely indicates how shit things used to be.

Puppet feels more like a tool to finish the job that linux distributors apparently couldn’t be bothered to do in arbitrary ways than like a tool to produce reliable & reproducible production quality systems at this point and I could really use a tool that does the latter without the drama and attitude. What I need is sensible out of the box experience for the following use case: here’s a  war file, deploy that on those servers.

Anyway, I started puppetizing our system last week and have gotten it to the point where I can boot a bunch of vagrant virtual machines with the latest LTS ubuntu and have them run localstre.am in a clustered setup. Not bad for a week of tinkering but I’m pretty sure I could have gotten to that point without puppet as well (possibly sooner even). And, I still have a lot of work to do to setup a wide range of things that I would have hoped would be solved problems (logging, backups, firewalls, chroot, load balancing a bog standard, stateless http server, etc). Most of this falls in the category of non value adding stuff that somebody simply has to do. Given that we are a two person company and I’m the CTO/server guy, that would be me.

I of course have the benefit of hindsight from my career in Nokia where I watched Nokia waste/invest tens of millions on deploying simple bog standard Java applications (mostly) to servers for a few years. It seems simple things like “given a war file, deploy the damn thing to a bunch of machines” get very complicated when you grow the number of people involved. I really want to avoid needing a forty people ops team to do stupid shit like that.

So, I cut some corners. My time is limited and my bash skills are adequate enough that I basically only use puppet to get the OS in a usable enough state that I can hand off to to a bash script to do the actual work of downloading, untarring, chmodding, etc. needed to  get our application running. Not very puppet like but at least it gets the job done in 40 lines of code or so without intruding too much on my time. In those 40 lines, I install the latest sun jdk (tar ball), latest jruby (another tar ball), our code base, and the two scripts to start elastic search and our jetty/mizuno based app server.

What would be actually useful is reusable machine templates for bog standard things like php and ruby capable servers, java tomcat servers, apache load balancers, etc with sensible hardened configurations, logging, monitoring, etc. The key benefit would be inheriting from a sensible setup and only changing the bits that actually need changing. It seems that is too much to ask for at this point and consequently hundreds of thousands of system administrators (or the more hipster devops if you are so inclined) continue to be busy creating endless minor variations of the same systems.

Jsonj: a new library for working with JSon in Java

Update 11 July 2011: I’ve done several commits on github since announcing the project here. New features have been added; bugs have been fixed; you can find jsonj on maven central now as well; etc. In short, head over to github for the latest on this.

I’ve just uploaded a weekend project to github. So, here it is, jsonj. Enjoy.

If you read my post on json a few weeks ago, you may have guessed that I’m not entirely happy with the state of json in Java relative to other languages that come with native support for json. I can’t fix that entirely but I felt I could do a better job than most other frameworks I’ve been exposed to.

So, I sat down on Sunday and started pushing this idea of just taking the Collections framework and combining that with the design of the Json classes in GSon, which I use at work currently, and throwing in some useful ideas that I’ve applied at work. The result is a nice, compact little framework that does most of what I need it to do. I will probably add a few more features to it and expand some of the existing ones. I use some static methods at work that I can probably do in a much nicer way in this framework.

Continue reading “Jsonj: a new library for working with JSon in Java”

log4j, maven, surefire, jetty and how to make it work

I spend some time yesterday on making log4j behave. Not for the first time (gave up on several occasions) and I was getting thoroughly frustrated with how my logs refuse to conform to my log4j configuration, or rather any type of configuration. This time, I believe I succeeded and since I know plenty of others must be facing the exact same misery and since most of the information out there is downright misleading in the sense of presenting all types of snake oil solutions that actually don’t change a thing, here’s a post that offers a proper analysis of the problem and a way out. That, and it’s a nice note to self as well. I just know that I’ll need to set this up again some day.

In a nutshell, the problem is that there are multiple ways of doing logging in Java and one in particular, Apache common-logging, is misbehaving. This trusty little library has not evolved significantly since about 2006 and is depended on by just about any dependency in the maven repository that does logging, mostly for historical reasons. Some others depend on log4j directly and yet some others depend on slf4j (Simple Logging Facade for Java). Basically, you are extremely likely to have a transitive dependency on all of these and even a few dependencies on JDK logging (introduced in Java 1.4).

The main goal of commons.logging is not having to choose log4j or JDK logging. It acts as a facade and picks one of them using some funky reflection. Nice but most sysadmins and developers I have worked with seem to favor log4j anyway and hate commons-logging with a passion. In our case, all our projects depend on log4j directly and that’s just the way it is.

One of the nasty things with commons-logging is that it behaves weirdly in complex class loading situations. Like in maven or a typical application server. As a result, it takes over orchestration of the logs for basically the whole application and wrongly assumes that you want to use jdk logging or some log4j configuration buried deep in one of your dependencies. WTF is up with that btw, don’t ship logging configuration with a library. Just don’t.

Symptoms: you configure logger foo in log4j.xml to STFU below ERROR level and when running your maven tests (or even from eclipse) or in your application server it barfs all over the place at INFO level, which is the unconfigured default. To top it off, it does this using an appender you sure as hell did not configure. Double check, yes log4j is on the classpath, it finds your configuration, it even creates the log file you defined an appender for but nothing shows up there. You can find out what log4j is up to with -Dlog4j.debug=true. So, log4j is there, configured and all but commons-logging is trying to be smart and is redirecting all logged stuff, including the stuff actually logged with log4j directly, to the wrong place. To add to your misery, you might have partly succeeded with your attempts to get log4j working so now you have stuff from different log libraries showing up in the console.

A decent workaround in that case is to define a file appender, which will be free from non log4j stuff. This more or less is the advice for production deployments: don’t use a console logger because dependencies are prone to hijacking the console for all sorts of purposes.

So, good advice, but less than satisfactory. To fix the problem properly, make sure you don’t have commons-logging on the classpath. At all. This will break all the stuff that depends on it being there. Fix that by using slf4j instead. Slf4j comes in several maven modules. I used the following ones:

  1. jcl-over-slf4j is a drop-in, API compatible replacement for commons logging. It writes messages logged through commons-logging using slf4j, which is similar to commons-logging but behaves much nicer (i.e. it actually works). It’s designed to fix the problem we are dealing with here. The only reason it exists is because commons-logging is hopelessly broken.
  2. slf4j-api is used by dependencies already depending on slf4j
  3. slf4j-log4j12 the backend for log4j. If this is on the classpath slf4j will use log4j for its output. You want this.

That’s it. Here’s what I had to do to get a properly working configuration:

  1. Use mvn dependency:tree to find out which dependencies are transitively/directly depending on commons-logging.
  2. fix all of these dependencies with a
    <exclusions>
        <exclusion>
            <groupId>commons-logging</groupId>
            <artifactId>commons-logging</artifactId>
        </exclusion>
    </exclusions>
  3. You might have to iterate fixing the dependencies and rerunning mvn dependency:tree since only the first instance of commons-logging found will used transitively.
  4. Now add these dependencies to your pom.xml:
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-api</artifactId>
        <version>1.5.10</version>
    </dependency>                
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>jcl-over-slf4j</artifactId>
        <version>1.5.10</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
        <version>1.5.10</version>
    </dependency>
  5. Maven plugins have their own dependencies, separately from your normal dependencies. Make that you add the three slf4j dependencies to surefire, jetty, and other relevant plugins. At least jetty seems to already depend on slf4j.
  6. Finally make sure that your plugins have system properties defining log4j.configuration=file:[log4j config location]. Most of the googled advice on this topic covers this (and not much else). Some plugins can be a bit hard to configure due to the fact that they fork off separate processes.

That should do the trick, assuming you have log4j on the classpath of course.

Maven: the way forward

A bit longer post today. My previous blog post set me off pondering on a couple of things that I have been pondering on before that sort of fit nicely together in a potential way forward. In this previous post and also this post, I spent a lot of words criticizing maven. People would be right to criticize me for blaming maven. However, that would be the wrong way to take my criticism. There’s nothing wrong with maven, it just annoys the hell out of me that it is needed and that I need to spend so much time waiting for it. In my view, maven is a symptom of a much bigger underlying problem: the java server side world (or rather the entire solution space for pretty much all forms of development) is bloated with tools, frameworks, application servers, and other stuff designed to address tiny problems with each other. Together, they sort of work but it isn’t pretty. What if we’d wipe all of that away, very much like the Sun people did when they designed Java 20 years ago? What would be different? What would be the same? I cannot of course see this topic separately from my previous career as a software engineering researcher. In my view there have been a lot of ongoing developments in the past 20 years that are now converging and morphing into something that could radically improve over the existing state of the art. However, I’m not aware of any specific projects taking on this issue in full even though a lot of people are working on parts of the solution. What follows is essentially my thoughts on a lot of topics centered around taking Java (the platform, not necessarily the language) as a base level and exploring how I would like to see the platform morph into something worthy of the past 40 years of research and practice.

Architecture

Lets start with the architecture level. Java packages were a mistake, which is now widely acknowledged. .Net namespaces are arguably better and OSGi bundles with explicit required and provided APIs as well as API versioning are better still. To scale software into the cloud where it must coexist with other software, including different (or identical) versions of itself, we need to get a grip on architecture.

The subject has been studied extensively (see here fore a nice survey of some description languages) and I see OSGi as the most successful implementation to date that preserves important features that most other development platforms currently lack, omit, or half improvise. The main issue with OSGi is that it layers stuff on top of Java but is not really a part of it. Hence you end up with a mix of manifest files that go into jar files; annotations that go into your source code; and cruft in the form of framework extensions to hook everything up, complete with duplicate functionality for logging, publish subscribe patterns, and even web service frameworks. The OSGi people are moving away towards a more declarative approach. Bring this to its ultimate conclusion and you end up with language level support for basically all that OSGi is trying to do. So, explicit provided and required APIs, API versioning, events, dynamic loading/unloading, isolation.

A nice feature of Java that OSGi relies on is the class loader. When used properly, it allows you to create a class loader, let it load classes, execute the functionality, and then destroy the class loader and all the stuff it loaded which is then garbage collected. This is nice for both dynamic loading and unloading of functionality as well as isolating functionality (for security and stability reasons). OSGi heavily depends on this feature and many application servers try to use this. However, the mechanisms used are not exactly bullet proof and there exist enormous problems with e.g. memory leaking which causes engineers to be very conservative with relying on these mechanisms in a live environment.

More recently, people have started to use dependency injection where the need for something is expressed in the code (e.g. with an annotation) or externally in some configuration file). Then at run time a dependency injecting container tries to fulfill the dependencies by creating the right objects and injecting dependencies. Dependency injection improves testability and modularization enormously.

A feature in maven that people seem to like is its way of dealing with dependencies. You express what you need in the pom file and maven fetches the needed stuff from a repository. The maven, osgi, & spring combo, is about to happen. When it does, you’ll be specifying dependencies in four different places: java imports; annotations, the pom file, and the osgi manifest. But still, I think the combined feature set is worth having.

Language

Twenty years ago, Java was a pretty minimalistic language that took basically the best of 20 years (before that) of OO languages and kept a useful subset. Inevitably, lots got discarded or not considered at all. Some mistakes were made, and the language over time absorbed some less than perfect versions of the stuff that didn’t make it. So, Java has no language support for properties, this was sort of added on by the setter/getter convention introduced in JavaBeans. It has inner classes instead of closures and lambda functions. It has no pure generics (parametrizable types) but some complicated syntactic sugar that gets compiled to non generic code. The initial concurrent programming concepts in the language were complex, broken, and dangerous to use. Subsequent versions tweaked the semantics and added some useful things like the java concurrent package. The language is overly verbose and 20 years after the fact there is now quite a bit of competition from languages that basically don’t suffer from all this. The good news is that most of those have implementations on top of the JVM. Lets not let this degenerate into a language war but clearly the language needs a proper upgrade. IMHO scala could be a good direction but it too has already some compromise embedded and lacks support for the architectural features discussed above. Message passing and functional programming concepts are now seen as important features for scalability. These are tedious at best in Java and Scala supports these well while simultaneously providing a much more concise syntax. Lets just say a replacement of the Java language is overdue. But on the other hand it would be wrong to pick any language as the language. Both .Net and the JVM are routinely used as generic runtimes for all sorts of languages. There’s also the LLVM project, which is a compiler tool chain that includes dynamic compilation in a vm as an option for basically anything GCC can compile.

Artifacts should be transient

So we now have a hypothetical language, with support for all of the above. Lets not linger on the details and move on to deployment and run time. Basically the word compile comes from the early days of computing when people had to punch holes into cards and than compile those into stacks and hand feed them to big, noisy machines. In other words, compilation is a tedious & necessary evil. Java popularized the notion of just in time compilation and partial, dynamic compilation. The main difference here is that just in time compilation merely moves the compilation step to the moment the class is loaded whereas dynamic compilation goes a few steps further and takes into account run-time context to decide if and how to compile. IDEs tend to compile on the fly while you edit. So why, bother with compilation after you finish editing and before you need to load your classes? There is no real technical reason to compile ahead of time beyond the minor one time effort that might affect start up. You might want the option to do this but it should not default to doing it.

So, for most applications, the notion of generating binary artifacts before they are needed is redundant. If nothing needs to be generated, nothing needs to be copied/moved either. This is true for both compiled or interpreted and interpreted languages. A modern Java system basically uses some binary intermediate format that is generated before run-time. That too is redundant. If you have dynamic compilation, you can just take the source code and execute it (while generating any needed artifacts for that on the fly). You can still do in IDE compilation for validation and static analysis purposes. The distinction between interpreted and static compiled languages has become outdated and as scripting languages show, not having to juggle binary artifacts simplifies life quite a bit. In other words, development artifacts (other than the source code) are transient and with the transformation from code to running code automated and happening at run time, they should no longer be a consideration.

That means no more build tools.

Without the need to transform artifacts ahead of run-time, the need for tools doing and orchestrating this also changes. Much of what maven does is basically generating, copying, packaging, gathering, etc. artifacts. An artifact in maven is just a euphemism for a file. Doing this is actually pretty stupid work. With all of those artifacts redundant, why keep maven around at all? The answer to that is of course testing and continuous integration as well as application life cycle management and other good practices (like generating documentation). Except, lots of other different tools are involved with that as well. Your IDE is where you’d ideally review problems and issues. Something like Hudson playing together with your version management tooling is where you’d expect continuous integration to take place and application life cycle management is something that is part of your deployment environment. Architectural features of the language and run-time combined with good built in application and component life cycle removes much of the need of external tooling to support all this and improves interoperability.

Source files need to go as well

Visual age and smalltalk pioneered the notion of non file based program storage where you modify the artifacts in some kind of DB. Intentional programming research basically is about the notion that programs are essentially just interpretations of more abstract things that get transformed (just in time) to executable code or into different views (editable in some cases). Martin Fowler has long been advocating IP and what he refers to as the language workbench. In a nut shell, if you stop thinking of development as editing a text file and start thinking of it as manipulating abstract syntax trees with a variety of tools (e.g. rename refactoring), you sort of get what IP and language workbenches are about. Incidentally, concepts such as APIs, API versions, provided & required interfaces are quite easily implemented in a language workbench like environment.

Storage, versioning, access control, collaborative editing, etc.

Once you stop thinking in terms of files, you can start thinking about other useful features (beyond tree transformations), like versioning or collaborative editing for example. There have been some recent advances in software engineering that I see as key enablers here. Number 1 is that version management systems are becoming decentralized, replicated databases. You don’t check out from git, you clone the repository and push back any changes you make. What if your IDE were working straight into your (cloned) repository? Then deployment becomes just a controlled sequence of replicating your local changes somewhere else (either push based, pull based, or combinations of that. A problem with this is of course that version management systems are still about manipulating text files. So they sort of require you to serialize your rich syntax trees to text and you need tools to unserialize them in your IDE again. So, text files are just another artifact that needs to be discarded.

This brings me to another recent advance: couchdb. Couchdb is one of the non relational databases currently experiencing lots of (well deserved) attention. It doesn’t store tables, it stores structured documents. Trees in other words. Just what we need. It has some nice properties built in, one of which is replication. Its built from the ground up to replicate all over the globe. The grand vision behind couchdb is a cloud of all sorts of data where stuff just replicates to the place it is needed. To accomplish this, it builds on REST, map reduce, and a couple of other cool technology. The point is, couchdb already implements most of what we need. Building a git like revision control system for versioning arbitrary trees or collections of trees on top can’t be that challenging.

Imagine the following sequence of events. Developer A modifies his program. Developer B working on the same part of the software sees the changes (real time of course) and adds some more. Once both are happy they mark the associated task as done. Somewhere on the other side of the planet a test server locally replicates the changes related to the task and finds everything is OK. Eventually the change and other changes are tagged off as a new stable release. A user accesses the application on his phone and at the first opportunity (i.e. connected), the changes are replicated to his local database. End to end the word file or artifact appears nowhere. Also note that the bare minimum of data is transmitted: this is as efficient as it is ever going to get.

Conclusions

Anyway, just some reflections on where we are and where we need to go. Java did a lot of pioneering work in a lot of different domains but it is time to move on from the way our grand fathers operated computers (well, mine won’t touch a computer if he can avoid it but that’s a different story). Most people selling silver bullets in the form of maven, ruby, continuous integration, etc. are stuck in the current thinking. These are great tools but only in the context of what I see as a deeply flawed end to end system. A lot of additional cruft is under construction to support the latest cloud computing trends (which is essentially about managing a lot of files in a distributed environment). My point here is that taking a step back and rethinking things end to end might be worth the trouble. We’re so close to radically changing the way developers work here. Remove files and source code from the equation and what is left for maven to do? The only right answer here is nothing.

Why do I think this needs to happen: well, developers are currently wasting enormous amounts of time on what are essentially redundant things rather than developing software. The last few weeks were pretty bad for me, I was just handling deployment and build configuration stuff. Tedious, slow, and maven is part of this problem.

Update 26 October 2009

Just around the time I was writing this, some people decided to come up with Play, a framework + server inspired by Python Django that preserves a couple of cool features. The best feature: no application server restarts required, just hit F5. Works for Java source changes as well. Clearly, I’m not alone in viewing the Java server side world as old and bloated. Obviously it lacks a bit in functionality. But that’s easily fixed. I wonder how this combines with a decent dependency injection framework. My guess is not well, because dependency injection frameworks require a context (i.e.) state to be maintained and Play is designed to be stateless (like Django). Basically, each save potentially invalidates the context require a full reload of that as well (i.e. a server restart). Seems the play guys have identified the pain point in Java: server side state comes at a price.

maven: good ideas gone wrong

I’ve had plenty of time to reflect on the state of server side Java, technology, and life in general this week. The reason for all this extra ‘quality’ time was because I was stuck in an endless loop waiting for maven to do its thing, me observing it failed in subtly different ways, tweaking some more, and hitting arrow up+enter (repeat last command) and fiddling my thumbs for two more minutes. This is about as tedious as it sounds. Never mind the actual problem, I fixed it eventually. But the key thing to remember here is that I lost a week of my life on stupid book keeping.

On to my observations:

  • I have more xml in my maven pom files than I ever had with my ant build.xml files four years ago, including running unit tests, static code checkers, packaging jars & installers, etc. While maven does a lot of things when I don’t need them to happen, it seems to have an uncanny ability to not do what I want when I need it to or to first do things that are arguably redundant and time consuming.
  • Maven documentation is fragmented over wikis, javadoc of diverse plugins, forum posts, etc. Google can barely make sense of it. Neither can I. Seriously, I don’t care about your particular ideology regarding elegance: just tell me how the fuck I set parameter foo on plugin bar and what its god damn default is and what other parameters I might be interested in exist.
  • For something that is supposed to save me time, I sure as hell am wasting a shit load of time on making it do what I want and watching it do what it does (or not), and fixing the way it works. I had no idea compiling & packaging less than 30 .java files could be so slow.
  • Few people around me dare to touch pom files. It’s like magic and I hate magicians myself. When it doesn’t work they look at me to fix it. I’ve been there before and it was called ant. Maven just moved the problem and didn’t solve a single problem I had five years ago while doing the exact same shit with ant. Nor did it make it any easier.
  • Maven compiling, testing, packaging and deploying defeats the purpose of having incremental compilation and dynamic class (re)loading. It’s just insane how all this application server deployment shit throws you back right to the nineteen seventies. Edit, compile, test, integration-test, package, deploy, restart server, debug. Technically it is possible to do just edit, debug. Maven is part of the problem here, not of the solution. It actually insists on this order of things (euphemistically referred to as a life cycle) and makes you jump through hoops to get your work done in something resembling real time.
  • 9 out of 10 times when maven enters the unit + integration-test phase, I did not actually modify any code. Technically, that’s just a waste of time (which my employer gets to pay for). Maven is not capable of remembering the history of what you did and what has changed since the last time you ran it so like any bureaucrat it basically does maximum damage to compensate for its ignorance.
  • Life used to be simple with a source dir, an editor, a directory of jars and an incremental compiler. Back in 1997, java recompiles took me under 2 seconds on a 486, windows NT 3.51 machine with ‘only’ 32 MB, ultra edit, an IBM incremental java compiler, and a handful of 5 line batch files. Things have gotten slower, more tedious, and definitely not faster since then. It’s not like I have much more lines of code around these days. Sure, I have plenty of dependencies. But those are run-time resolved, just like in 1997, and are a non issue at compile time. However, I can’t just run my code but I have to first download the world, wrap things up in a jar or war, copy it to some other location, launch some application server, etc. before I am in a position to even see if I need to switch back to my editor to fix some minor detail.
  • Your deployment environment requires you to understand the ins and outs of how where stuff needs to be deployed, what directory structures need to be there, etc. Basically if you don’t understand this, writing pom files is going to be hard. If you do understand all this, pom files won’t save you much time and will be tedious instead. You’d be able to write your own bash scripts, .bat files or ant files to achieve the same goals. Really, there’s only so many ways you can zip a directory into a .jar or .war file and copy them over from A to B.
  • Maven is brittle as hell. Few people on your project will understand how to modify a pom file. So they do what they always do, which is copy paste bits and pieces that are known to more or less do what is needed elsewhere. The result is maven hell. You’ll be stuck with no longer needed dependencies, plugins that nobody has a clue about, redundant profiles for phases that are never executed, half broken profiles for stuff that is actually needed, random test failures. It’s ugly. It took me a week to sort out the stinking mess in the project I joined a month ago. I still don’t like how it works. Pom.xml is the new build.xml, nobody gives a shit about what is inside these files and people will happily copy paste fragments until things work well enough for them to move on with their lives. Change one tiny fragment and all hell will break loose because it kicks the shit out of all those wrong assumptions embedded in the pom file.

Enough whining, now on to the solutions.

  • Dependency management is a good idea. However, your build file is the wrong place to express those. OSGI gets this somewhat right, except it still externalizes dependency configuration from the language. Obviously, the solution is to integrate the component model into the language: using something must somehow imply depending on something. Possibly, the specific version of what you depend on is something that you might centrally configure but beyond that: automate the shit out of it, please. Any given component or class should be self describing. Build tools should be able to figure out the dependencies without us writing them down. How hard can it be? That means some none existing language to supersede the existing ones needs to come in existence. No language I know of gets this right.
  • Compilation and packaging are outdated ideas. Basically, the application server is the run-time of your code. Why doesn’t it just take your source code, derive its dependencies and runs it? Every step in between editing and running your code is a potential to introduce mistakes & bugs. Shortening the distance between editor and run-time is good. Compilation is just an optimization. Sure, it’s probably a good idea for the server to cache the results somewhere. But don’t bother us with having to spoon feed it stupid binaries in some weird zip file format. One of the reasons scripting languages are so popular is because it reduces the above mentioned cycle to edit, F5, debug. There’s no technical reason whatsoever why this would not be possible with statically compiled languages like java. Ideally, I would just tell the application server the url of the source repository, give it the necessary credentials and I would just be alt tabbing between my browser and my editor. Everything in between that is stupid work that needs to be automated away.
  • The file system hasn’t evolved since the nineteen seventies. At the intellectual level, you modify a class or lambda function or whatever and that changes some behavior in your program, which you then verify. That’s the conceptual level. In practice you have to worry about how code gets translated into binary (or asciii) blobs on the file system, how to transfer those blobs to some repository (svn, git, whatever), then how to transfer them from wherever they are to wherever they need to be, and how they get picked up by your run-time environment. Eh, that’s just stupid book keeping, can I please have some more modern content management here (version management, rollback, auditing, etc.)? Visual age actually got this (partially) right before it mutated into eclipse: software projects are just databases. There’s no need for software to exist as text files other than nineteen seventies based tool chains.
  • Automated unit, integration and system testing are good ideas. However, squeezing them in between your run-time and your editor is just counter productive. Deploy first, test afterwards, automate & optimize everything in between to take the absolute minimum of time. Inserting automated tests between editing and manual testing is a particularly bad idea. Essentially, it just adds time to your edit debug cycle.
  • XML files are just a fucking tree structures serialized in a particularly tedious way. Pom files are basically arbitrary, schema less xml tree-structures. It’s fine for machine readable data but editing it manually is just a bad idea. The less xml in my projects, the happier I get. The less I need to worry about transforming tree structures into object trees, the happier I get. In short, lets get rid of this shit. Basically the contents of my pom files is everything my programming language could not express. So we need more expressive programming languages, not entirely new ones to complement the existing ones. XML dialects are just programming languages without all of the conveniences of a proper IDE (debuggers, code completion, validation, testing, etc.).

Ultimately, maven is just a stop gap. And not even particularly good at what it does.

update 27 October 2009

Somebody produced a great study on how much time is spent on incremental builds with various build tools. This stuff backs my key argument up really well. The most startling out come:

Java developers spend 1.5 to 6.5 work weeks a year (with an average of 3.8 work weeks, or 152 hours, annually) waiting for builds, unless they are using Eclipse with compile-on-save.

I suspect that where I work, we’re close to 6.5 weeks. Oh yeah, they single out maven as the slowest option here:

It is clear from this chart that Ant and Maven take significantly more time than IDE builds. Both take about 8 minutes an hour, which corresponds to 13% of total development time. There seems to be little difference between the two, perhaps because the projects where you have to use Ant or Maven for incremental builds are large and complex.

So anyone who still doesn’t get what I’m talking about here, build tools like maven are serious time wasters. There exist tools out there that reduce this time to close to 0. I repeat, Pyhton Django = edit, F5, edit F5. No build/restart time whatsoever.

Server side development sucks

Warning: long rant 🙂

The past half year+ I’ve been ‘enjoying’ myself with lots of technical things related to working with Java, the spring framework, maven, unit testing and lots of command line stuff. And while I like my job, I have to say: it can be enormously tedious from time to time.

If you are coding Java in Eclipse, life is good. Eclipse is enormously helpful and gives you more or less real time feedback on how you are doing with respect to warnings, compilation errors, etc. This is great because it saves you from manual edit compile debug fix cycles, which take time and are frustrating. Frustrating however is what I’d call the current state of server side Java, which takes place mostly outside Eclipse. I spend a shit load of time on a day to day basis waiting for my application server to catch up with what I did in Eclipse, only to find that some minor typo is blocking my progress. So shut down server, edit, compile, package, deploy, wait a minute or so, get the server in the same state you had before and see if it works. Repeat, endlessly. That’s more or less my day.

There’s lots of tools to make this easier. Maven is nice, but it is also a huge time waster with its insistence on checking for updates of everything you have every time you try to use it (20 dependencies is 20 GET requests to your maven repository) and on top of that running the test suite as well. Useful, except if you’ve done this 20 times in the last hour already and you are pretty damn sure you are not interested in whether it will pass this time since you just want to know if that one typo you fixed was actually good enough. Then there are ways of hooking up application servers to eclipse and making them be somewhat more reasonable about requiring full shutdown, deploy and restart. Still it is tedious. And it doesn’t help that maven is completely oblivious about this feature, leaving you to set up things manually. Or not, since that’s not exactly trivial.

The maven way is “our way or the highway”. Eclipse and application servers are part of the highway and while there is the maven eclipse plugin and the eclipse maven plugin (yes, you read that right), the point of both is that there is a (huge) gap to bridge. They each have their own idea of where source code and binaries go and where dependencies come from, or indeed where stuff gets deployed to be debugged. Likewise, maven’s idea of application server integration is wrapping the start up process with some plugin. It does nothing to speed up the actual process of getting the application server up and running. It just saves you from having to start it manually. Eclipse plugins exist to do the same. And of course getting the maven and eclipse plugins to play nice with each other is kind of a challenge. Essentially, there are three worlds: maven, eclipse, and the application server and you’ll lose shit loads of development time watching one catch up with the other.

I have in the back of my mind still last year’s project which was based on python Django. Last year, my job was like this: edit, alt+tab to browser, F5, test, alt+tab back to eclipse, fix, etc. I had 1-3 second roundtrips between my browser and editor, apache was loading the python files straight from my svn work directory. We had a staging server that updated with a cron job that did nothing but “svn up”. Since then I’ve had the pleasure of debating the merits of using dynamic languages in a server side environment in a place where everybody is partying on the Java bandwagon like it’s 1999. Well here it is: you spend a shitload less time waiting for maven or application servers to finish whatever they think needs to be done. On top of that, quite a lot of server side Java is actually scripting. Don’t fool yourself into believing otherwise. We use spring web flow, which means my application logic is part Java, part xhtml with jsf (and a half dozen xml name spaces for that), part xml definitions for webflow, part definitions for spring beans and part Javascript. Guess where you can find bugs: all of them. Guess which ones Eclipse actually provides real time feedback on: Java only. So basically all the disadvantages of using a scripting environment (run-time bug discovery) without most of the advantages (like fast edit-test round trips).

So it is not surprising to me that scripting languages are winning over a lot of people lately. You write less code, in less languages, and on top you get more time to spend editing it because you are not waiting for tools and servers to catch up with your editor. This matters a lot and the performance and scalability benefits of Java are melting away rapidly lately.

On top of that, when I look at what we do, which is really straightforward web & REST stuff with a mysql db, and what a shitload of code, magic producing tools and frameworks, complexity, etc. we end up with something feels terribly wrong. We use hibernate for our database layer. Great stuff, until it doesn’t do what you want and starts basically throwing pretty random RunTimeExceptions at you because you forgot to add a column in your database schema (thus reducing Java to a scripting language because all of this happens run-time).

We have sort of a three way impedence mismatch going here straight out of the cookbook of Enterprise Architecture. Our services speak DTOs (data transfer objects), the database layer needs model classes and the database itself speaks SQL. So a typical REST call will go like this: xml/json comes in, gets translated to dtos, which are manipulated and get translated into model objects, which are manipulated, which results in sql being sent to the database, records coming back and translated into model objects, dto’s and back to xml/json. Basically stuff can go wrong in any of these transitions and a lot of development is basically babysitting your application through all these transitions instead of writing actual application logic.

To make this work, we need mappings from dto’s to model classes and from there to the database. So to add 1 field: I have to edit a model class, update the dto class, edit the mapping from models to dtos, the mapping from models to databases, the database schema itself. Then I have to write tests that verify all the mappings still work correctly and adapt any depending tests. 1 field, about a dozen of files touched. Re-fucking-diculous in my view. This is made ‘easier’ with Dozer that maps object hierarchies to each other, hibernate that takes care of the database and which uses annotations that make magic happen around the places where you use them, resteasy that does all of the incoming and outgoing xml/json magic, maven to download the world (the number of dependencies we have 30+). All to get one really straightforward REST service + 8 table mysql database going.

In short, I kind of miss the days where server side java development meant servlets+jdbc: read parameters from request, do some select/update query, write some stuff to the response. It might lack elegance but you get the same job done with a fraction of the number of components and without having to wait for Spring to figure out how to instantiate a bazillion little objects that go into your application context. I kind of miss the simplicity of edit, f5, edit.

Anyway, end of rant.