On Java, Json, and complexity

Time for a long overdue Saturday morning blog post.

Lately, I’ve been involved in some interesting new development at work that is all about key value stores, Json, Solr and a few other technologies I might have mentioned a few times. In other words, I’m having loads of fun at work doing stuff I really like. We’ve effectively tossed out XML and SQL and are now all about Json blobs that live in key value stores and are indexed in Solr. Nothing revolutionary, unless you have been exposed to the wonderfully bloated and complicated world of Java Enterprise Edition and associated technologies. I’ve found that there are a bunch of knee-jerk reflexes that bias Java developers towards doing all the wrong things here. This is symptomatic of the growing gap between the enterprise Java people on one side and the cool hip kids wielding cool new languages and tools on the other.

The problem with Json in Java

One of the things I’ve been struggling with recently is Json. Json and the Java world are not a perfect marriage. The whole point of Json is that it is, give or take a few literals such as true, false, and null, valid JavaScript and very close to the literal syntax of Python and Ruby. That doesn’t mean that you should go off and execute it, but it does mean that after parsing (or rather unmarshalling) you end up with the dictionaries, lists, and native types that each of those languages supports. This means you can have lightweight data structures that you can serialize to and from Json at will. In Python, objects are essentially dictionaries of attributes, and dictionaries and lists are native types. That makes Json a very natural serialization of any Python object structure. Javascript doesn’t have classes since it is a prototype based language, but lists and dictionaries are native types there as well and objects are themselves dictionaries. I’m not a Rubyist, but I know enough of Ruby to know that it is similar to Python and JavaScript in this respect.

Sadly, this is not the case in the Java world. Dictionaries and lists are not native types and instead people use the Collections framework (or alternatives such as Google Collections). The classes in those frameworks are of course generic these days, and Java classes bear very little resemblance to dictionaries. That means Json is not a very natural serialisation for a Java based object structure. Frameworks exist to address this through mappings, which can be convenient. But this reduces the Json approach to yet another way to serialise Java objects rather than a beautifully simple and elegant solution to modeling your data. And that’s where the problem lies.

My main problem with doing things using mapping is that it’s not exactly lightweight anymore. Suddenly you need to worry about instantiating classes and stuffing the right bits of data in the right setters and constructors. Then inevitably you start writing some hashCode and equals methods and of course generics creep in somewhere along the way (or even worse, inheritance). Then you generalize and add some interfaces, annotations, etc. Congratulations: your easy to read 50 line Json file representing a simple dictionary has magically turned into a framework with dozens of classes and thousands of lines of code that does nothing other than represent what any JavaScript, Python, or Ruby programmer would trivially accomplish with a few lines of code. That’s some serious bloat. It’s the reason countless software engineers have been fleeing the Java enterprise world in favor of more sane worlds such as Python Django, Ruby on Rails, or even Node.js. It makes me want to flee as well.

The solution: do what others do and use Json

I’ve been fighting this bloat & complexity over the past few months and have been stubbornly reducing the number of model classes we have to 0. Instead I simply use the classes provided by the Gson framework (which is one of the popular options in the Java world for dealing with Json) to represent the core Json objects, i.e. JsonObject and JsonArray plus the four primitive types (int, double, string, boolean). Where in the past I would have been constructing model classes and instantiating those, I instead do a “new JsonObject()” and start modifying it through its methods for adding JsonObjects, JsonArrays, or one of the four primitive types. This is different from what most Json frameworks in Java try to achieve, which is providing convenient ways to map between Json and your model classes. What I want is to get rid of model classes.

You can lose some serious amounts of source code this way. People seem terribly concerned with hiding Json from the Java programmer behind layers of indirection. What they don’t realize is that there is a direct correlation between lines of code and complexity on one hand and maintenance cost on the other hand. Json world: you are adding string “foo” to a dictionary. Java world: you are creating an instance of class Foo and another of FooHolder<Foo>. Then you add your instance of Foo to the FooHolder and then you serialize it to Json. I’m exaggerating of course, but I’ve seen people write code that essentially boiled down to setting a few properties on a dictionary and yet somehow required 10+ classes and an afternoon of coding to accomplish. That feels very wrong to me.
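To make the dictionary side of that comparison concrete, here is a minimal sketch that uses only the Java standard library. It is not Gson and not the code from our project: the names JsonSketch, buildDocument, toJson, and quote are made up for illustration, and a LinkedHashMap plus a List stand in for the role that JsonObject and JsonArray play in the approach described above. The tiny serializer only handles the cases shown and is no substitute for a real Json library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: a document built from plain maps and lists,
// with no model classes involved.
final class JsonSketch {

    // "Json world": put a few values into a dictionary.
    static Map<String, Object> buildDocument() {
        Map<String, Object> doc = new LinkedHashMap<>(); // keeps insertion order
        doc.put("name", "foo");
        doc.put("count", 2);
        doc.put("active", true);
        doc.put("tags", new ArrayList<Object>(Arrays.asList("a", "b")));
        return doc;
    }

    // Deliberately small serializer: maps, lists, strings, numbers, booleans, null.
    // Assumption: numbers are finite (NaN and infinity are not valid Json).
    static String toJson(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof String) {
            return quote((String) value);
        }
        if (value instanceof Number || value instanceof Boolean) {
            return value.toString();
        }
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            boolean first = true;
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                if (!first) {
                    sb.append(',');
                }
                first = false;
                sb.append(quote(String.valueOf(entry.getKey())))
                  .append(':')
                  .append(toJson(entry.getValue()));
            }
            return sb.append('}').toString();
        }
        if (value instanceof List) {
            StringBuilder sb = new StringBuilder("[");
            boolean first = true;
            for (Object item : (List<?>) value) {
                if (!first) {
                    sb.append(',');
                }
                first = false;
                sb.append(toJson(item));
            }
            return sb.append(']').toString();
        }
        throw new IllegalArgumentException("Unsupported type: " + value.getClass());
    }

    // Escapes only backslashes and double quotes; a real library handles
    // control characters and unicode escapes as well.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        // prints {"name":"foo","count":2,"active":true,"tags":["a","b"]}
        System.out.println(toJson(buildDocument()));
    }
}
```

The part that matters is buildDocument: adding a field is one put call, and no Foo or FooHolder<Foo> has to exist before the data can be written out.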

Problems with the Java language

The problem is actually quite subtle and strongly related to a few areas where Java is lacking relative to more modern languages:

Principles I stick to

So, does that mean I’d love to switch languages to something more modern? Absolutely, I’m bored out of my mind with Java! But sadly, that’s not likely to happen on any of the projects I’m currently involved with and I have to work with what I have. So, instead I’ve started to resist the traditional Java way since I believe a lot of these practices to be outdated and counterproductive. I have started to incorporate and borrow patterns and idioms from elsewhere.

It’s not all good of course and there are downsides to rebelling against the Java way:

Why this is important

I’ve been refactoring quite a bit of code using the above principles and I can report that I have shed many lines of code along the way while improving readability, reducing complexity, and gaining flexibility. My background as a software engineering researcher tells me this is a good thing (TM) because lines of code directly correlate to software development cost, complexity, and maintenance cost. Therefore, fewer lines of code and less complexity are almost always the right thing to aim for unless you can convert the investment in cost into added value. More lines of code is not necessarily the same as adding value. If it is not adding value, it is in fact technical debt.

I’ve seen a lot of Java projects lately that are needlessly complicated and hard to maintain while trying to achieve seemingly simple goals that involve accepting some data, stuffing it in a data store, retrieving it and returning it via some API. CRUD in other words. CRUD is simple, but not if it has to go through several layers of indirection that involve mapping XML to data transfer objects and then to model objects, and then to SQL tables and back again. The system we are replacing with a new system based on the principles outlined here does this. Its business logic amounts to taking in data, validating it, storing it, mashing it together in an interesting way and retrieving it. Its business logic should also include a bunch of other things that we never got around to because we got bogged down in non value adding activities related to mapping data in different ways and dealing with over engineered & bloated technologies that got abused in a terrible way by some junior engineers who thought they were in a candy store. So I’ve gone through months of head shaking, swearing a lot, silently cursing misc. individuals no longer on the team, and mostly deleting code to replace it with small amounts of new code. In other words, I’ve taken a system that pretty much tramples all over the listed principles and am working hard on building a replacement that doesn’t do that.