JRuby and Java at Localstream

Update: I’ve uploaded a skeleton project with the setup described in this post to GitHub. The presentation I gave on this at Berlin Startup Culture on May 21st can be found here.

The server side code in the Localstream platform is a mix of JRuby and Java code. Over the past few months, I’ve gained a lot of experience using the two together and making the most of the idioms and patterns in both worlds.

Ruby purists might wonder why you’d want to use Java at all. Likewise, Java purists might wonder why you’d bother with JRuby instead of more hipster-friendly languages such as Scala, Clojure, or Kotlin. In this article I want to steer clear of that debate and instead focus on more productive things: what we use for deployment, dependency management, dependency injection, configuration, and logging. It is also an opportunity to introduce two of my new GitHub projects:

The Java ecosystem provides a lot of good, very well supported technology. This includes the JVM itself, but also libraries such as Google’s Guava, various Apache libraries such as httpclient, commons-lang, commons-io, and commons-compress, the Spring framework, icu4j, and many others. Equivalents exist for Ruby, but mostly they leave a lot to be desired in terms of features, performance, design, etc. It didn’t take me long to conclude that a lot of the Ruby stuff out there is sub-standard and not quite up to my expectations. That’s why I use JRuby instead of Ruby: it gives me the best of both worlds. The value of Ruby is in its simplicity and the language itself. The value of Java is access to an enormous amount of good software. JRuby allows me to have both.


Maven: the way forward

A bit longer post today. My previous blog post set me off pondering a couple of things I had been thinking about before, which fit together nicely into a potential way forward. In that previous post and also in this post, I spent a lot of words criticizing maven. People could fairly accuse me of blaming maven. However, that would be the wrong way to read my criticism. There’s nothing wrong with maven itself; it just annoys the hell out of me that it is needed and that I need to spend so much time waiting for it. In my view, maven is a symptom of a much bigger underlying problem: the Java server side world (or rather the entire solution space for pretty much all forms of development) is bloated with tools, frameworks, application servers, and other stuff designed to work around small problems in each other. Together, they sort of work, but it isn’t pretty. What if we wiped all of that away, very much like the Sun people did when they designed Java 20 years ago? What would be different? What would be the same? Of course, I cannot look at this topic separately from my previous career as a software engineering researcher. In my view, a lot of developments from the past 20 years are now converging into something that could radically improve on the existing state of the art. However, I’m not aware of any specific projects taking on this issue in full, even though a lot of people are working on parts of the solution. What follows are essentially my thoughts on a range of topics, taking Java (the platform, not necessarily the language) as a base and exploring how I would like to see the platform morph into something worthy of the past 40 years of research and practice.

Architecture

Let's start with the architecture level. Java packages were a mistake, which is now widely acknowledged. .NET namespaces are arguably better, and OSGi bundles, with explicitly required and provided APIs as well as API versioning, are better still. To scale software into the cloud, where it must coexist with other software, including different (or identical) versions of itself, we need to get a grip on architecture.

The subject has been studied extensively (see here for a nice survey of some description languages), and I see OSGi as the most successful implementation to date of features that most other development platforms currently lack, omit, or half improvise. The main issue with OSGi is that it layers stuff on top of Java but is not really a part of it. Hence you end up with a mix of manifest files that go into jar files; annotations that go into your source code; and cruft in the form of framework extensions to hook everything up, complete with duplicate functionality for logging, publish/subscribe patterns, and even web service frameworks. The OSGi people are moving towards a more declarative approach. Take this to its ultimate conclusion and you end up with language level support for basically everything OSGi is trying to do: explicit provided and required APIs, API versioning, events, dynamic loading/unloading, and isolation.
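To make explicit required APIs with versioning more concrete, here is a minimal sketch in plain Java (records, so Java 16 or later). It is not the OSGi API: `ApiResolver`, `Version`, `Range`, and `resolve` are hypothetical names. It only borrows the interval notation OSGi manifests use for version ranges, such as `[1.2,2.0)`, and picks the highest provided version that satisfies a required range.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class ApiResolver {
    /** A major.minor.micro version (OSGi versions also allow a qualifier, omitted here). */
    record Version(int major, int minor, int micro) implements Comparable<Version> {
        static Version parse(String s) {
            String[] p = s.trim().split("\\.");
            return new Version(Integer.parseInt(p[0]),
                    p.length > 1 ? Integer.parseInt(p[1]) : 0,
                    p.length > 2 ? Integer.parseInt(p[2]) : 0);
        }

        @Override
        public int compareTo(Version o) {
            if (major != o.major) return Integer.compare(major, o.major);
            if (minor != o.minor) return Integer.compare(minor, o.minor);
            return Integer.compare(micro, o.micro);
        }
    }

    /** A version interval such as "[1.2,2.0)": '[' and ']' are inclusive, '(' and ')' exclusive. */
    record Range(Version low, boolean lowInclusive, Version high, boolean highInclusive) {
        static Range parse(String s) {
            String[] bounds = s.substring(1, s.length() - 1).split(",");
            return new Range(Version.parse(bounds[0]), s.charAt(0) == '[',
                    Version.parse(bounds[1]), s.charAt(s.length() - 1) == ']');
        }

        boolean includes(Version v) {
            int lo = v.compareTo(low);
            int hi = v.compareTo(high);
            return (lowInclusive ? lo >= 0 : lo > 0) && (highInclusive ? hi <= 0 : hi < 0);
        }
    }

    /** Picks the highest provided version of an API that falls inside the required range. */
    static Optional<Version> resolve(List<String> providedVersions, String requiredRange) {
        Range range = Range.parse(requiredRange);
        return providedVersions.stream()
                .map(Version::parse)
                .filter(range::includes)
                .max(Comparator.naturalOrder());
    }
}
```

With providers at 1.1.0, 1.4.2, and 2.0.0, a requirement of `[1.2,2.0)` resolves to 1.4.2. A real resolver also deals with qualifiers, many packages at once, and conflicting constraints; the point is only that the requirement becomes data the platform can check rather than a convention.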

A nice feature of Java that OSGi relies on is the class loader. When used properly, it allows you to create a class loader, let it load classes, execute the functionality, and then discard the class loader, after which everything it loaded can be garbage collected. This is nice both for dynamic loading and unloading of functionality and for isolating functionality (for security and stability reasons). OSGi heavily depends on this feature, and many application servers try to use it as well. However, the mechanisms are not exactly bullet proof, and there are enormous problems with e.g. memory leaks, which make engineers very conservative about relying on these mechanisms in a live environment.
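The load, run, discard cycle can be sketched with the JDK’s own `URLClassLoader`. This is an illustration under assumptions, not how OSGi frameworks or application servers implement it: `IsolatedRunner` and `runIsolated` are hypothetical names, the plugin is assumed to be a directory of compiled classes whose entry class has a public no-argument constructor, and the platform class loader (Java 9+) is used as parent so the plugin sees JDK classes but not the application’s own.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;

public class IsolatedRunner {
    /**
     * Loads className from pluginDir in a fresh class loader, runs it if it is a
     * Runnable, then closes the loader. Returns true if the class was defined by the
     * new loader itself rather than delegated to a parent loader.
     */
    static boolean runIsolated(Path pluginDir, String className) throws Exception {
        URL[] urls = { pluginDir.toUri().toURL() };
        // close() releases open jar files. The classes are only unloaded once the loader,
        // its classes, and all their instances are unreachable; one lingering reference
        // (a static cache, a thread, a registered listener) is enough to leak all of it.
        // Platform loader as parent: JDK classes are shared, application classes stay invisible.
        try (URLClassLoader loader = new URLClassLoader(urls, ClassLoader.getPlatformClassLoader())) {
            Class<?> cls = Class.forName(className, true, loader);
            Object instance = cls.getDeclaredConstructor().newInstance();
            if (instance instanceof Runnable) {
                ((Runnable) instance).run();
            }
            return cls.getClassLoader() == loader;
        }
    }
}
```

Asking this loader for one of the application’s own classes fails with `ClassNotFoundException`, which is the isolation; asking for a JDK class such as `java.util.ArrayList` succeeds through delegation to the parent.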

More recently, people have started to use dependency injection, where the need for something is expressed in the code (e.g. with an annotation) or externally in some configuration file. Then at run time a dependency injection container tries to fulfill the dependencies by creating the right objects and injecting them. Dependency injection improves testability and modularization enormously.
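A toy container shows the mechanics: dependencies are declared as constructor parameters, and at run time the container creates objects in the right order and passes them in. `TinyContainer`, its `bind`/`get` methods, and the example types are hypothetical and far simpler than Spring or Guice (no annotations, no scopes, no cycle detection, and exactly one constructor per class is assumed).

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

public class TinyContainer {
    private final Map<Class<?>, Class<?>> bindings = new HashMap<>();
    private final Map<Class<?>, Object> singletons = new HashMap<>();

    /** Declares which implementation satisfies a required interface. */
    <T> TinyContainer bind(Class<T> api, Class<? extends T> implementation) {
        bindings.put(api, implementation);
        return this;
    }

    /** Returns the single instance of type, creating it and its dependencies on first use. */
    <T> T get(Class<T> type) {
        Object existing = singletons.get(type);
        if (existing != null) {
            return type.cast(existing);
        }
        Class<?> impl = bindings.getOrDefault(type, type);
        if (impl.isInterface()) {
            throw new IllegalStateException("No binding for " + type.getName());
        }
        try {
            // Assumes exactly one constructor; its parameters are the declared dependencies.
            Constructor<?> ctor = impl.getDeclaredConstructors()[0];
            Class<?>[] paramTypes = ctor.getParameterTypes();
            Object[] args = new Object[paramTypes.length];
            for (int i = 0; i < args.length; i++) {
                args[i] = get(paramTypes[i]); // resolve dependencies first (no cycle detection)
            }
            ctor.setAccessible(true);
            T instance = type.cast(ctor.newInstance(args));
            singletons.put(type, instance);
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create " + type.getName(), e);
        }
    }

    // Hypothetical example types.
    interface MessageStore { String load(); }

    static class InMemoryStore implements MessageStore {
        public String load() { return "hello"; }
    }

    static class Greeter {
        private final MessageStore store;
        Greeter(MessageStore store) { this.store = store; }
        String greet() { return store.load() + ", world"; }
    }
}
```

After `bind(MessageStore.class, InMemoryStore.class)`, calling `get(Greeter.class)` builds the store first and injects it. The testability gain comes from `Greeter` never constructing its own `MessageStore`: a unit test can simply call `new Greeter(fakeStore)` without any container.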

A feature of maven that people seem to like is its way of dealing with dependencies. You express what you need in the pom file and maven fetches the needed stuff from a repository. The maven, OSGi, and Spring combo is about to happen. When it does, you’ll be specifying dependencies in four different places: Java imports, annotations, the pom file, and the OSGi manifest. But still, I think the combined feature set is worth having.

Language

Twenty years ago, Java was a pretty minimalistic language that took basically the best of the preceding 20 years of OO languages and kept a useful subset. Inevitably, a lot got discarded or was not considered at all. Some mistakes were made, and over time the language absorbed some less than perfect versions of the stuff that didn’t make it. So, Java has no language support for properties; this was sort of bolted on through the getter/setter convention introduced in JavaBeans. It has inner classes instead of closures and lambda functions. It has no real generics (parameterizable types), but rather a complicated layer of syntactic sugar that gets compiled to non-generic code. The initial concurrency concepts in the language were complex, broken, and dangerous to use. Subsequent versions tweaked the semantics and added some useful things like the java.util.concurrent package. The language is overly verbose, and 20 years after the fact there is now quite a bit of competition from languages that don’t suffer from any of this. The good news is that most of those have implementations on top of the JVM. Let's not let this degenerate into a language war, but clearly the language needs a proper upgrade. IMHO Scala could be a good direction, but it too already has some compromises embedded and lacks support for the architectural features discussed above. Message passing and functional programming concepts are now seen as important features for scalability. These are tedious at best in Java, whereas Scala supports them well while also providing a much more concise syntax. Let's just say a replacement of the Java language is overdue. On the other hand, it would be wrong to pick any single language as the language. Both .NET and the JVM are routinely used as generic runtimes for all sorts of languages. There’s also the LLVM project, a compiler tool chain that includes dynamic compilation in a VM as an option for basically anything GCC can compile.
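Two of the points above are easy to see in code. This sketch (class and method names are illustrative) shows that `List<String>` and `List<Integer>` share a single runtime class because type arguments are erased by the compiler, and contrasts the anonymous inner class idiom with a lambda. Note that lambdas only arrived in Java 8, well after this post was written.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntBinaryOperator;

public class ErasureDemo {
    /** Both lists share one runtime class: the type arguments do not exist at run time. */
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        return strings.getClass() == numbers.getClass();
    }

    /** The inner class style: an anonymous class standing in for a function. */
    static IntBinaryOperator addWithInnerClass() {
        return new IntBinaryOperator() {
            @Override
            public int applyAsInt(int a, int b) {
                return a + b;
            }
        };
    }

    /** The same behaviour as a lambda. */
    static IntBinaryOperator addWithLambda() {
        return (a, b) -> a + b;
    }
}
```

Both operators return 5 for `applyAsInt(2, 3)`; the difference is purely how much ceremony the language demands to express one function.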

Artifacts should be transient

So we now have a hypothetical language with support for all of the above. Let's not linger on the details and move on to deployment and run time. The word compile comes from the early days of computing, when people had to punch holes into cards and then compile those into stacks and hand feed them to big, noisy machines. In other words, compilation is a tedious and necessary evil. Java popularized the notion of just in time compilation and partial, dynamic compilation. The main difference is that just in time compilation merely moves the compilation step to the moment the class is loaded, whereas dynamic compilation goes a few steps further and takes run-time context into account to decide if and how to compile. IDEs tend to compile on the fly while you edit. So why bother with compilation after you finish editing and before you need to load your classes? There is no real technical reason to compile ahead of time beyond the minor one-time effort that might affect start up. You might want the option to do this, but it should not be the default.

So, for most applications, the notion of generating binary artifacts before they are needed is redundant. If nothing needs to be generated, nothing needs to be copied or moved either. This is true for compiled and interpreted languages alike. A modern Java system uses a binary intermediate format (bytecode) that is generated before run-time. That too is redundant. If you have dynamic compilation, you can just take the source code and execute it (generating any artifacts needed for that on the fly). You can still compile in the IDE for validation and static analysis purposes. The distinction between interpreted and statically compiled languages has become outdated, and, as scripting languages show, not having to juggle binary artifacts simplifies life quite a bit. In other words, development artifacts (other than the source code) are transient, and with the transformation from code to running code automated and happening at run time, they should no longer be a consideration.
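Executing straight from source, with the bytecode as a throwaway by-product, can be approximated today with the JDK’s `javax.tools` compiler API. This is a minimal sketch, not a production design: `CompileOnLoad` and `loadFromSource` are hypothetical names, it requires a JDK (not just a JRE) and Java 11 or later for `Files.writeString`, and it writes class files to a temporary directory instead of keeping them in memory.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class CompileOnLoad {
    /**
     * Compiles a single public class from source at run time and returns a new instance.
     * The .class file is a disposable by-product in a temp directory, not something to
     * package, copy, or deploy.
     */
    static Object loadFromSource(String className, String source) throws Exception {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler: run this on a JDK");
        }
        Path workDir = Files.createTempDirectory("compile-on-load");
        Path sourceFile = workDir.resolve(className + ".java");
        Files.writeString(sourceFile, source);

        // Equivalent to `javac -d workDir sourceFile`; a non-zero status means errors,
        // which javac has already reported on System.err.
        int status = compiler.run(null, null, null, "-d", workDir.toString(), sourceFile.toString());
        if (status != 0) {
            throw new IllegalArgumentException("Compilation failed for " + className);
        }

        // The loader stays open because the returned instance may still need it.
        URLClassLoader loader = new URLClassLoader(
                new URL[] { workDir.toUri().toURL() }, CompileOnLoad.class.getClassLoader());
        return loader.loadClass(className).getDeclaredConstructor().newInstance();
    }
}
```

For example, passing the source of a public class that implements `java.util.function.Supplier<String>` returns an instance whose `get()` can be called right away; nothing is packaged or copied anywhere.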

That means no more build tools.

Without the need to transform artifacts ahead of run-time, the need for tools doing and orchestrating this also changes. Much of what maven does is basically generating, copying, packaging, and gathering artifacts. An artifact in maven is just a euphemism for a file. This is actually pretty stupid work. With all of those artifacts redundant, why keep maven around at all? The answer to that is of course testing and continuous integration, as well as application life cycle management and other good practices (like generating documentation). Except that lots of other tools are involved with that as well. Your IDE is where you’d ideally review problems and issues. Something like Hudson, working together with your version management tooling, is where you’d expect continuous integration to take place, and application life cycle management is part of your deployment environment. Architectural features of the language and run-time, combined with good built-in application and component life cycle management, remove much of the need for external tooling to support all this and improve interoperability.

Source files need to go as well

VisualAge and Smalltalk pioneered the notion of non-file-based program storage, where you modify the artifacts in some kind of database. Intentional programming (IP) research is basically about the notion that programs are just interpretations of more abstract things that get transformed (just in time) into executable code or into different views (editable in some cases). Martin Fowler has long been advocating IP and what he refers to as the language workbench. In a nutshell, if you stop thinking of development as editing a text file and start thinking of it as manipulating abstract syntax trees with a variety of tools (e.g. rename refactoring), you sort of get what IP and language workbenches are about. Incidentally, concepts such as APIs, API versions, and provided & required interfaces are quite easily implemented in a language workbench like environment.
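To illustrate treating a program as a tree rather than as text, here is a deliberately tiny sketch (Java 17: sealed interfaces, records, pattern matching for `instanceof`). The node types, `rename`, and `render` are hypothetical and much poorer than a real language workbench’s model; in particular, this rename ignores scoping. The point is that a rename refactoring becomes a structural transformation, and text is just one rendering of the result.

```java
import java.util.List;
import java.util.stream.Collectors;

public class RenameRefactoring {
    // Hypothetical node types: a definition, a reference, a call, and a sequence.
    sealed interface Node permits Def, Ref, Call, Block {}
    record Def(String name, Node body) implements Node {}
    record Ref(String name) implements Node {}
    record Call(String function, List<Node> args) implements Node {}
    record Block(List<Node> statements) implements Node {}

    /** Returns a new tree in which every definition of, and reference to, `from` is named `to`. */
    static Node rename(Node node, String from, String to) {
        if (node instanceof Def d) {
            String name = d.name().equals(from) ? to : d.name();
            return new Def(name, rename(d.body(), from, to));
        } else if (node instanceof Ref r) {
            return r.name().equals(from) ? new Ref(to) : r;
        } else if (node instanceof Call c) {
            String function = c.function().equals(from) ? to : c.function();
            return new Call(function, c.args().stream().map(a -> rename(a, from, to)).toList());
        } else if (node instanceof Block b) {
            return new Block(b.statements().stream().map(s -> rename(s, from, to)).toList());
        }
        throw new IllegalArgumentException("Unknown node: " + node);
    }

    /** Renders the tree as text: just one possible view, regenerated on demand. */
    static String render(Node node) {
        if (node instanceof Def d) return "def " + d.name() + " = " + render(d.body());
        if (node instanceof Ref r) return r.name();
        if (node instanceof Call c) {
            return c.function() + "(" + c.args().stream().map(RenameRefactoring::render)
                    .collect(Collectors.joining(", ")) + ")";
        }
        if (node instanceof Block b) {
            return b.statements().stream().map(RenameRefactoring::render).collect(Collectors.joining("; "));
        }
        throw new IllegalArgumentException("Unknown node: " + node);
    }
}
```

Renaming `total` to `grandTotal` in the tree for `def total = sum(prices); print(total)` renders as `def grandTotal = sum(prices); print(grandTotal)`; no text was ever parsed or edited.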

Storage, versioning, access control, collaborative editing, etc.

Once you stop thinking in terms of files, you can start thinking about other useful features (beyond tree transformations), like versioning or collaborative editing. There have been some recent advances in software engineering that I see as key enablers here. Number 1 is that version management systems are becoming decentralized, replicated databases. You don’t check out from git; you clone the repository and push back any changes you make. What if your IDE worked straight on your (cloned) repository? Then deployment becomes just a controlled sequence of replicating your local changes somewhere else (push based, pull based, or a combination of the two). A problem with this is of course that version management systems are still about manipulating text files. So they sort of require you to serialize your rich syntax trees to text, and you need tools to deserialize them in your IDE again. So, text files are just another artifact that needs to be discarded.

This brings me to another recent advance: CouchDB. CouchDB is one of the non-relational databases currently getting lots of (well deserved) attention. It doesn’t store tables, it stores structured documents. Trees, in other words. Just what we need. It has some nice properties built in, one of which is replication. It’s built from the ground up to replicate all over the globe. The grand vision behind CouchDB is a cloud of all sorts of data where stuff just replicates to wherever it is needed. To accomplish this, it builds on REST, map reduce, and a couple of other cool technologies. The point is, CouchDB already implements most of what we need. Building a git-like revision control system for versioning arbitrary trees, or collections of trees, on top of it can’t be that challenging.
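The git-like layer speculated about here can be sketched with content addressing, which is also how git stores its trees: each node is identified by the SHA-1 hash of its value plus its children’s hashes, so unchanged subtrees keep their hashes and only changed nodes need to be replicated. This is a sketch under assumptions (Java 17 for `HexFormat`; `TreeStore`, `Tree`, and `missingFrom` are hypothetical names). It is not how CouchDB itself replicates documents, and the newline-joined serialization assumes node values contain no newlines.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.HexFormat;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class TreeStore {
    /** A hypothetical document tree: a value plus ordered children. */
    record Tree(String value, List<Tree> children) {
        static Tree leaf(String value) {
            return new Tree(value, List.of());
        }
    }

    /** Content-addressed objects: hash -> serialized node (value, then child hashes). */
    final Map<String, String> objects = new HashMap<>();

    /** Stores a tree bottom-up and returns its root hash; identical subtrees are stored once. */
    String put(Tree tree) {
        StringBuilder body = new StringBuilder(tree.value());
        for (Tree child : tree.children()) {
            body.append('\n').append(put(child));
        }
        String hash = sha1(body.toString());
        objects.putIfAbsent(hash, body.toString());
        return hash;
    }

    /** Hashes this replica has and `other` lacks: all that needs to be transmitted. */
    Set<String> missingFrom(TreeStore other) {
        Set<String> missing = new TreeSet<>(objects.keySet());
        missing.removeAll(other.objects.keySet());
        return missing;
    }

    private static String sha1(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            return HexFormat.of().formatHex(digest.digest(text.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is required on every Java platform", e);
        }
    }
}
```

If one leaf of a document changes, only two objects are new: the changed leaf and the root that points at it. Everything else is already present on the other replica. In a real system the two replicas would exchange hashes over the network instead of comparing maps in one JVM.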

Imagine the following sequence of events. Developer A modifies his program. Developer B, working on the same part of the software, sees the changes (in real time, of course) and adds some more. Once both are happy, they mark the associated task as done. Somewhere on the other side of the planet, a test server locally replicates the changes related to the task and finds everything is OK. Eventually the change, along with other changes, is tagged as a new stable release. A user accesses the application on his phone and at the first opportunity (i.e. when connected), the changes are replicated to his local database. End to end, the word file or artifact appears nowhere. Also note that only the bare minimum of data is transmitted: this is as efficient as it is ever going to get.

Conclusions

Anyway, just some reflections on where we are and where we need to go. Java did a lot of pioneering work in a lot of different domains, but it is time to move on from the way our grandfathers operated computers (well, mine won’t touch a computer if he can avoid it, but that’s a different story). Most people selling silver bullets in the form of maven, Ruby, continuous integration, etc. are stuck in current thinking. These are great tools, but only in the context of what I see as a deeply flawed end to end system. A lot of additional cruft is under construction to support the latest cloud computing trends (which are essentially about managing a lot of files in a distributed environment). My point here is that taking a step back and rethinking things end to end might be worth the trouble. We’re so close to radically changing the way developers work. Remove files and source code from the equation, and what is left for maven to do? The only right answer here is nothing.

Why do I think this needs to happen? Well, developers are currently wasting enormous amounts of time on what are essentially redundant things rather than developing software. The last few weeks were pretty bad for me: I was doing nothing but deployment and build configuration stuff. Tedious, slow, and maven is part of this problem.

Update 26 October 2009

Just around the time I was writing this, some people came up with Play, a framework + server inspired by Python’s Django that carries over a couple of its cool features. The best feature: no application server restarts required, just hit F5. This works for Java source changes as well. Clearly, I’m not alone in viewing the Java server side world as old and bloated. Admittedly, it lacks a bit in functionality, but that’s easily fixed. I wonder how this combines with a decent dependency injection framework. My guess is not well, because dependency injection frameworks require a context (i.e. state) to be maintained, and Play is designed to be stateless (like Django). Basically, each save potentially invalidates the context, requiring a full reload of that as well (i.e. a server restart). It seems the Play guys have identified the pain point in Java: server side state comes at a price.

Server side development sucks

Warning: long rant :-)

For the past half year or so, I’ve been ‘enjoying’ myself with lots of technical things related to Java, the Spring framework, maven, unit testing, and lots of command line stuff. And while I like my job, I have to say: it can be enormously tedious from time to time.

If you are coding Java in Eclipse, life is good. Eclipse is enormously helpful and gives you more or less real time feedback on warnings, compilation errors, etc. This is great because it saves you from manual edit-compile-debug-fix cycles, which take time and are frustrating. Frustrating, however, is what I’d call the current state of server side Java, which takes place mostly outside Eclipse. I spend a shitload of time on a day to day basis waiting for my application server to catch up with what I did in Eclipse, only to find that some minor typo is blocking my progress. So: shut down the server, edit, compile, package, deploy, wait a minute or so, get the server back into the state it was in before, and see if it works. Repeat, endlessly. That’s more or less my day.

There are lots of tools to make this easier. Maven is nice, but it is also a huge time waster with its insistence on checking for updates of everything you have every time you use it (20 dependencies means 20 GET requests to your maven repository) and, on top of that, running the test suite as well. Useful, except when you’ve done this 20 times in the last hour already and you are pretty damn sure you are not interested in whether it will pass this time, since you just want to know if that one typo fix was actually good enough. Then there are ways of hooking up application servers to Eclipse and making them somewhat more reasonable about requiring a full shutdown, deploy, and restart. Still, it is tedious. And it doesn’t help that maven is completely oblivious to this feature, leaving you to set things up manually. Or not, since that’s not exactly trivial.

The maven way is “our way or the highway”. Eclipse and application servers are part of the highway, and while there is the maven eclipse plugin and the eclipse maven plugin (yes, you read that right), the point of both is that there is a (huge) gap to bridge. They each have their own idea of where source code and binaries go and where dependencies come from, or indeed where stuff gets deployed to be debugged. Likewise, maven’s idea of application server integration is wrapping the start up process in some plugin. It does nothing to speed up the actual process of getting the application server up and running; it just saves you from having to start it manually. Eclipse plugins exist to do the same. And of course getting the maven and Eclipse plugins to play nice with each other is kind of a challenge. Essentially, there are three worlds: maven, Eclipse, and the application server, and you’ll lose shitloads of development time watching one catch up with the other.

I still have last year’s project, which was based on Python Django, in the back of my mind. Last year, my job was like this: edit, alt+tab to browser, F5, test, alt+tab back to Eclipse, fix, etc. I had 1-3 second round trips between my browser and editor; Apache was loading the Python files straight from my svn work directory. We had a staging server that updated with a cron job that did nothing but “svn up”. Since then I’ve had the pleasure of debating the merits of using dynamic languages in a server side environment in a place where everybody is partying on the Java bandwagon like it’s 1999. Well, here it is: you spend a shitload less time waiting for maven or application servers to finish whatever they think needs to be done. On top of that, quite a lot of server side Java is actually scripting. Don’t fool yourself into believing otherwise. We use Spring Web Flow, which means my application logic is part Java, part XHTML with JSF (and half a dozen XML namespaces for that), part XML definitions for Web Flow, part definitions for Spring beans, and part JavaScript. Guess where you can find bugs: all of them. Guess which ones Eclipse actually provides real time feedback on: Java only. So basically you get all the disadvantages of a scripting environment (run-time bug discovery) without most of the advantages (like fast edit-test round trips).

So it is not surprising to me that scripting languages are winning over a lot of people lately. You write less code, in fewer languages, and on top of that you get more time to spend editing it, because you are not waiting for tools and servers to catch up with your editor. This matters a lot, and the performance and scalability advantages of Java have been melting away rapidly lately.

On top of that, when I look at what we do, which is really straightforward web & REST stuff with a MySQL db, and at the shitload of code, magic-producing tools and frameworks, complexity, etc. we end up with, something feels terribly wrong. We use Hibernate for our database layer. Great stuff, until it doesn’t do what you want and starts throwing pretty random RuntimeExceptions at you because you forgot to add a column to your database schema (thus reducing Java to a scripting language, because all of this happens at run-time).

We have sort of a three way impedance mismatch going on here, straight out of the Enterprise Architecture cookbook. Our services speak DTOs (data transfer objects), the database layer needs model classes, and the database itself speaks SQL. So a typical REST call goes like this: XML/JSON comes in and gets translated to DTOs, which are manipulated and translated into model objects, which are manipulated, which results in SQL being sent to the database, records coming back and being translated into model objects, then DTOs, and back to XML/JSON. Stuff can go wrong in any of these transitions, and a lot of development is basically babysitting your application through all of them instead of writing actual application logic.

To make this work, we need mappings from DTOs to model classes and from there to the database. So to add 1 field, I have to edit a model class, update the DTO class, edit the mapping from models to DTOs, the mapping from models to the database, and the database schema itself. Then I have to write tests that verify all the mappings still work correctly and adapt any dependent tests. 1 field, about a dozen files touched. Re-fucking-diculous in my view. This is made ‘easier’ with Dozer, which maps object hierarchies to each other; Hibernate, which takes care of the database and uses annotations that make magic happen around the places where you use them; RESTEasy, which does all of the incoming and outgoing XML/JSON magic; and maven to download the world (we have 30+ dependencies). All to get one really straightforward REST service + 8 table MySQL database going.
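To make the “one field, a dozen files” problem concrete, here is a hand-written sketch of the mapping layers for a hypothetical `User` resource (records, Java 16+). In the stack described above, Dozer, Hibernate annotations, and RESTEasy take over parts of this work, but each of those mappings still has to know about every field.

```java
import java.util.Map;

public class LayeredMapping {
    /** What the REST layer reads and writes as XML/JSON (hypothetical type). */
    record UserDto(String name, String email) {}

    /** What the persistence layer works with (hypothetical type). */
    record UserModel(long id, String name, String email) {}

    /** DTO to model: hand-written mapping number one. */
    static UserModel toModel(long id, UserDto dto) {
        return new UserModel(id, dto.name(), dto.email());
    }

    /** Model to DTO: mapping number two, which must mirror the first by hand. */
    static UserDto toDto(UserModel model) {
        return new UserDto(model.name(), model.email());
    }

    /** Model to SQL column values: mapping number three, which must match the schema. */
    static Map<String, Object> toRow(UserModel model) {
        return Map.of("id", model.id(), "name", model.name(), "email", model.email());
    }
}
```

Adding, say, a phone number means touching both records, all three mapping methods, the database schema, and the tests for each of them, which is exactly the bookkeeping described above.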

In short, I kind of miss the days when server side Java development meant servlets+JDBC: read parameters from the request, do some select/update query, write some stuff to the response. It might lack elegance, but you get the same job done with a fraction of the number of components and without having to wait for Spring to figure out how to instantiate a bazillion little objects that go into your application context. I kind of miss the simplicity of edit, F5, edit.

Anyway, end of rant.

Time for a little update

Hmm, it’s been more than two months since I last posted. Time for an update. A lot has happened since January.

So,

  • I moved out of Finland as planned.
  • I stayed in a temporary apartment for a month. Central-home is the company managing the facility where I lived (on Habersaathstrasse 24) and if you’re looking for temporary housing in Berlin, look no further.
  • I managed to find a nice long term apartment in Berlin Mitte, in the Bergstrasse, which is more or less within walking distance of tourist attractions like Alexanderplatz, Hackescher Markt, Friedrichstrasse and of course the Brandenburger Tor.
  • I re-acquainted myself with Java, Java development, and lately also release management. Fun days of hacking, but the normal Nokia routine of meetings creeping into my calendar is sadly kicking in.
  • I learned tons of new stuff
  • Unfortunately, German is not yet one of those things. My linguistic skills are as pathetic as ever, and English remains the only foreign language I ever managed to master more or less properly. On paper, German should be dead easy, since I can get by mumbling in my native language and people can still figure out what I want. In practice, I can understand it if spoken slowly (and clearly). Speaking back is challenging.
  • I’m working on it though, once a week, in a beginners class, relearning the stuff that three years of trying to stuff German grammar into my head in high school did not accomplish.

Moving is tedious and tiresome. But the end result is a genuine improvement in life. I absolutely love Berlin and am looking forward to an early spring. I was in a telco with some Finnish people today, discussing the weather. Them: so how’s Berlin? Any snow there still? Me: no, about 20 degrees outside right now :-). Nice to have spring start at the normal time again. Not to mention the saner distribution of daylight and darkness throughout the year.

A shitload of updates is overdue, and has been for several months already. I have a ton of photos to upload. WordPress needs upgrading. And some technical stuff might need some blogging about as well. Then there are still some unfinished papers in the pipeline. So, I’ll be back with more. Some day.

server side osgi, a myth?

Two years ago, I started using OSGi, the popular dependency injecting Java component standard, for an internal project. Fast forward to now and I have a nice set of bundles that depend on, amongst others, the OSGi HTTP service.

All along, I’ve been reading how great OSGi is, how flexible it is, and how it is the future of server side Java. I was ready to believe it. But to cut to the meat of this blog post: server side OSGi is vaporware. It doesn’t exist. None of the vendors actually support it. Support it as in: a production quality, well documented, widely used product available right now. I’ve looked at Felix, Tomcat, Equinox, Jetty, Glassfish, JBoss, etc. and came up with nothing but a few obscure, unsupported, undocumented components. The default HTTP service implementation is not my idea of scalable & production quality. And the integration of existing production quality OSGi containers with existing production quality application servers is sketchy at best.

Frankly, I’m very surprised at this. I know lots of people that claim to use OSGi server side, and there are lots of announcements of vendor X endorsing OSGi bla bla bla fully modularized bla bla bla dependency injection bla bla bla. That’s great, but after two years of OSGi hacking I was hoping for something a little more substantial than what I have found so far:

The best option I came up with is the HTTP servlet bridge from Equinox. Either the documentation for this is hopelessly out of date or this is a case of abandonware. Basically all the page says is: download this bridge.war and good luck. Problem #1: this bridge.war is from 1997 .. eh 2007 :-). Problem #2: I’d like to use a somewhat newer version of Equinox. Does this work at all? Problem #3: this page hasn’t changed substantially since I started using OSGi. Is anyone still working on this, or is this a dead project? Are there any users?

Option #2 is to use Apache Felix, which apparently can embed Jetty. That’s great, but I’m a Tomcat guy and am more interested in using Tomcat as the outer container than Jetty. Neither the Jetty nor the Tomcat option is documented properly. I’m not even sure the Tomcat option is possible/advisable; some people hint at it being possible. A particular concern for me is that I need to cluster the damn thing, potentially on a large scale. Is this possible at all? I’m pretty sure people have done this, but in terms of production quality code and documentation they have not left much of a trail. The Felix people don’t seem to write much documentation in general. There’s of course the gratuitous OSGi tutorial and some hints of how you could use it, but that’s it.

This situation is not something I can sell here at Nokia. I need something more substantial, preferably Tomcat or JBoss based, that is 1) scalable in a cluster, 2) production quality, and 3) well documented. I’m now pretty much convinced that what I’m looking for doesn’t exist. If I don’t find something soon, I’m just going to have to rip out all the OSGi stuff and switch to a proper dependency injection container. Spring 3.0 is looking pretty neat, for example, but a bit heavyweight in my opinion.

Anyway, comments are open, so please point out how wrong I am and what information I overlooked :-). My main gripe here is that I have very little to base a decision on: sketchy documentation, bits and pieces on blogs and mailing lists, but nothing solid. Either OSGi is a genuine server side option or it is just an urban legend (some people have heard of other people that have done this). Everything I’ve seen so far hints at the latter.

I know JBoss 4, Glassfish 3, and Spring Application Server are all going to be OSGi based, of course. These are far from vaporware but also not exactly production ready. Additionally, being OSGi based is one thing; being able to deploy servlets from OSGi bundles is another. Most things I’ve read on this suggest that these servers are not really designed to let application developers interact with the OSGi container directly (i.e. deploying bundles, using the HTTP service instead of WAR files, etc.).

New photos

I’ve been taking photos with my new Canon S80. However, editing all of them (mainly correcting color balance and curves) is quite a bit of work. I finally found some time this morning, and three new directories have been added to photos.jillesvangurp.com.
Check here for some April pictures of Helsinki, here for a few family snaps from my Easter holiday, and here for my visit to Tallinn in late April. Notice that in the same month we went from a frozen harbor to enjoyable spring weather in Tallinn. Basically, spring lasts about two weeks here.

Particularly the Tallinn photos are very nice. The weather was a bit grey at first but turned quite nice towards the end of the afternoon.

market square