Git and agile

I’ve been working with Subversion since 2004 (we used a pre 1.0 version at GX). I started hearing about git around the 2006-2007 time frame when Linus Torvalds’ replacement for Bitkeeper started maturing enough for other people to use it. I met people working on Maemo (the Debian based OS for the N770, N800, N810, and recently the N900) in Nokia who were really enthusiastic about it in 2008. They had to use it to work with all the upstream projects Maemo depends on and they loved it. When I moved to Berlin everybody there was using subversion so I just conformed and ignored git/mercurial and all those other cool versioning systems out there for an entire year. It turns out that was lost time, I should have switched around 2007/2008. I’m especially annoyed by this because I’ve been aware of decentralized versioning being superior to centralized versioning since 2006. If you don’t believe me, I had a workshop paper at SPLC 2006 on version management and variability management that pointed out the emerging of DVCSes in that context. I’ve wasted at least three years. Ages for the early adopter type guy I still consider myself to be.

Anyway, after weighing the pros and cons for way too long, I switched from subversion to git last week. What triggered me to do this was, oddly, an excellent tutorial on Mercurial by Joel Spolsky. Nothing against Mercurial, but Git has the momentum in my view and it definitely appears to be the band wagon to be jumping right now. I don’t see any big technical argument for using Mercurial instead of Git. There’s github and no mercurial hub as far as I know. So, I took Joel’s good advice on Mercurial as a hint that it was time to get off my ass and get more serious about switching to anything else than Subversion. I had already decided in favor of git based on stuff I’ve been reading on both versioning systems.

My colleagues of course haven’t switched (yet, mostly) but that is not an issue with git-svn, which allows me to interface with svn repositories. I’d like to say making the switch was an easy ride, except it wasn’t. The reason is not git but me. Git is a powerful tool that has quite a bit more features than Subversion. Martin Fowler has a nice diagram on “recommendability” and “required skill”. Git is in the top right corner (highly recommended but you’ll need to learn some new skills) and Subversion is lower right (recommended, not much skill needed). The good news is that you will need only a small subset of commands to cover the feature set provided by svn and you can gradually expand what you use from there. Even with this small subset git is worth the trouble IMHO, if only because world + dog are switching. The bad news is that you will just have to sit down and spend a few hours learning the basics. I spent a bit more than I planned to on this but in the end I got there.

I should have switched around 2007/2008

The mistake I made that caused me to delay the switch for years was not realizing that git adds loads of value even when your colleagues are not using it: you will be able to collaborate more effectively if you are the only one using git! There are two parts to my mistake.

The first part is that the whole point of git is branching. You don’t have a working copy, you have a branch. It’s exactly the same with git-svn: you don’t have a svn working copy but a branch forked of svn trunk. So what, you might think. Git excels at merging between branches. With svn branching and merging is painful, so instead of having branches and merging between them, you avoid conflicts by updating often and committing often. With git-svn, you don’t update from svn trunk, you merge its changes in your local branch. You are working on a branch by default and creating more than 1 is really not something to be scared of. It’s is painless, even if you have a large amount of uncommitted work (which would get you in trouble with svn). Even if that work includes renaming the top level directories in your project (I did this). Even if other people are doing big changes in svn trunk. That’s a really valuable feature to have around. It means I can work on big changes to the code without having to worry about upstream svn commits. The type of changes nobody dares to take on because it would be too disruptive to deal with branching and merging and because there are “more important things” to do and we don’t want to “destabilize” trunk. Well, not any more. I can work on changes locally on a git branch for weeks if needed and push it back to trunk when it is ready while at the same time me and my colleagues keep committing big changes on trunk. The reason I’m so annoyed right now is the time I spent on resolving svn conflicts in the past four years was essentially unnecessary. Not switching four years ago was a big mistake.

The second part of my mistake was assuming I needed IDE support for git to be able to deal with refactoring and particularly class renames (which I do all the time in Eclipse). While there is egit now, it is still pretty immature. It turns out that assuming I needed Eclipse support was a false assumption. If you rename a file in a git repository and commit the file, Git will automatically figure out that the file was renamed, you don’t need to tell git that the file was renamed. A simple “mv foo.java bar.java” will work. On directories too. This is a really cool feature. So I can develop in eclipse without it even being aware of any git specifics, refactor and rename as much as I like, and git will keep tracking the changes for me. Even better, certain types of refactorings that are quite tricky with subclipse and subversive just work in git. I’ve corrupted svn work directories on several occasions when trying to rename packages and moving stuff around. Git will handle this effortlessly. Merges work so well because git can handle the situation where a locally renamed file needs changes from upstream merged into it. It’s a core feature, not an argument against using it. My mistake. I probably spent even more time on corrupted svn directories than conflict resolution in the last three years.

Git is an Agile enabler

We have plenty of pending big changes and refactorings that we have been delaying because they are disruptive. Git allows me to work on these changes whenever I feel like it without having to finish them before somebody else starts introducing conflicting changes.

This is not just a technical advantage. It is a process advantage as well. Subversion forces you to serialize change so that you minimize the interactions between the changes. That’s another way of saying that subversion is all about waterfall. Git allows you to decouple change instead and parallelize the work more effectively. Think multiple teams working on the same code base on unrelated changes. Don’t believe me? The linux kernel community has thousands of developers from hundreds of companies working on the same code base touching large portions of the entire source tree. Git is why that works at all and why they push out stable releases every 6 weeks. Linux kernel development speed is measured in thousands of lines of code modified or added per day. Evaluating the incoming changes every day is a full time job for several people.

Subversion is causing us to delay necessary changes, i.e. changes that we would prefer to do if only it wouldn’t be so disruptive. Delayed changes pile up to become technical debt. Think of git as a tool to manage your technical debt. You can work on business value adding changes (and keep the managers happy) and disruptive changes at the same time without the two interfering. In other words you can be more agile. Agile has always been about technical enablers (refactoring tooling, unit testing frameworks, continuous integration infrastructure, version control, etc) as much as it was about process. Having the infrastructure to do rapid iterations and release frequently is critical to the ability to release every sprint. You can’t do one without the other. Of course, tools don’t fix process problems. But then, process tends to be about workarounds for lacking tools as well. Decentralized version management is another essential tool in this context. You can compensate not using it with process. IMHO life is to short to play bureaucrat.

Not an easy ride

But as I said, switching from svn to git wasn’t a smooth ride. Getting familiar with the various git commands and how they are different from what I am used to in svn has been taking some time despite the fact that I understand how it works and how I am supposed to use it. I’m a git newby and I’ve been making lots of beginners mistakes (mainly using the wrong git commands for the things I was trying to do). The good news is that I managed to get some pretty big changes committed back to the central svn repository without losing any work (which is the point of version management). The bad news is that I got stuck several times trying to figure out how to rebase properly, how to undo certain changes, how to recover a messed up checkout on top of my local work directory from the local git repository. In short, I learned a lot on this and I have still some more things to learn. On the other hand, I can track changes from svn trunk, have local topic branches, merge from those to the local git master, and dcommit back to trunk. That about covers all my basic needs.

5 Comments

log4j, maven, surefire, jetty and how to make it work

I spend some time yesterday on making log4j behave. Not for the first time (gave up on several occasions) and I was getting thoroughly frustrated with how my logs refuse to conform to my log4j configuration, or rather any type of configuration. This time, I believe I succeeded and since I know plenty of others must be facing the exact same misery and since most of the information out there is downright misleading in the sense of presenting all types of snake oil solutions that actually don’t change a thing, here’s a post that offers a proper analysis of the problem and a way out. That, and it’s a nice note to self as well. I just know that I’ll need to set this up again some day.

In a nutshell, the problem is that there are multiple ways of doing logging in Java and one in particular, Apache common-logging, is misbehaving. This trusty little library has not evolved significantly since about 2006 and is depended on by just about any dependency in the maven repository that does logging, mostly for historical reasons. Some others depend on log4j directly and yet some others depend on slf4j (Simple Logging Facade for Java). Basically, you are extremely likely to have a transitive dependency on all of these and even a few dependencies on JDK logging (introduced in Java 1.4).

The main goal of commons.logging is not having to choose log4j or JDK logging. It acts as a facade and picks one of them using some funky reflection. Nice but most sysadmins and developers I have worked with seem to favor log4j anyway and hate commons-logging with a passion. In our case, all our projects depend on log4j directly and that’s just the way it is.

One of the nasty things with commons-logging is that it behaves weirdly in complex class loading situations. Like in maven or a typical application server. As a result, it takes over orchestration of the logs for basically the whole application and wrongly assumes that you want to use jdk logging or some log4j configuration buried deep in one of your dependencies. WTF is up with that btw, don’t ship logging configuration with a library. Just don’t.

Symptoms: you configure logger foo in log4j.xml to STFU below ERROR level and when running your maven tests (or even from eclipse) or in your application server it barfs all over the place at INFO level, which is the unconfigured default. To top it off, it does this using an appender you sure as hell did not configure. Double check, yes log4j is on the classpath, it finds your configuration, it even creates the log file you defined an appender for but nothing shows up there. You can find out what log4j is up to with -Dlog4j.debug=true. So, log4j is there, configured and all but commons-logging is trying to be smart and is redirecting all logged stuff, including the stuff actually logged with log4j directly, to the wrong place. To add to your misery, you might have partly succeeded with your attempts to get log4j working so now you have stuff from different log libraries showing up in the console.

A decent workaround in that case is to define a file appender, which will be free from non log4j stuff. This more or less is the advice for production deployments: don’t use a console logger because dependencies are prone to hijacking the console for all sorts of purposes.

So, good advice, but less than satisfactory. To fix the problem properly, make sure you don’t have commons-logging on the classpath. At all. This will break all the stuff that depends on it being there. Fix that by using slf4j instead. Slf4j comes in several maven modules. I used the following ones:

  1. jcl-over-slf4j is a drop-in, API compatible replacement for commons logging. It writes messages logged through commons-logging using slf4j, which is similar to commons-logging but behaves much nicer (i.e. it actually works). It’s designed to fix the problem we are dealing with here. The only reason it exists is because commons-logging is hopelessly broken.
  2. slf4j-api is used by dependencies already depending on slf4j
  3. slf4j-log4j12 the backend for log4j. If this is on the classpath slf4j will use log4j for its output. You want this.

That’s it. Here’s what I had to do to get a properly working configuration:

  1. Use mvn dependency:tree to find out which dependencies are transitively/directly depending on commons-logging.
  2. fix all of these dependencies with a
    <exclusions>
        <exclusion>
            <groupId>commons-logging</groupId>
            <artifactId>commons-logging</artifactId>
        </exclusion>
    </exclusions>
  3. You might have to iterate fixing the dependencies and rerunning mvn dependency:tree since only the first instance of commons-logging found will used transitively.
  4. Now add these dependencies to your pom.xml:
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-api</artifactId>
        <version>1.5.10</version>
    </dependency>                
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>jcl-over-slf4j</artifactId>
        <version>1.5.10</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
        <version>1.5.10</version>
    </dependency>
  5. Maven plugins have their own dependencies, separately from your normal dependencies. Make that you add the three slf4j dependencies to surefire, jetty, and other relevant plugins. At least jetty seems to already depend on slf4j.
  6. Finally make sure that your plugins have system properties defining log4j.configuration=file:[log4j config location]. Most of the googled advice on this topic covers this (and not much else). Some plugins can be a bit hard to configure due to the fact that they fork off separate processes.

That should do the trick, assuming you have log4j on the classpath of course.

, ,

2 Comments

Missing the point

Like most of you (probably), I’ve been reading the news around Google Buzz with interest. At this point, the regular as clockwork announcements from Google are treated somewhat routinely by the various technology blogs. Google announced foo, competitor bar says this and expert John Doe says that. Bla bla bla, revolutionary, bla bla similar to bla, bla. Etc. You might be tempted to dismiss Buzz as yet another Google service doomed to be ignored by most users. And you’d be right. Except it’s easy to forget that most of those announcements actually do have some substance. Sure, there have been a few less than exciting ones lately and not everything Google touches turns into gold but there is some genuinely cool stuff being pushed out into the world from Mountain View on a monthly, if not more frequent, basis.

So this week it’s Google Buzz. Personally, I think Buzz won’t last. At least not in its current gmail centric form. Focusing on Buzz is missing the point however. It will have a lasting effect similar to what happened with RSS a few years back. The reason is very simple, Google is big enough to cause everybody else to implement their APIs, even if buzz is not going to be a huge success. They showed this with open social, which world + dog now implements, despite it being very unsuccessful in user space. Google wave, same thing so far. The net effect of Buzz and the APIs that come with it will be internet wide endorsement of a new real time notification protocol, pubsubhubbub. In effect this will take twitter (already an implementer) to the next level. Think pubsubhubbub sinks and sources all over the internet and absolutely massive traffic between those sources and sinks. Every little internet site will be able to notify the world of whatever updates it has, every person on the internet will be able to subscribe to such notifications directly, or more importantly, indirectly to whichever other websites choose to consume, funnel and filter those notifications on their behalf. It’s so easy to implement that few will resist the temptation to do so.

Buzz is merely the first large scale consumer of pubsubhub notifications. Friendfeed tried something similar with RSS, was bought by Facebook and successfully eliminated as a Facebook competitor. However, Pubsubhubbub is the one protocol that Facebook won’t be able to ignore. For now they seem to stick with their closed everything model. This means there is Facebook and the rest of the world and well guarded boundaries between those. As the rest of the world becomes more interesting in terms of notifications, keeping Facebook isolated as it is today will become harder. Technically, there are no obstacles. The only reason Facebook is isolated is because it chooses to be isolated. Anybody who is not Facebook has a stake in committing to pubsubhubbub to be able to compete with Facebook. So Facebook becoming a consumer of pubsubhubbub type notifications is a matter of time, if only because it will simply be the easiest way for them to syndicate third party notifications (which is their core business). I’d be very surprised if they hadn’t got something implemented already. Facebook becoming a source of notifications is a different matter though. The beauty of the whole thing is that the more notifications originate outside of Facebook, the less this will matter. Already some of their status updates are simply syndicated from elsewhere (e.g. mine go through Twitter). Facebook is merely a place people go to see an aggregated view on what their friends do. It is not a major source of information, and ironically the limitations imposed by Facebook make it less competitive as such.

So, those dismissing Buzz for whatever reason are missing the point: it’s the APIs stupid! Open APIs, unrestricted syndication and aggregation of notifications, events, status updates, etc. It’s been talked about for ages, it’s about to happen in the next few months. First thing to catch up will be those little social network sites that almost nobody uses but collectively are used by everybody. Hook them up to buzz, twitter, etc. Result, more detailed event streams popping up outside of Facebook. Eventually people will start hooking up Facebook as well, with or without the help of Facebook. By this time endorsement will seem like a good survival strategy for Facebook.

, , ,

6 Comments

CouchDB

We did a little exercise at work to come up with a plan to scale to absolutely massive levels. Not an entirely academic problem where I work. One of the options I am (strongly) in favor of is using something like couchdb to scale out. I was aware of couchdb before this but over the past few days I have learned quite a bit more that and am now even more convinced that couchdb is a perfect match for our needs. For obvious reasons I can’t dive in what we want to do with it exactly. But of course itemizing what I like in couchdb should give you a strong indication that it involves shitloads (think hundreds of millions) of data items served up to shitloads of users (likewise). Not unheard of in this day and age (e.g. Facebook, Google). But also not something any off the shelf solution is going to handle just like that.

Or so I thought …

The couchdb wiki has a nice two line description:

Apache CouchDB is a distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API. Among other features, it provides robust, incremental replication with bi-directional conflict detection and resolution, and is queryable and indexable using a table-oriented view engine with JavaScript acting as the default view definition language.

This is not the whole story but it gives a strong indication that quite a lot is being claimed here. So, lets dig into the details a bit.

Document oriented and schema less storage. CouchDB stores json documents. So, a document is nothing more than a JSON data structure. Fair enough. No schemas to worry about, just data. A tree with nodes, attributes and values. Up to you to determine what goes in the tree.

Conflict resolution. It has special attributes for the identity and revision of a document and some other couchdb stuff. Both id and revision are globally unique uuids. UPDATE revision is not a uuid (thanks Matt).That means that any document stored in any instance of couchdb anywhere on this planet is uniquely identifiable and that any revision of such a document in any instance of couchdb is also uniquely identifiable. Any conflicts are easily identified by simply examining the id and revision attributes. A (simple) conflict resolution mechanism is part of couchdb. Simple but effective for simple day to day replication.

Robust incremental replication. Two couchdb nodes can replicate to each other. Since documents are globally unique, it is easy to figure out which document is on which node. Additionally, the revision id allows couchdb to figure out what the correct revision is. Should you be so unlucky to have conflicting changes on both nodes, there are ways of dealing with conflict resolution as well. What this means is that any node can replicate to any other node. All it takes is bandwidth and time. It’s bidirectional so you can have a master-master setup where both nodes consume writes and propagate changes to each other. The couchdb use the concept of “eventual consistency” to emphasize the fact that a network of couchdb nodes replicating to each other will eventually have the same data and be consistent with each other, regardless of the size of the network or how out of sync the nodes are at the beginning.

Fault tolerant.Couchdb uses a file as its datastore. Any write to a couchdb instance appends stuff to this file. Data in the file already is never overwritten. That’s why it is fault tolerant. The only part of the file that can possibly get corrupted is at the end of the file, which is easily detected (on startup). Aside from that, couchdb is rock solid and guaranteed to never touch your data once it has been committed to disk. New revisions don’t overwrite old ones, they are simply appended to the file (in full) to the end of the file with a new revision id. You. Never. Overwrite. Existing. Data. Ever. Fair enough, it doesn’t get more robust than that. Allegedly, kill -9 is a supported shutdown mechanism.

Cleanup by replicating. Because it is append only, a lot of cruft can accumulate in the bit of the file that is never touched again. Solution: add an empty node, tell the others to replicate to it. Once they are done replicating, you have a clean node and you can start cleaning up the old ones. Easy to automate. Data store cleanup is not an issue. Update. As Jan and Matt point out in the comments, you can use a compact function, which would be a bit more efficient.

Restful. CouchDBs native protocol is REST operations over HTTP. This means several things. First of all, there are no dedicated binary protocols, couchdb clients, drivers, etc. Instead you use normal REST and service related tooling to access couchdb. This is good because this is exactly what has made the internet work for all these years. Need caching? Pick your favorite caching proxy. Need load balancing? Same thing. Need access from language x on platform y? If it came with http support you are ready to roll.

Incremental map reduce. Map reduce is easy to explain if you understand functional programming. If you’re not familiar with that, it’s a divide and conquer type strategy to calculate stuff concurrently from lists of items. Very long lists with millions/billions of items. How it works is as follows: the list is chopped into chunks. The chunks are processed concurrently in a (large) cluster to calculate something. This is called the map phase. Then the results are combined by collecting the results from processing each of the chunks. This is called the reduce phase. Basically, this is what Google uses to calculate e.g. pagerank and many thousands of other things on their local copy of the web (which they populate by crawling) the web regularly. CouchDB uses the same strategy as a generic querying mechanism. You define map and reduce functions in Javascript and couchdb takes care of applying them to the documents in its store. Moreover, it is incremental. So if you have n documents and those have been map reduced and you add another document, it basically incrementally calculates the map reduce stuff. I.e. it catches up real quick. Using this feature you can define views and query simply by accessing the views. The views are calculated on write (Update. actually it’s on read), so accessing a view is cheap whereas writing involves the cost of storing and the background task of updating all the relevant views, which you control yourself by writing good map reduce functions. It’s concurrent, so you can simply add nodes to scale. You can use views to index specific attributes, run clustering algorithms, implement join like query views, etc. Anything goes here. MS at one point had an experimental query optimizer backend for ms sql that was implemented using map reduce. Think expensive datamining SQL queries running as map reduce jobs on a generic map reduce cluster.

It’s fast. It is implemented in erlang which is a language that is designed from the ground up to scale on massively parallel systems. It’s a bit of a weird language but one with a long and very solid track record in high performance, high throughput type systems. Additionally, couchdb’s append only and lock free files are wickedly fast. Basically, the primary bottleneck is the available IO to disk. Couchdb developers are actually claiming sustained write throughput that is above 80% of the IO bandwidth to disk. Add nodes to scale out.

So couchdb is an extremely scalable & fast storage system for documents that provides incremental map reduce for querying and mining the data; http based access and replication; and a robust append only, overwrite never, and lock free storage.

Is that all?

No.

Meebo decided that this was all nice and dandy but they needed to partition and shard their data instead of having all their data in every couchdb node. So they came up with CouchDB Lounge. Basically what couchdb lounge does is enabled by the REST like nature of couchdb. It’s a simple set of scripts on top of nginx (a popular http proxy) and the python twisted framework (a popular IO oriented framework for python) that dynamically routes HTTP messages to the right couchdb node. Each node hosts not one but several (configurable) couchdb shards. As the shards fill up, new nodes can be added and the existing shards are redistributed among them. Each shard calculates its map reduce views, the scripts in front of the loadbalancer take care of reducing these views across all nodes to a coherent ‘global’ view. I.e. from the outside world, a couchdb lounge cluster looks just like any other couchdb node. It’s sharded nature is completely transparent. Except it is effectively infinitely scalable both in the number of documents it can store as well in the read/write throughput. Couchdb looks just like any other couchdb instance in the sense that you can run the full test suite that comes with couchdb against and it will basically pass all tests. There’s no difference from a functional perspective.

So, couchdb with couchdb lounge provides an off the shelf solution for storing, accessing and querying shitloads of documents. Precisely what we need. If shitloads of users come that need access, we can give them all the throughput they could possibly need by throwing more hardware in the mix. If shitloads is redefined to mean billions instead of millions, same solution. I’m sold. I want to get my hands dirty now. I’m totally sick and tired of having to deal with retarded ORM solutions that are neither easy, scalable, fast, robust, or even remotely convenient. I have some smart colleagues who are specialized in this stuff and way more who are not. The net result is a data layer that requires constant fire fighting to stay operational. The non experts routinely do stuff they shouldn’t be doing that then requires lots of magic from our DB & ORM gurus. And to be fair, I’m not an expert. CouchDB is so close to being a silver bullet here that you’d have to be a fool to ignore the voices telling you that it is all too good to be true. But then again, I’ve been looking for flaws and so far have not come up with something substantial.

Sure, I have lots of unanswered questions and I’m hardly a couchdb expert since technically, any newby with more than an hour experience coding stuff for the thing outranks me here. But if you put it all together you have an easy to understand storage solution that is used successfully by others in rather large deployments that seem to be doing quite well. If there are any limits in terms of the number of nodes, the number of documents, or indeed the read/write throughput, I’ve yet to identify it. All the available documentation seems to suggest that there are no such limits, by design.

Some good links:

20 Comments

OVI Prime Place and Happy Holidays!

Wow another busy month without me posting. So, here’s what will probably be the last post of this year.

In the past few weeks we rolled out OVI Prime Places, http:/www.primeplace.ovi.com, which people may use to register their businesses on OVI maps. Check out the demo video here (click the arrows thingy for full screen).

Ovi Prime Place Introduction from beatbandit on Vimeo.

Anyway, busy few weeks fixing bugs, getting the release out and using the ops induced bureaucracy & downtime for some major refactoring for the next release. All more or less done now and quite ready for a vacation.

Happy Holidays & 2010.

No Comments

Wordpress automated backup

I finally sat down and wrote a little script to automatically backup my wordpress site and database. My site is hosted by pcextreme.nl, a nice Dutch startup providing hosting services that are perfect for a small site like mine. Part of the package they have is shell access via ssh, which makes this very straightforward. I use two scripts that do all the work:

dumpdb.sh:

mysqldump -u user -ppassword \
     -h host db --single-transaction \
     | gzip -9 > ~/wordpress.sql.gz

This script is stored in my home dir on the web server. The –single-transaction is needed to get rid of an error I ran into. I run it remotely using a second script backupsite.sh:

#!/bin/bash
BACKUPDIR=/Users/jilles/backup
date >> $BACKUPDIR/sitebackup.log
ssh user@host sh ~/dumpdb.sh
rsync -i -v -a --delete \
     user@host:~ $BACKUPDIR \
     >> $BACKUPDIR/sitebackup.log

This first backups the db remotely and then rsyncs everything I have on the server to a local directory. This local directory in turn is backed up by Time Machine, which provides me with nice snapshots of my site. For this to work without interaction, I have of course set up ssh authentication using my public key. I keep a little log in sitebackup.log to keep an eye on whether this all continues to work.

Anyway, nice opportunity to show of the wp-syntax plugin that I finally got around to installing in wordpress and which was actually the whole point of this blog post.

2 Comments

Google Go

Like every week there were the usual announcements of new stuff from Google this week. Highlight this week must be the experimental language Go. My initial response, not hindered by any knowledge about what Go is about, was: yet another language, why bother? I watched the rather interesting techtalk about this new language and I have a bit more insight now. My initial response is still prevailing, but I can see how Go could influence things in the right direction since it does do a few cool things.

Things I like about Go:

  • Channels and “go routines” aka. jobs or threads if you insist (not necessarily the OS supported ones) supported in the language, this is the essential superglue needed for parallel computing. The only mainstream language that comes close so far here is erlang, which is not exactly very popular but is being applied successfully for e.g. xmpp implementations. Anybody who has ever done any programming involving threads should be able to see the benefit of these features. I need this, yesterday.
  • Interfaces and typing. Any interface declared is automatically a type of anything that implements its contents. This is a useful formalization of duck typing (if it walks & talks like a duck, it must be a duck) that I haven’t really seen elsewhere other than scripting languages. It’s a bit of a philosophical thing. Should objects describe themselves, or or should an outsider be able to classify the world? Go takes the outsider point of view and basically allows programmers to treat objects as a Duck if they implement a quack() method. Like it or not, concepts like casting, the friendly keyword in C and untyped languages exist because of this. Go doesn’t need those.
  • Orthogonal & KISS design of the language where things can be combined freely. Some pretty cool stuff was demoed with channels, consts and other language features being combined to do pretty complicated stuff in the above linked techtalk. Separation of concern is highly useful, as any good software designer knows, and the right thing to have at the language level.
  • Implicit typing of arbitrary precision numbers, floats, etc. This is a brilliant idea. Prematurely narrowing down the precision of your floats, ints, and what not is a source of bugs. Leave it up to the compiler to figure out the right precision and leave it to static code analysis tools to identify potential performance bottlenecks that might result from this.

What I don’t like:

  • C-like syntax & weird syntax for types. C syntax was never that great (without macros, C would be unusable) and essentially most of the details were bolted on over time. Preserving the error prone & and * notations is in my view a big mistake, as are several other C constructs. They are an open invitation for programmers to make extremely hard to find mistakes.
  • Lack of generics, although they are working on a solution here apparently. Like it or not, meta programming and related subjects are enabled by this key feature. Too important to omit nowadays.
  • Yet another vm/runtime and no support for JVM, .Net, llvm, or similar run-times. In my view native is overrated (relative to dynamic compilation) and I don’t buy the “runs within 10-20% of native C code” claim (what about optimizations & 40 years of compiler tweaking?). But I’m sure that is true for the trivial code that has so far been written in Go. As is the case for trivial Java code, which under controlled circumstances kicks some ass relative to any statically compiled language BTW. But show me some real software doing real stuff.
  • Burn everything and start from scratch approach to libraries. Is that really necessary? Seriously, I spent the past few days sorting out the dependencies to external libraries I have in a ’simple’ java project (i.e. updating to more recent versions, pruning unused stuff). Contrary to the popular belief, some of those ~50 direct and transitive dependencies do some really useful things and are not just bloat. 5000 lines of Java + a few hundred thousand LOC in dependencies is hard to beat with a new language plus ~100KLOC worth of libraries. Any productivity gains would be wiped out by having to reinvent a lot of wheels. Not necessary in my view.
  • The nineteen seventies architecture for compiler and run-time. As I outlined in my last post, I believe compilation is fundamentally a run-time optimization these days. With Go, the compilation work is allegedly light to begin with due to the wise decision to keep the language simple. So why bother at all with compiling and why not just leave things to the run-time to figure out when you actually run the code? Integrate the compiler into the run-time and remove compilation as a concern for developers.

In short, nice concepts but I don’t see the need for tearing down everything we have and starting from scratch. Basically this language is scala + a couple of tweaks and minus all of the integration work and maturity. There are lots of contenders in this space and Go is just yet another one. They probably should have called it E since it is just repeating the mistakes of C, C++, and D but is not quite the same as F#.

No Comments

Maven: the way forward

A bit longer post today. My previous blog post set me off pondering on a couple of things that I have been pondering on before that sort of fit nicely together in a potential way forward. In this previous post and also this post, I spent a lot of words criticizing maven. People would be right to criticize me for blaming maven. However, that would be the wrong way to take my criticism. There’s nothing wrong with maven, it just annoys the hell out of me that it is needed and that I need to spend so much time waiting for it. In my view, maven is a symptom of a much bigger underlying problem: the java server side world (or rather the entire solution space for pretty much all forms of development) is bloated with tools, frameworks, application servers, and other stuff designed to address tiny problems with each other. Together, they sort of work but it isn’t pretty. What if we’d wipe all of that away, very much like the Sun people did when they designed Java 20 years ago? What would be different? What would be the same? I cannot of course see this topic separately from my previous career as a software engineering researcher. In my view there have been a lot of ongoing developments in the past 20 years that are now converging and morphing into something that could radically improve over the existing state of the art. However, I’m not aware of any specific projects taking on this issue in full even though a lot of people are working on parts of the solution. What follows is essentially my thoughts on a lot of topics centered around taking Java (the platform, not necessarily the language) as a base level and exploring how I would like to see the platform morph into something worthy of the past 40 years of research and practice.

Architecture

Lets start with the architecture level. Java packages were a mistake, which is now widely acknowledged. .Net namespaces are arguably better and OSGi bundles with explicit required and provided APIs as well as API versioning are better still. To scale software into the cloud where it must coexist with other software, including different (or identical) versions of itself, we need to get a grip on architecture.

The subject has been studied extensively (see here fore a nice survey of some description languages) and I see OSGi as the most successful implementation to date that preserves important features that most other development platforms currently lack, omit, or half improvise. The main issue with OSGi is that it layers stuff on top of Java but is not really a part of it. Hence you end up with a mix of manifest files that go into jar files; annotations that go into your source code; and cruft in the form of framework extensions to hook everything up, complete with duplicate functionality for logging, publish subscribe patterns, and even web service frameworks. The OSGi people are moving away towards a more declarative approach. Bring this to its ultimate conclusion and you end up with language level support for basically all that OSGi is trying to do. So, explicit provided and required APIs, API versioning, events, dynamic loading/unloading, isolation.

A nice feature of Java that OSGi relies on is the class loader. When used properly, it allows you to create a class loader, let it load classes, execute the functionality, and then destroy the class loader and all the stuff it loaded which is then garbage collected. This is nice for both dynamic loading and unloading of functionality as well as isolating functionality (for security and stability reasons). OSGi heavily depends on this feature and many application servers try to use this. However, the mechanisms used are not exactly bullet proof and there exist enormous problems with e.g. memory leaking which causes engineers to be very conservative with relying on these mechanisms in a live environment.

More recently, people have started to use dependency injection where the need for something is expressed in the code (e.g. with an annotation) or externally in some configuration file). Then at run time a dependency injecting container tries to fulfill the dependencies by creating the right objects and injecting dependencies. Dependency injection improves testability and modularization enormously.

A feature in maven that people seem to like is its way of dealing with dependencies. You express what you need in the pom file and maven fetches the needed stuff from a repository. The maven, osgi, & spring combo, is about to happen. When it does, you’ll be specifying dependencies in four different places: java imports; annotations, the pom file, and the osgi manifest. But still, I think the combined feature set is worth having.

Language

Twenty years ago, Java was a pretty minimalistic language that took basically the best of 20 years (before that) of OO languages and kept a useful subset. Inevitably, lots got discarded or not considered at all. Some mistakes were made, and the language over time absorbed some less than perfect versions of the stuff that didn’t make it. So, Java has no language support for properties, this was sort of added on by the setter/getter convention introduced in JavaBeans. It has inner classes instead of closures and lambda functions. It has no pure generics (parametrizable types) but some complicated syntactic sugar that gets compiled to non generic code. The initial concurrent programming concepts in the language were complex, broken, and dangerous to use. Subsequent versions tweaked the semantics and added some useful things like the java concurrent package. The language is overly verbose and 20 years after the fact there is now quite a bit of competition from languages that basically don’t suffer from all this. The good news is that most of those have implementations on top of the JVM. Lets not let this degenerate into a language war but clearly the language needs a proper upgrade. IMHO scala could be a good direction but it too has already some compromise embedded and lacks support for the architectural features discussed above. Message passing and functional programming concepts are now seen as important features for scalability. These are tedious at best in Java and Scala supports these well while simultaneously providing a much more concise syntax. Lets just say a replacement of the Java language is overdue. But on the other hand it would be wrong to pick any language as the language. Both .Net and the JVM are routinely used as generic runtimes for all sorts of languages. There’s also the LLVM project, which is a compiler tool chain that includes dynamic compilation in a vm as an option for basically anything GCC can compile.

Artifacts should be transient

So we now have a hypothetical language, with support for all of the above. Lets not linger on the details and move on to deployment and run time. Basically the word compile comes from the early days of computing when people had to punch holes into cards and than compile those into stacks and hand feed them to big, noisy machines. In other words, compilation is a tedious & necessary evil. Java popularized the notion of just in time compilation and partial, dynamic compilation. The main difference here is that just in time compilation merely moves the compilation step to the moment the class is loaded whereas dynamic compilation goes a few steps further and takes into account run-time context to decide if and how to compile. IDEs tend to compile on the fly while you edit. So why, bother with compilation after you finish editing and before you need to load your classes? There is no real technical reason to compile ahead of time beyond the minor one time effort that might affect start up. You might want the option to do this but it should not default to doing it.

So, for most applications, the notion of generating binary artifacts before they are needed is redundant. If nothing needs to be generated, nothing needs to be copied/moved either. This is true for both compiled or interpreted and interpreted languages. A modern Java system basically uses some binary intermediate format that is generated before run-time. That too is redundant. If you have dynamic compilation, you can just take the source code and execute it (while generating any needed artifacts for that on the fly). You can still do in IDE compilation for validation and static analysis purposes. The distinction between interpreted and static compiled languages has become outdated and as scripting languages show, not having to juggle binary artifacts simplifies life quite a bit. In other words, development artifacts (other than the source code) are transient and with the transformation from code to running code automated and happening at run time, they should no longer be a consideration.

That means no more build tools.

Without the need to transform artifacts ahead of run-time, the need for tools doing and orchestrating this also changes. Much of what maven does is basically generating, copying, packaging, gathering, etc. artifacts. An artifact in maven is just a euphemism for a file. Doing this is actually pretty stupid work. With all of those artifacts redundant, why keep maven around at all? The answer to that is of course testing and continuous integration as well as application life cycle management and other good practices (like generating documentation). Except, lots of other different tools are involved with that as well. Your IDE is where you’d ideally review problems and issues. Something like Hudson playing together with your version management tooling is where you’d expect continuous integration to take place and application life cycle management is something that is part of your deployment environment. Architectural features of the language and run-time combined with good built in application and component life cycle removes much of the need of external tooling to support all this and improves interoperability.

Source files need to go as well

Visual age and smalltalk pioneered the notion of non file based program storage where you modify the artifacts in some kind of DB. Intentional programming research basically is about the notion that programs are essentially just interpretations of more abstract things that get transformed (just in time) to executable code or into different views (editable in some cases). Martin Fowler has long been advocating IP and what he refers to as the language workbench. In a nut shell, if you stop thinking of development as editing a text file and start thinking of it as manipulating abstract syntax trees with a variety of tools (e.g. rename refactoring), you sort of get what IP and language workbenches are about. Incidentally, concepts such as APIs, API versions, provided & required interfaces are quite easily implemented in a language workbench like environment.

Storage, versioning, access control, collaborative editing, etc.

Once you stop thinking in terms of files, you can start thinking about other useful features (beyond tree transformations), like versioning or collaborative editing for example. There have been some recent advances in software engineering that I see as key enablers here. Number 1 is that version management systems are becoming decentralized, replicated databases. You don’t check out from git, you clone the repository and push back any changes you make. What if your IDE were working straight into your (cloned) repository? Then deployment becomes just a controlled sequence of replicating your local changes somewhere else (either push based, pull based, or combinations of that. A problem with this is of course that version management systems are still about manipulating text files. So they sort of require you to serialize your rich syntax trees to text and you need tools to unserialize them in your IDE again. So, text files are just another artifact that needs to be discarded.

This brings me to another recent advance: couchdb. Couchdb is one of the non relational databases currently experiencing lots of (well deserved) attention. It doesn’t store tables, it stores structured documents. Trees in other words. Just what we need. It has some nice properties built in, one of which is replication. Its built from the ground up to replicate all over the globe. The grand vision behind couchdb is a cloud of all sorts of data where stuff just replicates to the place it is needed. To accomplish this, it builds on REST, map reduce, and a couple of other cool technology. The point is, couchdb already implements most of what we need. Building a git like revision control system for versioning arbitrary trees or collections of trees on top can’t be that challenging.

Imagine the following sequence of events. Developer A modifies his program. Developer B working on the same part of the software sees the changes (real time of course) and adds some more. Once both are happy they mark the associated task as done. Somewhere on the other side of the planet a test server locally replicates the changes related to the task and finds everything is OK. Eventually the change and other changes are tagged off as a new stable release. A user accesses the application on his phone and at the first opportunity (i.e. connected), the changes are replicated to his local database. End to end the word file or artifact appears nowhere. Also note that the bare minimum of data is transmitted: this is as efficient as it is ever going to get.

Conclusions

Anyway, just some reflections on where we are and where we need to go. Java did a lot of pioneering work in a lot of different domains but it is time to move on from the way our grand fathers operated computers (well, mine won’t touch a computer if he can avoid it but that’s a different story). Most people selling silver bullets in the form of maven, ruby, continuous integration, etc. are stuck in the current thinking. These are great tools but only in the context of what I see as a deeply flawed end to end system. A lot of additional cruft is under construction to support the latest cloud computing trends (which is essentially about managing a lot of files in a distributed environment). My point here is that taking a step back and rethinking things end to end might be worth the trouble. We’re so close to radically changing the way developers work here. Remove files and source code from the equation and what is left for maven to do? The only right answer here is nothing.

Why do I think this needs to happen: well, developers are currently wasting enormous amounts of time on what are essentially redundant things rather than developing software. The last few weeks were pretty bad for me, I was just handling deployment and build configuration stuff. Tedious, slow, and maven is part of this problem.

Update 26 October 2009

Just around the time I was writing this, some people decided to come up with Play, a framework + server inspired by Python Django that preserves a couple of cool features. The best feature: no application server restarts required, just hit F5. Works for Java source changes as well. Clearly, I’m not alone in viewing the Java server side world as old and bloated. Obviously it lacks a bit in functionality. But that’s easily fixed. I wonder how this combines with a decent dependency injection framework. My guess is not well, because dependency injection frameworks require a context (i.e.) state to be maintained and Play is designed to be stateless (like Django). Basically, each save potentially invalidates the context require a full reload of that as well (i.e. a server restart). Seems the play guys have identified the pain point in Java: server side state comes at a price.

, , , ,

1 Comment

maven: good ideas gone wrong

I’ve had plenty of time to reflect on the state of server side Java, technology, and life in general this week. The reason for all this extra ‘quality’ time was because I was stuck in an endless loop waiting for maven to do its thing, me observing it failed in subtly different ways, tweaking some more, and hitting arrow up+enter (repeat last command) and fiddling my thumbs for two more minutes. This is about as tedious as it sounds. Never mind the actual problem, I fixed it eventually. But the key thing to remember here is that I lost a week of my life on stupid book keeping.

On to my observations:

  • I have more xml in my maven pom files than I ever had with my ant build.xml files four years ago, including running unit tests, static code checkers, packaging jars & installers, etc. While maven does a lot of things when I don’t need them to happen, it seems to have an uncanny ability to not do what I want when I need it to or to first do things that are arguably redundant and time consuming.
  • Maven documentation is fragmented over wikis, javadoc of diverse plugins, forum posts, etc. Google can barely make sense of it. Neither can I. Seriously, I don’t care about your particular ideology regarding elegance: just tell me how the fuck I set parameter foo on plugin bar and what its god damn default is and what other parameters I might be interested in exist.
  • For something that is supposed to save me time, I sure as hell am wasting a shit load of time on making it do what I want and watching it do what it does (or not), and fixing the way it works. I had no idea compiling & packaging less than 30 .java files could be so slow.
  • Few people around me dare to touch pom files. It’s like magic and I hate magicians myself. When it doesn’t work they look at me to fix it. I’ve been there before and it was called ant. Maven just moved the problem and didn’t solve a single problem I had five years ago while doing the exact same shit with ant. Nor did it make it any easier.
  • Maven compiling, testing, packaging and deploying defeats the purpose of having incremental compilation and dynamic class (re)loading. It’s just insane how all this application server deployment shit throws you back right to the nineteen seventies. Edit, compile, test, integration-test, package, deploy, restart server, debug. Technically it is possible to do just edit, debug. Maven is part of the problem here, not of the solution. It actually insists on this order of things (euphemistically referred to as a life cycle) and makes you jump through hoops to get your work done in something resembling real time.
  • 9 out of 10 times when maven enters the unit + integration-test phase, I did not actually modify any code. Technically, that’s just a waste of time (which my employer gets to pay for). Maven is not capable of remembering the history of what you did and what has changed since the last time you ran it so like any bureaucrat it basically does maximum damage to compensate for its ignorance.
  • Life used to be simple with a source dir, an editor, a directory of jars and an incremental compiler. Back in 1997, java recompiles took me under 2 seconds on a 486, windows NT 3.51 machine with ‘only’ 32 MB, ultra edit, an IBM incremental java compiler, and a handful of 5 line batch files. Things have gotten slower, more tedious, and definitely not faster since then. It’s not like I have much more lines of code around these days. Sure, I have plenty of dependencies. But those are run-time resolved, just like in 1997, and are a non issue at compile time. However, I can’t just run my code but I have to first download the world, wrap things up in a jar or war, copy it to some other location, launch some application server, etc. before I am in a position to even see if I need to switch back to my editor to fix some minor detail.
  • Your deployment environment requires you to understand the ins and outs of how where stuff needs to be deployed, what directory structures need to be there, etc. Basically if you don’t understand this, writing pom files is going to be hard. If you do understand all this, pom files won’t save you much time and will be tedious instead. You’d be able to write your own bash scripts, .bat files or ant files to achieve the same goals. Really, there’s only so many ways you can zip a directory into a .jar or .war file and copy them over from A to B.
  • Maven is brittle as hell. Few people on your project will understand how to modify a pom file. So they do what they always do, which is copy paste bits and pieces that are known to more or less do what is needed elsewhere. The result is maven hell. You’ll be stuck with no longer needed dependencies, plugins that nobody has a clue about, redundant profiles for phases that are never executed, half broken profiles for stuff that is actually needed, random test failures. It’s ugly. It took me a week to sort out the stinking mess in the project I joined a month ago. I still don’t like how it works. Pom.xml is the new build.xml, nobody gives a shit about what is inside these files and people will happily copy paste fragments until things work well enough for them to move on with their lives. Change one tiny fragment and all hell will break loose because it kicks the shit out of all those wrong assumptions embedded in the pom file.

Enough whining, now on to the solutions.

  • Dependency management is a good idea. However, your build file is the wrong place to express those. OSGI gets this somewhat right, except it still externalizes dependency configuration from the language. Obviously, the solution is to integrate the component model into the language: using something must somehow imply depending on something. Possibly, the specific version of what you depend on is something that you might centrally configure but beyond that: automate the shit out of it, please. Any given component or class should be self describing. Build tools should be able to figure out the dependencies without us writing them down. How hard can it be? That means some none existing language to supersede the existing ones needs to come in existence. No language I know of gets this right.
  • Compilation and packaging are outdated ideas. Basically, the application server is the run-time of your code. Why doesn’t it just take your source code, derive its dependencies and runs it? Every step in between editing and running your code is a potential to introduce mistakes & bugs. Shortening the distance between editor and run-time is good. Compilation is just an optimization. Sure, it’s probably a good idea for the server to cache the results somewhere. But don’t bother us with having to spoon feed it stupid binaries in some weird zip file format. One of the reasons scripting languages are so popular is because it reduces the above mentioned cycle to edit, F5, debug. There’s no technical reason whatsoever why this would not be possible with statically compiled languages like java. Ideally, I would just tell the application server the url of the source repository, give it the necessary credentials and I would just be alt tabbing between my browser and my editor. Everything in between that is stupid work that needs to be automated away.
  • The file system hasn’t evolved since the nineteen seventies. At the intellectual level, you modify a class or lambda function or whatever and that changes some behavior in your program, which you then verify. That’s the conceptual level. In practice you have to worry about how code gets translated into binary (or asciii) blobs on the file system, how to transfer those blobs to some repository (svn, git, whatever), then how to transfer them from wherever they are to wherever they need to be, and how they get picked up by your run-time environment. Eh, that’s just stupid book keeping, can I please have some more modern content management here (version management, rollback, auditing, etc.)? Visual age actually got this (partially) right before it mutated into eclipse: software projects are just databases. There’s no need for software to exist as text files other than nineteen seventies based tool chains.
  • Automated unit, integration and system testing are good ideas. However, squeezing them in between your run-time and your editor is just counter productive. Deploy first, test afterwards, automate & optimize everything in between to take the absolute minimum of time. Inserting automated tests between editing and manual testing is a particularly bad idea. Essentially, it just adds time to your edit debug cycle.
  • XML files are just a fucking tree structures serialized in a particularly tedious way. Pom files are basically arbitrary, schema less xml tree-structures. It’s fine for machine readable data but editing it manually is just a bad idea. The less xml in my projects, the happier I get. The less I need to worry about transforming tree structures into object trees, the happier I get. In short, lets get rid of this shit. Basically the contents of my pom files is everything my programming language could not express. So we need more expressive programming languages, not entirely new ones to complement the existing ones. XML dialects are just programming languages without all of the conveniences of a proper IDE (debuggers, code completion, validation, testing, etc.).

Ultimately, maven is just a stop gap. And not even particularly good at what it does.

update 27 October 2009

Somebody produced a great study on how much time is spent on incremental builds with various build tools. This stuff backs my key argument up really well. The most startling out come:

Java developers spend 1.5 to 6.5 work weeks a year (with an average of 3.8 work weeks, or 152 hours, annually) waiting for builds, unless they are using Eclipse with compile-on-save.

I suspect that where I work, we’re close to 6.5 weeks. Oh yeah, they single out maven as the slowest option here:

It is clear from this chart that Ant and Maven take significantly more time than IDE builds. Both take about 8 minutes an hour, which corresponds to 13% of total development time. There seems to be little difference between the two, perhaps because the projects where you have to use Ant or Maven for incremental builds are large and complex.

So anyone who still doesn’t get what I’m talking about here, build tools like maven are serious time wasters. There exist tools out there that reduce this time to close to 0. I repeat, Pyhton Django = edit, F5, edit F5. No build/restart time whatsoever.

, , , , ,

7 Comments

N900 & Slashdot

I just unleashed the stuff below in a slashdot thread. 10 years ago I was a regular there (posting multiple times per day) and today I realized that I hadn’t actually even bothered to sign into slashdot since buying a mac a few months ago. Anyway, since I spent time writing this I might as well repost here. On a side note, they support OpenID for login now! Cool!

…. The next-gen Nokia phone [arstechnica.com] on the other hand (successor to the N900) will get all the hardware features of the iPhone, but with the openness of a linux software stack. Want to make an app that downloads podcasts? Fine! Want to use your phone as a modem? No problem! In fact, no corporation enforcing their moral or business rules on how you use your phone, or alienation of talented developers [macworld.com]!

You might make the case that the N900 already has the better hardware when you compare it to the iphone. And for all people dismissing Nokia as just a hardware company, there’s tons of non trivial Nokia IPR in the software stack as well (not all OSS admittedly), that provides lots of advantages in the performance or energy efficiency domain; excellent multimedia support (something a lot of smart phones are really bad at), hardware acceleration, etc. Essentially most vendors ship different combinations of chips coming from a very small range of companies so from that point of view it doesn’t really matter what you buy. The software on top makes all the difference and the immaturity of newer platforms such as Android can be a real deal breaker when it comes to e.g. battery life, multimedia support, support for peripherals, etc. There’s a difference between running linux on a phone and running it well. Nokia has invested heavily in the latter and employs masses of people specialized in tweaking hardware and software to get the most out of the hardware.

But the real beauty of the N900 for the slashdot crowd is simply the fact that it doesn’t require hacks or cracks: Nokia actively supports & encourages hackers with features, open source developer tools, websites, documentation, sponsoring, etc. Google does that to some extent with Android but the OS is off limits for normal users. Apple actively tries to stop people from bypassing the appstore and is pretty hostile to attempts to modify the OS in ways they don’t like. Forget about other platforms. Palm technically uses linux but they are still keeping even the javascript + html API they have away from users. It might as well be completely closed source. You wouldn’t know the difference.

On the other hand, the OS on the N900 is Debian. Like on Debian, the package manager is configured in /etc/sources.list which is used by dpkg and apt-get, which work just as you would expect on any decent Debian distribution. You have root access, therefore you can modify any file, including sources.list. Much of Ubuntu actually compiles with little or no modification and most of the problems you are likely to encounter relate to the small screen size. All it takes to get to that software is pointing your phone at the appropriate repositories. There was at some point a Nokia sponsored Ubuntu port to ARM even, so there is no lack of stuff that you can install. Including stuff that is pretty pointless on a smart phone (like large parts of KDE). But hey, you can do it! Games, productivity tools, you name it and there probably is some geek out there who managed to get it to build for Maemo. If you can write software and package it as a Debian package and can cross compile it to ARM (using the excellent OSS tooling of course), there’s a good chance it will just work.

So, you can modify the device to your liking at a level no other mainstream vendor allows. Having a modifiable Debian linux system with free access to all of the OS on top of what is essentially a very compact touch screen device complete with multiple radios (bluetooth, 3G, wlan), sensors (GPS, motion, light, sound), graphics, dsp, should be enough to make any self respecting geek drool.

Now with the N900 you get all of that, shipped as a fully functional smart phone with all of the features Nokia phones are popular for such as excellent voice quality and phone features, decent battery life (of course with all the radios turned on and video & audio playing none stop, your mileage may vary), great build quality and form factor, good support for bluetooth and other accessories, etc. It doesn’t get more open in the current phone market currently and this is still the largest mobile phone manufacturer in the world.

In other words, Nokia is sticking out its neck for you by developing and launching this device & platform while proclaiming it to be the future of Nokia smart phones. It’s risking a lot here because there are lots of parties in the market that are in the business of denying developers freedom and securing exclusive access to mobile phone software. If you care about stuff like this, vote with your feet and buy this or similarly open (suggestions anyone?) devices from operators that support instead of prevent you from doing so. If Nokia succeeds here, that’s a big win for the OSS community.

Disclaimer: I work for Nokia and I’m merely expressing my own views and not representing my employer in any way. That being said, I rarely actively promote any of our products and I choose to do so with this one for one reason: I believe every single word of it.

, , , , , , ,

No Comments