Enforcing code conventions in Java

After many years of working with Java, I finally got around to enforcing code conventions in our project. The problem with code conventions is not agreeing on them (actually, that is hard too, since everybody seems to have their own preferences, but that’s beside the point) but enforcing them. For enforcement you can choose from a wide variety of code checkers such as checkstyle, pmd, and others. My problem with this approach is that checkers usually end up being some combination of too strict, too verbose, and too annoying. In any case, nobody ever checks their output, and you need the discipline to fix any detected issues yourself. On most projects where I’ve tried checkstyle, the out-of-the-box configuration finds thousands of trivial issues. Pretty much every Java project I’ve ever been involved with had somewhat vague guidelines on code conventions and a very loose attitude to enforcing them. So, you end up with loads of variation in whitespace, bracket placement, etc. Eventually people stop caring. It’s not a problem worthy of a lot of brain cycles, and we are all busy.

Anyway, I finally found a solution to this problem that is completely unintrusive: format the source code as part of your build. Simply add the following blurb to your maven build section and save some formatter settings in XML format in your source tree. It won’t fix all your issues, but formatting-related diffs should be a thing of the past. Either your code is fine, in which case it passes through the formatter unmodified, or you messed up, in which case the formatter fixes it for you.

<plugin><!-- mvn java-formatter:format -->
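In full, the plugin declaration looks roughly like this (the version number and the settings file path are examples from memory, so check them against the maven-java-formatter-plugin documentation):

```xml
<plugin><!-- mvn java-formatter:format -->
  <groupId>com.googlecode.maven-java-formatter-plugin</groupId>
  <artifactId>maven-java-formatter-plugin</artifactId>
  <version>0.4</version>
  <configuration>
    <!-- exported Eclipse formatter settings, kept in the source tree -->
    <configFile>${project.basedir}/eclipse-formatter-settings.xml</configFile>
  </configuration>
  <executions>
    <execution>
      <goals>
        <!-- bound to the build so formatting runs before compilation -->
        <goal>format</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```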


This plugin formats the code using the specified formatting settings XML file, and it executes on every build, before compilation. You can create the settings file by exporting the Eclipse code formatter settings. Intellij users can use these settings as well, since recent versions support the Eclipse formatter settings file format. The only thing you need to take care of is the organize-imports settings in both IDEs. Eclipse comes with a default configuration that is very different from what Intellij does, and it is a bit of a pain to fix on the Intellij side. Eclipse has a notion of import groups that are each sorted alphabetically. It comes with four of these groups that represent imports with different prefixes, so javax.* and java.*, etc. are different groups. This behavior is very tedious to emulate in Intellij and outside the scope of the exported formatter settings. For that reason, you may want to consider modifying things on the Eclipse side instead: remove all groups and simply sort all imports alphabetically. This behavior is easy to emulate in Intellij, and you can configure both IDEs to organize imports on save, which is good practice. Also, make sure to not allow .* imports and to only import what you actually use (why load classes you don’t need?). If everybody does this, the only people causing problems will be those with poorly configured IDEs, and their code will get fixed automatically over time.

Anyone doing a mvn clean install to build the project will automatically fix any formatting issues that they or others introduced. Also, the formatter can be configured conservatively and if you set it up right, it won’t mess up things like manually added new lines and other manual formatting that you typically want to keep. But it will fix the small issues like using the right number of spaces (or tabs, depending on your preferences), having whitespace around brackets, braces, etc. The best part: it only adds about 1 second to your build time. So, you can set this up and it basically just works in a way that is completely unintrusive.

Compliance problems introduced by people with poor IDE configuration skills or a relaxed attitude to code conventions (you know who you are) will automatically get fixed this way. Win-win. There’s always the odd developer who insists on using vi, emacs, notepad, or something similarly archaic that most IDE users would consider cruel and unusual punishment. Not a problem anymore: let them. These masochists will notice that whatever they think is correctly formatted Java might cause the build to create a few diffs on their edits. Ideally, this happens before they commit. And if not, you can yell at them for committing untested code: there is no excuse for not building your project before a commit.

Accessing Elasticsearch clusters via a localhost node

I’m a regular at the Elasticsearch meetup here in Berlin, and there are always lots of recent converts trying to wrap their heads around the ins and outs of running an elasticsearch cluster. One issue that seems to baffle a lot of new users is which node in the cluster has the master role. The correct answer is that it depends on what you mean by master. Yes, there is a master node in elasticsearch, but it does not mean what you think it means: it merely means that a single node is elected to hold the truth about which nodes have which data and, crucially, where the primary copies of shards live. What it does NOT mean is that that node has the master copy of all the data in the cluster. It also does NOT mean that you have to talk to specifically this node when writing data. Data in elasticsearch is sharded and replicated, shards and their replicas are spread all over the cluster, and out of the box clients can talk to any of the nodes for both read and write traffic. You can literally put a load balancer in front of your cluster and round robin all the requests across all the nodes.

When nodes go down or are added to the cluster, shards may be moved around. All nodes synchronize information about which nodes have which shards and replicas of those shards. The elasticsearch master merely is the ultimate authority on this information. Elasticsearch masters are elected at runtime by the nodes in the cluster. So, by default, any of the nodes in the cluster can become elected as the master. By default, all nodes know how to look up information about which shards live where and know how to route requests around in the cluster.

A common pattern in larger clusters is to reserve the master role for nodes that do not hold any data; you can specialize what nodes do via configuration. Having three or more such nodes means that if one of them goes down, the remaining ones can elect a new master and the rest of the cluster can just keep spinning. Having an odd number of master-eligible nodes is a good thing when you are holding elections, since you always have an obvious majority of n/2 + 1. With an even number you can end up with two equally sized network partitions.
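For such dedicated master nodes, the relevant elasticsearch.yml settings look like this (these are the 1.x-era setting names; check the documentation for your version):

```yaml
# master-eligible, but holds no data
node.master: true
node.data: false
# with three master-eligible nodes, require a majority of two for elections
discovery.zen.minimum_master_nodes: 2
```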

The advantage of not having data on a node is that such nodes are far less likely to get into trouble with e.g. OutOfMemoryErrors, excessive IO that slows the machine down, or excessive CPU usage due to expensive queries. Any of those make the node slow or unavailable, which is a bad thing on the node that is supposed to hold the authoritative cluster state: it becoming unavailable will cause the other nodes to elect a new master. There’s a fine line between being unavailable and merely slow to respond, which makes this a particularly hard problem. The now-infamous Call Me Maybe article highlights several different cluster failure scenarios, and most of these involve some sort of network partitioning due to temporary master node failures or unavailability. If you are worried about this, also be sure to read the Elasticsearch response to this article. The bottom line is that most of the issues have by now been addressed and are far less likely to bite you. Also, if you have declined to update your production setup to Elasticsearch 1.4.x, now might be a good time to read up on the many known ways in which things can go bad for you.

In any case, such data-less nodes still do useful work. They can, for example, be used to serve traffic to elasticsearch clients. Most things that happen in Elasticsearch involve internal node communication, since the data can be anywhere in the cluster. So, there are typically two or more network hops involved: one from the client to what is usually called a routing node, and one or more from there to the nodes holding the shards needed to complete the request, which do the actual work of writing new data to a shard or retrieving data from it.

Another common pattern in the Elasticsearch world is to implement clients in Java and embed a cluster node inside the process. This embedded node is typically configured to be a routing-only node. The big advantage of this is that it saves you a network hop: the embedded node already knows where all the shards live, so the application server can talk directly to the nodes holding those shards, using the more efficient protocol that Elasticsearch nodes use to communicate with each other.

A few months ago at one of the meetups, I was discussing this topic with one of the organizers, Felix Gilcher. He mentioned an interesting variant of this pattern. Embedding a node only works for Java applications; if you use something else, it is not possible. Besides, dealing with the Elasticsearch internal API can be quite a challenge as well, so it would be convenient if non-Java applications could get similar benefits. He then suggested the obvious solution: you get most of the benefits of embedding a node by simply running a standalone, routing-only elasticsearch node on each application server. The advantage of this approach is that each application server communicates with elasticsearch via localhost, which is a lot faster than sending REST requests over the network. You still have a bit of overhead related to serializing and deserializing the REST requests and responses, but all of that happens on localhost and you avoid the network hop. So, effectively, you get most of the benefits of the embedded node approach.
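A routing-only node like that is just a normal elasticsearch install with both the master and data roles switched off in elasticsearch.yml (again, 1.x-era setting names):

```yaml
# routing-only node on the application server: never master, holds no data
node.master: false
node.data: false
```

Applications on the same machine can then simply point at http://localhost:9200 and let this node route requests into the cluster.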

We recently implemented this at Inbot. We now have a cluster of three elasticsearch nodes, and two application servers that each run additional routing-only nodes that talk to those three. We use a mix of Java, Javascript and ruby components on our server, and doing this allows us to keep things simple. The elasticsearch nodes on the application servers have a comparatively small heap of only 1GB and typically consume few resources. We could probably reduce the heap size further, to 512MB or even 256MB, since all these nodes do is pass requests and data between the cluster and the application server. However, we have plenty of memory and have so far had little need to tune this. Meanwhile, our elasticsearch cluster nodes run on three fast 32GB machines, where we allocate half of the memory for heap and reserve the rest for file caching (as per the Elasticsearch recommendations). This works great, and it also simplifies application configuration, since you can simply configure all applications to talk to localhost and let elasticsearch take care of the cluster management.

Eventual Consistency Now! using Elasticsearch and Redis

Elasticsearch promises real-time search and nearly delivers on this promise. The problem with ‘nearly’ is that in interactive systems it is actually unacceptable for user changes not to be reflected in query results. Eventual consistency is nice, but it also means occasionally being inconsistent, which is not so nice for users, or worse, product managers, who typically don’t understand these things and report them as bugs. At Inbot, this aspect of using Elasticsearch has been keeping us busy. It would be awfully convenient if it never returned stale data.

Mostly things actually work fine, but when a user updates something and then within a second navigates back to a list of stuff that includes what he/she just updated, chances are that it still shows the old version, because elasticsearch has not yet made the change visible in the index. In any interactive system this is going to be an issue, and one way or another a solution is needed. The reality is that elasticsearch is an eventually consistent cluster when it comes to search, not a proper transactional store that is immediately consistent after modifications. And while it is reasonably good at catching up within a second, that still leaves plenty of room for inconsistencies to surface. While you can immediately retrieve any changed document by id, it takes a bit of time for search results to be updated as well. Out of the box, the index refreshes once every second, which is enough time for a user to click something and then something else and see results that are inconsistent with the actions he/she just performed.
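That once-per-second behavior is elasticsearch’s index refresh interval, which is configurable per index (setting name as in the 1.x documentation; it can also be changed at runtime via the index settings API):

```yaml
# the default; lowering it buys search consistency at the cost of indexing throughput
index.refresh_interval: 1s
```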

We started addressing this with a few client side hacks, like simply replacing list results with what we just edited via the API, updating local caches, etc. Writing such code is error prone and tedious. So we came up with a better solution: use Redis. The same DAO I described in my recent article on optimistic locking with elasticsearch also stores the id of any modified document in a short-lived data structure in Redis. Redis provides in-memory data structures such as lists, sets, and hash maps, and comes with a ton of options. The nice thing about Redis is that it scales quite well for small things and has a very low latency API. So, it is quite cheap to use it for things like caching.

So, our idea was very simple: use Redis to keep track of recently changed documents and patch any results that include these objects on the fly with the latest version of the object. The bit of Java code that we use to talk to Redis uses a JedisPool; this should work in much the same way from other languages.

try (Jedis jedis = jedisPool.getResource()) {
  Transaction transaction = jedis.multi();
  transaction.lpush(key, value);
  transaction.ltrim(key, 0, capacity - 1); // trim right away so we can pretend it is a circular list (ltrim is inclusive)
  transaction.expire(key, expireInSeconds); // don't keep the data forever
  transaction.exec(); // run all of the above atomically
}

This creates a circular list with a fixed capacity that expires after a few seconds. We use it to store the ids of any documents we modify for a particular index or belonging to a particular user. Using this, we can easily find out, when returning results from our API, whether we should replace some of the results with newer versions. A few seconds is enough for elasticsearch to catch up, after which the list simply expires and disappears. Under continuous load, it just gets trimmed to the most recently added ids (capacity), so it stays fast as well.

Each of our DAOs exposes an additional function that tells you which document ids have been recently modified. When returning results, we loop over them, check each id against this list, and swap in the latest version where needed. Simple, easy to implement, and it solves most of the problem. More importantly, it solves it on the server and does not burden our API users or clients with any of this.
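The swap step itself is simple. Here is a hedged, stdlib-only sketch; the name swapFresh, the Map-based documents, and the fetchLatest callback are illustrative stand-ins, not our actual DAO API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

class FreshResults {
    // For each search hit whose id is in the recently-modified set (the ids
    // we keep in Redis), fetch the latest version from the primary store
    // instead of trusting the possibly stale search index.
    static List<Map<String, Object>> swapFresh(
            List<Map<String, Object>> hits,
            Set<String> recentlyModifiedIds,
            Function<String, Map<String, Object>> fetchLatest) {
        List<Map<String, Object>> fresh = new ArrayList<>();
        for (Map<String, Object> hit : hits) {
            String id = (String) hit.get("id");
            fresh.add(recentlyModifiedIds.contains(id) ? fetchLatest.apply(id) : hit);
        }
        return fresh;
    }
}
```

In the real thing, recentlyModifiedIds comes from the Redis list described above and fetchLatest is a get-by-id against the document store, which is immediately consistent.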

However, it doesn’t fix the problem completely. Your query may match the old document but not the new one, and replacing the old document with the new one in the results will make it appear as if the changed document still matches the query. But it is a lot better than showing stale data to the user. Also, we’re not handling deletes currently, but that is trivially supported with a similar solution.

Update 2016-02-10: I recently released our Elasticsearch client code on github. It includes support for the strategy outlined above, and loads more goodies. Simply create a dao using the CrudOperationsFactory, be sure to enable redis caching there, and use modifiedIds() on the dao to retrieve the list of recently modified ids. If you use the pagedSearch or iterableSearch methods on the dao, you can easily create a ProcessingSearchResponse that applies a lambda function to swap in fresh versions for any ids contained in this list.

Optimistic locking for updates in Elasticsearch

In a post in 2012, I expanded a bit on the virtues of using elasticsearch as a document store, as opposed to using a separate database. To my surprise, I still get hits on that article on a daily basis, which indicates that there is some interest in using elasticsearch as described there. So, I’m planning to start blogging a bit more again, after having been more or less too busy building Inbot to do so since last February.


Nokia Android Phone

It appears that hell is freezing over and there are now strong rumors that on the evening of the completion of the deal with Microsoft, Nokia is going to push out an Android phone.

I’ve been more than a bit puzzled about this apparent move for a few weeks, but I think I’ve figured out a possible universe where it actually makes sense. Disclaimer: I’ve been outside of Nokia for quite some time now and don’t have any information that I shouldn’t be sharing. I’m just speculating.

A few days ago Ars Technica published an article arguing that Nokia should not be forking Android, which is what it appears to be doing. One of the big arguments against this was that it isn’t working that well for Amazon either. Amazon has not licensed Google Play Services, which is basically what you need to license to get access to the play store, chrome, google maps, and all the rest of the Google circus. So while Amazon’s Android-based Kindles are perfectly nice tablets to use, most Android apps are not available for them because of compatibility issues and because most app developers don’t look beyond the Google store. Blackberry has exactly the same problem (insofar as they still have any ambitions in this respect).

Companies like HTC and Samsung have signed licensing deals with Google, which means they have to ship whatever Google tells them to ship; in fact, software updates for anything related to Play Services completely bypass whatever firmware these companies ship and instead update over the air constantly. This is Google’s fix for the problem that these companies are normally hopelessly behind with updates. I recently played with a Samsung, and most of their added-value software is dubious at best. Most of it is outright crap, and most tech savvy users prefer stock android. I know I like my Nexus 5 a lot better, at least. Samsung is a hardware manufacturer without a solid software play. Amazon doesn’t want to be in that position; for them, the software and hardware business is just a means towards an end: selling Amazon content. They compete with Google on this front, and for this reason a deal between the two is unlikely.

So, I was thinking: exactly. It doesn’t make sense for Amazon to be doing this alone. Amazon needs a partner. What if that partner was Nokia + Microsoft? That would change the game substantially.

Amazon has already done a lot of the work of providing an implementation of Google’s proprietary APIs. Amazon is already a licensee of Nokia maps, and together they could knock up an ecosystem that is big enough to convince application developers that it’s worth porting over to their app store. Microsoft and Nokia need to compete with Android not based on the notion that their platform is better (because arguably it is not) but primarily based on the notion that its app store is filled with third party goodies. It’s the one thing that comes up in every review of a windows phone, blackberry (throwing them in for good measure), or Amazon device. Amazon + Nokia + Microsoft could fix this together. If you fix it for (very) low end phones, you can shove tens of millions of devices into the market in a very short time. That creates a whole new reality.

It seems that is exactly what Nokia is doing (if the rumors and screenshots are right): a low end Android phone with a windows-phone-like shell and without any of the Google services. One step up from this would be open sourcing an API layer like the one Amazon built for compatibility with Google’s proprietary play services, but plugged into competing services from Nokia, Microsoft, and Amazon. That would also be portable to other platforms, for example windows phone, which also has had some app store related challenges. Microsoft actually has a lot of code that already makes sense on Android. For example, mono runs C# and other .Net stuff just fine on Android; with a bit of work, a lot of code could be ported over quite easily. Also, Microsoft and Nokia currently have a lot of Android manufacturers as paying customers, although all those manufacturers are currently getting is a license for the patents they are infringing on. And don’t forget that a lot of Android manufacturers are not necessarily happy with the power grab Google has been executing with Android. Play Services is a classic bait-and-switch strategy: Google lured licensees in with open source, which is now slowly being replaced with Google proprietary code. That’s why Samsung is making a big push with Tizen in the low end market this year, and it is also why people are eyeing Ubuntu, Firefox OS, and Sailfish as alternatives to Google.

In short, I’d be very surprised if Nokia was doing this by itself just before it sells the whole phone division. It doesn’t make sense. So, Microsoft has to be in it. And the only way that makes sense is if they want to take this all the way.

Will it work? I don’t know. I’ve seen both Microsoft and Nokia shoot themselves in their collective feet more than enough over the past few years. Both companies have done some amazingly stupid things. There is plenty of room for them to mess this up, and they don’t have history on their side at this point. But it could work if they get their act together.


Versioneye

During our recent acquisition, we had to do a bit of due diligence to cover various things related to the financials, legal structuring, etc. of Localstream. Part of this process was also doing a license review of our technical assets.

Doing license reviews is one of those chores that software architects are forced to do once in a while. If you are somewhat knowledgeable about open source licensing, you’ll know that there are plenty of ways for companies to get themselves in trouble: e.g. inadvertently licensing their code base under GPLv3 simply by using similarly licensed libraries, violating the terms of licenses, reusing inappropriately licensed Github projects, etc. This is a big deal in the corporate world because it exposes you to nasty legal surprises. Doing a license review basically means going through the entire list of dependencies and transitive dependencies (i.e. dependencies of the dependencies) and reviewing how each of them is licensed. Basically everything that gets bundled with your software is in scope for this review.

I’ve done similar reviews at Nokia, where the legal risks were large enough to justify a very large legal department that concerned itself with such reviews. They had built tools on top of Lotus Notes to support the job, and there was no small amount of process involved in getting software past them. So, this wasn’t exactly my favorite part of the job. A big problem with these reviews is that software changes constantly, and the review is only valid for the specific combination of versions that you reviewed. Software dependencies change all the time, and keeping track of the legal side is a hard problem that requires a lot of bookkeeping. This is tedious, and big companies get themselves into trouble all the time. E.g. Microsoft has had to withdraw products from the market on several occasions, Oracle and Google have been bickering over Android for ages, and famously SCO ended up suing world + dog over code they thought they owned the copyright to (Linux).

Luckily there’s a new Berlin based company called Versioneye that makes keeping track of dependencies very easy. Versioneye is basically a social network for software. What it does is genius: it connects to your public or private source repositories (Bitbucket and Github are fully supported currently) and then picks apart your projects to look for dependencies in maven pom files, bundler Gemfiles, npm, bower and many other files that basically list all the dependencies for your software. It then builds lists of dependencies and transitive dependencies and provides details on the licenses as well. It does all this automatically. Even better, it also alerts you of outdated dependencies, allows you to follow specific dependencies, and generally solves a lot of headaches when it comes to keeping track of dependencies.

I’ve had the pleasure of drinking more than a few beers with founder Robert Reiz of Versioneye and gave him some feedback early on. I was very impressed with how responsive he and his co-founder Timo were. Basically they delivered all the features I asked for (and more) and they are constantly adding new features. Currently they already support most dependency management tooling out there so chances are very good that whatever you are using is already supported. If not, give them some feedback and chances are that they add it if it makes sense.

So, when the time came to do the Localstream due diligence, using their product was a no-brainer and it got the job done quickly. Versioneye gave me a very detailed overview of all Localstream dependencies across our Java, ruby, and javascript components and made it trivially easy to export a complete list of all our dependencies, versions, and licenses for the Localstream due diligence.

Versioneye is a revolutionary tool that should be very high on the wish list of any software architect responsible for keeping track of software dependencies. This is useful for legal reasons, but it is also a very practical way to stay on top of the tons of dependencies that your software has. If you are responsible for any kind of commercial software development involving open source components, you should take a look at this tool. Sign up, import all your Github projects, and play with it. It’s free to use for open source projects or to upload dependency files manually, and they charge a very reasonable fee for connecting private repositories.

Localstream and Linko

A few weeks ago Linko issued a press release that stated that they had acquired Localstream (i.e. the company my friend Mark and I founded last year), had raised some funding ($2.6M), and were now accepting customers. The Localstream acquisition means that Localstream ceases to exist and that its technology and people (i.e. me and Mark) are now part of Linko. We have big plans with Linko, and of course part of that will be some level of reuse of the Localstream assets.

That should have ended nearly three months of radio silence, celebrated with a blog post here on the topic. The reason it didn’t happen sooner is basically that I’ve been working my ass off for Linko in the past few weeks. Working there is great fun and there is a lot to work on. So, that leaves very little time for updating my blog.

But let’s rewind a little. Basically, the story is that Mark and I left Nokia in the summer of 2012 to start our own company. Localstream was our shot at fixing the location based web, which remains somewhat broken, as I outlined a few months ago. Localstream’s solution is very simple: the web is made of links, and the location based web should be no different. Instead of coordinates, we use links to locations, and instead of radius search we use an algorithm inspired by Google’s PageRank that sorts by relevance to a location, based on how content is linked to the location and how locations are linked to each other, rather than merely by proximity measured in meters. Rather than geocoding content, we location tag it using Flickr style rel=tag links in the content. The link decouples the content from the coordinate, so the location metadata behind the link can evolve separately from the content.

The mediahackday demo we did in October showcased this approach and combined it with entity recognition to automatically extract textual references to locations from content and then disambiguate those into links to actual locations in our location graph. The result was a searchable archive of news that could produce lists of articles for streets, neighborhoods, cities, restaurants, etc., sorted by their relevance to that location.

The reason we did this demo was not because we were planning to become a news company. This use case is potentially very interesting both from a business angle and a content angle. However, for us it was merely a technology show case. Localstream was always a technology company and never about the use cases enabled by our technology. There are a ton of startups that do this the other way around and focus on the UX first and technology second. We figured that with our technology we could help some of those companies out and maybe stumble on a way to monetize. We had a pretty clear idea of what we wanted to build technically and for that reason we sidestepped the issue who was actually going to use it.

My experience is that if you focus on your strengths, good things can happen. Our strength in Localstream was always the technology. By summer 2013 we had the platform in private beta and by October we were looking for partners and investors and had generated several leads that would have allowed us to engage in some interesting projects, consulting, and possibly funding. We had given ourselves to the end of the year, which was when things would start to get more awkward financially for the both of us.

Then a good thing happened and we met Mikko Alasaarela, the CEO of Linko, at Bubble over Berlin, a satellite event of Techcrunch Disrupt. Mikko is a great guy and I admire him for being brutally honest and opinionated. He doesn’t hold back and he told us something along the lines of “I love your technology but your business model sucks”, which is exactly what we needed to hear. Then we started chatting about what Linko does and how Localstream could add value to that. We continued that dialog over the next few days. We were invited to Linko’s shiny new office penthouse on Torstrasse, and showcased Localstream to the Linko team. We discovered that we had a huge amount in common in terms of technical vision, technology stack, and most importantly about being ambitious and thinking big. In short, we liked each other.

Linko’s CRM product works very differently from current products in the market. Instead of filling in reports and “submitting” them through some tedious enterprise application, Linko taps into what sales people do on their mobile phones and in the cloud while they are doing their job. It’s a free consumer app that happens to be highly valuable for business use. Linko’s Android, iOS, and Windows Phone apps provide deep integration with a wide range of applications and cloud services used by sales people, such as email, calendar, document sharing, and social networks, as well as native phone and text functionality. The activity in all those tools is gathered by the application and used to provide real-time, highly accurate, and valuable insight for sales teams. All users need to do is what they would do anyway: use their phone to sell whatever it is they are selling, using whatever tools they prefer. Linko does the rest. It connects the dots and reports on sales activity, funnel status, and loads of other things. It’s a very bold vision, and doing this at a global scale is exactly the kind of thing that gets a technology minded person like me excited. It involves massive amounts of data and requires a lot of the same technology that we were using or planning to use at Localstream as well.

Location and location references are of course increasingly important on mobile, given the mass distribution of smart phones with GPS that know where their users are most of the time and the huge number of services that make use of this. Additionally, there is a lot of location data embedded in many of the tools that Linko connects with (e.g. addresses, company locations, calendar event locations, GPS coordinates, etc.). So, the Localstream technology is ideally suited for disambiguating and utilizing that information to, for example, allow CRM reports to be sliced and diced by location, and to act on information related to proximity to potential leads, customers, sales points, etc. In short, Localstream can add a ton of value to CRM (and other enterprise applications) and help create something that doesn’t exist today: location based business applications.

So, we parted ways with Mikko after a pretty interesting meeting, thinking we had another potential lead for showcasing our technology and possibly some minor revenue. Then Mikko completely surprised us by thinking big instead of small. He saw a great fit between our team and theirs and figured that a location based CRM would never happen unless he acquired us. So, he did that. Adding location to CRM adds a killer feature to what is otherwise already a pretty compelling proposition, so it makes a lot of sense. Instead of half committing to doing some project together, he proposed to acquire Localstream.

Linko has already proven to be a great place to work for us in the past few months. It’s a dream job for a guy like me: a complete greenfield approach to building a global company that is going to have to scale very rapidly. You don’t get such opportunities very often and I love doing this stuff. The people at Linko have a very diverse set of skills and it’s a fantastic team to work in. It’s also a remarkably complete and experienced team. We have great app developers, a very solid machine learning and AI developer, a sales team that is delivering customers faster than we can handle, and an absolutely rock solid business case. And it turns out that Mark and I fit right in, complementing those skills with our own related to automating deployments, Elasticsearch, big data, leading development teams, front end development, etc.

Bitcoin: cutting through the hype

Bitcoin has had a lot of publicity over the past year and has gone through several cycles of its value fluctuating wildly. This has led some to dismiss it as something doomed to fail eventually, with the people who have invested money in it losing all their money. However, I believe that the current turbulence in the bitcoin world is more akin to the gold rush of the nineteenth century than to the tulip bubble in my home country in the seventeenth century. Ultimately tulips became worthless, whereas gold has been a stable factor in the world’s economy ever since people learned how to melt it.

Tulips were rare flowers that originally only grew in particular regions (Turkey) and consequently they were somewhat rare in seventeenth century Amsterdam, where the stock market had just been invented. Fortunes were made and lost by people trading bits of paper, essentially gambling on the success or failure of trade missions to the Far East, the prices of common goods, and indeed tulips. Basically the value of tulips continued to increase until somebody figured out how to grow them in large quantities near Amsterdam and they ceased to be a rare good. At that point the price collapsed and the first stock crisis happened.

Very interesting, but bitcoin is fundamentally very different from tulips. By design it is more similar to gold, and consequently what we are currently seeing is more akin to the gold rush than to the tulip crisis. During the gold rush, people and capital flocked to regions where gold was rumored to be available in large quantities. This resulted in a lot of economic activity that ultimately collapsed when it became clear there was little gold to be found. Gold itself remained scarce and kept its value.

Like bitcoin, gold has very little practical utility beyond some niche applications in electronics and as a way of decorating wealthy people with a severe lack of good taste. Its value is primarily based on the notion of scarcity. The notion of money is based on trading goods and services (i.e. stuff that results from economic activity) for something abstract that has the properties of being scarce and somewhat stable in value. The seventeenth century is quite illustrative here as well: the Spanish discovery of large quantities of silver in South America wreaked havoc on the value of the associated coinage.

My favorite author Neal Stephenson has written several novels that have currency as a central theme. I’m a big fan of his work and would recommend dedicating substantial amounts of time to digesting his oeuvre to anyone who is not afraid of picking up novels of around a thousand pages each. His Baroque Cycle, a historical science fiction trilogy set in the seventeenth century, is all about the emergence of many modern economic concepts such as bank notes, stock markets, etc. Cryptonomicon is partly about a cryptographic currency backed by gold (sounds familiar?), and both The Diamond Age and Snow Crash feature a bitcoin-like digital currency that has largely displaced conventional currency and in the process transformed society, as governments no longer control the flow of money and consequently are starved of taxes. Perhaps these latter two novels are the best way to think of bitcoin, as it is obviously influenced by these novels (and similar ones) and because it has been explicitly designed to disrupt the monetary system in a similar way.

Perhaps it is better to step back and consider this design. Bitcoin is all about artificial scarcity. First of all there is a finite amount of bitcoin: there are only 21 million of them and not all of them have been ‘mined’ yet. This number is important in the sense that it is very different from e.g. dollars of which there are trillions in circulation and of which more are printed constantly. This cannot happen with bitcoin. It’s more similar to gold than to dollars in the sense that there is a finite amount of gold on this planet of which we have harvested most of the easily accessible bits by now.

Bitcoin mining is the process by which bitcoins come into existence, and it is the source of much confusion and bullshit in the media. The best way to think about it is as solving hard mathematical problems that require increasing amounts of computing infrastructure. People willing to invest in such infrastructure can harvest fresh bitcoin, until it runs out of course. The process of mining is comparable to mining for gold and gets progressively harder, just like finding new supplies of gold in the earth is quite hard. This is by design. The technicalities of this mining process are a lot harder to understand than digging for gold, but fundamentally it is the same, down to the notion of people making good money off selling the proverbial shovels (i.e. computer hardware). The important thing to realize is that there is a finite amount of it and once that has been mined, we’ll have to make do with 21 million bitcoins.
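To make the “hard mathematical problems” bit less abstract, here is a toy sketch of hash-based proof of work. The block data, difficulty encoding, and hashing scheme are simplified stand-ins for illustration, not the actual bitcoin protocol:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Toy proof-of-work miner. Each extra zero of difficulty multiplies the
// expected work by 16, which is why real mining eats ever more hardware.
public class ToyMiner {

    // Find a nonce such that SHA-256(blockData + nonce) starts with the
    // required number of zero hex digits.
    public static long mine(String blockData, int difficulty) {
        String target = "0".repeat(difficulty);
        for (long nonce = 0; ; nonce++) {
            if (sha256Hex(blockData + nonce).startsWith(target)) {
                return nonce;
            }
        }
    }

    // Hex-encoded SHA-256 of the input string.
    public static String sha256Hex(String input) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : digest.digest(input.getBytes(StandardCharsets.UTF_8))) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String blockData = "some transactions";
        long nonce = mine(blockData, 4);
        System.out.println("nonce " + nonce + " -> " + sha256Hex(blockData + nonce));
    }
}
```

Verifying a solution is cheap (one hash), while finding one is expensive; that asymmetry is the essence of the scheme.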

Assuming the many experts that have vetted the technicalities are right, there are no known ways to compromise either the mining process or the hard cap of 21 million bitcoins. Only compromising the technology would cause the bitcoin bubble to burst. Assuming that is impossible (and that’s the big question), the bubble won’t burst and bitcoin could be here to stay.

So, let’s look at its value. The big news this week was that bitcoin crossed the $1000 mark. That sounds great for the people who bought it in the days when it was worth mere dollars, and there are indeed plenty of stories of people who have made (or lost) a fortune in bitcoin. With roughly half of the 21 million bitcoins mined so far, the current size of the bitcoin economy measured in dollars is around 10 billion. That’s not a lot considering the total size of the world economy is in the order of 80 trillion dollars. And this brings me to the crucial point: if bitcoin starts replacing conventional currency in any significant way, its current valuation is going to be peanuts. Let’s round the world gross product up to 100 trillion; growth, inflation, etc. will bring it close to that soon anyway and it is a convenient number. Now assume that one thousandth of the world economy will use bitcoin at some point in the future. That would require the bitcoin economy to be about ten times its current size, i.e. 100 billion rather than 10 billion, or roughly $10K for a single bitcoin. If you expand that to a single percent of the world economy, the exchange rate would be closer to $100K. I could see that happen; all it takes is more people using it.
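A quick back-of-the-envelope sketch of that arithmetic; the world gross product figure and the adoption shares are assumptions for illustration, not forecasts:

```java
// Implied bitcoin price if a given slice of the world economy were held
// in bitcoin, spread over the mined supply. All inputs are rough guesses.
public class BitcoinValuation {

    public static double impliedPrice(double economySliceUsd, double minedCoins) {
        return economySliceUsd / minedCoins;
    }

    public static void main(String[] args) {
        double worldEconomy = 100e12; // ~100 trillion USD, rounded up
        double minedCoins = 10.5e6;   // roughly half of the 21 million cap

        // One thousandth of the world economy: about $9.5K per coin.
        System.out.printf("0.1%% adoption: $%.0f per coin%n",
                impliedPrice(worldEconomy / 1000, minedCoins));
        // One percent of the world economy: about $95K per coin.
        System.out.printf("1%% adoption:   $%.0f per coin%n",
                impliedPrice(worldEconomy / 100, minedCoins));
    }
}
```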

People investing in bitcoin as a way to get rich are speculating on the growth of the bitcoin economy relative to the rest of the economy. There are still a lot of question marks around bitcoin that have to do with legislation, government control, etc. My view on this is simple: there’s nothing stopping national banks from keeping stashes of bitcoin around in the same way that they keep gold around to back the value of their currencies (or at least pretend to do so). Indeed, if they did that, it would vastly boost the legitimacy of bitcoin as a currency and thus help ensure that the value of its economy increases. Ultimately the value of money is a function of economic activity, so this should lead to some notion of stability in the bitcoin world. The current bitcoin market is volatile because there are very few bitcoins around and a handful of players hold disproportionately large amounts of them. These few players can influence supply and demand, and there are some strong hints that their speculative behavior is causing the current wild fluctuations in valuation.

Consider for a moment the hypothetical situation that the US Federal Reserve was robbed by a bunch of evil crooks who could destroy the economy at will simply by buying and selling gold (which is what national banks do on a pretty large scale). That’s essentially the plot of Goldfinger. And it is sort of the equivalent of shady characters getting their hands on large quantities of bitcoin and deciding to buy or sell at will. That’s both disruptive and disturbing. National banks wielding such power is one thing; at least they are somewhat kept in check by their governments (which in some cases can be pretty shady as well), but leaving such power to criminals, drug dealers, etc. might not be in the interest of society. So, a big problem would be somebody owning a million bitcoins today, which is a small fortune already but insignificant to the economy as a whole. Suppose this person just sits on this stash while bitcoin revolutionizes the economy as we know it over the next few decades. Fast forward to 2030 and this person, who is now a very rich billionaire (the equivalent of Bill Gates), decides to spend or sell all of it. You can cause quite a bit of trouble spending billions of cash. However, in general people hoarding cash is bad for economies and people spending it is actually good, since it increases economic activity. Ultimately it is like having a huge stash of gold and suddenly deciding to spend it: it would temporarily destabilize the value of gold, but given that gold remains scarce overall, the price would ultimately recover. In the end, people like Bill Gates have little interest in destabilizing the economy, and arguably his spending behavior is a lot more beneficial for society than, say, the US spending behavior in the Middle East. Bad things could happen, but bad things happen today as well. That’s simply the nature of power (and money is power).

So, I see very little reason to be afraid of bitcoin. I believe the current financial system manages to be corrupt and evil without bitcoin just fine. Ultimately wealth is a reflection of economic power and activity. If bitcoin is successful, it will naturally accumulate with the same people who control our gold supplies today. So in that sense nothing will change. What will change is the role of governments and taxation. Basically governments tax wealth and income; bitcoin won’t change that, but the way they tax will have to evolve. In the old days, the tax man would come and literally collect the coinage and you had little choice but to hand it over. Nowadays, taxation is based on declared income, VAT, etc., which basically boils down to the government taking a percentage of the income of companies and individuals and being very anal about auditing their books. In the bitcoin world, transactions are both auditable and anonymous. They are anonymous in the sense that they are not (currently) explicitly tied to people. However, auditing is built into bitcoin in the sense that each transaction is an event connected to all the transactions that came before it. All the government needs to know is which transactions originated from whom and when, which is simply something it can demand people to declare. Hiding certain transactions is of course possible, but it would be impossible to spend bitcoin without disclosing its transaction history (i.e. where you got it). It’s similar to a bank robber trying to spend stolen hundred dollar bills with known serial numbers: kind of risky. I wouldn’t be surprised to see taxes evolve to utilize this bitcoin feature.
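The idea that each transaction is connected to everything that came before it can be sketched as a simple hash chain. This is a deliberate simplification of the real bitcoin transaction format, just to illustrate why history is hard to hide:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.List;

// Toy ledger: every entry embeds the hash of the entry before it, so the
// final hash commits to the entire history. Altering or omitting any
// transaction in the middle changes every hash after it.
public class ToyLedger {
    private final List<String> hashes = new ArrayList<>();

    public void append(String transaction) {
        String previous = hashes.isEmpty() ? "genesis" : head();
        hashes.add(sha256Hex(previous + "|" + transaction));
    }

    // Hash of the most recent entry; identical histories yield identical heads.
    public String head() {
        return hashes.get(hashes.size() - 1);
    }

    private static String sha256Hex(String input) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : digest.digest(input.getBytes(StandardCharsets.UTF_8))) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ToyLedger ledger = new ToyLedger();
        ledger.append("alice pays bob 1 BTC");
        ledger.append("bob pays carol 1 BTC");
        System.out.println("chain head: " + ledger.head());
    }
}
```

Spending a coin means presenting a chain like this, which is exactly what makes the transaction history auditable.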

Bitcoin and similar cryptographic currencies solve some real problems. They make banks obsolete as a means to conduct transactions: bitcoin transactions require no middle men. That’s a good thing in my view since the current banking system (particularly here in Germany) is very inefficient and I see little reason for having to rely on it in the future. This convenience will drive adoption. Ultimately this is how gold became valuable as well: it was more convenient to hand out small gold coins than, say, a truck load of manure or ten goats. Ultimately, the value is proportional to the amount it is used in economic transactions. Bitcoin could merely be the first of many digital currencies, in which case it might never become dominant and you’d be ill advised to bet on its value increasing. On the other hand, it’s good enough now and might benefit from early adoption. A lot of countries currently dependent on overly inflated currencies would probably be better off depending on bitcoin, and there are some signs that people in such countries are actually using bitcoin quite heavily already. Bitcoin could well prove to be a lot more stable than bits of paper with former US presidents’ faces printed on them.

I certainly wouldn’t mind owning a few percent of a percent of a sizable chunk of the world’s economy. But I’m too risk averse to bother with speculating. Ultimately, I didn’t do a startup during the dotcom bubble (fresh out of college at the time), I didn’t buy shares in alternative energy companies a few years ago when I honestly believed the hype around them (most have gone bankrupt), and I didn’t act on my hunch that these bitcoin guys might be onto something last year. I did buy into some worthless pension schemes a few years ago, and thankfully I never acted on peer pressure to get a mortgage and a house in the Netherlands (which would have meant being in debt now on a house worth substantially less than a few years ago). I could see myself using electronic currency in the near future though, simply because it is convenient. In that case, I’d actually want it to be stable coinage with somewhat predictable and reliable value. Bitcoin can’t provide that yet, though it’s arguably already better than some currencies in use today. Owning a hundred bitcoin today would be stressful: having $100K while knowing it could be worth only half that tomorrow is not exactly relaxing. What worries me today is the suspicion that my pension is going to be worthless and that the value of my savings is deflating as well.

De-globalization or why local matters

Over the weekend we built a local news app at the Mediahackday in Berlin. The purpose of this event was to find new ways to utilize the back catalogs of media companies such as the Guardian, Axel Springer, and others. This use-case is a perfect fit for the Localstream platform that I have been involved with over the past year. So, we went in and did our thing. My colleague Mark MacMahon wrote up a nice article about this on our Tumblr blog.

One of the things that struck me again while focusing on this particular use-case over the weekend is a phenomenon that has been bouncing around in my head for a while. For lack of a better word, I’d like to call it de-globalization.

As an example of this, consider this advertisement that I spotted in the subway on my way to the Mediahackday venue:

Location based advertising. Very much an offline thing still.

This is an advertisement that promotes the existence of a mobile application. Alexa is a shopping mall in Berlin (near Alexander Platz) and apparently they thought it a good investment to spend money on the development of a mobile application for the people in their mall. Then to tell these people about the existence of this application, they invested in a (presumably) expensive advertising campaign as well. This is location based advertising in the wild. Big money is spent on it and it is mostly an offline business.

In fact most of the economic activity world wide is driven by locals engaging with small and medium enterprises locally. Despite globalization and mega corporations, there is an enormous long tail of very small companies and it is growing. The EU states that:

Small and medium-sized enterprises (SMEs) are the main drivers of job creation, economic growth and social cohesion in Europe. They have local roots, provide local jobs but also exploit the benefits of globalisation. SMEs indeed constitute the dominant category of business organisation in all EU countries with some 23 million enterprises (99.8%); their share in total employment creation is massive (81.6%) as well as their contribution to the EU-GDP (up to 60%).

Alexa is spending what must be a sizable budget for them on bespoke mobile app development and offline advertising for the resulting app. That strikes me as a particularly expensive and ineffective way of promoting themselves. Despite ongoing globalization, massive growth in online channels, and widespread adoption of internet, Alexa is forced to go offline to address their audience: Berlin locals. And just like other locals, they are finding it difficult.

The reason for this is that the existing online channels in which the likes of Alexa could promote themselves, and in which people could discover Alexa’s mobile application and other local content, lack a local focus. Where do people living near the Alexa mall go to learn about what is happening around them? There’s no such thing.

People in the industry have been talking about location based services and associated revenue streams for ages. But one glance at the advertisement above makes it very clear that despite this, local is still very much an offline business for most of the locals. This applies to commerce but also to other things. What is happening around my house, on my street, in my neighborhood and in my city? Who is writing about my area and what are they saying about it? What events are on and what cool historical facts can I find out about my area? The online answer today involves search engines and a lot of hard work filtering through blogs, wikipedia, event sites, social media, location based services, etc. Because that is so impractical, nobody bothers and consequently Alexa has to spend big money on subway advertising just to tell people that there is an app that they are very excited about.

At Localstream we want to change this and enable locals to go online to engage with each other locally, sharing news, knowledge, and other information about their area. Through the Localstream platform, we can filter content by location and provide a view of the available content specific to where it is about, as opposed to what it is about (search engines) or whom it is about (social networks).

Localstream de-globalizes the internet. The internet is full of location relevant information ranging from venue specific applications such as the Alexa app above, local news about the area, historical facts, events, etc. However, the existing channels for this content rely on people stumbling upon this content in search engines or social media. With Localstream, you can stumble upon it by location.

At Mediahackday we specialized our concept for news and tried to turn the back catalogs of news organizations such as the Guardian and Axel Springer into a location browsable channel. While we definitely still have some challenges with respect to our ability to tag the content correctly and rank accordingly, the raw value of this was immediately obvious when we started browsing the news content about Berlin.

Living in Berlin, I of course entered my street name (Bergstrasse) as a search criterion:

[Screenshot of Localstream search results for Bergstrasse, 2013-10-07]

As you can see here, Localstream found some news content that isn’t about my street directly but does mention nearby streets and venues, for example Tucholskystrasse and Ackerstrasse, which are both near my street (first hit). So, despite the fact that none of these articles were marked up with coordinates, Localstream is able to recognize the street names and deduce that articles mentioning a nearby street are relevant to my location.

Now bearing in mind that we hadn’t seen the content before and that our location graph, ranking, and location tagging are very much works in progress, this is a pretty good result. We were able to ingest content we had never seen before, lacking any structured location information, and turn it into a location browsable news application in under 24 hours. We believe we are just scratching the surface here in terms of what is possible.
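The core idea of scoring an article by the streets it mentions could be sketched roughly like this. The gazetteer coordinates and the scoring are made up for the example; Localstream’s actual location graph and ranking are far more involved:

```java
import java.util.Map;

// Toy location tagger: spot known street names in an article and score the
// article by distance to the reader's location. The street coordinates
// below are illustrative approximations, not real gazetteer data.
public class StreetTagger {

    static final Map<String, double[]> STREETS = Map.of(
            "Bergstrasse", new double[]{52.531, 13.392},
            "Ackerstrasse", new double[]{52.534, 13.395},
            "Tucholskystrasse", new double[]{52.524, 13.391});

    // Smallest distance (km) between the reader's location and any street
    // mentioned in the article; infinity if no known street is mentioned.
    public static double nearestMentionKm(String article, double lat, double lon) {
        double best = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, double[]> e : STREETS.entrySet()) {
            if (article.contains(e.getKey())) {
                best = Math.min(best,
                        distanceKm(lat, lon, e.getValue()[0], e.getValue()[1]));
            }
        }
        return best;
    }

    // Equirectangular approximation; good enough at city scale.
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double x = Math.toRadians(lon2 - lon1) * Math.cos(Math.toRadians((lat1 + lat2) / 2));
        double y = Math.toRadians(lat2 - lat1);
        return Math.sqrt(x * x + y * y) * 6371;
    }

    public static void main(String[] args) {
        String article = "New cafe opens on Ackerstrasse in Mitte";
        // Reader on Bergstrasse: the article mentions a street a few hundred
        // meters away, so it would rank as locally relevant.
        System.out.printf("%.2f km%n", nearestMentionKm(article, 52.531, 13.392));
    }
}
```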

Httpclient 4.3 with FutureRequestExecutionService

A while ago, I contributed some of my GitHub code to Apache HttpClient, which is now out in a new 4.3 GA release that includes my contributed functionality. If you use Java and make HTTP requests, you probably already use HttpClient (or something similar). If not, you might want to try it. Anyway, you can now wrap your HttpClient requests with futures.

This is very useful for any server that needs to make multiple requests in one transaction. Using futures you can concurrently schedule the requests and use a timeout to guarantee that one rogue request doesn’t end up blocking your response for too long.
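A minimal usage sketch of FutureRequestExecutionService; the URL, pool sizes, and timeout are placeholder values, so consult the official tutorial for the details:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.protocol.HttpClientContext;
import org.apache.http.impl.client.FutureRequestExecutionService;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.impl.client.HttpRequestFutureTask;

public class FutureRequestExample {
    public static void main(String[] args) throws Exception {
        HttpClient httpClient = HttpClientBuilder.create().setMaxConnPerRoute(5).build();
        ExecutorService executorService = Executors.newFixedThreadPool(5);
        try (FutureRequestExecutionService requestExecService =
                     new FutureRequestExecutionService(httpClient, executorService)) {
            // Schedule the request; the response handler runs on the executor
            // and produces the future's value (here, the status code).
            HttpRequestFutureTask<Integer> task = requestExecService.execute(
                    new HttpGet("http://example.com"), HttpClientContext.create(),
                    response -> response.getStatusLine().getStatusCode());
            // Block for at most 5 seconds, so one rogue request cannot
            // stall the whole transaction.
            Integer statusCode = task.get(5, TimeUnit.SECONDS);
            System.out.println("status: " + statusCode);
        } finally {
            executorService.shutdownNow();
        }
    }
}
```

Scheduling several of these tasks up front and then collecting their results with per-task timeouts is the pattern that makes multi-request transactions robust.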

Read the documentation for FutureRequestExecutionService here or download httpclient 4.3.