Mobile Coverage according to Deutsche Bahn

Yesterday I was traveling by train and it struck me how poor connectivity is in Germany. When traveling from Berlin to Hengelo (the first stop across the border in the Netherlands), I typically plan to have no coverage whatsoever for what I guesstimate is at least 80% of the trip. Apparently, in places like Bad Bentheim, Rheine, and Osnabrück it is normal to have little or no coverage, even when the train stops at the damn railway station.
I found a nice tweet in my Twitter feed this morning mentioning that Deutsche Bahn is providing some nice open data files. One of these files maps coverage for the different mobile providers in Germany along the rail tracks. I downloaded the file and did some very low-tech analysis on it: basically taking their stability metric and counting the number of non-zero values for each provider with a bit of old-school command-line voodoo.

# metrics with non-zero data points (higher is better)

ip-10-0-1-28:~ $ cat connectivity_2015_09.geojson | grep o2_stability | grep  -E -v '0,$' | wc -l
    2851
ip-10-0-1-28:~ $ cat connectivity_2015_09.geojson | grep t-mobile_stability | grep  -E -v '0,$' | wc -l
    6089
ip-10-0-1-28:~ $ cat connectivity_2015_09.geojson | grep e-plus_stability | grep  -E -v '0,$' | wc -l
    2743
ip-10-0-1-28:~ $ cat connectivity_2015_09.geojson | grep vodafone_stability | grep  -E -v '0,$' | wc -l
    4511

# metrics with zero-value data points (lower is better)
ip-10-0-1-28:~ $ cat connectivity_2015_09.geojson | grep o2_stability | grep  -E  '0,$' | wc -l
   20876
ip-10-0-1-28:~ $ cat connectivity_2015_09.geojson | grep t-mobile_stability | grep  -E  '0,$' | wc -l
   17638
ip-10-0-1-28:~ $ cat connectivity_2015_09.geojson | grep e-plus_stability | grep  -E  '0,$' | wc -l
   20984
ip-10-0-1-28:~ $ cat connectivity_2015_09.geojson | grep vodafone_stability | grep  -E  '0,$' | wc -l
   19216

As I suspected, O2 is the worst and T-Mobile has more than twice the coverage. However, that still amounts to pretty shit coverage, since the vast majority of metrics for all providers is 0. In fact, my guesstimate was quite accurate: even for T-Mobile, connection stability is zero for a whopping ~74% of the metric points, which I assume are evenly distributed along the tracks (if not, it could be worse). For O2, it is more like 88%. The total number of metric points is the same for each provider (23,727), so the numbers are directly comparable.
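For anyone who wants to reproduce this a bit more robustly than by grepping raw lines, the same counts can be computed with jq. This is just a sketch; it assumes the file is a regular GeoJSON FeatureCollection where each feature carries per-provider *_stability properties, which is what the grep output above suggests.

# count non-zero stability values per provider (sketch, assumptions as above)
for provider in o2 t-mobile e-plus vodafone; do
  count=$(jq --arg p "${provider}_stability" \
    '[.features[].properties[$p] // 0 | select(. != 0)] | length' \
    connectivity_2015_09.geojson)
  echo "$provider: $count non-zero data points"
done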

Wtf Germany? Please fix your infrastructure and stop being a digital backwater.

Accessing Elasticsearch clusters via a localhost node

I’m a regular at the Elasticsearch meetup here in Berlin and there are always lots of recent converts trying to wrap their heads around the ins and outs of what it means to run an Elasticsearch cluster. One issue that seems to baffle a lot of new users is the question of which node in the cluster has the master role. The correct answer is that it depends on what you mean by master. Yes, there is a master node in Elasticsearch, but it does not mean what you think it means: it merely means that a single node is elected to hold the truth about which nodes have which data and, crucially, where the primary copies of shards live. What it does NOT mean is that this node has the master copy of all the data in the cluster. It also does NOT mean that you have to talk specifically to this node when writing data. Data in Elasticsearch is sharded and replicated; shards and their replicas are spread all over the cluster, and out of the box clients can talk to any of the nodes for both read and write traffic. You can literally put a load balancer in front of your cluster and round robin all the requests across all the nodes.

When nodes go down or are added to the cluster, shards may be moved around. All nodes synchronize information about which nodes have which shards and replicas of those shards; the Elasticsearch master is merely the ultimate authority on this information. Masters are elected at runtime by the nodes in the cluster, so by default any node can become the elected master. Also by default, all nodes know how to look up which shards live where and how to route requests around the cluster.
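You can see both of these things on a running cluster with the _cat APIs; any node will answer, not just the elected master. A quick sketch, assuming a node listening on the default localhost:9200:

curl -s 'localhost:9200/_cat/master?v'   # which node is currently the elected master
curl -s 'localhost:9200/_cat/shards?v'   # where every primary and replica shard lives
curl -s 'localhost:9200/_cat/nodes?v'    # all nodes in the cluster and their roles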

A common pattern in larger clusters is to reserve the master role for nodes that do not hold any data; you can specialize what nodes do via configuration. Having three or more such nodes means that if one of them goes down, the remaining ones can elect a new master and the rest of the cluster can just keep spinning. Having an odd number of master-eligible nodes is a good thing when you are holding elections since you always have an obvious majority of n/2 + 1. With an even number you can end up with two equally sized network partitions.
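As a sketch of what that looks like in Elasticsearch 1.x (the config file location is an assumption and depends on how you installed it), a dedicated data-less master-eligible node gets something like this in its elasticsearch.yml:

# append to the node's config; path is an assumption, adjust for your install
cat >> /etc/elasticsearch/elasticsearch.yml <<'EOF'
node.master: true                        # may be elected master
node.data: false                         # holds no shards
discovery.zen.minimum_master_nodes: 2    # n/2 + 1 with three master-eligible nodes
EOF

The minimum_master_nodes setting is what stops the two halves of an even split from each electing their own master.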

The advantage of not having data on a node is that such nodes are far less likely to get into trouble with e.g. OutOfMemoryErrors, excessive IO that slows the machine down, or excessive CPU usage due to expensive queries. If that happens, the availability of the node becomes an issue and bad things can start to happen. That is a problem on a node that is supposed to hold the authoritative cluster state: if it becomes unavailable, the other nodes will elect a new master. There’s a fine line between being unavailable and being slow to respond, which makes this a particularly hard problem. The now-infamous Call Me Maybe article highlights several different cluster failure scenarios, and most of these involve some sort of network partitioning due to temporary master node failures or unavailability. If you are worried about this, also be sure to read the Elasticsearch response to that article. The bottom line is that most of the issues have by now been addressed and are far less likely to bite you. Also, if you have declined to update your production setup to Elasticsearch 1.4.x, now might be a good time to read up on the many known ways in which things can go bad for you.

In any case, data-less nodes still do useful work. They can, for example, be used to serve traffic to Elasticsearch clients. Most things that happen in Elasticsearch involve internal node-to-node communication, since the data can be anywhere in the cluster. So there are typically two or more network hops involved: one from the client to what is usually called a routing node, and from there to whichever other nodes hold the shards needed to complete the request and perform the logic for writing new data to a shard or retrieving data from it.

Another common pattern in the Elasticsearch world is to implement clients in Java and embed a cluster node inside the client process. This embedded node is typically configured to be a routing-only node. The big advantage is that it saves you a network hop: the embedded node already knows where all the shards live, so the application server can talk directly to the nodes holding those shards, using the more efficient internal protocol that Elasticsearch nodes use to communicate with each other.

A few months ago at one of the meetups I was discussing this topic with one of the organizers, Felix Gilcher. He mentioned an interesting variant of this pattern. Embedding a node inside an application only works for Java applications; it is not possible if you use something else. Besides, dealing with the Elasticsearch internal API can be quite a challenge as well, so it would be convenient if non-Java applications could get similar benefits. He then suggested the obvious solution: you get most of the benefits of embedding a node by simply running a standalone, routing-only Elasticsearch node on each application server. The advantage of this approach is that each application server communicates with Elasticsearch via localhost, which is a lot faster than sending REST requests over the network. You still have a bit of overhead related to serializing and deserializing the REST requests, but all of that happens on localhost and you avoid the network hop. So, effectively, you get most of the benefits of the embedded node approach.
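A rough sketch of such a localhost routing node, again for Elasticsearch 1.x and assuming default ports, the usual config path, and made-up hostnames for the three cluster nodes:

# routing-only node on the application server; path and hostnames are assumptions
cat >> /etc/elasticsearch/elasticsearch.yml <<'EOF'
cluster.name: my-cluster                                  # must match the real cluster
node.master: false                                        # never elected master
node.data: false                                          # holds no shards, only routes
discovery.zen.ping.unicast.hosts: ["es1", "es2", "es3"]   # the actual cluster nodes
EOF

# applications on the same machine then simply talk to the local node
curl -s 'localhost:9200/_cluster/health?pretty'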

We recently implemented this at Inbot. We now have a cluster of three Elasticsearch nodes and two application servers running two additional nodes that talk to those three nodes. We use a mix of Java, Javascript, and Ruby components on our servers, and doing it this way allows us to keep things simple. The Elasticsearch nodes on the application servers have a comparatively small heap of only 1GB and typically consume few resources. We could probably reduce the heap further to 512MB or even 256MB, since all these nodes do is pass requests and data between the cluster and the application server. However, we have plenty of memory and have so far had little need to tune this. Meanwhile, our Elasticsearch cluster nodes run on three fast 32GB machines where we allocate half of the memory for heap and reserve the rest for file caching (as per the Elasticsearch recommendations). This works great and it also simplifies application configuration, since you can simply configure all applications to talk to localhost and Elasticsearch takes care of the cluster management.
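For completeness, heap sizes like these are simply set through the ES_HEAP_SIZE environment variable that the Elasticsearch 1.x startup scripts pick up; the values below are just our numbers, not a recommendation:

export ES_HEAP_SIZE=1g    # small routing-only node on an application server
export ES_HEAP_SIZE=16g   # data node on a 32GB machine: half for heap, the rest for the OS file cache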

Versioneye

During our recent acquisition, we had to do a bit of due diligence to cover various things related to the financials, legal structuring, etc. of Localstream. Part of this process was also doing a license review of our technical assets.

Doing license reviews is one of those chores that software architects are forced to do once in a while. If you are somewhat knowledgeable about open source licensing, you’ll know that there are plenty of ways for companies to get themselves in trouble: e.g. inadvertently licensing their code base under GPLv3 simply by using similarly licensed libraries, violating the terms of licenses, reusing inappropriately licensed Github projects, etc. This is a big deal in the corporate world because it exposes you to nasty legal surprises. Doing a license review basically means going through the entire list of dependencies and transitive dependencies (i.e. dependencies of the dependencies) and reviewing the way each of them is licensed. Basically everything that gets bundled with your software is in scope for this review.

I’ve done similar reviews at Nokia, where the legal risks were large enough to justify a very large legal department that concerned itself with such reviews. They had built tools on top of Lotus Notes to support the job, and there was no small amount of process involved in getting software past them. So, this wasn’t exactly my favorite part of the job. A big problem with these reviews is that software changes constantly and a review is only valid for the specific combination of versions that you reviewed. Software dependencies change all the time, and keeping track of the legal side is a hard problem that requires a lot of bookkeeping. This is tedious and big companies get themselves into trouble all the time. E.g. Microsoft has had to withdraw products from the market on several occasions, Oracle and Google have been bickering over Android for ages, and famously SCO ended up suing world + dog over code they thought they owned the copyright to (Linux).

Luckily there’s a new Berlin-based company called Versioneye that makes keeping track of dependencies very easy. Versioneye is basically a social network for software. What it does is genius: it connects to your public or private source repositories (Bitbucket and Github are fully supported currently) and then picks apart your projects to look for dependencies in Maven pom files, Bundler Gemfiles, npm, bower, and many other files that list the dependencies of your software. It then builds lists of dependencies and transitive dependencies and provides details on their licenses as well. It does all this automatically. Even better, it also alerts you to outdated dependencies, allows you to follow specific dependencies, and generally solves a lot of headaches when it comes to keeping track of dependencies.

I’ve had the pleasure of drinking more than a few beers with Versioneye founder Robert Reiz and gave him some feedback early on. I was very impressed with how responsive he and his co-founder Timo were. Basically, they delivered all the features I asked for (and more) and they are constantly adding new ones. They already support most dependency management tooling out there, so chances are very good that whatever you are using is already covered. If not, give them some feedback and chances are they’ll add it if it makes sense.

So, when the time came to do the Localstream due diligence, using their product was a no-brainer and it got the job done quickly. Versioneye gave me a very detailed overview of all Localstream dependencies across our Java, Ruby, and Javascript components and made it trivially easy to export a complete list of all our dependencies, versions, and licenses.

Versioneye is a revolutionary tool that should be very high on the wish list of any software architect responsible for keeping track of software dependencies. This is useful for legal reasons but also a very practical way to stay on top of the tons of dependencies that your software has. If you are responsible for any kind of commercial software development involving open source components, you should take a look at this tool. Sign up, import all your Github projects, and play with it. It’s free to use for open source projects or to upload dependency files manually, and they charge a very reasonable fee for connecting private repositories.

De-globalization or why local matters

Over the weekend we built a local news app at the Mediahackday in Berlin. The purpose of this event was to find new ways to utilize the back catalogs of media companies such as the Guardian, Axel Springer, and others. This use-case is a perfect fit for the Localstream platform that I have been involved with over the past year. So, we went in and did our thing. My colleague Mark MacMahon wrote up a nice article about this on our Tumblr blog.

One of the things that struck me again while focusing on this particular use-case over the weekend is a phenomenon that has been bouncing around in my head for a while. For lack of a better word, I’d like to call it de-globalization.

As an example of this, consider this advertisement that I spotted in the subway on my way to the Mediahackday venue:

Location based advertising. Very much an offline thing still.

This is an advertisement that promotes the existence of a mobile application. Alexa is a shopping mall in Berlin (near Alexanderplatz) and apparently they thought it a good investment to spend money on developing a mobile application for the people in their mall. Then, to tell those people about the existence of this application, they invested in a (presumably) expensive advertising campaign as well. This is location based advertising in the wild. Big money is spent on it, and it is mostly an offline business.

In fact, most economic activity worldwide is driven by locals engaging with small and medium enterprises locally. Despite globalization and mega corporations, there is an enormous long tail of very small companies, and it is growing. The EU states that:

Small and medium-sized enterprises (SMEs) are the main drivers of job creation, economic growth and social cohesion in Europe. They have local roots, provide local jobs but also exploit the benefits of globalisation. SMEs indeed constitute the dominant category of business organisation in all EU countries with some 23 million enterprises (99.8%); their share in total employment creation is massive (81.6%) as well as their contribution to the EU-GDP (up to 60%).

Alexa is spending what must be a sizable budget for them on bespoke mobile app development and offline advertising for the resulting app. That strikes me as a particularly expensive and ineffective way of promoting themselves. Despite ongoing globalization, massive growth in online channels, and widespread adoption of the internet, Alexa is forced to go offline to address their audience: Berlin locals. And just like other locals, they are finding it difficult.

The reason for this is that the existing online channels through which the likes of Alexa could promote themselves, or through which people could discover Alexa’s mobile application and other content, lack a local focus. Where do people living near the Alexa mall go to learn about what is happening around them? There’s no such thing.

People in the industry have been talking about location based services and associated revenue streams for ages. But one glance at the advertisement above makes it very clear that despite this, local is still very much an offline business for most of the locals. This applies to commerce but also to other things. What is happening around my house, on my street, in my neighborhood and in my city? Who is writing about my area and what are they saying about it? What events are on and what cool historical facts can I find out about my area? The online answer today involves search engines and a lot of hard work filtering through blogs, wikipedia, event sites, social media, location based services, etc. Because that is so impractical, nobody bothers and consequently Alexa has to spend big money on subway advertising just to tell people that there is an app that they are very excited about.

At Localstream we want to change this and enable locals to go online and engage with each other locally: to share news, knowledge, and other information about their area. Through the Localstream platform, we can filter content by location and provide a view of the available content based on where it is about, as opposed to what it is about (search engines) or whom it is about (social networks).

Localstream de-globalizes the internet. The internet is full of location relevant information ranging from venue specific applications such as the Alexa app above, local news about the area, historical facts, events, etc. However, the existing channels for this content rely on people stumbling upon this content in search engines or social media. With Localstream, you can stumble upon it by location.

At Mediahackday we specialized our concept for news and tried to turn the back catalogs of news organizations such as the Guardian and Axel Springer into a location browsable channel. While we definitely still have some challenges with respect to our ability to tag the content correctly and rank accordingly, the raw value of this was immediately obvious when we started browsing the news content about Berlin.

Living in Berlin, I of course entered my street name (Bergstrasse) as a search criterion:

Screenshot of Localstream search results for Bergstrasse

As you can see here, Localstream found some news content that isn’t about my street but does mention nearby streets and venues, for example Tucholskystrasse and Ackerstrasse, which are both near my street (first hit). So, despite the fact that none of these articles were marked up with coordinates, Localstream is able to recognize the street names and deduce that articles mentioning nearby streets are relevant to my location.

Bearing in mind that we hadn’t seen the content before and that our location graph, ranking, and location tagging are very much works in progress, this is a pretty good result. We were able to ingest content that lacked any structured location information and turn it into a location browsable news application in under 24 hours. We believe we are just scratching the surface here in terms of what is possible.

Leaving Nokia

Seven years ago, I joined Nokia Research Center in Helsinki, Finland to pick up my research career again after a few years of working as a software engineer. Since then, I’ve been working on many different things but all somehow related to location-based services. Three and a half years ago I moved to Berlin to continue in what has recently been re-labeled as Nokia’s Location & Commerce unit where I continued to work on location-based services but this time in a more technical role. I’ve had loads of fun building some of the core server components in Nokia’s places and search products; I’ve had the pleasure to work with a lot of very smart and talented people with a lot of different backgrounds; and I got to work with all sorts of interesting technologies and server-side infrastructure.

The last seven years were fun, very intense, and I learned a lot. I joined Nokia at the peak of its success. And while I have witnessed the steady erosion of market share and related changes over the past few years, I consider myself lucky to have been in Berlin where despite the relentless re-organizations and re-structurings elsewhere in Nokia, things have stayed positive and optimistic. Today’s Location & Commerce unit is in a great shape for the future and has some really solid and exciting products and services. I am genuinely proud to have been a part of that.

But now the time has come for me to move on and seek new challenges and I will leave Nokia at the end of November. I’ll still be in Berlin, which has a very lively and interesting start-up scene, and I will be looking there and in other places for new exciting things for me to work on next.

In the past few days, I’ve been busy polishing my CV and LinkedIn profile. If you know me, feel free to connect there; and if you don’t, contact me via LinkedIn or one of the other ways listed on my contact page.

Photos Nov 2008 - Now

I finally found some time to upload some photos.

I went to Berlin in November to apply for the job I now have. Then I spent Christmas in France. After that, I moved into a temporary flat in Berlin on February 1st. On my first visit back in Finland, I visited a friend in Espoo who lives close to the sea, which was frozen. Finally I took some nice photos of Berlin in my first few weeks here.

The nicest of these is the view I had from my temporary flat:

View from temporary apartment

I no longer live there and photos of my new place are coming soon. I will probably take a few at my upcoming housewarming party: Friday the 17th from 21:00; feel free to drop by if you are nearby.

Time for a little update

Hmm, it’s been more than two months since I last posted. Time for an update. A lot has happened since January.

So,

  • I moved out of Finland as planned.
  • I stayed in a temporary apartment for a month. Central-home is the company managing the facility where I lived (on Habersaathstrasse 24) and if you’re looking for temporary housing in Berlin, look no further.
  • I managed to find a nice long-term apartment in Berlin Mitte, on Bergstrasse, which is more or less within walking distance of tourist attractions like Alexanderplatz, Hackescher Markt, Friedrichstrasse, and of course the Brandenburger Tor.
  • I re-acquainted myself with Java, Java development, and lately also release management. Fun days of hacking, but the normal Nokia routine of meetings creeping into my calendar is sadly kicking in.
  • I learned tons of new stuff.
  • Unfortunately, German is not yet one of those things. My linguistic skills are as pathetic as ever and English remains the only foreign language I have ever managed to master more or less properly. On paper, German should be dead easy since I can get by mumbling in my native language and people can still figure out what I want. In practice, I can understand it if spoken slowly (and clearly); speaking back is challenging.
  • I’m working on it though, once a week, in a beginners class: relearning stuff that three years of trying to cram German grammar into my head in high school did not accomplish.

Moving is tedious and tiresome, but the end result is a genuine improvement in life. I absolutely love Berlin and am looking forward to an early spring. I was in a telco with some Finnish people today discussing the weather. Them: so, how’s Berlin, any snow there still? Me: no, about 20 degrees outside right now :-). Nice to have spring start at the normal time again. Not to mention the more sane distribution of daylight and darkness throughout the year.

A shitload of updates is overdue. For several months already. I have a ton of photos to upload. WordPress needs upgrading. And some technical stuff might need some blogging about as well. Then there are still some unfinished papers in the pipeline. So, I’ll be back with more. Some day.

Nearly leaving Helsinki

I realize I haven’t posted to my blog for quite a while. Part of the reason is that I have been busy organizing my move to Berlin. Another part is that the hosting solution I use for this blog has some scalability problems that I seem to run into whenever I try to do anything; I’ve given up on posting several times. No time to resolve it right now, but I will after my move to Berlin (i.e. adios suckers: I’m going to find more competent hosting). For the same reason, I have been neglecting photos.jillesvangurp.com. There are some Berlin photos waiting to be uploaded (from my job interview in December) and of course Christmas photos from my parents’ place in France (again).

Speaking of Berlin, I’m moving there next Sunday already. My plan is to move into a nice furnished apartment on Habersaathstrasse, courtesy of a company specialized in this sort of thing: central-home.de. They were one of a few places recommended by a new colleague (thanks!) who also moved to Berlin recently. The place seems nice enough and the killer feature is that it is 200 meters from work, which should help cut down on commute times. I’ll use it as a base to find a nice new apartment to move my stuff into. I’ll have to fly back to Helsinki a couple of times in February for work and to get my things moved at the end of the month.

Aside from that, it seems my stay in Helsinki is coming to an end. I had imagined my last week to be a little more fun, but as it is, I’m in bed with a pretty bad flu. I’ve been flu-free for nearly a year but last Friday it came back with a vengeance: fever, sore throat, total absence of desire for anything resembling food, tiredness, etc. In short, flu sucks. In any case, by Friday I should have recovered enough for a little get-together with some colleagues in the One Pint Pub in Ruoholahti (16:00). Feel welcome to drop by if I somehow managed to not invite you.

Moving to Berlin

A bit more than a month ago, I posted a little something about the reorganization at Nokia Research Center where I work and announced my availability on the job market. This was a bit of a shock, of course, and it has been a hectic few weeks, but the end result is really nice. For me at least. Unfortunately, some of my colleagues are not so lucky and are now at risk of losing their jobs.

In any case, a few weeks ago I visited Nokia Gate5 in Berlin for a job interview. Gate5 is a navigation software company that Nokia bought in 2006. Their software powers what is now known as OVI Maps, and whereas the whole industry is shrinking, they are growing like crazy and rolling out one cool product after another. Today, they sent me a proposal for a contract. Barring contractual details, this means that I will be based in Berlin from February. This is something I’ve known for a few weeks, but having all the necessary approvals from Nokia management and a draft contract is about as good as it gets in terms of certainty. So, since I know a few people are curious what I’ll be up to next year, I decided on this little update.

I can’t say too much about what I will do there except that it more or less matches my Java server side interests and experience perfectly. This means back to being a good old Java hacker which is just fine with me and something I’ve not had enough time to focus on lately (much to my annoyance). Just today I submitted an article and I have one or two other things to finish off in January. After that, my research will be put on hold for a while. That’s fine with me as well. After returning to a research career three years ago, I’ve done a few nice papers but to be honest, I’m not enjoying this as much as I used to.

Of course Berlin is a great place to move to. I’ve been there now twice. I remember thinking the first time in 2005 that “hmm, I wouldn’t mind living here” and when I was there three weeks ago I had the same feeling again. It’s a big city with a rich history, nice culture and lots of stuff to see and do. I also learned that this is one of the few cities in Europe where life is actually cheap. Apartment prices, food, drink and all the essentials in life are really affordable there and of excellent quality too.

Anyway, I’ll be off to France the next week visiting my parents.

Happy holidays

Photos Stockholm and Seurasaari

I’ve uploaded two batches of photos to my photo site: photos.jillesvangurp.com from a recent trip to Stockholm and some photos I took from Seurasaari just behind my house.

Here’s a few samples:

My neighbourhood seen from Seurasaari.

This beautiful picture was taken in August when my friend Mark, who was over for a weekend, and I visited the open air museum on Seurasaari, which is behind my house on an island connected to the mainland by a bridge. Getting to where I took this photo is a very nice 5 km walk along the coast. In the distance you can see Hietaniemi beach and a few rows of buildings. One of the rooftops behind the electricity pole on the right is where I live.

View of Stockholm

The panorama photo above was taken from the Katarinahissen, shown in the photo below, during a visit to Stockholm where I spent a weekend with my father. The Katarinahissen connects lower Södermalm to the part of the city situated on top of the 20-meter hill. Sadly the elevator was out of order, so we took the stairs. My father and I have been on several trips in the past few years; we went to Berlin and London in 2005.

Gondolen