The Snowden effect

A lot has been written about the whole Snowden case and the NSA practices for spying on us that it exposed. You could argue all sorts of things about this case. A popular point that people keep making is that, if you know anything about security and the NSA, none of this should come as a surprise. That is indeed valid: a lot of crocodile tears are being shed and there is a lot of mock outrage, especially by governments trying to distance themselves from the whole affair. Hypocrisy is rampant.

But it is an interesting point, and it actually completely destroys the US case against Snowden. The central claim in that case is that Snowden leaked crucial information that will cause 'enemies' to adapt their behavior. In fact, that happened years ago. So yes, people knew and have long since adapted their behavior. Whether it is terrorists, Chinese activists, or the Russian mafia: they all learned the hard way, a long time ago, to use technology to evade detection and keep their communication private.

This brings me to the core point of this post: so should all of us. Snowden has clearly demonstrated that any trail you leave on the internet is subject to archiving and analysis, and may be used against you outside the usual checks and balances provided by the law of wherever you happen to live. Some of us already knew this, others dismissed those people as conspiracy theorists, and now we all know it is about as bad as the conspiracy theorists were saying all along.

This brings me to another point some people have been making: the "I have nothing to hide" argument. The reasoning is that if you are a law-abiding citizen, there is little of interest to discover in your online behavioral patterns, so what's the harm? The fallacy in this argument is that it depends entirely on those who do the collecting and analyzing to respect your rights and generally mean well. This is not the case. Dictators, Nigerian scammers, terrorists, criminals, and indeed the NSA all largely have the same tools at their disposal for accessing your data and a wide range of motives for doing so. Your current government may be well behaved, but there are no guarantees about the one that comes after. Times change. Besides, you don't control who does the collecting. If you think the NSA is the only institution currently engaged in data collection, you are a fool. The Chinese invented and perfected this game a long time ago. Most authoritarian regimes actively spy on their own citizens as well as foreign nationals using whatever technology is available to them. You may be comfortable with the NSA tracking your communications, but what about the KGB, the Chinese secret service, or the Iranian government? You'd be a fool to assume you are safe from them.

So, what can be done about all this? You could argue that we should all turn into paranoid conspiracy theorists and behave accordingly, adopting all sorts of oddball technology ranging from tin foil hats to advanced encryption. This is neither feasible nor practical, since tin foil hats are kind of ineffective and encryption is notoriously hard to get right, even for people who supposedly know what they are doing. What's much more practical is to scrutinize internet services for their track record in protecting your privacy, applying security best practices, and generally doing the right things. One of the first things that happened after the Snowden story broke is that several major internet services started lobbying for permission to be more transparent about what they had been forced to hand over. The reason: they don't want to be caught lying to their customers about what they are doing and what they are not doing.

That is actually interesting. These companies are very worried about alienating their user base and clearly feel that they have an interest in explaining to their users how they go about protecting their privacy. That's a start. The solution is to take this to the next level: avoid dealing with companies and services that are known to do the wrong things, and instead flock to companies that do the right thing. The rest is a matter of Darwinism: bad companies will be exposed and will adapt or perish.

The Snowden effect will be that doing so becomes a lot easier, because a large crowd of people will be analyzing what different companies do with respect to your privacy and sharing that knowledge with others. That means that where some companies have been able to get away with sloppy practices and mildly aggressive tracking (e.g. Facebook and Google), it will be a lot harder for them to continue doing this without risking bad PR.

A second effect will be that the same scrutiny will be applied to politicians. Having first called Snowden a traitor, politicians now show quite widespread support for actually taking some political action to undo some of the legislation that allowed the NSA to do its thing in the first place. Never mind the contradiction of denying the man is a whistleblower and then suddenly backing measures that are basically about addressing the issues he exposed. Flip-flopping like that is just business as usual for politicians. But whistleblower or not, he is already having a political effect. This will extend into elections, cause future scandals, and have political consequences for those who continue backing the wrong things.

The long term Snowden effect will be accountability. This is exactly what is needed.

Jruby & UTF-8

I just spent a day trying to force JRuby to do the right things with UTF-8 content. Throughout my career, I've dealt with UTF-8 issues on pretty much every project I've ever touched. It seems that world+dog just can't be trusted to do the right thing when it comes to this. If you ever see mangled content in your browser: somebody fucked it up.

Lately, I've been experiencing issues with both requests and responses in our Sinatra-based application on top of Ruby Rack. Despite setting headers correctly and hardcoding everything to be UTF-8 end to end, it still mangles properly encoded UTF-8 content by applying a system default encoding both on the way into and out of my system. Let's just say I've been cursing a lot in the past few days. In terms of WTFs per second, I was basically not being very nice.

Anyway, Java has a notion of a default encoding that is set (inconsistently) from the environment you start it in, and that you can only partially control with the file.encoding system property. On Windows and OS X this is not going to be UTF-8. Combine that with Ruby's attitude of being generally sloppy about encodings and just hoping for the best, and you have a recipe for trouble. Ruby strings just pick up whatever the default system encoding is. Neither Rack nor Sinatra tries to align that in any way with the HTTP headers they deal with.

When you start JRuby using the provided jruby script, it actually sets the file.encoding system property to UTF-8. However, this does not affect java.nio.charset.Charset.defaultCharset(), which is used a lot by JRuby. IMHO that is a bug, and a serious one. Let me spell this out: any code that relies on java.nio.charset.Charset.defaultCharset() is broken. Unless you explicitly want to support legacy content that is not in UTF-8, in which case you should be very explicit about exactly which of the gazillion encodings you want; the chances of that aligning with the defaults are slim.
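A quick way to see what your setup is actually doing is to print both the file.encoding property and the default charset from JRuby itself. This is just a diagnostic sketch; the output depends entirely on your platform, launcher, and locale:

# what the file.encoding property says vs. what the JVM actually uses as its default charset
jruby -rjava -e 'puts java.lang.System.getProperty("file.encoding")'
jruby -rjava -e 'puts java.nio.charset.Charset.defaultCharset.to_s'

# JVM options can be passed through JRuby with the -J prefix if you want to experiment
jruby -J-Dfile.encoding=UTF-8 -rjava -e 'puts java.nio.charset.Charset.defaultCharset.to_s'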

This broke in a particularly weird way for me: it was working fine on OS X (with its broken system defaults), but it broke on our server, which runs Ubuntu 12.04 LTS. If you've ever used Ubuntu, you may have noticed that terminals use UTF-8 by default. This is great; all operating systems should do this. There's one problem though: it's not actually system wide, and my mistake was assuming that it was. It's fine when you log in and use a terminal. However, when starting JRuby from an init.d script with a sudo, the encoding reverts back to some shit ANSI default from the nineteen sixties that is 100% guaranteed to be inappropriate for any modern web server. This default is then passed on to the JVM, which causes JRuby with Rack to do the wrong things.

The fix: add this to your startup script:

export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8

This forces the encoding to be correct in the bash script and should hopefully trickle down through the broken pile of software that just assumes the defaults make sense.
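With those exports in place (this assumes the en_US.UTF-8 locale is actually generated on the machine), the same one-liner from above should now report UTF-8:

locale
jruby -rjava -e 'puts java.nio.charset.Charset.defaultCharset.to_s'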

Jruby and Java at Localstream

Update: I've uploaded a skeleton project with the stuff from this post to GitHub. The presentation I gave on this at Berlin Startup Culture on May 21st can be found here.

The server-side code in the Localstream platform is a mix of JRuby and Java. Over the past few months, I've gained a lot of experience using the two together and making the most of the idioms and patterns of both worlds.

Ruby purists might wonder why you'd want to use Java at all. Likewise, Java purists might wonder why you'd waste your time on JRuby instead of more hipster-friendly languages such as Scala, Clojure, or Kotlin. In this article I want to steer clear of that particular topic and instead focus on more productive things such as what we use for deployment, dependency management, dependency injection, configuration, and logging. It is also an opportunity to introduce two of my new GitHub projects.

The Java ecosystem provides a lot of good, very well supported technology. This includes the JVM itself but also libraries such as Google's Guava, miscellaneous Apache frameworks such as HttpClient, commons-lang, commons-io, and commons-compress, the Spring framework, ICU4J, and many others. Equivalents exist for Ruby, but mostly those equivalents leave a lot to be desired in terms of features, performance, design, etc. It didn't take me long to conclude that a lot of the Ruby stuff out there is substandard and not up to my expectations. That's why I use JRuby instead of Ruby: it allows me to get the best of both worlds. The value of Ruby is its simplicity and the language itself. The value of Java is access to an enormous amount of good software. JRuby gives me both.
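To make that concrete, here is a minimal sketch of what calling a Java library from JRuby looks like from a shell. The jar path below is made up; the same pattern works for Guava, the Apache commons libraries, or anything else you can put on the classpath:

# requires a commons-lang3 jar somewhere on disk; the path here is just an example
jruby -rjava -e '
  require "./lib/commons-lang3-3.1.jar"   # jars load like any other Ruby library
  puts org.apache.commons.lang3.StringUtils.capitalize("jruby calling into commons-lang")
'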

Continue reading “Jruby and Java at Localstream”

Mobile Linux

A lot has been written about mobile and embedded device platforms lately (a.k.a. 'phone' platforms). Usually articles are about the usual incumbent platforms: Android, iOS, and Windows Phone, plus the handful of alternatives from RIM and others. Most of the debate seems to revolve around the question of whether iOS will crush Android or the other way around. It's kind of a boring debate that generally involves a lot of fanboys from either camp highlighting this or that feature, the beautiful design, and other stuff.

Recently this three-way battle (or two-way battle really, depending on your views regarding Windows Phone) has gotten a lot more interesting. However, in my view this 'war' was actually concluded nearly a decade ago, before it even started, and mobile Linux won in a very unambiguous way. What is really interesting is how this is changing the market right now.

Continue reading “Mobile Linux”

Using Elastic Search for geo-spatial search

Over the past few months we have been quietly working on the localstre.am platform. As I have mentioned in a previous post, we are using Elasticsearch as a key-value store, and that's working pretty nicely for us. In addition to that, we are also using it as a geospatial search engine.

Localstre.am is going to be all about local, and that means geospatial data, and lots of it. That's not just POIs (points of interest) but also streets, cities, and areas of interest. In geospatial terms that means shapes: points, paths, and polygons. Doing geospatial search means searching through documents that have geospatial data associated with them, using a query that also contains geospatial data: given a shape, find every document with a shape that overlaps or intersects it.
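In Elasticsearch terms that maps to the geo_shape mapping type and the geo_shape query. As a rough sketch of what that looks like (the index, type, and field names below are made up, and the exact syntax may differ between Elasticsearch versions):

# map a field as geo_shape (index/type/field names are just examples)
curl -XPUT 'http://localhost:9200/places' -d '{
  "mappings": {
    "poi": {
      "properties": {
        "name":     { "type": "string" },
        "geometry": { "type": "geo_shape" }
      }
    }
  }
}'

# index a document with a point geometry (GeoJSON style, longitude first)
curl -XPUT 'http://localhost:9200/places/poi/1' -d '{
  "name": "Brandenburger Tor",
  "geometry": { "type": "point", "coordinates": [13.3777, 52.5163] }
}'

# find every document whose geometry intersects a bounding box around central Berlin
curl -XPOST 'http://localhost:9200/places/poi/_search' -d '{
  "query": {
    "geo_shape": {
      "geometry": {
        "shape": {
          "type": "envelope",
          "coordinates": [[13.30, 52.55], [13.45, 52.48]]
        }
      }
    }
  }
}'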

Since Elasticsearch is still very new and rapidly evolving (especially the geospatial functionality), I had some worries about whether it would work as advertised. So, after months of coding, it was about time to see whether it could actually take a decent data set and deliver, instead of falling over and dying in a horrible way.

Continue reading “Using Elastic Search for geo-spatial search”

Maven and my GitHub projects

I have several Java projects on GitHub. Generally these projects require Maven to build, and some of them depend on each other.

Deploying to Maven Central would be the nice way to provide binaries to my users. However, the process that Sonatype imposes for pushing out binaries via their free repository is somewhat tedious (it involves Jira fiddling and PGP signing), hard to automate, and given the time it takes and the frequency with which I want to push out binaries, it's just not worth my time.

Basically I want to code, check that it builds, commit, release, deploy and move on with the minimum amount of fuss.

So, I set up my own Maven repository. This is good enough for what we do at Localstre.am. If you wish to use some of my GitHub projects, you can use this repository by adding the following snippet to your Maven settings.xml.

<!-- this bit goes into the profiles section -->
<profile>
    <id>jillesvangurp</id>
    <repositories>
        <repository>
            <id>release.repo</id>
            <url>http://repo.jillesvangurp.com/releases/</url>
            <releases>
                <enabled>true</enabled>
            </releases>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>
        <repository>
            <id>snapshot.repo</id>
            <url>http://repo.jillesvangurp.com/snapshots/</url>
            <releases>
                <enabled>false</enabled>
            </releases>
            <snapshots>
                <enabled>true</enabled>
            </snapshots>
        </repository>
    </repositories>
</profile>
....
<activeProfiles>
    <activeProfile>sonatype</activeProfile>
    <activeProfile>jillesvangurp</activeProfile>
</activeProfiles>

Disclaimer: this repository is provided as is. The snapshot part in particular is basically there to support my own builds. I may clean it out at any time, and there is no guarantee that snapshots of my GitHub projects are there or up to date. Likewise, the releases repository is there to support my own builds. I will likely only keep recent releases there, and you shouldn't use older releases of my projects in any case. Binaries in either Maven repository may be broken and do untold amounts of damage (which the LICENSE tells you is your problem).

If this worries you, you can of course always build from source. Simply check out the GitHub project (and any dependencies) and mvn clean install it, or deploy it to your own Maven repository.
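For example, building one of the projects from source and installing it into your local repository is just the following (this assumes the project lives under my GitHub account; substitute whichever project you actually need):

git clone https://github.com/jillesvangurp/jsonj.git
cd jsonj
mvn clean install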

If not, using my repository may be convenient for you and I generally deploy source and javadoc jars as well.

If any of my projects becomes popular enough to warrant me putting in some more energy, I can deploy to Maven Central. I've done this for the jsonj project up to version 0.7, for example. If this would help you, simply drop me an email and I'll look into it.

I’ll try to keep this post up to date if things should change.

Puppet

I recently started preparing to deploy our localstre.am codebase to an actual server. That means I'm currently having a lot of fun picking up some new skills and playing system administrator (while being aware of my shortcomings there). World plus dog seems to recommend using either Chef or Puppet for this, and since I know a few good people who are into Puppet in a big way, and since I've seen it used at Nokia, I chose the latter.

After getting things to a usable state, I have a few observations I wanted to share. They come from my background of having engineered systems for some time and having developed a general gut feeling about what is acceptable and what is not.

  • The Puppet syntax is weird. It's a so-called Ruby DSL that tries to look a bit like JSON and only partially succeeds, so you end up with a strange mix of : and => depending on the context. I might be nitpicking here, but I don't think this is a particularly well-designed DSL. It feels more like a collection of evolved conventions for naming directories and files that happen to be backed by Ruby. The conventions, naming, etc. are mostly nonsensical. For example, the Puppet notion of a class is misleading. It's not a class: it doesn't have state, you don't instantiate it, and no objects are involved. It's closer to a Ruby module. But in Puppet, a module is actually a directory of Puppet stuff (so more like a package). Ruby leaks through in lots of places anyway, so why not stick with the conventions of that language? For example, by using gems instead of coming up with your own convoluted notion of a package (aka a module in Puppet). It feels improvised and cobbled together, almost like the people who built this had no clue what they were doing and changed their minds several times. Apologies for being harsh if that offended you BTW ;-).
  • The default setup of Puppet (and Chef) assumes a lot of middleware that doesn't make any sense whatsoever for most smaller deployments (or big deployments IMNSHO). Seriously, I don't want a message broker anywhere near my servers any time soon, especially not ActiveMQ. The so-called masterless (Puppet) or solo (Chef) setups are actually much more appropriate for most people (see the sketch after this list). They are simpler and have fewer moving parts. That's a good thing when it comes to deployments.
  • It tries to be declarative. This is mostly a good thing, but sometimes it is just nice to have an implicit order of things following from the order in which you specify them. Puppet forces you to be explicit about ordering and thus ends up being very verbose about it. Most of that verbosity is actually quite pointless. Sometimes A really should just come before B because I specified them in that order in one file.
  • It's actually quite verbose compared to the equivalent bash script when it comes to basic stuff like starting a service, copying a file from A to B, etc. Sometimes a "cp foo bar; chmod 644 bar" just does the trick. It kind of stinks that you end up with five-line blobs for doing simple things like that. Why make that so tedious?
  • Like Maven and Ant in the Java world, it tries to be platform agnostic but only partially succeeds. A lot of platform dependencies creep in, and Puppet templates are generally not very portable: package names, file locations, service names, etc. end up being platform specific anyway.
  • Speaking of which, Puppet is more like Ant than like Maven. Like Ant, all Puppet does is provide the means to do different things. It doesn't provide a notion of a sensible default way of doing things that you then customize, which is what Maven does. Not that I'm a big fan of Maven, but with Puppet you basically have to babysit the deployment and worry about all the silly little details that are (or should be) bog standard between deployments: creating users, setting up and managing ssh keys, ensuring processes run with the appropriate restrictions, etc. This is a lot of work, and like a lot of IT stuff it feels repetitive and thus is a good candidate for automation. Wait… wasn't Puppet supposed to be that solution? The Puppet module community provides some reusable stuff, but it's just bits and pieces really and not nearly enough for a sensible production-ready setup for even the simplest of applications. It doesn't look like I could get much value out of that community.
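For what it's worth, the masterless setup mentioned above boils down to running puppet apply directly on the node against local manifests and modules, something along these lines (the paths are just an example):

# apply a local manifest with local modules; no puppet master, certificates, or message broker involved
puppet apply --modulepath=/etc/puppet/modules /etc/puppet/manifests/site.pp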

So, I think Puppet at this stage is a bit of a mixed bag, and I still have to do a lot of work to actually produce a production-ready system. Much more work than I think is justified by the simplicity of the real-world setups I've seen in the wild. Running a Ruby or Java application is mostly not exactly rocket science. So why exactly does this stuff continue to be so hard and tedious despite a multi-billion-dollar industry trying to fix it for the last 20 years or so?

I don't think Puppet is the final solution in devops automation. It is simply too hard to do things with Puppet and way too easy to get them wrong. There's too much choice, a lack of sensible policy, and way too many pitfalls. The fact that it is an improvement at all merely indicates how shit things used to be.

At this point, Puppet feels more like a tool for finishing, in arbitrary ways, the job that Linux distributors apparently couldn't be bothered to do, than like a tool for producing reliable and reproducible production-quality systems. I could really use a tool that does the latter without the drama and attitude. What I need is a sensible out-of-the-box experience for the following use case: here's a war file, deploy it on those servers.

Anyway, I started puppetizing our system last week and have gotten it to the point where I can boot a bunch of Vagrant virtual machines with the latest Ubuntu LTS and have them run localstre.am in a clustered setup. Not bad for a week of tinkering, but I'm pretty sure I could have gotten to that point without Puppet as well (possibly even sooner). And I still have a lot of work to do to set up a wide range of things that I would have hoped were solved problems: logging, backups, firewalls, chroot, load balancing a bog-standard stateless HTTP server, etc. Most of this falls in the category of non-value-adding stuff that somebody simply has to do. Given that we are a two-person company and I'm the CTO/server guy, that would be me.

I of course have the benefit of hindsight from my career at Nokia, where for a few years I watched the company waste/invest tens of millions on deploying (mostly) simple, bog-standard Java applications to servers. It seems simple things like "given a war file, deploy the damn thing to a bunch of machines" get very complicated when you grow the number of people involved. I really want to avoid needing a forty-person ops team to do stupid shit like that.

So, I cut some corners. My time is limited, and my bash skills are adequate enough that I basically only use Puppet to get the OS into a usable enough state that I can hand off to a bash script that does the actual work of downloading, untarring, chmodding, etc. needed to get our application running. Not very Puppet-like, but at least it gets the job done in 40 lines of code or so without intruding too much on my time. In those 40 lines, I install the latest Sun JDK (a tarball), the latest JRuby (another tarball), our code base, and the two scripts that start Elasticsearch and our Jetty/Mizuno-based app server.
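To give an idea of the shape of that hand-off script, here is a heavily trimmed sketch; every URL, path, and file name in it is a placeholder rather than what we actually run:

#!/bin/bash
# minimal deploy sketch: fetch a tarball, unpack it, fix ownership and permissions, start the services
set -e

APP_USER=webapp                                             # placeholder user
APP_DIR=/opt/localstream                                    # placeholder install location
TARBALL_URL=http://example.com/localstream-latest.tar.gz    # placeholder download location

mkdir -p "$APP_DIR"
curl -sSL "$TARBALL_URL" -o /tmp/localstream.tar.gz
tar -xzf /tmp/localstream.tar.gz -C "$APP_DIR" --strip-components=1

chown -R "$APP_USER:$APP_USER" "$APP_DIR"
chmod 755 "$APP_DIR"/bin/*.sh

# force a UTF-8 locale so the JVM picks a sane default encoding (see the JRuby & UTF-8 post above)
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8

sudo -u "$APP_USER" "$APP_DIR/bin/start-elasticsearch.sh"   # placeholder start scripts
sudo -u "$APP_USER" "$APP_DIR/bin/start-appserver.sh"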

What would actually be useful is reusable machine templates for bog-standard things like PHP- and Ruby-capable servers, Java Tomcat servers, Apache load balancers, etc., with sensible hardened configurations, logging, monitoring, and so on. The key benefit would be inheriting from a sensible setup and only changing the bits that actually need changing. It seems that is too much to ask for at this point, and consequently hundreds of thousands of system administrators (or the more hipster "devops" if you are so inclined) continue to be busy creating endless minor variations of the same systems.