OpenID, the identity landscape, and social networks

I’m still getting used to no longer being in Nokia Research Center. One of my disappointments of being in NRC as a vocal proponent of OpenID, social networks, etc. was that, despite lots of discussion on these topics, not much ever happened: I never got much room to work on them, nor did I convince many people of my opinions. I have one publication that is due out whenever the magazine involved gets around to approving and printing the article. But that’s it.

So, I take great pleasure in observing how things have been evolving lately and finding that I’ve been pushing the right topics all along. Earlier this week, Facebook became a relying party for OpenID. Outside the OpenID community and regular TechCrunch readers, this seems not to have registered as major news. That’s surprising, since just about anybody I discussed this topic with in the past few years (you know who you are) insisted that “no way will a major network like Facebook ever use OpenID”. If you were one of those people: admit right now that you were wrong.

It seems to me that this is a result of the fact that the social networking landscape is maturing. As part of this maturation process, several open standards are emerging. Identity and authentication are very important topics here, and the consensus is increasingly that no single company is going to own all 6-7 billion identities on this planet. So naturally, any company with the ambition to separate 6-7 billion individuals from their money for some product or service will need to work with multiple identity providers.

Working with multiple identity providers requires a standard, and that standard is OpenID. It has no competition; there is no alternative. There are plenty of proprietary APIs that only work with limited sets of identity providers, but none like OpenID that can work with all of them.

Similarly, now that major identity providers like Google and Facebook are stuck sharing a few hundred million users between them, they are shifting their attention to somehow involving all those users who didn’t sign up with them. Pretty much all of them are OpenID providers already. Facebook just took the obvious next step of becoming a relying party as well. The economics are mind-bogglingly simple: Facebook doesn’t make money from verifying people’s identity, but it does make money from people using its services. Becoming an OpenID relying party means the group of people who can access those services just grew to the entire internet population. Why wouldn’t they want that? Of course this doesn’t mean that world + dog will now become Facebook users, but it does mean that one important obstacle has just disappeared.

BTW, Facebook’s current implementation is not very intuitive. I’ve been able to hook up MyOpenID to my Facebook account, but I haven’t actually found a login page where I can log in with my OpenID yet. It seems this is still a work in progress.

Anyway, this concludes my morning blogging session. Haven’t blogged this much in months. Strange how the prospect of not having to work today is energizing me 🙂

Wolfram Alpha

A few years ago, a good friend gave me a nice little present: five kilos of dead tree in the form of Stephen Wolfram’s “A New Kind of Science”. I never read it cover to cover and merely scanned a few pages with lots of pretty pictures before deciding that this wasn’t really my cup of tea. I also read a bit of the criticism of the book from the scientific community. I’m way out of my league there, so no comments from me except a few observations:

  • The presentation of the book is rather pompous and arrogant: the author tries to convince readers that they hold the most important piece of science ever produced in their hands.
  • This is what set off most of the criticism. Apparently, the author fails both to credit related work and to properly back up some of his crucial claims with proper evidence.
  • Apparently there are quite a few insufficiently substantiated claims, which affects the credibility of the book as a whole and of the author’s claims.
  • The author took an ivory-tower approach to writing the book: he quite literally dedicated over a decade of his life to it, during which he did not seek out much criticism from his peers.
  • So, the book is controversial and may either turn out to be the new relativity theory (relatively speaking) or a genuine dud. I’m out of my league deciding either way.

Anyway, the same Stephen Wolfram has for years been providing the #1 mathematical software IDE: Mathematica, one of the most popular software tools for anyone involved with mathematics. I’m not a mathematician and haven’t touched such tools in over ten years (I dabbled a bit with linear algebra in college), but as far as I know, his company and product have a pretty solid reputation.

Now the same person has brought the approach he applied to his book, and his solid reputation as the owner of Mathematica, to the wonderful world of Web 2.0. That is something I know a thing or two about. Given the above, I was initially quite sceptical when the first, pretty wild rumors around wolframalpha started circulating. However, some hands-on experience has just changed my mind. So here’s my verdict:

This stuff is great & revolutionary!

No, it’s not Google. It’s not Wikipedia either. It’s not the Semantic Web either. Instead, it’s a knowledge reasoning engine hooked up to some authoritative data sets. So it’s not crawling the web, it’s not user-editable, and it doesn’t rely on the traditional Semantic Web standards from e.g. the W3C (though it very likely uses similar technology).

This is the breakthrough that was needed. The Semantic Web community seems to be stuck in an endless loop, pondering pointless standards, query formats, and graph representations, and generally rehashing computer science topics that have been studied for 40 years now, without producing many viable business models or products. Wikipedia is nice but very chaotic and unstructured as well. The marriage of the Semantic Web and Wikipedia is obvious, has been tried countless times, and has so far not produced interesting results. Google is very good at searching through the chaos that is the current web but can be absolutely unhelpful with simple, fact-based questions. Most fact-based questions in Google return a Wikipedia article as one of the links. Useful, but it doesn’t directly answer the question.

This is exactly the gap that wolframalpha fills. There are many scientists and startups with the same ambition, but wolframalpha.com got to market first with a usable product that can answer a broad range of factual questions with knowledge imported into its system from trustworthy sources. It works beautifully for the facts and knowledge it has, and it allows users to do two things:

  • Find answers to pretty detailed queries from trustworthy sources. Neither Wikipedia nor Google can do this; at best they can point you at a source that has the answer and leave it up to you to judge the trustworthiness of that source.
  • Fact surfing! Just like surfing from one topic to the next on Wikipedia is a fun activity, I predict that drilling down into facts on wolframalpha will be equally fun and useful.

So what’s next? Obviously, wolframalpha.com will have competition. However, their core asset seems to be their reasoning engine combined with a huge fact database that is to date unrivaled. Improvements in both areas will solidify their position as market leader. I predict that several owners of large bodies of authoritative information will be itching to be a part of this, and that partnership deals will be announced. Wolframalpha could easily evolve into a crucial tool for knowledge workers. So crucial, even, that knowledge workers might be willing to pay for access to certain information.

Some more predictions:

  • Several other startups will soon enter this space with competing products. There must be dozens of companies working on similar or related products already. Maybe all they needed was somebody taking the first step.
  • Google likely has people working on such technologies; it will either launch or buy products in this space in the next two years.
  • Google’s main competitors, Yahoo and MS, have both been investing heavily in search technology and experience. They too will want a piece of this market.
  • With so much money floating around in this market, wolframalpha and similar companies should have no shortage of venture capital, despite the current crisis. Alternatively, wolframalpha might end up being bought by Google or MS.
  • If not bought up or outcompeted (both of which I consider likely), wolframalpha will be the next Google.

Java Profiling

One of the fun aspects of being in a programmer job is the constant stream of little technical problems that require digging into. This can sometimes be frustrating, but it’s pretty cool when you suddenly get it and make the problem go away. Anyway, since starting my new job in February, I’ve had lots of fun like this. Last week we had a bit of Java that was obviously out of line performance-wise. My initial go at the problem was to focus on the part that had been annoying me to begin with: the way XML parsing was handled. There are many ways to do XML parsing in Java; we use JAXB. JAXB is nice if you don’t have enough time to do the job properly with XPath, but the trade-off is that it can be slow and that there are a few gotchas, like the fact that creating marshallers and unmarshallers is way more expensive than actually using them. So when processing a shitload of XML files, you spend a lot of time creating and destroying marshallers, especially if you break the big XML files down into little blobs that are parsed individually. Some simple pooling using ThreadLocal (sketched below) improved things quite a bit, but the code was still slow in a way that I could not explain with just XML parsing. All helpful, but it still felt unreasonably slow in one particular class.
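
For the curious, the pooling boils down to something like the sketch below. This is illustrative rather than our actual code: MyDocument is a made-up stand-in for whatever JAXB-annotated class you’re unmarshalling. The JAXBContext is thread-safe but very expensive to create, so you build it exactly once; Unmarshaller instances are not thread-safe, so each thread hangs on to its own instead of creating and destroying one per document.

```java
import java.io.StringReader;

import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.annotation.XmlRootElement;

// Hypothetical document class; stands in for our real JAXB classes.
@XmlRootElement
class MyDocument {
}

public final class PooledUnmarshaller {

    // JAXBContext is thread-safe but very expensive to create: do it once.
    private static final JAXBContext CONTEXT;
    static {
        try {
            CONTEXT = JAXBContext.newInstance(MyDocument.class);
        } catch (JAXBException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Unmarshallers are not thread-safe, so give each thread its own
    // instead of creating (and throwing away) one per parsed blob.
    private static final ThreadLocal<Unmarshaller> UNMARSHALLER =
            new ThreadLocal<Unmarshaller>() {
                @Override
                protected Unmarshaller initialValue() {
                    try {
                        return CONTEXT.createUnmarshaller();
                    } catch (JAXBException e) {
                        throw new IllegalStateException(e);
                    }
                }
            };

    public static MyDocument parse(String xml) throws JAXBException {
        // Reuses this thread's pooled unmarshaller.
        return (MyDocument) UNMARSHALLER.get().unmarshal(new StringReader(xml));
    }

    private PooledUnmarshaller() {
    }
}
```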

So I spent two days setting up a profiler to measure what was going on. Two days? Shouldn’t this be easy? Yes, except there are a few gotchas.

  1. The Eclipse TPTP project has a nice profiler. Except it doesn’t work on Macs, or worse, on Macs with JDK 1.6. That’s really an Eclipse problem: the UI is tied to 1.5 because Apple stopped supporting the Cocoa integration in 1.6.
  2. So I fired up VMware, installed the latest Ubuntu 9.04 (nice), and spent several hours making that behave nicely (file sharing is broken and needs a patch). Sadly, no OpenGL eye candy in VMware.
  3. Then I installed Java, Eclipse, TPTP, and some other stuff.
  4. Only to find out that TPTP with JDK 1.6 is basically unusable. First, it comes with a native library compiled against a library that is no longer installed by default. Solution: install it.
  5. Then at every turn there’s some error about agent controllers. If you search for this, you will find plenty of advice telling you to use the right controller, but none whatsoever as to how you would go about doing so. Alternatively, people tell you to just not use JDK 1.6. I know, because I spent several hours on this before joining the gang of “TPTP just doesn’t work, use NetBeans for profiling”.
  6. So, still in Ubuntu, I installed NetBeans 6.5, imported my Eclipse projects (generated using Maven’s eclipse:eclipse), and to my surprise this actually worked fine (no errors, tests seemed to run).
  7. Great, so I right-clicked a test and chose “profile file”. Success! After some fiddling with the UI (quite nerdy and full of usability issues), I managed to get exactly what I wanted.
  8. Great! So I exited VMware to install NetBeans properly on my Mac. Figuring out how to run it with JDK 1.6 turned out to be easy.
  9. Since I had used VMware file sharing, all the project files were still there, so importing was easy.
  10. I fired up the profiler, and it had remembered the settings I last used in Linux. Cool.
  11. Then NetBeans crashed. Poof! Window gone.
  12. That took some more fiddling to fix. The release notes indeed mentioned two cases of profiler crashes that you can fix with some command-line options.
  13. After doing that, I finally managed to get down to analyzing what the hell was going on. It turned out that my little test was somehow triggering 4.5 million calls to String.replaceAll. WTF!
  14. The nice thing about inheriting code that has been around for some time is that you tend to ignore the parts that look ugly and don’t seem to be in need of your immediate attention. This was one of those parts.
  15. Using replaceAll is a huge code smell. Using it in a triple-nested for loop is insane.
  16. So, some more pooling, this time of the regular expression objects: Pattern.compile is expensive (see the sketch after this list).
  17. I re-ran the profiler and … problem gone. XML parsing is now the bottleneck, as it should be in code like this.
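
Since I promised a sketch: the fix amounts to something like this. The regex and class name are made up for illustration; the point is that String.replaceAll recompiles its regex on every single call, so inside a triple-nested loop the same pattern got compiled millions of times. Pattern objects are immutable and thread-safe, so compile once and share:

```java
import java.util.regex.Pattern;

// Illustrative only: the actual regex in our code is different.
public final class Normalizer {

    // Compiled once; Pattern is immutable and safe to share across threads.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Before: value.replaceAll("\\s+", " ") deep inside the loops,
    // which calls Pattern.compile under the hood every time.
    public static String normalize(String value) {
        return WHITESPACE.matcher(value).replaceAll(" ");
    }

    private Normalizer() {
    }
}
```

(Unlike Pattern, Matcher is not thread-safe, so if you want to reuse those too, that’s another job for ThreadLocal.)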

But shouldn’t this just be easy? It took me two days of running from one problem to the next just to get a profiler running. I had to deal with crashing virtual machines, missing libraries, cryptic error messages about Agent Controllers, and several unrelated issues. I hope somebody in the TPTP project reads this: your stuff is unusable. If there’s a magic combination of settings that makes this shit work as it should, I missed it; your documentation was useless, and the most useful suggestion I found was to not use TPTP. No, I don’t want to fiddle with cryptic VM command-line parameters, manually compile C shit, or dig through well-hidden settings pages. All I wanted was: right click, profile.

So am I now a NetBeans user? No way! I can’t stand how tedious it is for coding. Run the profiler in NetBeans, go “aha”, alt-tab to Eclipse, and fix it. Works for me.