Re: bear shaving

I was going to submit the stuff below in a shortened form as a comment to this fun little blog post on “bear shaving” but it sort of grew into a full-blown article, again. To summarize the original post: there’s this nice analogy of shaving bears to help them cope with global warming, and how that doesn’t really address the core issue (not to mention being dangerous). The analogy is applied to integration builds and people patching things up. Then the author sort of goes off and comes up with a few arguments against git and decentralization.

While some of the criticism is valid, this of course ticked me off 🙂

I see Git as a solution for increasing the amount of change and dealing more effectively with people working in parallel. Yes, this puts a strain on integrating the resulting changes. But less change is the equivalent of bear shaving here. Change is good. Change is productivity. You want more productivity, not less. You want to move forward as fast as you possibly can. Integration builds breaking are a symptom of a larger problem. Bear shaving would be doing everything you can to make the integration builds work again, including forcing people to sit on their hands. The typical reflex to a crisis like this in the software industry is less change, complete with the process to ensure that people do less. This is how waterfall was born. Iterative or spiral development is about the same thing, just done more frequently and in shorter cycles. This was generally seen as an improvement, but you are still going to sit on your hands for prolonged periods of time. The real deal these days is continuous deployment, and you can’t do that if you are sitting on your hands.

Breaking integration builds have a cause: the people making the changes are piling mistake on mistake and keep bear shaving (I love the metaphor) the problem because they are under pressure to release and deliver functionality. All a faster pace of development does is make this more obvious. Along with the increased amount of change per time unit comes an increased amount of mistakes per time unit. Every quick fix and every misguided commit makes the system as a whole a little less stable. That’s why the waterfall model includes a feature freeze (a.k.a. integration) where no changes are allowed, because the system would never get finished otherwise.

A long time ago I wrote an article about design erosion. It was one of the cornerstones of my PhD thesis (check my publication page if you are interested). In a nutshell: changes are cumulative and we take design decisions in the context of our expectations of the future. The only problem: nobody can predict the future accurately, so there will be mistakes from time to time. It is inevitable that you will get it wrong sometimes and won’t realize it right away. You can’t just rip out a single change you made months or years ago without affecting the subsequent changes that depend on it. In other words, change is cumulative: rip one piece out and the whole sand castle collapses. Some decisions will be wrong or will have to be reconsidered at some point, and because changes are interdependent, fixing design erosion can be painful and expensive. Consequently, all software designs erode over time, because such fixes tend to be delayed until the last possible moment. Design erosion is a serious problem. You can’t fix a badly eroded system that you have had for years overnight. Failing to address design erosion in time can actually kill your company, product or project. But you can delay the inevitable by dealing with problems closer to where they originate instead of dealing with them later. Dealing with a problem close to where it originates means fewer subsequent changes are affected, which minimizes the cost of fixing it. Breaking integration builds are a symptom of an eroding design. Delaying the fix makes it worse.

So, the solution is to refactor and rethink the broken parts of the system to be more robust, easier to test, more flexible to meet the requirements, etc. Easier said than done, of course. However, Git is a revolutionary enabler here: you can do the more disruptive stuff on a git branch and merge it back in when it is ready, instead of when you go home and break the nightly build. This way you can do big changes without destabilizing your source tree. Of course you want continuous integration on your branches too. That way, you will push fewer mistakes between branches, thus solving problems closer to their origin and without branches affecting each other. You will still have breaking builds, but they will be cheaper to fix. Decentralization is the solution here, not the problem as is suggested in the blog post I linked above!

Here’s why decentralization works: testing effort grows much faster than linearly with the amount of change, because every pair of changes can potentially interact. Double the amount of change, and you roughly quadruple the testing effort. So don’t do that; keep the testing effort low. In a centralized world you do this through a feature freeze. By stopping all change, you can actually find all the problems you introduced. In a decentralized world you do this by not pushing your changes until the changes you pull no longer break your local branch. Then you push your working code. Why is this better? 1) you integrate incoming changes with your changes instead of the other way around. 2) you do this continuously (every time you pull changes), so you fix problems when they happen. 3) your changes only get pushed when they are stable, which means that other people have less work with #1 and #2 on their side. 4) by keeping changes isolated from each other, you make them easier to test. Once tested, the changes are a lot easier to integrate.
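To make that doubling/quadrupling arithmetic concrete, here is a back-of-the-envelope sketch. It assumes (my assumption, purely to illustrate the point) that testing effort is dominated by checking interactions between pairs of changes:

T(n) \propto \binom{n}{2} = \frac{n(n-1)}{2}, \qquad \frac{T(2n)}{T(n)} = \frac{2n(2n-1)}{n(n-1)} \approx 4 \quad \text{for large } n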

Continuous integration can help here, but not if you only do it on the production branch: you need to do it all over the place. Serializing all the change through one integration environment turns it into a bottleneck: your version system may be decentralized, but if your integration process is not, you are still going to be in trouble. A centralized build system works OK with a centralized version system because a centralized version system serializes the changes anyway (which is a problem, not something to keep bear shaving). The whole point of decentralizing version management is decentralizing change. You need to decentralize the integration process as well.

In a nutshell, this is how the Linux kernel handles thousands of KLOC of changes per day with hundreds of developers. And yes, it is no coincidence that those guys came up with git. The Linux kernel deals with design erosion by continuous redevelopment. The change is not just additive; people are literally making changes all over the Linux source tree, all the time. There is no way in hell they could deal with this in a centralized version management environment. As far as I know, the Linux kernel has no automated continuous integration. But they do have thousands of developers running all sorts of developer builds and reporting bugs against them, which is really the next best thing. Nothing gets into the mainline kernel without this taking place.

server side osgi, a myth?

Two years ago, I started using OSGI, the popular Java component and module standard, for an internal project. Fast forward to now and I have a nice set of bundles that depend on, amongst other things, the OSGI HTTP service.
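For context, this is roughly the kind of code those bundles hang off the HTTP service. A minimal sketch with made-up class and alias names, assuming the container already has an HTTP service implementation registered:

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.osgi.framework.BundleActivator;
import org.osgi.framework.BundleContext;
import org.osgi.framework.ServiceReference;
import org.osgi.service.http.HttpService;

public class Activator implements BundleActivator {

    public void start(BundleContext context) throws Exception {
        // Look up whatever HttpService implementation the container provides
        // (this sketch assumes one is already registered).
        ServiceReference ref = context.getServiceReference(HttpService.class.getName());
        HttpService http = (HttpService) context.getService(ref);
        // Expose a servlet under /hello; how requests reach it is up to the container.
        http.registerServlet("/hello", new HttpServlet() {
            protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                    throws java.io.IOException {
                resp.getWriter().println("hello from an OSGI bundle");
            }
        }, null, null);
    }

    public void stop(BundleContext context) throws Exception {
        // A real bundle would unregister the /hello alias and unget the service here.
    }
}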

All along, I’ve been reading how great OSGI is, how flexible it is and how it is the future of server side Java. I was ready to believe it. But to cut to the meat of this blog post: server side OSGI is vaporware. It doesn’t exist. None of the vendors actually support it. Support it as in: a production quality, well documented, widely used product available right now. I’ve looked at Felix, Tomcat, Equinox, Jetty, Glassfish, JBoss, etc. and came up with nothing but a few obscure, unsupported, undocumented components. The default HTTP service implementation is not my idea of scalable & production quality. And the connections between existing production quality OSGI containers and existing production quality application servers are sketchy at best.

Frankly, I’m very surprised at this. I know lots of people that claim to use OSGI server side, and there are lots of announcements of vendor X endorsing OSGI bla bla bla fully modularized bla bla bla dependency injection bla bla bla. That’s great, but after two years of OSGI hacking I was hoping for something a little more substantial than what I have found so far:

The best option I came up with is the HTTP servlet bridge from Equinox. The documentation for this is either hopelessly out of date or this is a case of abandonware. Basically all the page says is: download this bridge.war and good luck. Problem #1: this bridge.war is from 1997 .. eh 2007 :-). Problem #2: I’d like to use a somewhat newer version of Equinox. Does this work at all? Are people still working on this? Problem #3: this page hasn’t changed substantially since I started using OSGI. Is anyone still working on this or is this a dead project? Are there any users?

Option #2 is to use Apache Felix, which apparently can embed Jetty. That’s great, but I’m a Tomcat guy and am more interested in using Tomcat as the outer container than Jetty. Neither the Jetty nor the Tomcat option is documented properly. I’m not even sure the Tomcat option is possible/advisable. Some people hint at it being possible. A particular concern for me is that I need to cluster the damn thing, potentially on a large scale. Is this possible at all? I’m pretty sure people have done this, but in terms of production quality code and documentation they have not left much of a trail. The Felix people don’t seem to have much documentation in general. There’s of course the gratuitous OSGI tutorial and some hints of how you could use it, but that’s it.

This situation is not something I can sell here at Nokia. I need something more substantial, preferably Tomcat or JBoss based, that is 1) scalable in a cluster, 2) production quality, and 3) well documented. I’m now pretty much convinced that what I’m looking for doesn’t exist. If I don’t find something soon, I’m just going to have to rip out all the OSGI stuff and switch to a proper dependency injecting container. Spring 3.0 is looking pretty neat, for example, but a bit heavyweight in my opinion.

Anyway, comments are open, so please point out how wrong I am and what information I overlooked :-). My main gripe here is that I just have very little to base a decision on. Sketchy documentation, bits and pieces on blogs and mailing lists, but nothing solid. Either OSGI is a genuine server side option or it is just an urban legend (some people have heard of other people that have done this). Everything I’ve seen so far hints at the latter.

I know JBoss 4, Glassfish 3, and the Spring application server are all going to be OSGI based, of course. These are far from vaporware but also not exactly production ready. Additionally, being OSGI based is one thing; being able to deploy servlets from OSGI bundles is another. Most things I’ve read on this suggest that these servers are not really designed to allow application developers to interact with the OSGI container directly (i.e. deploying bundles, using the HTTP service instead of WAR files, etc.).

Photos Zurich and Dagstuhl

I’m traveling a lot lately. Two weeks ago I was in Zurich at the first Internet of Things Conference. I uploaded some pictures already last week and some more today.

Last week I also attended a Dagstuhl seminar on Combining the advantages of product lines and open source to present the position paper I posted some time ago. Naturally, I also took some pictures there.

Interestingly, one of the participants was Daniel German, who does a lot of interesting things, including publishing good articles on software evolution and working on a SourceForge project called panotools that happens to power most of what makes Hugin cool. Hugin is of course the tool I have been using for some time now to stitch photos together into very nice panoramas. I felt envious and lucky at the same time watching him take photos. Envious of his nice Canon 40D with a very cool fisheye lens, and lucky because his photo bag was huge and probably quite heavy, considering that he had two more lenses in there.

Attendees of the Dagstuhl Seminar

The whole gang together. Daniel is the guy in the orange shirt.

One of the best features of Dagstuhl: 1 beer = €1. Not quite free beer, but close enough. And after all, OSS is about free speech, and cheap beer definitely loosens tongues.

From SPLs to Open, Compositional Platforms

Below is a position paper I submitted to the upcoming Dagstuhl seminar I am attending. It’s not peer reviewed and it is not clear at this point if there will be any proceedings. So, as an experiment, I will just put the full text in a blog post as well as the pdf you can find on my publications page. The reason I am doing this is twofold: I want people to read stuff I write and locking it up in some hard to find proceedings just isn’t doing the trick. Secondly, this blog has a comment feature. Please feel free to use it.


From SPLs to Open, Compositional Platforms

Jilles van Gurp & Christian Prehofer
Smart Space Lab
Nokia Research Center
Helsinki, Finland

Abstract. In this position paper we reflect on how software development in large organizations such as ours is slowly changing from being top down managed, as is common in SPL organizations, towards something that increasingly resembles what is happening in large open source organizations. Additionally, we highlight what this means in terms of organization and tooling.

Trends and Issues

Over the past decade of our involvement with Software Product Lines, we have seen the research field grow and prosper. By now, many companies have adopted SPL approaches for their core software development. For example, our own company, Nokia, features prominently on the SEI’s Product Line Hall of Fame [SEI 2006]. Recently, we [Prehofer et al. 2007], and others [Ommering 2004], have published articles on the notion of compositional development that decentralizes the development of software platforms and products. The motivation for our work in this area is that we have observed that the following trends are affecting software development:

  • Widening platform scope and more diverse products. As “victims” of their own success, successful product lines allow for the creation of an ever wider range of products. Necessarily, these products have increasingly less in common with each other. Particularly, they are likely to have substantial product specific requirements and require increasing amounts of variability in the platform provided features to deal with conflicting and overlapping requirements in the base platform. In other words, the percentage of functionality shared across all products relative to the total amount of functionality in the platform is decreasing. At the same time, the percentage of platform assets actually used in any particular product is also decreasing.
  • Platforms stretch over multiple organizations. As platform and product development starts to span multiple organizational entities (companies, business units, open source projects, etc), more openness towards different and conflicting requirements, features, roadmaps and processes in different development entities is required. This concerns both open source software and commercial platforms that are developed and productized differently by third party companies.
  • Time to market and innovation speed. While time to market has always been a critical issue, it is particularly an issue with the growing size and complexity of Software Product Lines. In general, large scale software projects tend to have longer development cycles. In the case of Software Product Lines that have to cater for more and more heterogeneous products, development cycles tend to lengthen as the work related to defining, realizing and testing new functionality grows increasingly complex. However, time to market of features does not only include the product line development cycle but also the time needed to do product derivation as well as the development cycles of any external software the Software Product Line integrates. Worst case, a feature first needs to be integrated in one of these dependencies; then it needs to be integrated into the next major release of the Software Product Line before, finally, a software product with the new feature can be developed and put on the market.

We are seeing examples of this in Nokia as well. For example, Nokia has software development spread over several major phone platforms (S30, S40, S60 and Linux Maemo) and launches multiple products from each of those platforms every year. It is interesting to note here that Nokia has never really retired a mobile phone software platform and is actively using all of them. Roughly speaking, S40 evolution is in sync with the popularization of the notion of Software Product Lines since the mid-nineties. It is indeed this product line that is featured in the aforementioned SEI SPL Hall of Fame [SEI 2006]. Development for products and platforms is spread over many Nokia locations all over the globe, as well as a complex network of subcontractors, customers and supplying companies. Additionally, the use of open source software and the intensive collaboration Nokia has with many of the associated projects are adding more complexity here. Finally, time to market is of course very important in the mobile phone market. Products tend to be on the market for only a short time (e.g. 6-12 months), while developing them from a stable software platform can take more than a year in some cases. This excludes the time needed for major new releases of our software platform. Consequently, disruptive new features in the platform may take years to reach the market in the form of new phones.

The way large organizations such as Nokia manage and organize their software and platform development is constantly pushing the limits of what is possible with software engineering & architecting tools and methodology. Nokia is one of a handful of companies worldwide that manage tens of millions of lines of code across their product lines and products. We see Software Product Lines as a way of developing software that has arguably been very successful in organizations like ours. However, we also note that development practice is increasingly deviating from the practices prescribed by the literature on Software Product Lines, particularly with respect to centralized definition, control, ownership and management of software assets and products. Therefore, we argue that the research community now needs to adapt to this new reality as well.

The complexity and scale of the development organization increasingly make attempts to centrally manage it futile and counterproductive. Conflicts of interest between stakeholders, bureaucracy, politics, etc. all affect centralized platform and product decision making and can end up leading to unworkable compromises or delays in the software development process. Additionally, it is simply becoming impossible to develop software without depending on at least some key open source projects. Increasingly, the industry is also participating as an active contributor in the open source community. Arguably, most of the open source community now consists of software developers sponsored in some way by for-profit organizations. For example, Nokia is a very active participant in the mobile Linux community (the Maemo Linux platform) and ships products such as the N810 internet tablet where the majority of the lines of code is actually coming from externally owned and run open source projects and even direct competitors (e.g. Intel and Motorola).

This changes the game of balancing product and platform requirements, needs and interests substantially from what is generally assumed in a classical SPL context, where a single company develops both platform and products in house and where it is possible to drive both product and platform development in a top down fashion. This simply does not work in a context where substantial amounts of critical software in a product come from external sources that are unwilling or unlikely to take orders from internal product managers or other types of executives external to their organization.

Effectively, this new reality necessitates a different approach to software development. Rather than driving a top down decomposition of products and features and managing development and software assets per this hierarchy, as is very much the consequence of implementing practices advertised in SPL literature, we propose to adopt a more compositional style of development.

Compositional Development

In our earlier work [Prehofer et al. 2007], we outlined an approach to adopt a more compositional style of development. Rob van Ommering has argued along similar lines, but still takes the traditional perspective of a (large) company managing a population of products [Ommering 2002][Ommering 2004]. However, what we propose here is to further decentralize development and organize similarly to the open source community, where many independent teams of component, framework and product owners work together. Each of those teams acts to represent its own interests (and presumably those of whomever they work for). Their perspective on the external world is simply that of upstream and downstream dependencies. Downstream are the major users and customers that use the software the team produces. These stakeholders act as the primary source of requirements and probably also funding. Upstream are the teams that produce the software required for using and developing the team’s own software. These teams in turn depend on their own downstream dependencies for requirements and funding.

This decentralized perspective is very different from the centralized perspective and essentially allows each team to optimize for what is required from them downstream and what is available to them upstream. For example, requirements for each team come primarily from their downstream dependencies. Since there is no central controlling entity that dictates requirements, picking up these requirements and prioritizing them is very much the task of the teams themselves. Of course, they need to do so in cooperation with their downstream dependencies. Generally, especially when crossing organizational boundaries, requirements are not dictated; rather, the development teams try to assess the needs of their most important customers.

Organization

As Conway’s Law [Conway 1968] predicts, the architectural decomposition of software is reflected in organizations. In many open source communities, project team dependencies reflect the architecture decomposition of software into packages, frameworks, libraries, components, or other convenient units of software decomposition. Obviously, without at least some structure and management in place, the approach advocated here results in total anarchy, which is not a good organizational model to accomplish anything but chaos. Again, we look at the open source world where organizations such as Ubuntu, Eclipse, Apache and Mozilla are driving development of thousands of projects. Each of these organizations has a surprisingly sophisticated organizational structure that comes with rules, best practices, decision making processes, etc. While there are no binding contracts enforcing these, participants in the community are required to play by the rules or risk being ignored.

In practice this means that participants voluntarily comply with practices and rules and take part in what is often called a meritocracy, where important decisions are taken by those who have the merits to do so. Generally, this requires a track record of making important contributions and having the trust of the community. For example, in the Eclipse Foundation, which was founded by IBM, this means that individuals from some of its major competitors, such as BEA and Red Hat, actually lead some of the key projects under the Eclipse umbrella. These individuals are essentially trusted by IBM to do the right things even though they work for a major competitor. Organizations such as Eclipse exist to represent the common interests of the project teams they are composed of. For example, the Eclipse Foundation, which is very much a corporate driven (and financed) institution, represents a broad consortium of stakeholders that covers pretty much the entire spectrum of Java (and increasingly also non-Java) enterprise, desktop and mobile/embedded software development tooling. In the past two years, they have organized two major, simultaneous releases of their major projects. In their latest release, which goes by the name of Europa, they managed to synchronize the release process of around 20 of their top level projects, which are collectively developed by thousands of developers coming from dozens of companies. Many of these companies are competitors. For example, BEA and IBM compete directly in the enterprise market and are major contributors to multiple Eclipse projects.

What this proves is that the way the Eclipse Foundation organizes development is extremely effective and scalable because it involves dozens of organizations and hundreds/thousands of individuals producing, integrating and testing an enormous amount of new software in a very short time frame. Organizing like this brings in the necessary flexibility to seamlessly work with numerous internal and external teams and acknowledges the reality that even internally relations between teams can be complex and difficult to manage centrally.

Tooling

A consequence of decentralizing is that aligning the use of tools across development teams becomes essential. When collaborating, it helps if tools between teams are at least similar and preferably compatible or the same. SPL research has over the past few years focused on tooling for variability management, configuration management and requirements management. However, getting these tools adopted and used effectively in a context of thousands of collaborating software development teams is quite a challenge, especially since many of these tools are either developed in house or only used in a handful of companies. Tooling in the open source community tends to focus on the essentials. That being said, the OSS community has also produced many development tools that are now used on a massive scale. For example, Mozilla has had a pioneering role through their contribution of important tools such as Bugzilla and Bonsai (bug tracking and build monitoring). The whole point of the Eclipse Foundation seems to be development tools. Additionally, they have a project called Equinox that implements a very advanced framework providing many interesting variability technologies and that has put notions such as API versioning and provided and required interfaces on components into mainstream use. In short, there seems to be a gradual migration of SPL-like tool features to mainstream tooling. Additionally, Eclipse is of course a popular platform for developing such tooling in the research community.

Conclusions and Future Work

In this position paper we have tried to highlight a few of the key issues around the ongoing trend from integrational development towards a more open ecosystem where many stakeholders work on many pieces of software that are integrated into products by some of the stakeholders. We are currently working on an article about what it means to go from a traditional software development practice to a compositional approach in terms of organizational models, practices and other aspects. In that article, we will list a number of practices that we associate with compositional development and evaluate these against practices in open source communities as well as selected SPL case studies from the research community. Arguably, SPLs have vastly improved software development in many companies over the past decade or so. Therefore, the key issue for the next decade will be re-aligning with the identified trends towards larger software development ecosystems while preserving and expanding the benefits that SPL development has brought.

We do not see compositional development vs. SPL development as a black and white kind of thing, but instead regard this as a wide spectrum of development practices that each may or may not be applied by individual companies. The more they apply them, the more compositional their development becomes. In any case, the right set of practices is of course highly dependent on context, domain, stakeholders, etc. However, we observe that in order to scale development and to work effectively with hundreds or even thousands of globally and organizationally distributed software developers, it is necessary to let go of centralized control. Compositional development in this open environment is vastly more complex and organic, and, so we believe, more cost effective.

References

[Conway 1968] M. E. Conway, How do committees invent?, Datamation, 14(4), pp. 28-31, 1968.
[Ommering 2002] R. van Ommering, Building product populations with software components, Proceedings of the 24th International Conference on Software Engineering (ICSE 2002), pp. 255-265, 2002.
[Ommering 2004] R. van Ommering, Building Product Populations with Software Components, Ph.D. thesis, University of Groningen, 2004.
[Prehofer et al. 2007] C. Prehofer, J. van Gurp, J. Bosch, Compositionality in Software Platforms, in A. De Lucia, F. Ferrucci, G. Tortora, M. Tucci (eds.), Emerging Methods, Technologies and Process Management in Software Engineering, Wiley, 2008.
[SEI 2006] Software Engineering Institute, Product Line Hall of Fame, http://www.sei.cmu.edu/productlines/plp_hof.html, 2006.

New and updated publications

As you saw in yesterday’s post, my publication site has moved to this blog. I also took the opportunity to update the page with recent work:

You can download the pdfs and find the full refs from here: publications.

Towards Effective Smart Space Application Development: Impediments and Research Challenges

I submitted a nice position paper with two of my colleagues at Nokia to the CMPPC’07 (Common Models and Patterns for Pervasive Computing) Workshop, at Pervasive 2007 in Toronto next month.

Abstract:

State-of-the-art research and existing commercial off-the-shelf solutions provide several technologies and methods for building Smart spaces. However, developing applications on top of such systems is quite a complex task due to several impediments and limitations of the available solutions. This paper provides an overview of such impediments and outlines the main research challenges that still need to be solved in order to enable effective development of applications and systems that fully exploit the capabilities of state-of-the-art technologies and methodologies. The paper also outlines a few specific issues and impediments that we, at the Nokia Research Center, have faced in this field so far. It also sheds some light on how we are going to tackle some of the mentioned issues in the future.

Full details are on my publication site and you can download the pdf from there as well.

Semantic diffusion

Martin Fowler wrote a nice blog post on semantic diffusion. It’s a term he coins to describe the effect that the meaning of new terms tends to diffuse as people start using them without paying too much attention to the original definitions. As examples he uses web 2.0 and agile, both of which have suffered from a lot of semantic diffusion due to the associated hype and buzz.

I’ve noticed the same with the term “software architecture”. This term was first coined by Perry and Wolf in 1992. Soon after, people started using it. And of course every self respecting software firm suddenly had “software architecture”, even the ones that you might say were “architecturally challenged” in the sense that they had the equivalent of Stonehenge (piled together rocks) rather than, say, the Eiffel Tower. Also, by the late nineties, at every software architecture conference/workshop/symposium some person would come up to kick off a discussion on “hey, what do we actually mean by software architecture”. This was fun the first dozen times, but I found that the discussion resets itself as soon as you leave the room. Nobody reads up, and especially the older stuff gets ignored a lot.
However, the trend is turning around. A lot of serious software architecture books, businesses and tools have emerged that allow us to separate the men from the boys when it comes to software architectures. The type of discussion listed above still surfaces at basically any related conference, but you can now end it quickly by pointing out a few good references and asking a few simple questions about practices, tools, etc.
This is how language works. Semantic diffusion is a crucial linguistic concept that causes languages to constantly evolve and change. New words and concepts are added on a continuous basis and old ones are re-purposed as well. Good words survive and have their definitions sharpened and eventually documented in dictionaries, encyclopedias, literature and other reference material.
I sure hope this web X.0 ends soon. I’ve already seen people blogging about web 3.0. Essentially the semantic web people have already recognized that they are missing the boat for 2.0 and are now targeting 3.0 :-). Of course it’s just a matter of time before we start seeing web 4.0 being coined by which time the actual meaning of web 2.0 will have diffused to “so 2006”.

New papers

I’ve been writing a few papers over the past few months. Two of them are now on my publications site:

Both papers are about service oriented architectures. The first one tries to bridge my earlier work on variability to the domain of web services and service grids. I see the latter as the emerging de facto integration technology. This fact makes it a likely candidate for becoming the backbone of many software product families and populations. Barring real-time constraints, web service technology is really well suited for integrating large sets of independently developed subsystems and components. The second paper presents our views on mobile architectures and how to integrate services on the mobile client.
You may have noticed a few SOAP related blog posts here over the past few months. I’ve been learning a lot about both the technical and the architectural aspects of web service technology. As you may have noticed in those previous posts, I have some mixed feelings about the current standards and especially their implementations. On the one hand they solve real problems; on the other hand the level of complexity for the simple use cases is unacceptable.
From the architectural point of view I’m much more enthusiastic. I see the current level of technology as promising but riddled with childhood diseases (and acceptable for many use cases despite that). I’m confident that upcoming third and fourth generations of both web service concepts and technology will be much better. Both industry and technology vendors seem to be getting a better grip on the concepts, something that was definitely lacking in the first generation of web service technology.

There are two more upcoming papers that, if accepted, I will be able to put online towards the end of July.

WSDL Hell and other WS stuff

I’ve been working with web services technology extensively for the past few years. First as a regular software engineer and currently as a software architecture researcher at Nokia.

Right now the market can roughly be divided in a number of overlapping factions:

  • The enterprise service bus people (IBM et al.). These people consider SOAP to be one of the (many) ways to plug software into a so-called enterprise bus: middleware that does the communication and marshalling on behalf of the plugged in components. This notion is particularly popular among businesses with skeleton filled closets (legacy software). If this sounds an awful lot like CORBA, it is probably because these are the same people.
  • JBI (Sun et al.). Sun likes enterprise buses too, but sees them more as a way of integrating Java tighter into the enterprise. JBI (Java Business Integration) is a container for Java based services running inside an enterprise bus, with convenient ways to access, and be accessed through, a whole bunch of protocols (SOAP, CORBA, …). The subtle difference with the IBM vision is that JBI is more about exposing and integrating new Java based services than it is about exposing old legacy services to Java.
  • The WS-* people (i.e. the whole mess of web service related standards being pushed by W3C, OASIS and others). These people base themselves on piles and piles of WSDL (Web Services Description Language) descriptions of all sorts of standardized service interfaces. The interfaces cover all sorts of functionality ranging from resource management to security. In theory this is nice; in practice, prepare for agony trying to get any of that stuff working.
  • The “let’s use SOAP as the latest fashion in RPC protocols” masses. Confused by the acronyms, most people produce and consume web services using a thick layer of tools that keep them far away from the nasty details. Of course the tools are quite stupid, so effectively they are engaging in a really ineffective form of remote procedure calls. They like to think they are still doing distributed objects, but really all they got was a downgrade from good old CORBA.
  • The asynchronous XML guys. These guys realize that RPC over SOAP is a really bad idea. With response times measured in seconds, doing anything non-trivial runs into some hard scalability issues. Not to mention that dealing with all the details ends up getting messy real quick. This is a vocal minority; most web services (including the high profile public ones) continue to be RPC based.
  • The REST (Representational State Transfer) guys. These guys got sick of all of the above and decided to just send (preferably simple) XML documents over HTTP. To them, the medium is not the message. It works surprisingly well for most use cases in the industry. For me, setting up a REST based service is generally less work than the equivalent SOAP service, despite the fact that tools are supposed to make my life easy when doing the latter (see the sketch right after this list).
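To illustrate what I mean by “just send XML documents over HTTP”, here is a minimal client-side sketch. The URL is made up and there is obviously more to REST than a single GET request, but this is about all the ceremony the consuming side needs:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class RestClientSketch {
    public static void main(String[] args) throws Exception {
        // Fetch a (hypothetical) resource as XML; no stubs, no WSDL, no deployment descriptors.
        URL url = new URL("http://example.com/orders/42");
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestProperty("Accept", "text/xml");
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), "UTF-8"));
        String line;
        while ((line = reader.readLine()) != null) {
            System.out.println(line); // the XML document; parse it however you like
        }
        reader.close();
    }
}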

In short, it’s an ugly world out there. Few people get the whole picture. As a programmer, I am less than enthusiastic about all of the above. I fondly remember being amazed at the ease with which two Java applications could talk to each other over RMI about ten years ago, effortlessly throwing entire running programs (aglets) over the network. Things have gotten a lot more difficult since then and a lot less flexible. Somewhere along the way, it seems, people forgot that this should be easy.

Let me summarize my concerns:

  • XML is a machine readable format for exchanging structured data that is poorly suited for human consumption. The common textual representation unfortunately encourages people to believe that they should edit it by hand. Sadly, few good XML editors exist, and the ones that do are standalone, commercial products.
  • Many of the current web service solutions in the market are XML centric. That means they rely on the exchange, automated manipulation and manual editing of vast amounts of XML data. Manual editing is where all of these approaches become nasty.
  • To make things ‘easier’ for developers, tools generally come with their own set of tool specific XML documents in addition to tool specific extensions of the standard ones. The better tools offer alternatives to text editors for some of the documents. Don’t count on those tools to actually work as expected for non-trivial use cases.
  • The tools are part of a vertical stack, usually from one vendor. For the vendor, the stack is a tool to keep the customers tied to its services. Vendor interoperability does not extend beyond the standardized xml formats. Forget about migrating a service or service client from tool A to tool B.
  • Standardization attempts to address this problem have resulted in more complex tools. The WS-* collection of standards is a good example.
  • Despite the many standards, such simple and crucial things as how to integrate a web service in a servlet container have not been standardized. Nor are there usable standards for accessing a web service from client applications. The only thing that has been standardized is the syntax and some semantics of communication between client and server. The process of actually making communication happen is not covered by those standards.

And now let me illustrate by an example. Suppose I want to expose this nice little method:

String helloWorld();

What hoops would you have to jump through to expose this as a web service and consume it from some client using off-the-shelf tools like Axis? Well, quite a few:

  • First you need to generate a WSDL description. The tool for that is conveniently called java2wsdl. The resulting document, compared to the single line interface, illustrates my earlier point beautifully. Several decisions need to be made:
    • What namespace should the package name be mapped to?
    • What server address is going to be the endpoint for the service? (Not kidding you, this is part of the WSDL.)
    • What is the name of the service?
  • The next step is to generate a server stub using wsdl2java. That is a bit of generated code that translates incoming messages back to Java.
  • Then you need to edit the generated code to make it do useful things. Yes, that’s right: that means complications if you later decide that you want to change the interface.
  • Additionally, two WSDD files are generated by wsdl2java. WSDD files tell Axis what to deploy and undeploy.
  • At this point it is time to set up Tomcat with the default Axis web application that will host the service. Once you have that up and running, you need to modify the Axis web application to have your own jar files (including the compiled stub) in the classpath so that Axis can access them. That’s right, you need to modify the service container to be able to run a service. If your service requires access to JNDI resources, you will need to edit the default Axis web.xml as well!
  • Now you can start Tomcat and deploy the web service. Deploying in this case means using one of the default web services included with the Axis web application to tell it that there is a new web service installed. The file used in this process is the earlier generated deploy.wsdd. Now that the service is running, it may be accessed. For that you need a client stub.
  • To create a client stub, download the WSDL description from your new web service (technically, in this case you could use the earlier generated one; this is not always possible however!).
  • Using wsdl2java with a different set of parameters, a few Java classes may be generated. Compile them and use them to create a service call (the sketch below shows roughly what such a call boils down to).
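For the curious, here is a minimal sketch of what the client side amounts to, using Axis 1.x’s dynamic Call API rather than the generated stub (the generated stub essentially wraps calls like this). The endpoint URL and namespace are made up and would have to match whatever ended up in your WSDL:

import javax.xml.namespace.QName;

import org.apache.axis.client.Call;
import org.apache.axis.client.Service;

public class HelloWorldClientSketch {
    public static void main(String[] args) throws Exception {
        // The address the service was deployed under (hypothetical).
        String endpoint = "http://localhost:8080/axis/services/HelloWorldService";
        Service service = new Service();
        Call call = (Call) service.createCall();
        call.setTargetEndpointAddress(new java.net.URL(endpoint));
        // Namespace and operation name must match the generated WSDL.
        call.setOperationName(new QName("urn:helloworld", "helloWorld"));
        String result = (String) call.invoke(new Object[] {});
        System.out.println(result);
    }
}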

Now, that IMHO is a lot of work for hello world. Too much work, in fact. All this stupid bookkeeping should be done automatically (I mean, Java has type checking, generics and annotations for a good reason; there is a sketch of what that could look like at the end of this post). Be glad if it stays this simple. Unfortunately, it usually gets hairy if:

  • You ‘want’ (usually this means: are required) to use any of the WS-* stuff. This is the nightmare scenario: you need to edit basically all of the generated artifacts, hope that you don’t make any mistakes in the process, and then it may work. A good example is securing the service using WS-Security. This will essentially triple your workload. You will be doing stuff like downloading various jar files to satisfy dependencies and fiddling with Axis handlers, WSDD files and lots of other Axis specific stuff.
  • You want to use some WS-* stuff not supported by Axis (i.e. most of the WS standards). You will need to edit the generated WSDL file to do this.
  • You want to make the service asynchronous. This should be possible by modifying the WSDD files. I’ve never actually tried this. Nor have I ever encountered an asynchronous web service in the wild.
  • You want to change the Java interface and have these changes reflected in the WSDL and the client and server stubs. You need to start from step 1. Tip: save some of the generated code you had to edit; you may be able to copy-paste some of it.

Now, all of the above would still be doable if there was good documentation to assist programmers. Unfortunately there isn’t. Worse, any mistake you make will be punished with obscure exceptions, either server side or client side. Obscure exceptions have two problems: they don’t tell you what the problem is and they don’t tell you where the problem is. Consequently, a small typo can cause you to spend hours trying to find out what is going on. I’ve been there multiple times. In several cases I found the solution just by looking at the code the exception came from (a big advantage of OSS software is that you can do that).
That, in short, is the reason I don’t like WSDL/SOAP based web services. Modern IDEs and application servers automate some of these tasks, but rarely all of them. At best they hide the problem.
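For contrast, and to illustrate the “this bookkeeping should be done automatically” point above, here is a hedged sketch of what annotation driven deployment looks like with JAX-WS (which ships with Java 6). This is not the Axis workflow described above, and the class name and URL are made up:

import javax.jws.WebService;
import javax.xml.ws.Endpoint;

@WebService
public class HelloWorldService {

    public String helloWorld() {
        return "hello world";
    }

    public static void main(String[] args) {
        // Publishes the service on an embedded HTTP endpoint; the WSDL is generated
        // at runtime and served at http://localhost:8080/hello?wsdl.
        Endpoint.publish("http://localhost:8080/hello", new HelloWorldService());
    }
}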

this is so true

I read this interesting article on mathematics and software engineering. Like many software engineers, I’ve had extensive mathematics training during my computer science education and of course in high school. The problem is, I don’t seem to remember much of it. I passed courses on statistics, probability theory, linear algebra, discrete mathematics 1 & 2, temporal logic, etc. I have a vague idea that linear algebra was mostly about matrix manipulation and solving equations. But I haven’t really used any of that stuff since. I did use probability theory on a number of occasions when trying to figure out Bayesian belief networks. But even that was eight years ago now. At the time, I was wrapping my mind around the core algorithms of that technology (which involves some exposure to Bayes’ theorem, of course), but right now I wouldn’t get very far describing how a Bayesian belief network works. However, I know how to pick up the basics in an afternoon of reading, and I even know what to read. If ever needed, I’ll be able to brush up my skills.

And that is the main point of the article. Current mathematics education, especially in high schools, does not teach students the skill they need most: namely, being able to acquire the mathematics knowledge they need. Instead, mathematics education is all about force feeding large amounts of algorithms in the hope that some of them will be remembered. For most people this doesn’t happen. The few people that do remember end up studying mathematics. This is what the author of the post I’m citing calls depth first mathematics. They throw lots of stuff on integration theory at you in high school without actually explaining where it is coming from, how it will be useful to you in the future and how it fits into the overall mathematical tradition. So you dutifully learn by doing, pass the exam and then forget all about it within a period of two years. That’s how I got through high school, and I even enjoyed doing some of the math.
The author instead pleads for breadth first mathematics. That is, don’t dive straight into the algorithms but explain where it all comes from, what the concepts are and how they relate to each other. The article I posted a few days ago on an ebook on flying an airplane is a good example of how people can acquire math knowledge. The book assumes a basic understanding of physics. The concepts should trigger some memories of boring physics lessons in high school. The author does a good job of explaining all the relevant concepts, and before you know it you have acquired some in-depth knowledge of crucial aerodynamics. The knowledge is immediately useful because it helps you understand why the damn plane doesn’t drop out of the sky. It sticks too, as long as the topic of keeping the plane up remains interesting to you.
The author instead pleads for breadth first mathematics. That is, don’t dive straight in to the algorithms but explain where it all comes from, what the concepts are, how they relate to each other. The article I posted a few days ago on an ebook on flying an airplane is a good example of how people can acquire math knowledge. The book assumes a basic understanding of physics. The concepts should trigger some memories of boring physics lessons in highschool. The author does a good job of explaining all the concepts relevant and before you know it you have a aqcuired some in depth knowledge on some crucial aerodynamics. The knowledge is immediately useful because it helps understand why the damn plane doesn’t drop out of the sky. It sticks too, as long the topic of keeping the plane up remains interesting to you.