Java Profiling

One of the fun aspects of being a programmer is the constant stream of little technical problems that require digging into. This can sometimes be frustrating, but it’s pretty cool when you suddenly get it and make the problem go away. Since starting my new job in February, I’ve had lots of fun like this. Last week we had a bit of Java that was obviously out of line performance-wise.

My initial go at the problem was to focus on the part that had been annoying me to begin with: the way XML parsing was handled. There are many ways to do XML parsing in Java. We use JAXB. JAXB is nice if you don’t have enough time to do the job properly with XPath, but the trade-off is that it can be slow and that there are a few gotchas. For example, creating marshallers and unmarshallers is way more expensive than actually using them. So when processing a shitload of XML files, you spend a lot of time creating and destroying marshallers, especially if you break the big XML files down into little blobs that are parsed individually. Some simple pooling using ThreadLocal improved things quite a bit, but one particular class still felt unreasonably slow in a way that I could not explain with XML parsing alone.
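For illustration, here is a minimal sketch of the ThreadLocal pooling idea. It is not the actual code from the project. Since JAXB is no longer bundled with recent JDKs, the sketch swaps in a `DocumentBuilder` as a stand-in for a JAXB unmarshaller; the class and method names (`PooledXmlParser`, `rootElementName`) are made up, and `ThreadLocal.withInitial` assumes Java 8 or later.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

// Hypothetical example class; DocumentBuilder stands in for a JAXB Unmarshaller.
class PooledXmlParser {
    // One builder per thread, created lazily on first use and reused afterwards.
    private static final ThreadLocal<DocumentBuilder> BUILDER = ThreadLocal.withInitial(() -> {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Could not create XML parser", e);
        }
    });

    // Returns the calling thread's builder; repeated calls on one thread yield the same object.
    static DocumentBuilder builder() {
        return BUILDER.get();
    }

    // Parses a small XML blob with the pooled builder and returns the root tag name.
    static String rootElementName(String xml) {
        DocumentBuilder builder = builder();
        builder.reset(); // restore the builder's original configuration before reuse
        try {
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getTagName();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

The expensive object is never shared between threads, so no locking is needed, and the creation cost is paid once per thread instead of once per blob.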

So I spent two days setting up a profiler to measure what was going on. Two days? Shouldn’t this be easy? Yes, except there’s a few gotchas.

  1. The Eclipse TPTP project has a nice profiler. Except it doesn’t work on Macs, or worse, Macs with JDK 1.6. That’s really an Eclipse problem: the UI is tied to 1.5 because Apple stopped supporting the Cocoa integration in 1.6.
  2. So I fired up vmware, installed the latest Ubuntu 9.04 (nice), spent several hours making that behave nicely (file sharing is broken and needs a patch). Sadly no OpenGL eyecandy in vmware.
  3. Then I installed Java, eclipse, TPTP, and some other stuff
  4. Only to find out that TPTP with JDK 1.6 is basically unusable. First, it comes with a native library compiled against a library that is no longer used. Solution: install it.
  5. Then at every turn you take there’s some error about agent controllers. If you search for this you will find plenty of advice telling you to use the right controller, but none whatsoever on how you would go about doing so. Alternatively, people tell you to just not use JDK 1.6. I know, because I spent several hours on this before joining the gang of “TPTP just doesn’t work, use Netbeans for profiling”.
  6. So, still in ubuntu, I installed Netbeans 6.5, imported my eclipse projects (generated using maven eclipse:eclipse) and to my surprise this actually worked fine (no errors, tests seem to run).
  7. Great, so I right-clicked a test and chose “profile file”. Success! After some fiddling with the UI (quite nerdy and full of usability issues) I managed to get exactly what I wanted.
  8. Great! So I exit vmware to install Netbeans properly on my mac. Figuring out how to run with JDK 1.6 turned out to be easy.
  9. Since I had used vmware file sharing, all the project files were still there so importing was easy.
  10. I fired up the profiler and it had remembered the settings I last used in linux. Cool.
  11. Then netbeans crashed. Poof! Window gone.
  12. That took some more fiddling to fix. The release notes indeed mention two cases of crashes during profiling, which you can fix with some command-line options.
  13. After doing that, I managed to finally get down to analyzing what the hell was going on. It turned out that my little test was somehow triggering 4.5 million calls to String.replaceAll. WTF!
  14. The nice thing with inheriting code that has been around for some time is that you tend to ignore those parts that look ugly and don’t seem to be in need of your immediate attention. This was one of those parts.
  15. Using replaceAll is a huge code smell, since it compiles its regular expression again on every call. Using it in a triple nested for loop is insane.
  16. So some more pooling, this time of the regular expression objects. Pattern.compile is expensive.
  17. I re-ran the profiler and … problem gone. XML parsing now is the bottleneck as it should be in code like this.
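The replaceAll fix from the last few steps can be sketched roughly like this. This is not the original code: the regular expression and the names (`WhitespaceNormalizer`, `slow`, `fast`) are invented for illustration. The point is only that `String.replaceAll` goes through `Pattern.compile` on every call, while a precompiled `Pattern` pays that cost once.

```java
import java.util.regex.Pattern;

// Hypothetical example; the real code replaced something else entirely.
class WhitespaceNormalizer {
    // Compiled once. Pattern instances are immutable and safe to share between threads.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Slow variant: the regex is compiled again on every single call.
    static String slow(String input) {
        return input.replaceAll("\\s+", " ");
    }

    // Fast variant: reuses the compiled pattern and only creates a cheap Matcher per call.
    static String fast(String input) {
        return WHITESPACE.matcher(input).replaceAll(" ");
    }
}
```

Both methods return the same result; only the amount of work per call differs. If the thing being replaced is a literal string rather than a regular expression, `String.replace` avoids the regex machinery altogether.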

But shouldn’t this just be easy? It took me two days of running from one problem to the next just to get a profiler running. I had to deal with crashing virtual machines, missing libraries, cryptic error messages about Agent Controllers, and several unrelated issues. I hope somebody in the TPTP project reads this: your stuff is unusable. If there’s a magic combination of settings that makes this shit work as it should, I missed it. Your documentation was useless; the most useful suggestion I found was to not use TPTP. No, I don’t want to fiddle with cryptic VM command-line parameters, manually compile C shit, or hunt through well-hidden settings pages. All I wanted was right click, profile.

So am I now a Netbeans user? No way! I can’t stand how tedious it is for coding. Run the profiler in Netbeans, go “ah”, alt-tab to Eclipse and fix it. Works for me.

Java & Toys

After a few months of doing Python development, which to me still feels like a straitjacket, I had some Java coding to do last week and promptly wasted a few hours checking out the latest toys, being:

  • Eclipse 3.4 M7
  • Hudson
  • Findbugs for Hudson

Eclipse 3.4 M7 is the first milestone I’ve tried for the upcoming Eclipse release. That is only because I haven’t been coding much Java lately; normally I’d have switched around M4 already (at least I did so for the 3.2 and 3.3 cycles). It is a great improvement and includes several nice productivity enhancements. My favorite one is the problem hover that now includes links to quick fixes. So instead of point, click, typing ctrl+1, arrow down (1..*), enter, you can now lean back and point & hover + click. Brilliant. I promptly used it to convert some 1.4 non-generics code into nice generics-based code, simply by tackling all the generics-related warnings one by one, essentially only touching the keyboard to suggest a few type parameters Eclipse couldn’t figure out. The introduce generic type parameter and infer generics refactorings are very helpful here. The code of course compiled and executed as expected. No bugs introduced and the test suite still runs fine. Around 5000 lines of code refactored in under 20 minutes. I still have some work to do to remove some redundant casts and to replace some while/for loops with foreach.
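To give an idea of what such a conversion looks like, here is a small made-up before/after pair; the class and method names are hypothetical and not taken from the refactored project. The first method is typical 1.4 code with a raw List, an explicit Iterator and a cast. The second is what you end up with after inferring generics and switching to foreach.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical example of a 1.4-style method and its generics-based equivalent.
class GenericsMigration {
    // Before: raw type, explicit iterator, and a cast that can fail at runtime.
    @SuppressWarnings("rawtypes")
    static int totalLengthRaw(List names) {
        int total = 0;
        for (Iterator it = names.iterator(); it.hasNext();) {
            String name = (String) it.next();
            total += name.length();
        }
        return total;
    }

    // After: the element type is checked by the compiler, so the cast disappears.
    static int totalLength(List<String> names) {
        int total = 0;
        for (String name : names) {
            total += name.length();
        }
        return total;
    }
}
```

The behaviour is identical for well-formed input. The difference is that putting a non-String into the list becomes a compile error instead of a ClassCastException somewhere deep in a loop.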

Other nice features are the new breadcrumbs bar (brilliant!) and a new refactoring to create parameter classes for overly long lists of parameters on methods. Also nice is a refactoring to convert String concatenation into StringBuffer.append calls, although StringBuilder is slightly faster for cases where you don’t need thread-safe code (i.e. most of the time). The rest is the usual amount of major and minor refinements that I care less about but are nice to have around anyway. One I imagine I might end up using a lot is the quick fixes to sort out OSGi bundle dependencies. You might recall me complaining about this some time ago. Also be sure to read Peter Kriens’ reply to this, btw. Bnd is indeed nice, but tools don’t solve what is in my view a kludge. Both the new Eclipse feature and Bnd are workarounds for the problem that what OSGi is trying to do does not exist at (and is somewhat at odds with) the Java type level.
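As a rough illustration of the String concatenation refactoring mentioned above, here is a hand-written sketch of the same idea using StringBuilder instead of StringBuffer. The names (`JoinExample`, `joinWithConcat`, `joinWithBuilder`) are made up, and this is not the literal output of the Eclipse refactoring.

```java
// Hypothetical example: the same loop with += and with an explicit StringBuilder.
class JoinExample {
    // Each += creates a new String, copying everything accumulated so far.
    static String joinWithConcat(String[] parts) {
        String result = "";
        for (String part : parts) {
            result += part + ",";
        }
        return result;
    }

    // A single unsynchronized buffer is appended to and converted once at the end.
    static String joinWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part).append(',');
        }
        return sb.toString();
    }
}
```

StringBuffer has the same append API but synchronizes every call, which is wasted effort when the buffer never leaves the method.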

Anyway, the second thing I looked into was Hudson, a nice server for continuous integration. It can check out anything from a wide range of version control systems (Subversion is supported out of the box, several others through plugins) and run any script you like. It also understands Maven and knows how to launch Ant. With the right plugins you can then let it do quite useful things like compiling, running static code analyzers, deploying to a staging server, running test suites, etc. Unlike some stuff I evaluated a few years ago, this actually worked right out of the box and was so easy to set up that I promptly did so for the project I’m working on. Together with loads of plugins that add all sorts of cool functionality, this means you have just run out of excuses to not do continuous integration.

One of the plugins I’ve installed so far is an old favorite, Findbugs, which promptly drew my attention to two potentially dangerous bugs and a minor performance bug in my code. That reminded me that running this and making sure it doesn’t complain is actually quite important. Of all code checkers, Findbugs provides the best mix: it finds loads of stuff that actually needs fixing without being obnoxious about it, and without needing a lot of configuration (which e.g. checkstyle and pmd require before they shut the fuck up about stupid stuff I don’t care about).

While of course Java-centric, you can teach Hudson other tricks as well. So, next on my agenda is creating a job for our Python code and hooking that up to pylint and possibly our Django unit tests. There are plugins around for both tasks.

Stuff gets released

Lots of stuff has been released or is about to be released. Enough to warrant a little blog post about this stuff.

Open Office 2.0

The 2.0 version is a nice improvement over 1.1. OOo 1.1 sucked IMHO but 2.0 might convince me to actually use it. If only they fixed the bugs I reported four years ago on cross-references (not implemented properly). Without fixes for that, I can’t write large, structured content in it (i.e. scientific articles). But still, quite an improvement. Importing of Office stuff now actually works. I managed to import and save an important spreadsheet at work and removed about 9 MB of redundant data in the process (no idea where this came from), which makes working with the file over the network a lot less frustrating. Also, it seems to actually be able to work with Word documents without seriously messing up layout and internal structure (and it’s a lot faster on large documents). In short, compatibility now works more or less as advertised for the past four years (1.1 didn’t, even for trivial stuff). It’s still quite ugly though, and lots of usability challenges remain unaddressed. Looking cool is not a product feature, nor is blending in with your OS. It remains the poor man’s alternative to MS Office.

Update. It looks like I was wrong about not messing up Word documents. I did some roundtrip editing on a document written in Word and OOo thoroughly messed it up. It turns out it doesn’t handle documents with adjusted page settings. It applied the page settings for the title page to the whole document. As a consequence it looks like shit; all the headers and footers are in the wrong place. It’s a lot of work to fix it too.

Maven 2.0

I spent some time with a release candidate and decided not to use it. The reasons were a mix of poor documentation and a dislike of the structure it tries to enforce on everything you do. I’m pretty sure the ideas behind it are ok but it just doesn’t feel right yet. In short, it didn’t pass the fifteen minute test they put on their website: the documentation keeps telling you how beautiful and useful maven is without actually telling you anything about how it works. Some crucial things are lacking, like an explanation of how these dependencies actually work, where the repository is that it magically pulls all these jar files from, how to set up your own repository, etc.

In the end I prefer the more verbose nature of ant. I have a lot of experience writing ant build files now. I’ve even written a few ant tasks at work. I happen to both like and need its flexibility a lot. I don’t see how maven solves any of the more non-trivial stuff I do (other than by allowing me to use ant).

The assumptions maven is based on are IMHO incorrect. First of all, it is tool-centric: if you don’t structure your projects the way it likes, you’ll have lots of trouble trying to get it to do anything useful (that means it won’t be used where I work now or any other place that has an existing, complex project). Secondly, it solves a lot of easy stuff that is not really a problem with ant, and not much else. Compiling, generating javadoc, etc. is not that hard with ant. In fact, most of the time I reuse the same tasks for that (by importing them). And, finally, maven just adds complexity. I find maven projects hideously complicated in their structure. I’ve seen quite a few maven projects and they all spread their source code over numerous modules in nested directories. I don’t want to structure my projects like that. But the most important thing is that maven doesn’t actually solve any problem I have.

Mysql 5.0

Nice to finally see this arrive. I expect this to have some consequences for the use of commercial databases in the next few years. At work our customers still prefer commercial stuff like Oracle or MSSQL. Increasingly this has more to do with irrationality than with features that are actually used. Performance certainly has little to do with it. Nor does scalability. Our webapp is a few dozen simple tables with some optional stored procedures. The latter are what have kept us from fully supporting MySQL, though arguably they are not required in our app.

Firefox 1.5 RC1

The release candidate should be ready right about now or very soon anyway. Beta2 has worked flawlessly here, as did Beta1. See my earlier review of the beta for more details.