Asana – killer issue tracker

I recently discovered Asana through @larsfronius. I have had a rocky history with issue trackers and productivity tools in general. Whether it is Jira, Trac, Bugzilla, Trello, the GitHub, Bitbucket, and GitLab issue trackers, text files, Excel sheets, or post-its: it just doesn’t work for me; it gets in the way and stuff just starts happening outside it. I’ve devolved to the point where I can’t read my own handwriting, so anything involving paper, pens, crayons, and whatnot is a complete non-starter for me. Besides, it doesn’t work if you have people working remotely. The combination of too much bureaucracy, bad UX, and a constant avalanche of things landing in my lap means I have a tendency to mostly not document what I’m doing, have done, or am planning to do. This is bad; I know.

Fundamentally, I don’t plan my working weeks waterfall style in terms of tickets that I then pick up and do. In many ways, writing a ticket is often half doing the work, since it triggers my reflex to solve any problem in near sight. It’s what engineers do. If you have ever tried to have a conversation with an engineer, you know what I am talking about. You talk challenges; they talk solutions. Why don’t you just do X or Y? What about Z? It’s hard to separate planning from execution. So, there’s a bit of history of me starting to create a ticket for something, realizing halfway through that just solving the problem takes less time, is more fun, and is probably a better use of my time, and then doing that instead.

I work in a startup company where I’ve more or less labeled myself as chief plumber. This means I’m dealing with a wide variety of topics, all the time. I’m often dealing with three things already when somebody comes along with a fourth and a fifth. All of them urgent. All of them unplanned. We’ve tried dealing with it the traditional ways of imposing process, tools, bureaucracy, etc. But it always boils down to answering one question: what is the single most productive thing I can do that moves us forward? And to acknowledging that the answer is not something we can set in stone for whatever sprint length is fashionable at the time; it is subject to change. Hopping from task to task continuously means I don’t get anything done. Only doing what seems nice means I get the wrong things done. In a nutshell, this doesn’t scale and I need a decent issue tracking tool to solve it properly.

Since my memory is flaky and tends to hold only a handful of things, I write down things that seem important but not urgent so that I can focus on what I was doing and come back to them later. This process is highly fluid. Something comes along; I write it down. Later I look at what I’ve written and edit it a bit, and once in a while I actually get around to doing the stuff I wrote down, but mostly the list just grows and I pick off the things that seem most urgent. The best tool for this process is necessarily something brutally simple. The main goal is to be minimally disruptive to the actually productive thing I was doing when I got interrupted, while still taking note of whatever I was interrupted for so that I don’t forget about it. So, for a long time a simple text editor was my tool of choice here. Alt tab, type, edit, type, ctrl+s, alt tab back to whatever I was doing. This is minimally intrusive. My planning process consists of moving lines around inside the file and editing them. This sounds as primitive as it is and it has many drawbacks, especially in teams. But it beats having to deal with Jira’s convoluted UI or hunting for the right button in a web UI to find stuff across the dozen or so GitHub and GitLab projects I work on. However, using a text editor doesn’t scale and I need a decent issue tracking tool to solve it properly.

Enter Asana. As you can probably imagine, I came to this tool with a healthy bias, since basically all previous tools that I’ve tried over the past decades came nowhere close to my preferred but imperfect tool: the text file. My first impression of this tool was wrong. The design and my bias led me to believe that this was another convoluted, over-engineered issue tracker. It took me five minutes of using it before I got how wrong I was.

The biggest hurdle was actually migrating the hundred or so issues I was tracking. Or so I thought. I was not looking forward to clicking new, edit, ok etc. a hundred times, which I assumed would be the case because that is how basically all issue trackers I’ve worked with so far work. So, I had been putting off that job. It turns out Asana does not work that way: copy 100 lines of text, paste, job done. So, one minute into using it I had already migrated everything I had in my text editor. I was impressed by that.

Asana is a list of stuff, with a UI that does all the things you would expect a decent UI for that to do. You can paste lines of text and each line becomes an issue. You can drag lines around to change the order. You can organize them using sections, tags, and projects. You can multi-select lines using mouse and keyboard commands similar to what you would use in, say, a spreadsheet, and manipulate issues that way. Unlike in every other issue tracker, the checkbox in the UI is actually there to allow you to mark things as done, not for selecting stuff. Instead, CMD+A, SHIFT+click, or CMD+click selects issues, and then clicking e.g. the tag field does what you’d expect. Typing @ triggers the autocomplete and you can easily refer to things (people, issues, projects, etc.) by name. There are no ticket numbers in the UI but each line has a unique URL, of course. Editing the line updates all the @ references to that issue. There are no modal dialogs or editing screens that hijack the screen. Instead, Asana has a list and a detail pane that sit side by side. Click any line and the pane updates, and you do your edits there. Multi-select some lines and anything you do in the pane happens to all the selected issues. There are no save, OK, submit, or other buttons that add unnecessary levels of indirection. Just clicking in a field and typing is enough.

Asana is the first actually usable issue tracker that I’ve come across. I’ve had multiple occasions where I found that Asana actually works as I would want it to. As in, I wonder what happens if I press CMD+Z. It actually undid what I just did. I wonder what happens if I do that again. WTF, that works as well! Multi-level undo; in a web app. OK, let’s cut and paste (CMD+X, CMD+V) some issues between Asana projects. Boom, 100 issues just moved. Of course you can also CMD+A and drag the selected issues to another Asana project. I wonder if I can assign them to multiple projects. Yes you can, just hit the big + button. This thing just completely fixed the UX around issue tracking for me. All the advantages of a text file combined with all the advantages of a proper issue tracker. Creating multiple issues is as simple as type, enter, type another one, enter, etc. Organizing them is a breeze. It’s like a text editor but backed by a proper issue tracker. This UI wipes out 20 years of forms-based web UX madness and it is refreshing. We’ve been using it for nearly two months at Inbot and are loving it.

So, if you are stuck using something more primitive and are hating it: give Asana a try and you might like it as well.

How to rename an index in Elasticsearch

I’ve found that Elasticsearch on startup fixes index names to reflect the directory name, which is nice.

This is useful if you want to, for example, change the logstash index mapping template and don’t want to lose all the data indexed so far, go through a lengthy reindex process, or wait until midnight for the index to roll over.

So, this actually works:

  • configure the new index template in logstash
  • shut down cluster
  • rename today’s logstash index directory to logstash-2015.03.03_beforenoon
  • restart the cluster; elasticsearch figures out that the renamed directory should be opened as the index logstash-2015.03.03_beforenoon, and logstash will notice the missing index for today and recreate it with the new template

Nice & almost what I want, but I was wondering if I could do the same without shutting down my cluster and restarting it, which is kind of a disruptive thing to do in most real environments. After a bit of experimenting, I found that the following works:

PUT /_cluster/settings
{
    "transient" : {
        "discovery.zen.minimum_master_nodes" : 1
    }
}

The actual settings don’t matter; as long as you have something there, any PUT to the settings will basically cause elasticsearch to re-evaluate the cluster state and pick up the renamed index directory.

Update. You may want to avoid doing this on an index that is being updated (like, typically, an active logstash index), since this duplicates lock files that elasticsearch uses. I ended up removing these lock files from my index copy, after which it stopped barfing errors about the duplicated lock files. But that’s probably not nice. So the better procedure is to do the following (a shell sketch of the whole thing follows the list):

  • mv logstash-2015.03.03 logstash-2015.03.03_moved
  • clear out any write.lock files inside the new logstash-2015.03.03_moved dir
  • do the PUT to /_cluster/settings
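
Putting the whole procedure together, here’s a minimal shell sketch. The data path is hypothetical (it depends on your path.data setting and cluster name), so adjust it to your install:

# adjust this path to your path.data and cluster name
cd /var/lib/elasticsearch/mycluster/nodes/0/indices
mv logstash-2015.03.03 logstash-2015.03.03_moved
# clear out the lock files so the renamed copy can be opened
find logstash-2015.03.03_moved -name write.lock -delete
# any PUT to the cluster settings makes elasticsearch re-evaluate its state
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
    "transient" : {
        "discovery.zen.minimum_master_nodes" : 1
    }
}'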

Elasticsearch failed shard recovery

We have a single node test server with some useful data in there. After an unplanned reboot of the server, elasticsearch failed to recover one shard in our cluster, and as a consequence the cluster went red, which means it doesn’t work until you fix it. Kind of not nice. If this was production, I’d probably be planning an extensive post mortem (how did it happen) and doing some kind of restore from a backup. However, this was a test environment. Which meant an opportunity to figure out whether the problem can actually be fixed somehow.

I spent nearly two hours figuring out how to recover from this in a way that does not involve going “ahhh whatever” and deleting the index in question. Been there, done that. I suspect I’m not the only one to get stuck in the maze of half truths and well intentioned but incorrect advice out there. So, I decided to document the fix I pieced together, since I have a hunch this won’t be the last time I have to do this.

This is one topic where the elasticsearch documentation is of little help. It vaguely suggests that this shouldn’t happen and that red is a bad color to see in your cluster status. It also provides you plenty of ways to figure out that, yes, your cluster isn’t working, and why, in excruciating levels of detail. However, very few ways of actually recovering beyond a simple delete and restore from backup are documented.

However, you can actually fix things sometimes and I was able to piece together something that works with a few hours of googling.

Step 0 – diagnose the problem

This mainly involves figuring out which shard(s) are the problem. So:

# check cluster status
curl localhost:9200/_cluster/health
# figure out which indices are in trouble
curl 'localhost:9200/_cluster/health?level=indices&pretty'
# figure out what shard is the problem
curl localhost:9200/_cat/shards

I can never remember these curl incantations so nice to have them in one place. Also, poke around in the log. Look for any errors when elasticsearch restarts.

In my case it was pretty clear about the fact that due to some obscure exception involving a “type not found [0]” it couldn’t start shard 2 in my inbot_activities_v29 index. I vaguely recall from a previous episode where I unceremoniously deleted the index and moved on with my life that the problem is probably related to some index format change in between elasticsearch updates some time ago. Doesn’t really matter: we know that somehow that shard is not happy.

Diagnosis: Elasticsearch is not starting because there is some kind of corruption with shard 2 in index inbot_activities_v29. Because of that the whole cluster is marked as red and nothing works. This is annoying and I want this problem to go away fast.

Btw, I also tried the _recovery API, but it seems to lack an option to actually recover anything. Also, it seems to not list any information for the shards that failed to recover. In my case it listed the four other shards in the index, which were indeed fine.

Step 1 – org.apache.lucene.index.CheckIndex to the rescue

We diagnosed the problem. Red index. Corrupted shard. No backups. Now what?

Ok, technically you are looking at data loss at this point. The question is how much data you are going to lose. Your last resort is deleting the affected index. Not great, but it at least gets the rest of the cluster green.

Say you don’t actually care about the 1 or 2 documents in the index that are blocking the shard from loading? Is there a way to recover the shard and nurse the broken cluster back to a working state minus those apparently corrupted documents? That might be a preferable approach to simply deleting the whole index.

The answer is yes. Lucene comes with a tool to fix corrupted indices. It’s not well integrated into elasticsearch; there’s an open elasticsearch ticket that may eventually address this. In any case, you can run the tool manually.

Assuming a CentOS-based RPM install:

# OK last warning: you will probably lose data. Don't do this if you can't risk that.

# this is where the rpm dumped all the lucene jars
cd /usr/share/elasticsearch/lib

# run the tool. You may want to adapt the shard path 
java -cp lucene-core*.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /opt/elasticsearch-data/linko_elasticsearch/nodes/0/indices/inbot_activities_v29/2/index/ -fix

The tool displays some warnings about what it is about to do, and if you are lucky it reports that it fixed some issues and wrote a new segments file. Run the tool again and it reports that everything is fine. Excellent.

Step 2 – Convincing elasticsearch everything is fine

Except, elasticsearch is still red. Restarting it doesn’t help. It stays red. This one took me a bit longer to figure out. It turns out that all those well intentioned blog posts that mention the lucene CheckIndex tool sort of leave the rest of the process as an exercise to the reader. There’s a bit more to it:

# go to wherever the translog of your problem shard is
cd /opt/elasticsearch-data/linko_elasticsearch/nodes/0/indices/inbot_activities_v29/2/translog
ls
# note the recovery file; now would be a good time to make a backup of this file because we will remove it
sudo service elasticsearch stop
rm *recovery
sudo service elasticsearch start

After this, elasticsearch came back green for me (see step 0 for checking that). I lost a single document in the process. Very acceptable given the alternative of having to delete the entire index.

Enforcing code conventions in Java

After many years of working with Java, I finally got around to enforcing code conventions in our project. The problem with code conventions is not agreeing on them (actually, this is hard since everybody seems to have their own preferences, but that’s beside the point) but enforcing them. For this purpose you can choose from a wide variety of code checkers such as checkstyle, pmd, and others. My problem with this approach is that checkers usually end up being some combination of too strict, too verbose, or too annoying. In any case nobody ever checks their output, and you need the discipline to fix any detected issues yourself. On most projects where I’ve tried checkstyle, it finds thousands of silly issues with the out of the box configuration. Pretty much every Java project I’ve ever been involved with had somewhat vague guidelines on code conventions and a very loose attitude to enforcing them. So, you end up with loads of variation in whitespace, bracket placement, etc. Eventually people stop caring. It’s not a problem worthy of a lot of brain cycles and we are all busy.

Anyway, I finally found a solution to this problem that is completely unintrusive: format the source code as part of your build. Simply add the following blurb to your maven build section and save some formatter settings in XML format in your source tree. It won’t fix all your issues, but formatting-related diffs should be a thing of the past. Either your code is fine, in which case it will pass the formatter unmodified, or you messed up, in which case the formatter will fix it for you.

<plugin><!-- mvn java-formatter:format -->
    <groupId>com.googlecode.maven-java-formatter-plugin</groupId>
    <artifactId>maven-java-formatter-plugin</artifactId>
    <version>0.4</version>
    <configuration>
        <configFile>${project.basedir}/inbot-formatter.xml</configFile>
    </configuration>

    <executions>
        <execution>
            <goals>
                <goal>format</goal>
            </goals>
        </execution>
    </executions>
</plugin>

This plugin formats the code using the specified formatting settings XML file, and it executes on every build, before compilation. You can create the settings file by exporting the Eclipse code formatter settings. IntelliJ users can use these settings as well, since recent versions support the Eclipse formatter settings file format. The only thing you need to take care of is the organize imports settings in both IDEs. Eclipse comes with a default configuration that is very different from what IntelliJ does, and it is a bit of a pain to fix on the IntelliJ side. Eclipse has a notion of import groups that are each sorted alphabetically. It comes with four of these groups that represent imports with different prefixes, so javax.* and java.* etc. are different groups. This behavior is very tedious to emulate in IntelliJ and outside the scope of the exported formatter settings. For that reason, you may want to consider modifying things on the Eclipse side: remove all groups and simply sort all imports alphabetically. This behavior is easy to emulate in IntelliJ, and you can configure both IDEs to organize imports on save, which is good practice. Also, make sure to not allow .* imports and only import what you actually use (why load classes you don’t need?). If everybody does this, the only people causing problems will be those with poorly configured IDEs, and their code will get fixed automatically over time.
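
In case you are wondering what the settings file looks like: the Eclipse export is simply a flat list of key/value pairs. A trimmed down sketch (the setting ids are standard Eclipse JDT formatter options; the profile name and values are whatever your team agrees on):

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<profiles version="12">
    <profile kind="CodeFormatterProfile" name="inbot" version="12">
        <setting id="org.eclipse.jdt.core.formatter.tabulation.char" value="space"/>
        <setting id="org.eclipse.jdt.core.formatter.tabulation.size" value="4"/>
        <setting id="org.eclipse.jdt.core.formatter.lineSplit" value="140"/>
    </profile>
</profiles>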

Anyone doing a mvn clean install to build the project will automatically fix any formatting issues that they or others introduced. Also, the formatter can be configured conservatively and if you set it up right, it won’t mess up things like manually added new lines and other manual formatting that you typically want to keep. But it will fix the small issues like using the right number of spaces (or tabs, depending on your preferences), having whitespace around brackets, braces, etc. The best part: it only adds about 1 second to your build time. So, you can set this up and it basically just works in a way that is completely unintrusive.

Compliance problems introduced by people with poor IDE configuration skills or a relaxed attitude to code conventions (you know who you are) will automatically get fixed this way. Win-win. There’s always the odd developer out there who insists on using vi, emacs, notepad, or something similarly archaic that most IDE users would consider cruel and unusual punishment. Not a problem anymore; let them. These masochists will notice that whatever they think is correctly formatted Java might cause the build to create a few diffs on their edits. Ideally, this happens before they commit. And if not, you can yell at them for committing untested code: no excuses for not building your project before a commit.

Accessing Elasticsearch clusters via a localhost node

I’m a regular at the Elasticsearch meetup here in Berlin, and there are always lots of recent converts trying to wrap their head around the ins and outs of what it means to run an elasticsearch cluster. One issue that seems to baffle a lot of new users is the question of which node in the cluster has the master role. The correct answer is that it depends on what you mean by master. Yes, there is a master node in elasticsearch, but it does not mean what you think it means: it merely means that a single node is elected to hold the truth about which nodes have which data and, crucially, where the master copies of shards live. What it does NOT mean is that that node has the master copy of all the data in the cluster. It also does NOT mean that you have to talk to specifically this node when writing data. Data in elasticsearch is sharded and replicated, shards and their replicas are spread all over the cluster, and out of the box clients can talk to any of the nodes for both read and write traffic. You can literally put a load balancer in front of your cluster and round robin all the requests across all the nodes.

When nodes go down or are added to the cluster, shards may be moved around. All nodes synchronize information about which nodes have which shards and replicas of those shards. The elasticsearch master merely is the ultimate authority on this information. Elasticsearch masters are elected at runtime by the nodes in the cluster. So, by default, any of the nodes in the cluster can become elected as the master. By default, all nodes know how to look up information about which shards live where and know how to route requests around in the cluster.

A common pattern in larger clusters is to reserve the master role for nodes that do not have any data; you can specialize what nodes do via configuration. Having three or more such nodes means that if one of them goes down, the remaining ones can elect a new master and the rest of the cluster can just continue spinning. Having an odd number of master eligible nodes is a good thing when you are holding elections, since you always have an obvious majority of n/2 + 1. With an even number you can end up with two equally sized network partitions.
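
For reference, this kind of role specialization is just a couple of lines in elasticsearch.yml (1.x era settings; the minimum_master_nodes value assumes three master eligible nodes):

# elasticsearch.yml for a dedicated master node: master eligible, holds no data
node.master: true
node.data: false
# majority of three master eligible nodes: 3/2 + 1 = 2
discovery.zen.minimum_master_nodes: 2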

The advantage of not having data on a node is that it is far less likely for such a node to get into trouble with e.g. OutOfMemoryExceptions, excessive IO that slows the machine down, or excessive CPU usage due to expensive queries. If that happens, the availability of the node becomes an issue and the risk emerges for bad things to happen. That is a bad thing on a node that is supposed to hold the master copy of your cluster configuration: it becoming unavailable will cause the other nodes to elect a new master. There’s a fine line between being unavailable and being slow to respond, which makes this a particularly hard problem. The now infamous Call Me Maybe article highlights several different cluster failure scenarios, and most of these involve some sort of network partitioning due to temporary master node failures or unavailability. If you are worried about this, also be sure to read the Elasticsearch response to this article. The bottom line is that most of the issues have by now been addressed and are far less likely to become an issue. Also, if you have declined to update your production setup to Elasticsearch 1.4.x, now might be a good time to read up on the many known ways in which things can go bad for you.

In any case, empty nodes still do useful work. They can, for example, be used to serve traffic to elasticsearch clients. Most things that happen in Elasticsearch involve internal node communication, since the data can be anywhere in the cluster. So, there are typically two or more network hops involved: one from the client to what is usually called a routing node, and one or more from there to the nodes that hold the shards needed to complete the request, which perform the logic for either writing new data to the shard or retrieving data from it.

Another common pattern in the Elasticsearch world is to implement clients in Java and embed a cluster node inside the process. This embedded node is typically configured to be a routing only node. The big advantage of this is that it saves you a network hop: the embedded node already knows where all the shards live, so application servers with an embedded node can talk directly to the nodes with those shards using the more efficient network protocol that Elasticsearch nodes use to communicate with each other.
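
For the Java case, the 1.x client API makes embedding such a node a few lines of code. A minimal sketch (the cluster name is a hypothetical placeholder; imports from org.elasticsearch.node and org.elasticsearch.client):

// build and start an embedded routing only node, then get a client from it
Node node = NodeBuilder.nodeBuilder()
    .clusterName("mycluster") // must match the cluster you want to join
    .client(true)             // no data, not master eligible: routing only
    .node();                  // builds and starts the node
Client client = node.client(); // use this to talk to the whole cluster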

A few months ago at one of the meetups I was discussing this topic with one of the organizers, Felix Gilcher. He mentioned an interesting variant of this pattern. Embedding a node inside an application only works for Java applications, and dealing with the Elasticsearch internal API can be quite a challenge as well. So it would be convenient if non-Java applications could get similar benefits. Then he suggested the obvious solution: you get most of the same benefits of embedding a node by simply running a standalone, routing only elasticsearch node on each application server. The advantage of this approach is that each of the application servers communicates with elasticsearch via localhost, which is a lot faster than sending REST requests over the network. You still have a bit of overhead related to serializing and deserializing requests and doing the REST requests. However, all of that happens on localhost and you avoid the network hop. So, effectively, you get most of the benefits of the embedded node approach.
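
Configuration-wise, such a routing only node is straightforward to set up. A sketch of the relevant elasticsearch.yml bits (again 1.x era settings; the cluster name is hypothetical, and discovery needs to point at your real cluster):

# elasticsearch.yml for a routing only node on an application server
cluster.name: mycluster   # must match your real cluster name
node.master: false        # never eligible for master election
node.data: false          # holds no shards, only routes requests
http.host: 127.0.0.1      # serve REST traffic to local applications only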

We recently implemented this at Inbot. We now have a cluster of three elasticsearch nodes and two application servers that each run two additional nodes that talk to those three. We use a mix of Java, Javascript, and Ruby components on our server, and doing this allows us to keep things simple. The elasticsearch nodes on the application servers have a comparatively small heap of only 1GB and typically consume few resources. We could probably reduce the heap size further to 512MB or even 256MB, since all these nodes do is pass requests and data between the cluster and the application server. However, we have plenty of memory and have so far had little need to tune this. Meanwhile, our elasticsearch cluster nodes run on three fast 32GB machines, and we allocate half of the memory to the heap and reserve the rest for file caching (as per the Elasticsearch recommendations). This works great, and it also simplifies application configuration, since you can simply configure all applications to talk to localhost and elasticsearch takes care of the cluster management.

Eventual Consistency Now! using Elasticsearch and Redis

Elasticsearch promises real-time search and nearly delivers on this promise. The problem with ‘nearly’ is that in interactive systems, it is actually unacceptable for user changes not to be reflected in query results. Eventual consistency is nice, but it also means occasionally being inconsistent, which is not so nice for users, or worse, product managers, who typically don’t understand these things and report them as bugs. At Inbot, this aspect of using Elasticsearch has been keeping us busy. It would be awfully convenient if it never returned stale data.

Mostly things actually work fine, but when a user updates something and then within a second navigates back to a list of stuff that includes what he/she just updated, chances are that it still shows the old version, because elasticsearch has not yet committed the change to the index. In any interactive system this is going to be an issue, and one way or another a solution is needed. The reality is that when it comes to search, elasticsearch is an eventually consistent cluster and not a proper transactional store that is immediately consistent after modifications. And while it is reasonably good at catching up in a second, that leaves plenty of room for inconsistencies to surface. While you can immediately get any changed document by id, it actually takes a bit of time for search results to get updated as well. Out of the box, the commit frequency is once every second, which is enough time for a user to click something, then something else, and see results that are inconsistent with the actions he/she just performed.
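
You can see this for yourself with a quick curl session against a vanilla install (the index and field names here are made up for illustration):

# index a document
curl -XPUT 'localhost:9200/myindex/doc/1' -d '{"name":"test"}'
# an immediate GET by id returns the new document just fine
curl 'localhost:9200/myindex/doc/1'
# but an immediate search will likely miss it until the next refresh (default: every 1s)
curl 'localhost:9200/myindex/_search?q=name:test'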

We started addressing this with a few client side hacks, like simply replacing list results with what we just edited via the API, updating local caches, etc. Writing such code is error prone and tedious. So we came up with a better solution: use Redis. The same DAO I described in my recent article on optimistic locking with elasticsearch also stores the id of any modified document in a short-lived data structure in Redis. Redis provides in-memory data structures such as lists, sets, and hash maps and comes with a ton of options. The nice thing about Redis is that it scales quite well for small things and has a very low latency API. So, it is quite cheap to use it for things like caching.

So, our idea was very simple: use Redis to keep track of recently changed documents and change any results that include these objects on the fly with the latest version of the object. The bit of Java code that we use to talk to Redis uses something called JedisPool. However, this should pretty much work in a similar way from other languages.

try(Jedis jedis = jedisPool.getResource()) {
  Transaction transaction = jedis.multi();
  transaction.lpush(key, value);
  transaction.ltrim(key, 0, capacity - 1); // trim to capacity right away so that we can pretend it is a circular list (zero-based indices, hence capacity - 1)
  transaction.expire(key, expireInSeconds); // don't keep the data forever
  transaction.exec();
}

This creates a circular list with a fixed length that expires after a few seconds. We use it to store the ids of any documents we modify for a particular index or belonging to a particular user. Using this, we can easily find out, when returning results from our API, whether we should replace some of the results with newer versions. Having the list expire after a few seconds gives elasticsearch enough time to catch up, so the list will stay short or won’t be there at all. Under continuous load, it will simply be trimmed to the latest ids that were added (capacity). So, it stays fast as well.

Each of our DAOs exposes an additional function that tells you which document ids have been recently modified. When returning results, we loop over the results and check the id against this list and swap in the latest version. Simple, easy to implement, and it solves most of the problem and more importantly, it solves it on the server and does not burden our API users or clients with this.
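
To make that concrete, here is a minimal sketch of what the read path might look like. The Document type and the keyFor and getLatestVersion helpers are hypothetical stand-ins for our actual DAO code:

// swap recently modified documents into a search result before returning it
public List<Document> freshenResults(List<Document> results, String index) {
    Set<String> recentIds;
    try (Jedis jedis = jedisPool.getResource()) {
        // the capped, expiring list written by the DAO; -1 means "to the end"
        recentIds = new HashSet<>(jedis.lrange(keyFor(index), 0, -1));
    }
    List<Document> freshened = new ArrayList<>(results.size());
    for (Document doc : results) {
        if (recentIds.contains(doc.getId())) {
            // a GET by id in elasticsearch is real-time, unlike search
            freshened.add(getLatestVersion(doc.getId()));
        } else {
            freshened.add(doc);
        }
    }
    return freshened;
}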

However, it doesn’t fix the problem completely. Your query may match the old document but not the new one, and replacing the old document with the new one in the results will make it appear that the changed document still matches the query. But it is a lot better than showing stale data to the user. Also, we’re not handling deletes currently, but that is trivially supported with a similar solution.

Optimistic locking for updates in Elasticsearch

In a post in 2012, I expanded a bit on the virtues of using elasticsearch as a document store, as opposed to using a separate database. To my surprise, I still get hits on that article on a daily basis. This indicates that there is some interest in using elasticsearch as described there. So, I’m planning to start blogging a bit more again after more or less being too busy with building Inbot to do so since last February.
