Here’s a preprint of our paper on selecting tick labels for axes, which will appear at this year’s InfoVis! The source code of the implementation will be made available before the conference. We’re hoping to get this implemented in a number of common plotting libraries. I already have a partial matplotlib version working, and I would also like to have one for ggplot. Other suggestions are welcome.

The abstract:

The non-data components of a visualization, such as axes and legends, can often be just as important as the data itself. They provide contextual information essential to interpreting the data. In this paper, we describe an automated system for choosing positions and labels for axis tick marks. Our system extends Wilkinson’s optimization-based labeling approach to create a more robust, full-featured axis labeler. We define an expanded space of axis labelings by automatically generating additional nice numbers as needed and by permitting the extreme labels to occur inside the data range. These changes provide flexibility in problematic cases, without degrading quality elsewhere. We also propose an additional optimization criterion, legibility, which allows us to simultaneously optimize over label formatting, font size, and orientation. To solve this revised optimization problem, we describe the optimization function and an efficient search algorithm. Finally, we compare our method to previous work using both quantitative and qualitative metrics. This paper is a good example of how ideas from automated graphic design can be applied to information visualization.
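To make the optimization-based idea in the abstract a little more concrete, here is a minimal Python sketch. It is not the algorithm from the paper: the step list `NICE`, the three scoring terms, their weights, and the function names are all illustrative assumptions, legibility is left out entirely, and the search is a brute-force enumeration rather than the efficient search the abstract mentions. It only shows the general shape: enumerate labelings built from nice step sizes, let the extreme labels fall just inside or just outside the data range, and keep the best-scoring candidate.

```python
import math

# Step mantissas in rough order of preference (an assumed ordering).
NICE = [1, 5, 2, 2.5, 4, 3]


def candidates(dmin, dmax):
    """Yield (q_index, ticks) for nice steps at a few powers of ten.

    Assumes dmax > dmin.
    """
    exponent = math.floor(math.log10(dmax - dmin))
    for e in (exponent - 1, exponent, exponent + 1):
        for qi, q in enumerate(NICE):
            step = q * 10.0 ** e
            lo = math.floor(dmin / step)
            hi = math.ceil(dmax / step)
            # Each extreme label may sit just outside or just inside the data.
            for first in (lo, lo + 1):
                for last in (hi - 1, hi):
                    if last > first:
                        ticks = [round(k * step, 10) for k in range(first, last + 1)]
                        yield qi, ticks


def score(qi, ticks, dmin, dmax, target):
    """Weighted sum of three toy criteria; the weights are arbitrary."""
    # Simplicity: earlier entries in NICE are preferred.
    simplicity = 1 - qi / (len(NICE) - 1)
    # Coverage: the end labels should sit close to the data extremes.
    span = dmax - dmin
    miss = (dmax - ticks[-1]) ** 2 + (dmin - ticks[0]) ** 2
    coverage = 1 - 0.5 * miss / (0.1 * span) ** 2
    # Density: the label count should be close to the requested count.
    density = 1 - abs(len(ticks) - target) / target
    return 0.25 * simplicity + 0.25 * coverage + 0.5 * density


def label_axis(dmin, dmax, target=5):
    """Return the tick positions of the best-scoring candidate."""
    best = max(candidates(dmin, dmax),
               key=lambda c: score(c[0], c[1], dmin, dmax, target))
    return best[1]


print(label_axis(0, 100))
print(label_axis(3.2, 87.5))
```

Changing the weights or the order of `NICE` changes which labeling wins, which is the main point of framing the problem as an optimization.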

**Update**: We’ve released a preliminary R package implementing the three labeling algorithms we compared in the paper. Feedback is appreciated. The final version should be released by InfoVis (in October).

**Hadley Wickham**: The R graphics don’t look quite right – they’re missing the standard 4% offset between the data and the axes.

I also wonder if you could make the ticks more aesthetically pleasing by making the space between the first/last tick and the axis more symmetric.

**Justin Talbot** (post author): You’re right that more white space would look better. This was an issue that I ran out of time and space to address correctly. Basically, it’s not sufficient to just add 4% whitespace to the resulting axis because the optimization function might have found a better solution had it known that it had more space to work in. On the other hand, you can’t just pad the white space up front, because you want the ending ticks to be close to the actual data extremes, not to the padded extremes. To address this, our coverage component should probably be split into two parts, one evaluating white space and the other evaluating closeness to the extrema. Good future work.
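As a rough illustration of that split, here is a hypothetical Python sketch with two separate terms. Nothing here comes from the paper or the R package: the function names, the quadratic penalties, the equal weights, and the use of 4% as the target margin (taken from the comment above) are all assumptions made for the example.

```python
def closeness(ticks, dmin, dmax):
    """Reward end ticks that sit near the actual data extremes."""
    span = dmax - dmin
    miss = (ticks[0] - dmin) ** 2 + (ticks[-1] - dmax) ** 2
    return 1 - 0.5 * miss / (0.1 * span) ** 2


def whitespace(axis_min, axis_max, dmin, dmax, pad=0.04):
    """Reward axis limits whose margins around the data are close to `pad`.

    Margins are measured as a fraction of the data range.
    """
    span = dmax - dmin
    lo_margin = (dmin - axis_min) / span
    hi_margin = (axis_max - dmax) / span
    err = (lo_margin - pad) ** 2 + (hi_margin - pad) ** 2
    return 1 - 0.5 * err / pad ** 2


def split_coverage(ticks, axis_min, axis_max, dmin, dmax,
                   w_close=0.5, w_space=0.5):
    """Combine the two terms; the equal weighting is arbitrary."""
    return (w_close * closeness(ticks, dmin, dmax)
            + w_space * whitespace(axis_min, axis_max, dmin, dmax))


ticks = [0, 25, 50, 75, 100]
# Same ticks, scored with a padded axis and with an unpadded one.
print(split_coverage(ticks, -4, 104, 0, 100))
print(split_coverage(ticks, 0, 100, 0, 100))
```

Because the axis limits are scored separately from the tick positions, the search could consider padded limits while still pulling the end ticks toward the data extremes rather than the padded ones.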

I’m not exactly sure what you mean by the space between the first/last tick and the axis. Do you mean the space between the first and last tick and the respective end of the axis *line*? If so, that would be a good change (another adjustment to the coverage criterion) and should be easy for plots with lots of labels. However, for plots with few labels, my guess is that this would introduce a lot of extra white space.

**MySchizoBuddy**: Was the final version released by InfoVis?

Where is the link?

**Justin Talbot** (post author): The current R version is available via CRAN or R-Forge.