

An Extension of Wilkinson’s Algorithm for Positioning Tick Labels on Axes

Here’s a preprint of our paper on selecting tick labels for axes, which will appear at this year’s InfoVis! Source code for the implementation will be made available before the conference. We’re hoping to get this implemented in a number of common plotting libraries. I already have a partial matplotlib version working, and I would also like to have one for ggplot. Other suggestions are welcome.

The abstract:

The non-data components of a visualization, such as axes and legends, can often be just as important as the data itself. They provide contextual information essential to interpreting the data. In this paper, we describe an automated system for choosing positions and labels for axis tick marks. Our system extends Wilkinson’s optimization-based labeling approach to create a more robust, full-featured axis labeler. We define an expanded space of axis labelings by automatically generating additional nice numbers as needed and by permitting the extreme labels to occur inside the data range. These changes provide flexibility in problematic cases, without degrading quality elsewhere. We also propose an additional optimization criterion, legibility, which allows us to simultaneously optimize over label formatting, font size, and orientation. To solve this revised optimization problem, we describe the optimization function and an efficient search algorithm. Finally, we compare our method to previous work using both quantitative and qualitative metrics. This paper is a good example of how ideas from automated graphic design can be applied to information visualization.
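To make "optimization-based labeling" concrete, here is a minimal Python sketch of a Wilkinson-style search. It is not the algorithm from the paper: it enumerates a hypothetical list of nice step multipliers (`NICE`), tick counts, and power-of-ten scales, scores every candidate on simplicity, coverage, and density using made-up weights (`WEIGHTS`), and keeps the best one. Unlike the extended method described in the abstract, it always requires the labels to cover the data and has no legibility term.

```python
import math

# Hypothetical candidate step multipliers, ordered from most to least preferred.
NICE = [1, 5, 2, 2.5, 4, 3]
# Assumed weights for (simplicity, coverage, density); not the paper's values.
WEIGHTS = (0.4, 0.3, 0.3)


def score(i, lo, hi, count, dmin, dmax, target):
    """Weighted score of one candidate labeling; higher is better."""
    # Simplicity: earlier entries in NICE are considered "nicer".
    simplicity = 1 - i / (len(NICE) - 1)
    # Coverage: penalize labels that extend well beyond the data range.
    span = dmax - dmin
    coverage = 1 - 0.5 * ((dmax - hi) ** 2 + (dmin - lo) ** 2) / (0.1 * span) ** 2
    # Density: prefer tick counts close to the target count.
    density = 2 - max(count / target, target / count)
    ws, wc, wd = WEIGHTS
    return ws * simplicity + wc * coverage + wd * density


def label_axis(dmin, dmax, target=5):
    """Search labelings whose ticks are multiples of a nice step and cover [dmin, dmax]."""
    if not dmax > dmin:
        raise ValueError("need dmax > dmin")
    best, best_score = None, -math.inf
    for count in range(2, 2 * target + 1):
        for i, q in enumerate(NICE):
            # Largest power of ten whose step is at most span / (count - 1);
            # the two next larger powers are tried as well.
            k = math.floor(math.log10((dmax - dmin) / (count - 1) / q))
            for exp in range(k, k + 3):
                step = q * 10.0 ** exp
                # Ticks are j*step, ..., (j + count - 1)*step; j is chosen so they cover the data.
                j_lo = math.ceil(dmax / step) - (count - 1)
                j_hi = math.floor(dmin / step)
                for j in range(j_lo, j_hi + 1):
                    ticks = [round((j + t) * step, 10) for t in range(count)]
                    s = score(i, ticks[0], ticks[-1], count, dmin, dmax, target)
                    if s > best_score:
                        best, best_score = ticks, s
    return best


print(label_axis(0.3, 9.7))
```

The search is exhaustive over a small space, which is fine for a sketch; the paper describes an efficient search algorithm for the larger space it optimizes over. Changing `NICE` or `WEIGHTS` shifts which labeling wins.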

Posted in visualization.


Masten Xombie rocket engine restart in flight

Very cool video. (HT Instapundit)

Posted in Uncategorized.


Yosemite

I spent last week in Yosemite with my family on a vacation from research. We recently bought a DSLR and I enjoyed playing with it in the park.

Half Dome under a cloud. The weather was changing quite dramatically over the week and I got pictures under a number of different atmospheric conditions.

This is Royal Arch Fall. I hiked to this fall in the morning before my family got up. It runs down over a long, curving rock. From far away it’s quite beautiful, but from the base you can’t see much of it, since the fall curves away out of sight.

On our way out of the park on the 120 we saw our third bear of the trip, and I finally got a picture (click to make it large enough to see). The surroundings here were much more picturesque than where we saw the other two bears.

This is cheating somewhat; we passed this field outside of the park, but I loved the combed look of the rows of hay. (I really like the dramatic aspect ratio of the picture. I cropped it this way to get rid of an ugly fence on the bottom and power lines on the top, but the result is quite pleasing I think.)

Posted in photos.


Happy Hollow

Last week I went with the kids to Happy Hollow Park and Zoo in San Jose. One monkey in particular was very photogenic. We also had fun riding the “roller coaster” and waiting in line for 45 minutes for the dragon ride. The ride goes past a number of small Disney-knockoff scenes and through a tunnel, which I expected to contain animatronics, but it was only a tunnel, plastered in green paint with strange colored egg-like things protruding from the walls. Not sure what they were thinking when they built that. Next time we’ll visit on a weekday.

Posted in photos.


Halloween

Our pumpkins this year.

Pumpkins carved as the Statue of Liberty and Starry Night.

Posted in photos.


Gallup-Healthways Well-Being

Will Wilkinson points to the Gallup-Healthways Well-Being Index, which purports to measure overall health (“not only the absence of infirmity and disease, but also a state of physical, mental, and social well-being”) at the congressional district level for the United States. Will hypothesizes that Utah’s high score may be due to “a skoche of culture-driven upward inflation” (Mormons overstating their happiness).

Fortunately, the components of the Well-Being Index are reported as well. Two components, Life Evaluation and Emotional Health, measure self-reported happiness. If Will’s hypothesis were correct, we would expect these components to account for a disproportionate share of Utah’s overall index. In the scatterplots to the right, the three Utah congressional districts are highlighted in orange. Contrary to Will’s hypothesis, Utah is above average only in the Work Quality component. On all the others, including Life Evaluation and Emotional Health, Utah is average or below average.
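One way to check a hypothesis like Will’s numerically is to standardize each component across all districts and see where the highlighted districts fall. The Python sketch below assumes the data has been loaded into a dictionary mapping each district to its component scores; the district names and numbers in the example are made up for illustration, not taken from the Gallup-Healthways data.

```python
from statistics import mean, pstdev


def component_z_scores(rows, components, highlight):
    """Return {component: {district: z-score}} for the highlighted districts.

    rows maps each district to a dict of component scores (an assumed layout);
    each component is standardized across *all* districts in rows.
    """
    result = {}
    for comp in components:
        values = [scores[comp] for scores in rows.values()]
        mu, sigma = mean(values), pstdev(values)
        # A component with no variation gives every district a z-score of 0.
        result[comp] = {
            d: (rows[d][comp] - mu) / sigma if sigma else 0.0 for d in highlight
        }
    return result


# Hypothetical toy data: three districts, two components.
rows = {
    "District A": {"Work Quality": 70, "Emotional Health": 80},
    "District B": {"Work Quality": 60, "Emotional Health": 80},
    "District C": {"Work Quality": 50, "Emotional Health": 80},
}
z = component_z_scores(rows, ["Work Quality", "Emotional Health"], ["District A"])
print(z)
```

Under Will’s hypothesis, the Utah districts would show their largest positive z-scores on Life Evaluation and Emotional Health rather than on the other components.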

The wellness data is available in Excel, since I couldn’t figure out how to get it from the Gallup-Healthways site. The visualization was done in Tableau.

Posted in visualization.


our day, visualizing the American Time Use Survey

For Jeff’s class, I created an interactive visualization of the American Time Use Survey. I got sick last week, so I didn’t have much time to work on it. As a result, it turned out somewhat derivative of the Baby Name Voyager and other stacked area plots.

That said, I think it lets you find some rather interesting patterns in how people use their time. Most noticeable is the extra hour or so that people sleep in on the weekends.
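A stacked area plot draws each activity as a band whose bottom edge is the running total of the activities stacked below it. Here is a minimal Python sketch of that layout step, assuming each activity is given as a list of minutes per time slot; the activity names and numbers are hypothetical, not from the survey.

```python
def stack_layers(series):
    """Turn {activity: [value per time slot]} into {activity: [(bottom, top), ...]}.

    Activities are stacked in the dict's insertion order, so the first one
    sits on the baseline and the last one's top edge is the slot total.
    """
    names = list(series)
    n = len(series[names[0]])
    baseline = [0.0] * n
    layers = {}
    for name in names:
        values = series[name]
        if len(values) != n:
            raise ValueError(f"{name} has {len(values)} slots, expected {n}")
        tops = [b + v for b, v in zip(baseline, values)]
        layers[name] = list(zip(baseline, tops))
        baseline = tops  # the next band starts where this one ends
    return layers


# Hypothetical minutes spent per activity in two 60-minute slots.
series = {"sleep": [50, 10], "work": [0, 30], "other": [10, 20]}
print(stack_layers(series))
```

Stacking order matters for readability: the band on the baseline has a straight bottom edge, so whichever activity you put there is the easiest to compare across time.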


Posted in visualization.


Fundamental Statistical Concepts in Presenting Data: Principles for Constructing Better Graphics

Via Andrew Gelman, I came across this long paper on statistical visualization by Rafe Donahue. I haven’t read it carefully yet, but I enjoyed the examples of visualizations from his children’s schoolwork.

He criticizes boxplots, which sparked a discussion in the comments on Andrew’s post. I read Tukey’s EDA recently and was surprised by how much of Tukey’s work focused on visualization by hand. The boxplot made sense when you had to compute and plot everything manually: using only five numbers, it conveyed much of what was important about the data. Now that plotting is cheap, however, it makes much more sense to just plot all the data.

In general, summaries, visual or otherwise, which assume a single mode, or worse normality, should be treated with a great deal of caution.
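The five numbers behind a boxplot are easy to compute, which is also why they can hide so much. Below is a minimal Python sketch of Tukey’s five-number summary (hinges taken as the medians of the lower and upper halves, each including the median when n is odd), applied to two made-up datasets: one spread evenly, one bunched around two values. Both produce the same summary, so their boxplots would be identical even though the second dataset is clearly bimodal.

```python
from statistics import median


def five_number_summary(data):
    """Tukey's five numbers: minimum, lower hinge, median, upper hinge, maximum.

    The hinges are the medians of the lower and upper halves of the sorted data;
    when n is odd, the median itself belongs to both halves.
    """
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one value")
    half = (n + 1) // 2
    return (xs[0], median(xs[:half]), median(xs), median(xs[n - half:]), xs[-1])


spread = [0, 1, 2, 3, 4, 5, 6, 7, 8]   # evenly spread values
clumped = [0, 2, 2, 2, 4, 6, 6, 6, 8]  # values bunched around 2 and 6
print(five_number_summary(spread))   # (0, 2, 4, 6, 8)
print(five_number_summary(clumped))  # (0, 2, 4, 6, 8)
```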

Posted in Uncategorized.


R packages

In class today we covered R packages. A quick attempt to create a package on Windows revealed that the Windows version of R does not come with the necessary build tools. I tried again on a Mac and ran into problems: package.skeleton failed to create the package directories because .find.package couldn’t find my newly created package. After a little playing around, I found that package names cannot contain an ‘_’ (at least on a Mac).
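The underscore restriction is not specific to the Mac: “Writing R Extensions” says a package name should contain only ASCII letters, numbers, and dots, have at least two characters, start with a letter, and not end in a dot. A small Python check of that rule can catch a bad name before package.skeleton does; `is_valid_r_package_name` is a hypothetical helper written for this post, not part of R’s tooling.

```python
import re

# Package-name rule from "Writing R Extensions": ASCII letters, digits and '.',
# at least two characters, starting with a letter and not ending in '.'.
_R_PACKAGE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]")


def is_valid_r_package_name(name):
    """Return True if name satisfies the rule above."""
    return _R_PACKAGE_NAME.fullmatch(name) is not None


for candidate in ["my_package", "mypackage", "my.package", "x", "pkg."]:
    print(candidate, is_valid_r_package_name(candidate))
```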

The R CMD check command is very nice. It extends the idea of static code checking to also check documentation, the install process, example code, etc.

Posted in Uncategorized.


Exploratory Model Analysis

I’ve recently come across a few papers on Exploratory Model Analysis (EMA). I wasn’t familiar with this work when writing the EnsembleMatrix paper, but the two are very closely related. I was working with an ML researcher while designing the EnsembleMatrix visual interface, so I did quite a bit of looking around in the ML literature. EMA is emerging from statistics, so it didn’t appear in my search.

Here are a few pointers:
Parallel coordinates for exploratory modelling analysis (Antony Unwin)
Exploratory modelling analysis: visualizing the value of variables (Antony Unwin)
Meifly: Models explored interactively (Hadley Wickham)

I tried installing meifly, but it appears to depend on ggplot, which is no longer available now that ggplot2 has been released.

Exploratory data analysis for complex models (with discussion) (Andrew Gelman, 2004)
Discussion of this paper (Andreas Buja)
Rejoinder to discussion (Andrew Gelman)

This discussion is quite inspiring. The idea that visualizations can be thought of as statistical tests was eye-opening, and I think it suggests quite a few directions for research in InfoVis. However, there hasn’t been much work in this area in the 4 years since the paper came out. Why? Perhaps the artificial division between InfoVis and statistical visualization has kept it from being noticed. Perhaps it’s just very hard.
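One concrete reading of “a visualization as a statistical test” is a permutation lineup: hide the plot of the real data among m plots of null data and ask whether a viewer can pick it out. If the null hypothesis is true, the real plot is just one of m + 1 exchangeable plots, so picking it by chance has probability 1/(m + 1), which is 0.05 for m = 19. The Python sketch below is an illustration, not the procedure from Gelman’s paper or Buja’s discussion: it builds null datasets by shuffling y against x, and `simulated_viewer` is a hypothetical numeric stand-in for a person scanning scatterplots.

```python
import random


def abs_corr(xs, ys):
    """Absolute Pearson correlation; 0.0 if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def make_lineup(xs, ys, m=19, seed=0):
    """Hide the real (xs, ys) among m null panels made by shuffling ys.

    Returns (index of the real panel, list of m + 1 panels).
    """
    rng = random.Random(seed)
    panels = []
    for _ in range(m):
        null_ys = list(ys)
        rng.shuffle(null_ys)  # breaks any x-y association
        panels.append((xs, null_ys))
    real_index = rng.randrange(m + 1)
    panels.insert(real_index, (xs, list(ys)))
    return real_index, panels


def simulated_viewer(panels):
    """Hypothetical stand-in for a person: pick the panel with the strongest linear trend."""
    return max(range(len(panels)), key=lambda i: abs_corr(*panels[i]))


xs = list(range(20))
ys = [3 * x + (x % 2) for x in xs]  # made-up data with a strong trend
real_index, panels = make_lineup(xs, ys)
# True means the real panel was picked out, i.e. reject the null at level 1/(m + 1).
print(simulated_viewer(panels) == real_index)
```

With a real viewer in place of `simulated_viewer`, the “test statistic” is whatever pattern the person notices, which is exactly what makes the idea both appealing and hard to study.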

Posted in Uncategorized.