Category: visualization

Fundamental Statistical Concepts in Presenting Data: Principles for Constructing Better Graphics

February 14th, 2009 — 2:11am

Via Andrew Gelman I came across this long paper (updated version) on statistical visualization by Rafe Donahue. I haven’t read it through carefully yet, but I enjoyed the examples of visualizations from his children’s schoolwork.

He criticizes boxplots, which prompted a discussion in the comments on Andrew’s post. I read Tukey’s EDA recently and was surprised to see how much of Tukey’s work was focused on visualization by hand. The boxplot was a sensible visualization when you had to compute and plot manually. Using only 5 numbers, it portrayed much of what was important about the data. However, now that plotting is cheap, it makes a lot more sense to just plot all the data.

In general, summaries, visual or otherwise, which assume a single mode, or worse normality, should be treated with a great deal of caution.
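A minimal sketch in Python of how a five-number summary can hide the shape of the data. The two samples are made up for illustration, and the quartiles come from `statistics.quantiles` with the inclusive method, which is only one of several quartile conventions (Tukey’s hinges can differ slightly on other samples).

```python
from collections import Counter
from statistics import quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the five numbers a boxplot draws."""
    ordered = sorted(values)
    # "inclusive" treats the sample minimum and maximum as the 0th and 100th percentiles.
    q1, median, q3 = quantiles(ordered, n=4, method="inclusive")
    return (ordered[0], q1, median, q3, ordered[-1])


def dot_plot(values):
    """Return one text line per integer value, with one '*' per observation."""
    counts = Counter(values)
    return [f"{v:>2} " + "*" * counts[v] for v in range(min(values), max(values) + 1)]


# Hypothetical samples: one clump in the middle versus two separate clumps.
unimodal = [0, 2, 3, 3, 4, 5, 5, 5, 6, 7, 7, 8, 10]
bimodal = [0, 3, 3, 3, 3, 3, 5, 7, 7, 7, 7, 7, 10]

if __name__ == "__main__":
    for name, sample in (("unimodal", unimodal), ("bimodal", bimodal)):
        print(name, five_number_summary(sample))
        print("\n".join(dot_plot(sample)))
```

Both samples produce the same five numbers, so their boxplots would be identical, while the dot plots show one peak in the first sample and two in the second.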

1 comment » | visualization

Exploratory Model Analysis

January 21st, 2009 — 4:17am

I’ve recently come across a few papers on Exploratory Model Analysis. I wasn’t familiar with this work when writing the EnsembleMatrix paper, but they are very closely related. I was working with an ML researcher while designing the EnsembleMatrix visual interface and so did quite a bit of looking around in the ML literature. EMA is emerging in statistics and so didn’t appear in my search.

Here are a few pointers:
Parallel coordinates for exploratory modelling analysis (Antony Unwin)
Exploratory modelling analysis: visualizing the value of variables (Antony Unwin)
Meifly: Models explored interactively (Hadley Wickham)

I tried installing meifly, but it appears to depend on ggplot, which is no longer available now that ggplot2 has been released.

[2004] Exploratory data analysis for complex models (with discussion) (Andrew Gelman)
Discussion of this paper by Andreas Buja (Andreas Buja)
Rejoinder to discussion (Andrew Gelman)

This discussion is quite inspiring. The idea that visualizations can be thought of as statistical tests was quite eye-opening. I think that this suggests quite a few directions for research in InfoVis. However, there hasn’t been much work in this area in the 4 years since the paper came out. Why? Perhaps the artificial division between InfoVis and statistical visualization has kept it from being noticed. Perhaps it’s just very hard.

1 comment » | visualization

Visualizing Obama’s voter contact operation

January 17th, 2009 — 4:59pm

Mark Blumenthal writes about new voter turnout information from the 2008 election. The following graph shows the level of voter contact from the Kerry and Obama campaigns (red=low to green=high). Obama had a broader voter contact operation spreading resources more effectively across those with a high probability of voting and voting Democrat.



  • swap the direction of the vertical axis to put high turnout on top
  • add scale numbers: how many contacts? How high is high turnout?
  • since the number of contacts is nonnegative, I would use a sequential (one-sided) color scale (running from white, 0, to green) rather than a diverging scale.
  • how many people fall into each bucket? An additional grayscale plot showing the distribution of people would be helpful. Or preferably, if possible, the axes could be transformed to make the distribution of individuals roughly uniform across the plot.
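Two of the suggestions above can be sketched in Python. This is only an illustration under stated assumptions: `sequential_green` and `to_uniform` are hypothetical helper names, the green end point `(0, 128, 0)` is an arbitrary choice, linear interpolation in RGB is not perceptually uniform, and the empirical-CDF (rank) transform is just one way to make individuals roughly uniform along an axis.

```python
from bisect import bisect_right


def sequential_green(count, max_count):
    """Map a nonnegative count to an RGB triple running from white (0) to green (max_count)."""
    if max_count <= 0:
        raise ValueError("max_count must be positive")
    # Clamp to [0, 1] so out-of-range counts still get a valid color.
    t = min(max(count / max_count, 0.0), 1.0)
    white, green = (255, 255, 255), (0, 128, 0)  # assumed end points
    return tuple(round(w + t * (g - w)) for w, g in zip(white, green))


def to_uniform(values):
    """Return a function mapping x to the fraction of `values` that are <= x.

    Plotting this fraction instead of x spreads the observed individuals
    roughly evenly along the axis.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one value")

    def transform(x):
        return bisect_right(ordered, x) / n

    return transform


if __name__ == "__main__":
    # Hypothetical turnout probabilities for four voters.
    turnout = [0.1, 0.2, 0.9, 0.95]
    axis = to_uniform(turnout)
    print([axis(p) for p in turnout])
    print(sequential_green(0, 10), sequential_green(10, 10))
```

Because zero contacts maps to white, an empty cell and a cell with no contacts look the same, which is one reason the separate plot of how many people fall into each bucket is still useful.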

Comment » | visualization

First time designing a visualization

January 15th, 2009 — 10:34pm

The CHANCE contest submission below was my first time creating a complete static visualization that tries to tell a story. It’s sort of sad that I’m in my third year as a Ph.D. student studying visualization and I hadn’t done that yet.

I found it quite satisfying. Back in the olden days when I worked in rendering there was an immense amount of satisfaction that came from getting a rendering right–both visually and algorithmically. In visualization I hadn’t felt that yet, since all of my projects so far have been rather flaky research prototypes.

Over at FlowingData, Nathan is running a biweekly visualization competition/discussion. The first installment uses US poverty statistics. This’ll be a good chance for me to get more design experience.

Comment » | visualization

My submission to the CHANCE contest

January 15th, 2009 — 10:13pm


Contest description is here:

Comment » | visualization


January 25th, 2007 — 1:30am

I’ve been meaning to get back to TreeMaps for a while. The recently unveiled Many-Eyes includes a TreeMap visualization. Martin Wattenberg (who did some research on TreeMap layouts) provides a typical example:

My main complaint with TreeMaps is their ugliness. Every TreeMap I have seen, including this one, looks disorganized. Basically, we can only see the top level of the hierarchy, indicated with the strong dark lines and labeled. All other levels disappear into the patchwork mess. So one of the “strengths” of TreeMaps, the ability to directly view the hierarchy, is effectively neutered.

The layout also ignores good graphic design. Elements are not aligned and they do not visually cluster in meaningful ways. This is in contrast to indented lists, another common way to show hierarchical data, which use alignment and clustering very effectively to communicate the organization of the hierarchy.

More effective use of whitespace could dramatically improve the appearance of TreeMaps, but so few people use it well. Here’s an example that does. Also notice how the alignment makes the diagram look very organized.

Next time: the travesty of Cushion TreeMaps.

1 comment » | visualization

Critiquing Treemaps

January 5th, 2007 — 11:52am

TreeMaps are a popular hierarchical visualization technique developed by Ben Shneiderman. Despite the emphasis they have received in visualization literature over the last decade, TreeMaps remain a very limited tool. In the next few days, I want to explore why this is true.

Comment » | visualization

Google Maps vs. Microsoft Live Maps

December 14th, 2006 — 7:32pm

Placing labels on maps is a hard problem. Although quite a bit of work has been done on how to place labels so that they don’t overlap, not much work has been done on how to place labels in an aesthetically pleasing manner. In an effort to understand what’s in production already, I decided to make a brief comparison of Google Maps and Microsoft’s Live Maps.

The two maps compared are for the Redmond, Washington area where I lived for the past year. You can compare them yourself by opening Google’s and Microsoft’s versions of the same area. I was looking at them on a relatively large monitor, so you may need to scroll around a little bit to see the same things.

  1. Initial Impressions. Microsoft uses a subdued color scheme which reminds me of Eduard Imhof’s maps. The parks could be slightly greener, but otherwise it’s very nice. Google’s color scheme is more saturated and, to my eye, not as clean. On the other hand, Microsoft’s map is noticeably blurry, apparently caused by too much anti-aliasing. The small, unlabeled roads in Google are too strong; they distract from the rest of the map. The unlabeled roads in Microsoft are lighter, which looks better. However, some of their labeled roads are also very light! It’s often unclear which street a label is associated with. As a general rule: if the street is labeled it should be dark; if not, it should be very light or not rendered at all.
  2. Label Selection. One of the most noticeable differences is the types of labels shown at this zoom level. Microsoft labels a lot of streets, but only one park out of dozens. Google’s map labels almost all the parks and only labels the largest streets. Park names are useful as landmarks, but they can’t be nearly as useful as street names. Why waste screen space on them?
  3. Label Clutter. Since Microsoft shows so many street labels, it can suffer from unsightly cluttering, although labels never actually overlap. Notice how the meeting of “NE 85th St” and “116th Ave NE” at right angles looks awkward. Also the coincidence of the top of the ‘E’ in “124th Ave NE” with the edge of the freeway is distracting. Also notice the lack of any alignment and the seemingly arbitrary top-to-bottom vs. bottom-to-top reading of the vertical text.
  4. Label Layout. Microsoft seems inconsistent in placing street labels adjacent to or directly over the road. Google always places the text over the street.
  5. Text Layout. In both maps, the street label text follows the path of the street. This leads to some very strange text, especially on Microsoft’s map. “Sammamish” is really one word. An obvious solution would be to smooth the path so this doesn’t happen. The rotation and baseline of a label’s text should change smoothly over the label. A related problem is that of character spacing. Notice that the ‘i’ has almost disappeared from the word “Sammamish”.
  6. Label Contrast. Both maps render a background behind the text to make the label stand out from the map. Google’s is a nearly opaque white border (or yellow, for main roads). Microsoft uses a much more subtle mask. Visually, I prefer Microsoft’s approach. However, I did notice that in many places Microsoft’s mask would obscure the road, making the label appear unassociated with anything on the map. Google neatly avoids this problem by making the text border the same color as the road (yellow); this provides more visual continuity to the road even when obscured by the label.
  7. Labeling Areas. Labeling of areas doesn’t work very well as you zoom into the area. For example, city labels remain even after you have zoomed so that the city fills up almost the entire screen. It appears that the city label is identifying a specific intersection, rather than an entire area. I’d be interested to see how traditional maps deal with this problem.
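The path smoothing suggested under Text Layout can be sketched in Python. This is a minimal illustration, not what either map product does: it uses Chaikin’s corner cutting as a stand-in smoothing method, and the street coordinates and the `max_turn` helper are hypothetical.

```python
import math


def chaikin(points, iterations=2):
    """Smooth an open polyline with Chaikin's corner cutting, keeping both end points."""
    points = list(points)
    for _ in range(iterations):
        if len(points) < 3:
            break  # nothing to smooth on a single segment
        smoothed = [points[0]]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            # Replace each segment by the points 1/4 and 3/4 of the way along it.
            smoothed.append((0.75 * x0 + 0.25 * x1, 0.75 * y0 + 0.25 * y1))
            smoothed.append((0.25 * x0 + 0.75 * x1, 0.25 * y0 + 0.75 * y1))
        smoothed.append(points[-1])
        points = smoothed
    return points


def max_turn(points):
    """Largest change in heading, in degrees, between consecutive segments."""
    headings = [
        math.atan2(y1 - y0, x1 - x0)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]
    turns = []
    for a, b in zip(headings, headings[1:]):
        d = abs(b - a)
        turns.append(min(d, 2 * math.pi - d))  # handle wrap-around at +/- pi
    return math.degrees(max(turns, default=0.0))


if __name__ == "__main__":
    # Hypothetical street centerline with one sharp right-angle bend.
    street = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0)]
    print(max_turn(street))
    print(max_turn(chaikin(street)))
```

Each pass splits a sharp bend into several gentler ones, so a character baseline that follows the smoothed path rotates gradually instead of jumping at a vertex. The cost is that the smoothed path cuts slightly inside the real corner, so the label no longer sits exactly on the road there.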

Comment » | visualization
