

An Empirical Model of Slope Ratio Comparisons (Corrected)

I’ve posted a corrected version* of our InfoVis paper from last year: An Empirical Model of Slope Ratio Comparisons. In preparing the published version of the paper, we changed the parameterization of our space of slope comparisons to simplify the explanation of what we did. In doing so, I made a simple math error that resulted in us using the wrong mid-angles in our analysis. To see the difference, compare Figure 2 in the original and updated versions. The impact of the error is minor and doesn’t change our arguments or conclusions, but it required regenerating our plots and slightly changed our model parameter estimates.

I’ve also posted R code which will reproduce our (corrected) analysis and figures. Along with the stimuli we released earlier, this should allow anyone to reproduce our analysis.

(*The irony of having to correct our paper which itself attempts to correct Cleveland’s earlier paper was not lost on me.)

Posted in visualization.


CS148

This summer I taught CS 148, the introduction to computer graphics course at Stanford. This was the first time I taught the class and, despite being incredibly time consuming, I really enjoyed it. I created all my own slides and new projects for the course.

The final project was to combine a path traced object with a real photograph—creating a “special effects” style image. I thought this was quite successful. The winning image was created by Zhan Fam Quek:

This was created using a path tracer written by the students, using an HDR environment map and an HDR background image captured by the students.

More images from the final project can be found in the course gallery. Slides and assignments can be found on the course webpage.

Posted in rendering.


LaTeX Hyphenation Mist-akes*

One of the great things about TeX is that it will automatically hyphenate words when doing so leads to better overall line breaks in a paragraph. This is a somewhat difficult task because, when hyphenating words, it is not acceptable to insert the hyphen between just any pair of letters. Some hyphenations, such as “new-spaper”, can lead the reader “down a garden path.” That is, when reading the end of the line (“new-”), the reader guesses incorrectly that “new” is a complete stem within a compound word and is then completely confused when confronted with the unlikely terminating word “spaper”. A similar problem occurs when the hyphenation causes the reader to pronounce the head portion incorrectly, leading them to read nonsensical words (I’ll give an example in a moment).

To address this problem, some automatic text layout systems rely on dictionaries in which acceptable hyphenation points have been marked by a human. While generally correct, such dictionary-based approaches require a very large data file to store the dictionary and fail when given new words. An alternate approach, taken in TeX, is to summarize hyphenation points into a small set of patterns which can then be applied to any word. The TeX method was described in Franklin Liang’s thesis, Word Hy-phen-a-tion by Com-put-er. Liang’s thesis claims that his pattern-based approach finds about 90% of the human-marked hyphenation points and essentially no incorrect ones.
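The pattern idea can be sketched compactly. In Liang’s scheme, digits interleave with letters in each pattern; every pattern that matches a substring of the word votes with its digits, the highest digit at each inter-letter position wins, and an odd final value permits a hyphen while an even one forbids it. A minimal Python sketch (the patterns used below are illustrative, in the style of the thesis, not TeX’s actual pattern set):

```python
def parse_pattern(pat):
    """Split a Liang-style pattern like 'hy3ph' into its letters and the
    digit weights between them (weights[j] applies just before letters[j])."""
    letters, weights = [], [0]
    for ch in pat:
        if ch.isdigit():
            weights[-1] = int(ch)
        else:
            letters.append(ch)
            weights.append(0)
    return "".join(letters), weights

def hyphenate(word, patterns, min_left=2, min_right=2):
    """Apply every matching pattern, keep the maximum weight at each
    inter-letter position, and hyphenate where that maximum is odd."""
    w = "." + word.lower() + "."          # '.' marks the word boundaries
    scores = [0] * (len(w) + 1)
    for pat in patterns:
        letters, weights = parse_pattern(pat)
        for i in range(len(w) - len(letters) + 1):
            if w[i:i + len(letters)] == letters:
                for j, v in enumerate(weights):
                    scores[i + j] = max(scores[i + j], v)
    pieces, start = [], 0
    for a in range(min_left, len(word) - min_right + 1):
        if scores[a + 1] % 2 == 1:        # odd value = hyphen allowed here
            pieces.append(word[start:a])
            start = a
    pieces.append(word[start:])
    return "-".join(pieces)
```

Running this on “hyphenation” with the four toy patterns “hy3ph”, “hen5at”, “a1tio”, and “2io” produces “hy-phen-a-tion”, the title example from the thesis.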

Unfortunately, essentially no incorrect ones is not the same as no incorrect ones. In the last two papers I’ve written, I’ve come across words that TeX’s method failed rather dramatically on. Fortunately, LaTeX provides an easy way to override the automatic hyphen selection through the \hyphenation{} command.

\hyphenation{white-space} fixes TeX’s “whites-pace”, a somewhat racist rendering. Note that this incorrect hyphenation is the opposite of the “new-spaper” example, which came from Liang’s thesis, highlighting the problems facing a purely pattern-based approach.

\hyphenation{analy-sis} fixes TeX’s “anal-ysis” which leads to a rather infelicitous mispronunciation of the first part of the word.
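Such exceptions can be collected once in the document preamble; \hyphenation accepts a space-separated list of words:

```latex
% Hyphenation exceptions for this document; TeX will only break
% these words at the marked points.
\hyphenation{white-space analy-sis}
```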

Despite doing a good job most of the time, TeX’s automatic hyphenation can and does go awry. Keep your eyes open when proofreading!

*I’m pretty sure TeX gets the hyphenation of “mistakes” correct.

Posted in latex.


SIGGRAPH course on Importance Sampling

I just noticed that my Master’s work on Resampled Importance Sampling was included in a SIGGRAPH course last year.

In standard importance sampling you must draw samples from a distribution which can be sampled directly and which has a known normalization factor (so that the area under the pdf is 1). This is very limiting in some Monte Carlo situations, such as sampling the incoming light direction in a path tracer. The function we want to sample from is the product of the BRDF and the incident light field. However, we can typically only sample from one component or the other, not both. In RIS, we create a sampled, discrete approximation to the desired distribution and then draw our samples from that. Since the approximation is discrete, we can easily normalize it and draw samples from it. In my thesis, I showed that, when weighted correctly, this results in an unbiased estimate.
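The core of RIS fits in a few lines. Here is a minimal 1D sketch in Python, not the path tracer from the thesis: g_hat plays the role of the cheap approximation (BRDF times environment light), f is the full integrand including the expensive term (visibility), and the factor sum(ws)/m is the normalization that keeps the estimate unbiased. All names and the toy setup here are mine:

```python
import random

def ris_estimate(f, g_hat, m=16):
    """One resampled-importance-sampling estimate of the integral of f
    over [0, 1]. g_hat is a cheap, unnormalized approximation to f."""
    xs = [random.random() for _ in range(m)]     # uniform proposal, pdf = 1
    ws = [g_hat(x) for x in xs]                  # candidate weights g_hat / pdf
    y = random.choices(xs, weights=ws, k=1)[0]   # resample from the discrete approx
    # f/g_hat corrects for the approximation; sum(ws)/m normalizes it
    return (f(y) / g_hat(y)) * (sum(ws) / m)
```

Averaging many such estimates converges to the integral of f over [0, 1]; note that f, the expensive function, is evaluated only once per estimate, at the resampled point.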

RIS will beat IS when it is substantially cheaper to create the approximate distribution than it is to evaluate the true function. In the path tracing context, I did this by evaluating the BRDF and environment map lighting for the approximate distribution, but not the visibility (the ray tracing step). With this approach, the RIS image (on the right) has substantially lower variance than the IS image (left).

IS vs. RIS

As long as the visibility test accounts for a substantial fraction of the execution time, RIS will beat IS. However, this wasn’t particularly true when I wrote the thesis and is probably less true now. In my thesis, I concluded that, for current rendering tasks, RIS wasn’t a big win since the difference in evaluation cost between the approximate samples and the actual samples wasn’t very high. However, I think it could be quite useful in some MC applications where a relatively good and cheap discrete approximation can be found.

Posted in rendering.


Arc Length-based Aspect Ratio Selection

Here’s a preprint of our paper on aspect ratio selection, which will appear in InfoVis 2011. In it we propose a new criterion for banking data plots, building on previous ideas from Bill Cleveland and from Jeff Heer and Maneesh Agrawala.

We frame the aspect ratio selection problem as one of minimizing the length of the data curve while keeping the area of the plot constant. This leads to a method that is substantially more robust than previous approaches. We’re also able to demonstrate empirically that the resulting aspect ratios are a compromise between those suggested by previous methods. As shown below, the arc length method can effectively bank both standard line charts (in this case a loess regression line) as well as contour charts.

Arc-length banking example

Perhaps the most surprising result is that good aspect ratios can be selected without explicit reference to the slopes or orientations of the line segments within the plot.
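The criterion itself is easy to prototype. A minimal Python sketch (mine, not the paper’s implementation): normalize the data to the unit square, then grid-search for the aspect ratio whose unit-area rendering minimizes the polyline’s arc length:

```python
import math

def bank_by_arc_length(xs, ys, lo=0.05, hi=20.0, steps=400):
    """Pick the aspect ratio (width/height) that minimizes the arc length
    of the data polyline while holding the plot area fixed at 1."""
    # normalize the data to the unit square
    xr = (max(xs) - min(xs)) or 1.0
    yr = (max(ys) - min(ys)) or 1.0
    nx = [(x - min(xs)) / xr for x in xs]
    ny = [(y - min(ys)) / yr for y in ys]

    def length(alpha):
        # at area 1, width = sqrt(alpha) and height = 1/sqrt(alpha)
        sx, sy = math.sqrt(alpha), 1.0 / math.sqrt(alpha)
        return sum(math.hypot(sx * (nx[i + 1] - nx[i]), sy * (ny[i + 1] - ny[i]))
                   for i in range(len(nx) - 1))

    # simple log-spaced grid search over candidate aspect ratios
    candidates = [lo * (hi / lo) ** (i / (steps - 1)) for i in range(steps)]
    return min(candidates, key=length)
```

For a single diagonal segment this recovers an aspect ratio near 1, i.e. a 45° slope, consistent with the classic banking-to-45° intuition.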

Posted in visualization.


C# code for labeling paper

I’ve finally had time to pull the labeling algorithm out of my much larger visualization package. It’s now up on github: https://github.com/jtalbot/Labeling. This implements all parts of the labeling paper, including the formatting variations.

Let me know if you run into any problems with it or have any suggestions for improvement.

Posted in visualization.


R labeling package released on CRAN

Version 0.1 of the labeling package has been released on CRAN.

Posted in visualization.


R code for our axis labeling algorithm

The R version of our labeling code is now hosted at R-forge. You can get it here or install it from within R using install.packages("labeling", repos="http://R-Forge.R-project.org").

A few small bugs in the implementation of our algorithm have been fixed thanks to feedback from Ahmet Karahan who is working on a Java version. I have also added a number of other labeling algorithms that have been proposed or used in the past, including those by Sparks, Thayer, and Nelder (from about 40 years ago), and adaptations of the matplotlib, gnuplot, and R’s pretty labeling functions.

Posted in visualization.


Monitoring Hadoop

As a side project this summer, I implemented a simple visual interface for HOP, an extended version of Hadoop. This was used by the HOP creators in their demo of HOP at this summer’s SIGMOD.


Hop visual interface

The graphical elements were produced using Protovis since I needed an excuse to play around with it. We ran into minor performance problems using Protovis with so many plots on a single page. In a production system it would be wiser to generate and cache the plots on the server side.

Update: The screenshot shows a task scheduling imbalance bug that we found in HOP using the visual interface.

Posted in visualization.


An Extension of Wilkinson’s Algorithm for Positioning Tick Labels on Axes

Here’s a preprint of our paper on selecting tick labels for axes which will appear in this year’s InfoVis! Source code of the implementation will be made available before the conference. We’re hoping to get this implemented in a number of common plotting libraries. I already have a partial matplotlib version working. I would also like to have one for ggplot. Other suggestions are welcome.

The abstract:

The non-data components of a visualization, such as axes and legends, can often be just as important as the data itself. They provide contextual information essential to interpreting the data. In this paper, we describe an automated system for choosing positions and labels for axis tick marks. Our system extends Wilkinson’s optimization-based labeling approach to create a more robust, full-featured axis labeler. We define an expanded space of axis labelings by automatically generating additional nice numbers as needed and by permitting the extreme labels to occur inside the data range. These changes provide flexibility in problematic cases, without degrading quality elsewhere. We also propose an additional optimization criterion, legibility, which allows us to simultaneously optimize over label formatting, font size, and orientation. To solve this revised optimization problem, we describe the optimization function and an efficient search algorithm. Finally, we compare our method to previous work using both quantitative and qualitative metrics. This paper is a good example of how ideas from automated graphic design can be applied to information visualization.
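As a toy illustration of the optimization framing, here is a Python sketch that scores candidate tick sequences of the form q·10^z on simplified simplicity, coverage, and density terms. It is not the paper’s algorithm (which adds legibility and much more careful scoring); the weights and the preference list Q below are my own illustrative choices:

```python
import math

# candidate "nice" step coefficients, roughly in order of preference
Q = [1, 5, 2, 2.5, 4, 3]

def label_range(dmin, dmax, target=5):
    """Return (lmin, lmax, step) for the best-scoring tick sequence,
    trading off step niceness, data coverage, and tick count."""
    best, best_score = None, -math.inf
    for i, q in enumerate(Q):
        for z in range(-10, 11):
            step = q * 10 ** z
            lmin = math.floor(dmin / step) * step
            lmax = math.ceil(dmax / step) * step
            n = round((lmax - lmin) / step) + 1
            if n < 2 or n > 12:
                continue
            simplicity = 1 - i / (len(Q) - 1)        # earlier in Q = nicer
            coverage = (dmax - dmin) / (lmax - lmin) # penalize overshoot
            density = 1 - abs(n - target) / target   # prefer ~target ticks
            score = simplicity + coverage + 2 * density
            if score > best_score:
                best_score, best = score, (lmin, lmax, step)
    return best
```

For the range [0, 100] with a target of five ticks, this picks the sequence 0, 25, 50, 75, 100.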

Update: We’ve released a preliminary R package implementing the three labeling algorithms we compared in the paper. Feedback is appreciated. The final version should be released by InfoVis (in October).

Posted in visualization.