An Extension of Wilkinson’s Algorithm for Positioning Tick Labels on Axes
Justin Talbot, Sharon Lin, Pat Hanrahan. InfoVis 2010.
Abstract
The non-data components of a visualization, such as axes and legends, can often be just as important as the data itself. They provide contextual information essential to interpreting the data. In this paper, we describe an automated system for choosing positions and labels for axis tick marks. Our system extends Wilkinson’s optimization-based labeling approach to create a more robust, full-featured axis labeler. We define an expanded space of axis labelings by automatically generating additional nice numbers as needed and by permitting the extreme labels to occur inside the data range. These changes provide flexibility in problematic cases, without degrading quality elsewhere. We also propose an additional optimization criterion, legibility, which allows us to simultaneously optimize over label formatting, font size, and orientation. To solve this revised optimization problem, we describe the optimization function and an efficient search algorithm. Finally, we compare our method to previous work using both quantitative and qualitative metrics. This paper is a good example of how ideas from automated graphic design can be applied to information visualization.
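The kind of search the abstract describes can be illustrated with a minimal sketch. It enumerates labelings built from a short list of nice step sizes, lets the extreme labels fall either just outside or just inside the data range, and keeps the best-scoring candidate. The `NICE_BASES` list, the `score` function with its coverage and density terms, and the name `best_labeling` are all assumptions made for illustration. They are not the paper's objective, which per the abstract also accounts for legibility.

```python
import math

# Assumed list of "nice" step bases; illustration only.
NICE_BASES = (1, 2, 5)

def score(labels, dmin, dmax, target):
    """Assumed score: reward covering the data range with about `target` labels."""
    span = dmax - dmin
    miss = (dmax - labels[-1]) ** 2 + (dmin - labels[0]) ** 2
    coverage = 1.0 - 0.5 * miss / (0.1 * span) ** 2
    density = 1.0 - abs(len(labels) - target) / target
    return coverage + density

def best_labeling(dmin, dmax, target=5):
    """Return the best-scoring list of tick values; requires dmax > dmin."""
    exponent = math.floor(math.log10(dmax - dmin))
    best, best_score = None, -math.inf
    for k in (exponent - 1, exponent, exponent + 1):
        for base in NICE_BASES:
            step = base * 10.0 ** k
            # "outer": extreme labels enclose the data range.
            outer = (math.floor(dmin / step), math.ceil(dmax / step))
            # "inner": extreme labels are allowed inside the data range.
            inner = (math.ceil(dmin / step), math.floor(dmax / step))
            for lo, hi in (outer, inner):
                if hi - lo < 1:
                    continue  # need at least two labels
                labels = [i * step for i in range(lo, hi + 1)]
                s = score(labels, dmin, dmax, target)
                if s > best_score:
                    best, best_score = labels, s
    return best
```

For data spanning -1 to 101, this sketch prefers ticks at 0, 20, ..., 100, which sit slightly inside the data range, over the enclosing ticks from -20 to 120.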
Online Aggregation and Continuous Query support in MapReduce
Tyson Condie, Neil Conway, Peter Alvaro, Joseph M. Hellerstein, John Gerth, Justin Talbot, Khaled Elmeleegy, Russell Sears. SIGMOD 2010 Demo.
Abstract
MapReduce is a popular framework for data-intensive distributed computing of batch jobs. To simplify fault tolerance, the output of each MapReduce task and job is materialized to disk before it is consumed. In this demonstration, we describe a modified MapReduce architecture that allows data to be pipelined between operators. This extends the MapReduce programming model beyond batch processing, and can reduce completion times and improve system utilization for batch jobs as well. We demonstrate a modified version of the Hadoop MapReduce framework that supports online aggregation, which allows users to see “early returns” from a job as it is being computed. Our Hadoop Online Prototype (HOP) also supports continuous queries, which enable MapReduce programs to be written for applications such as event monitoring and stream processing. HOP retains the fault tolerance properties of Hadoop, and can run unmodified user-defined MapReduce programs.
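The idea of "early returns" can be illustrated with a small sketch in plain Python. It does not use Hadoop or HOP, and the function name `online_average` and the batch layout are assumptions for illustration. Each batch stands in for the pipelined output of one map task, and a running estimate is emitted as soon as the batch arrives instead of after the whole job finishes.

```python
def online_average(batches):
    """Yield (fraction_of_batches_seen, running_mean) after each batch arrives.

    `batches` is a list of lists of numbers; each inner list stands in for the
    output of one (hypothetical) map task.
    """
    total, count = 0.0, 0
    n_batches = len(batches)
    for i, batch in enumerate(batches, start=1):
        total += sum(batch)
        count += len(batch)
        if count == 0:
            continue  # nothing to estimate from yet
        yield i / n_batches, total / count
```

Consuming the generator gives progressively refined estimates, and the last value yielded equals the exact batch result.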
EnsembleMatrix: Interactive Visualization to Support Machine Learning with Multiple Classifiers
Justin Talbot, Bongshin Lee, Desney Tan, Ashish Kapoor. CHI 2009.
Abstract
Machine learning is an increasingly used computational tool within human-computer interaction research. While most researchers currently utilize an iterative approach to refining classifier models and performance, we propose that ensemble classification techniques may be a viable and even preferable alternative. In ensemble learning, algorithms combine multiple classifiers to build one that is superior to its components. In this paper, we present EnsembleMatrix, an interactive visualization system that presents a graphical view of confusion matrices to help users understand relative merits of various classifiers. EnsembleMatrix allows users to directly interact with the visualizations in order to explore and build combination models. We evaluate the efficacy of the system and the approach in a user study. Results show that users are able to quickly combine multiple classifiers operating on multiple feature sets to produce an ensemble classifier with accuracy that approaches best-reported performance classifying images in the CalTech-101 dataset.
Vispedia: On-demand Data Integration for Interactive Visualization and Exploration
Bryan Chan, Justin Talbot, Leslie Wu, Nathan Sakunkoo, Mike Cammarano, Pat Hanrahan. ACM SIGMOD Demo Paper, 2009.
Vispedia: Interactive Visual Exploration of Wikipedia Data via Search-Based Integration
Bryan Chan, Leslie Wu, Justin Talbot, Mike Cammarano, Pat Hanrahan. IEEE Information Visualization, 2008.
Abstract
Wikipedia is an example of the collaborative, semi-structured data sets emerging on the Web. These data sets have large, non-uniform schemas that require costly data integration into structured tables before visualization can begin. We present Vispedia, a Web-based visualization system that reduces the cost of this data integration. Users can browse Wikipedia, select an interesting data table, then use a search interface to discover, integrate, and visualize additional columns of data drawn from multiple Wikipedia articles. This interaction is supported by a fast path search algorithm over DBpedia, a semantic graph extracted from Wikipedia’s hyperlink structure. Vispedia can also export the augmented data tables produced for use in traditional visualization systems. We believe that these techniques begin to address the “long tail” of visualization by allowing a wider audience to visualize a broader class of data. We evaluated this system in a first-use formative lab study. Study participants were able to quickly create effective visualizations for a diverse set of domains, performing data integration as needed.
Structuring collections with Scatter/Gather extensions
Omar Alonso, Justin Talbot. SIGIR 2008: 697-698.
Abstract
A major component of sense-making is organizing—grouping, labeling, and summarizing—the data at hand in order to form a useful mental model, a necessary precursor to identifying missing information and to reasoning about the data. Previous work has shown the Scatter/Gather model to be useful in exploratory activities that occur when users encounter unknown document collections. However, the topic structure communicated by Scatter/Gather is closely tied to the behavior of the underlying clustering algorithm; this structure may not reflect the mental model most applicable to the information need. In this paper we describe the initial design of a mixed-initiative information structuring tool that leverages aspects of the well-studied Scatter/Gather model but permits the user to impose their own desired structure when necessary.
Visualization of Heterogeneous Data
Mike Cammarano, Xin Luna Dong, Bryan Chan, Jeff Klingner, Justin Talbot, Alon Y. Halevy, Pat Hanrahan. IEEE Trans. Vis. Comput. Graph. 13(6): 1200-1207 (2007).
Abstract
Both the Resource Description Framework (RDF), used in the semantic web, and Maya Viz u-forms represent data as a graph of objects connected by labeled edges. Existing systems for flexible visualization of this kind of data require manual specification of the possible visualization roles for each data attribute. When the schema is large and unfamiliar, this requirement inhibits exploratory visualization by requiring a costly up-front data integration step. To eliminate this step, we propose an automatic technique for mapping data attributes to visualization attributes. We formulate this as a schema matching problem, finding appropriate paths in the data model for each required visualization attribute in a visualization template.
Two Stage Importance Sampling for Direct Lighting
David Cline, Parris K. Egbert, Justin F. Talbot, and David L. Cardon. In Rendering Techniques 2006 (Eurographics Symposium on Rendering), pp. 103-113, (June 2006).
Abstract
We describe an importance sampling method to generate samples based on the product of a BRDF and an environment map or large light source. The method works by creating a hierarchical partition of the light source based on the BRDF function for each primary (eye) ray in a ray tracer. This partition, along with a summed area table of the light source, forms an approximation to the product function that is suitable for importance sampling. The partition is used to guide a sample warping algorithm to transform a uniform distribution of points so that they approximate the product distribution. The technique is unbiased, requires little precomputation, and we demonstrate that it works well for a variety of BRDF types. Further, we present an adaptive method which allocates varying numbers of samples to different image pixels to reduce shadow artifacts.
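The summed area table mentioned in the abstract is a standard structure that returns the sum over any axis-aligned rectangle in constant time. The following is a generic sketch, not the paper's implementation. The function names `build_sat` and `region_sum` and the half-open rectangle convention are assumptions for illustration.

```python
def build_sat(img):
    """Build a summed area table with one extra row and column of zeros.

    sat[y][x] holds the sum of img over rows [0, y) and columns [0, x).
    """
    h, w = len(img), len(img[0])
    sat = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat

def region_sum(sat, x0, y0, x1, y1):
    """Sum of the image over columns [x0, x1) and rows [y0, y1)."""
    return sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
```

With such a table over an environment map, the total emitted energy in any rectangular region of the light source can be looked up with four reads, which is what makes it useful when scoring many candidate regions.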
Importance Resampling for Global Illumination
Justin Talbot, Master’s Thesis, Brigham Young University, 2005.
Abstract
This thesis develops a generalized form of Monte Carlo integration called Resampled Importance Sampling. It is based on the importance resampling sample generation technique. Resampled Importance Sampling can lead to significant variance reduction over standard Monte Carlo integration for common rendering problems. We show how to select the importance resampling parameters for near optimal variance reduction. We also combine RIS with stratification and with Multiple Importance Sampling for further variance reduction. We demonstrate the robustness of this technique on the direct lighting problem and achieve up to a 33% variance reduction over standard techniques. We also suggest using RIS as a default BRDF sampling technique.
PDF (2,307K)
Powerpoint (7,919K)
Energy Redistribution Path Tracing
David Cline, Justin F. Talbot, and Parris K. Egbert. ACM Transactions on Graphics (SIGGRAPH 2005 Proceedings).
Abstract
We present Energy Redistribution (ER) sampling as an unbiased method to solve correlated integral problems. ER sampling is a hybrid algorithm that uses Metropolis sampling-like mutation strategies in a standard Monte Carlo integration setting, rather than resorting to an intermediate probability distribution step. In the context of global illumination, we present Energy Redistribution Path Tracing (ERPT). Beginning with an initial set of light samples taken from a path tracer, ERPT uses path mutations to redistribute the energy of the samples over the image plane to reduce variance. The result is a global illumination algorithm that is conceptually simpler than Metropolis Light Transport (MLT) while retaining its most powerful feature, path mutation. We compare images generated with the new technique to standard path tracing and MLT.
PDF (6,345K)
Importance Resampling for Global Illumination
Justin F. Talbot, David Cline, and Parris K. Egbert. Eurographics Symposium on Rendering 2005, pages 139-146.
Abstract
This paper develops importance resampling into a variance reduction technique for Monte Carlo integration. Importance resampling is a sample generation technique that can be used to generate more equally weighted samples for importance sampling. This can lead to significant variance reduction over standard importance sampling for common rendering problems. We show how to select the importance resampling parameters for near optimal variance reduction. We demonstrate the robustness of this technique on common global illumination problems and achieve a 10%-70% variance reduction over standard importance sampling for direct lighting. We conclude that further variance reduction could be achieved with cheaper sampling methods.
PDF (3,706K)
Powerpoint (1,307K)
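The resampling idea in the abstract above can be sketched on a one-dimensional integral. Each estimate draws `m` candidates from an easy source density, weights them by an unnormalized target function, picks one candidate in proportion to its weight, and corrects by the average weight. This is a minimal sketch, not the paper's renderer code. The names `ris_estimate`, `g_hat`, `sample_p`, and `pdf_p` are assumptions for illustration, and the estimator is only unbiased when `g_hat` and the source density are positive wherever `f` is nonzero.

```python
import random

def ris_estimate(f, g_hat, sample_p, pdf_p, m, n, rng):
    """Estimate the integral of f using n resampled estimates of m candidates each.

    f        -- integrand
    g_hat    -- unnormalized target function, ideally roughly proportional to f
    sample_p -- draws one sample from the source density, given an rng
    pdf_p    -- evaluates the source density
    """
    total = 0.0
    for _ in range(n):
        xs = [sample_p(rng) for _ in range(m)]
        ws = [g_hat(x) / pdf_p(x) for x in xs]
        w_sum = sum(ws)
        if w_sum == 0.0:
            continue  # every candidate had zero weight; this estimate is zero
        # Resampling step: pick one candidate with probability proportional to its weight.
        y = rng.choices(xs, weights=ws, k=1)[0]
        total += f(y) / g_hat(y) * (w_sum / m)
    return total / n
```

As a usage example, integrating x² over [0, 1] with uniform candidates and the target `g_hat(x) = x` should give a value close to 1/3:

```python
rng = random.Random(0)
estimate = ris_estimate(lambda x: x * x, lambda x: x,
                        lambda r: r.random(), lambda x: 1.0,
                        m=8, n=2000, rng=rng)
```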
VTVS – A Complete Implementation of a Virtual Terrain Visualization System
Justin F. Talbot and Parris K. Egbert. IASTED VIIP 2003, Benalmadena, Spain, September 2003.
Abstract
VTVS (Virtual Terrain Visualization System) is a complete terrain visualization system for interactive exploration of models of real world locations. It includes a fast LOD system for terrain rendering and support for large numbers of plants and buildings on top of the terrain. In this paper we discuss the implementation of the system and various algorithms we have developed. We provide examples of the success of VTVS and we discuss possible directions for future research.
PDF (968K)