Information visualizations, such as charts and maps, can greatly enhance news articles by adding context, helping the reader understand complex facts, supporting decision making, and making information more memorable. Unfortunately, creating good news visualizations is a difficult and labor-intensive task that involves numerous complex decisions. A designer must identify data relevant to the article, clean the data, generate the visualization (a complex process on its own), and provide annotations that connect the article and the visualization. While some design guidelines have been developed, many decisions rest on designer intuition, an approach that does not scale to the thousands of news articles published every day. This project seeks to build intelligent tools that help designers create good news visualizations more quickly and to develop systems that generate news visualizations autonomously. The research will enhance citizen understanding of complex information in the news and improve numerical, graphical, and geographic literacy. Additionally, the research will provide support for new job categories (e.g., data scientists, computational journalists, and data analysts) and existing companies (e.g., online media and search engines) in their evolution to new interactive platforms. The research results will be integrated into a broad set of widely accessible educational materials for a variety of courses (visualization, spatial computing, and text analysis) and will serve as research and practical training for undergraduates, graduates, and professionals.
Providing a scalable solution for the automatic generation of contextually relevant visualizations requires understanding and encoding the design process. Specifically, the goals of this project are (a) identifying the decision process of visualization designers, (b) creating automated components that operationalize these decisions, including text processing, search across a wide range of heterogeneous data sources and datasets (e.g., census data, stock market data, government macroeconomic data), and automatic visualization construction and annotation, and (c) ranking the resulting visualizations using well-known quantitative metrics from information retrieval and information visualization, such as relevance, expressiveness, and effectiveness. By extracting key comparisons from an article's text through natural language processing and using existing visualization-article pairs as an evaluation corpus, the system will ensure that relevant datasets are found and that the selected visual forms preserve and enhance the information conveyed in the article. For example, the system will automatically create thematic maps for geospatial comparisons of population change in the U.S. and time series for longitudinal comparisons of company financial results. Although the focus of this work is on the news domain, the research can be extended to other application areas, including textbooks, internal company reports, and, more generally, any texts that implicitly or explicitly correspond to quantitative data.
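
The selection and ranking steps described above can be illustrated with a minimal sketch. It is not the project's implementation: the `Candidate` class, the `score` and `rank` functions, the weights, and the assumption that each metric is already a number between 0 and 1 are all hypothetical. Only the two pairings of comparison type and visual form come from the examples in the text.

```python
from dataclasses import dataclass

# Pairings of comparison type and visual form taken from the text's examples.
VISUAL_FORMS = {
    "geospatial": "thematic map",
    "longitudinal": "time series",
}


@dataclass
class Candidate:
    """A hypothetical candidate visualization for one article."""
    dataset: str
    comparison: str        # "geospatial" or "longitudinal"
    relevance: float       # assumed to be pre-computed scores in [0, 1]
    expressiveness: float
    effectiveness: float

    @property
    def visual_form(self) -> str:
        # Falling back to a table is an assumption, not part of the source.
        return VISUAL_FORMS.get(self.comparison, "table")


def score(c: Candidate, weights=(0.5, 0.25, 0.25)) -> float:
    """Weighted sum of the three metrics; the weights are illustrative."""
    w_rel, w_exp, w_eff = weights
    return w_rel * c.relevance + w_exp * c.expressiveness + w_eff * c.effectiveness


def rank(candidates, weights=(0.5, 0.25, 0.25)):
    """Return candidates ordered from highest to lowest combined score."""
    return sorted(candidates, key=lambda c: score(c, weights), reverse=True)


if __name__ == "__main__":
    candidates = [
        Candidate("stock prices", "longitudinal", 0.4, 0.9, 0.9),
        Candidate("census population change", "geospatial", 0.9, 0.8, 0.7),
    ]
    for c in rank(candidates):
        print(f"{c.dataset}: {c.visual_form} (score {score(c):.3f})")
```

A weighted sum is only one way to combine the metrics. A learned ranking model trained on the visualization-article pairs mentioned above would be another option.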