These are some of the key sub-projects of our effort. You can find a complete list of publications, code, demos, and data here.



VizByWiki automatically retrieves relevant data visualizations from Wikimedia Commons to support arbitrary news articles. Using a novel ground truth dataset, we found that VizByWiki can successfully augment as many as 48% of popular online news articles with news visualizations. We also demonstrated that VizByWiki can automatically rank visualizations according to their usefulness with reasonable accuracy (nDCG@5 of 0.82). Paper to appear at WWW'18 (data and code will be available).
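For readers unfamiliar with the ranking metric quoted above, the following is a minimal sketch of how nDCG@k is commonly computed. It is an illustration, not the VizByWiki evaluation code: the function names are hypothetical, and it assumes the linear-gain variant of DCG with a log2 rank discount (other formulations, such as the exponential-gain variant, give different numbers).

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k items (linear gain, log2 discount)."""
    # rank is 0-based, so the discount for the first item is log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG of the given ordering divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant items at all; define the score as 0
    return dcg_at_k(relevances, k) / ideal


# Hypothetical usefulness ratings for five ranked visualizations (not real data).
ratings_in_ranked_order = [3, 2, 3, 0, 1]
score = ndcg_at_k(ratings_in_ranked_order, k=5)
```

A score of 1.0 means the ranking places the most useful visualizations first; lower values indicate that useful items were ranked below less useful ones.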



NewsViews and Contextifier were the original pilot projects for this research. With Contextifier we were able to automatically generate annotated stock visualizations that were contextualized to the content of news articles. NewsViews took this a step further by generating maps automatically based on article text (the system would automatically find the appropriate dataset, pick the right way to visualize it, and annotate it given the context of the article). A Contextifier video is available here. We also have a NewsViews video and demo.



PersaLog is a Domain Specific Language (DSL) and system for creating personalized news article content. PersaLog supports personalization of both new content and existing, unpersonalized information. In addition to allowing a journalist to personalize generic text content, PersaLog provides similar support for existing interactive visualizations. More information (videos/demos/papers) is available on the PersaLog website, persalog.news.



Atlasify is a "Geography of Everything." To understand how Atlasify works, let’s walk through a quick example. Say you’re interested in learning about World War II. You Google "World War II" and you get lots of interesting links, for instance to the "World War II" Wikipedia page. These days, search engines will also give you lots of structured facts about your query, like the start date and end date of the war. Atlasify complements these links and structured facts with an entirely new way to explore information about your queries: automatically generated "heat maps" (read more). You can try it out at atlasify.com.



Comparifact is our combined system that merges the NewsViews and Atlasify work and serves as a platform for new developments. The system is based on a Chrome plugin that generates contextually interesting visualizations (using either the NewsViews algorithms or Atlasify). The server code is available here and the front-end extension is here (usage instructions coming soon).



VizItCards are used in a card-driven workshop developed for our graduate infovis class. The workshop is intended to provide practice with good design techniques and to simultaneously reinforce key concepts. VizItCards relies on principles of collaborative learning and research on parallel design to generate positive collaborations and high-quality designs. We designed VizItCards because shifts in information visualization practice are forcing a reconsideration of how infovis is taught. Traditional curricula that focused on conveying research-derived knowledge are slowly integrating design thinking as a key learning objective. In part, this is motivated by the realization that infovis is a wicked design problem, requiring a different kind of design work. What we've learned from automating some aspects of visualization generation can be applied to design practice. A site for VizItCards is at vizitcards.org.



Answering questions with data is a difficult and time-consuming process. Visual dashboards and templates make it easy to get started, but asking more sophisticated questions often requires learning a tool designed for expert analysts. Natural language interaction allows users to ask questions directly in complex programs without having to learn how to use an interface. However, natural language is often ambiguous. DataTone is a mixed-initiative approach to managing ambiguity in natural language interfaces for data visualization. We model ambiguity throughout the process of turning a natural language query into a visualization and use algorithmic disambiguation coupled with interactive ambiguity widgets. These widgets allow the user to resolve ambiguities by surfacing system decisions at the point where the ambiguity matters. Corrections are stored as constraints and influence subsequent queries. DataTone is easy to learn and lets users ask questions without worrying about syntax and proper question form. Video and presentation.
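The idea that corrections are stored as constraints and influence later queries can be illustrated with a small sketch. This is a toy model under stated assumptions, not DataTone's implementation: the `Ambiguity` and `Session` classes, the scoring scheme, and the example column names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Ambiguity:
    """An ambiguous phrase with scored candidate interpretations (hypothetical model)."""
    phrase: str
    candidates: Dict[str, float]  # interpretation -> system confidence


@dataclass
class Session:
    """Remembers user corrections as constraints that apply to later queries."""
    constraints: Dict[str, str] = field(default_factory=dict)

    def resolve(self, amb: Ambiguity) -> str:
        # A stored correction wins if it is still a valid candidate.
        chosen = self.constraints.get(amb.phrase)
        if chosen in amb.candidates:
            return chosen
        # Otherwise fall back to the system's best guess (algorithmic disambiguation).
        return max(amb.candidates, key=amb.candidates.get)

    def correct(self, amb: Ambiguity, choice: str) -> None:
        # Called when the user picks a different option in an ambiguity widget.
        if choice not in amb.candidates:
            raise ValueError(f"{choice!r} is not a candidate for {amb.phrase!r}")
        self.constraints[amb.phrase] = choice


session = Session()
medals = Ambiguity("medals", {"gold_medals": 0.6, "total_medals": 0.4})
first_guess = session.resolve(medals)    # system default
session.correct(medals, "total_medals")  # user overrides via the widget
later_guess = session.resolve(medals)    # the correction now applies
```

In this sketch the system's default choice is always shown and remains overridable, which mirrors the mixed-initiative split described above: the algorithm proposes, and the user corrects only where the ambiguity matters.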


Wikipedia Source Visualization

Our research studies where information in Wikipedia comes from, a characteristic we call geoprovenance. We focus on the four million Wikipedia articles about places that, along with information such as TripAdvisor reviews and geotagged Flickr images, constitute the rising class of information that renowned geographer Michael Goodchild calls volunteered geographic information (VGI). As some have argued (Kitchin and Dodge, 2011; Graham et al., 2013), VGI does not just represent the world, but also becomes part of the world. It forms layers of code and information that augment everyday activities: it shapes where we go, what we do, and how we perceive and understand the world that we live in. The study of geoprovenance, which we undertake here, helps us understand the perspectives, gaps, and inequalities in VGI repositories. More information and demo


  • February 2019: We will be presenting our work to the Wikimedia Foundation
  • July 2018: Our work on Wikimedia Commons images was presented at WWW'18 (aka WebConf'18)
  • June 2018: Our paper on cross-lingual Wikipedia won the best paper award at ICWSM'18
  • May 2017: We've released PersaLog, a system for personalizing news articles and visualizations. A paper on the work appeared at CHI'17 (see our publication page)

Project Sponsors