Visualizations: Explanatory, Exploratory, Diagnostic

Remake of Playfair's chart, Francois Dion

In my previous post ("Of Poets and Visualizations"), I said:

"Even when we agree on the type of visualization (say, a bar chart), we have a considerable latitude based on: 
  • what we want to convey
  • who we want to reach"

Intent, and audience. These two factors define what type of visualization one is doing. Let's start with the main type of visualization found in the media: explanatory visualization.

Explanatory Visualization

Is an explanatory visualization a pie chart, a bar chart, an interactive map or an infographic? It can be none, all, or some, depending on the factors identified before. Our goal is to convey some insight we discovered in our research. In the above tweet, Priska Walliman condenses a lot of information about bird migrations into an explanatory map that is easily digested by her audience (Sonntags Blick readers).

Violin and Box plots
The audience is very important. If the audience is yourself, whatever chart or visualization provides an explanation to you could be considered an explanatory visualization. On the other hand, if the audience is the population at large, it is quite unlikely that a violin plot or a box-and-whisker plot will convey the information as intended.

Where the audience lies between these two extremes will dictate the choices to be made in terms of type of chart, number of charts, how much text to include, what colors to use, what options are provided to the audience to personalize, and so on. The type of media used to reach the audience will also impact the choices made.

Another aspect of explanatory visualization is that it may or may not be designed with the intention to convey accurate information. Sometimes it is designed with pathos in mind, especially when a visualization shows only a fraction of the information (bias) and is framed with some text that appeals to our emotions.

At the end of the day, however, only you know if you are trying to create a (biased) explanatory visualization or not. It totally depends on your intent (even if your execution is lacking).

Are there other types of data visualizations? Glad you asked. Not every data visualization is about explanation or storytelling. Exploratory and diagnostic visualizations are in fact almost never about storytelling and are the most common for data scientists.

Exploratory Visualization

Class representation for vehicle drive types
In the above screenshot, after loading a data set into VISUAI, I selected a variable to explore. This exploration gave me some insight into the data set. First, it covers four different types, or families, of vehicle drivetrains: rear-wheel drive, front-wheel drive, four-wheel drive and all-wheel drive.

It also gives me the number of observations (or rows) for each of these classes (or types). And it shows me at a glance that the classes differ considerably in their number of observations. This is known as class imbalance and should inform my model building approach.
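As a rough illustration of what such a chart summarizes, here is a minimal sketch in Python using only the standard library. It is not how VISUAI works internally. The `drive` list and its counts are made up for the example, and `imbalance_ratio` is just one simple way to put a number on class imbalance.

```python
from collections import Counter

# Hypothetical drive-type column; values and counts are made up for illustration.
drive = ["fwd"] * 12 + ["rwd"] * 6 + ["4wd"] * 3 + ["awd"] * 1

counts = Counter(drive)        # number of observations per class
total = sum(counts.values())   # number of rows overall

# Text-mode bar chart, most frequent class first.
for label, n in counts.most_common():
    print(f"{label:>4} {n:3d} {n / total:6.1%} {'#' * n}")

# Largest class divided by smallest class: 1.0 would mean perfectly balanced.
imbalance_ratio = max(counts.values()) / min(counts.values())
print(f"imbalance ratio: {imbalance_ratio:.1f}")
```

The further the ratio is from 1.0, the more the imbalance deserves attention before building a model.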

Combining a few dozen or more plots, a complete exploration of the data set can be made visually. These are typically used by analysts and data scientists as part of their discovery process. This is not a new trend. In fact, it has been an integral part of data analysis for over 50 years!

In 1962, John Tukey published "The Future of Data Analysis". In it, he laid the foundation for modern-day exploratory data analysis. Some visualizations were included, but he did not fully integrate visualizations as part of the exploratory process until 1970, when he published a limited edition of "Exploratory Data Analysis" (1977 for the regular edition), a fairly large book:

John Tukey's Exploratory Data Analysis Vol.1 - D-size battery for scale
In it, he covered many of the tools we can use for visual analysis of data: box-and-whisker plots, stem-and-leaf plots, hanging histograms and many more. All of these, along with scatter plots, heat maps, bar charts and the like, can also be used for a different purpose: not to gain insight, but to make a diagnosis.

Diagnostic Visualization

Depending on the domain, this is also sometimes called confirmatory visualization.

A diagnostic plot can be the same as one we would use for exploratory analysis. For example, the chart "Class representation for vehicle drive types" in the previous section (Exploratory Visualization) is a bar chart showing how many classes are present for the feature, and how many observations of each class are in the data set. When we use it as part of a machine learning pipeline to make sure our classes are balanced, the exploratory tool becomes a diagnostic tool.
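The same class counts can be turned into a pass/fail check, which is what makes the use diagnostic. The sketch below is only an illustration: `check_class_balance` is a hypothetical helper, the `max_ratio` threshold of 3.0 is arbitrary, and the labels are made up.

```python
from collections import Counter

def check_class_balance(labels, max_ratio=3.0):
    """Return (ok, ratio). ok is False when the largest class is more than
    max_ratio times the size of the smallest one.
    max_ratio is an arbitrary threshold chosen for illustration."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    ratio = max(counts.values()) / min(counts.values())
    return ratio <= max_ratio, ratio

# Made-up labels, for illustration only.
balanced = ["fwd", "rwd", "4wd", "awd"] * 5
skewed = ["fwd"] * 12 + ["rwd"] * 6 + ["4wd"] * 3 + ["awd"] * 1

print(check_class_balance(balanced))
print(check_class_balance(skewed))
```

The data and the counting are the same as in exploration. What changed is the question: "is this acceptable?" instead of "what is in here?".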

Once more, we see how important the intent is. Some other types of diagnostic visualizations for machine learning or statistics are those that compare a reference (ground truth, y) with an estimation (ŷ):

Prediction Error plot, using Yellowbrick
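To make the comparison concrete, here is a minimal sketch of the quantities a prediction error plot typically displays: the residuals, the R² score, and a best-fit line to compare against the identity line ŷ = y. It uses only the Python standard library and does not call Yellowbrick. The values in `y` and `y_hat` are made up for illustration.

```python
from statistics import mean

# Made-up ground truth (y) and model estimates (y_hat), for illustration only.
y = [3.0, 5.0, 7.0, 9.0, 11.0]
y_hat = [2.5, 5.5, 6.5, 9.5, 11.0]

# Residuals: vertical distance of each point from the identity line y_hat = y.
residuals = [est - ref for ref, est in zip(y, y_hat)]

# Coefficient of determination: R^2 = 1 - SS_res / SS_tot.
y_bar = mean(y)
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((ref - y_bar) ** 2 for ref in y)
r2 = 1 - ss_res / ss_tot

# Least-squares line of y_hat against y. A slope of 1 and an intercept of 0
# would coincide with the identity line, i.e. no systematic error.
y_hat_bar = mean(y_hat)
slope = sum((a - y_bar) * (b - y_hat_bar) for a, b in zip(y, y_hat)) / ss_tot
intercept = y_hat_bar - slope * y_bar

print(f"R^2 = {r2:.3f}, best fit: y_hat = {slope:.2f} * y + {intercept:.2f}")
```

A gap between the best-fit line and the identity line points to systematic over- or under-estimation, which is the kind of diagnosis this plot is meant to support.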

Automating Exploratory and Diagnostic Visualizations

Because exploratory and diagnostic visualization entails a lot of repetition, decision trees and learnable heuristics, we incorporated part of this knowledge into VISUAI so it can quickly present these visualizations, sparing you from suffering through data ingestion issues or iterating through scenarios that are less than optimal. VISUAI even learns from experience and gets better at it over time.

Using VISUAI will leave you more time to work on your final visualization, the explanatory one you present to the board...