August 11, 2015

Abstract

We present the use of new visualization methods for Introductory Statistics classes at the undergraduate, major, and graduate levels. Because bubble charts and maps are widely available in software, these types of plots can easily be included early in an Introductory Statistics course.

In addition to pie charts, bar graphs, box-plots, etc., bubble charts can be introduced with motion to visualize time. The motion is an excellent demonstration of modern software and is more engaging for beginning Statistics students. Bubble charts can be used to present four- or five-dimensional data.

Maps are very easily made with software, and they are now commonly used online. Giving students the ability to connect maps to the concepts being presented adds to the class experience.

Outline

Three levels of students learning Statistics

  • Undergraduates
  • Majors
  • Masters

New visualizations

  • bubble charts
  • maps

New Visualizations

During the last decade, many new visualization techniques have been developed, and software to implement them has become available.

However, most of the introductory Statistics textbooks have yet to incorporate discussion and examples of these new techniques.

At this point, all of the main statistical computer programs include these techniques, so students have access to software that can be used to create these visualizations.

New Visualizations

The idea of conditioning can now easily be introduced through bubble charts and maps.

Introducing the concept of conditioning through visualizations can facilitate the introduction to probability and conditional probability and modeling (t-test, ANOVA, Regression, ANCOVA, Time Series Analysis, Bayesian Modeling).
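As a minimal sketch of how conditioning on a group leads to conditional probability, the Python snippet below computes a conditional proportion and the corresponding overall proportion from a small two-way table of counts. The counts, the group and outcome labels, and the helper name `conditional_probability` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical two-way table of counts: (group, outcome) -> count.
# Illustrative values only, not real data.
counts = {
    ("A", "success"): 30,
    ("A", "failure"): 10,
    ("B", "success"): 12,
    ("B", "failure"): 8,
}

def conditional_probability(counts, outcome, given):
    """Return P(outcome | given) = count(given, outcome) / count(given)."""
    joint = counts[(given, outcome)]
    marginal = sum(n for (group, _), n in counts.items() if group == given)
    return Fraction(joint, marginal)

# Proportion of successes within group A (conditioning on the group).
p_success_given_a = conditional_probability(counts, "success", "A")

# Proportion of successes ignoring the group (the marginal proportion).
p_success = Fraction(
    sum(n for (_, outcome), n in counts.items() if outcome == "success"),
    sum(counts.values()),
)

print(p_success_given_a, p_success)
```

Comparing the within-group proportion to the overall proportion is the same comparison a student makes visually when a plot is split by group.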

New Visualizations - Bubble charts in Motion

The TED talks of Hans Rosling should become required viewing for all new students of Statistics.

For undergraduate students of Statistics, the bubble charts in motion are very interesting and the storytelling that Rosling uses is very engaging.

For Statistics majors, the bubble charts in motion show the association/dependence of a variable on multiple other variables, including time.

New Visualizations - Bubble charts in Motion

For master's students of Statistics, the bubble charts in motion should be reproduced using modern software. An effort needs to be made to connect the visualization to conditional probability and modeling.

New Visualizations - Maps

Almost all students at this point are familiar and comfortable with maps, from GPS to Google Maps. Software makes it possible to use maps to show differences between regions.

What used to be only possible with advanced GIS software is now possible using most statistical software.

Probability and Expectation presented more visually

Over the last couple of decades, there has been an effort to reduce the amount of probability presented at the introductory level.

In Introductory Statistics books there are three main topics:

  1. Descriptive Statistics
  2. Probability
  3. Statistical Inference

Probability and Expectation presented more visually

The visualizations presented in the first part of the books are not so commonly used in the later parts of the books.

With new visualization tools, connections between topics can be made more effectively.

Probability and Expectation presented more visually

  • The concept of conditioning could be introduced earlier when aggregating for computing descriptive statistics and for making plots, within groups.
  • Some of the effort to compute (mathematically/using tables) z-scores and normal probabilities could be replaced with visualization of normal density plots. (This is a different discussion.)
  • Conditional plots related to standard statistical tests could be introduced much earlier with the expectation of formalizing the tests later. And the plots could be reinforced when arriving at the related topics later.
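The first point above, conditioning when aggregating within groups, can be sketched in a few lines of Python. The observations, the group labels, and the helper name `summarize_by_group` are hypothetical; this is one possible illustration, not a prescribed classroom implementation.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical (group, value) observations; illustrative values only.
observations = [
    ("A", 4.0), ("A", 5.0), ("A", 6.0),
    ("B", 7.0), ("B", 9.0), ("B", 11.0),
]

def summarize_by_group(observations):
    """Return {group: (n, mean, sample standard deviation)}."""
    groups = defaultdict(list)
    for group, value in observations:
        groups[group].append(value)
    return {g: (len(v), mean(v), stdev(v)) for g, v in groups.items()}

summary = summarize_by_group(observations)
for group, (n, m, s) in sorted(summary.items()):
    print(f"{group}: n={n}, mean={m:.2f}, sd={s:.2f}")
```

The per-group means and standard deviations are the same quantities that a two-sample t-test or ANOVA compares later in the course, so side-by-side plots of these groups can be revisited when the tests are formalized.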

Bubble charts

There are many excellent resources available online for learning about bubble charts.

One resource is by Nathan Yau, who runs the FlowingData blog/website and is the author of Visualize This.

blog post: How to make Bubble Charts

His tutorial is for R. He examines crime data by state.
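For readers who work in Python rather than R, the following is a minimal sketch of the same kind of chart, not a reproduction of the tutorial. It encodes four variables at once: x position, y position, bubble area (population), and color (category). The records are made up, the helper names `bubble_sizes` and `draw_bubble_chart` are hypothetical, and the plotting function assumes matplotlib is installed.

```python
# Hypothetical records: (label, x, y, population, category).
# All values are made up for illustration; they are not real data.
records = [
    ("Region 1", 2.0, 55.0, 1_000_000, "North"),
    ("Region 2", 4.5, 70.0, 4_000_000, "South"),
    ("Region 3", 3.0, 62.0, 2_000_000, "North"),
]

def bubble_sizes(populations, max_area=1000.0):
    """Scale populations so that marker AREA is proportional to population."""
    largest = max(populations)
    return [max_area * p / largest for p in populations]

def draw_bubble_chart(records, filename="bubbles.png"):
    """Plot x vs. y with bubble area for population and color for category."""
    import matplotlib  # assumption: matplotlib is available
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    labels, xs, ys, pops, cats = zip(*records)
    # One color from the default cycle per category.
    palette = {c: f"C{i}" for i, c in enumerate(sorted(set(cats)))}
    fig, ax = plt.subplots()
    # The `s` argument of scatter is the marker area in points squared.
    ax.scatter(xs, ys, s=bubble_sizes(pops),
               c=[palette[c] for c in cats], alpha=0.6)
    for label, x, y in zip(labels, xs, ys):
        ax.annotate(label, (x, y))
    ax.set_xlabel("x variable")
    ax.set_ylabel("y variable")
    fig.savefig(filename)

sizes = bubble_sizes([r[3] for r in records])
print(sizes)
```

Calling `draw_bubble_chart(records)` writes the chart to a PNG file. Scaling area rather than radius keeps a region with twice the population from looking four times as large; redrawing the chart for each time period would add time as a fifth variable.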

Bubble charts - population size

Maps

There are many excellent resources available online for learning to make maps. Example: using Google Sheets to map the murder rate by state.

Classroom Activities

At all levels of Statistics instruction, both in class and in books, bubble plots and maps could be introduced earlier.

These plots could be used to discuss conditioning earlier than the independence/dependence discussion in probability or the more advanced discussion of expectation/conditional expectation/conditional variance. The concept of conditioning could be integrated into the Descriptive Statistics section early in classes and books.

Undergraduate Statistics classes

Exercise 1: Watch the TED Talk The Best Stats You've Ever Seen, Jan. 16, 2007, by Hans Rosling. What are the variables that Hans plots in the dynamic bubble plot presented during the 4th to 5th minute of the video? What are the variables that Hans plots in the dynamic bubble plot presented during the 5th to 6th minute of the video? How many variables are plotted in each bubble plot? What relationships are presented?

Exercise 2: Find a dataset that includes a rate, proportion, or percentage of a variable for each state in the United States. Plot the data on a map of the United States and adjust the color to show Low, Medium, and High levels. Be sure to include an appropriate title and legend.
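One possible way to assign the Low, Medium, and High levels in Exercise 2 is to cut the rates at their terciles before coloring the map. The sketch below uses only the Python standard library; the placeholder keys `S1` to `S6`, the rates, the color codes, and the helper name `level_by_tercile` are all hypothetical, and other cut points (for example, fixed thresholds) would be equally valid.

```python
from statistics import quantiles

# Hypothetical rates keyed by placeholder state codes; illustrative only.
rates = {"S1": 1.0, "S2": 2.0, "S3": 3.0, "S4": 4.0, "S5": 5.0, "S6": 6.0}

# One fill color per level, light to dark (arbitrary choices).
colors = {"Low": "#fee8c8", "Medium": "#fdbb84", "High": "#e34a33"}

def level_by_tercile(rates):
    """Label each key Low, Medium, or High using the two tercile cut points."""
    low_cut, high_cut = quantiles(rates.values(), n=3)
    levels = {}
    for key, rate in rates.items():
        if rate <= low_cut:
            levels[key] = "Low"
        elif rate <= high_cut:
            levels[key] = "Medium"
        else:
            levels[key] = "High"
    return levels

levels = level_by_tercile(rates)
fill = {key: colors[level] for key, level in levels.items()}
print(levels)
```

The resulting `fill` mapping can be handed to whichever mapping tool the class uses, with the three levels listed in the legend.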

Statistics Major classes

Exercise 1: From the gapminder.org website, download the Gapminder World Offline and reproduce the bubble plots Hans presents in the TED Talk The Best Stats You've Ever Seen, Jan. 16, 2007, by Hans Rosling.

Exercise 2: Does the Gapminder World Offline include any maps? How are they color coded?

Exercise 3: Watch the 6th to the 9th minute of the video and examine the use of conditional densities and density estimation. What are the variables that Hans plots in the dynamic density plot?

Master's level Statistics classes

Exercise 1: Try the googleVis library in R to make a bubble plot.

Exercise 2: Use Tableau Public to plot data on a map. Use plotly to make a map.

Exercise 3: Read the paper Product plots by Hadley Wickham and Heike Hofmann.

Conclusions

  • The use of more visualization in the early Descriptive Statistics part of an introductory course could better inform the ideas presented in the Probability part of the course.
  • The use of more visualization in the Probability part of a Statistics course could better inform the ideas presented in the Statistical Inference part of the course.
  • Students relate to visualization.

References