Data Visualization

M. Drew LaMar
January 25, 2017

“Numerical quantities focus on expected values, graphical summaries on unexpected values.”

- John Tukey

Course Announcements

  • No reading quiz for Friday!
  • Lab #2 is posted on Blackboard
  • You should have gotten an email from DataCamp to join our class group
  • We will discuss more information regarding Homework #2 in lab this week

In the news - Cervical cancer

Quote: We don’t include men in our calculation because they are not at risk for cervical cancer and, by the same measure, we shouldn’t include women who don’t have a cervix.
- Anne F. Rositch

In the news - Cervical cancer

Quote: The researchers found that black women have a mortality rate of 10.1 per 100,000. For white women, the rate is 4.7 per 100,000. Past estimates had those rates at 5.7 and 3.2, respectively. The new death rate for black women in the US is on par with that of developing countries.

Discuss: Does this warrant the “much deadlier” headline?

Discuss: Why do you think the death rate for black women in the US is higher?
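
As one possible starting point for the discussion, the quoted rates can be compared directly. This is only a sketch: the variable names are made up for illustration, and the numbers are the four rates quoted above.

```r
# Mortality rates per 100,000 women, as quoted above
new_rate <- c(black = 10.1, white = 4.7)
old_rate <- c(black = 5.7,  white = 3.2)

new_rate["black"] / new_rate["white"]   # rate ratio, new estimates
old_rate["black"] / old_rate["white"]   # rate ratio, past estimates
new_rate - old_rate                     # absolute change per 100,000
new_rate / old_rate                     # relative change within each group
```

The rate ratio between the two groups is roughly 2.1 under the new estimates and roughly 1.8 under the past ones. Whether that justifies the headline is still a judgment call.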

Why is data visualization important?

Communicating with data visualization

Data is beautiful!

Data is ugly!

What is data?

Definition: Variables are characteristics that differ among objects of interest.

Definition: Data are the measurements of one or more variables made on a sample of objects of interest.

Essentially, any measurement of the real world is data, since

  • \( n=1 \) counts as a sample, and
  • a variable can technically have only one possible value (i.e., no variation).

Types of data

  • Categorical variable (qualitative)

    • Nominal (levels have no inherent ordering)
    • Ordinal (levels have an inherent ordering)

    Remember the factor data type in R?

  • Numerical variable (quantitative)

    • Continuous
    • Discrete

    Remember the numeric data type in R?
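
A minimal sketch of how these four types might be represented in R. All values and variable names below are hypothetical and chosen only for illustration.

```r
# Nominal: levels have no inherent order (hypothetical eye colors)
eye_color <- factor(c("brown", "blue", "green", "brown"))

# Ordinal: levels have an inherent order (hypothetical life stages)
life_stage <- factor(c("adult", "egg", "larva", "adult"),
                     levels = c("egg", "larva", "adult"),
                     ordered = TRUE)

# Numerical: continuous (body mass in grams) and discrete (clutch size)
body_mass   <- c(12.4, 15.1, 13.8, 14.0)
clutch_size <- c(3L, 5L, 4L, 4L)

is.ordered(eye_color)            # FALSE
is.ordered(life_stage)           # TRUE
life_stage[2] < life_stage[1]    # TRUE: "egg" comes before "adult"
class(body_mass)                 # "numeric"
class(clutch_size)               # "integer"
```

Note that comparisons such as `<` are only meaningful for the ordered factor. R does not itself distinguish continuous from discrete; storing counts as integers is just one convention.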

Types of data (Class discussion)

Discuss: Would the fraction of birds in a large sample infected with avian flu virus be a discrete or continuous numerical variable?

Answer: Neither! The variable of interest here is actually categorical (nominal). Why?

Ask yourself the following questions:

  • What is the population of interest?
  • What measurement is being taken on objects in the population?
  • What are the characteristics of this measurement (i.e. data type)?
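
The bird example can be sketched in R as follows, assuming a made-up sample of six birds. The measurement taken on each bird is its infection status, which is categorical; the fraction infected is a summary computed from that variable, not a measurement on any single bird.

```r
# Hypothetical sample of six birds; each bird is either infected or not
infected <- factor(c("yes", "no", "no", "yes", "no", "no"),
                   levels = c("no", "yes"))

table(infected)                # counts per category
prop.table(table(infected))    # fraction per category
mean(infected == "yes")        # fraction infected in this sample
```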

Plots and data types

Frequency distributions of univariate data

Type of data    Graphical method
Categorical     Bar graph
Numerical       Histogram
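
A short sketch of both plot types in base R. The choice of R's built-in `chickwts` data set (chick weights by feed type) is an assumption made for illustration; it is not part of the lecture.

```r
# Categorical variable: feed type -> bar graph of frequencies
feed_counts <- table(chickwts$feed)
barplot(feed_counts, xlab = "Feed type", ylab = "Frequency")

# Numerical variable: chick weight -> histogram
hist(chickwts$weight, xlab = "Weight (g)", main = "")
```

The bar graph needs the counts computed first with `table()`, while `hist()` bins the raw numerical values itself.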

Plots and data types

Showing association of bivariate data

Type of data       Graphical method
Two numerical      Scatter plot
                   Line plot
                   Map
Two categorical    Grouped bar graph
                   Mosaic plot
Mixed              Strip chart
                   Box plot
                   Multiple histograms
                   Cumulative frequency distributions
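
Several of these methods can be tried with base R alone. The sketch below uses the built-in data sets `cars`, `HairEyeColor`, and `chickwts`; these were picked only for illustration and are not the lecture's data.

```r
# Two numerical variables: scatter plot
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")

# Two categorical variables: grouped bar graph and mosaic plot
hair_eye <- margin.table(HairEyeColor, c(1, 2))   # collapse over Sex
barplot(hair_eye, beside = TRUE, legend.text = TRUE,
        xlab = "Eye color", ylab = "Frequency")
mosaicplot(hair_eye, main = "", xlab = "Hair color", ylab = "Eye color")

# Mixed (one categorical, one numerical): strip chart and box plot
stripchart(weight ~ feed, data = chickwts, method = "jitter",
           vertical = TRUE, pch = 1, ylab = "Weight (g)")
boxplot(weight ~ feed, data = chickwts,
        xlab = "Feed type", ylab = "Weight (g)")
```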

Visualize before you analyze!!!

Data visualization is one step in exploratory data analysis.

Quote: …the first step in any data analysis or statistical procedure is to graph the data and look at it. Humans are a visual species, with brains evolved to process visual information. Take advantage of millions of years of evolution, and look at visual representations of your data before doing anything else.
- Whitlock & Schluter

Important!

W&S 4 Rules of Graphing

  • Show the data.
  • Make patterns in the data easy to see.
  • Represent magnitudes honestly.
  • Draw graphical elements clearly.
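
As a rough illustration of the first two rules and of clear labeling, the sketch below plots every observation and then overlays the group means. It assumes the built-in `chickwts` data set, which is not part of the lecture.

```r
# Show the data: plot every observation, jittered to reduce overlap
stripchart(weight ~ feed, data = chickwts, method = "jitter",
           vertical = TRUE, pch = 1, col = "gray40",
           xlab = "Feed type", ylab = "Weight (g)")

# Make the pattern easy to see: overlay each group's mean as a solid point
group_means <- tapply(chickwts$weight, chickwts$feed, mean)
points(seq_along(group_means), group_means, pch = 19, cex = 1.5)
```

A bar graph of the six means alone would hide the spread within each feed type, which is why the individual points are drawn first.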

If you want to graph some data, you most likely will need to manipulate the data first to put it in the right form.
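
A small sketch of one common manipulation: reshaping data from one column per group ("wide") to one row per measurement ("long"), which is the form that formula-based plotting functions such as `boxplot()` expect. The data frame and its column names are hypothetical.

```r
# Hypothetical measurements stored "wide": one column per group
wide <- data.frame(control   = c(4.1, 3.8, 4.4, 4.0),
                   treatment = c(5.2, 4.9, 5.6, 5.1))

# Reshape to "long": one row per measurement, plus a group label
long <- stack(wide)
names(long) <- c("response", "group")

str(long)   # 8 rows: a numeric response and a two-level factor
boxplot(response ~ group, data = long, ylab = "Response")
```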