Data Visualization

M. Drew LaMar
September 4, 2019

“Numerical quantities focus on expected values, graphical summaries on unexpected values.”

- John Tukey

Course Announcements

  • No reading quiz for Friday!
  • Lab #2 and Homework #2 are posted on Blackboard
    • Updated Lab #2!
    • Complete the following DataCamp course: Communicating with Data in the Tidyverse [only Chapter 3: Introduction to RMarkdown]
  • Reminder: We will discuss more information regarding Homework #2 in lab this week
  • Office hours:
    • Instructor: M 11 am; R 1 pm
    • TAs: F 9 am

Data as Information (recap)

For your question, there is desired and undesired information in your data.

Goals:

  • Get accurate information by reducing bias
  • Get precise information by reducing sampling error due to random variation (increase signal-to-noise ratio)

Definition: Bias is a systematic discrepancy between the estimates we would obtain, if we could sample a population again and again, and the true population characteristic.

Definition: Sampling error is the difference between an estimate and the population parameter being estimated caused by chance.
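
The two definitions above can be illustrated with a small simulation. The sketch below uses only the Python standard library. The population (normal, mean 50, SD 10), the sample size of 25, and the "biased" scheme that can only reach the upper half of the population are all assumptions invented for illustration, not part of the lecture.

```python
import random
import statistics

def simulate_estimates(population, sample_size, n_repeats, rng, biased=False):
    """Return the sample means obtained by sampling the population again and again."""
    # Hypothetical biased scheme: only the larger half of the population can be reached.
    pool = sorted(population)[len(population) // 2:] if biased else population
    return [statistics.mean(rng.sample(pool, sample_size)) for _ in range(n_repeats)]

rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.gauss(50, 10) for _ in range(10_000)]  # assumed population
true_mean = statistics.mean(population)

summary = {}
for label, biased in [("random", False), ("biased", True)]:
    estimates = simulate_estimates(population, 25, 2000, rng, biased)
    bias = statistics.mean(estimates) - true_mean   # systematic discrepancy
    spread = statistics.stdev(estimates)            # chance variation between samples
    summary[label] = (bias, spread)
    print(f"{label:>6}: bias = {bias:6.2f}, spread of estimates = {spread:5.2f}")
```

Under these assumptions, the random scheme should show a bias near zero, while the biased scheme is shifted well above the true mean. The spread of the estimates reflects sampling error, which is present in both schemes.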

Precision vs Accuracy

Random sampling

The main assumption of all statistical techniques is that your data come from a random sample.

Definition: In a random sample, each member of a population has an equal and independent chance of being selected.


Random sampling

  1. minimizes bias (equal chance of selection) and
  2. makes it possible to quantify sampling error, that is, to measure precision (independent selection)
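
As a rough check on the "equal chance" part of the definition, the sketch below draws many simple random samples and counts how often each member is selected. The population of 20 labelled individuals, the sample size of 5, and the number of repeated draws are assumptions chosen for illustration. Note that `random.sample` draws without replacement, which is the usual way a simple random sample is taken.

```python
import random
from collections import Counter

rng = random.Random(1)          # fixed seed so the run is repeatable
population = list(range(20))    # hypothetical labels for 20 individuals
sample_size = 5
n_draws = 20_000

# Count how often each individual ends up in a sample.
counts = Counter()
for _ in range(n_draws):
    counts.update(rng.sample(population, sample_size))

# With equal chances, each individual is expected in sample_size / N of the samples.
expected = n_draws * sample_size / len(population)
for member in population:
    print(f"individual {member:2d}: selected {counts[member]} times (expected about {expected:.0f})")
```

If selection chances were unequal, some individuals would be picked far more often than others, and estimates based on the sample would lean toward them.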

Random sampling (Class discussion)

In a recent study, researchers took electrophysiological measurements from the brains of two rhesus macaques (monkeys). Forty neurons were tested in each monkey, yielding a total of 80 measurements.

  1. Do the 80 neurons constitute a random sample? Why or why not?

    No. Neurons measured in the same monkey are not independent of one another, so the sample lacks independence.

  2. If the 80 measurements were analyzed as though they constituted a random sample, what consequences would this have for the estimate of the measurement in the monkey population?

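    The precision of the estimate would be misjudged: the sampling error would most likely be underestimated, so the estimate would appear more precise than it really is.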
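
The consequence in question 2 can be sketched with a simulation. The study design (2 monkeys, 40 neurons each) comes from the example above; everything else is an assumption made for illustration, including the normal distributions, the between-monkey and within-monkey standard deviations of 1.0, and the population mean of 0. The code compares the standard error computed as if all 80 measurements were independent with the actual variation of the estimate across many repeated studies.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

BETWEEN_MONKEY_SD = 1.0        # assumed variation among monkeys
WITHIN_MONKEY_SD = 1.0         # assumed variation among neurons within a monkey
N_MONKEYS, N_NEURONS = 2, 40   # design described in the study

def run_study():
    """Simulate one study; return (estimate, naive standard error)."""
    measurements = []
    for _ in range(N_MONKEYS):
        monkey_mean = rng.gauss(0.0, BETWEEN_MONKEY_SD)
        measurements += [rng.gauss(monkey_mean, WITHIN_MONKEY_SD) for _ in range(N_NEURONS)]
    estimate = statistics.mean(measurements)
    # Naive SE: treats all 80 neurons as an independent random sample.
    naive_se = statistics.stdev(measurements) / len(measurements) ** 0.5
    return estimate, naive_se

results = [run_study() for _ in range(2000)]
actual_sd = statistics.stdev(est for est, _ in results)   # true sampling variation
mean_naive_se = statistics.mean(se for _, se in results)  # what the naive analysis reports

print(f"actual SD of the estimate across studies: {actual_sd:.2f}")
print(f"average naive standard error:             {mean_naive_se:.2f}")
```

Under these assumptions the naive standard error comes out much smaller than the actual variation of the estimate, because the effective sample size is closer to the number of monkeys than to the number of neurons.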

Why is data visualization important?

Communicating with data visualization

Data is beautiful!

Data is ugly!