Note. The content is based on Wickham (2017), chapter 5.

knitr::opts_chunk$set(
    message = FALSE,
    warning = FALSE,
    include = FALSE
)
library(tidyverse)

To illustrate typical questions about data, we use the diamonds data set (to be described on a web page via a link from here).

Two very important questions for any exploratory analysis are:

  1. How do the values in one variable vary? (What is the distribution of values?)
  2. How do two variables co-vary?

1. Variation of values in one variable

If a variable is categorical, the distribution of values can be visualised with a bar chart:

ggplot(diamonds, aes(x = cut)) +
  geom_bar()

The counts behind the bars can also be calculated with dplyr’s count() function:

diamonds %>% 
  count(cut)
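Raw counts are often easier to compare as shares of the total. The following is a minimal sketch of that idea; the names cut_shares and share are illustrative and not part of the original workbook.

```r
library(tidyverse)

# Count the diamonds per cut, then express each count as a share of all rows.
# `cut_shares` and `share` are illustrative names chosen for this sketch.
cut_shares <- diamonds %>%
  count(cut) %>%
  mutate(share = n / sum(n))

cut_shares
```

Since count() returns an ordinary tibble, the shares can be passed on to further dplyr steps or to ggplot() like any other data.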

For numerical variables, the distribution can be inspected with a histogram.

ggplot(data = diamonds) + 
  geom_histogram(mapping = aes(x = carat), binwidth = 0.5)

A histogram divides the x-axis into equally spaced bins and then uses the height of each bar to display the number of observations that fall in each bin. The width of the bins can be set in the geom with the binwidth argument (or, alternatively, their number with the bins argument).
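The bar heights of a histogram can also be computed by hand, which helps to check what the plot shows. The sketch below assumes that ggplot2’s cut_width() with the same width of 0.5 yields the same bins as the histogram above; carat_bins and bin are illustrative names.

```r
library(tidyverse)

# Assign each diamond to a carat interval of width 0.5 and count the rows per
# interval. cut_width() comes from ggplot2; `carat_bins` and `bin` are
# illustrative names chosen for this sketch.
carat_bins <- diamonds %>%
  count(bin = cut_width(carat, width = 0.5))

carat_bins
```

Each row of the result corresponds to one bar: bin gives the interval and n the number of observations in it.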

To compare multiple distributions visually in one graph, a frequency polygon is a suitable format:

ggplot(diamonds, aes(x = carat, color = cut)) + 
  geom_freqpoly(binwidth = 0.1)
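When the groups differ a lot in size, the shapes of the smaller groups are hard to see on a count scale. One possible variation, sketched here under the assumption that ggplot2 3.3.0 or later is installed (for after_stat()), puts density on the y-axis so that the area under each polygon is one.

```r
library(tidyverse)

# Same frequency polygon as before, but with density instead of count on the
# y-axis, so every cut is standardised to the same total area.
# after_stat() requires ggplot2 >= 3.3.0 (an assumption of this sketch).
p <- ggplot(diamonds, aes(x = carat, y = after_stat(density), color = cut)) +
  geom_freqpoly(binwidth = 0.1)

p
```

This view compares the shapes of the distributions and hides how many diamonds each cut contains, so it complements the count version without replacing it.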

Typical values

  • Which values are the most common? Why?
  • Which values are rare? Why? Does that match your expectations?
  • Can you see any unusual patterns? What might explain them?

Let’s look at this in the context of diamonds of less than 3 carats:

smaller <- diamonds %>% 
  filter(carat < 3)
ggplot(data = smaller, mapping = aes(x = carat)) + 
  geom_histogram(binwidth = 0.01)
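The question of which values are the most common can also be answered numerically. A minimal sketch, in which common_carats is an illustrative name, counts how often each carat value occurs and sorts the result:

```r
library(tidyverse)

# Recreate the subset used above so this chunk runs on its own.
smaller <- diamonds %>%
  filter(carat < 3)

# Count each distinct carat value and put the most frequent ones first.
# `common_carats` is an illustrative name chosen for this sketch.
common_carats <- smaller %>%
  count(carat, sort = TRUE)

head(common_carats, 10)
```

Comparing the top rows of this table with the peaks in the histogram is one way to check whether the spikes fall on particular carat values.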

Unusual values

Missing values

2. Covariation
