Anscombe's Quartet

Alexander Carlton
March 31st, 2016

In 1973, Francis Anscombe published a paper urging his fellow statisticians to look at their data graphically rather than relying only on summary statistics.
(Note: Wikipedia has some background on Anscombe's Quartet.)

In this AnscombesQuartet Shiny App, we invite readers to see for themselves the difference between a tabular and a graphical view of data, and to experiment with the values in Anscombe's Quartet to see whether they can construct data series whose statistics match just as closely.

Anscombe's Tables versus Anscombe's Plots

The cornerstone of his paper was a dataset of four X,Y data series, available in R as the built-in anscombe data frame.

anscombe
   x1 x2 x3 x4    y1   y2    y3    y4
1  10 10 10  8  8.04 9.14  7.46  6.58
2   8  8  8  8  6.95 8.14  6.77  5.76
3  13 13 13  8  7.58 8.74 12.74  7.71
4   9  9  9  8  8.81 8.77  7.11  8.84
5  11 11 11  8  8.33 9.26  7.81  8.47
6  14 14 14  8  9.96 8.10  8.84  7.04
7   6  6  6  8  7.24 6.13  6.08  5.25
8   4  4  4 19  4.26 3.10  5.39 12.50
9  12 12 12  8 10.84 9.13  8.15  5.56
10  7  7  7  8  4.82 7.26  6.42  7.91
11  5  5  5  8  5.68 4.74  5.73  6.89

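All four X,Y series have nearly identical summary statistics: the X columns share the same mean and variance, the Y means and variances agree to within about 0.01, and even the fitted linear models are nearly the same.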
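A quick way to check this claim is to compute the mean and variance of every column of the built-in anscombe data frame, which ships with R's datasets package. This is a minimal sketch using only base R:

```r
# Mean and variance of each x and y column of Anscombe's data.
quartet <- datasets::anscombe

summary_stats <- data.frame(
  mean     = sapply(quartet, mean),  # column means, named x1..x4, y1..y4
  variance = sapply(quartet, var)    # sample variances (n - 1 denominator)
)

# Rounded for display; the x rows are identical and the y rows nearly so.
print(round(summary_stats, 2))
```

Every x column has mean 9 and variance 11, while the y columns all have a mean of about 7.50 and a variance of about 4.12 to 4.13.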

But once plotted, the four series are obviously very different.

[Figure: X,Y scatter plots of the four series in Anscombe's Quartet]
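The code that produced the original figure is not shown. A minimal base-R sketch of a comparable figure draws the four series in a 2 x 2 grid, each with its least-squares line; the shared axis limits and the point and line styles are choices made here, not taken from the original:

```r
# Draw each X,Y series as a scatter plot with its fitted regression line.
quartet <- datasets::anscombe

op <- par(mfrow = c(2, 2), mar = c(4, 4, 1, 1))  # 2 x 2 panel layout
for (i in 1:4) {
  x <- quartet[[paste0("x", i)]]
  y <- quartet[[paste0("y", i)]]
  # Same axis limits in every panel so the shapes can be compared directly.
  plot(x, y, xlim = c(3, 20), ylim = c(2, 14), pch = 19,
       xlab = paste0("x", i), ylab = paste0("y", i))
  abline(lm(y ~ x), col = "red")  # least-squares line for this series
}
par(op)  # restore the previous graphics settings
```

Because all four red lines are essentially the same, the differences between the panels come entirely from how the points are arranged around them.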

Linear Models of Anscombe's Quartet

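  • Fitting the linear models is straightforward:
fit <- vector("list", 4)   # storage for the four fitted models
for (i in 1:4) {
    # column i holds x_i and column i + 4 holds the matching y_i
    fit[[i]] <- lm(y ~ x, data = data.frame(x = anscombe[,i], y = anscombe[,i+4]))
}
summary(fit[[1]])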

Call:
lm(formula = y ~ x, data = data.frame(x = anscombe[, i], y = anscombe[, 
    i + 4]))

Residuals:
     Min       1Q   Median       3Q      Max 
-1.92127 -0.45577 -0.04136  0.70941  1.83882 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)   3.0001     1.1247   2.667  0.02573 * 
x             0.5001     0.1179   4.241  0.00217 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.237 on 9 degrees of freedom
Multiple R-squared:  0.6665,    Adjusted R-squared:  0.6295 
F-statistic: 17.99 on 1 and 9 DF,  p-value: 0.00217

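What is surprising is how closely the four fitted models agree, statistic by statistic. (RSS is the residual sum of squares, Regres_SS the regression sum of squares, and SS_of_X the sum of squared deviations of X from its mean.)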

           X1,Y1  X2,Y2  X3,Y3  X4,Y4
Corr        0.82   0.82   0.82   0.82
Intercept   3.00   3.00   3.00   3.00
Slope       0.50   0.50   0.50   0.50
Slope_SE    0.12   0.12   0.12   0.12
R_Squared   0.67   0.67   0.67   0.67
RSS        13.76  13.78  13.76  13.74
Regres_SS  27.51  27.50  27.47  27.49
SS_of_X   110.00 110.00 110.00 110.00
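The statistics in the table above can be recomputed with base R. This is a sketch, not the app's own code; the helper name pair_stats is hypothetical, and the row names are chosen to match the table:

```r
# Recompute the comparison table: one column per X,Y pair, one row per statistic.
quartet <- datasets::anscombe

pair_stats <- function(x, y) {   # hypothetical helper
  fit <- lm(y ~ x)
  s   <- summary(fit)
  c(Corr      = cor(x, y),
    Intercept = unname(coef(fit)[1]),
    Slope     = unname(coef(fit)[2]),
    Slope_SE  = s$coefficients[2, "Std. Error"],
    R_Squared = s$r.squared,
    RSS       = sum(residuals(fit)^2),           # residual sum of squares
    Regres_SS = sum((fitted(fit) - mean(y))^2),  # regression (explained) SS
    SS_of_X   = sum((x - mean(x))^2))            # squared deviations of x
}

# sapply() binds the four named vectors into an 8 x 4 matrix.
stats_table <- sapply(1:4, function(i)
  pair_stats(quartet[[paste0("x", i)]], quartet[[paste0("y", i)]]))
colnames(stats_table) <- paste0("X", 1:4, ",Y", 1:4)
print(round(stats_table, 2))
```

Printing without round() shows that the agreement holds only to two or three significant digits, which is the kind of drift the app lets the reader explore.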

This close agreement between linear models fitted to such visibly different data series is the central 'gotcha' of Anscombe's Quartet.

Why Use The AnscombesQuartet App?

  • In the AnscombesQuartet Shiny app, the user can experiment with the values in Anscombe's Quartet:

    • Work with the raw data.
      • See how small differences cause the matching statistics to drift from each other.
      • See what happens to the statistics or the charts as the outliers are brought into line.
    • Perhaps see if the quartet can be improved.
      • Can more statistics be made to match across the four datasets?
      • Can the precision of the matches be improved?
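The same kind of experiment can be tried outside the app. The sketch below assumes one particular edit: the outlier in series 3 (row 3, y3 = 12.74) is moved to 8.50, a value read off the straight line traced by the other ten points. It then compares the fits before and after:

```r
# Hypothetical edit: bring the outlier in series 3 into line, then refit.
quartet <- datasets::anscombe
before  <- lm(y3 ~ x3, data = quartet)

edited <- quartet
edited$y3[3] <- 8.50  # assumed "in line" value, not from the original data

after <- lm(y3 ~ x3, data = edited)

# Intercept and slope before and after the edit.
print(rbind(before = coef(before), after = coef(after)))

# Correlation before and after the edit.
print(c(before = cor(quartet$x3, quartet$y3),
        after  = cor(edited$x3, edited$y3)))
```

With the outlier removed, the slope falls from about 0.50 to about 0.35 and the correlation rises to nearly 1, so series 3 no longer matches the other three.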

Operation of the AnscombesQuartet App

The primary tool in the AnscombesQuartet app is a live table of data values.

  • Click a cell in the table to change its value. The new value is committed once the user moves to another cell, and the rest of the app then updates to reflect it.

The rest of the app is a pair of tabs:

  • The 'Tables' tab displays tables of statistics calculated from the editable 'hot' table.
  • The 'Plots' tab displays each of the four series as an X,Y scatter plot.
  • All tables and plots are updated whenever a cell in the 'hot' table changes.

Reloading the app resets the main table to Anscombe's original values.