Alexander Carlton
March 31st, 2016
In 1973, Francis Anscombe published a paper pressing his fellow statisticians to look at their data graphically, not just calculate some summary statistics.
(Note: Wikipedia has some background on Anscombe's Quartet.)
In this AnscombesQuartet Shiny app, we invite readers to see for themselves the difference between a tabular and a graphical view of data, and to play with the values in Anscombe's Quartet to see whether they can come up with other values that produce similarly matched calculations.
The cornerstone of his paper was a dataset with four X,Y data series.
anscombe
x1 x2 x3 x4 y1 y2 y3 y4
1 10 10 10 8 8.04 9.14 7.46 6.58
2 8 8 8 8 6.95 8.14 6.77 5.76
3 13 13 13 8 7.58 8.74 12.74 7.71
4 9 9 9 8 8.81 8.77 7.11 8.84
5 11 11 11 8 8.33 9.26 7.81 8.47
6 14 14 14 8 9.96 8.10 8.84 7.04
7 6 6 6 8 7.24 6.13 6.08 5.25
8 4 4 4 19 4.26 3.10 5.39 12.50
9 12 12 12 8 10.84 9.13 8.15 5.56
10 7 7 7 8 4.82 7.26 6.42 7.91
11 5 5 5 8 5.68 4.74 5.73 6.89
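The same data ships with R as the built-in `anscombe` data frame (from the standard datasets package), so the per-pair means, variances, and correlations can be checked directly. The following is a minimal sketch; `pair_stats` is just an illustrative name.

```r
# Summary statistics for each of the four X,Y pairs in R's built-in
# `anscombe` data frame (from the standard datasets package).
pair_stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]   # column x1 .. x4
  y <- anscombe[[paste0("y", i)]]   # column y1 .. y4
  c(mean_x = mean(x), var_x = var(x),
    mean_y = mean(y), var_y = var(y),
    cor_xy = cor(x, y))
})
colnames(pair_stats) <- paste0("X", 1:4, ",Y", 1:4)
round(pair_stats, 2)
```

Every column shows a mean of X of 9 and a variance of X of 11, a mean of Y of about 7.50, and an X,Y correlation of about 0.82.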
Each of the X,Y sets has very similar statistics: the means and variances are nearly the same, and even their fitted linear models are similar.
But once plotted, each of these X,Y series is obviously very different.
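A base-graphics sketch of that plotting step draws the four series in a 2x2 grid with the same axis limits and overlays each series' least-squares line. The axis limits and colors here are arbitrary choices for this illustration, not taken from the original app.

```r
# Draw the four X,Y series side by side, each with its fitted regression line.
old_par <- par(mfrow = c(2, 2), mar = c(4, 4, 2, 1))  # save settings we change
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  plot(x, y, xlim = c(3, 20), ylim = c(2, 14), pch = 19,
       xlab = paste0("x", i), ylab = paste0("y", i),
       main = paste0("X", i, ",Y", i))
  abline(lm(y ~ x), col = "red")  # the (nearly identical) least-squares line
}
par(old_par)  # restore the previous graphics settings
```

Using shared axis limits keeps the four panels directly comparable, so the differences in shape (a curve, an outlier, a single high-leverage point) are not an artifact of scaling.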
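fit <- vector("list", 4)  # one lm() fit per X,Y pair
for (i in 1:4) {
  # column i holds x_i and column i + 4 holds the matching y_i
  fit[[i]] <- lm(y ~ x, data = data.frame(x = anscombe[, i], y = anscombe[, i + 4]))
}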
summary(fit[[1]])
Call:
lm(formula = y ~ x, data = data.frame(x = anscombe[, i], y = anscombe[,
i + 4]))
Residuals:
Min 1Q Median 3Q Max
-1.92127 -0.45577 -0.04136 0.70941 1.83882
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.0001 1.1247 2.667 0.02573 *
x 0.5001 0.1179 4.241 0.00217 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.237 on 9 degrees of freedom
Multiple R-squared: 0.6665, Adjusted R-squared: 0.6295
F-statistic: 17.99 on 1 and 9 DF, p-value: 0.00217
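What is surprising is how closely the statistics of all four fitted models match. (In the table below, RSS is the residual sum of squares, Regres_SS is the regression sum of squares, and SS_of_X is the sum of squared deviations of X about its mean.)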
X1,Y1 X2,Y2 X3,Y3 X4,Y4
Corr 0.82 0.82 0.82 0.82
Intercept 3.00 3.00 3.00 3.00
Slope 0.50 0.50 0.50 0.50
Slope_SE 0.12 0.12 0.12 0.12
R_Squared 0.67 0.67 0.67 0.67
RSS 13.76 13.78 13.76 13.74
Regres_SS 27.51 27.50 27.47 27.49
SS_of_X 110.00 110.00 110.00 110.00
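A sketch of how a table like this can be computed in R, with one `lm()` fit per pair. The helper name `quartet_stats` is hypothetical, and the row names were chosen to mirror the table; the original post does not show the code that produced it.

```r
# Recompute the comparison table: one least-squares fit per X,Y pair.
# `quartet_stats` is a hypothetical helper name used only for this sketch.
quartet_stats <- function(d) {
  sapply(1:4, function(i) {
    x <- d[[paste0("x", i)]]
    y <- d[[paste0("y", i)]]
    fit <- lm(y ~ x)
    s <- summary(fit)
    rss <- sum(residuals(fit)^2)          # residual sum of squares
    tss <- sum((y - mean(y))^2)           # total sum of squares of Y
    c(Corr      = cor(x, y),
      Intercept = unname(coef(fit)[1]),
      Slope     = unname(coef(fit)[2]),
      Slope_SE  = s$coefficients[2, "Std. Error"],
      R_Squared = s$r.squared,
      RSS       = rss,
      Regres_SS = tss - rss,              # regression (explained) sum of squares
      SS_of_X   = sum((x - mean(x))^2))   # spread of X about its mean
  })
}

tab <- quartet_stats(anscombe)
colnames(tab) <- paste0("X", 1:4, ",Y", 1:4)
round(tab, 2)
```

Because the function takes the data frame as an argument, the same helper can be rerun on an edited copy of the data.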
Such a close match between the linear models for such obviously different data series is one of the 'gotcha' aspects of Anscombe's Quartet.
In the AnscombesQuartet Shiny app, the user is given a chance to play around with the values in Anscombe's Quartet.
The primary tool in the AnscombesQuartet app is a live table of data values.
The rest of the app is a pair of tabs:
Reloading the app will cause the main table to reset back to Anscombe's original values.
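The app's own code is not reproduced here. As a rough illustration of what one edit to the live table does, here is a plain-R stand-in that changes a single cell, refits, and then resets to the original values. The names `slope_and_r2` and `quartet` are hypothetical, and this is not how the Shiny app is implemented.

```r
# Hypothetical plain-R stand-in for editing one cell of the app's live table;
# `slope_and_r2` and `quartet` are illustrative names, not the app's code.
slope_and_r2 <- function(x, y) {
  fit <- lm(y ~ x)
  c(slope = unname(coef(fit)[2]), r_squared = summary(fit)$r.squared)
}

quartet <- anscombe                      # working copy, like the editable table
before  <- slope_and_r2(quartet$x4, quartet$y4)

quartet$y4[8] <- 7.00                    # pull the lone x4 = 19 point down to the others' level
after   <- slope_and_r2(quartet$x4, quartet$y4)

rbind(before, after)                     # the slope drops from about 0.5 to about 0
quartet <- datasets::anscombe            # "reload": back to Anscombe's original values
```

In the fourth series, every x value except one is 8, so the single point at x = 19 alone determines the slope. Editing that one cell is enough to erase the apparent linear relationship.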