I restructured the data from Wikipedia’s Anscombe’s Quartet in Excel to create a new .xlsx file with 44 observations of 3 variables, and read the new file into my main data frame.
Note: My new file, called “aquartet.xlsx”, is included in the submitted zip file.
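# Packages used below: readxl (import), dplyr (summaries), ggplot2 (plots), kableExtra (table)
library(readxl)
library(dplyr)
library(ggplot2)
library(kableExtra)
df <- read_excel("aquartet.xlsx")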
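For readers without the .xlsx file, the same long layout can probably be rebuilt from the `anscombe` data set that ships with base R. This is only a sketch, not the Excel step I actually used: the name `df_alt` is made up for illustration, and the column names `id`, `x`, and `y` are assumed from the code in this report.

```r
# Sketch: stack the four x/y column pairs of the built-in `anscombe`
# data frame into one long table (assumed columns: id, x, y).
df_alt <- do.call(rbind, lapply(1:4, function(i) {
  data.frame(
    id = paste("Dataset", i),
    x  = anscombe[[paste0("x", i)]],
    y  = anscombe[[paste0("y", i)]]
  )
}))

dim(df_alt)  # rows, then columns
```

If the values match the spreadsheet, `df_alt` could stand in for `df` in the rest of the code.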
I created a new data frame for just the summary statistics, so that I could easily reference them in the plots below. I formatted the numbers in the summary statistics data frame to show 4 decimal places wherever these stats are referenced.
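# Per-dataset summary statistics. format() converts the numeric columns to
# character strings with 4 decimal places, so sumstat is for display only.
sumstat <- df %>%
  group_by(id) %>%
  summarise(meanX = mean(x),
            meanY = mean(y),
            sdX = sd(x),
            sdY = sd(y),
            rsr = summary(lm(y ~ x))$r.squared) %>%
  mutate_if(is.numeric, format, digits = 4, nsmall = 4)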
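The R^2 value taken from `summary(lm(y ~ x))` can be cross-checked: for a simple linear regression with one predictor, R^2 equals the squared Pearson correlation between x and y. A minimal sketch of that check, using the first data set from R's built-in `anscombe` data (assumed to match the spreadsheet values); the names `r2_lm` and `r2_cor` are only illustrative.

```r
# Sketch: compare R^2 from lm() with the squared correlation.
x <- anscombe$x1
y <- anscombe$y1

r2_lm  <- summary(lm(y ~ x))$r.squared  # R^2 reported by the fitted model
r2_cor <- cor(x, y)^2                   # squared Pearson correlation

all.equal(r2_lm, r2_cor)  # compares the two up to numerical tolerance
```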
I created a table for the summary data for easy viewing, using kableExtra.
kbl(sumstat,
    col.names = c("Dataset", "Mean X", "Mean Y", "STD X", "STD Y", "R^2")) %>%
  kable_material("hover")
| Dataset | Mean X | Mean Y | STD X | STD Y | R^2 |
|---|---|---|---|---|---|
| Dataset 1 | 9.0000 | 7.5009 | 3.3166 | 2.0316 | 0.6665 |
| Dataset 2 | 9.0000 | 7.5009 | 3.3166 | 2.0317 | 0.6662 |
| Dataset 3 | 9.0000 | 7.5000 | 3.3166 | 2.0304 | 0.6663 |
| Dataset 4 | 9.0000 | 7.5009 | 3.3166 | 2.0306 | 0.6667 |
I used ggplot to create 4 scatter plots, and displayed them using the facet_wrap function. I used geom_label to directly add the summary statistics to each plot.
plots <- ggplot(df, aes(x = x, y = y, color = factor(id))) +
  geom_point(size = 2, alpha = .6) +
  # Least-squares line per panel; the layer inherits df from ggplot()
  geom_smooth(method = "lm", se = FALSE) +
  # One label per panel, built from the formatted summary statistics
  geom_label(data = sumstat, aes(x = 20, y = 5.5, label = paste0(
    "Mean X: ", meanX, "\n",
    "Mean Y: ", meanY, "\n",
    "Std X: ", sdX, "\n",
    "Std Y: ", sdY, "\n",
    "R^2: ", rsr)),
    size = 2.5, hjust = "right", color = "black", fill = "gray97") +
  labs(
    title = "Anscombe's Quartet",
    x = "X values",
    y = "Y values",
    subtitle = "The summary statistics of all four sets are nearly identical, although the plots themselves look very different.") +
  theme_bw() +
  theme(legend.position = "none", plot.subtitle = element_text(size = 9, color = "gray31")) +
  facet_wrap(~id, nrow = 2)
plots
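The fitted lines drawn by `geom_smooth(method = "lm")` can also be compared numerically. The sketch below fits one regression per data set and collects the coefficients; it uses R's built-in `anscombe` data rather than my spreadsheet (assumed to hold the same values), and `coefs` is just an illustrative name.

```r
# Sketch: intercept and slope of the least-squares line for each data set.
coefs <- t(sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  coef(lm(y ~ x))
}))
colnames(coefs) <- c("intercept", "slope")
rownames(coefs) <- paste("Dataset", 1:4)

round(coefs, 3)
```

All four fits should come out close to an intercept of 3 and a slope of 0.5, which is why the regression lines in the four panels look the same even though the point clouds do not.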
Data source: Wikipedia: Anscombe’s Quartet