Data Acquisition

I restructured the Anscombe’s Quartet data from Wikipedia in Excel into a new .xlsx file with 44 observations of 3 variables, then read that file into my main data frame.

Note: My new file, called “aquartet.xlsx”, is included in the submitted zip file.

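library(readxl)      # read_excel()
library(dplyr)       # %>%, group_by(), summarise()
library(kableExtra)  # kbl(), kable_material()
library(ggplot2)     # plotting

df <- read_excel("aquartet.xlsx")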
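For readers without the .xlsx file, the same long layout can probably be rebuilt from the anscombe data frame that ships with base R. This is only a sketch and not how the file above was made: the name df_alt is hypothetical, and the column names (id, x, y) and the "Dataset 1" style labels are assumptions inferred from the code and table in this report.

```r
library(dplyr)
library(tidyr)

# anscombe has 11 rows and 8 columns: x1..x4 and y1..y4.
# Split each column name into a variable part (x or y) and a set number,
# giving one row per observation per dataset.
df_alt <- anscombe %>%
  pivot_longer(everything(),
               names_to = c(".value", "set"),
               names_pattern = "(.)(.)") %>%
  mutate(id = paste("Dataset", set)) %>%  # assumed label format
  select(id, x, y) %>%
  arrange(id)

dim(df_alt)  # 44 observations of 3 variables
```

If the values match the Wikipedia table, df_alt could stand in for df in the rest of the code.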

Summary Statistics

I created a new data frame for just the summary statistics, so that I could easily reference them in the plots below. I formatted the numbers in this data frame to show 4 decimal places wherever the statistics are referenced.

sumstat <- df %>%
  group_by(id) %>%
  summarise(meanX = mean(x),
            meanY = mean(y),
            sdX = sd(x),
            sdY = sd(y),
            rsr = summary(lm(y ~ x))$r.squared) %>%
  mutate(across(where(is.numeric), ~ format(.x, digits = 4, nsmall = 4)))
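As a small illustration of what the formatting step does, the sketch below applies format() with the same digits and nsmall arguments to a few sample values (the vector vals is made up for the example). The point to notice is that the result is a character vector, so the formatted columns suit labels and tables but not further arithmetic.

```r
# Sample values chosen for illustration only.
vals <- c(9, 7.500909, 2.031568)

# nsmall = 4 forces at least four digits after the decimal point.
formatted <- format(vals, digits = 4, nsmall = 4)

formatted                # "9.0000" "7.5009" "2.0316"
is.character(formatted)  # TRUE: the numbers are now strings
```

If the numeric values are needed later, one option is to keep an unformatted copy of the summary data frame and format only at display time.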

I created a table for the summary data for easy viewing, using kableExtra.

kbl(sumstat,
    col.names = c("Dataset","Mean X","Mean Y","STD X","STD Y","R^2")) %>%
    kable_material("hover")
Dataset    Mean X  Mean Y  STD X   STD Y   R^2
Dataset 1  9.0000  7.5009  3.3166  2.0316  0.6665
Dataset 2  9.0000  7.5009  3.3166  2.0317  0.6662
Dataset 3  9.0000  7.5000  3.3166  2.0304  0.6663
Dataset 4  9.0000  7.5009  3.3166  2.0306  0.6667

Small Multiples Plots

I used ggplot2 to create four scatter plots, one per dataset, and arranged them with the facet_wrap function. I used geom_label to add the summary statistics directly to each plot.

plots <- ggplot(df, aes(x=x, y=y, color=factor(id))) +
  geom_point(size=2, alpha=.6) +
  # Fitted regression line for each dataset, without the confidence band
  geom_smooth(method="lm", se=FALSE) +
  # Summary statistics box, right-aligned in the lower-right corner of each panel
  geom_label(data = sumstat, aes(x=20, y=5.5, label=paste0(
    "Mean X: ", meanX, "\n",
    "Mean Y: ", meanY, "\n",
    "Std X: ", sdX, "\n",
    "Std Y: ", sdY, "\n",
    "R^2: ", rsr)),
    size=2.5, hjust="right", color="black", fill="gray97") +
  labs(
    title="Anscombe's Quartet",
    x="X values",
    y="Y values",
    subtitle="The summary statistics of all four sets are nearly identical, although the plots themselves look very different.") +
  theme_bw() +
  theme(legend.position = "none", plot.subtitle = element_text(size = 9, color = "gray31")) +
  # One panel per dataset, in a 2 x 2 grid
  facet_wrap(~id, nrow=2)
plots

Data source: Wikipedia: Anscombe’s Quartet