An example R Markdown file with Mother/Daughter heights data

Biostatistics in Practice HPC workshop, February 2014

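require(alr3)  ## the alr3 package provides the heights data
data(heights)  ## load the data frame with columns Mheight and Dheight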

The heights dataset contains 1375 observations, each pairing a mother's height with her daughter's height, in inches. The dataset was compiled by the statistician Karl Pearson in the late 1800s. To make this a more interesting example of a permutation test, we will limit our sample size and consider only a random sample of 200 rows of the dataset. We start with a numerical summary and then describe the data visually.

summary(heights)
##     Mheight        Dheight    
##  Min.   :55.4   Min.   :55.1  
##  1st Qu.:60.8   1st Qu.:62.0  
##  Median :62.4   Median :63.6  
##  Mean   :62.5   Mean   :63.8  
##  3rd Qu.:63.9   3rd Qu.:65.6  
##  Max.   :70.8   Max.   :73.1
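The 200-row random sample mentioned above is not drawn in the code shown here. A minimal sketch of one way to do it follows. The helper name `sample_rows`, the object name `heightsSample`, and the seed value are assumptions introduced for illustration; they are not part of the original workshop code.

```r
## Hypothetical helper (not from the original): draw n rows without replacement.
sample_rows <- function(df, n, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  ## optional, makes the draw reproducible
  df[sample(nrow(df), n), , drop = FALSE]
}

## Applied to the workshop data, if the alr3 package is installed.
if (requireNamespace("alr3", quietly = TRUE)) {
  data(heights, package = "alr3")
  heightsSample <- sample_rows(heights, 200, seed = 2014)  ## seed is arbitrary
  dim(heightsSample)
}
```

Keeping the sample in a separate object leaves the full `heights` data frame available for the summaries and plots in the rest of this example.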

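Now let's plot the data using ggplot2. First we draw side-by-side boxplots of the mothers' heights and the daughters' heights. ggplot2 expects one column of values and one grouping column, so the code below first stacks the two height columns into a single long data frame.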

require(ggplot2)
## Loading required package: ggplot2
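n <- nrow(heights)  ## number of mother-daughter pairs (1375), not hard-coded
heightStack <- data.frame(
    height = c(heights$Mheight, heights$Dheight),
    who = factor(rep(c("Mother", "Daughter"), each = n), levels = c("Mother", "Daughter")))
## one box per group, filled by group, with outliers drawn in dark green
ggplot(heightStack, aes(who, height)) +
    geom_boxplot(aes(fill = who), outlier.colour = "darkgreen")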

[Figure: side-by-side boxplots of mother and daughter heights]
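The long data frame used for the boxplots can also be built with base R's `stack()`, which avoids repeating the row count by hand. The following is only a sketch: the helper name `stack_heights` is hypothetical, and it assumes the input has columns named `Mheight` and `Dheight`, as the heights data does.

```r
## Hypothetical helper: reshape a Mheight/Dheight data frame to long form.
stack_heights <- function(df) {
  long <- stack(df[, c("Mheight", "Dheight")])  ## columns: values, ind
  names(long) <- c("height", "who")
  ## stack() orders the factor levels by column order: Mheight, then Dheight
  levels(long$who) <- c("Mother", "Daughter")
  long
}

## Usage with the workshop data, if the alr3 package is installed.
if (requireNamespace("alr3", quietly = TRUE)) {
  data(heights, package = "alr3")
  str(stack_heights(heights))
}
```

The result has the same two columns, `height` and `who`, as the hand-built data frame, so it can be passed to the same `ggplot()` call.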

Next we draw a scatterplot of daughters' heights against mothers' heights. The first line in the code below sets the default theme. qplot() behaves much like the base R plot() command.

theme_set(theme_bw())  ## sets default white background theme for ggplot
## I() marks colour and alpha as fixed values rather than data mappings, so no legend is drawn
qplot(Mheight, Dheight, data = heights, colour = I("red"), alpha = I(0.5))

[Figure: scatterplot of daughter height against mother height, drawn with qplot]
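For comparison with the qplot() call, a rough base R counterpart might look like the sketch below. The variable name `pointCol` and the filled plotting symbol (`pch = 16`) are choices made here for illustration, not taken from the original.

```r
## Semi-transparent red, approximating colour "red" with alpha 0.5.
pointCol <- adjustcolor("red", alpha.f = 0.5)

## Draw the scatterplot only if the alr3 data are available.
if (requireNamespace("alr3", quietly = TRUE)) {
  data(heights, package = "alr3")
  plot(Dheight ~ Mheight, data = heights, col = pointCol, pch = 16)
}
```

Base graphics have no separate alpha argument for points, so the transparency is folded into the colour itself with `adjustcolor()`.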

We can also do this using ggplot(). Here stat_smooth() adds a smoothed trend line, with a confidence band, on top of the points.

theme_set(theme_bw())  ## sets default white background theme for ggplot
ggplot(heights, aes(Mheight, Dheight)) + geom_point(col = "red", alpha = 0.5) + 
    stat_smooth()

[Figure: scatterplot of daughter height against mother height with a smoothed trend line]