12.10 Exercises

0. We are going to use the HistData package. If it is not installed you can install it like this:

install.packages("HistData")

Load the Galton data set, which records the heights of parents and their children from his historic research on heredity, and create a vector x with just the child heights.

library(HistData) 
data(Galton) 
x <- Galton$child 
x
##   [1] 61.7 61.7 61.7 61.7 61.7 62.2 62.2 62.2 62.2 62.2 62.2 62.2 63.2 63.2 63.2
##  [16] 63.2 63.2 63.2 63.2 63.2 63.2 63.2 63.2 63.2 63.2 63.2 63.2 63.2 63.2 63.2
##  [31] 63.2 63.2 63.2 63.2 63.2 63.2 63.2 63.2 63.2 63.2 63.2 63.2 63.2 63.2 64.2
##  [46] 64.2 64.2 64.2 64.2 64.2 64.2 64.2 64.2 64.2 64.2 64.2 64.2 64.2 64.2 64.2
##  [61] 64.2 64.2 64.2 64.2 64.2 64.2 64.2 64.2 64.2 64.2 64.2 64.2 64.2 64.2 64.2
##  [76] 64.2 64.2 64.2 64.2 64.2 64.2 64.2 64.2 64.2 64.2 64.2 64.2 64.2 64.2 64.2
##  [91] 64.2 64.2 64.2 64.2 64.2 64.2 64.2 64.2 64.2 64.2 64.2 64.2 64.2 65.2 65.2
## [106] 65.2 65.2 65.2 65.2 65.2 65.2 65.2 65.2 65.2 65.2 65.2 65.2 65.2 65.2 65.2
## [121] 65.2 65.2 65.2 65.2 65.2 65.2 65.2 65.2 65.2 65.2 65.2 65.2 65.2 65.2 65.2
## [136] 65.2 65.2 65.2 65.2 65.2 65.2 65.2 65.2 65.2 65.2 65.2 65.2 65.2 65.2 65.2
## [151] 65.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2
## [166] 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2
## [181] 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2
## [196] 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2
## [211] 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2
## [226] 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2
## [241] 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2
## [256] 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 66.2 67.2 67.2
## [271] 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2
## [286] 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2
## [301] 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2
## [316] 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2
## [331] 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2
## [346] 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2
## [361] 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2
## [376] 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2
## [391] 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2 67.2
## [406] 67.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2
## [421] 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2
## [436] 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2
## [451] 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2
## [466] 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2
## [481] 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2
## [496] 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2
## [511] 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2 68.2
## [526] 68.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2
## [541] 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2
## [556] 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2
## [571] 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2
## [586] 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2
## [601] 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2
## [616] 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2
## [631] 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2
## [646] 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2
## [661] 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2
## [676] 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2 69.2
## [691] 69.2 69.2 69.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2
## [706] 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2
## [721] 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2
## [736] 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2
## [751] 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2
## [766] 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2
## [781] 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 70.2 71.2 71.2 71.2
## [796] 71.2 71.2 71.2 71.2 71.2 71.2 71.2 71.2 71.2 71.2 71.2 71.2 71.2 71.2 71.2
## [811] 71.2 71.2 71.2 71.2 71.2 71.2 71.2 71.2 71.2 71.2 71.2 71.2 71.2 71.2 71.2
## [826] 71.2 71.2 71.2 71.2 71.2 71.2 71.2 71.2 71.2 71.2 71.2 71.2 71.2 71.2 71.2
## [841] 71.2 71.2 71.2 71.2 71.2 71.2 71.2 71.2 71.2 71.2 71.2 71.2 71.2 71.2 71.2
## [856] 71.2 72.2 72.2 72.2 72.2 72.2 72.2 72.2 72.2 72.2 72.2 72.2 72.2 72.2 72.2
## [871] 72.2 72.2 72.2 72.2 72.2 72.2 72.2 72.2 72.2 72.2 72.2 72.2 72.2 72.2 72.2
## [886] 72.2 72.2 72.2 72.2 72.2 72.2 72.2 72.2 72.2 72.2 72.2 72.2 73.2 73.2 73.2
## [901] 73.2 73.2 73.2 73.2 73.2 73.2 73.2 73.2 73.2 73.2 73.2 73.2 73.2 73.2 73.7
## [916] 73.7 73.7 73.7 73.7 73.7 73.7 73.7 73.7 73.7 73.7 73.7 73.7 73.7

1. Compute the average and median of these data.

averagex <- mean(x)
medx <- median(x)

2. Compute the median and median absolute deviation of these data.

medx <- median(x)
madx <- mad(x)
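As a rough illustration of what mad() computes, the sketch below reproduces it by hand on a small made-up vector (the name toy is hypothetical and not part of the exercise). By default, R scales the raw median absolute deviation by the constant 1.4826 so that it is comparable to the standard deviation for normally distributed data.

```r
# Hypothetical toy data, not the Galton heights
toy <- c(61.7, 64.2, 66.2, 67.2, 68.2, 69.2, 70.2)

# Median absolute deviation by hand: the median distance from the median,
# scaled by 1.4826 (the default `constant` argument of mad())
mad_by_hand <- 1.4826 * median(abs(toy - median(toy)))

# Built-in version for comparison
mad_builtin <- mad(toy)
all.equal(mad_by_hand, mad_builtin)
```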

3. Now suppose Galton made a mistake when entering the first value and forgot to use the decimal point. You can imitate this error by typing:

x_with_error <- x
x_with_error[1] <- x_with_error[1] * 10

How many inches does the average grow after this mistake?

mean(x_with_error) - mean(x)
## [1] 0.5983836

4. How many inches does the SD grow after this mistake?

sd(x_with_error) - sd(x)
## [1] 15.6746

5. How many inches does the median grow after this mistake?

median(x_with_error) - median(x)
## [1] 0

6. How many inches does the MAD grow after this mistake?

mad(x_with_error) - mad(x)
## [1] 0
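The four comparisons above can also be run in one pass. The following is only a sketch on a made-up vector (toy and toy_with_error are hypothetical names, and the values are not the Galton data); it applies each summary to the clean and the corrupted vector and reports the difference.

```r
# Hypothetical toy heights in inches, not the Galton data
toy <- c(61.7, 64.2, 65.2, 66.2, 67.2, 67.2, 68.2, 68.2,
         68.2, 69.2, 69.2, 70.2, 71.2, 72.2, 73.7)

# Imitate the missing decimal point in the first entry
toy_with_error <- toy
toy_with_error[1] <- toy_with_error[1] * 10

# Apply each summary to both vectors and take the difference
summaries <- list(mean = mean, sd = sd, median = median, mad = mad)
growth <- sapply(summaries, function(f) f(toy_with_error) - f(toy))
growth
```

In this toy example the mean and SD jump while the median and MAD stay put. With so few values, the median could move slightly if the corrupted entry crossed to the other side of it and there were no ties at the center, but the change would remain small compared with the shift in the mean.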

7. How could you use exploratory data analysis to detect that an error was made?

A single bad entry does not shift the whole distribution, but it does stand far apart from the rest of the data. A boxplot, histogram, or qq-plot would reveal it as a clear outlier.
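One way to check for such a point without drawing a plot is sketched below on a made-up vector (toy and toy_with_error are hypothetical names, not the Galton data). It relies on boxplot.stats(), which flags values lying more than 1.5 times the interquartile range beyond the hinges, the same rule boxplot() uses to draw outlier points.

```r
# Hypothetical toy heights in inches, not the Galton data
toy <- c(61.7, 64.2, 65.2, 66.2, 67.2, 67.2, 68.2, 68.2,
         68.2, 69.2, 69.2, 70.2, 71.2, 72.2, 73.7)

# Imitate the missing decimal point in the first entry
toy_with_error <- toy
toy_with_error[1] <- toy_with_error[1] * 10

# Values that a boxplot would draw as outlier points
outliers <- boxplot.stats(toy_with_error)$out
outliers

# The range alone already shows an implausible height
range(toy_with_error)
```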

8. How much can the average accidentally grow with mistakes like this? Write a function called error_avg that takes a value k and returns the average of the vector x after the first entry changed to k. Show the results for k=10000 and k=-10000.

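# Returns the average of x after its first entry is replaced by k, followed by
# the original average minus the new one (negative when the average grows).
error_avg <- function(x, k) {
  x_with_error <- x
  x_with_error[1] <- k
  newmean <- mean(x_with_error)
  change <- mean(x) - newmean
  return(c(newmean, change))
}

# Checked here on a small test vector j rather than on the Galton heights in x.
j <- c(50, 9, 10, 100, 10, 1600)
k <- 10000
print(error_avg(j, k))
## [1]  1954.833 -1658.333
print(error_avg(j, -k))
## [1] -1378.5  1675.0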
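The exercise asks for a function of k alone. A minimal sketch of that variant follows, assuming x is found in the enclosing environment; here x is a made-up stand-in vector, not the Galton heights. It also checks the new average against the closed form mean(x) + (k - x[1]) / n, which suggests the average can be pushed arbitrarily far by a single bad entry.

```r
# Hypothetical stand-in for x, not the Galton data
x <- c(61.7, 64.2, 65.2, 66.2, 67.2, 67.2, 68.2, 68.2,
       68.2, 69.2, 69.2, 70.2, 71.2, 72.2, 73.7)

# Takes only k, as the exercise states; x is read from the enclosing environment
error_avg <- function(k) {
  x_with_error <- x
  x_with_error[1] <- k
  mean(x_with_error)
}

ks <- c(10000, -10000)
new_means <- sapply(ks, error_avg)
new_means

# Replacing x[1] by k shifts the average by (k - x[1]) / n,
# so the change is linear in k and has no upper or lower limit
predicted <- mean(x) + (ks - x[1]) / length(x)
all.equal(new_means, predicted)
```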