This tutorial is based on the lecture found at https://class.coursera.org/statinference-033/lecture/269.
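Consider the father.son dataset from the UsingR package, which records paired heights (in inches) of fathers and their sons. We work with the sons' heights, stored in the column sheight.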
library(UsingR)  # provides the father.son dataset
data(father.son)             # load the paired father/son heights
x <- father.son$sheight      # sons' heights, in inches
n <- length(x)               # size of the observed sample
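Next, we draw B = 10,000 bootstrap resamples from the single observed sample. Each resample has size n and is drawn with replacement. We store the resamples as the rows of a B × n matrix and compute the median of each row.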
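B <- 10000  # number of bootstrap resamples
# Draw n*B values with replacement and arrange them as a B x n matrix; each row is
# one resample of size n (results vary between runs unless set.seed() is called first).
resamples <- matrix(sample(x, n * B, replace = TRUE), B, n)
resampledMedians <- apply(resamples, 1, median)  # median of each row: B bootstrap medians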
sd(resampledMedians)  # bootstrap estimate of the standard error of the median
## [1] 0.08331873
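The value printed by sd() is the bootstrap estimate of the standard error of the sample median. Let $\hat\theta^*_b$ denote the median of the $b$-th resampled row. Then sd(), which uses the $B-1$ denominator, computes

$$
\widehat{\mathrm{SE}}_{\text{boot}} = \sqrt{\frac{1}{B-1}\sum_{b=1}^{B}\left(\hat\theta^*_b - \bar\theta^*\right)^2},
\qquad
\bar\theta^* = \frac{1}{B}\sum_{b=1}^{B}\hat\theta^*_b .
$$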
quantile(resampledMedians, c(0.025, 0.975))  # 95% bootstrap percentile interval for the median
## 2.5% 97.5%
## 68.44579 68.81461
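The matrix-plus-apply construction can also be written with base R's replicate(). The sketch below is a minimal version under that assumption. The helper name boot_medians and the toy vector are hypothetical and do not come from the lecture. It indexes with sample.int() rather than calling sample(x, ...), which avoids R's special case where a length-one numeric x is treated as 1:x.

```r
# Hypothetical helper (not from the lecture): returns B bootstrap medians of x.
boot_medians <- function(x, B = 10000) {
  n <- length(x)
  # Each replicate draws n indices with replacement and records the median of those values.
  replicate(B, median(x[sample.int(n, n, replace = TRUE)]))
}

# Usage on hypothetical toy data (not father.son); seed fixed for reproducibility.
set.seed(42)
toy <- c(1, 3, 5, 7, 9)
meds <- boot_medians(toy, B = 1000)
sd(meds)                         # bootstrap standard error of the median
quantile(meds, c(0.025, 0.975))  # 95% percentile interval
```

Applied to the sons' heights as boot_medians(x), this should give a standard error and interval close to the values printed above, though not identical to them, because the resamples are random. The matrix version draws all n*B values in one call. The replicate() version draws them one resample at a time, which is easier to read.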
We visualize the B resampled medians with a histogram.
library(ggplot2)  # plotting package
g <- ggplot(data.frame(medians = resampledMedians), aes(x = medians))
g <- g + geom_histogram(color = "black", fill = "lightblue", binwidth = 0.05)  # bins 0.05 inches wide
g
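The histogram is the bootstrap estimate of the sampling distribution of the sample median of the sons' heights. Its spread is what sd(resampledMedians) summarizes, and its central 95% is the percentile interval computed above.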