PART 2: applying the above to real sample data

Download the “trees.csv” file from Canvas and read it into R as “trees”. This dataset contains measurements from 31 individual trees. We are going to apply bootstrapping, that is, repeatedly resample (with replacement) from this one sample.

Perform the following tasks:

1. Code: determine the sample size, the SD of height, and the median of height.
2. Code: copy/paste all of the code from section 2.1 in the link above, but you will need to replace certain values:
   a. you do not need the set.seed() code
   b. sample.size should be the sample size from our trees data
   c. any mention of “myData” should be replaced with “trees$Height”

trees <- read.table('trees.csv', sep = ',', header = TRUE)
trees
##    rownames Girth Height Volume
## 1         1   8.3     70   10.3
## 2         2   8.6     65   10.3
## 3         3   8.8     63   10.2
## 4         4  10.5     72   16.4
## 5         5  10.7     81   18.8
## 6         6  10.8     83   19.7
## 7         7  11.0     66   15.6
## 8         8  11.0     75   18.2
## 9         9  11.1     80   22.6
## 10       10  11.2     75   19.9
## 11       11  11.3     79   24.2
## 12       12  11.4     76   21.0
## 13       13  11.4     76   21.4
## 14       14  11.7     69   21.3
## 15       15  12.0     75   19.1
## 16       16  12.9     74   22.2
## 17       17  12.9     85   33.8
## 18       18  13.3     86   27.4
## 19       19  13.7     71   25.7
## 20       20  13.8     64   24.9
## 21       21  14.0     78   34.5
## 22       22  14.2     80   31.7
## 23       23  14.5     74   36.3
## 24       24  16.0     72   38.3
## 25       25  16.3     77   42.6
## 26       26  17.3     81   55.4
## 27       27  17.5     82   55.7
## 28       28  17.9     80   58.3
## 29       29  18.0     80   51.5
## 30       30  18.0     80   51.0
## 31       31  20.6     87   77.0
nrow(trees)
## [1] 31
sd(trees$Height)
## [1] 6.371813
median(trees$Height)
## [1] 76
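Before running the full loop, it can help to look at a single bootstrap resample. The sketch below is illustrative only: it assumes R's built-in `trees` dataset (same Girth, Height, and Volume columns, 31 rows) as a stand-in for the downloaded trees.csv, and the names `heights`, `n`, and `one.resample` are made up for this example.

```r
# Assumption: the built-in `trees` data stands in for trees.csv.
heights <- datasets::trees$Height
n <- length(heights)

# One bootstrap resample: n draws from the observed heights, with replacement.
one.resample <- sample(heights, size = n, replace = TRUE)

# The resample has the same size as the original sample...
length(one.resample)
# ...and contains only values that were actually observed.
all(one.resample %in% heights)
# Its mean is one bootstrap estimate of the mean height.
mean(one.resample)
```

Because the draws are made with replacement, some trees typically show up more than once in a resample and others not at all, which is why the mean differs from one resample to the next.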
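sample.size <- nrow(trees)               # 31 trees
n.samples <- 1000                        # number of bootstrap resamples
bootstrap.results <- numeric(n.samples)  # preallocate the result vector
for (i in 1:n.samples)
{
  # Resample the observed heights with replacement and store the mean
  bootstrap.results[i] <- mean(sample(trees$Height, size = sample.size, replace = TRUE))
}
length(bootstrap.results)
## [1] 1000
# No seed is set, so the summary and SD change slightly from run to run
summary(bootstrap.results)
sd(bootstrap.results)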
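One way to sanity-check the spread of bootstrap means is to compare it with the usual standard error of the mean, sd / sqrt(n). The sketch below is a rough check, not part of the assignment: it again assumes the built-in `trees` dataset in place of trees.csv, and `boot.means` and `analytic.se` are hypothetical names.

```r
# Assumption: the built-in `trees` data stands in for trees.csv.
heights <- datasets::trees$Height
n <- length(heights)

# 1000 bootstrap means, each from a with-replacement resample of size n.
boot.means <- replicate(1000, mean(sample(heights, size = n, replace = TRUE)))

# Standard error of the mean from the usual formula.
analytic.se <- sd(heights) / sqrt(n)

# The two values should be of similar size (they will not match exactly).
c(bootstrap = sd(boot.means), analytic = analytic.se)
```

If the bootstrap SD comes out far from sd / sqrt(n), that usually means the loop is not resampling the data it is supposed to.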
# Stack the two histograms vertically; pin sets the plot size in inches
par(mfrow=c(2,1), pin=c(5.8,0.98))

hist(bootstrap.results, 
     col="#d83737",
     xlab="Mean",
     main=paste("Means of", n.samples, "bootstrap samples of trees$Height"))

hist(trees$Height,
     col="#37aad8", 
     xlab="Height", 
     main="Distribution of trees$Height")