https://rpubs.com/evelynebrie/bootstrapping
This tutorial shows how to generate data with known parameters (here a mean and standard deviation that you choose yourself, but that you could just as easily estimate from your own sample data) and then bootstrap from that data.
There are 2 EXERCISES listed at the bottom of the linked page. Please complete both exercises as part of this assignment and include the code and output for them below.
set.seed(300)
myData <- rnorm(2000,20,4.5)
length(myData)
## [1] 2000
mean(myData)
## [1] 20.25773
sd(myData)
## [1] 4.590852
set.seed(200)
sample.size <- 2000
n.samples <- 1000
bootstrap.results <- c()
for (i in 1:n.samples)
{
# draw sample.size row indices with replacement, then store the mean of that resample
obs <- sample(1:sample.size, replace = TRUE)
bootstrap.results[i] <- mean(myData[obs])
}
length(bootstrap.results)
## [1] 1000
summary(bootstrap.results)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 19.92 20.19 20.26 20.26 20.33 20.57
sd(bootstrap.results)
## [1] 0.1021229
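As a rough sanity check, the spread of the bootstrap means can be compared with the textbook standard error of the mean, s / sqrt(n). The sketch below regenerates myData and the bootstrap means with the same seeds as above; the names `se.theory` and `se.boot` are introduced here for illustration only, and exact values may differ across R versions because the default behaviour of sample() has changed over time.

```r
# Recreate the sample and the bootstrap means from the steps above
set.seed(300)
myData <- rnorm(2000, 20, 4.5)

set.seed(200)
sample.size <- length(myData)
n.samples <- 1000
bootstrap.results <- numeric(n.samples)
for (i in seq_len(n.samples)) {
  obs <- sample(1:sample.size, replace = TRUE)   # resample row indices
  bootstrap.results[i] <- mean(myData[obs])
}

# Analytic standard error of the mean: s / sqrt(n)
se.theory <- sd(myData) / sqrt(sample.size)
# Bootstrap estimate: standard deviation of the resampled means
se.boot <- sd(bootstrap.results)

c(theory = se.theory, bootstrap = se.boot)
```

Using the values reported above, 4.590852 / sqrt(2000) is about 0.1027, which is close to the bootstrap figure of 0.1021229.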
par(mfrow=c(2,1), pin=c(5.8,0.98))
hist(bootstrap.results,
col="#d83737",
xlab="Mean",
main=paste("Means of 1000 Bootstrap samples from myData"))
hist(myData,
col="#37aad8",
xlab="Value",
main=paste("Distribution of myData"))
set.seed(200)
sample.size <- 2000
n.samples <- 1000
bootstrap.results <- c()
for (i in 1:n.samples)
{
# draw a fresh sample of sample.size values from the data-generating process (DGP)
bootstrap.results[i] <- mean(rnorm(sample.size,20,4.5))
}
length(bootstrap.results)
## [1] 1000
summary(bootstrap.results)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 19.64 19.93 20.00 20.00 20.07 20.32
sd(bootstrap.results)
## [1] 0.1041927
par(mfrow=c(2,1), pin=c(5.8,0.98))
hist(bootstrap.results,
col="#d83737",
xlab="Mean",
main=paste("Means of 1000 bootstrap samples from the DGP"))
hist(myData,
col="#37aad8",
xlab="Value",
main=paste("Distribution of myData"))
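The two loops above differ only in where each replicate comes from: the first resamples the observed data with replacement, the second draws new data from the assumed data-generating process (DGP). A compact sketch of that contrast using replicate() follows; the names `np.means` and `dgp.means` are illustrative and do not appear in the tutorial.

```r
set.seed(300)
myData <- rnorm(2000, 20, 4.5)
n.samples <- 1000

set.seed(200)
# Nonparametric: resample the observed values with replacement
np.means <- replicate(n.samples, mean(sample(myData, replace = TRUE)))
# Parametric: draw a new sample of the same size from the assumed DGP
dgp.means <- replicate(n.samples, mean(rnorm(length(myData), 20, 4.5)))

# Resampled means center on mean(myData); DGP means center on the true mean of 20
c(sample.mean = mean(myData), resample = mean(np.means), dgp = mean(dgp.means))
```

This is the same pattern visible in the summaries above: the resampled means sit around 20.26, the mean of myData, while the means drawn from the DGP sit around 20.00.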
EXERCISES #1 & #2:
set.seed(150)
myData <- rnorm(1000,30,2.5)
length(myData)
## [1] 1000
mean(myData)
## [1] 29.92068
sd(myData)
## [1] 2.475175
set.seed(150)
sample.size <- 1000
n.samples <- 50
bootstrap.results <- c()
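for (i in 1:n.samples)
{
# note: each replicate draws 2000 values although sample.size is 1000;
# use rnorm(sample.size,30,2.5) to match the size of myData
bootstrap.results[i] <- mean(rnorm(2000,30,2.5))
}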
length(bootstrap.results)
## [1] 50
summary(bootstrap.results)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 29.82 29.95 30.02 30.01 30.06 30.14
sd(bootstrap.results)
## [1] 0.06822818
par(mfrow=c(2,1), pin=c(5.8,0.98))
hist(bootstrap.results,
col="#d83737",
xlab="Mean",
main=paste("Means of 50 bootstrap samples from the DGP"))
hist(myData,
col="#37aad8",
xlab="Value",
main=paste("Distribution of myData"))
Download the “trees.csv” file from Canvas and read it in as “trees”. This dataset contains measurements from 31 individual trees. We are going to apply bootstrapping to repeatedly resample from this sample!
Perform the following tasks:
1. Code: determine the sample size, the SD of height, and the median of height.
2. Code: copy/paste all of the code from section 2.1 in the link above, replacing certain values:
a. you do not need the set.seed() code
b. sample.size should be the sample size from our trees data
c. any mention of “myData” should be replaced with “trees$Height”
trees <- read.table('trees.csv', sep = ',', header = TRUE)
trees
## rownames Girth Height Volume
## 1 1 8.3 70 10.3
## 2 2 8.6 65 10.3
## 3 3 8.8 63 10.2
## 4 4 10.5 72 16.4
## 5 5 10.7 81 18.8
## 6 6 10.8 83 19.7
## 7 7 11.0 66 15.6
## 8 8 11.0 75 18.2
## 9 9 11.1 80 22.6
## 10 10 11.2 75 19.9
## 11 11 11.3 79 24.2
## 12 12 11.4 76 21.0
## 13 13 11.4 76 21.4
## 14 14 11.7 69 21.3
## 15 15 12.0 75 19.1
## 16 16 12.9 74 22.2
## 17 17 12.9 85 33.8
## 18 18 13.3 86 27.4
## 19 19 13.7 71 25.7
## 20 20 13.8 64 24.9
## 21 21 14.0 78 34.5
## 22 22 14.2 80 31.7
## 23 23 14.5 74 36.3
## 24 24 16.0 72 38.3
## 25 25 16.3 77 42.6
## 26 26 17.3 81 55.4
## 27 27 17.5 82 55.7
## 28 28 17.9 80 58.3
## 29 29 18.0 80 51.5
## 30 30 18.0 80 51.0
## 31 31 20.6 87 77.0
nrow(trees)
## [1] 31
sd(trees$Height)
## [1] 6.371813
median(trees$Height)
## [1] 76
sample.size <- nrow(trees)
n.samples <- 1000
bootstrap.results <- c()
for (i in 1:n.samples)
{
# resample row indices with replacement, then average the resampled heights
obs <- sample(1:sample.size, replace = TRUE)
bootstrap.results[i] <- mean(trees$Height[obs])
}
length(bootstrap.results)
## [1] 1000
# no seed is set, so the values printed by summary() and sd() change from run to run
summary(bootstrap.results)
sd(bootstrap.results)
par(mfrow=c(2,1), pin=c(5.8,0.98))
hist(bootstrap.results,
col="#d83737",
xlab="Mean",
main=paste("Means of 1000 bootstrap samples from trees$Height"))
hist(trees$Height,
col="#37aad8",
xlab="Value",
main=paste("Distribution of trees$Height"))
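The replacements listed in the task (sample size taken from the data, trees$Height in place of myData, no seed) can also be sketched as a small reusable function. `boot.means` is a hypothetical helper, not part of the tutorial, and the sketch assumes that R's built-in datasets::trees holds the same 31 heights as trees.csv, which the printed table suggests but which is worth checking against your own file.

```r
# Hypothetical helper: bootstrap means of a numeric vector x
boot.means <- function(x, n.samples = 1000) {
  sample.size <- length(x)
  results <- numeric(n.samples)
  for (i in seq_len(n.samples)) {
    obs <- sample(1:sample.size, replace = TRUE)  # resample row indices
    results[i] <- mean(x[obs])                    # mean of this resample
  }
  results
}

# Assumption: the built-in dataset matches the heights in trees.csv
heights <- datasets::trees$Height
height.means <- boot.means(heights)

length(height.means)                   # number of bootstrap means
sd(heights) / sqrt(length(heights))    # analytic standard error, for comparison
sd(height.means)                       # bootstrap standard error (varies by run)
```

With the SD reported above, the analytic standard error is 6.371813 / sqrt(31), about 1.14, so the bootstrap standard error should land in that neighbourhood, and the bootstrap means should center near the sample mean of the heights.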