I pledge on my honor that i have not given or recieved any unauthorized assistance on an assignment or examination

1 Bootstrap estimates

In this assignment we will explore the technique of bootstrap estimation.

Load the iris dataset

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

data(iris)

Use the filter command to get the dataframe iris_set, corresponding to the species Setosa.

iris_set <- filter(iris, Species == "setosa")

Extract the four variables of iris_set into the vectors: pet_len, pet_wid, sep_len, sep_wid

pet_len <- iris_set$Petal.Length
pet_wid <- iris_set$Petal.Width
sep_len <- iris_set$Sepal.Length
sep_wid <- iris_set$Sepal.Width

Recall that a bootstrap sample is obtained by sampling with replacement a sample of size n from a sample of n data values. A bootstrap point estimate will then be the point estimator evaluate on a given bootstrap sample. We will calculate bootstrap estimates for some standard estimators, calculate the empirical expected values, and the bootstrap standard error.

1.1 Bootstrap estimates for the sample mean

Calculate the bootstrap estimates for the sample mean, store these in the vector “boot_means_pet_len” etc and then calculate the standard deviation of this vector to estimate the standard error for sample mean as the point estimator (compare this with the standard error estimated by the Central limit theorem). Note down your observations.

set.seed(2020)

B <- 100000
boot_means_pet_len <- replicate(B, mean(sample(pet_len, size = length(pet_len), replace = TRUE)))
mean(pet_len)

## [1] 1.462

sd(pet_len)/sqrt(50) #standard error from Central Limit Theorem

## [1] 0.0245598

mean(boot_means_pet_len)

## [1] 1.461951

sd(boot_means_pet_len) #bootstrap standard error

## [1] 0.02433344

set.seed(2020)

B <- 100000
boot_means_pet_wid <- replicate(B, mean(sample(pet_wid, size = length(pet_wid), replace = TRUE)))
mean(pet_wid)

## [1] 0.246

sd(pet_wid)/sqrt(50) #standard error from Central Limit Theorem

## [1] 0.01490377

mean(boot_means_pet_wid)

## [1] 0.246004

sd(boot_means_pet_wid) #bootstrap standard error

## [1] 0.01474575

set.seed(2020)

B <- 100000
boot_means_sep_len <- replicate(B, mean(sample(sep_len, size = length(sep_len), replace = TRUE)))
mean(sep_len)

## [1] 5.006

sd(sep_len)/sqrt(50) #standard error from Central Limit Theorem

## [1] 0.04984957

mean(boot_means_sep_len)

## [1] 5.005718

sd(boot_means_sep_len) #bootstrap standard error

## [1] 0.04940281

set.seed(2020)

B <- 100000
boot_means_sep_wid <- replicate(B, mean(sample(sep_wid, size = length(sep_wid), replace = TRUE)))
mean(sep_wid)

## [1] 3.428

sd(sep_wid)/sqrt(50) #standard error from Central Limit Theorem

## [1] 0.0536078

mean(boot_means_sep_wid)

## [1] 3.427784

sd(boot_means_sep_wid) #bootstrap standard error

## [1] 0.05318353

In all cases bootstrap standard errors and standard error estimates from the Central Limit Theorem matched up to three decimal points. Bootstrap mean and sample mean were also very close.

1.2 Bootstrap Estimates for the sample standard deviation.

Calculate the bootstrap estimates for the sample standard deviation, store these in the vector “boot_sd_pet_len” etc and then calculate the standard deviation of this vector to estimate the standard error for sample standard deviation as the point estimator. Note down your observations.

set.seed(2020)

B <- 100000
boot_sd_pet_len <- replicate(B, sd(sample(pet_len, size = length(pet_len), replace = TRUE)))
sd(pet_len)

## [1] 0.173664

mean(boot_sd_pet_len)

## [1] 0.1706416

sd(boot_sd_pet_len)

## [1] 0.02061946

set.seed(2020)

B <- 100000
boot_sd_pet_wid <- replicate(B, sd(sample(pet_wid, size = length(pet_wid), replace = TRUE)))
sd(pet_wid)

## [1] 0.1053856

mean(boot_sd_pet_wid)

## [1] 0.1034229

sd(boot_sd_pet_wid)

## [1] 0.01384665

set.seed(2020)

B <- 100000
boot_sd_sep_len <- replicate(B, sd(sample(sep_len, size = length(sep_len), replace = TRUE)))
sd(sep_len)

## [1] 0.3524897

mean(boot_sd_sep_len)

## [1] 0.3474428

sd(boot_sd_sep_len)

## [1] 0.0322138

set.seed(2020)

B <- 100000
boot_sd_sep_wid <- replicate(B, sd(sample(sep_wid, size = length(sep_wid), replace = TRUE)))
sd(sep_wid)

## [1] 0.3790644

mean(boot_sd_sep_wid)

## [1] 0.3727383

sd(boot_sd_sep_wid)

## [1] 0.04404392

Bootstrap estimates of standard deviations and sample standard deviations were close, in all cases we got sample standard deviation a little bit higher than the bootstrap estimate. We also observed that the estimated standard deviations and their estimated standard errors were higher for sepals, probably because of a larger scale of this part of the Iris Setosa flowers.

1.3 Bootstrap Estimates for area of petal.

Recall that the area of an ellipse with major axis \(a\) and minor axis \(b\) is given by \[ \pi a b.\] Under the assumption that the shape of the petal of the iris flower is approximately an ellipse with the major and minor axes given by petal length and petal width (or vice verse), we can estimate the petal surface area (only one side) of iris of Setosa species by the following estimator: \[ \hat\theta = \pi \cdot \overline{X}\cdot\overline{Y},\] where \(\overline{X}\) and \(\overline{Y}\) are the sample means of petal length and petal width respectively. \ Estimate the surface area of the petal of a iris Setosa flower using the given dataset.

petal_area_set <- pi*mean(pet_len)*mean(pet_wid)
petal_area_set

## [1] 1.12988

Now use bootstrap sampling to estimate the standard error in your estimated petal surface area. Make sure you identify the correct output.

set.seed(2020)

B <- 100000
boot_pet_area_set <- replicate(B, {
  x <- mean(sample(pet_len, size = length(pet_len), replace = TRUE))
  y <- mean(sample(pet_wid, size = length(pet_wid), replace = TRUE))
  pi*x*y
  }
  )

mean(boot_pet_area_set)

## [1] 1.129927

sd(boot_pet_area_set)

## [1] 0.07033764

The bootstrap estimate of the standard error of my estimated petal surface area is 0.07033764.

Bootstrap Estimation

Hana Goffe/ UID 112542202

1 Bootstrap estimates

1.1 Bootstrap estimates for the sample mean

1.2 Bootstrap Estimates for the sample standard deviation.

1.3 Bootstrap Estimates for area of petal.