R Lab 8 BIN510

The purpose of this R lab is to have you all practice working with bootstrapping. There is a really easy-to-follow tutorial in the link below:

https://rpubs.com/evelynebrie/bootstrapping

This tutorial shows how to simulate data with chosen parameters. Here you set the sample size, mean, and standard deviation yourself, but you could just as easily calculate them from your own sample data.

PART 1: For this assignment, I would like you to follow the instructions in the link, copying and pasting the code from each section and checking that your results match those on the page.

There are 2 EXERCISES listed at the bottom of the linked page. Please complete both as part of this assignment and include their code and output below.

set.seed(300)

myData <- rnorm(2000,20,4.5)

length(myData)

## [1] 2000

mean(myData)

## [1] 20.25773

sd(myData)

## [1] 4.590852

set.seed(200)
sample.size <- 2000
n.samples <- 1000
bootstrap.results <- c()

for (i in 1:n.samples)
{
  obs <- sample(1:sample.size, replace = TRUE)
  bootstrap.results[i] <- mean(myData[obs])
}
length(bootstrap.results)

## [1] 1000

summary(bootstrap.results)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   19.92   20.19   20.26   20.26   20.33   20.57

sd(bootstrap.results)

## [1] 0.1021229
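The resampling loop above can also be written with base R's `replicate()`. The sketch below is an addition, not part of the tutorial. It regenerates `myData` with the same seeds, runs both versions, and compares the spread of the bootstrap means with the usual analytic standard error of the mean, s/√n. `sample(myData, replace = TRUE)` draws the same random indices as `sample(1:sample.size, replace = TRUE)`, so under the same seed both versions should give identical bootstrap means. The names `loop.results`, `rep.results`, `boot.se` and `analytic.se` are illustrative.

```r
# Regenerate the simulated data exactly as in the tutorial code above
set.seed(300)
myData <- rnorm(2000, 20, 4.5)

# Loop version (as above): resample indices with replacement, store each mean
set.seed(200)
n.samples <- 1000
loop.results <- numeric(n.samples)
for (i in 1:n.samples) {
  obs <- sample(1:length(myData), replace = TRUE)
  loop.results[i] <- mean(myData[obs])
}

# replicate() version: sample(myData, replace = TRUE) draws length(myData)
# values from myData with replacement in a single call
set.seed(200)
rep.results <- replicate(n.samples, mean(sample(myData, replace = TRUE)))

# Bootstrap standard error vs. the analytic standard error s / sqrt(n)
boot.se <- sd(rep.results)
analytic.se <- sd(myData) / sqrt(length(myData))
c(bootstrap = boot.se, analytic = analytic.se)
```

With sd(myData) = 4.590852 and n = 2000, the analytic value is about 0.103. This is close to the bootstrap SD of 0.1021 reported above.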

par(mfrow=c(2,1), pin=c(5.8,0.98))

hist(bootstrap.results,
     col="#d83737",
     xlab="Mean",
     main=paste("Means of 1000 Bootstrap samples from myData"))

hist(myData,
     col="#37aad8",
     xlab="value",
     main=paste("Distribution of myData"))
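A common way to summarize the histogram of bootstrap means is a 95% percentile interval, which takes the 2.5th and 97.5th percentiles of `bootstrap.results` as its ends. This sketch goes slightly beyond the tutorial. It regenerates the data with the seeds used above and uses only base R (`quantile()`, `hist()`, `abline()`). The name `ci` is illustrative.

```r
# Regenerate the data and the bootstrap means with the seeds used above
set.seed(300)
myData <- rnorm(2000, 20, 4.5)
set.seed(200)
bootstrap.results <- replicate(1000, mean(sample(myData, replace = TRUE)))

# 95% percentile interval: the 2.5th and 97.5th percentiles of the bootstrap means
ci <- quantile(bootstrap.results, probs = c(0.025, 0.975))
ci

# Mark the interval on the histogram of bootstrap means
hist(bootstrap.results, col = "#d83737", xlab = "Mean",
     main = "Means of 1000 bootstrap samples from myData")
abline(v = ci, lty = 2, lwd = 2)
```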

set.seed(200)
sample.size <- 2000
n.samples <- 1000
bootstrap.results <- c()
for (i in 1:n.samples)
{
  bootstrap.results[i] <- mean(rnorm(2000,20,4.5))
}
length(bootstrap.results)

## [1] 1000

summary(bootstrap.results)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   19.64   19.93   20.00   20.00   20.07   20.32

sd(bootstrap.results)

## [1] 0.1041927

par(mfrow=c(2,1), pin=c(5.8,0.98)) 

hist(bootstrap.results, 
     col="#d83737",
     xlab="Mean",
     main=paste("Means of 1000 bootstrap samples from the DGP"))

hist(myData,
     col="#37aad8", 
     xlab="Value", 
     main=paste("Distribution of myData"))
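The two runs above are centered in different places. The bootstrap means center on the sample mean (about 20.26), while the means of fresh draws from the DGP center on the true mean of 20. Their spreads (0.1021 and 0.1042) both estimate the same quantity, the standard error of the mean σ/√n. The following is a short check of that arithmetic, assuming σ = 4.5 and n = 2000 as in `rnorm(2000,20,4.5)`. The simulation part uses an arbitrary seed, so its exact value will differ from the run above.

```r
# Theoretical standard error of the mean of n draws from N(20, 4.5^2)
sigma <- 4.5
n <- 2000
theoretical.se <- sigma / sqrt(n)
round(theoretical.se, 4)  # 0.1006

# Simulation check: SD of 1000 means of fresh DGP samples (arbitrary seed)
set.seed(1)
dgp.means <- replicate(1000, mean(rnorm(n, 20, sigma)))
sd(dgp.means)
```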

EXERCISES #1 and #2:

set.seed(150)

myData <- rnorm(1000,30,2.5)

length(myData)

## [1] 1000

mean(myData)

## [1] 29.92068

sd(myData)

## [1] 2.475175

set.seed(150)
sample.size <- 1000
n.samples <- 50
bootstrap.results <- c()
for (i in 1:n.samples)
{
  # draws 2000 values per replicate; sample.size (set to 1000 above) is not used in this loop
  bootstrap.results[i] <- mean(rnorm(2000,30,2.5))
}
length(bootstrap.results)

## [1] 50

summary(bootstrap.results)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   29.82   29.95   30.02   30.01   30.06   30.14

sd(bootstrap.results)

## [1] 0.06822818

par(mfrow=c(2,1), pin=c(5.8,0.98))

hist(bootstrap.results,
     col="#d83737", 
     xlab="Mean", 
     main=paste("Means of 50 bootstrap samples from the DGP")) 

hist(myData, 
     col="#37aad8", 
     xlab="Value", 
     main=paste("Distribution of myData"))
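The exercise loop above draws fresh samples from the DGP. If the exercise is instead meant to bootstrap the new `myData` itself, as the first loop of the tutorial does, the following sketch resamples `myData` and ties `sample.size` to the data. Which of the two the exercise intends is an assumption here, so check the wording on the linked page.

```r
# Exercise data: 1000 draws from N(30, 2.5^2)
set.seed(150)
myData <- rnorm(1000, 30, 2.5)

sample.size <- length(myData)   # 1000, kept in sync with the data
n.samples <- 50

# Resample myData itself with replacement 50 times, storing each mean
set.seed(150)
bootstrap.results <- replicate(n.samples,
  mean(sample(myData, size = sample.size, replace = TRUE)))

length(bootstrap.results)       # 50
summary(bootstrap.results)
sd(bootstrap.results)
```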

PART 2: Applying the above to real sample data

Download the “trees.csv” file from Canvas and read it into R as a data frame called “trees”. This dataset contains measurements of 31 individual trees. We are going to apply bootstrapping to repeatedly resample from this sample!

Perform the following tasks:

1. Code: determine the sample size, the SD of height, and the median of height.
2. Code: copy/paste all of the code from section 2.1 in the link above, but replace certain values:
   a. you do not need the set.seed() code
   b. sample.size should be the sample size of our trees data
   c. any mention of “myData” should be replaced with “trees$Height”

trees <- read.csv("trees.csv")
trees

##    rownames Girth Height Volume
## 1         1   8.3     70   10.3
## 2         2   8.6     65   10.3
## 3         3   8.8     63   10.2
## 4         4  10.5     72   16.4
## 5         5  10.7     81   18.8
## 6         6  10.8     83   19.7
## 7         7  11.0     66   15.6
## 8         8  11.0     75   18.2
## 9         9  11.1     80   22.6
## 10       10  11.2     75   19.9
## 11       11  11.3     79   24.2
## 12       12  11.4     76   21.0
## 13       13  11.4     76   21.4
## 14       14  11.7     69   21.3
## 15       15  12.0     75   19.1
## 16       16  12.9     74   22.2
## 17       17  12.9     85   33.8
## 18       18  13.3     86   27.4
## 19       19  13.7     71   25.7
## 20       20  13.8     64   24.9
## 21       21  14.0     78   34.5
## 22       22  14.2     80   31.7
## 23       23  14.5     74   36.3
## 24       24  16.0     72   38.3
## 25       25  16.3     77   42.6
## 26       26  17.3     81   55.4
## 27       27  17.5     82   55.7
## 28       28  17.9     80   58.3
## 29       29  18.0     80   51.5
## 30       30  18.0     80   51.0
## 31       31  20.6     87   77.0

nrow(trees)

## [1] 31

sd(trees$Height)

## [1] 6.371813

median(trees$Height)

## [1] 76
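Task 1 also asks for the median of height, and the same resampling idea works for statistics other than the mean. The following sketch bootstraps the median of `trees$Height`. So that it runs without the Canvas file, it assumes that “trees.csv” holds the same data as R's built-in `datasets::trees`, which the printed table above matches. The name `boot.medians` and the seed are illustrative choices.

```r
# Stand-in for trees.csv: R's built-in trees data (assumed identical to the Canvas file)
trees <- datasets::trees

n <- nrow(trees)                 # 31 trees
set.seed(1)                      # arbitrary seed, only for reproducibility

# Bootstrap the median height: resample the 31 heights with replacement 1000 times
boot.medians <- replicate(1000, median(sample(trees$Height, size = n, replace = TRUE)))

median(trees$Height)             # observed median, 76
sd(boot.medians)                 # bootstrap standard error of the median
quantile(boot.medians, c(0.025, 0.975))
table(boot.medians)              # how often each value appears as the bootstrap median
```

Each resample has an odd number of values (31), so every bootstrap median is one of the observed heights. As a result, the bootstrap distribution of the median is lumpy rather than smooth.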

sample.size <- nrow(trees)
n.samples <- 1000
bootstrap.results <- c()
for (i in 1:n.samples)
{
  # resample the observed heights with replacement and store their mean
  # (no set.seed, per task 2a, so results differ from run to run)
  obs <- sample(1:sample.size, replace = TRUE)
  bootstrap.results[i] <- mean(trees$Height[obs])
}
length(bootstrap.results)

## [1] 1000

summary(bootstrap.results)

sd(bootstrap.results)

par(mfrow=c(2,1), pin=c(5.8,0.98)) 

hist(bootstrap.results, 
     col="#d83737",
     xlab="Mean height",
     main=paste("Means of 1000 bootstrap samples from trees$Height"))

hist(trees$Height,
     col="#37aad8", 
     xlab="Height", 
     main=paste("Distribution of trees$Height"))
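As a check on the Part 2 results, the bootstrap SD of the mean height can be compared with the analytic standard error sd(trees$Height)/√31 ≈ 6.3718/5.5678 ≈ 1.14 and summarized with a 95% percentile interval. This is again a sketch that assumes the Canvas file matches `datasets::trees`, and the seed is arbitrary.

```r
# Stand-in for trees.csv: R's built-in trees data (assumed identical to the Canvas file)
trees <- datasets::trees
sample.size <- nrow(trees)
n.samples <- 1000
set.seed(2)  # arbitrary seed, only for reproducibility

# Bootstrap the mean height by resampling the observed heights with replacement
bootstrap.results <- replicate(n.samples,
  mean(sample(trees$Height, size = sample.size, replace = TRUE)))

mean(trees$Height)                              # observed mean height
sd(bootstrap.results)                           # bootstrap SE of the mean
sd(trees$Height) / sqrt(sample.size)            # analytic SE, about 1.14
quantile(bootstrap.results, c(0.025, 0.975))    # 95% percentile interval
```

The bootstrap SE tends to come out slightly below s/√n. Resampling treats the 31 observed heights as the whole population, so its variance uses divisor n rather than n − 1, a factor of roughly √(30/31) ≈ 0.98.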

Gordon Ober

2024-07-26