In collaboration with Bryce O’Connor and Sam Nurmi
mean of treatment = 59 mean of control = 43.25 59 - 43.25 = 15.75
I created a vector using the data then used the sample tool with no replacement
drp <- c(57, 61, 42, 62, 41, 28)
sample(drp, 1, replace = FALSE, prob= NULL)
## [1] 41
using this the frst two values I got would be used as the treatment results 61, 62
and the remaining values the control 57, 42, 41, 28
sim 1:
treatment: 61, 62
control: 57, 42, 41, 28
observed value = 61.5 - 42 = 19.5
sim 2:
treatment:61, 41
control: 57, 42, 62, 28
observed value = 51 - 47.25 = 3.77
sim 3:
treatment: 41, 61
control: 57, 42, 62, 28
observed value = 51 - 47. 25 = 3.75
sim 4:
treatment: 41, 57
control: 62, 42, 61, 28
observed value = 49 - 48.25 = 0.75
sim 5:
treatment: 61, 41
control: 57, 42, 62, 28
observed value = 51 - 47.25 = 3.77
sim 6:
treatment: 57, 61
control: 62, 42, 41, 28
observed value = 59 - 43.25 = 15.75
sim 7:
treatment: 42, 61
control: 57, 62, 41, 28
observed value = 51.5 - 47 = 4.5
sim 8:
treatment: 57, 28
control: 61, 42, 41, 62
observed value = 42.5 - 51.5 = -9
Sim 9:
treatment: 28, 41
control: 57, 42, 61, 62
observed value = 34.5 - 55.5 = -21
sim 10:
treatment: 41, 62
control: 57, 42, 61, 28
observed value = 51.5 - 47 = 4.5
sim 11:
treatment: 42, 28
control: 57, 41, 61, 62
observed value = 35 - 55.25 = -20.25
sim 12:
treatment: 28, 41
control: 57, 42, 61, 62
observed value = 34.5 - 55.5 = -21
sim 13:
treatment: 62, 28
control: 57, 42, 61, 41
observed value = 45 - 50.25 = -5.25
sim 14:
treatment: 28, 41
control: 57, 42, 61, 62
observed value = 34.5 - 55.5 = -21
sim 15:
treatment: 28, 57
control: 41, 42, 61, 62
observed value = 42.5 - 51.5 = -9
sim 16:
treatment: 41, 62
control: 57, 42, 61, 28
observed value = 51.5 - 47 = 4.5
sim 17:
treatment: 57, 28
control: 41, 42, 61, 62
observed value = 42.5 - 51.5 = -9
sim 18:
treatment: 57, 42
control: 41, 28, 61, 62
observed value = 49.5 - 48 = 1.5
sim 19:
treatment: 57, 42
control: 41, 28, 61, 62
observed value = 49.5 - 48 = 1.5
sim 20:
treatment: 57, 41
control: 42, 28, 61, 62
observed value = 49 - 48.25 = 0.75
then I made a histogram
## Warning: package 'tidyverse' was built under R version 3.6.2
## -- Attaching packages -------------------------------------------------------------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.2.1 v purrr 0.3.3
## v tibble 2.1.3 v dplyr 0.8.3
## v tidyr 1.0.2 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.4.0
## Warning: package 'ggplot2' was built under R version 3.6.2
## Warning: package 'tidyr' was built under R version 3.6.2
## Warning: package 'readr' was built under R version 3.6.2
## Warning: package 'purrr' was built under R version 3.6.2
## Warning: package 'dplyr' was built under R version 3.6.2
## Warning: package 'stringr' was built under R version 3.6.2
## Warning: package 'forcats' was built under R version 3.6.2
## -- Conflicts ----------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
vals <- c(-21, 1.5, 0.75, -9, -9, -9, -5.25, -21, 3.77, -20.25, 4.5, 15.75, 4.5, 19.5, -21, 0.75, 3.77, 4.5, 1.5, 3.75)
valmatrix <- matrix(vals, nrow=20)
valmatrix <- as.data.frame(valmatrix)
ggplot(valmatrix, aes(vals))+
geom_histogram(bins = 20, color = 'black', fill = 'red')+
theme_bw()+
labs(title <- 'Run Simulation P-Values', x = 'Sim P-Value')
I found the estimated p-value based on the values above estimated p-value = 0.1
p-value = 2/15= 0.133 so our estimated value was not far off the actual p value
trees = read.csv("nspines.csv",
header=TRUE,
na.strings = "?")
boxplot1<-ggplot(trees, aes(x=factor(ns), y=dbh, fill=factor(ns)))+
geom_boxplot()+
theme_bw()+
ggtitle("Side-by-Side Boxplot Plot of Tree location by DBH")
print(boxplot1)
north = trees[1:30,2]
south = trees[31:60,2]
observedStat = mean(north)- mean(south)
observedStat
## [1] -10.83333
bootStrapCI2<-function(data1, data2, nsim){
n1<-length(data1)
n2<-length(data2)
bootCI2<-c()
for(i in 1:nsim){
bootSamp1<-sample(1:n1, n1, replace=TRUE)
bootSamp2<-sample(1:n2, n2, replace=TRUE)
thisXbar<-mean(data1[bootSamp1])-mean(data2[bootSamp2])
bootCI2<-c(bootCI2, thisXbar)
}
return(bootCI2)
}
DBHBoot <- bootStrapCI2(north,south, nsim = 10000)
hist(DBHBoot)
quantile(DBHBoot, c(0.025, 0.975))
## 2.5% 97.5%
## -18.686750 -2.839917
se <- sd(DBHBoot)
mean(north) - mean(south) + c(-1,1) * qt(0.975, df=29) * se
## [1] -19.132659 -2.534008
The ditribution is close to normal, since the distribution is not really skewed the hybrid method should be effective.
The diffrence between the two confidence intervals is small however I prefer the quantile method since it does not have any assumptions about the distribution. I imagine this one can be used in most situations.