HW 3, Alex Matteson

Homework 3

Problem 3. c.)

library(tidyverse)

## ── Attaching packages ───────────────────────────────────────── tidyverse 1.3.0 ──

## ✓ ggplot2 3.2.1     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   0.8.4
## ✓ tidyr   1.0.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0

## ── Conflicts ──────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

x2 <- c(4.5,-9,15.25, .75,15.75,-9,-20.25,.25,3.75,-20.25,3.75,-10.5,1.5,16.5,15.75,4.5,1.5,16.5,1.5,4.5,4.5)
mean(x2)

## [1] 1.988095

hist(x2)

d.)

x2 >= 15.75

##  [1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE

6/20

## [1] 0.3

0.3, or 30%, of the values are greater than or equal to the original value in part a.
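The proportion can also be computed straight from the logical vector instead of counting by hand. This is a minimal sketch that reuses `x2` from part c; the name `obs` for the value 15.75 is introduced here for illustration, and the two-sided version using `abs()` is an assumption about which tail convention is wanted.

```r
x2 <- c(4.5, -9, 15.25, .75, 15.75, -9, -20.25, .25, 3.75, -20.25, 3.75,
        -10.5, 1.5, 16.5, 15.75, 4.5, 1.5, 16.5, 1.5, 4.5, 4.5)
obs <- 15.75  # original value from part a (the name `obs` is illustrative)

# sum() of a logical vector counts the TRUE values
n_upper <- sum(x2 >= obs)
# mean() of a logical vector is the proportion of TRUE values
p_upper <- mean(x2 >= obs)

# Two-sided version (an assumption): values at least as extreme in either direction
n_both <- sum(abs(x2) >= obs)
p_both <- mean(abs(x2) >= obs)

c(n_upper = n_upper, n_both = n_both, n_total = length(x2))
c(p_upper = p_upper, p_both = p_both)
```

Using `mean()` divides by `length(x2)` automatically, so the denominator always matches the number of values in the vector.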

e.)

x3<-c(4.9,9,15.75,20.25,.75,20.25,.25,3.75,10.5,1.5,16.5,4.5,1.5,19.5,21,10.75)
x3>=15.75

##  [1] FALSE FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE
## [13] FALSE  TRUE  TRUE FALSE

6/15

## [1] 0.4

So this p-value is 0.4, or 40%, which is somewhat close to my estimated one of 0.3.

Problem 4

a.)

nspines <- read.csv("nspines.csv")
#head(nspines)
#names(nspines)
by_region <- nspines %>% group_by(ns)
ggplot(by_region, aes(x=ns, y=dbh))+geom_boxplot()

It seems like we could use t procedures, but it’s right on the edge: the data are not really normal, but the sample is a decent size.
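The judgment call above rests on two things: how many trees are in each region and how far each group is from normal. This is a rough sketch of checking both numerically. The data frame `demo` is a made-up stand-in for `nspines` (same column names `ns` and `dbh`, but simulated values and an assumed 30 trees per region), and the Shapiro-Wilk test is used here only as one possible normality check, not something the assignment asks for.

```r
# Hypothetical stand-in for nspines.csv: region `ns` and diameter `dbh`
set.seed(1)
demo <- data.frame(
  ns  = rep(c("n", "s"), each = 30),                       # assumed group sizes
  dbh = c(rexp(30, rate = 1 / 24), rexp(30, rate = 1 / 34)) # skewed fake data
)

# Sample size in each region
sizes <- tapply(demo$dbh, demo$ns, length)
sizes

# Shapiro-Wilk p-value in each region; small values suggest non-normality
pvals <- tapply(demo$dbh, demo$ns, function(v) shapiro.test(v)$p.value)
pvals
```

With the real data, `demo` would be replaced by `nspines`, and the boxplot above would still be worth reading alongside these numbers.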

b.)

north<- nspines%>% filter(ns == "n") %>% summarise (meandbh = mean(dbh))
north

south<- nspines%>% filter(ns == "s") %>% summarise (meandbh = mean(dbh))
south

north - south

c.)

north<- nspines%>% filter(ns == "n") 
south<- nspines%>% filter(ns == "s") 


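# Bootstrap the difference in means between two samples.
# Returns a vector of nsim bootstrap differences: mean(data1*) - mean(data2*).
bootStrapCI2<-function(data1, data2, nsim){
  
  n1<-length(data1)
  n2<-length(data2)
  
  # Preallocate the result vector instead of growing it inside the loop
  bootCI2<-numeric(nsim)
  
  for(i in seq_len(nsim)){
    # Resample indices with replacement, independently within each group
    bootSamp1<-sample(1:n1, n1, replace=TRUE)
    bootSamp2<-sample(1:n2, n2, replace=TRUE)
    bootCI2[i]<-mean(data1[bootSamp1])-mean(data2[bootSamp2])
  }
  
  return(bootCI2)
}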

treeBoot = bootStrapCI2(north$dbh, south$dbh, nsim = 1000)
hist(treeBoot)
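The same resampling idea can be written more compactly with `replicate()`. This is only an illustrative alternative: the function name `boot_diff_means` and the vectors `x_demo` and `y_demo` are made up for the example and are not the tree data. Setting a seed makes the bootstrap draws reproducible.

```r
# Bootstrap the difference in means with replicate() (illustrative alternative)
boot_diff_means <- function(x, y, nsim = 1000) {
  replicate(nsim, {
    # sample(v, replace = TRUE) resamples the values of v when length(v) > 1
    mean(sample(x, replace = TRUE)) - mean(sample(y, replace = TRUE))
  })
}

set.seed(42)  # fix the seed so the resampling is reproducible

# Hypothetical data standing in for north$dbh and south$dbh
x_demo <- c(27.8, 14.5, 39.1, 3.2, 58.8, 55.5, 25.0, 5.4, 19.0, 30.6)
y_demo <- c(44.8, 26.1, 50.4, 23.3, 39.5, 51.0, 48.1, 47.2, 40.3, 37.4)

boot_demo <- boot_diff_means(x_demo, y_demo, nsim = 1000)
length(boot_demo)
summary(boot_demo)
```

With the real data the call would be `boot_diff_means(north$dbh, south$dbh, nsim = 1000)`.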

d.)

# Quantile Method 
quantile(treeBoot, c(0.025, 0.975))

##      2.5%     97.5% 
## -18.05683  -2.32900

# Hybrid Method: observed difference +/- t critical value * bootstrap SE
se<-sd(treeBoot)
-10.83333+c(-1,1)*qt(0.975, df = 1000)*se

## [1] -18.73837  -2.92829
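Both intervals can be wrapped in one helper so the observed difference is passed in rather than typed by hand. This is a sketch under assumptions: the function name `boot_intervals` and its arguments are hypothetical, `fake_boot` is simulated data rather than `treeBoot`, and the default `df = 1000` simply mirrors the call above rather than recommending that choice.

```r
# Compute the quantile and hybrid intervals from a vector of bootstrap statistics
boot_intervals <- function(boot_stats, observed, level = 0.95, df = 1000) {
  alpha <- 1 - level

  # Quantile method: take the middle `level` share of the bootstrap distribution
  percentile <- quantile(boot_stats, c(alpha / 2, 1 - alpha / 2))

  # Hybrid method: observed statistic +/- t critical value * bootstrap SE
  se <- sd(boot_stats)
  hybrid <- observed + c(-1, 1) * qt(1 - alpha / 2, df = df) * se

  list(percentile = percentile, hybrid = hybrid)
}

set.seed(7)
fake_boot <- rnorm(1000, mean = -10.8, sd = 4)  # hypothetical bootstrap draws
ci <- boot_intervals(fake_boot, observed = -10.83333)
ci$percentile
ci$hybrid
```

The hybrid interval is always centered on the observed difference, while the quantile interval follows the shape of the bootstrap distribution, which is why the two need not match exactly.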

e.) It’s a decently large sample, about 60, but it’s not normally distributed. We should be fine to use the hybrid method because the sample size is fairly large.

f.) They are very close (see other problem) but not exactly the same. I would use the bootstrap method because then we don’t have to worry about sample size or skew, and this sample seems right on the edge.
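For reference, a two-sample t interval for the same kind of difference can be obtained with `t.test()`. This sketch assumes the comparison in part f is against a Welch t interval, which is the default for `t.test()`. The data frame `demo` is simulated and only mimics the layout of `nspines`.

```r
# Hypothetical stand-in for nspines: region `ns` and diameter `dbh`
set.seed(3)
demo <- data.frame(
  ns  = rep(c("n", "s"), each = 30),
  dbh = c(rnorm(30, mean = 24, sd = 17), rnorm(30, mean = 35, sd = 14))
)

# Welch two-sample t-test (var.equal = FALSE is the default);
# the interval is for mean(n) - mean(s) because "n" sorts before "s"
tt <- t.test(dbh ~ ns, data = demo)
tt$conf.int
```

Running the same call on `nspines` would give the t interval to set beside the bootstrap intervals from part d.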