STAT631

#part 1

library(lsmeans)

## Warning: package 'lsmeans' was built under R version 3.6.2

## Loading required package: emmeans

## Welcome to emmeans.
## NOTE -- Important change from versions <= 1.41:
##     Indicator predictors are now treated as 2-level factors by default.
##     To revert to old behavior, use emm_options(cov.keep = character(0))

## The 'lsmeans' package is now basically a front end for 'emmeans'.
## Users are encouraged to switch the rest of the way.
## See help('transition') for more information, including how to
## convert old 'lsmeans' objects and scripts to work with 'emmeans'.

library(car)

## Warning: package 'car' was built under R version 3.6.2

## Loading required package: carData

brand =as.factor(rep(c(1,2,3),each=5))
lives =c(100,96,92,96,92,76,80,75,84,82,108,100,96,98,100)
battery=data.frame(lives,brand)

Since the P value is 6.141e-06, less than 0.05 significant level, we reject the null hypothesis. Therefore, at least one of the brands is different.
H0: $\mu_1=\mu_2=\mu_3$(there’s no difference in lives between these brands of batteries).H1: $_i _j $ for at least one pair of $i\neq j$( The lives of these brands of batteries are different)

mod1=aov(lives~brand, data=battery);anova(mod1)

No clear patterns observed in the QQ plot. Since p-value=0.2457 less than 0.05, we accept the null hypothesis of normality.

plot(mod1,which=2)

shapiro.test(mod1$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  mod1$residuals
## W = 0.92696, p-value = 0.2457

There’s no clear patterns in the residuals vs fitted plot and P value of the lebenetest is 0.9328 >0.05.It is reasonable to assume constant variance.

plot(mod1,which=1)

leveneTest(lives~brand,data=battery)

brand1-brand2 95%CI is (9.14, 22.46 ). brand1-brand3 95%CI is (-11.86, 1.46 ).brand2-brand3 95%CI is (-27.66, -14.34 ).

lsmbrand=lsmeans(mod1,~brand);lsmbrand

##  brand lsmean   SE df lower.CL upper.CL
##  1       95.2 1.77 12     91.4     99.0
##  2       79.4 1.77 12     75.6     83.2
##  3      100.4 1.77 12     96.6    104.2
## 
## Confidence level used: 0.95

tk=summary(contrast(lsmbrand,method='pairwise',adjust = 'tukey'),infer = c(T,T), level=0.95, side = 'two-sided');tk

from d, brand2 has the shortest life.
The company expected to replace 56.037% of batteries of brand2.

pnorm((80-79.4)/sqrt(15.6))

## [1] 0.5603714

#Part 2 6.1 (a) Since no clear patterns observed in the QQ plot and P value of shapiro test is 0.3454 > 0.05,assumption of normality is satisfied. Since no clear patterns observed in residuals vs fitted plot and P value of Levene’s Test is 0.3332 greater than 0.05, assumption of constant variance is satisfied. The data doesn’t need a transformation.

dat<-read.table("pr6.1",header = TRUE)
dat$ftrt<- as.factor(dat$trt)
mod61=aov(y~ftrt,data=dat);plot(mod61,which=c(1,2))

shapiro.test(mod61$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  mod61$residuals
## W = 0.95494, p-value = 0.3454

leveneTest(y~ftrt,data=dat)

Ho: $\mu_1=\mu_2=\mu_3=\mu_4=\mu_5=\mu_6$. H1: $_i _j $ for at least one pair of $i\neq j$. Since the F ratio is 71.2 and p value is 3.2e-11 less than 0.05, we reject the null hypothesis that all the treatments are equvalent.Not all the treatments are equivalent.

summary(mod61)

##             Df Sum Sq Mean Sq F value  Pr(>F)    
## ftrt         5   6398    1280    71.2 3.2e-11 ***
## Residuals   18    323      18                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

It seems that there is a negative linear relationship between Response and Nitrogen levels under noirrigations when we simply plot the nitrogen level and effects.There are no quadratic effects. There are no significant quadratic effects of nitrogen under nonirrigated conditions since p value =0.3294 >0.05. In addition, The coefficient of the quadratic term is not sigfinicant since p-value is 0.3238 when we fit a regression model.

library(emmeans)
lsmTrt<-lsmeans(mod61,~ftrt);lsmTrt

##  ftrt lsmean   SE df lower.CL upper.CL
##  1      95.0 2.12 18     90.5     99.5
##  2      82.2 2.12 18     77.8     86.7
##  3      81.5 2.12 18     77.0     86.0
##  4      68.2 2.12 18     63.8     72.7
##  5      50.5 2.12 18     46.0     55.0
##  6      52.0 2.12 18     47.5     56.5
## 
## Confidence level used: 0.95

#Subset data set excluding trt=2 and trt=6 groups
dat2<-dat[which(dat$trt!=2 & dat$trt!=6),]
#Add nitrogen level
nlev_f<-function(x){
if(x==1){1}else if(x==3){2} else if(x==4){3} else {4}
}
dat2$nlevel<- sapply(dat2$trt,nlev_f)
plot(dat2$nlevel, dat2$y)

summary( contrast(lsmTrt,list("quadratic effec"=c(1,0,-1,-1,1,0))), infer=c(T,T), level=0.95, side="two-sided" )

reg_irri<- lm(y~nlevel+I(nlevel^2),data = dat2)
summary(reg_irri)

## 
## Call:
## lm(formula = y ~ nlevel + I(nlevel^2), data = dat2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.7375 -3.2938  0.7375  2.3750  6.4625 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  105.188      5.768  18.235 1.21e-10 ***
## nlevel        -9.362      5.262  -1.779   0.0986 .  
## I(nlevel^2)   -1.063      1.036  -1.026   0.3238    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.144 on 13 degrees of freedom
## Multiple R-squared:  0.9509, Adjusted R-squared:  0.9434 
## F-statistic: 125.9 on 2 and 13 DF,  p-value: 3.099e-09

Yes, there is a significant effect of irrigation. ata supports that irrigation has a negative effect on big bluestem.A positive confidence interval means that the plots without irrigation performs better than those plots with irrigation on average across all levels of nitrogen with 95% confidence.

summary( contrast(lsmTrt,list("irre effect"=c(1/2,-1/2,0,0,1/2,-1/2))), infer=c(T,T), level=0.95, side="two-sided" )

The 95% simultaneous confidence intervals for the difference between treatment groups and control group(treatment 1) doesn’t include 0 with negative endpoints. This indicates that the mean response of the 1st group is significantly larger than those of the other groups.There is a strong evidence that the trt 1 gives us the best result of big bluestem.

summary(contrast(lsmTrt, method="trt.vs.ctrl", adjust="mvt",ref=1), infer=c(T,T), level=0.95, side="two-sided")

7.2 Sample size 77 we would need for each group.

groupmeans <- c(10, 11, 11)
p <- power.anova.test(groups = length(groupmeans), between.var = var(groupmeans), within.var = 4, power=0.9,sig.level=0.05);p

## 
##      Balanced one-way analysis of variance power calculation 
## 
##          groups = 3
##               n = 76.93183
##     between.var = 0.3333333
##      within.var = 4
##       sig.level = 0.05
##           power = 0.9
## 
## NOTE: n is number in each group

7.4 false;false

STAT631_hw3

YJ

March 17, 2019