R Markdown

19.12, 19.13, 19.31, 19.39, 20.8, 20.9, 20.10, 21.9, 21.10, 21.20

Question 1 - 19.12

a)

## [1] 9.2

## [1] 13

## [1] 13.6

## [1] 16.4

##    1    6   11   16 
##  9.2 13.6 13.0 16.4

Part b

#av$fitted.values
round(sum(av$residuals),6) # Sum to zer

## [1] 0

The residuals sum to 0

Part c

plot(av,1)

Using this plot we cannot find major departures from: \[\sum_i(\alpha\beta)_{ij} = \sum_j(\alpha\beta)_{ij} =0$ and $\epsilon \sim N(0,\sigma^2)\]

Part D - Normal Probability Plot

plot(av,2)

n = length(av$residuals)
se = sqrt(sum(av$residuals^2/n))
exp_values =  sapply(1:n, function(k) se * qnorm((k-.375)/(n+.25)))
cor(exp_values,sort(av$residuals))

## [1] 0.9762576

$\hat{\rho} = 0.9762576$ Using $\alpha =0.05$ and table B.6, $\hat{\rho} = 0.9762576>.951$ so we fail to reject normality. So yes, normality seems reasonable ## Part E

plot(df$Rating[(df$Gender == "Male") & (df$Eye=="Presnt")],residuals(av)[1:5])

plot(df$Rating[(df$Gender == "Male") & (df$Eye=="Absent")],residuals(av)[6:10])

plot(df$Rating[(df$Gender == "Female") & (df$Eye=="Presnt")],residuals(av)[11:15])

plot(df$Rating[(df$Gender == "Female") & (df$Eye=="Absent")],residuals(av)[16:20])

#or 
plot(residuals(av)[1:5])

plot(residuals(av)[6:10])

plot(residuals(av)[11:15])

plot(residuals(av)[16:20])

#plot(df$V4,residuals(av))

They seem to be growing (not sure about this one)

Question 2 - 19.13

Part A

interaction.plot(df$Gender, df$Eye, df$Rating, fun = mean)

Yes there seems to be factor effect since the lines are far apart.

Part B

anova(av)

## Analysis of Variance Table
## 
## Response: Rating
##            Df Sum Sq Mean Sq F value   Pr(>F)   
## Gender      1  76.05  76.050 12.5185 0.002734 **
## Eye         1  54.45  54.450  8.9630 0.008589 **
## Gender:Eye  1   1.25   1.250  0.2058 0.656202   
## Residuals  16  97.20   6.075                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

It seems that both sources account for the variation, not just one.

Part C

\[H_0: (\alpha_i\beta_i) = 0 \forall i\] \[H_1:\neg H_0\]

qf(.99,1,16)

## [1] 8.530965

Since $8.530965=F>F^* =0.2058$ we conclude $H_0$ and that the interaction terms are not significant.

Also the p value from the ANOVA table is $p_{value} = 0.656202 > .01$

Part D

They all have the same $F$, and the $F^*$ differ in each case. For $\alpha$ \[H_0: (\alpha_i) = 0 \forall i\] \[H_1:\neg H_0\] For $\beta$ \[H_0: (\beta_i) = 0 \forall i\] \[H_1:\neg H_0\] In both cases $F^*_\alpha = 8.9630 > 8.53 = F$ & $F^*_\beta = 12.5185 > 8.53 =F$ So in both cases we conclude $H_1$ and that the main effects are significant. The P values are $p_\alpha = 0.008589$, $p_{\beta} = 0.002734$ ## Part e \[\alpha \leq 1-(.99)^3 \leq 0.03\] ## Part f The results from C,D agree with the graph in part A. There are factor effects and no interaction effects

CHECK

It is not meaningful to test here since we have an unimportant factor still in the ANOVA model. # Question 3 - 19.31 ## Part a

av = aov(data=df,Rating~Eye+Gender+Eye*Gender)
av$fitted.values

##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15 
##  9.2  9.2  9.2  9.2  9.2 13.6 13.6 13.6 13.6 13.6 13.0 13.0 13.0 13.0 13.0 
##   16   17   18   19   20 
## 16.4 16.4 16.4 16.4 16.4

model.tables(av,se=T)

## Tables of effects
## 
##  Eye 
## Eye
## Absent Presnt 
##   1.65  -1.65 
## 
##  Gender 
## Gender
## Female   Male 
##   1.95  -1.95 
## 
##  Eye:Gender 
##         Gender
## Eye      Female Male 
##   Absent -0.25   0.25
##   Presnt  0.25  -0.25
## 
## Standard errors of effects
##            Eye Gender Eye:Gender
##         0.7794 0.7794     1.1023
## replic.     10     10          5

t = qt(.995,16)
print("Confidence interval:")

## [1] "Confidence interval:"

13+1.1023*t*c(-1,1)

## [1]  9.780422 16.219578

The fitted values of $Y_{21.}=13$ and $s(Y_{21})=1.1023$, $t(0.995,16)=2.921$. The confidence interval is then $9.780422\leq \mu_{21} \leq 16.219578$

Part B

mu = mean(df$Rating[df$Eye=="Presnt"])
mu+t*0.7794*c(-1,1)

## [1]  9.123543 13.676457

$\hat{Y_{1..}} = 11.4$, $s(\hat{Y_{1..}}) = 0.7794$ from the table above. Using the same t value we get this confidence interval: $9.123543 \leq \mu_{1.}\leq 13.676457$

Part C

mu.1 = mean(df$Rating[df$Gender=="Male"])
mu.2 =mean(df$Rating[df$Gender=="Female"])
barplot(c(mu.1,mu.2),names.arg=c("Male","Female"),main = "Means of factor B")

This plot suggest that Females tend to have higher ratings. ## Part D

# Y1.
mu.1 + 0.7794*t*c(-1,1)

## [1]  8.823543 13.376457

mu.2 + 0.7794*t*c(-1,1)

## [1] 12.72354 17.27646

$8.823543 \leq \mu_{.1} \leq 13.376457$, $12.72354 \leq \mu_{.2} \leq 17.27646$

The family confidence coefficient fo the set of two estimates is $1-.01-.01 =.98$ .

Part E

mu1. = mean(df$Rating[df$Eye=="Presnt"])
mu2. =mean(df$Rating[df$Eye=="Absent"])
barplot(c(mu.1,mu.2),names.arg=c("Present","Absent"),main = "Means of factor A")

This plot suggests that no eye contact has higher rating

Part f

d1 = mu2.-mu1.
d2 = mu.2-mu.1
s.d1 = sqrt(4*6.075/(length(df$Rating)))
t.bonf = qt(1-.05/4,16)
d1+s.d1*t.bonf*c(-1,1)

## [1] 0.5742195 6.0257805

d2+s.d1*t.bonf*c(-1,1)

## [1] 1.174219 6.625781

So we get: $0.5742195 \leq D_1\leq 6.0257805$ and $1.174219 \leq D_2 \leq 6.625781$

Part G

Yes since we are making a few pairwise comparison which are specifically of interest in advance of this study.

Question 4 - 19.39

floor((2.32/(1.2/2.4))^2)

## [1] 21

Using table B.13, with $\lambda =1.2$, $\sigma =2.4$ we get the proper value of n should be 21. # Question 5 - 20.8

df = read.table("http://users.stat.ufl.edu/~rrandles/sta4210/Rclassnotes/data/textdatasets/KutnerData/Chapter%2020%20Data%20Sets/CH20PR08.txt")
df$Color = df$V1
df$Humidity = factor(df$V2)
df$Temp = factor(df$V3)
interaction.plot(df$Temp,df$Humidity, df$Color, fun = mean)

av = aov(Color~Humidity+Temp,data=df)

It appears that there are interactions. Main effects may be present but seem small.

Part b

anova(av)

## Analysis of Variance Table
## 
## Response: Color
##           Df  Sum Sq Mean Sq F value    Pr(>F)    
## Humidity   2   2.122   1.061  0.9666    0.4326    
## Temp       3 202.200  67.400 61.4123 6.781e-05 ***
## Residuals  6   6.585   1.097                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We will test the following on $alpha = .025$ \[H_0: (\alpha_i) = 0 \forall i\] \[H_1:\exists i: (\alpha_i) \neq 0 \] \[H_0: (\beta_i) = 0 \forall i\] \[H_1:\exists i: (\beta_i) \neq 0 \] Our $p_a = 0.4326>0.025$ so we fail to reject the null so that $\alpha$ are all 0.

Our $p_b \approx 0 < 0.025$ so we reject the null and conclude that at least one of the $\beta$’s is not 0.

Part C

y.1 = mean(df$Color[df$Temp==1])
y.2 = mean(df$Color[df$Temp==2])
y.3 = mean(df$Color[df$Temp==3])
y.4 = mean(df$Color[df$Temp==4])
d1 = y.2 -y.1
d2 = y.3-y.2
d3=y.4-y.3
s.d = sqrt(8*1.097/(length(df$Temp))) 
bonf = qt(1-.025/3,6) 
d1 +(s.d*bonf)*c(-1,1)

## [1] -2.411367  3.211367

d2 +(s.d*bonf)*c(-1,1)

## [1] 2.588633 8.211367

d3 +(s.d*bonf)*c(-1,1)

## [1] 1.321966 6.944700

Part D

The bonferoni is the most efficient here since these differences are of interest before the expirement. # Question 6 - 20.9 ## A)

y2. = mean(df$Color[df$Humidity==2])
u.23.hat = y2.+y.3 -mean(df$Color)
u.23.hat

## [1] 21.09167

Part B

s.hat = 1.097
var.mu23 = s.hat*(6)/(12) #(\sigma^2(a+b+1)(ab))
var.mu23

## [1] 0.5485

$Var(\hat\mu_{ij}) = \hat\sigma^2(a+b+1)(ab) = 1.097(6)/(12) = 0.5485$

C NOT SURE

sig.mu23 = c(sqrt(var.mu23))
u.23.hat + qt(1-.02/2,6)*sig.mu23*c(-1,1)

## [1] 18.76418 23.41915

There is a 98% probability that the interval above will cover the true value of $\mu_{23}$

mu.23.hat = sin(u.23.hat/2)^2

The interval will not be applicapable if the two factors interact and the interpretation of the arcsine transformed variable will change.

Question 7 - 20.10

We are testing ($\alpha = 0.005$): \[H_0:D=0\] \[H_a:D\neq0\]

require(asbio)

## Loading required package: asbio

## Warning: package 'asbio' was built under R version 3.4.3

## Loading required package: tcltk

tukey.add.test(df$Color, df$Humidity, df$Temp)

## 
## Tukey's one df test for additivity 
## F = 2.0481273   Denom df = 5    p-value = 0.2118051

So we fail to reject the null hypothesis and conclude that $D=0$.

If the additive model to work we can preform other transformations on Y via boxcox or other methods. # Question 8 - 21.9

Part a

Pain tolerance of the subject is a blocking variable because it is not something that can be controlled by an expirement. It is an external factor observed in the expirement.

Part b

The assumption of no interactions. Accupuncture and pain tolerance will likely interect

Part c

av = aov(data =df, Score~Block+A+B+A*B )
plot(av,1)

plot(av,2)

n = length(av$residuals)
se = sqrt(sum(av$residuals^2/n))
exp_values =  sapply(1:n, function(k) se * qnorm((k-.375)/(n+.25)))
cor(exp_values,sort(av$residuals))

## [1] 0.9830392

Using table B.6 and $\alpha=0.05$ we get a $\rho = 0.964 < 0.9830392=\hat{\rho}$ so we fail to reject normality.

The Residual vs. Fitted plot suggest that there may be interaction terms between blocking variables and factors

Part d

df$together[df$A==1 & df$B==1] = 11
df$together[df$A==1 & df$B==2] = 12
df$together[df$A==2 & df$B==1] = 21
df$together[df$A==2 & df$B==2] = 22
df$together = factor(df$together)
interaction.plot(df$together, df$Block, df$Score, fun = mean,col = 2:8)

The interactions do not seem severe.

Part E

We will conduct a Tukey Test for additivity with $\alpha =0.01$

\[H_0:(\tau\rho)_{ij}=0 \forall ij\] \[H_0:(\tau\rho)_{ij}\neq0 \exists (i,j)\]

tukey.add.test(df$Score, df$Block, df$together)

## 
## Tukey's one df test for additivity 
## F = 0.3368083   Denom df = 20    p-value = 0.5681593

We have a $p^* = 0.5681593 > .01 =\alpha$ we we fail to reject the null and conclude that there are no interaction terms between the blocking variabels and factors. # Question 9 - 21.10

Part a

anova(av)

## Analysis of Variance Table
## 
## Response: Score
##           Df Sum Sq Mean Sq  F value    Pr(>F)    
## Block      7 5.5987  0.7998  55.2963 4.126e-12 ***
## A          1 2.3112  2.3112 159.7901 2.773e-11 ***
## B          1 3.3800  3.3800 233.6790 7.465e-13 ***
## A:B        1 0.0450  0.0450   3.1111    0.0923 .  
## Residuals 21 0.3037  0.0145                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Part b

We will test for interaction between $\alpha$ and $\beta$ with $\alpha_{error} =.01$ \[H_0: (\alpha\beta)_{ij} = 0 \forall i,j\] \[H_a: (\alpha\beta)_{ij} \neq 0 \exists (i,j)\] The p value for the F test is $ p^* = 0.0923 >_{error}$ so we fail to reject the null hypothesis and conclude that we can remove these interactions.

Part C - NOT SURE

Part D

\[H_0: (\alpha)_{i} = 0 \forall i,\] \[H_a: (\alpha)_{i} \neq 0 \exists i\] \[H_0: (\beta)_{j} = 0 \forall j\] \[H_a: (\beta)_{j} \neq 0 \exists j\]

anova(aov(data =df, Score~Block+A+B))

## Analysis of Variance Table
## 
## Response: Score
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## Block      7 5.5987  0.7998  50.455 4.273e-12 ***
## A          1 2.3112  2.3112 145.799 3.528e-11 ***
## B          1 3.3800  3.3800 213.219 8.428e-13 ***
## Residuals 22 0.3487  0.0159                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In both tests we see that the $p_value < \alpha_{error} = .01$. In both cases we conclude that the main effects are significant. The $p_{value} \approx 0$ for both tests.

Part e

l1= mean(df$Score[df$A==1])- mean(df$Score[df$A==2])
l2= mean(df$Score[df$B==1])- mean(df$Score[df$B==2])
s.d = sqrt(0.0159)/3 # sqrt of MSE since c1^2 =c2^2 = 1
bonf = qt(1-.05/2/2,21) # Not sure about this
l1 +(s.d*bonf)*c(-1,1)

## [1] -0.6389581 -0.4360419

l2 +(s.d*bonf)*c(-1,1)

## [1] -0.7514581 -0.5485419

There is a .95 probability that the interval will cover both the true values of $L_1$ and $L_2$

Part f

\[H_0: (\rho)_{i} = 0 \forall i,\] \[H_a: (\rho)_{i} \neq 0 \exists i\] Looking at the ANOVA table $p_{value} \approx 0 <\alpha_{error}$. We reject the null hypothesis and conclude that blocking variables are significant.

Question 10 - 21.20

s_r2 = (7*0.7998+8*(3)*0.0145)/(8*4-1)
E = s_r2/0.0145
E

## [1] 13.22937

Since we have such a high $\hat{E}$ value, we concure that the blocking variable is efficient. Using this blocking variable is much more useful then just using a CRD.

STP531 HW2

Saar Yalov

March 27, 2018

R Markdown

Question 1 - 19.12

a)

Part b

Part c

Part D - Normal Probability Plot

Question 2 - 19.13

Part A

Part B

Part C

Part D

CHECK

Part B

Part C

Part E

Part f

Part G

Question 4 - 19.39

Part b

Part C

Part D

Part B

C NOT SURE

Question 7 - 20.10

Part a

Part b

Part c

Part d

Part E

Part a

Part b

Part C - NOT SURE

Part D

Part e

Part f

Question 10 - 21.20