Problem 1

A genetics study with beef animals consisted of several sires each mated to a separate group of dams. The matings that resulted in male progeny calves were used for an inheritance study of birth weights. The birth weights of eight male calves in each of five sire groups follow.

Part A.

Question: Assume a random model for the study. Write the linear model, explain each of the terms, compute the analysis of variance, and show the expected mean squares.

Answer: The linear model is \(Y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + \beta_4x_4 + \beta_5x_5\)

\(Y = 83.625 + 13.875x_2 - 20.500x_3 - 6x_4 + 7.375x_5\)

The ANOVA is calculated below and the EMS is shown as well. A more thorough calculation is shown in Part B.

bw <- c(61,100,56,113,99,103,75,62,
       75,102,95,103,98,115,98,94,
       58,60,60,57,57,59,54,100,
       57,56,67,59,59,121,101,101,
       59,46,120,115,115,93,105,75)

sire <- as.factor(c(rep("A",8),rep("B",8),rep("C",8),rep("D",8),rep("E",8)))

birthweights <- data.frame(bw, sire)
birthweights

##     bw sire
## 1   61    A
## 2  100    A
## 3   56    A
## 4  113    A
## 5   99    A
## 6  103    A
## 7   75    A
## 8   62    A
## 9   75    B
## 10 102    B
## 11  95    B
## 12 103    B
## 13  98    B
## 14 115    B
## 15  98    B
## 16  94    B
## 17  58    C
## 18  60    C
## 19  60    C
## 20  57    C
## 21  57    C
## 22  59    C
## 23  54    C
## 24 100    C
## 25  57    D
## 26  56    D
## 27  67    D
## 28  59    D
## 29  59    D
## 30 121    D
## 31 101    D
## 32 101    D
## 33  59    E
## 34  46    E
## 35 120    E
## 36 115    E
## 37 115    E
## 38  93    E
## 39 105    E
## 40  75    E

lm(bw~sire,data=birthweights)

## 
## Call:
## lm(formula = bw ~ sire, data = birthweights)
## 
## Coefficients:
## (Intercept)        sireB        sireC        sireD        sireE  
##      83.625       13.875      -20.500       -6.000        7.375

summary(lm(bw~sire,data=birthweights))

## 
## Call:
## lm(formula = bw ~ sire, data = birthweights)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -45.000 -16.656  -3.125  16.656  43.375 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   83.625      7.605  10.996 6.68e-13 ***
## sireB         13.875     10.755   1.290   0.2055    
## sireC        -20.500     10.755  -1.906   0.0649 .  
## sireD         -6.000     10.755  -0.558   0.5805    
## sireE          7.375     10.755   0.686   0.4974    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 21.51 on 35 degrees of freedom
## Multiple R-squared:  0.2563, Adjusted R-squared:  0.1713 
## F-statistic: 3.016 on 4 and 35 DF,  p-value: 0.03081

aov1 <- aov(bw~sire, data=birthweights)
summary(aov1)

##             Df Sum Sq Mean Sq F value Pr(>F)  
## sire         4   5581  1395.3   3.016 0.0308 *
## Residuals   35  16195   462.7                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

library(EMSaov)

EMSanova(bw~sire, data = birthweights, type = c("F"))

##           Df       SS        MS Fvalue Pvalue Sig         EMS
## sire       4  5581.15 1395.2875 3.0155 0.0308   * Error+8sire
## Residuals 35 16194.62  462.7036                         Error

Part B. Variance Component Estimates for sire and samples (progeny)

Question: Using the methods used in our class, give the variance component estimates for sires and samples (progeny).

Answer: EMS for the residuals is 462.70 EMS for the sire is (1395.28 - 462.70)/8 = 116.57

s2trt <- (1395.28 - 462.70)/8
s2trt

## [1] 116.5725

s2tot <- s2trt + 462.70
s2tot

## [1] 579.2725

Part C.

Question: Test the null hypothesis of zero sire-to-sire variability, giving all hypotheses, calculated test statistic, degrees of freedom, p-value, and your clear conclusion.

Answer: \(H_0 = {\sigma_\alpha}^2 = 0\) \(H_A = {\sigma_\alpha}^2 \neq 0\)

F value is 0.25 with degrees of freedom 4 and 35. P-value is 0.90.

With a p-value of 0.9 > \(\alpha = 0.05\), we fail to reject the \(H_0\). That is, we conclude there is 0 sire to sire variability.

s2trt/462.70

## [1] 0.2519397

1-pf(0.2519397,4,35)

## [1] 0.9065265

summary(lm(bw~sire,data=birthweights))

## 
## Call:
## lm(formula = bw ~ sire, data = birthweights)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -45.000 -16.656  -3.125  16.656  43.375 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   83.625      7.605  10.996 6.68e-13 ***
## sireB         13.875     10.755   1.290   0.2055    
## sireC        -20.500     10.755  -1.906   0.0649 .  
## sireD         -6.000     10.755  -0.558   0.5805    
## sireE          7.375     10.755   0.686   0.4974    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 21.51 on 35 degrees of freedom
## Multiple R-squared:  0.2563, Adjusted R-squared:  0.1713 
## F-statistic: 3.016 on 4 and 35 DF,  p-value: 0.03081

summary(aov(lm(bw~sire,data=birthweights)))

##             Df Sum Sq Mean Sq F value Pr(>F)  
## sire         4   5581  1395.3   3.016 0.0308 *
## Residuals   35  16195   462.7                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Part D.

Question: Using the methods used in our class, give the intraclass correlation coefficient (ICC), and give the 90% confidence interval. Clearly interpret the ICC estimate in the context of this situation.

Answer: Of the variance of an observation, 33.5% of this variation is the result of differences between sires and 66.5% of this variation is the result of differences within sires.

The variance component estimates are \(\sigma_\alpha^2 = 116.5725\) and \(\sigma^2 = 462.7\)

The point estimate is 0.2012395.

The 90% CI for the true ICC is (0.01739312, 0.6704723)

MStrt <- 1395.28
MSE <- 462.7
n <- 8
sigma_hat_alpha_sq <- (MStrt - MSE)/n
sigma_hat_alpha_sq

## [1] 116.5725

sigma_hat <- MSE
sigma_hat

## [1] 462.7

pointest <- sigma_hat_alpha_sq/(sigma_hat_alpha_sq + sigma_hat)
pointest

## [1] 0.2012395

1-pointest

## [1] 0.7987605

#Confidence Interval 
L <- (1/n)*(((MStrt/MSE)/qf(0.95, df1 = 4, df2 = 35))-1)
L/(1+L)

## [1] 0.01739312

U <- (1/n)*(((MStrt/MSE)/qf(0.05, df = 4, df2 = 35)) -1)
U/(1+U)

## [1] 0.6704723

Problem 2

The Ames Salmonella microsome assay is used to investigate the potential of environmental toxic substances for their ability to effect heritable change in genetic material. The compound 4-nitroortho-phenylenediamine was tested with strain TA98 Salmonella. The number of visible colonies was counted on plates dosed with 4NoP. Five dose levels of 4NoP were used in this study. The colony counts for seven of the plates at each dose level are shown here.

Part A.

Question: After determining whether a transformation of the response variable is needed – and pointing out why neither log- nor square-root-transformations are suggested here – conduct the appropriate analysis of variance treating dose as a factor. Using the underline method, clearly report the findings of Tukey’s HSD test for these data. Report all findings/work here

Answer: We first plot the data and see the residuals are not normal and the normal QQ plot is fairly straight. Since the variances are not constant, we cannot use log or square root transformation.

We end up using a box cox transformation and obtain that the best lambda is 0.222. So, we transform the colony counts by raising to that power. Once we rerun the anova with the transformed data, we see that the plots of the residuals look much better as does the normal QQ plot.

From Tukey’s HSD, we test the differences among sample means. As 0 is in none of the intervals, we conclude all pairwise differences are signficant. I choose not to make a underline method plot because it would just be a straight line.

cc <- c(11,14,15,17,18,21,25,
        39,43,46,50,52,61,67,
        88,92,104,113,119,120,130,
        222,251,259,283,299,312,337,
        562,604,689,702,710,739,786)

dose <- as.factor(c(rep("0",7),rep("0.3",7),rep("1",7),rep("3",7),rep("10",7)))

salmonella <- data.frame(cc, dose)
salmonella

##     cc dose
## 1   11    0
## 2   14    0
## 3   15    0
## 4   17    0
## 5   18    0
## 6   21    0
## 7   25    0
## 8   39  0.3
## 9   43  0.3
## 10  46  0.3
## 11  50  0.3
## 12  52  0.3
## 13  61  0.3
## 14  67  0.3
## 15  88    1
## 16  92    1
## 17 104    1
## 18 113    1
## 19 119    1
## 20 120    1
## 21 130    1
## 22 222    3
## 23 251    3
## 24 259    3
## 25 283    3
## 26 299    3
## 27 312    3
## 28 337    3
## 29 562   10
## 30 604   10
## 31 689   10
## 32 702   10
## 33 710   10
## 34 739   10
## 35 786   10

aov1 <- aov(cc~dose, data=salmonella)
summary(aov1)

##             Df  Sum Sq Mean Sq F value Pr(>F)    
## dose         4 2106599  526650   334.9 <2e-16 ***
## Residuals   30   47175    1573                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#Plot the original untransformed model 
plot(aov1)

k <- lm(cc~dose, data=salmonella)

#Apply Box Cox and obtain 0.222222 as the maximum so we transform 
library(MASS)

## Warning: package 'MASS' was built under R version 3.5.2

bc <- boxcox(k, lambda=seq(0,0.5,by=.05))

bc$x[which.max(bc$y)]

## [1] 0.2222222

#New transformed model
aov2 <- aov(cc^0.222~dose, data=salmonella)
summary(aov2)

##             Df Sum Sq Mean Sq F value Pr(>F)    
## dose         4 24.402   6.100   553.2 <2e-16 ***
## Residuals   30  0.331   0.011                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

plot(aov2)

TukeyHSD(aov2, "dose", ordered = TRUE)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
##     factor levels have been ordered
## 
## Fit: aov(formula = cc^0.222 ~ dose, data = salmonella)
## 
## $dose
##             diff       lwr       upr p adj
## 0.3-0  0.5161357 0.3533150 0.6789564 0e+00
## 1-0    0.9589705 0.7961498 1.1217912 0e+00
## 3-0    1.6169750 1.4541543 1.7797957 0e+00
## 10-0   2.3837676 2.2209469 2.5465883 0e+00
## 1-0.3  0.4428348 0.2800141 0.6056555 1e-07
## 3-0.3  1.1008393 0.9380186 1.2636600 0e+00
## 10-0.3 1.8676319 1.7048112 2.0304526 0e+00
## 3-1    0.6580045 0.4951838 0.8208252 0e+00
## 10-1   1.4247970 1.2619763 1.5876177 0e+00
## 10-3   0.7667925 0.6039718 0.9296132 0e+00

Part B.

Question: It was felt that if dose was transformed using the transformation x=log(dose + 1), then a simple linear regression model would sufficiently fit the transformed data. Providing all hypotheses, test statistic, degrees of freedom and your clear conclusion, decide whether this linear model fits these data.

Answer: The model is transformed below. We see from the summary all the transformed doses are significant in the linear model.

We perform a Lack of Fit test to ensure that this model is appropriate.

\(H_0\): There is no lack of fit in the model.

\(H_1\): There is a lack of fit.

With an F value of 552.6 on 4 and 30 DF and a p-value: < 2.2e-16, We reject the \(H_0\). We conclude there is a lack of fit.

#New transformation
dose_new <- as.numeric(as.character(salmonella$dose))
dose_newnew <- as.factor(log(dose_new+1))
cc_new <- (cc^0.22)
salm2 <- data.frame(cc_new, dose_newnew)

g <- lm(cc_new~dose_newnew, data=salm2)
summary(g)

## 
## Call:
## lm(formula = cc_new ~ dose_newnew, data = salm2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.17454 -0.07655  0.01204  0.06864  0.16822 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   1.86202    0.03897  47.786  < 2e-16 ***
## dose_newnew0.262364264467491  0.50803    0.05511   9.219 2.93e-10 ***
## dose_newnew0.693147180559945  0.94311    0.05511  17.115  < 2e-16 ***
## dose_newnew1.38629436111989   1.58847    0.05511  28.826  < 2e-16 ***
## dose_newnew2.39789527279837   2.33916    0.05511  42.448  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1031 on 30 degrees of freedom
## Multiple R-squared:  0.9866, Adjusted R-squared:  0.9848 
## F-statistic: 552.6 on 4 and 30 DF,  p-value: < 2.2e-16

aov3 <- aov(cc_new~dose_newnew, data=salm2)
summary(aov3)

##             Df Sum Sq Mean Sq F value Pr(>F)    
## dose_newnew  4 23.493   5.873   552.6 <2e-16 ***
## Residuals   30  0.319   0.011                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

plot(aov3)

#plot(salm2$dose_newnew,salm2$cc_new, xlab-"Dose New", ylab="Colony Counts")

aov3 <- aov(cc_new~dose_newnew, data=salm2)
summary(aov3)

##             Df Sum Sq Mean Sq F value Pr(>F)    
## dose_newnew  4 23.493   5.873   552.6 <2e-16 ***
## Residuals   30  0.319   0.011                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

plot(aov3)

Problem 3

A comparison study is to be conducted to evaluate campsites in a national park. There are ten campsite designs, which range from primitive to full-facility sites. Visitors to the national park will be selected at random during the month of June and shown photographs of three of the ten campsites. They are to provide ratings on a 0 to 100 scale of the three campsites based on the photographs.

Answer:

We use the image to help us with the problems.

Design Problem 3

Part A.

Question: How many visitors will be required for the study?

Answer: N = gr = bk = 90. We need 90 visitors for this study.

g <- 10
k <- 3 
r <- 9
b <- 30

N <- g*r
N

## [1] 90

Part B.

Question: How many replications of each campsite will there be in the study?

Answer: \(r = 9\) The number of replicates from the table is 9

Part C.

Question: How many times will each pair of campsites be viewed by a visitor?

Answer: \(\lambda = 2\) Each pair of campsites will be viewed by a visitor 2 times.

Part D.

Question: Showing all needed calculations, give the efficiency of this design compared with the one where each visitor assesses each of the ten campsites.

Answer E = 0.74 for this design.

E= 1 for the design where each visitor assesses each of the ten campsites.

#This Design
(g*(k-1))/(k*(g-1))

## [1] 0.7407407

#Design where each visitor assesses each of the ten campsites
(10*(10-1))/(10*(10-1))

## [1] 1

Part E.

Question: If it turns out that MSE = 30 showing all work give the standard error of the difference of the least-squares estimates of two of the campsite designs.

Answer: Standard error between the two is 3.

\(se_{\hat{\mu_i} - \hat{\mu_j}}\)

sqrt(((2*k)/(2*g))*30)

## [1] 3

Problem

A supermarket chain wanted to assess how varying product position within a store affected sales of aspirin. Two factors were identified for studying: position of the product on the aisle shelving (denoted A) and shelf position of the product within the aisle (denoted S). Shelving is provided in three sections: bottom, middle and top. Each of these levels was tested. For the aisle position factor, the aspirin was placed either at the end nearest the checkouts or at the end furthest from the checkouts. Eighteen stores within the chain were randomly selected for the study with each combination assessed in three randomly chosen stores to provide the replicate sales data. The number of product sales after a two-week trial period were recorded as summarized by the cell means given in the following table. For the full data, note that SSE = 369.33

Part A.

Question: Showing all relevant work, derive and report the S X A sum of squares.� sum of squares.

Answer \(SS_{S X A}\) = 134.8889

sse <- 369.33
b <- c(40.333,41)
m <- c(68.667,35)
t <- c(53, 39)

shelf <- data.frame(b,m,t)

#y's from rows
y1_dot <- sum(shelf[1,])/3
y2_dot <- sum(shelf[2,])/3

y1_dot

## [1] 54

y2_dot

## [1] 38.33333

#y's from columns
ydot_1 <- sum(shelf[,1])/2
ydot_2 <- sum(shelf[,2])/2
ydot_3 <- sum(shelf[,3])/2

ydot_1

## [1] 40.6665

ydot_2

## [1] 51.8335

ydot_3

## [1] 46

#all average
y_all <- sum(y1_dot + y2_dot + ydot_1 + ydot_2 + ydot_3)/5
y_all

## [1] 46.16667

#alphas
alpha_1 <- y1_dot - y_all
alpha_1

## [1] 7.833333

alpha_2 <- y2_dot - y_all
alpha_2

## [1] -7.833333

#betas 
beta_1 <- ydot_1 - y_all
beta_1

## [1] -5.500167

beta_2 <- ydot_2 - y_all
beta_2

## [1] 5.666833

beta_3 <- ydot_3 - y_all
beta_3

## [1] -0.1666667

y11 <- alpha_1 + beta_1 + y_all 
y12 <- alpha_1 + beta_2 + y_all
y13 <- alpha_1 + beta_3 + y_all

y21 <- alpha_2 + beta_1 + y_all
y22 <- alpha_2 + beta_2 + y_all
y23 <- alpha_2 + beta_3 + y_all

yij <- matrix(c(y11,y12,y13,y21,y22,y23), ncol=3, byrow=TRUE)
yij

##          [,1]     [,2]     [,3]
## [1,] 48.49983 59.66683 53.83333
## [2,] 32.83317 44.00017 38.16667

#ab_ij
ab_ij <- shelf - yij
ab_ij

##           b         m          t
## 1 -8.166833  9.000167 -0.8333333
## 2  8.166833 -9.000167  0.8333333

#SSab
SSab <- 2*sum(ab_ij^2)
SSab

## [1] 593.5784

Part B.

Question: Test whether the S X A interaction is significant, providing the relevant hypotheses, test statistic, degrees of freedom, p-value and your clear conclusion.

Answer: \(H_0\): The S X A interaction is not significant \(H_A\) The S X A interaction is significant

The mean square between shelf and aisle is 296.7892 The fvalue is 9.643058 with df = 2 and 12 The p value is 0.003184033.

With a p-value < \(\alpha\) = 0.05, we reject the H_0. We conclude the interaction is in fact significant.

MSE <- sse /((3-1)*(2*3))
MSE

## [1] 30.7775

MS_ShelfAisle <- SSab/2
MS_ShelfAisle

## [1] 296.7892

fval <- MS_ShelfAisle/MSE
fval

## [1] 9.643058

#(n-1)*ab = 12
1-pf(fval, 2, 12)

## [1] 0.003184033

Part C.

Question: Test the difference between Aisle Position means within each level of Shelf Position. Break this down for each within group and give and interpret the 95% confidence intervals for the relevant difference of means. Note this was a pre-planned comparison.

Answer: \(H_0\): Theinteraction is not significant \(H_A\) The interaction is significant

Aisle within Bottom Contrast

Fvalue is 0.021878 Pvalue is 0.8848684

With a p-value greater than \(\alpha\), we fail to reject the null hypothesis.

Aisle within Middle Contrast

Fvalue is 55.24166 Pvalue is 7.920019e-06.

With a p-value less than \(\alpha\), we reject the null hypothesis.

Aisle within Top Contrast

Fvalue is 9.552433 Pvalue is 0.009350148.

With a p-value less than \(\alpha\), we reject the null hypothesis.

The confidence intervals are as follows: Bottom to Middle Aisle: (-23.2516665, 0.9176665) Middle to Top: (-6.251167, 17.918167) Top to Bottom: (-6.751167, 17.418167)

Since 0 is in all of the confidence intervals, we conclude the interaction is significant.

#Aisle within Bottom
SSW_b <- (40.33-41)^2/(2/3)
SSW_b

## [1] 0.67335

fval_b <- SSW_b/MSE
fval_b

## [1] 0.021878

1-pf(fval_b,1,12)

## [1] 0.8848684

#Aisle within Middle
SSW_m <- (68.667-35)^2/(2/3)
SSW_m

## [1] 1700.2

fval_m <- SSW_m/MSE
fval_m

## [1] 55.24166

1-pf(fval_m,1,12)

## [1] 7.920019e-06

#Aisle within Top
SSW_t <-  (53-39)^2/(2/3)
SSW_t

## [1] 294

fval_t <- SSW_t/MSE
fval_t

## [1] 9.552433

1-pf(fval_t,1,12)

## [1] 0.009350148

#Confidence Interval 
q.value <- qtukey(p = 0.95, nmeans = 3, df = 12)
tukey.hsd <- q.value * sqrt(MSE/3)
tukey.hsd

## [1] 12.08467

mu_b <- (41+40.333)/2
mu_m <- (68.667+35.00)/2
mu_t <- (53 + 39)/2

(mu_b-mu_m)+ c(-1,1) *tukey.hsd

## [1] -23.2516665   0.9176665

(mu_m-mu_t)+ c(-1,1) *tukey.hsd

## [1] -6.251167 17.918167

(mu_t-mu_b)+ c(-1,1) *tukey.hsd

## [1] -6.751167 17.418167

Part D.

Yes the conclusion is expected based on the results from part B and C.

Problem 5

Question: Clearly report hypotheses, test statistics, degrees of freedom, p-values and clear conclusions. Summarize your findings by reporting the estimated best fitting polynomial for each “within group”. (Note: since the suggested transformation does not improve the residual pattern, work with these data on the original scale.)

Answer: We go through the columns and calculate the averages.

SSE = 44.5

The fstat for flowers is 10.39326 and pvalue is 0.004581446 The fstat for seed is 39.04494 and pvalue is 3.666372e-05 The fstat for flowersXseed is 13.64045 and pvalue is 0.001988281

Flower, seed and the interaction are all significant.

For flower and seed, it seems linear and quadratic polynomials are best fitting.

aa <- mean(c(67,66))
ab <- mean(c(68,65))
ac <- mean(c(65,64))

ba <- mean(c(65,61))
bb <- mean(c(68,61))
bc <- mean(c(62,63))

ca <- mean(c(62,64))
cb <- mean(c(55,53))
cc <- mean(c(49,47))

flow <- matrix(c(aa,ab,ac,ba,bb,bc,ca,cb,cc), ncol=3, byrow=FALSE)
flow

##      [,1] [,2] [,3]
## [1,] 66.5 63.0   63
## [2,] 66.5 64.5   54
## [3,] 64.5 62.5   48

y1_dot <- sum(flow[1,])/3
y2_dot <- sum(flow[2,])/3
y3_dot <- sum(flow[3,])/3

y1_dot

## [1] 64.16667

y2_dot

## [1] 61.66667

y3_dot

## [1] 58.33333

#y's from columns
ydot_1 <- sum(flow[,1])/3
ydot_2 <- sum(flow[,2])/3
ydot_3 <- sum(flow[,3])/3

ydot_1

## [1] 65.83333

ydot_2

## [1] 63.33333

ydot_3

## [1] 55

#all average
y_all <- sum(y1_dot + y2_dot + y3_dot + ydot_1 + ydot_2 + ydot_3)/6
y_all

## [1] 61.38889

#alphas
alpha_1 <- y1_dot - y_all
alpha_1

## [1] 2.777778

alpha_2 <- y2_dot - y_all
alpha_2

## [1] 0.2777778

alpha_3 <- y3_dot - y_all
alpha_3

## [1] -3.055556

#betas 
beta_1 <- ydot_1 - y_all
beta_1

## [1] 4.444444

beta_2 <- ydot_2 - y_all
beta_2

## [1] 1.944444

beta_3 <- ydot_3 - y_all
beta_3

## [1] -6.388889

y11 <- alpha_1 + beta_1 + y_all 
y12 <- alpha_1 + beta_2 + y_all
y13 <- alpha_1 + beta_3 + y_all

y21 <- alpha_2 + beta_1 + y_all
y22 <- alpha_2 + beta_2 + y_all
y23 <- alpha_2 + beta_3 + y_all

y31 <- alpha_3 + beta_1 + y_all
y32 <- alpha_3 + beta_2 + y_all
y33 <- alpha_3 + beta_3 + y_all

yij <- matrix(c(y11,y12,y13,y21,y22,y23,y31,y32,y33), ncol=3, byrow=TRUE)
yij

##          [,1]     [,2]     [,3]
## [1,] 68.61111 66.11111 57.77778
## [2,] 66.11111 63.61111 55.27778
## [3,] 62.77778 60.27778 51.94444

#ab_ij
ab_ij <- flow - yij
ab_ij

##            [,1]       [,2]      [,3]
## [1,] -2.1111111 -3.1111111  5.222222
## [2,]  0.3888889  0.8888889 -1.277778
## [3,]  1.7222222  2.2222222 -3.944444

#SSa
SSa <- 2*3*(alpha_1^2 + alpha_2^2 + alpha_3^2)
SSa

## [1] 102.7778

#SSb
SSb <- 2*3*(beta_1^2 + beta_2^2 + beta_3^2)
SSb

## [1] 386.1111

#SSab
SSab <- 2*sum(ab_ij^2)
SSab

## [1] 134.8889

#SSE
SSE <- ((67-aa)^2 + (66-aa)^2 
        + (68-ab)^2 + (65-ab)^2
        + (65-ac)^2 + (64-ac)^2
        + (65-ba)^2 + (61-ba)^2
        + (68-bb)^2 + (61-bb)^2
        + (62-bc)^2 + (63-bc)^2
        + (62-ca)^2 + (64-ca)^2
        + (55-cb)^2 + (53-cb)^2
        + (49-cc)^2 + (47-cc)^2)
SSE

## [1] 44.5

MSE <- SSE/9
MSE

## [1] 4.944444

MSA <- SSa/2
MSB <- SSb/2
MSAB <- SSab/2

fval_a <- MSA/MSE
fval_a

## [1] 10.39326

1-pf(fval_a,2,9)

## [1] 0.004581446

fval_b <- MSB/MSE
fval_b

## [1] 39.04494

1-pf(fval_b,2,9)

## [1] 3.666372e-05

fval_ab <- MSAB/MSE
fval_ab

## [1] 13.64045

1-pf(fval_a,4,9)

## [1] 0.001988281

#Polynomial Contrasts
lin_con<-c(-1,0,1)
quad_con<-c(1,-2,1)


SSL_1 <- (sum(lin_con*flow[,1])^2)/((sum((lin_con)^2))/2)
SSQ_1 <- (sum(quad_con*flow[,1])^2)/((sum((quad_con)^2))/2)

SSL_5 <- (sum(lin_con*flow[,2])^2)/((sum((lin_con)^2))/2)
SSQ_5 <- (sum(quad_con*flow[,2])^2)/((sum((quad_con)^2))/2)

SSL_9 <- (sum(lin_con*flow[,3])^2)/((sum((lin_con)^2))/2)
SSQ_9 <- (sum(quad_con*flow[,3])^2)/((sum((quad_con)^2))/2)

SSL_1 +SSQ_1+ SSL_5 + SSQ_5 + SSL_9 + SSQ_9

## [1] 237.6667

SSa + SSab

## [1] 237.6667

Problem 6

A company is developing an insulating compound. An accelerated life testing procedure is conducted to determine the breakdown time in minutes after subjection to elevated voltages. Four voltages could be tested on a given run of the test.

Part A.

Question: How many times did each treatment pair occur together in the same block?

Answer: Each treatment pair occured 2 times in each block.

Part B.

Question: Find the efficiency

Answer: The efficiency factor for this design is 0.875

7*(4-1)/(4*(7-1))

## [1] 0.875

Part C.

Box Cox

STATDesign2

Kajal Chokshi

11/7/2019

Problem 1

Part A.

Part B. Variance Component Estimates for sire and samples (progeny)

Part C.

Part D.

Problem 2

Part A.

Part B.

Problem 3

Part A.

Part B.

Part C.

Part D.

Part E.

Problem

Part A.

Part B.

Part C.

Part D.

Problem 5

Problem 6

Part A.

Part B.

Part C.