Hw #8

Reading: Section 6.3, Section 8.2, Chapter 9
Exercises to hand in: 6.36, 8.14, 9.4, 9.6, 9.22

library(tidyverse)
library(Stat2Data)
library(skimr)
library(agricolae)

6.36 Alfalfa sprouts

Some students were interested in how an acidic environment might affect the growth of plants. They planted alfalfa seed in 15 cups and randomly chose five to get plain water, five to get a moderate amount of acid (1.5M HCl), and five to get a stronger acid solution (3.0M HCl). The plants were grown in an indoor room, so the students assumed that the distance from the main source of daylight (a window) might have an effect on growth rates. For this reason, they arranged the cups in five rows of three, with one cup from each Acid level in each row. These are labeled in the dataset as Row: a = farthest from the window through e = nearest to the window. Each cup was an experimental unit, and the response variable was the average height of the alfalfa sprouts in each cup after four days (Ht4). The data are shown in the table below and stored in the Alfalfa file.

table <- matrix(c(1.45,2.79,1.93,2.33,4.85,1.00,0.70,1.37,2.80,1.46,1.03,1.22,0.45,1.65,1.07),ncol=5,byrow=TRUE)
colnames(table) <- c("a", "b", "c", "d", "e")
rownames(table) <- c("water","1.5 HCl","3.0 HCl")
table <- as.table(table)
table

##            a    b    c    d    e
## water   1.45 2.79 1.93 2.33 4.85
## 1.5 HCl 1.00 0.70 1.37 2.80 1.46
## 3.0 HCl 1.03 1.22 0.45 1.65 1.07

Find the means for each row of cups (a, b, …, e) and each treatment (water, 1.5 HCl, 3.0 HCl). Also find the average and standard deviation for the growth in all 15 cups.

table <- matrix(c(1.45,2.79,1.93,2.33,4.85,2.67,1.00,0.70,1.37,2.80,1.46,1.466,1.03,1.22,0.45,1.65,1.07,1.084,1.16,1.57,1.25,2.26,2.46,1.74),ncol=6,byrow=TRUE)
colnames(table) <- c("a", "b", "c", "d", "e","Avg")
rownames(table) <- c("water","1.5 HCl","3.0 HCl","Avg")
table <- as.table(table)
table

##             a     b     c     d     e   Avg
## water   1.450 2.790 1.930 2.330 4.850 2.670
## 1.5 HCl 1.000 0.700 1.370 2.800 1.460 1.466
## 3.0 HCl 1.030 1.220 0.450 1.650 1.070 1.084
## Avg     1.160 1.570 1.250 2.260 2.460 1.740

data("Alfalfa")
mydata <- Alfalfa$Ht4
mymean <- mean(mydata)
mysd <- sd(mydata)

The average growth across all 15 cups is 1.74, and the standard deviation is 1.105396.
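The margin averages in the table above were typed in by hand. As a cross-check, they can also be computed from the raw heights. The sketch below rebuilds the 15 observations from the table in the problem statement instead of loading Alfalfa; the names `ht`, `acid`, and `cup_row` are made up for this illustration.

```r
# Heights from the problem's table, entered treatment by treatment
ht <- c(1.45, 2.79, 1.93, 2.33, 4.85,   # water
        1.00, 0.70, 1.37, 2.80, 1.46,   # 1.5 HCl
        1.03, 1.22, 0.45, 1.65, 1.07)   # 3.0 HCl

# Treatment and row labels (hypothetical names, matching the table layout)
acid <- factor(rep(c("water", "1.5 HCl", "3.0 HCl"), each = 5),
               levels = c("water", "1.5 HCl", "3.0 HCl"))
cup_row <- factor(rep(c("a", "b", "c", "d", "e"), times = 3))

trt_means  <- tapply(ht, acid, mean)     # one mean per acid treatment
row_means  <- tapply(ht, cup_row, mean)  # one mean per row of cups
grand_mean <- mean(ht)                   # overall average
grand_sd   <- sd(ht)                     # overall standard deviation

trt_means
row_means
```

Computing the margins this way avoids transcription slips when the averages are later reused for the sums of squares.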

Construct a two-way main effects ANOVA table for testing for differences in average growth due to the acid treatments using the rows as a blocking variable.

data(Alfalfa)
a1 <- aov(Ht4~Acid + Row,data=Alfalfa)
summary(a1)

##             Df Sum Sq Mean Sq F value Pr(>F)  
## Acid         2  6.852   3.426   4.513 0.0487 *
## Row          4  4.183   1.046   1.378 0.3235  
## Residuals    8  6.072   0.759                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

t_water = 2.67-mymean
t_hcl1 = 1.466-mymean
t_hcl3 = 1.084-mymean
b_a = 1.16-mymean
b_b = 1.57-mymean
b_c = 1.25-mymean
b_d = 2.26-mymean
b_e = 2.46-mymean

# SSA = J * sum over treatments of (treatment mean - grand mean)^2, J = 5 rows
SSA = 5*((t_water^2)+(t_hcl1^2)+(t_hcl3^2))

# SSB = I * sum over rows of (row mean - grand mean)^2, I = 3 treatments
SSB = 3*((b_a^2)+(b_b^2)+(b_c^2)+(b_d^2)+(b_e^2))

# SST = (n-1)*s_y^2, n = I*J = 15
SST = ((3*5)-1)*(mysd^2)

# SSE = SST - SSA - SSB
SSE = SST-SSA-SSB

# MSA = SSA/(I-1)
MSA = SSA/2

# MSB = SSB/(J-1)
MSB = SSB/4

# MSE = SSE/((I-1)(J-1))
MSE = SSE/8

# F = MSA/MSE for Acid, F = MSB/MSE for Row
F1 = MSA/MSE
F2 = MSB/MSE

# p-values from the upper tail of the F distribution
pf(F1, 2, 8, lower.tail=FALSE)

## [1] 0.04873771

pf(F2, 4, 8, lower.tail=FALSE)

## [1] 0.3235159

	Df	Sum Sq	Mean Sq	F value	Pr(>F)
Acid	2	6.852	3.426	4.513	0.0487
Row	4	4.183	1.046	1.378	0.3235
Residuals	8	6.072	0.759
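For reference, the hand calculation can be written out with the numbers plugged in. This is a sketch in notation chosen here, not taken from the textbook: \(I = 3\) treatments, \(J = 5\) rows, \(\bar{y}_{i\cdot}\) for treatment means, \(\bar{y}_{\cdot j}\) for row means, and \(\bar{y}_{\cdot\cdot} = 1.74\) for the grand mean.

\[
\begin{aligned}
SSA &= J\sum_{i=1}^{I}(\bar{y}_{i\cdot}-\bar{y}_{\cdot\cdot})^2 = 5\left(0.930^2+(-0.274)^2+(-0.656)^2\right) \approx 6.852 \\
SSB &= I\sum_{j=1}^{J}(\bar{y}_{\cdot j}-\bar{y}_{\cdot\cdot})^2 = 3\left((-0.58)^2+(-0.17)^2+(-0.49)^2+0.52^2+0.72^2\right) \approx 4.183 \\
SST &= (IJ-1)\,s_y^2 = 14\,(1.105396)^2 \approx 17.107 \\
SSE &= SST-SSA-SSB \approx 6.072 \\
F_{Acid} &= \frac{SSA/(I-1)}{SSE/\left((I-1)(J-1)\right)} = \frac{3.426}{0.759} \approx 4.513 \\
F_{Row} &= \frac{SSB/(J-1)}{SSE/\left((I-1)(J-1)\right)} = \frac{1.046}{0.759} \approx 1.378
\end{aligned}
\]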

Check the conditions required for the ANOVA model.

plot(a1,which=1)

plot(a1,which=2)

The treatments were randomly assigned to the cups, so the independence condition is reasonable. In the residuals vs. fitted plot, the red line is not flat, which suggests some non-linear trend in the residuals. In the QQ plot, some points deviate from the dotted line, so normality of the residuals is questionable. Therefore, not all conditions are met for the ANOVA model.
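To see which cups drive the patterns in these plots, the residuals from the additive model can also be computed by hand. The sketch below is only an illustration: it re-enters the heights from the problem's table as a matrix, and the names `y`, `fitted_vals`, and `resid_vals` are made up here.

```r
# Heights as a 3 x 5 matrix: rows = acid treatments, columns = rows of cups a..e
y <- matrix(c(1.45, 2.79, 1.93, 2.33, 4.85,
              1.00, 0.70, 1.37, 2.80, 1.46,
              1.03, 1.22, 0.45, 1.65, 1.07),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("water", "1.5 HCl", "3.0 HCl"),
                            c("a", "b", "c", "d", "e")))

# Additive (main effects) fit: treatment mean + row mean - grand mean
fitted_vals <- outer(rowMeans(y), colMeans(y), "+") - mean(y)
resid_vals  <- y - fitted_vals

round(resid_vals, 3)   # the water cup in row e has the largest residual
sum(resid_vals^2)      # should agree with the Residuals Sum Sq in the ANOVA table
```

The largest residual comes from the water cup nearest the window (4.85 observed against a fitted value of 3.39), which is worth keeping in mind when reading both diagnostic plots.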

Based on the ANOVA, would you conclude that there is a significant difference in average growth due to the treatments? Explain why or why not.

Based on the ANOVA table, we can conclude that there is a significant difference in average growth due to the treatments because the p-value for Acid (0.0487) is less than 0.05.

Based on the ANOVA, would you conclude that there is a significant difference in average growth due to the distance from the window? Explain why or why not.

Based on the ANOVA table, we cannot conclude that there is a significant difference in average growth due to the distance from the window because the p-value for Row (0.3235) is greater than 0.05.

8.14 Sea slugs

Sea slugs, common on the coast of Southern California, live on vaucherian seaweed. But the larvae from these sea slugs need to locate this type of seaweed to survive. A study was done to try to determine whether chemicals that leach out of the seaweed attract the larvae. Seawater was collected over a patch of this kind of seaweed at 5-minute intervals as the tide was coming in and, presumably, mixing with the chemicals. The idea was that as more seawater came in, the concentration of the chemicals was reduced. Each sample of water was divided into six parts. Larvae were then introduced to this seawater to see what percentage metamorphosed. Is there a difference in this percentage over the five time periods? Open the dataset SeaSlugs.

Use Fisher’s LSD intervals to find any differences that exist between the percent of larvae that metamorphosed in the different water conditions.

data("SeaSlugs")
a1 <- aov(Percent~factor(Time), data=SeaSlugs)
anova(a1)

## Analysis of Variance Table
## 
## Response: Percent
##              Df  Sum Sq  Mean Sq F value    Pr(>F)    
## factor(Time)  5 0.63091 0.126182  5.9648 0.0006067 ***
## Residuals    30 0.63464 0.021155                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

print(LSD.test(a1,"factor(Time)"))

## $statistics
##      MSerror Df      Mean       CV  t.value       LSD
##   0.02115452 30 0.2716667 53.53838 2.042272 0.1714963
## 
## $parameters
##         test p.ajusted       name.t ntr alpha
##   Fisher-LSD      none factor(Time)   6  0.05
## 
## $means
##      Percent       std r        LCL       UCL   Min   Max     Q25    Q50
## 0  0.5356667 0.1687859 6 0.41440050 0.6569328 0.357 0.857 0.47525 0.5000
## 10 0.1776667 0.1238881 6 0.05640050 0.2989328 0.067 0.333 0.08350 0.1330
## 15 0.1833333 0.1470397 6 0.06206716 0.3045995 0.000 0.333 0.05350 0.2405
## 20 0.2191667 0.1383914 6 0.09790050 0.3404328 0.067 0.437 0.10775 0.2335
## 25 0.1686667 0.1484650 6 0.04740050 0.2899328 0.000 0.412 0.08350 0.1330
## 5  0.3455000 0.1423921 6 0.22423383 0.4667662 0.125 0.467 0.26050 0.4000
##        Q75
## 0  0.52475
## 10 0.28300
## 15 0.28125
## 20 0.26700
## 25 0.23350
## 5  0.45025
## 
## $comparison
## NULL
## 
## $groups
##      Percent groups
## 0  0.5356667      a
## 5  0.3455000      b
## 20 0.2191667     bc
## 15 0.1833333     bc
## 10 0.1776667     bc
## 25 0.1686667      c
## 
## attr(,"class")
## [1] "group"

Means that share a letter are not significantly different. Time 0 (group a) is significantly different from every other time. Time 5 (group b) is significantly different from time 25 (group c) but not from times 10, 15, or 20. Times 10, 15, 20, and 25 are not significantly different from one another.
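The LSD value in the output can be reproduced from the error mean square, and the letter groups can be checked by comparing each pairwise difference with it. This is a minimal sketch using numbers copied from the `LSD.test()` output above; the variable names are made up for illustration.

```r
# Values copied from the LSD.test() output
mse      <- 0.02115452   # MSerror
df_error <- 30           # error degrees of freedom
r        <- 6            # observations per time
means <- c("0" = 0.5356667, "5" = 0.3455000, "10" = 0.1776667,
           "15" = 0.1833333, "20" = 0.2191667, "25" = 0.1686667)

# Fisher's LSD at the 5% level: t critical value times the SE of a difference
lsd <- qt(0.975, df_error) * sqrt(2 * mse / r)

# Absolute differences between every pair of time means
diffs <- abs(outer(means, means, "-"))

# Keep each pair once (upper triangle) and flag differences larger than the LSD
sig <- diffs > lsd & upper.tri(diffs)
sig_pairs <- which(sig, arr.ind = TRUE)
data.frame(time1 = names(means)[sig_pairs[, "row"]],
           time2 = names(means)[sig_pairs[, "col"]])
```

Any pair listed in the resulting data frame has a difference in means larger than the LSD, so its Fisher interval excludes zero.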

Use Tukey’s HSD intervals to find any differences that exist between the percent of larvae that metamorphosed in the different water conditions.

a1 <- aov(Percent~factor(Time), data=SeaSlugs)
TukeyHSD(a1)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Percent ~ factor(Time), data = SeaSlugs)
## 
## $`factor(Time)`
##               diff        lwr         upr     p adj
## 5-0   -0.190166667 -0.4455792  0.06524590 0.2397208
## 10-0  -0.358000000 -0.6134126 -0.10258743 0.0023231
## 15-0  -0.352333333 -0.6077459 -0.09692077 0.0027831
## 20-0  -0.316500000 -0.5719126 -0.06108743 0.0085222
## 25-0  -0.367000000 -0.6224126 -0.11158743 0.0017407
## 10-5  -0.167833333 -0.4232459  0.08757923 0.3666256
## 15-5  -0.162166667 -0.4175792  0.09324590 0.4038772
## 20-5  -0.126333333 -0.3817459  0.12907923 0.6641386
## 25-5  -0.176833333 -0.4322459  0.07857923 0.3114499
## 15-10  0.005666667 -0.2497459  0.26107923 0.9999998
## 20-10  0.041500000 -0.2139126  0.29691257 0.9960188
## 25-10 -0.009000000 -0.2644126  0.24641257 0.9999978
## 20-15  0.035833333 -0.2195792  0.29124590 0.9980127
## 25-15 -0.014666667 -0.2700792  0.24074590 0.9999748
## 25-20 -0.050500000 -0.3059126  0.20491257 0.9901287

The comparisons 10-0, 15-0, 20-0, and 25-0 are the only ones whose intervals exclude zero (adjusted p-values below 0.05). This means that the percent of larvae that metamorphosed at time 0 differs from the percent at times 10, 15, 20, and 25. No other pairs differ significantly, including 5-0.
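The half-width of every Tukey interval above is the same because the design is balanced, and it can be reproduced from the studentized range distribution. The sketch below reuses numbers copied from the earlier output; the variable names are made up for illustration.

```r
# Values copied from the ANOVA / LSD.test() output
mse      <- 0.02115452   # error mean square
df_error <- 30           # error degrees of freedom
r        <- 6            # observations per time
k        <- 6            # number of time groups
means <- c("0" = 0.5356667, "5" = 0.3455000, "10" = 0.1776667,
           "15" = 0.1833333, "20" = 0.2191667, "25" = 0.1686667)

# Tukey's HSD: studentized range critical value times sqrt(MSE / r)
hsd <- qtukey(0.95, nmeans = k, df = df_error) * sqrt(mse / r)

# Count the pairs whose absolute difference exceeds the HSD
diffs <- abs(outer(means, means, "-"))
n_sig <- sum(diffs > hsd & upper.tri(diffs))

hsd
n_sig
```

The value of `hsd` should match `upr - diff` for any row of the `TukeyHSD()` table, for example 0.0652459 - (-0.1901667) for the 5-0 comparison.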

Were your conclusions to (a) and (b) different? Explain. If so, which would you prefer to use in this case and why?

The conclusions to (a) and (b) are not the same. Fisher’s LSD flags two pairs that Tukey’s HSD does not: 0 vs. 5 and 5 vs. 25. Fisher’s LSD intervals are narrower than Tukey’s because they do not adjust for the number of comparisons, so with Fisher we accept a higher risk of making at least one Type I error. Therefore, I would prefer to use Tukey’s HSD in this case.
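A rough calculation shows why the unadjusted comparisons carry more risk. The sketch below assumes the pairwise tests are independent, which they are not (they share group means and the same error estimate), so the result is only an illustration of the scale of the problem.

```r
k     <- 6              # number of time groups
m     <- choose(k, 2)   # number of pairwise comparisons
alpha <- 0.05           # per-comparison significance level

# Chance of at least one false positive across m tests,
# under the simplifying (and here only approximate) independence assumption
fwer_if_independent <- 1 - (1 - alpha)^m

m
fwer_if_independent
```

With 15 comparisons this illustrative family-wise rate is a little over 50%, compared with the 5% that Tukey’s procedure targets for the whole family.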

9.4 Probability to odds

If the probability of an event occurring is 0.8, what are the odds?

0.8/(1-0.8)

## [1] 4

If the probability of an event occurring is 0.8, the odds are 4 (that is, 4:1).

If the probability of an event occurring is 0.25, what are the odds?

0.25/(1-0.25)

## [1] 0.3333333

If the probability of an event occurring is 0.25, the odds are 1/3 (that is, 1:3).

If the probability of an event occurring is 0.6, what are the odds?

0.6/(1-0.6)

## [1] 1.5

If the probability of an event occurring is 0.6, the odds are 1.5 (that is, 3:2).
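The three calculations above all apply odds = p/(1 - p). As a small sketch, the conversion can be wrapped in a helper; `prob_to_odds` is a name invented here, not a function from any package.

```r
# Convert a probability (or vector of probabilities) to odds: p / (1 - p)
prob_to_odds <- function(p) {
  stopifnot(all(p >= 0 & p < 1))   # odds are undefined at p = 1
  p / (1 - p)
}

prob_to_odds(c(0.8, 0.25, 0.6))
```

Because the function is vectorized, all three parts of the exercise can be answered in one call.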

9.6 Odds to probabilities

If the odds of an event occurring are 1:3, what is the probability?

(1/3)/(1+(1/3))

## [1] 0.25

If the odds of an event occurring are 1:3, the probability is 0.25.

If the odds of an event occurring are 5:2, what is the probability?

(5/2)/(1+(5/2))

## [1] 0.7142857

If the odds of an event occurring are 5:2, the probability is 0.7142857.

If the odds of an event occurring are 1:9, what is the probability?

(1/9)/(1+(1/9))

## [1] 0.1

If the odds of an event occurring are 1:9, the probability is 0.1.
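Odds written as a:b correspond to a probability of a/(a + b), which is the same as odds/(1 + odds). A small helper makes the pattern explicit; `odds_to_prob` is a name invented for this sketch.

```r
# Convert odds of a:b in favor of an event to a probability: a / (a + b)
odds_to_prob <- function(a, b = 1) {
  stopifnot(all(a >= 0), all(b > 0))
  a / (a + b)
}

# Odds of 1:3, 5:2, and 1:9
odds_to_prob(c(1, 5, 1), c(3, 2, 9))
```

Leaving `b` at its default of 1 lets the same function take odds expressed as a single number, for example `odds_to_prob(2.5)` for odds of 5:2.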

9.22 Dementia: Odds and probability

Two types of dementia are Dementia with Lewy Bodies and Alzheimer’s disease. Some people are afflicted with both of these. The file LewyBody2Groups includes the variable Type, which has two levels: “DLB/AD” for the 20 subjects with both types of dementia and “DLB” for the 19 subjects with only Lewy Body dementia. The variable MMSE measures change in functional performance on the Mini Mental State Examination. We are interested in using MMSE to predict whether or not Alzheimer’s disease is present. A fitted logistic model is

\[ \log\left(\frac{\hat{\pi}}{1-\hat{\pi}}\right) = -0.742 - 0.294\,MMSE \]

Use this model to estimate the odds of Alzheimer’s disease, \(\pi/(1-\pi)\), if a patient’s \(MMSE\) is -4.

exp(-0.742-0.294*(-4))

## [1] 1.543419

The estimated odds of Alzheimer’s disease for a patient whose MMSE is -4 are 1.543419.

Use this model to estimate the probability of Alzheimer’s disease if a patient’s \(MMSE\) is -4.

(exp(-0.742-0.294*(-4)))/(1+exp(-0.742-0.294*(-4)))

## [1] 0.6068284

The estimated probability of Alzheimer’s disease for a patient whose MMSE is -4 is 0.6068284.

How much do the estimated odds change if the \(MMSE\) changes from -4 to -3?

o_4 = exp(-0.742-0.294*(-4))
o_3 = exp(-0.742-0.294*(-3))
o_3-o_4

## [1] -0.3931451

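The estimated odds decrease by 0.3931451 (from about 1.543 to about 1.150) if the MMSE changes from -4 to -3. Equivalently, the odds are multiplied by \(e^{-0.294} \approx 0.745\).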

How much does the estimate of \(\pi\) change if the \(MMSE\) changes from -4 to -3?

p_4 = (exp(-0.742-0.294*(-4)))/(1+exp(-0.742-0.294*(-4)))
p_3 = (exp(-0.742-0.294*(-3)))/(1+exp(-0.742-0.294*(-3)))
p_3-p_4

## [1] -0.07188548

The estimate of \(\pi\) decreases by 0.07188548 (from about 0.607 to about 0.535) if the MMSE changes from -4 to -3.
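The four parts above repeat the same expression, so it can help to define the fitted model once and reuse it. This is a minimal sketch using the coefficients given in the problem; the function names `log_odds`, `est_odds`, and `est_prob` are made up here, and `plogis()` is base R's inverse logit.

```r
# Coefficients of the fitted logistic model from the problem statement
b0 <- -0.742
b1 <- -0.294

log_odds <- function(mmse) b0 + b1 * mmse            # linear predictor
est_odds <- function(mmse) exp(log_odds(mmse))       # odds = exp(log odds)
est_prob <- function(mmse) plogis(log_odds(mmse))    # probability via inverse logit

est_odds(-4)                   # estimated odds at MMSE = -4
est_prob(-4)                   # estimated probability at MMSE = -4
est_odds(-3) - est_odds(-4)    # change in odds
est_prob(-3) - est_prob(-4)    # change in probability
est_odds(-3) / est_odds(-4)    # odds ratio for a one-unit increase, equal to exp(b1)
```

The last line highlights a design point of the logistic model: a one-unit change in MMSE multiplies the odds by the same factor everywhere, while the change in probability depends on the starting value.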