12.74

dan = read.csv("EX12-74DANDRUFF.csv")

head(dan)
##   OBS Treatment Flaking
## 1   1      PyrI      17
## 2   2      PyrI      16
## 3   3      PyrI      18
## 4   4      PyrI      17
## 5   5      PyrI      18
## 6   6      PyrI      16
res.aov_dan <- aov(Flaking ~ Treatment, data = dan )

summary(res.aov_dan)
##              Df Sum Sq Mean Sq F value Pr(>F)    
## Treatment     3   4151  1383.8   967.8 <2e-16 ***
## Residuals   351    502     1.4                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
pairwise.t.test(dan$Flaking, dan$Treatment, p.adj = "bonf")
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  dan$Flaking and dan$Treatment 
## 
##         Keto    Placebo PyrI
## Placebo < 2e-16 -       -   
## PyrI    5.8e-15 < 2e-16 -   
## PyrII   2.3e-11 < 2e-16 1   
## 
## P value adjustment method: bonferroni

With the Bonferroni correction, every pairwise comparison is significant except the comparison between the PyrII and PyrI groups (adjusted p = 1), which is supported in the contrast analysis. In particular, the placebo has a significantly different mean than each of the non-placebo groups.
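As a cross-check (not part of the original output), Tukey's HSD on the same fitted ANOVA gives simultaneous confidence intervals for all six pairwise differences and should flag the same pattern:

TukeyHSD(res.aov_dan) # simultaneous CIs for every pairwise treatment difference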

13.11

grocery = read.csv("EX13-11SMART2.csv")
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
with_cart = filter(grocery, Smartcart == "With") # renamed to avoid masking base::with
wo_cart = filter(grocery, Smartcart == "Without")

a)

xbar_with = mean(with_cart$TotalCost)

xbar_wo = mean(wo_cart$TotalCost)

barplot(c(xbar_with, xbar_wo), names.arg = c("With", "Without"), 
        col = c("#eb8060", "#b9e38d"))

The plot shows that the means of the two groups are quite close; the Without group's mean is slightly higher than that of the With (real-time feedback) group.
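As a supplementary view (my addition, not required by the exercise), side-by-side boxplots show the full distributions rather than just the means:

boxplot(TotalCost ~ Smartcart, data = grocery,
        col = c("#eb8060", "#b9e38d")) # same colors as the barplot above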

b)

res.aov_grocery <- aov(TotalCost ~ Smartcart, data = grocery )

summary(res.aov_grocery)
##              Df Sum Sq Mean Sq F value Pr(>F)
## Smartcart     1     58   57.91   1.026  0.312
## Residuals   192  10838   56.45

c)

The degrees of freedom for the residuals is 192, the F statistic is 1.026, and the p-value is 0.312.

Based on this analysis, the group sizes may be slightly unbalanced, but the p-value suggests that there is not enough evidence to conclude that the means differ significantly between the two groups. Therefore, there is not enough evidence to suggest that receiving real-time feedback has a significant effect on the total cost of the shopping cart.
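With only two groups, the one-way ANOVA is equivalent to a pooled two-sample t test (the squared t statistic equals the F statistic above); a quick cross-check, output not shown:

t.test(TotalCost ~ Smartcart, data = grocery, var.equal = TRUE) # t^2 should equal 1.026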

13.22

a)

biling = read.csv("EX13-22BILING.csv")
young_mon = filter(biling, Age == "Young" & Ling =="Mono")
old_mon = filter(biling, Age == "Old" & Ling =="Mono")
young_bi = filter(biling, Age == "Young" & Ling =="Bi")
old_bi = filter(biling, Age == "Old" & Ling =="Bi")
xbar_young_mon = mean(young_mon$Time)
xbar_old_mon = mean(old_mon$Time)
xbar_young_bi = mean(young_bi$Time)
xbar_old_bi = mean(old_bi$Time)
sd_young_mon = sd(young_mon$Time)
sd_old_mon = sd(old_mon$Time)
sd_young_bi = sd(young_bi$Time)
sd_old_bi = sd(old_bi$Time)
sample_sizes_biling = c(20, 20, 20, 20)
means_biling = c(xbar_young_mon, xbar_old_mon, xbar_young_bi, xbar_old_bi)
sds_biling = c(sd_young_mon, sd_old_mon, sd_young_bi, sd_old_bi)

data.frame(sample_sizes_biling, means_biling, sds_biling)
##   sample_sizes_biling means_biling sds_biling
## 1                  20       820.70   47.38543
## 2                  20       996.85   53.59524
## 3                  20       785.65   42.04293
## 4                  20       919.30   54.38663
54.38663/42.04293 # ratio of largest to smallest SD (the SDs themselves, not their square roots)
## [1] 1.293598

The rule for examining standard deviations in a pooled analysis states that if the largest standard deviation is less than twice the smallest standard deviation, we can use methods based on the assumption of equal standard deviations, and our results will still be approximately correct. In this case, the ratio of the largest to smallest standard deviation is \(\frac{54.39}{42.04} = 1.29 < 2\); therefore, we can pool the standard deviations, as the assumption is met.
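The same summary table can be built more compactly with dplyr (already loaded above); a sketch, not part of the original solution:

biling %>%
  group_by(Age, Ling) %>%
  summarise(n = n(), mean = mean(Time), sd = sd(Time), .groups = "drop") # one row per Age/Ling cell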

b)

par(mfrow = c(2,2))

hist(young_mon$Time, breaks = 25)
hist(old_mon$Time, breaks = 25)
hist(young_bi$Time, breaks = 25)
hist(old_bi$Time, breaks = 25)

The young groups appear approximately Normal, but the old groups are skewed or widely dispersed. Therefore, I would not conclude that the old groups are approximately Normal.
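Normal quantile plots give a sharper visual check than histograms at n = 20; a sketch along the same lines:

par(mfrow = c(2, 2))
qqnorm(young_mon$Time, main = "Young mono"); qqline(young_mon$Time)
qqnorm(old_mon$Time, main = "Old mono"); qqline(old_mon$Time)
qqnorm(young_bi$Time, main = "Young bi"); qqline(young_bi$Time)
qqnorm(old_bi$Time, main = "Old bi"); qqline(old_bi$Time)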

13.23

a)

interaction.plot(x.factor = biling$Age, #x-axis variable
                 trace.factor = biling$Ling, #variable for lines
                 response = biling$Time, #y-axis variable
                 fun = median, #metric to plot
                 ylab = "Time",
                 xlab = "Language Level",
                 col = c("pink", "blue"),
                 lty = 1, #line type
                 lwd = 2, #line width
                 trace.label = "Languages Known")

We might find an interaction between age and language status because the older we get, the harder it becomes to learn a language, and brain function declines in general.

b)

res.aov_biling <- aov(Time ~ Age + Ling, data= biling )

summary(res.aov_biling)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## Age          1 479880  479880   188.5  < 2e-16 ***
## Ling         1  63394   63394    24.9 3.66e-06 ***
## Residuals   77 196055    2546                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The F values for Age and linguistic ability are 188.5 and 24.9, respectively. Both variables have significant p-values of less than 0.001. Each variable has 1 degree of freedom, and the residual df is 77.

c)

This analysis suggests that an interaction effect between age and bilingualism is absent: the interaction plot shows roughly parallel lines, and both main effects are significant in the additive model. This suggests that both age and bilingualism contribute to reaction times on the cognitive test.
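The additive model above cannot test an interaction directly; fitting the full two-way model adds an Age:Ling term (a sketch, output not shown):

res.aov_full <- aov(Time ~ Age * Ling, data = biling) # includes the Age:Ling interaction
summary(res.aov_full) # a non-significant Age:Ling row would support "no interaction"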

16.18

library(dplyr)
drp_data = read.csv("EX16-18DRP.csv")
head(drp_data)
##   id group g drp
## 1  1 Treat 0  24
## 2  2 Treat 0  56
## 3  3 Treat 0  43
## 4  4 Treat 0  59
## 5  5 Treat 0  58
## 6  6 Treat 0  52
drp_data <- drp_data %>% 
mutate(group = as.factor(group)) 

a)

meandiff <- function(d, i){
  d = d[i,] # resample rows at the bootstrap indices
  y = tapply(d$drp, d$group, mean) # group means in the resample
  y[1]-y[2] # difference of the two group means (first factor level minus second)
} 
library(boot)

drp_boot = boot(drp_data, meandiff, R = 2000, strata = drp_data$group)
drp_boot
## 
## STRATIFIED BOOTSTRAP
## 
## 
## Call:
## boot(data = drp_data, statistic = meandiff, R = 2000, strata = drp_data$group)
## 
## 
## Bootstrap Statistics :
##      original      bias    std. error
## t1* -9.954451 -0.01060248     4.24764

The bootstrap standard error is 4.248.
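The same standard error can also be read directly off the replicates (a one-line check):

sd(drp_boot$t[, 1]) # SD of the 2000 bootstrap mean differences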

b)

plot(drp_boot) #graphing bootstrap distribution

mean(drp_boot$t) #checking bootstrap mean
## [1] -9.965054

The bootstrap t confidence interval is appropriate because the bootstrap distribution is approximately Normal.

boot.ci(boot.out = drp_boot, type = "norm") 
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 2000 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = drp_boot, type = "norm")
## 
## Intervals : 
## Level      Normal        
## 95%   (-18.269,  -1.619 )  
## Calculations and Intervals on Original Scale

c)

My bootstrap results are nearly identical to those shown in Example 7.14. This suggests that this confidence interval can be reached in multiple ways depending on the structure of the data. The mean of the bootstrapped replicates does in fact lie within this interval.
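As a robustness check (not in the original output), a percentile interval can be requested from the same boot object and should agree closely when the bootstrap distribution is near Normal:

boot.ci(boot.out = drp_boot, type = "perc") # percentile interval from the same replicates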

16.24

theta <- function(data, i) {
  d = data[i] # resample the vector at the bootstrap indices
  mean(d)
}
# tvtime = read.csv("EX16-24TVTIME.csv")
time = c(3, 16.5, 10.5, 40.5, 5.5, 33.5, 0, 6.5) 
tvtime = data.frame(time) 

a)

xbar_tvtime = mean(tvtime$time, na.rm = T) # column is lowercase "time"
tvtime_boot = boot(data = tvtime$time, statistic = theta, R = 2000)
plot(tvtime_boot)

The plot shows an approximately Normal distribution with a narrow spread about the mean. The observed \(\bar{x}\) is 14.5, and the bootstrap means t* cluster tightly around that value.

b)

boot.ci(boot.out = tvtime_boot, type = "norm")
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 2000 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = tvtime_boot, type = "norm")
## 
## Intervals : 
## Level      Normal        
## 95%   ( 4.99, 24.17 )  
## Calculations and Intervals on Original Scale

The 95% bootstrap confidence interval for \(\mu\) is (4.99, 24.17).

c)

t.test(tvtime$time)
## 
##  One Sample t-test
## 
## data:  tvtime$time
## t = 2.761, df = 7, p-value = 0.02806
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##   2.081702 26.918298
## sample estimates:
## mean of x 
##      14.5

The usual t interval is (2.08, 26.92), which is wider than the one captured via bootstrapping, (4.99, 24.17). Neither interval includes 0. The wider interval for the t procedure could be due to the small sample size and the fact that bootstrapping builds its distribution from the original data directly.
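For reference, the bootstrap t interval pairs the sample mean with the bootstrap standard error, \(\bar{x} \pm t^{*}\,\text{SE}_{boot}\); a sketch using objects defined above:

se_boot = sd(tvtime_boot$t[, 1]) # bootstrap SE of the mean
xbar_tvtime + c(-1, 1) * qt(0.975, df = 7) * se_boot # xbar +/- t* x SE_boot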