dan = read.csv("EX12-74DANDRUFF.csv")
head(dan)
## OBS Treatment Flaking
## 1 1 PyrI 17
## 2 2 PyrI 16
## 3 3 PyrI 18
## 4 4 PyrI 17
## 5 5 PyrI 18
## 6 6 PyrI 16
res.aov_dan <- aov(Flaking ~ Treatment, data = dan )
summary(res.aov_dan)
## Df Sum Sq Mean Sq F value Pr(>F)
## Treatment 3 4151 1383.8 967.8 <2e-16 ***
## Residuals 351 502 1.4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
pairwise.t.test(dan$Flaking, dan$Treatment, p.adj = "bonf")
##
## Pairwise comparisons using t tests with pooled SD
##
## data: dan$Flaking and dan$Treatment
##
## Keto Placebo PyrI
## Placebo < 2e-16 - -
## PyrI 5.8e-15 < 2e-16 -
## PyrII 2.3e-11 < 2e-16 1
##
## P value adjustment method: bonferroni
After Bonferroni correction, every pairwise comparison is significant except PyrI versus PyrII (adjusted p = 1), which is consistent with the contrast analysis. In particular, the placebo mean differs significantly from the mean of each of the three non-placebo groups (all adjusted p < 2e-16).
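As a rough illustration of what the Bonferroni adjustment does, the sketch below multiplies each unadjusted p-value by the number of comparisons and caps the result at 1. The raw p-values are hypothetical placeholders, not values from the dandruff data; only the mechanics are meant to carry over.

```r
# Hypothetical unadjusted p-values for 6 pairwise comparisons
# (illustrative only; these are NOT taken from the dandruff data).
p_raw <- c(0.0004, 0.012, 0.03, 0.2, 0.5, 0.9)

# Bonferroni: multiply by the number of comparisons and cap at 1.
m <- length(p_raw)
p_manual <- pmin(1, p_raw * m)

# The same adjustment using the base R helper p.adjust().
p_builtin <- p.adjust(p_raw, method = "bonferroni")

# TRUE when the manual and built-in adjustments agree.
all.equal(p_manual, p_builtin)
```

The cap at 1 is why a comparison with a large raw p-value, such as PyrI versus PyrII, is reported with an adjusted p-value of exactly 1.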
grocery = read.csv("EX13-11SMART2.csv")
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
with = filter(grocery, Smartcart == "With")
wo = filter(grocery, Smartcart == "Without")
xbar_with = mean(with$TotalCost)
xbar_wo = mean(wo$TotalCost)
barplot(c(xbar_with, xbar_wo), names.arg = c("With", "Without"),
col = c("#eb8060", "#b9e38d"))
The plot shows that the means of the two groups are quite close. The mean total cost without real-time feedback is slightly higher than the mean with it.
res.aov_grocery <- aov(TotalCost ~ Smartcart, data = grocery )
summary(res.aov_grocery)
## Df Sum Sq Mean Sq F value Pr(>F)
## Smartcart 1 58 57.91 1.026 0.312
## Residuals 192 10838 56.45
The residual degrees of freedom are 192, the F statistic is 1.026, and the p-value is 0.312.
Based on this analysis, the estimated effects may be unbalanced, but the p-value of 0.312 suggests that there is not enough evidence to conclude that the means differ between the two groups. Therefore, there is not enough evidence to suggest that receiving real-time feedback has a significant effect on the total cost of the shopping cart.
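The F statistic and p-value can be reproduced from the mean squares and degrees of freedom printed in the ANOVA table. This is a minimal sketch that copies the rounded values from the output above, so the results agree with the table only up to rounding. The variable names are mine.

```r
# Values copied (rounded) from the summary(res.aov_grocery) table.
ms_group <- 57.91   # Mean Sq for Smartcart
ms_resid <- 56.45   # Mean Sq for Residuals
df_group <- 1
df_resid <- 192

# F is the ratio of the two mean squares; the p-value is the upper tail.
f_stat <- ms_group / ms_resid
p_val  <- pf(f_stat, df_group, df_resid, lower.tail = FALSE)

# With only two groups, the F test matches the pooled two-sample t test,
# because F = t^2 and the two-sided t p-value equals the F p-value.
t_equiv  <- sqrt(f_stat)
p_from_t <- 2 * pt(t_equiv, df_resid, lower.tail = FALSE)

round(c(F = f_stat, p = p_val, p_from_t = p_from_t), 3)
```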
biling = read.csv("EX13-22BILING.csv")
young_mon = filter(biling, Age == "Young" & Ling =="Mono")
old_mon = filter(biling, Age == "Old" & Ling =="Mono")
young_bi = filter(biling, Age == "Young" & Ling =="Bi")
old_bi = filter(biling, Age == "Old" & Ling =="Bi")
xbar_young_mon = mean(young_mon$Time)
xbar_old_mon = mean(old_mon$Time)
xbar_young_bi = mean(young_bi$Time)
xbar_old_bi = mean(old_bi$Time)
sd_young_mon = sd(young_mon$Time)
sd_old_mon = sd(old_mon$Time)
sd_young_bi = sd(young_bi$Time)
sd_old_bi = sd(old_bi$Time)
sample_sizes_biling = c(20, 20, 20, 20)
means_biling = c(xbar_young_mon, xbar_old_mon, xbar_young_bi, xbar_old_bi)
sds_biling = c(sd_young_mon, sd_old_mon, sd_young_bi, sd_old_bi)
data.frame(sample_sizes_biling, means_biling, sds_biling)
## sample_sizes_biling means_biling sds_biling
## 1 20 820.70 47.38543
## 2 20 996.85 53.59524
## 3 20 785.65 42.04293
## 4 20 919.30 54.38663
54.38663/42.04293
## [1] 1.293598
The rule for pooling standard deviations states that if the largest standard deviation is less than twice the smallest standard deviation, we can use methods based on the assumption of equal standard deviations, and our results will still be approximately correct. In this case, the ratio of the largest to smallest standard deviation is \(\frac{54.39}{42.04} = 1.29 < 2\); therefore, the assumption is met and we can pool the standard deviations.
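For reference, here is a small sketch of the pooled standard deviation itself, using the four group standard deviations and sample sizes from the summary table above. The variable names are mine, and the general weighted formula is used even though the group sizes are equal.

```r
# Group SDs and sample sizes copied from the summary table above.
sds <- c(young_mono = 47.38543, old_mono = 53.59524,
         young_bi   = 42.04293, old_bi   = 54.38663)
n   <- c(20, 20, 20, 20)

# Rule of thumb: largest SD should be less than twice the smallest SD.
sd_ratio <- max(sds) / min(sds)
ok_to_pool <- sd_ratio < 2

# Pooled SD: square root of the df-weighted average of the group variances.
s_pooled <- sqrt(sum((n - 1) * sds^2) / sum(n - 1))

c(ratio = sd_ratio, pooled_sd = s_pooled)
```

Because all four groups have 20 observations, the weighted average reduces to the plain average of the four variances.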
par(mfrow = c(2,2))
hist(young_mon$Time, breaks = 25)
hist(old_mon$Time, breaks = 25)
hist(young_bi$Time, breaks = 25)
hist(old_bi$Time, breaks = 25)
The two young groups appear approximately Normal, but the two old groups look skewed or widely dispersed. Therefore, I would not conclude that the old groups are approximately Normal.
interaction.plot(x.factor = biling$Age, #x-axis variable
trace.factor = biling$Ling, #variable for lines
response = biling$Time, #y-axis variable
fun = median, #metric to plot
ylab = "Time",
xlab = "Age",
col = c("pink", "blue"),
lty = 1, #line type
lwd = 2, #line width
trace.label = "Languages Known")
We might find an interaction between age and lingualism because the more we age, the harder it becomes to learn a language and the more brain function declines in general.
res.aov_biling <- aov(Time ~ Age + Ling, data= biling )
summary(res.aov_biling)
## Df Sum Sq Mean Sq F value Pr(>F)
## Age 1 479880 479880 188.5 < 2e-16 ***
## Ling 1 63394 63394 24.9 3.66e-06 ***
## Residuals 77 196055 2546
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The F statistics for Age and Ling are 188.5 and 24.9, respectively. Both factors have significant p-values of less than 0.001. Each factor has 1 degree of freedom, and the residual degrees of freedom are 77.
Because this model contains only the main effects (Age + Ling), it does not test the interaction between age and bilingualism; significant main effects alone do not show that an interaction is present or absent. What the analysis does suggest is that both age and bilingualism contribute to reaction times on the cognitive test.
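To test the interaction directly, the model formula would need an Age:Ling term. The sketch below shows the shape of that fit on simulated data, since it is meant to run without the CSV file. The cell means and the common SD are made-up values, so the numbers it prints say nothing about the real biling data.

```r
set.seed(1)

# Simulated stand-in for the biling data: 20 observations per cell.
# The cell means and the SD of 50 are made-up illustrative values.
sim <- expand.grid(rep  = 1:20,
                   Age  = c("Young", "Old"),
                   Ling = c("Mono", "Bi"))
cell_mean <- with(sim, 800 +
                    150 * (Age == "Old") -
                    40 * (Ling == "Bi") -
                    40 * (Age == "Old" & Ling == "Bi"))
sim$Time <- rnorm(nrow(sim), mean = cell_mean, sd = 50)

# Age * Ling expands to Age + Ling + Age:Ling, so the table gains
# an interaction row and the residual df drops from 77 to 76.
fit_int <- aov(Time ~ Age * Ling, data = sim)
tab_int <- summary(fit_int)[[1]]
tab_int
```

On the real data, the same call with `data = biling` would give an F test for the Age:Ling row, which is the evidence needed to say whether an interaction is present.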
library(dplyr)
drp_data = read.csv("EX16-18DRP.csv")
head(drp_data)
## id group g drp
## 1 1 Treat 0 24
## 2 2 Treat 0 56
## 3 3 Treat 0 43
## 4 4 Treat 0 59
## 5 5 Treat 0 58
## 6 6 Treat 0 52
drp_data <- drp_data %>%
mutate(group = as.factor(group))
meandiff <- function(d, i){
d = d[i,]
y = tapply(d$drp, d$group, mean)
y[1]-y[2]
}
library(boot)
drp_boot = boot(drp_data, meandiff, R = 2000, strata = drp_data$group)
drp_boot
##
## STRATIFIED BOOTSTRAP
##
##
## Call:
## boot(data = drp_data, statistic = meandiff, R = 2000, strata = drp_data$group)
##
##
## Bootstrap Statistics :
## original bias std. error
## t1* -9.954451 -0.01060248 4.24764
The bootstrap standard error is 4.248.
plot(drp_boot) #graphing bootstrap distribution
mean(drp_boot$t) #checking bootstrap mean
## [1] -9.965054
The bootstrap t confidence interval is appropriate because the bootstrap distribution is approximately Normal and the estimated bias is small (about -0.01).
boot.ci(boot.out = drp_boot, type = "norm")
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 2000 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = drp_boot, type = "norm")
##
## Intervals :
## Level Normal
## 95% (-18.269, -1.619 )
## Calculations and Intervals on Original Scale
My bootstrap results are nearly identical to those shown in Example 7.14. This suggests that the confidence interval can be obtained in more than one way, depending on the structure of the data. The mean of the bootstrap distribution (-9.97) does in fact lie within this interval.
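The Normal interval reported by `boot.ci` can be rebuilt by hand from the three numbers printed for `drp_boot`: the observed statistic, the bias, and the standard error. This sketch copies those values from the output above; the variable names are mine.

```r
# Values copied from the printed drp_boot output above.
t0   <- -9.954451    # observed difference in group means
bias <- -0.01060248  # bootstrap estimate of bias
se   <- 4.24764      # bootstrap standard error

# Normal interval: bias-corrected estimate plus or minus z * SE.
z <- qnorm(0.975)
ci_norm <- (t0 - bias) + c(-1, 1) * z * se
round(ci_norm, 3)
```

The result agrees with the (-18.269, -1.619) interval in the `boot.ci` output.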
theta <- function(data,i) {
d = data[i]
mean(d)
}
# tvtime = read.csv("EX16-24TVTIME.csv")
time = c(3, 16.5, 10.5, 40.5, 5.5, 33.5, 0, 6.5)
tvtime = data.frame(time)
xbar_tvtime = mean(tvtime$time, na.rm = TRUE) # column is named `time` (lowercase)
tvtime_boot = boot(data = tvtime$time, statistic = theta, R = 2000)
plot(tvtime_boot)
The plot shows an approximately Normal bootstrap distribution centered near the sample mean. The observed xbar is 14.5, and the bootstrap means t* cluster around that value.
boot.ci(boot.out = tvtime_boot, type = "norm")
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 2000 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = tvtime_boot, type = "norm")
##
## Intervals :
## Level Normal
## 95% ( 4.99, 24.17 )
## Calculations and Intervals on Original Scale
The 95% bootstrap confidence interval for \(\mu\) is (4.99, 24.17).
t.test(tvtime$time)
##
## One Sample t-test
##
## data: tvtime$time
## t = 2.761, df = 7, p-value = 0.02806
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 2.081702 26.918298
## sample estimates:
## mean of x
## 14.5
The usual t interval is (2.08, 26.92), which is wider than the bootstrap interval of (4.99, 24.17). Neither interval includes 0. The t interval is likely wider because, with only 8 observations, it uses a t critical value with 7 degrees of freedom (about 2.36) instead of the Normal value of 1.96, whereas the bootstrap interval is built by resampling the original data directly.
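The comparison can be sketched side by side using the eight TV-time values entered earlier. The bootstrap here uses base R `sample()` rather than `boot::boot`, and the seed is an arbitrary choice of mine, so the bootstrap interval will be close to, but not the same as, the one printed above.

```r
# TV time data as entered earlier in this document.
time <- c(3, 16.5, 10.5, 40.5, 5.5, 33.5, 0, 6.5)
n    <- length(time)
xbar <- mean(time)
se_t <- sd(time) / sqrt(n)

# Usual one-sample t interval, computed by hand.
ci_t <- xbar + c(-1, 1) * qt(0.975, df = n - 1) * se_t

# Base-R bootstrap of the mean (a stand-in for boot::boot).
set.seed(2024)  # arbitrary seed, only for reproducibility
boot_means <- replicate(2000, mean(sample(time, replace = TRUE)))
se_boot <- sd(boot_means)
bias    <- mean(boot_means) - xbar

# Normal bootstrap interval: bias-corrected mean plus or minus z * SE.
ci_boot <- (xbar - bias) + c(-1, 1) * qnorm(0.975) * se_boot

rbind(t = ci_t, bootstrap_normal = ci_boot)
```

Two things make the bootstrap interval narrower here: the Normal multiplier is smaller than the t multiplier, and the bootstrap standard error tends to be slightly smaller than `sd(time) / sqrt(n)` in a sample this small.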