For this problem use the bupa.csv data set. Check UCI Machine Learning Repository for more information (http://archive.ics.uci.edu/ml/datasets/Liver+Disorders). The mean corpuscular volume and alkaline phosphatase are blood tests thought to be sensitive to liver disorder related to excessive alcohol consumption. We assume that normality and independence assumptions are valid.
# Read data
q2.data <- read.csv("bupa.csv")
# Check the structure
str(q2.data)
## 'data.frame': 345 obs. of 3 variables:
## $ mcv : int 85 85 86 91 87 98 88 88 92 90 ...
## $ alkphos : int 92 64 54 78 70 55 62 67 54 60 ...
## $ drinkgroup: int 1 1 1 1 1 1 1 1 1 1 ...
table(q2.data$drinkgroup)
##
## 1 2 3 4 5
## 117 52 88 67 21
# Convert int to factor for the drink group variable
q2.data$drinkgroup <- factor(q2.data$drinkgroup)
# Check the result
str(q2.data)
## 'data.frame': 345 obs. of 3 variables:
## $ mcv : int 85 85 86 91 87 98 88 88 92 90 ...
## $ alkphos : int 92 64 54 78 70 55 62 67 54 60 ...
## $ drinkgroup: Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
H0:
There is no difference between the drink groups: the mean MCV (mean corpuscular volume) is the same in all five groups, so drink group has no effect on MCV.
H1:
At least one group mean differs from the others, so drink group has an effect on MCV.
table(q2.data$drinkgroup)
##
## 1 2 3 4 5
## 117 52 88 67 21
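The drinkgroup counts are not balanced. Because this is a one-way ANOVA with a single factor, the sums of squares are uniquely defined whether or not the design is balanced, so we can continue with the test. Unequal group sizes do make the F test more sensitive to unequal variances, which is why the variances are checked afterwards.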
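As a quick check of the balance statement, the group sizes can be entered by hand from the table output. This is only a sketch: the variable names (`n.per.group`, `is.balanced`, and so on) are illustrative and not part of the original analysis. The same counts also give the degrees of freedom reported by the ANOVA.

```r
# Group sizes copied from the table(q2.data$drinkgroup) output
n.per.group <- c("1" = 117, "2" = 52, "3" = 88, "4" = 67, "5" = 21)

N <- sum(n.per.group)       # total number of observations
k <- length(n.per.group)    # number of drink groups

# A design is balanced only if every group has the same size
is.balanced <- length(unique(n.per.group)) == 1

df.between <- k - 1         # degrees of freedom for drinkgroup
df.within  <- N - k         # degrees of freedom for the residuals
```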
q2.aov.a <- aov(data=q2.data, mcv ~ drinkgroup)
summary(q2.aov.a)
## Df Sum Sq Mean Sq F value Pr(>F)
## drinkgroup 4 733 183.29 10.26 7.43e-08 ***
## Residuals 340 6073 17.86
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
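The F value in the table can be reproduced by hand from the sums of squares and degrees of freedom. The sketch below uses the rounded values printed in the table, so the result agrees with the reported F only to about two decimal places; the variable names are illustrative.

```r
# Values copied (rounded) from the summary(q2.aov.a) table
ss.between <- 733;  df.between <- 4
ss.within  <- 6073; df.within  <- 340

# Mean squares are sums of squares divided by their degrees of freedom
ms.between <- ss.between / df.between
ms.within  <- ss.within / df.within

# The F statistic is the ratio of the two mean squares
f.value <- ms.between / ms.within

# Upper-tail probability of the F distribution gives the p-value
p.value <- pf(f.value, df.between, df.within, lower.tail = FALSE)

round(f.value, 2)
```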
We check the equality of variances in two ways:
1) Levene's test
2) Residual plots
Here we run Levene's test; the residual plots are examined in the diagnostic step that follows.
library(car)  # provides leveneTest()
leveneTest(q2.aov.a)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 4 0.3053 0.8744
## 340
The p-value (0.8744) is large, so we fail to reject the null hypothesis of equal variances; the data are consistent with homogeneous variances across the drink groups.
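To show what Levene's test with `center = median` computes, here is a minimal base R sketch: it is a one-way ANOVA on the absolute deviations of each observation from its group median. The data are simulated purely for illustration (they are not the bupa data), and the names `g`, `y`, and `abs.dev` are hypothetical.

```r
# Simulated, unbalanced example data (NOT the bupa data)
set.seed(1)
g <- factor(rep(1:3, times = c(20, 15, 25)))
y <- rnorm(length(g), mean = 90, sd = 4)

# Absolute deviation of each observation from its own group median
abs.dev <- abs(y - ave(y, g, FUN = median))

# Levene's test is the one-way ANOVA F test on these deviations
lev <- anova(lm(abs.dev ~ g))
lev.F <- lev[["F value"]][1]
lev.p <- lev[["Pr(>F)"]][1]
```

A large `lev.p` means there is no evidence that the spread differs between groups, which is how the output above is read.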
par(mfrow=c(2,2))
plot(q2.aov.a)
According to the Q-Q plot, the normality assumption is reasonable.
The residual plots show no evidence of unequal variances, in agreement with Levene's test.
q2.lm.a <- lm(q2.data$mcv~q2.data$drinkgroup)
aov(q2.lm.a)
## Call:
## aov(formula = q2.lm.a)
##
## Terms:
## q2.data$drinkgroup Residuals
## Sum of Squares 733.177 6073.055
## Deg. of Freedom 4 340
##
## Residual standard error: 4.226337
## Estimated effects may be unbalanced
summary(q2.lm.a)$r.squared
## [1] 0.1077214
R-squared is the proportion of the variation in the response variable that is explained by the model; here drink group explains about 10.8% of the variation in MCV.
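The same R-squared value follows directly from the sums of squares printed by `aov()`: it is the between-group sum of squares divided by the total. A short sketch, with illustrative variable names:

```r
# Sums of squares copied from the aov(q2.lm.a) output
ss.drinkgroup <- 733.177
ss.residuals  <- 6073.055

# R-squared = explained variation / total variation
r.squared <- ss.drinkgroup / (ss.drinkgroup + ss.residuals)

round(r.squared, 4)
```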
In the ANOVA table above, the p-value (7.43e-08) is far below any conventional significance level, so we reject the null hypothesis.
Final Result:
At least one group mean differs from the others, so drink group has an effect on MCV (mean corpuscular volume).
H0:
There is no difference between the drink groups: the mean alkaline phosphatase level is the same in all five groups, so drink group has no effect on alkaline phosphatase.
H1:
At least one group mean differs from the others, so drink group has an effect on alkaline phosphatase.
table(q2.data$drinkgroup)
##
## 1 2 3 4 5
## 117 52 88 67 21
As in the previous part, the drinkgroup counts are not balanced. Because this is a one-way ANOVA with a single factor, the imbalance does not prevent the analysis, and we can continue with the test.
q2.aov.b <- aov(data=q2.data, alkphos ~ drinkgroup)
summary(q2.aov.b)
## Df Sum Sq Mean Sq F value Pr(>F)
## drinkgroup 4 4946 1236.4 3.792 0.00495 **
## Residuals 340 110858 326.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We check the equality of variances in two ways:
1) Levene's test
2) Residual plots
Here we run Levene's test; the residual plots are examined in the diagnostic step that follows.
leveneTest(q2.aov.b)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 4 0.8089 0.5201
## 340
The p-value (0.5201) is large, so we fail to reject the null hypothesis of equal variances; the data are consistent with homogeneous variances across the drink groups.
par(mfrow=c(2,2))
plot(q2.aov.b)
According to the Q-Q plot, the normality assumption is reasonable. Some extreme values fall off the line, but this is acceptable because ANOVA is fairly robust to moderate departures from normality.
The residual plots show no evidence of unequal variances, in agreement with Levene's test.
q2.lm.b <- lm(q2.data$alkphos ~ q2.data$drinkgroup)
#aov(q2.lm.b)
summary(q2.lm.b)$r.squared
## [1] 0.04270721
R-squared is the proportion of the variation in the response variable that is explained by the model; here drink group explains only about 4.3% of the variation in alkaline phosphatase.
In the ANOVA table above, the p-value (0.00495) is small, so we reject the null hypothesis.
Final Result:
At least one group mean differs from the others, so drink group has an effect on alkaline phosphatase.
library(DescTools)  # provides ScheffeTest()
ScheffeTest(q2.aov.a)
##
## Posthoc multiple comparisons of means: Scheffe Test
## 95% family-wise confidence level
##
## $drinkgroup
## diff lwr.ci upr.ci pval
## 2-1 1.241452991 -0.94020481 3.423111 0.5410
## 3-1 0.938131313 -0.90892674 2.785189 0.6495
## 4-1 3.744610282 1.73913894 5.750082 1.9e-06 ***
## 5-1 3.746031746 0.64379565 6.848268 0.0081 **
## 3-2 -0.303321678 -2.59291786 1.986275 0.9966
## 4-2 2.503157290 0.08395442 4.922360 0.0380 *
## 5-2 2.504578755 -0.87987039 5.889028 0.2646
## 4-3 2.806478969 0.68408993 4.928868 0.0025 **
## 5-3 2.807900433 -0.37116998 5.986971 0.1151
## 5-4 0.001421464 -3.27222796 3.275071 1.0000
##
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
µ(group 4) > µ(group 1) AND µ(group 4) > µ(group 2) AND µ(group 4) > µ(group 3)
µ(group 5) > µ(group 1)
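Groups 4 and 5 have a significantly higher mean MCV than group 1, and group 4 is also significantly higher than groups 2 and 3. Groups 4 and 5 do not differ from each other (p = 1.0000). The heaviest-drinking groups (6 or more drinks per day) therefore show the largest effect on MCV (mean corpuscular volume).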
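The Scheffé interval for a single pairwise comparison can be checked by hand. The sketch below applies the textbook Scheffé formula to the 4-1 comparison, using the group sizes, residual sum of squares, and difference printed earlier. The helper `scheffe.ci()` is a hypothetical function written for this illustration; it is not claimed to be how `ScheffeTest()` is implemented internally.

```r
# Textbook Scheffe interval and p-value for one pairwise difference.
# diff: difference in group means; n.i, n.j: the two group sizes;
# mse: residual mean square; k: number of groups; df.resid: residual df.
scheffe.ci <- function(diff, n.i, n.j, mse, k, df.resid, conf = 0.95) {
  se   <- sqrt(mse * (1 / n.i + 1 / n.j))            # standard error of the difference
  crit <- sqrt((k - 1) * qf(conf, k - 1, df.resid))  # Scheffe critical value
  p    <- pf((diff / se)^2 / (k - 1), k - 1, df.resid, lower.tail = FALSE)
  c(lwr = diff - crit * se, upr = diff + crit * se, pval = p)
}

# Residual mean square from the aov output: 6073.055 on 340 df
mse <- 6073.055 / 340

# Comparison 4-1: group 4 has 67 observations, group 1 has 117
ci.4.1 <- scheffe.ci(3.744610282, n.i = 67, n.j = 117,
                     mse = mse, k = 5, df.resid = 340)
round(ci.4.1[c("lwr", "upr")], 3)
```

The interval excludes zero, which is why the 4-1 row is flagged as significant. The other rows can be checked the same way by changing `diff`, `n.i`, and `n.j`.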
ScheffeTest(q2.aov.b)
##
## Posthoc multiple comparisons of means: Scheffe Test
## 95% family-wise confidence level
##
## $drinkgroup
## diff lwr.ci upr.ci pval
## 2-1 -2.645299 -11.9663647 6.675766 0.9419
## 3-1 -4.056138 -11.9476367 3.835360 0.6389
## 4-1 -1.148743 -9.7170578 7.419571 0.9965
## 5-1 12.572650 -0.6815582 25.826857 0.0734 .
## 3-2 -1.410839 -11.1930681 8.371390 0.9953
## 4-2 1.496556 -8.8394138 11.832525 0.9952
## 5-2 15.217949 0.7579944 29.677903 0.0329 *
## 4-3 2.907395 -6.1604467 11.975236 0.9117
## 5-3 16.628788 3.0463078 30.211268 0.0069 **
## 5-4 13.721393 -0.2651729 27.707959 0.0578 .
##
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
µ(group 5) > µ(group 2) AND µ(group 5) > µ(group 3)
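Group 5 has a significantly higher mean than groups 2 and 3, while its differences from groups 1 and 4 are only marginal (p = 0.0734 and p = 0.0578). No other pair of groups differs, so the effect on alkaline phosphatase is concentrated in group 5, the heaviest-drinking group.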