The heartbpchol.csv data set contains a continuous cholesterol measurement (Cholesterol) and a categorical blood pressure status (BP_Status, with levels High, Normal, and Optimal) for patients who are alive.
For the heartbpchol.csv data set, consider a one-way ANOVA model to identify differences between the group cholesterol means. The normality assumption is reasonable, so you can proceed without testing normality.
# Read the CSV file
q1.data <- read.csv("heartbpchol.csv")
# Check data format
str(q1.data)
## 'data.frame': 541 obs. of 2 variables:
## $ Cholesterol: int 221 188 292 319 205 247 202 150 228 280 ...
## $ BP_Status : chr "Optimal" "High" "High" "Normal" ...
# Convert BP_Status to a factor
q1.data$BP_Status <- factor(q1.data$BP_Status)
str(q1.data)
## 'data.frame': 541 obs. of 2 variables:
## $ Cholesterol: int 221 188 292 319 205 247 202 150 228 280 ...
## $ BP_Status : Factor w/ 3 levels "High","Normal",..: 3 1 1 2 2 1 3 2 1 1 ...
# View data
head(q1.data)
## Cholesterol BP_Status
## 1 221 Optimal
## 2 188 High
## 3 292 High
## 4 319 Normal
## 5 205 Normal
## 6 247 High
H0:
There is no difference in mean cholesterol between the blood pressure groups; in other words, the mean cholesterol is the same in all three groups.
H1:
At least one blood pressure group has a different mean cholesterol; that is, blood pressure status has an effect on cholesterol.
# Check whether the data are balanced
table(q1.data$BP_Status)
##
## High Normal Optimal
## 229 245 67
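The BP_Status groups are not balanced (229, 245, and 67 observations). Because this is a one-way ANOVA with a single factor, the imbalance does not change how the sums of squares are computed, so we can continue with the ANOVA test. Unequal group sizes do make the F test more sensitive to unequal variances, so the equal-variance check that follows matters here.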
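To quantify the imbalance, the group counts can be turned into proportions. This is a small sketch that uses only the counts printed by table() above; the names bp.counts and bp.props are introduced here for illustration.

```r
# Group sizes copied from the table(q1.data$BP_Status) output above
bp.counts <- c(High = 229, Normal = 245, Optimal = 67)

# Share of each blood pressure group in the sample
bp.props <- bp.counts / sum(bp.counts)
round(bp.props, 3)

# Ratio of the largest to the smallest group size
max(bp.counts) / min(bp.counts)
```

The Optimal group makes up roughly 12% of the sample, which is where most of the imbalance comes from.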
q1.aov <- aov(q1.data$Cholesterol ~ q1.data$BP_Status)
summary(q1.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## q1.data$BP_Status 2 25211 12605 6.671 0.00137 **
## Residuals 538 1016631 1890
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
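As a check on the table above, the F statistic and its p-value can be recomputed from the printed sums of squares and degrees of freedom. This is a minimal sketch; the variable names are illustrative, and the inputs are the rounded values from the output, so the results agree with the table only up to rounding.

```r
# Values copied from the summary(q1.aov) output above
ss.between <- 25211    # Sum Sq for BP_Status
ss.within  <- 1016631  # Sum Sq for Residuals
df.between <- 2
df.within  <- 538

# Mean squares are sums of squares divided by their degrees of freedom
ms.between <- ss.between / df.between
ms.within  <- ss.within / df.within

# F statistic and upper-tail p-value from the F distribution
f.value <- ms.between / ms.within
p.value <- pf(f.value, df.between, df.within, lower.tail = FALSE)

c(F = f.value, p = p.value)
```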
We must check the equality of variances in two ways:
1) Levene's test
2) Residual plots
Here we run Levene's test; the residual plots are checked next, in the diagnostic step.
# LeveneTest() is provided by the DescTools package
library(DescTools)
LeveneTest(q1.aov)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 0.1825 0.8332
## 538
The p-value (0.8332) is large, so we do not have enough evidence to reject the null hypothesis of equal variances. The homogeneity of variance assumption is therefore reasonable.
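With center = median, Levene's test amounts to a one-way ANOVA on the absolute deviations of each observation from its group median (the Brown-Forsythe variant). The sketch below illustrates that idea on simulated data, since the original file is not reproduced here. The data frame sim, its group labels and sizes, and the normal parameters are assumptions chosen only for illustration; they are not values from heartbpchol.csv.

```r
set.seed(1)  # make the simulated example reproducible

# Simulated stand-in data (NOT the heartbpchol values)
sim <- data.frame(
  group = factor(rep(c("A", "B", "C"), times = c(40, 50, 30))),
  y     = rnorm(120, mean = 200, sd = 40)
)

# Absolute deviation of each observation from its own group median
sim$abs.dev <- abs(sim$y - ave(sim$y, sim$group, FUN = median))

# One-way ANOVA on the absolute deviations gives the Levene F test
levene.tab <- anova(lm(abs.dev ~ group, data = sim))
levene.tab
```

A large p-value in this table, as in the LeveneTest() output above, means the spread around the group medians is similar across groups.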
par(mfrow=c(2,2))
plot(q1.aov)
According to the residual plots, the spread of the residuals is similar across groups, so the variances appear equal (the same conclusion as Levene's test).
q1.lm <- lm(Cholesterol ~ BP_Status, data = q1.data)
anova(q1.lm)
## Analysis of Variance Table
##
## Response: Cholesterol
## Df Sum Sq Mean Sq F value Pr(>F)
## BP_Status 2 25211 12605.4 6.6708 0.001375 **
## Residuals 538 1016631 1889.6
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(q1.lm)$r.squared
## [1] 0.02419833
R squared is the proportion of variation in the response variable that is explained by the model. Here it is about 0.024, so BP_Status explains only about 2.4% of the variation in cholesterol.
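For a one-way model, R squared equals the between-group sum of squares divided by the total sum of squares. A minimal sketch using the rounded values from the ANOVA table above (variable names are illustrative):

```r
# Sums of squares copied from the anova(q1.lm) output above
ss.model <- 25211    # BP_Status
ss.resid <- 1016631  # Residuals

# Proportion of the total variation explained by the model
r.squared <- ss.model / (ss.model + ss.resid)
r.squared
```

The result agrees with summary(q1.lm)$r.squared up to the rounding of the sums of squares.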
According to the ANOVA results in Step (2), the p-value (0.00137) is well below 0.05 and statistically significant, so we reject the null hypothesis.
Final Result:
- At least one blood pressure group has a different mean cholesterol; that is, blood pressure status has a significant effect on cholesterol.
ScheffeTest(q1.aov)
##
## Posthoc multiple comparisons of means: Scheffe Test
## 95% family-wise confidence level
##
## $`q1.data$BP_Status`
## diff lwr.ci upr.ci pval
## Normal-High -11.543481 -21.35092 -1.736038 0.0159 *
## Optimal-High -18.646679 -33.46702 -3.826341 0.0089 **
## Optimal-Normal -7.103198 -21.81359 7.607194 0.4958
##
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
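In other words, the mean cholesterol of the “High” group is significantly larger than that of both the “Normal” group (difference of about 11.5, p = 0.0159) and the “Optimal” group (difference of about 18.6, p = 0.0089), so the High group has the highest mean cholesterol. The “Normal” and “Optimal” groups do not differ significantly (p = 0.4958).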
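The Scheffe interval for a pairwise difference can be reproduced from quantities already printed above: the group sizes, the residual sum of squares, and the estimated difference. The following is a sketch for the Normal-High comparison, assuming the standard Scheffe formula for pairwise contrasts. The variable names are illustrative, and the inputs are rounded values from the output, so agreement is only up to rounding.

```r
# Inputs copied from the outputs above
k        <- 3                   # number of BP_Status groups
n.high   <- 229                 # group sizes from table()
n.normal <- 245
df.resid <- 538                 # residual degrees of freedom
ms.resid <- 1016631 / df.resid  # residual mean square
diff.nh  <- -11.543481          # Normal-High difference from ScheffeTest()

# Standard error of the difference between two group means
se.diff <- sqrt(ms.resid * (1 / n.high + 1 / n.normal))

# Scheffe critical value at the 95% family-wise confidence level
crit <- sqrt((k - 1) * qf(0.95, k - 1, df.resid))

# Confidence interval for the difference
ci <- diff.nh + c(-1, 1) * crit * se.diff

# Scheffe p-value: contrast F statistic scaled by (k - 1)
f.scheffe <- (diff.nh / se.diff)^2 / (k - 1)
p.scheffe <- pf(f.scheffe, k - 1, df.resid, lower.tail = FALSE)

round(c(lwr = ci[1], upr = ci[2], pval = p.scheffe), 4)
```

Because the interval lies entirely below zero, the Normal group mean is significantly lower than the High group mean, matching the first row of the ScheffeTest() table.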
End of report