Analysis of Variance (ANOVA) is a statistical technique commonly used to study differences between two or more group means. The ANOVA test is centred on the different sources of variation in a typical variable. In R, ANOVA primarily provides evidence on whether the group means are equal. The method is an extension of the t-test and is used when the factor variable has more than two groups. The idea behind the ANOVA test is simple: if the average variation between groups is large enough compared to the average variation within groups, then you can conclude that at least one group mean is not equal to the others.
Thus, it’s possible to evaluate whether the differences between the group means are significant by comparing the two variance estimates. This is why the method is called analysis of variance even though the main goal is to compare the group means.
Briefly, the mathematical procedure behind the ANOVA test is as follows:
Compute the within-group variance, also known as residual variance. This tells us how different each participant is from their own group mean.
Compute the variance between group means.
Compute the F-statistic as the ratio of the between-group variance to the within-group variance.
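The steps above can be traced by hand in a few lines of base R. This is a minimal sketch, not the survey analysis itself: it uses the built-in PlantGrowth dataset as a stand-in (an assumption, since the survey data is not available here) and cross-checks the manual result against aov().

```r
# Manual one-way ANOVA on the built-in PlantGrowth data (stand-in dataset)
data(PlantGrowth)
y <- PlantGrowth$weight
g <- PlantGrowth$group

k <- nlevels(g)                       # number of groups
n <- length(y)                        # total number of observations
grand_mean  <- mean(y)
group_means <- tapply(y, g, mean)     # one mean per group
group_sizes <- tapply(y, g, length)   # one size per group

# Within-group (residual) variance: distance of each value from its own group mean
ss_within <- sum((y - group_means[as.character(g)])^2)
ms_within <- ss_within / (n - k)

# Between-group variance: distance of each group mean from the grand mean
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ms_between <- ss_between / (k - 1)

# F-statistic and its p-value from the F distribution
f_manual <- ms_between / ms_within
p_manual <- pf(f_manual, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

# Cross-check against R's built-in one-way ANOVA
aov_table <- summary(aov(weight ~ group, data = PlantGrowth))[[1]]
f_aov <- aov_table[["F value"]][1]
p_aov <- aov_table[["Pr(>F)"]][1]
```

If the manual calculation is right, f_manual and f_aov agree up to floating-point error.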
The ANOVA test makes the following assumptions about the data:
Independence of the observations: Each subject should belong to only one group. There is no relationship between the observations in each group. Having repeated measures for the same participants is not allowed.
No significant outliers: There should not be extreme values in any cell of the design.
Normality: The data for each design cell should be approximately normally distributed.
Homogeneity of variances: The variance of the outcome variable should be equal in every cell of the design.
Before computing the ANOVA test, you need to perform some preliminary tests to check whether the assumptions are met.
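Of the assumptions listed above, homogeneity of variances can be checked with tests that ship with base R. The sketch below is an illustration on the built-in PlantGrowth dataset; the choice of Bartlett's and Fligner-Killeen tests is an assumption, as the source does not say which test was used on the survey data.

```r
# Homogeneity-of-variance checks on a stand-in dataset (PlantGrowth)
data(PlantGrowth)

# Bartlett's test: sensitive to departures from normality
bt <- bartlett.test(weight ~ group, data = PlantGrowth)

# Fligner-Killeen test: rank-based and more robust to non-normal data
ft <- fligner.test(weight ~ group, data = PlantGrowth)

# A p-value above the chosen threshold (commonly 0.05) gives no evidence
# against equal variances across groups
variances_look_equal <- bt$p.value > 0.05 && ft$p.value > 0.05
print(bt)
print(ft)
```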
One-way ANOVA: Here we conduct a one-way ANOVA to see whether the responses to the 4 categories of survey questions differ significantly across the 5 groups of respondents.
The 4 categories of the survey are: Communication, Service Delivery, Facilities and Equipment, and Information Resources.
The 5 respondent groups of the survey are: Undergrads, Postgrads, Faculty, Exchange Students and Others.
The following snippet of code is used to load the relevant packages.
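The loading snippet itself did not survive in this copy. Judging only from the functions called later in the text (`%>%` and select, ggqqplot, shapiro_test, ggstatsplot::ggbetweenstats, plotly::ggplotly), a plausible package set can be sketched as follows. The list is an assumption and the original may have loaded more or different packages.

```r
# Assumed package list, inferred from the functions used later in this analysis
pkgs <- c("dplyr", "ggpubr", "rstatix", "ggstatsplot", "plotly")

# Find out which of them are not installed, without raising an error
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
}

# Attach only the packages that are available
for (p in setdiff(pkgs, missing_pkgs)) {
  library(p, character.only = TRUE)
}
```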
The library survey dataset is imported into the R console to perform the ANOVA.
# Map the numeric Position codes to the five respondent groups
d1$cat <- ifelse(d1$Position %in% 1:4, "Undergrads",
          ifelse(d1$Position == 5, "Exchange",
          ifelse(d1$Position %in% 6:7, "PostGrads",
          ifelse(d1$Position %in% 8:12, "Faculty",
          ifelse(d1$Position %in% 13:14, "Others",
                 NA)))))
Selecting the Improvement score for Analysis
Preparing the data for Communication Related Questions
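#Communications
# Keep the ID, the respondent group and the three communication items
dc <- d2 %>% select(ResponseID, cat, I01, I02, I03)
# Average the items per respondent, ignoring missing answers
dc$Mean_Comm <- rowMeans(dc[, 3:5], na.rm = TRUE)
dcbox <- dc[, c("ResponseID", "cat", "Mean_Comm")]
# Drop respondents with no group or no usable score
dcbox <- na.omit(dcbox)
Preparing the data for Service Delivery Related Questions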
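#Service Delivery
# Keep the ID, the respondent group and the ten service delivery items
ds <- d2 %>% select(ResponseID, cat, I04, I05, I06, I07, I08, I09, I10, I11, I12, I13)
# Average the items per respondent, ignoring missing answers
ds$Mean_Serv <- rowMeans(ds[, 3:12], na.rm = TRUE)
dsbox <- ds[, c("ResponseID", "cat", "Mean_Serv")]
# Drop respondents with no group or no usable score
dsbox <- na.omit(dsbox)
Preparing the data for Facilities and Equipment Related Questions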
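#Facilities and Equipment
# Keep the ID, the respondent group and the seven facilities items
dfac <- d2 %>% select(ResponseID, cat, I14, I15, I16, I17, I18, I19, I20)
# Average the items per respondent, ignoring missing answers
dfac$Mean_Fac <- rowMeans(dfac[, 3:9], na.rm = TRUE)
dfacbox <- dfac[, c("ResponseID", "cat", "Mean_Fac")]
# Drop respondents with no group or no usable score
dfacbox <- na.omit(dfacbox)
Preparing the data for Information Resources Related Questions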
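The code for this category is not shown here, and the item columns it used are unknown. The three preparation steps above do share one pattern, though, which could be wrapped in a small helper. The sketch below uses a hypothetical function name (prep_scores) and toy data with made-up item columns (Q1, Q2); it is an illustration of the pattern, not the original Information Resources code.

```r
# Hypothetical helper: average a set of item columns per respondent,
# keep ID + group + score, and drop incomplete rows
prep_scores <- function(data, items, score_name) {
  out <- data[, c("ResponseID", "cat")]
  out[[score_name]] <- rowMeans(data[, items, drop = FALSE], na.rm = TRUE)
  na.omit(out)  # removes rows with a missing group or an all-missing score (NaN)
}

# Toy data with made-up item names, for illustration only
toy <- data.frame(
  ResponseID = 1:4,
  cat = c("Faculty", "Others", NA, "Undergrads"),
  Q1 = c(4, 5, 3, NA),
  Q2 = c(2, NA, 3, NA)
)

toy_box <- prep_scores(toy, items = c("Q1", "Q2"), score_name = "Mean_Toy")
print(toy_box)
```

Respondent 3 is dropped because the group is missing, and respondent 4 because every item is missing.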
Summary of the Communication data
Summary of the Service data
Summary of the Facility data
Summary of the Information data
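The summary tables themselves are not reproduced in this copy. As a rough sketch of how a per-group summary of this kind can be computed, the base R code below reports the count, mean and standard deviation per group. It runs on the built-in PlantGrowth dataset as a stand-in; with the survey data the formula would presumably be of the form Mean_Comm ~ cat on dcbox.

```r
# Per-group n, mean and sd on a stand-in dataset (PlantGrowth)
data(PlantGrowth)

group_summary <- aggregate(
  weight ~ group,
  data = PlantGrowth,
  FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x))
)

# aggregate() returns the three statistics as one matrix column;
# flatten it into ordinary columns weight.n, weight.mean, weight.sd
group_summary <- do.call(data.frame, group_summary)
print(group_summary)
```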
Outlier Check of the Communication data
Outlier Check of the Service data
Outlier Check of the Facility data
Outlier Check of the Information data
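The outlier plots are not reproduced in this copy. One common way to flag outliers within each group is the boxplot rule: values more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that rule per group with base R on the built-in PlantGrowth dataset. The helper name flag_outliers is hypothetical, and the rule is an assumption about what the original check did.

```r
# Flag values outside Q1 - 1.5*IQR or Q3 + 1.5*IQR (boxplot rule)
flag_outliers <- function(x) {
  q <- unname(quantile(x, c(0.25, 0.75), na.rm = TRUE))
  fence <- 1.5 * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

data(PlantGrowth)

# ave() applies the rule within each group; it returns 0/1, so compare with 1
PlantGrowth$is_outlier <- ave(
  PlantGrowth$weight, PlantGrowth$group, FUN = flag_outliers
) == 1

outliers <- PlantGrowth[PlantGrowth$is_outlier, ]
print(outliers)
```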
Normality Check of the data
model1 <- lm(Mean_Comm ~ cat, data = dcbox)
model2 <- lm(Mean_Serv ~ cat, data = dsbox)
model3 <- lm(Mean_Fac ~ cat, data = dfacbox)
model4 <- lm(Mean_Inf ~ cat, data = dinfbox)
# QQ plot per respondent group and Shapiro-Wilk test on the model residuals
n1 <- ggqqplot(dcbox, "Mean_Comm", facet.by = "cat", title = "Test for Normality for Communication")
s1 <- shapiro_test(residuals(model1))
n1
## Coefficient covariances computed by hccm()
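For readers without the plotting packages installed, the same normality check can be sketched in base R: a Shapiro-Wilk test on the residuals of the one-way model, plus one test per group. This is an illustration on the built-in PlantGrowth dataset, not the survey result.

```r
# Normality checks on a stand-in dataset (PlantGrowth)
data(PlantGrowth)
model <- lm(weight ~ group, data = PlantGrowth)

# Shapiro-Wilk on the pooled residuals of the one-way model
sw_resid <- shapiro.test(residuals(model))

# Shapiro-Wilk within each group, returning one p-value per group
sw_by_group <- tapply(
  PlantGrowth$weight, PlantGrowth$group,
  function(x) shapiro.test(x)$p.value
)

# A p-value above the chosen threshold (commonly 0.05) gives no evidence
# against normality
print(sw_resid)
print(sw_by_group)

# Base R QQ plot of the residuals, as a visual check
qqnorm(residuals(model))
qqline(residuals(model))
```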
Anova for Communication Related Questions
set.seed(123)  # fixed seed so the plot is reproducible
g1 <- ggstatsplot::ggbetweenstats(
  data = dcbox,
  x = cat,
  y = Mean_Comm,
  mean.plotting = TRUE,
  mean.ci = TRUE,
  pairwise.comparisons = TRUE,
  notch = FALSE,
  type = "np",  # non-parametric: Kruskal-Wallis test rather than the parametric one-way ANOVA
  k = 3,        # decimal places shown in the statistics
  title = "Differences in mean ratings \n(Communication)",
  messages = FALSE
)
g1
#Code for interactive plot
g2 <- plotly::ggplotly(g1, tooltip = c("text", "x", "y"))
g2 <- g2 %>% layout(
  yaxis = list(title = "Mean Ratings of the respondents",
               titlefont = list(family = 'Arial', size = 12),
               tickfont = list(family = 'Arial', size = 13)),
  xaxis = list(tickfont = list(family = 'Arial', size = 11))
)
Anova for Service Related Questions
set.seed(123)  # fixed seed so the plot is reproducible
g3 <- ggstatsplot::ggbetweenstats(
  data = dsbox,
  x = cat,
  y = Mean_Serv,
  mean.plotting = TRUE,
  mean.ci = TRUE,
  pairwise.comparisons = TRUE,
  notch = FALSE,
  type = "np",  # non-parametric: Kruskal-Wallis test rather than the parametric one-way ANOVA
  k = 3,        # decimal places shown in the statistics
  title = "Differences in mean ratings \n(Service)",
  messages = FALSE
)
g3
#Code for interactive plot
g4 <- plotly::ggplotly(g3, tooltip = c("text", "x", "y"))
g4 <- g4 %>% layout(
  yaxis = list(title = "Mean Ratings of the respondents",
               titlefont = list(family = 'Arial', size = 12),
               tickfont = list(family = 'Arial', size = 13)),
  xaxis = list(tickfont = list(family = 'Arial', size = 11))
)
Anova for Facility Related Questions
set.seed(123)  # fixed seed so the plot is reproducible
g5 <- ggstatsplot::ggbetweenstats(
  data = dfacbox,
  x = cat,
  y = Mean_Fac,
  mean.plotting = TRUE,
  mean.ci = TRUE,
  pairwise.comparisons = TRUE,
  notch = FALSE,
  type = "np",  # non-parametric: Kruskal-Wallis test rather than the parametric one-way ANOVA
  k = 3,        # decimal places shown in the statistics
  title = "Differences in mean ratings \n(Facility)",
  messages = FALSE
)
g5
#Code for interactive plot
g6 <- plotly::ggplotly(g5, tooltip = c("text", "x", "y"))
g6 <- g6 %>% layout(
  yaxis = list(title = "Mean Ratings of the respondents",
               titlefont = list(family = 'Arial', size = 12),
               tickfont = list(family = 'Arial', size = 13)),
  xaxis = list(tickfont = list(family = 'Arial', size = 11))
)
Anova for Information Related Questions
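set.seed(123)  # fixed seed so the plot is reproducible
g7 <- ggstatsplot::ggbetweenstats(
  data = dinfbox,
  x = cat,
  y = Mean_Inf,
  mean.plotting = TRUE,
  mean.ci = TRUE,
  pairwise.comparisons = TRUE,
  notch = FALSE,
  type = "np",  # non-parametric: Kruskal-Wallis test rather than the parametric one-way ANOVA
  k = 3,        # decimal places shown in the statistics
  title = "Differences in mean ratings \n(Information)",
  messages = FALSE
)
g7
#Code for interactive plot
g8 <- plotly::ggplotly(g7, tooltip = c("text", "x", "y"))
## Warning in geom2trace.default(dots[[1L]][[1L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomLabelRepel() has yet to be implemented in plotly.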
## If you'd like to see this geom implemented,
## Please open an issue with your example code at
## https://github.com/ropensci/plotly/issues
## Warning in geom2trace.default(dots[[1L]][[1L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomSignif() has yet to be implemented in plotly.
## If you'd like to see this geom implemented,
## Please open an issue with your example code at
## https://github.com/ropensci/plotly/issues
g8 <- g8 %>% layout(
  yaxis = list(title = "Mean Ratings of the respondents",
               titlefont = list(family = 'Arial', size = 12),
               tickfont = list(family = 'Arial', size = 13)),
  xaxis = list(tickfont = list(family = 'Arial', size = 11))
)
Plotting the interactive outputs of the 4 Categories of Questions
The output displays the results of all pairwise comparisons among the tested groups (here, 5 groups). You’ll find the actual difference between the means under diff and the adjusted p-value under p adj for each pairwise comparison. In the table, the only significant difference in the present test is between the means of the Faculty and Others groups, because the adjusted p-value for that comparison is less than 0.05.
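The code that produced this table is not shown. The column names diff and p adj match what base R's TukeyHSD() prints, so a table of this shape can be sketched as below. Treat the use of Tukey's test as an assumption about the original analysis; the example runs on the built-in PlantGrowth dataset rather than the survey data.

```r
# Pairwise comparisons after a one-way ANOVA, on a stand-in dataset
data(PlantGrowth)
fit <- aov(weight ~ group, data = PlantGrowth)

# Tukey's Honest Significant Differences for every pair of groups
tukey <- TukeyHSD(fit, conf.level = 0.95)

# One row per pair: diff, lwr, upr (confidence bounds) and "p adj"
pairwise <- as.data.frame(tukey$group)

# Keep only the pairs whose adjusted p-value is below 0.05
significant <- pairwise[pairwise[["p adj"]] < 0.05, ]
print(pairwise)
print(significant)
```

With 5 respondent groups, as in the survey, the table would have 10 rows, one for each pair of groups.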
The F-statistic is used to test whether the samples come from populations with different means. To compute the F-statistic, divide the between-group variability by the within-group variability. The between-group variability reflects the differences between the group means relative to the overall mean. The two graphs illustrate the concept of between-group variance by showing the variation between the five groups.
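Written out, the ratio described above takes the standard one-way ANOVA form. The symbols are introduced here for illustration and do not appear in the original text: k is the number of groups, n_j the size of group j, N the total number of observations, \bar{y}_j the mean of group j and \bar{y} the overall mean.

```latex
F = \frac{MS_{\text{between}}}{MS_{\text{within}}}
  = \frac{\displaystyle \sum_{j=1}^{k} n_j \left(\bar{y}_j - \bar{y}\right)^2 \Big/ (k - 1)}
         {\displaystyle \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left(y_{ij} - \bar{y}_j\right)^2 \Big/ (N - k)}
```

Under the null hypothesis of equal group means, F follows an F distribution with k - 1 and N - k degrees of freedom, so a large F points to at least one group mean differing from the others.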