
1.0 Overview

Analysis of Variance (ANOVA) is a statistical technique commonly used to study differences between the means of two or more groups. The test partitions the variation in an outcome variable into its different sources. It is an extension of the t-test to situations where the factor variable has more than two groups, and it tests the null hypothesis that all group means are equal. The idea behind the ANOVA test is simple: if the average variation between groups is large enough compared to the average variation within groups, you can conclude that at least one group mean is not equal to the others.

Thus, it’s possible to evaluate whether the differences between the group means are significant by comparing the two variance estimates. This is why the method is called analysis of variance even though the main goal is to compare the group means.

Briefly, the mathematical procedure behind the ANOVA test is as follows:

Compute the within-group variance, also known as residual variance. This tells us how different each participant is from their own group mean.

Compute the variance between group means.

Compute the F-statistic as the ratio of the between-group variance to the within-group variance.
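
The procedure above can be sketched by hand in base R and cross-checked against aov(). This is a minimal illustration on simulated scores for three hypothetical groups (A, B, C), not the survey data; all object names are assumptions made for the example.

```r
set.seed(1)
# Simulated scores for three hypothetical groups (not the survey data).
scores <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 10)),
  value = rnorm(30, mean = rep(c(5, 5.5, 7), each = 10))
)

k <- nlevels(scores$group)  # number of groups
n <- nrow(scores)           # total number of observations

grand_mean  <- mean(scores$value)
group_means <- tapply(scores$value, scores$group, mean)
group_sizes <- tapply(scores$value, scores$group, length)

# Within-group (residual) variance: deviations from each group's own mean.
ss_within <- sum((scores$value - ave(scores$value, scores$group))^2)
ms_within <- ss_within / (n - k)

# Between-group variance: deviations of the group means from the grand mean.
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ms_between <- ss_between / (k - 1)

# F-statistic and its p-value under the F distribution.
f_manual <- ms_between / ms_within
p_manual <- pf(f_manual, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

# Cross-check against R's built-in one-way ANOVA.
f_builtin <- summary(aov(value ~ group, data = scores))[[1]][["F value"]][1]
print(c(manual = f_manual, builtin = f_builtin))
```

The two values should agree, which shows that the F-statistic reported by aov() is the ratio of the two variance estimates.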

1.1 Assumptions

The ANOVA test makes the following assumptions about the data:

Independence of the observations: Each subject should belong to only one group. There is no relationship between the observations in each group. Having repeated measures for the same participants is not allowed.

No significant outliers: There should not be extreme values in any cell of the design.

Normality: The data for each design cell should be approximately normally distributed.

Homogeneity of variances: The variance of the outcome variable should be equal in every cell of the design.

Before computing the ANOVA test, you need to perform some preliminary tests to check whether these assumptions are met.

2.0 Anova for the Survey Questions

One-way ANOVA: Here we conduct a one-way ANOVA to test whether responses in each of the 4 categories of survey questions differ significantly among the 5 groups of respondents.

The 4 categories of survey questions are: Communication, Service Delivery, Facilities and Equipment, and Information Resources.

The 5 groups of respondents are: Undergrads, Postgrads, Faculty, Exchange Students, and Others.

2.1 Loading Relevant Packages

The following snippet of code is used to load the relevant packages.
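
The original snippet is not shown in this extract. As a rough sketch, the package names below are only inferred from the messages that appear later in the rendered output (hccm() belongs to car, and the warnings mention plotly and ggplot2-style geoms), so the list is an assumption and may be incomplete.

```r
# Assumed package list, inferred from messages in the rendered output.
pkgs <- c("car", "ggplot2", "plotly")

# Check which packages are available without stopping on a missing one.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

if (any(!installed)) {
  message("Missing packages: ", paste(pkgs[!installed], collapse = ", "))
}

# Attach only the packages that are actually installed.
for (p in pkgs[installed]) {
  library(p, character.only = TRUE)
}
```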

2.2 Loading the dataset

The library dataset is imported into the R console to perform the ANOVA.
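
The file name and column layout of the survey data are not given here, so the sketch below builds a simulated stand-in and reads it back with read.csv(). The column names (group, communication, service, facility, information), the temporary file, and the simulated scores are all assumptions for illustration.

```r
set.seed(42)
groups <- c("Undergrads", "Postgrads", "Faculty", "Exchange Students", "Others")

# Simulated stand-in: one row per respondent, one score per question category.
stand_in <- data.frame(
  group         = rep(groups, each = 20),
  communication = round(rnorm(100, mean = 3.5, sd = 0.5), 2),
  service       = round(rnorm(100, mean = 3.7, sd = 0.5), 2),
  facility      = round(rnorm(100, mean = 3.4, sd = 0.5), 2),
  information   = round(rnorm(100, mean = 3.3, sd = 0.5), 2)
)

# A temporary file stands in for the real survey export (hypothetical path).
csv_path <- tempfile(fileext = ".csv")
write.csv(stand_in, csv_path, row.names = FALSE)

# Import the data and make the grouping variable a factor with a fixed order.
survey <- read.csv(csv_path)
survey$group <- factor(survey$group, levels = groups)
str(survey)
```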

2.3 Checking the Assumptions

Summary of the Communication data

Summary of the Service data

Summary of the Facility data

Summary of the Information data
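
The summary tables themselves are not reproduced in this extract. A per-group summary of one category could be computed along the following lines; the data are simulated and the column names are assumptions, so the numbers do not correspond to the survey.

```r
set.seed(42)
groups <- c("Undergrads", "Postgrads", "Faculty", "Exchange Students", "Others")

# Simulated stand-in for one question category (not the survey data).
survey <- data.frame(
  group = factor(rep(groups, each = 20), levels = groups),
  communication = rnorm(100, mean = rep(c(3.5, 3.6, 4.0, 3.4, 3.2), each = 20), sd = 0.5)
)

# Sample size, mean and standard deviation of the score within each group.
summary_tbl <- data.frame(
  n    = as.vector(tapply(survey$communication, survey$group, length)),
  mean = as.vector(tapply(survey$communication, survey$group, mean)),
  sd   = as.vector(tapply(survey$communication, survey$group, sd)),
  row.names = levels(survey$group)
)
print(round(summary_tbl, 2))
```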

Outlier Check of the Communication data

Outlier Check of the Service data

Outlier Check of the Facility data

Outlier Check of the Information data
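
The outlier checks are not shown in this extract. One common approach, used here as an assumption rather than as the method of the original analysis, is the boxplot rule: flag values more than 1.5 times the interquartile range beyond the quartiles, and treat values beyond 3 times that range as extreme. The data are simulated, with one implausible score planted on purpose.

```r
set.seed(42)
groups <- c("Undergrads", "Postgrads", "Faculty", "Exchange Students", "Others")

# Simulated stand-in for one question category (not the survey data).
survey <- data.frame(
  group = factor(rep(groups, each = 20), levels = groups),
  communication = rnorm(100, mean = rep(c(3.5, 3.6, 4.0, 3.4, 3.2), each = 20), sd = 0.5)
)
survey$communication[1] <- 9.5  # planted outlier for demonstration

# TRUE where a value lies more than coef * IQR beyond the quartiles.
is_outlier <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - coef * iqr | x > q[2] + coef * iqr
}

# Apply the rule within each group (each cell of the design).
survey$outlier <- as.logical(ave(survey$communication, survey$group, FUN = is_outlier))
survey$extreme <- as.logical(ave(survey$communication, survey$group,
                                 FUN = function(x) is_outlier(x, coef = 3)))

print(survey[survey$outlier, ])
```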

Normality Check of the data
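
The normality output is not shown in this extract. A minimal sketch in base R, assuming the Shapiro-Wilk test is an acceptable check, tests the model residuals and each group separately; Bartlett's test is added as one possible check of the homogeneity of variances assumption. The data are simulated and the choice of tests is an assumption.

```r
set.seed(42)
groups <- c("Undergrads", "Postgrads", "Faculty", "Exchange Students", "Others")

# Simulated stand-in for one question category (not the survey data).
survey <- data.frame(
  group = factor(rep(groups, each = 20), levels = groups),
  communication = rnorm(100, mean = rep(c(3.5, 3.6, 4.0, 3.4, 3.2), each = 20), sd = 0.5)
)

# Normality of the residuals from the one-way model.
model <- lm(communication ~ group, data = survey)
normality <- shapiro.test(residuals(model))

# Normality within each group (each cell of the design).
normality_by_group <- tapply(survey$communication, survey$group,
                             function(x) shapiro.test(x)$p.value)

# Homogeneity of variances across groups.
homogeneity <- bartlett.test(communication ~ group, data = survey)

print(normality$p.value)
print(round(normality_by_group, 3))
print(homogeneity$p.value)
```

A p-value above 0.05 in these tests gives no evidence against the corresponding assumption. Bartlett's test is itself sensitive to non-normality, so a more robust alternative may be preferable on real data.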

2.4 Conducting the ANOVA

## Coefficient covariances computed by hccm()
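
The code that produced the output above is not shown. The hccm() message suggests that a heteroscedasticity-corrected test from the car package was used, but that is an inference. The sketch below swaps in base R instead: a classical one-way ANOVA with aov() and Welch's test with oneway.test(), run once per question category on simulated data with assumed column names.

```r
set.seed(42)
groups <- c("Undergrads", "Postgrads", "Faculty", "Exchange Students", "Others")
n_per_group <- 20

# Helper for the simulation only: scores with one mean per group.
sim_score <- function(means) {
  rnorm(length(groups) * n_per_group, mean = rep(means, each = n_per_group), sd = 0.5)
}

# Simulated stand-in for the four question categories (not the survey data).
survey <- data.frame(
  group         = factor(rep(groups, each = n_per_group), levels = groups),
  communication = sim_score(c(3.5, 3.6, 4.0, 3.4, 3.2)),
  service       = sim_score(c(3.8, 3.7, 3.9, 3.6, 3.5)),
  facility      = sim_score(rep(3.5, 5)),
  information   = sim_score(c(3.2, 3.4, 3.6, 3.3, 3.1))
)

categories <- c("communication", "service", "facility", "information")

# Classical one-way ANOVA p-value for each category.
anova_p <- sapply(categories, function(v) {
  fit <- aov(reformulate("group", response = v), data = survey)
  summary(fit)[[1]][["Pr(>F)"]][1]
})

# Welch's one-way test, which does not assume equal variances.
welch_p <- sapply(categories, function(v) {
  oneway.test(reformulate("group", response = v), data = survey,
              var.equal = FALSE)$p.value
})

print(round(cbind(anova_p, welch_p), 4))
```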

3.0 Plotting the output of Anova

Anova for Communication Related Questions

Anova for Service Related Questions

Anova for Facility Related Questions

Anova for Information Related Questions

Plotting the interactive outputs of the 4 Categories of Questions

4.0 Conclusion

The output displays the results of all pairwise comparisons among the tested groups (here, 5 groups). The actual difference between the means appears under diff and the adjusted p-value under p adj for each pairwise comparison. In the table above, the only significant difference in the present test is between the means of the Faculty and Others groups, as the adjusted p-value for that comparison is less than 0.05.
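
The column names diff and p adj match what base R's TukeyHSD() prints, so a table of this shape can plausibly be produced as sketched below, although the source does not confirm which function was used. The data are simulated and do not reproduce the survey results.

```r
set.seed(42)
groups <- c("Undergrads", "Postgrads", "Faculty", "Exchange Students", "Others")

# Simulated stand-in for one question category (not the survey data).
survey <- data.frame(
  group = factor(rep(groups, each = 20), levels = groups),
  communication = rnorm(100, mean = rep(c(3.5, 3.6, 4.0, 3.4, 3.2), each = 20), sd = 0.5)
)

# Fit the one-way ANOVA, then compare every pair of groups.
fit   <- aov(communication ~ group, data = survey)
tukey <- TukeyHSD(fit, conf.level = 0.95)

# One row per pair, with columns diff, lwr, upr and p adj.
pairwise <- as.data.frame(tukey$group)
print(round(pairwise, 4))

# Pairs whose adjusted p-value is below 0.05.
significant <- pairwise[pairwise[["p adj"]] < 0.05, ]
print(rownames(significant))
```

With 5 groups there are 10 pairwise comparisons, and the adjustment keeps the family-wise error rate at the chosen level across all of them.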

The F-statistic is used to test whether the samples come from populations with significantly different means. It is computed by dividing the between-group variability by the within-group variability. The between-group variability reflects how much the group means differ from the overall mean across all observations. The graphs above show the variation between the five groups.