2024-09-17

Analysis of Variance (ANOVA)

  • Analysis of Variance (ANOVA) is a statistical method for comparing the means of three or more groups to determine whether at least one of them differs significantly from the others. For example, it could be used to compare the mean stock prices of companies across different sectors.

  • ANOVA is used to determine differences between research results from three or more unrelated samples or groups.

  • You might use a one-way ANOVA to test a hypothesis about how one categorical independent variable (the grouping factor) relates to one quantitative dependent variable.

Types of ANOVA Tests

  • A One-Way ANOVA compares the means of three or more independent (unrelated) groups using the F-distribution. The null hypothesis is that all group means are equal, so a significant result means that at least one group mean differs from the others.

  • Note: A one way ANOVA will tell you that at least two groups were different from each other. But it won’t tell you which groups were different.

  • A Two-Way ANOVA is an extension of the One-Way ANOVA. With a One-Way, you have one independent variable affecting a dependent variable. With a Two-Way ANOVA, there are two independent variables, and it also tests for interaction between them.

One-Way ANOVA

  • One-way ANOVA compares the means of three or more independent groups based on a single factor.

Mathematically, we test: \[ H_0: \mu_1 = \mu_2 = \dots = \mu_k \] \[ H_A: \text{At least one } \mu_i \text{ differs} \]

Where \(\mu\) is the population mean for each group.
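
As a sketch of how the hypotheses above are usually tested, the F statistic compares the variation between group means with the variation within groups. The symbols here are assumptions introduced for illustration and do not appear elsewhere in these slides: \(n_i\) is the size of group \(i\), \(\bar{y}_i\) its sample mean, \(\bar{y}\) the overall sample mean, \(N\) the total number of observations, and \(k\) the number of groups.

\[ F = \frac{MS_{\text{between}}}{MS_{\text{within}}} = \frac{\sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2 / (k - 1)}{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 / (N - k)} \]

Under \(H_0\), this ratio follows an F-distribution with \(k - 1\) and \(N - k\) degrees of freedom, so a large value of \(F\) is evidence against equal means.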

One-Way ANOVA Example in R

  • Using the built-in mtcars data set.

  • code in R:

    # Load the data and fit a one-way ANOVA of mpg on cylinder count
    data(mtcars)
    anova_result <- aov(mpg ~ factor(cyl), data = mtcars)
    summary(anova_result)

  • example one-way result:

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## factor(cyl)  2  824.8   412.4    39.7 4.98e-09 ***
## Residuals   29  301.3    10.4                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
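
To show where the numbers in the table above come from, the following is a minimal sketch that recomputes the sums of squares, the F value, and the p-value by hand. The variable names (groups, ss_between, ss_within, and so on) are illustrative choices, not part of the original example.

```r
data(mtcars)

# Split mpg by cylinder count: one numeric vector per group
groups <- split(mtcars$mpg, mtcars$cyl)
k <- length(groups)          # number of groups
n_total <- nrow(mtcars)      # total number of observations
grand_mean <- mean(mtcars$mpg)

# Between-group sum of squares: group size times squared distance
# of each group mean from the grand mean
ss_between <- sum(sapply(groups, function(g) length(g) * (mean(g) - grand_mean)^2))

# Within-group (residual) sum of squares
ss_within <- sum(sapply(groups, function(g) sum((g - mean(g))^2)))

# Mean squares, F statistic, and upper-tail p-value
ms_between <- ss_between / (k - 1)
ms_within <- ss_within / (n_total - k)
f_value <- ms_between / ms_within
p_value <- pf(f_value, df1 = k - 1, df2 = n_total - k, lower.tail = FALSE)

print(c(ss_between = ss_between, ss_within = ss_within, f_value = f_value))
print(p_value)
```

The printed values should agree, up to rounding, with the Sum Sq, F value, and Pr(>F) columns reported by summary(anova_result).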

One-Way ANOVA Example Plot

  • plot code in R:

    library(ggplot2)

    # Box plot of mpg for each cylinder count
    ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
      geom_boxplot(fill = "magenta") +
      labs(title = "Miles per Gallon by Cylinder Count",
           x = "Number of Cylinders",
           y = "Miles per Gallon")
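
The one-way F-test indicates that at least one cylinder group differs, but, as noted earlier, it does not say which ones. One common follow-up is Tukey's Honest Significant Difference test, available in base R as TukeyHSD(). The sketch below is an illustration rather than part of the original analysis; the names cars, fit, and tukey are assumptions.

```r
data(mtcars)

# Convert cyl to a factor up front so the model term is simply "cyl"
cars <- transform(mtcars, cyl = factor(cyl))

# Same one-way model as in the example above
fit <- aov(mpg ~ cyl, data = cars)

# Pairwise comparisons of group means with family-wise 95% confidence
tukey <- TukeyHSD(fit, conf.level = 0.95)
print(tukey)
```

Each row of the result compares one pair of cylinder counts and reports the difference in mean mpg, a confidence interval, and an adjusted p-value.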

Two-Way ANOVA

  • Two-way ANOVA compares the means of groups defined by two factors at once. It tests the effect of each factor on the dependent variable and whether the two factors interact.

Mathematically, we are testing: \[ H_0: \mu_{ij} = \mu_{kl} \quad \text{for all pairs of cells } (i, j) \text{ and } (k, l) \] \[ H_A: \text{At least one group differs based on the levels of factor A or factor B, or their interaction.} \]

Where:

  • \(\mu_{ij}\) is the mean of the group corresponding to level \(i\) of factor A and level \(j\) of factor B.

  • Factor A and Factor B represent the two independent variables.
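
In practice, the two-way test is usually reported as three separate F-tests rather than one. A common way to write this, sketched here with symbols that are assumptions and not part of the original slides, decomposes each cell mean into an overall mean \(\mu\), a factor A effect \(\alpha_i\), a factor B effect \(\beta_j\), and an interaction term \((\alpha\beta)_{ij}\):

\[ \mu_{ij} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} \]

\[ H_0^{A}: \alpha_i = 0 \text{ for all } i \qquad H_0^{B}: \beta_j = 0 \text{ for all } j \qquad H_0^{AB}: (\alpha\beta)_{ij} = 0 \text{ for all } i, j \]

Each null hypothesis corresponds to one row of the two-way ANOVA table: one for each factor and one for their interaction.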

Two-Way ANOVA Example in R

  • Using the built-in mtcars data set.

  • code in R:

    # Fit a two-way ANOVA with both main effects and their interaction
    anova_result_2way <- aov(mpg ~ factor(cyl) * factor(am), data = mtcars)
    summary(anova_result_2way)

  • example two-way result:

##                        Df Sum Sq Mean Sq F value   Pr(>F)    
## factor(cyl)             2  824.8   412.4  44.852 3.73e-09 ***
## factor(am)              1   36.8    36.8   3.999   0.0561 .  
## factor(cyl):factor(am)  2   25.4    12.7   1.383   0.2686    
## Residuals              26  239.1     9.2                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
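
One way to see what the interaction row in the table above measures is to compare a model with only the two main effects against the model that also includes the interaction. The sketch below does this and prints the cell means; the names cars, cell_means, additive, full, and comparison are illustrative assumptions.

```r
data(mtcars)

# Recode the two factors; in mtcars, am is 0 for automatic and 1 for manual
cars <- transform(
  mtcars,
  cyl = factor(cyl),
  am = factor(am, levels = c(0, 1), labels = c("automatic", "manual"))
)

# Mean mpg in each cylinder-by-transmission cell
cell_means <- with(cars, tapply(mpg, list(cyl = cyl, am = am), mean))
print(round(cell_means, 2))

# Main-effects-only model versus the model with the interaction term
additive <- lm(mpg ~ cyl + am, data = cars)
full <- lm(mpg ~ cyl * am, data = cars)

# F-test for whether adding the interaction improves the fit
comparison <- anova(additive, full)
print(comparison)
```

The F value and p-value in the second row of the comparison should match the factor(cyl):factor(am) row of the two-way table, since both test whether the interaction terms are needed.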

Two-Way ANOVA Example Plot

  • plot code in R:

    library(ggplot2)

    # Box plot of mpg by cylinder count, split by transmission type
    ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(am))) +
      geom_boxplot() +
      labs(title = "Miles per Gallon by Cylinder Count and Transmission",
           x = "Number of Cylinders",
           y = "Miles per Gallon",
           fill = "Transmission (0 = Automatic, 1 = Manual)")

Visualizing ANOVA Results

  • Visualizing ANOVA in 3D
  • In this case, we are interested in the relationship between miles per gallon (mpg), the number of cylinders (cyl), and the weight of the cars (wt).
  • By plotting these variables in 3D, we can visually explore the variation in mpg based on the number of cylinders and car weight.
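
The original 3D plot is not reproduced in this text, so the following is only a sketch of one way to draw such a view. It assumes the lattice package, which ships with standard R distributions, and its cloud() function; the original slides may have used a different plotting library. The name p is illustrative.

```r
library(lattice)

data(mtcars)

# 3D scatter plot: mpg on the vertical axis, weight and cylinder count
# on the two horizontal axes, with points grouped by cylinder count
p <- cloud(
  mpg ~ wt * cyl,
  data = mtcars,
  groups = factor(cyl),
  pch = 19,
  xlab = "Weight (1000 lbs)",
  ylab = "Cylinders",
  zlab = "MPG",
  main = "Miles per Gallon by Weight and Cylinder Count",
  auto.key = list(title = "Cylinders", space = "right")
)
print(p)
```

Because cyl takes only three values, the points fall into three vertical slices, which makes it easier to compare the spread of mpg within each cylinder group against the differences between groups.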

Works Cited