ANOVA (analysis of variance) is a statistical method for analysing the variance in a study. It is used to examine how the mean of a dependent variable varies with one or more independent variables. In short, ANOVA is a method for comparing the means of two or more groups.
ANOVA can be divided into two types: one-way ANOVA and two-way ANOVA.
The one-way ANOVA compares the means of the groups of interest to determine whether any of them differ significantly from one another. It tests the null hypothesis that all group means are equal.
The two-way ANOVA estimates how the mean of a quantitative variable changes with the levels of two categorical variables.
$$
\begin{aligned}
H_0 &: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k \\
H_A &: \mu_i \neq \mu_j \text{ for some } i \text{ and } j
\end{aligned}
$$
Here k is the number of groups and μ is a group mean. If the one-way ANOVA returns a statistically significant result, we reject the null hypothesis (H0) in favor of the alternative hypothesis (HA), which states that at least two group means differ significantly from each other.
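For reference, the test statistic behind this decision can be written out. This is the standard one-way ANOVA F ratio; the symbols N (total number of observations), n_i (size of group i), x̄_i (mean of group i) and x̄ (grand mean) are notation introduced here, not taken from the text above.

$$
\begin{aligned}
SS_{between} &= \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2, \qquad
SS_{within} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 \\
F &= \frac{SS_{between} / (k - 1)}{SS_{within} / (N - k)}
\end{aligned}
$$

A large F (and a correspondingly small p-value) indicates that the variation between group means is large relative to the variation within groups.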
head(data)
## Name Weight Chest Waist Hips
## 1 P1 63.5 81.3 78.74 101.60
## 2 P2 63.5 81.3 78.74 100.33
## 3 P3 63.0 81.3 78.74 100.33
## 4 P4 63.0 81.3 71.12 99.06
## 5 P5 63.0 81.3 71.12 99.06
## 6 P6 62.6 81.3 71.12 99.06
Let's enter the above values into R:
data = data.frame("A" = c(63.5,81.3,78.74,101.60), "B" = c(63.5,81.3,78.74,100.33), "C" = c(63.0,81.3,78.74,100.33), "D" = c(63.0,81.3,71.12,99.06),"E" = c(63.0,81.3,71.12,99.06),"Body"=1:4)
data
## A B C D E Body
## 1 63.50 63.50 63.00 63.00 63.00 1
## 2 81.30 81.30 81.30 81.30 81.30 2
## 3 78.74 78.74 78.74 71.12 71.12 3
## 4 101.60 100.33 100.33 99.06 99.06 4
Next, I reshape this information into long format. It took me a long time to figure out the right code, but I finally got it working here. By the way, I don't believe I was required to include the index column (Body), but I did.
library(tidyr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(readr)
test <- data %>%
  pivot_longer(c("A", "B", "C", "D", "E"), names_to = "Weight", values_to = "Chest")
test
## # A tibble: 20 x 3
## Body Weight Chest
## <int> <chr> <dbl>
## 1 1 A 63.5
## 2 1 B 63.5
## 3 1 C 63
## 4 1 D 63
## 5 1 E 63
## 6 2 A 81.3
## 7 2 B 81.3
## 8 2 C 81.3
## 9 2 D 81.3
## 10 2 E 81.3
## 11 3 A 78.7
## 12 3 B 78.7
## 13 3 C 78.7
## 14 3 D 71.1
## 15 3 E 71.1
## 16 4 A 102.
## 17 4 B 100.
## 18 4 C 100.
## 19 4 D 99.1
## 20 4 E 99.1
Compute the ANOVA table (Df, Sum Sq, Mean Sq, F value, Pr(>F)):
fm <- aov(Chest ~ Weight, test)
summary(fm)
## Df Sum Sq Mean Sq F value Pr(>F)
## Weight 4 28 7.08 0.03 0.998
## Residuals 15 3574 238.25
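To see where the numbers in this table come from, the sums of squares and the F value can be computed by hand. The following is a minimal sketch in base R, assuming the same 20 measurements as above; the variable names (groups, ss_between, ss_within and so on) are illustrative and not part of the original analysis.

```r
# Measurements copied from the data frame above, one vector per group (A-E).
groups <- list(
  A = c(63.5, 81.3, 78.74, 101.60),
  B = c(63.5, 81.3, 78.74, 100.33),
  C = c(63.0, 81.3, 78.74, 100.33),
  D = c(63.0, 81.3, 71.12, 99.06),
  E = c(63.0, 81.3, 71.12, 99.06)
)

values <- unlist(groups)                                # all 20 observations
labels <- factor(rep(names(groups), lengths(groups)))   # group label per observation

k <- length(groups)            # number of groups
n <- length(values)            # total number of observations
grand_mean  <- mean(values)
group_means <- sapply(groups, mean)
group_sizes <- lengths(groups)

# Between-group and within-group sums of squares
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within  <- sum(sapply(groups, function(g) sum((g - mean(g))^2)))

# Degrees of freedom, F ratio and its upper-tail p-value
df_between <- k - 1
df_within  <- n - k
f_value <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

anova_by_hand <- data.frame(
  Df    = c(df_between, df_within),
  SumSq = c(ss_between, ss_within),
  row.names = c("Weight", "Residuals")
)
print(anova_by_hand)
cat("F =", round(f_value, 2), " p =", round(p_value, 3), "\n")
```

The between-group sum of squares is tiny compared with the within-group sum of squares, which is why the F value is close to zero and the p-value is close to 1: there is no evidence that the group means differ, so we do not reject the null hypothesis.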
Visualize the data with a ggplot2 boxplot.
library(ggpubr)
## Loading required package: ggplot2
library(ggmosaic)
library(ggplot2)
# Groups on the x-axis, measurements on the y-axis
ggplot(test, aes(x = Weight, y = Chest)) +
  geom_boxplot()
Check the homogeneity of variance assumption. The residuals versus fits plot can be used to check the homogeneity of variances.
plot(fm, 1:4)
The diagnostic plots above show how the residuals relate to the fitted values. Some points fall on the reference line, and the rest are close enough for us to continue with our results.
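A formal test can complement the visual check. The sketch below uses Bartlett's test from base R, which is a technique swapped in here for illustration and not part of the original analysis. It assumes the same long-format data as above, rebuilt under the hypothetical name test_long so the block runs on its own.

```r
# Same 20 measurements as the long table above, with their group labels.
# (test_long is a hypothetical stand-in for the `test` tibble.)
test_long <- data.frame(
  Weight = factor(rep(c("A", "B", "C", "D", "E"), times = 4)),
  Chest  = c(63.5, 63.5, 63.0, 63.0, 63.0,
             81.3, 81.3, 81.3, 81.3, 81.3,
             78.74, 78.74, 78.74, 71.12, 71.12,
             101.60, 100.33, 100.33, 99.06, 99.06)
)

# Null hypothesis: all groups have the same variance.
bt <- bartlett.test(Chest ~ Weight, data = test_long)
print(bt)
```

A large p-value means there is no evidence against equal variances. Bartlett's test is sensitive to departures from normality, so with only four observations per group it is best read alongside the plots rather than on its own.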
For further visualization, I plot the group means against the overall mean to show the differences between them.
# Grand mean of all measurements; `data` has no Weight column, so use the long table
y1 <- mean(test$Chest, na.rm = TRUE)
# Groups on the x-axis, measurements on the y-axis
ggplot(test, aes(x = Weight, y = Chest)) +
  geom_point() +
  # group mean with standard error bars
  stat_summary(fun.data = "mean_se", color = "green") +
  # dashed line at the grand mean
  geom_hline(yintercept = y1, color = "blue", linetype = "dashed")
For the two-way ANOVA, I use a second dataset, data1. First, a summary of its columns:
summary(data1)
## Ship_Mode Profit Unit_Price Shipping_Cost
## Length:264 Min. :1877 Min. : 2.88 Min. : 0.50
## Class :character 1st Qu.:1909 1st Qu.: 5.28 1st Qu.:74.35
## Mode :character Median :1931 Median : 40.42 Median :74.35
## Mean :1929 Mean :101.48 Mean :70.51
## 3rd Qu.:1952 3rd Qu.:120.98 3rd Qu.:74.35
## Max. :1967 Max. :500.98 Max. :74.35
## Customer_Name
## Length:264
## Class :character
## Mode :character
##
##
##
Display the data as a table of profit by customer name.
table(data1$Customer_Name,data1$Profit)
##
## 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887
## Allen Rosenblatt 0 0 0 0 0 1 0 0 0 0 0
## Barry French 0 1 1 0 0 0 0 0 0 0 0
## Carl Ludwig 0 0 0 0 0 0 0 0 1 1 0
## Carlos Soltero 0 0 0 0 0 0 0 1 1 0 0
## Claudia Miner 0 0 0 0 1 0 0 0 0 0 0
## Clay Rozendal 0 0 0 1 0 0 0 0 0 0 0
## Don Miller 0 0 0 0 0 0 0 0 0 1 0
## Edward Hooks 0 0 0 0 0 0 0 0 0 0 0
## Eugene Barchas 0 0 0 0 0 0 0 0 0 0 0
## Jack Garza 0 0 0 0 0 0 0 0 0 0 1
## Jim Radford 0 0 0 0 0 0 1 1 0 0 0
## Julia West 0 0 0 0 0 0 0 0 0 0 1
## Muhammed MacIntyre 1 0 0 0 0 0 0 0 0 0 0
## Neola Schneider 0 0 0 0 1 0 0 0 0 0 0
## Sylvia Foulston 0 0 0 0 0 1 1 0 0 0 0
##
## 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898
## Allen Rosenblatt 0 0 0 0 0 1 0 0 0 0 0
## Barry French 0 0 1 1 0 0 0 0 0 0 0
## Carl Ludwig 0 0 0 0 0 0 0 0 1 1 0
## Carlos Soltero 0 0 0 0 0 0 0 1 1 0 0
## Claudia Miner 0 0 0 0 1 0 0 0 0 0 0
## Clay Rozendal 0 0 0 1 0 0 0 0 0 0 0
## Don Miller 0 0 0 0 0 0 0 0 0 1 0
## Edward Hooks 0 1 0 0 0 0 0 0 0 0 0
## Eugene Barchas 2 1 0 0 0 0 0 0 0 0 0
## Jack Garza 0 0 0 0 0 0 0 0 0 0 1
## Jim Radford 0 0 0 0 0 0 1 1 0 0 0
## Julia West 0 0 0 0 0 0 0 0 0 0 1
## Muhammed MacIntyre 0 0 1 0 0 0 0 0 0 0 0
## Neola Schneider 0 0 0 0 1 0 0 0 0 0 0
## Sylvia Foulston 0 0 0 0 0 1 1 0 0 0 0
##
## 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909
## Allen Rosenblatt 0 0 0 0 0 1 0 0 0 0 0
## Barry French 0 0 1 1 0 0 0 0 0 0 0
## Carl Ludwig 0 0 0 0 0 0 0 1 1 0 0
## Carlos Soltero 0 0 0 0 0 0 0 2 0 0 0
## Claudia Miner 0 0 0 0 1 0 0 0 0 0 0
## Clay Rozendal 0 0 0 1 0 0 0 0 0 0 0
## Don Miller 0 0 0 0 0 0 0 0 1 0 0
## Edward Hooks 0 1 0 0 0 0 0 0 0 0 1
## Eugene Barchas 2 1 0 0 0 0 0 0 0 2 1
## Jack Garza 0 0 0 0 0 0 0 0 1 0 0
## Jim Radford 0 0 0 0 0 0 2 0 0 0 0
## Julia West 0 0 0 0 0 0 0 0 0 1 0
## Muhammed MacIntyre 0 0 1 0 0 0 0 0 0 0 1
## Neola Schneider 0 0 0 0 1 0 0 0 0 0 0
## Sylvia Foulston 0 0 0 0 0 1 1 0 0 0 0
##
## 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920
## Allen Rosenblatt 0 1 0 0 0 0 0 0 0 0 0
## Barry French 2 0 0 0 0 0 0 0 0 1 1
## Carl Ludwig 0 0 0 0 2 0 0 0 0 0 0
## Carlos Soltero 0 0 0 2 0 0 0 0 0 0 0
## Claudia Miner 0 1 0 0 0 0 0 0 0 0 1
## Clay Rozendal 1 0 0 0 0 0 0 0 0 0 1
## Don Miller 0 0 0 0 1 0 0 0 0 0 0
## Edward Hooks 0 0 0 0 0 0 0 0 0 1 0
## Eugene Barchas 0 0 0 0 0 0 1 1 1 0 0
## Jack Garza 0 0 0 0 0 1 0 0 0 0 0
## Jim Radford 0 0 1 1 0 0 0 0 0 0 0
## Julia West 0 0 0 0 0 1 0 0 0 0 0
## Muhammed MacIntyre 0 0 0 0 0 0 0 0 0 1 0
## Neola Schneider 0 1 0 0 0 0 0 0 0 0 0
## Sylvia Foulston 0 0 2 0 0 0 0 0 0 0 0
##
## 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931
## Allen Rosenblatt 1 0 0 0 0 0 1 0 0 0 0
## Barry French 0 0 0 0 0 2 0 0 0 0 0
## Carl Ludwig 0 0 1 1 0 0 0 0 2 0 0
## Carlos Soltero 0 0 2 0 0 0 0 0 2 0 0
## Claudia Miner 0 0 0 0 0 0 1 0 0 0 0
## Clay Rozendal 0 0 0 0 0 0 1 0 0 0 0
## Don Miller 0 0 0 1 0 0 0 0 0 1 0
## Edward Hooks 0 0 0 0 0 1 0 0 0 0 1
## Eugene Barchas 0 0 0 0 3 0 0 0 0 1 2
## Jack Garza 0 0 0 1 0 0 0 0 0 1 0
## Jim Radford 0 2 0 0 0 0 0 2 0 0 0
## Julia West 0 0 0 0 1 0 0 0 0 1 0
## Muhammed MacIntyre 0 0 0 0 0 1 0 0 0 0 1
## Neola Schneider 1 0 0 0 0 0 1 0 0 0 0
## Sylvia Foulston 1 1 0 0 0 0 0 2 0 0 0
##
## 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942
## Allen Rosenblatt 0 1 0 0 0 0 1 0 0 0 0
## Barry French 2 0 0 0 0 2 0 0 0 0 0
## Carl Ludwig 0 0 0 2 0 0 0 0 0 1 1
## Carlos Soltero 0 0 2 0 0 0 0 0 2 0 0
## Claudia Miner 1 0 0 0 0 0 1 0 0 0 0
## Clay Rozendal 1 0 0 0 0 0 1 0 0 0 0
## Don Miller 0 0 0 1 0 0 0 0 0 0 0
## Edward Hooks 0 0 0 0 0 1 0 0 0 0 0
## Eugene Barchas 0 0 0 0 3 0 0 0 0 0 0
## Jack Garza 0 0 0 1 0 0 0 0 0 0 0
## Jim Radford 0 0 2 0 0 0 0 2 0 0 0
## Julia West 0 0 0 0 1 0 0 0 0 0 0
## Muhammed MacIntyre 0 0 0 0 0 1 0 0 0 0 0
## Neola Schneider 0 1 0 0 0 0 1 0 0 0 0
## Sylvia Foulston 0 2 0 0 0 0 0 2 0 0 0
##
## 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953
## Allen Rosenblatt 0 0 0 0 0 1 0 0 0 0 0
## Barry French 0 0 0 0 2 0 0 0 0 1 1
## Carl Ludwig 0 0 0 0 0 0 0 2 0 0 0
## Carlos Soltero 0 0 0 0 0 0 1 1 0 0 0
## Claudia Miner 0 0 0 0 0 1 0 0 0 0 1
## Clay Rozendal 0 0 0 0 1 0 0 0 0 0 1
## Don Miller 1 0 0 0 0 0 0 1 0 0 0
## Edward Hooks 0 0 0 1 0 0 0 0 0 1 0
## Eugene Barchas 0 0 0 3 0 0 0 0 2 1 0
## Jack Garza 0 1 0 0 0 0 0 0 1 0 0
## Jim Radford 0 0 0 0 0 0 2 0 0 0 0
## Julia West 0 0 1 0 0 0 0 0 1 0 0
## Muhammed MacIntyre 0 0 0 0 1 0 0 0 0 1 0
## Neola Schneider 0 0 0 0 0 1 0 0 0 0 1
## Sylvia Foulston 0 0 0 0 0 1 1 0 0 0 0
##
## 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964
## Allen Rosenblatt 1 0 0 0 0 1 0 0 0 0 0
## Barry French 0 0 0 0 2 0 0 0 0 1 1
## Carl Ludwig 0 1 1 0 0 0 0 2 0 0 0
## Carlos Soltero 0 2 0 0 0 0 1 1 0 0 0
## Claudia Miner 0 0 0 0 0 1 0 0 0 0 1
## Clay Rozendal 0 0 0 0 1 0 0 0 0 0 1
## Don Miller 0 0 1 0 0 0 0 1 0 0 0
## Edward Hooks 0 0 0 1 0 0 0 0 0 1 0
## Eugene Barchas 0 0 0 3 0 0 0 0 2 1 0
## Jack Garza 0 0 1 0 0 0 0 0 1 0 0
## Jim Radford 1 1 0 0 0 0 2 0 0 0 0
## Julia West 0 0 1 0 0 0 0 0 1 0 0
## Muhammed MacIntyre 0 0 0 0 1 0 0 0 0 1 0
## Neola Schneider 0 0 0 0 0 1 0 0 0 0 1
## Sylvia Foulston 2 0 0 0 0 1 1 0 0 0 0
##
## 1965 1966 1967
## Allen Rosenblatt 1 0 0
## Barry French 0 0 0
## Carl Ludwig 0 1 1
## Carlos Soltero 0 2 0
## Claudia Miner 0 0 0
## Clay Rozendal 0 0 0
## Don Miller 0 0 1
## Edward Hooks 0 0 1
## Eugene Barchas 0 1 2
## Jack Garza 0 0 1
## Jim Radford 1 1 0
## Julia West 0 1 0
## Muhammed MacIntyre 0 0 0
## Neola Schneider 0 0 0
## Sylvia Foulston 2 0 0
Visualize the table data with ggplot.
ggplot(data1, aes(x = Customer_Name, y = Profit, color = Shipping_Cost))+
geom_boxplot()
Time to run the ANOVA
twoWayAnova <- aov(Profit ~ Unit_Price * Shipping_Cost, data = data1)
summary(twoWayAnova)
## Df Sum Sq Mean Sq F value Pr(>F)
## Unit_Price 1 161 161 0.312 0.577
## Shipping_Cost 1 34196 34196 66.314 1.61e-14 ***
## Unit_Price:Shipping_Cost 1 138 138 0.267 0.606
## Residuals 260 134074 516
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
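So here we examined whether Unit_Price, Shipping_Cost, and the interaction between the two have an effect on Profit. Only Shipping_Cost is statistically significant (p = 1.61e-14); Unit_Price (p = 0.577) and the interaction (p = 0.606) are not.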
plot(twoWayAnova, 1:5)
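One detail worth noting in the table above: Unit_Price and Shipping_Cost are numeric columns, so aov() treats each as a continuous covariate with 1 degree of freedom, not as a categorical factor with several levels. The sketch below illustrates the difference. Since data1 is not available here, it uses R's built-in ToothGrowth dataset as a stand-in, which is an assumption made purely for illustration.

```r
# ToothGrowth ships with R: len (numeric response), supp (factor with 2 levels),
# dose (numeric, taking the values 0.5, 1 and 2).
data(ToothGrowth)

# dose left numeric: treated as a continuous covariate (1 Df)
fit_numeric <- aov(len ~ supp * dose, data = ToothGrowth)

# dose converted to a factor: treated as a categorical variable (levels - 1 Df)
tg <- transform(ToothGrowth, dose = factor(dose))
fit_factor <- aov(len ~ supp * dose, data = tg)

# Degrees of freedom for supp, dose, supp:dose and Residuals
df_numeric <- summary(fit_numeric)[[1]][["Df"]]
df_factor  <- summary(fit_factor)[[1]][["Df"]]
print(df_numeric)
print(df_factor)
```

If the intent is a two-way ANOVA in the sense described at the top (two categorical variables), the predictors would need to be factors, for example by wrapping them in factor() or by binning them into categories first.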