library(ggplot2)
library(kableExtra)
library(knitr)
library(rsample)

data(diamonds)
kbl(head(diamonds), caption="Dataset", booktabs = T) %>% kable_styling(latex_options = c("striped", "hold_position","scale_down"),font_size=12, stripe_index = c(1,3,5,7))

Dataset
carat	cut	color	clarity	depth	table	price	x	y	z
0.23	Ideal	E	SI2	61.5	55	326	3.95	3.98	2.43
0.21	Premium	E	SI1	59.8	61	326	3.89	3.84	2.31
0.23	Good	E	VS1	56.9	65	327	4.05	4.07	2.31
0.29	Premium	I	VS2	62.4	58	334	4.20	4.23	2.63
0.31	Good	J	SI2	63.3	58	335	4.34	4.35	2.75
0.24	Very Good	J	VVS2	62.8	57	336	3.94	3.96	2.48

Let’s use the “diamonds” dataset to show how statistical tests are used.

TESTING POPULATION MEAN

ONE SAMPLE T-TEST

It test if the population mean equal (bigger or smaller) to the specific value or not.

Assumptions:

Independence of the observations
Normality of the sample
Randomness

Test if the price of the diamonds bigger than 3500 or not. First we check the normality with shapiro-wilk normality test and qq-plot.

Q-Q plot: Q-Q plot (or quantile-quantile plot) draws the correlation between a given sample and the normal distribution. A 45-degree reference line is also plotted.

library(ggpubr)
ggqqplot(diamonds$price)

library("car")
qqPlot(diamonds$price)

## [1] 27750 27749

#shapiro.test(diamonds$price)

#Error in shapiro.test(diamonds$price) : sample size must be between 3 and 5000

As seen shapiro test does not work when sample size is bigger than 5000. In such a cases, we uses Kolmogorov-Smirnov test or Jarque-Bera test are used.

ks.test(x=diamonds$price,y='pnorm',alternative='two.sided')

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  diamonds$price
## D = 1, p-value < 2.2e-16
## alternative hypothesis: two-sided

The Jarque-Bera test is a goodness-of-fit test that determines whether or not sample data have skewness and kurtosis that matches a normal distribution.

library(tseries)
jarque.bera.test(diamonds$price)

## 
##  Jarque Bera Test
## 
## data:  diamonds$price
## X-squared = 34201, df = 2, p-value < 2.2e-16

Hypothesis:

\[ H_o : \mu = 3500\ vs\ H_1: \mu > 3500 \]

one-sided

t.test(diamonds$price,mu=3500,alternative = "greater")

## 
##  One Sample t-test
## 
## data:  diamonds$price
## t = 25.196, df = 53939, p-value < 2.2e-16
## alternative hypothesis: true mean is greater than 3500
## 95 percent confidence interval:
##  3904.545      Inf
## sample estimates:
## mean of x 
##    3932.8

P-value is smaller than the significance level of 0.05, so we reject the null hypothesis. That means real population mean is bigger than 3500. As seen real population mean is 3932.8.

two-sided

\[ H_o : \mu = 3900\ vs\ H_1: \mu \neq 3900 \]

t.test(diamonds$price,mu=3900,alternative = "two.sided")

## 
##  One Sample t-test
## 
## data:  diamonds$price
## t = 1.9095, df = 53939, p-value = 0.05621
## alternative hypothesis: true mean is not equal to 3900
## 95 percent confidence interval:
##  3899.132 3966.467
## sample estimates:
## mean of x 
##    3932.8

P-value is bigger than the significance level of 0.05, so we fail to reject the null hypothesis.

TWO SAMPLE T-TEST

It test if the population means of two samples are equal (bigger or smaller) to the specific value or not.

Assumptions:

Samples should be independent from each other
Normality of the samples
Sample variances’ should be equal (homogeneous)

Test if the price of the diamonds of ideal cut and premium cut is equal or not. First we check the normality with shapiro-wilk normality test and qq-plot.

Normality

#select ideal and premium cuts from data
library(dplyr)
diamonds_cut<- diamonds %>% filter(cut=="Ideal" | cut=="Premium")

First, let’s filter the ideal cut and premium cut diamonds from the data with the “filter” command.

First Sample

ggqqplot(diamonds_cut$price)

jarque.bera.test(diamonds_cut$price)

## 
##  Jarque Bera Test
## 
## data:  diamonds_cut$price
## X-squared = 21398, df = 2, p-value < 2.2e-16

Second Sample

ggqqplot(diamonds$price[diamonds$cut=="Premium"])

jarque.bera.test(diamonds_cut$price)

## 
##  Jarque Bera Test
## 
## data:  diamonds_cut$price
## X-squared = 21398, df = 2, p-value < 2.2e-16

As seen two samples are not normality distributed.

Homogeneity of the variances

Two tests are used in the literature to test the homogeneity of variances. These are Bartlett test and levene test. Let’s use the Bartlett test in this example.

bartlett.test(price ~ cut, data = diamonds_cut)

## 
##  Bartlett test of homogeneity of variances
## 
## data:  price by cut
## Bartlett's K-squared = 301.5, df = 1, p-value < 2.2e-16

As can be seen, the variances of the two samples are not equal (p=0). The assumptions of normality and homogeneity of variance were not met for the t-test, but let’s see how to do the t-test in R as an example.

t.test(diamonds_cut$price ~ diamonds_cut$cut, alternative = "two.sided",conf.level = 0.95, var.equal = F)

## 
##  Welch Two Sample t-test
## 
## data:  diamonds_cut$price by diamonds_cut$cut
## t = 24.918, df = 26552, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1038.088 1215.344
## sample estimates:
## mean in group Premium   mean in group Ideal 
##              4584.258              3457.542

#If variances are equal write "var.equal=T"

The two population mean is not equal (p=0). The average price of the Premium group is 4584,258 and the average price of the Ideal group is 3457,542.

T-TEST FOR DEPENDENT SAMLPLES (PAIRED SAMPLES T-TEST)

The paired samples t-test is used to compare the means between two related groups of samples.

As an example of data, 20 mice received a treatment X during 3 months. We want to know whether the treatment X has an impact on the weight of the mice.

To answer to this question, the weight of the 20 mice has been measured before and after the treatment. This gives us 20 sets of values before treatment and 20 sets of values after treatment from measuring twice the weight of the same mice.

In such situations, paired t-test can be used to compare the mean weights before and after treatment.

Paired t-test analysis is performed as follow:

Calculate the difference (d) between each pair of value
Compute the mean (m) and the standard deviation (s) of d
Compare the average difference to 0. If there is any significant difference between the two pairs of samples, then the mean of d (m) is expected to be far from 0.

# Data in two numeric vectors

# Weight of the mice before treatment
before <-c(200.1, 190.9, 192.7, 213, 241.4, 196.9, 172.2, 185.5, 205.2, 193.7)
# Weight of the mice after treatment
after <-c(392.9, 393.2, 345.1, 393, 434, 427.9, 422, 383.9, 392.3, 352.2)
# Create a data frame
my_data <- data.frame( 
                group = rep(c("before", "after"), each = 10),
                weight = c(before,  after))
head(my_data)

##    group weight
## 1 before  200.1
## 2 before  190.9
## 3 before  192.7
## 4 before  213.0
## 5 before  241.4
## 6 before  196.9

dim(my_data)

## [1] 20  2

# Plot weight by group and color by group
library("ggpubr")
ggboxplot(my_data, x = "group", y = "weight", 
          color = "group", palette = c("#00AFBB", "#E7B800"),
          order = c("before", "after"),
          ylab = "Weight", xlab = "Groups")

Assumptions of paired sample t-test

Samples should be dependent
Sample size should be large
Samples should be normally distributed.

In our case:

Samples are dependent since the data have been collected from measuring twice the weight of the same mice.
Sample size is small, because n < 30. Since the sample size is not large enough (less than 30), we need to check whether the differences of the pairs follow a normal distribution.

That means we should look for normality distance(d) when sample is smaller than 30. (distance means difference between before and after)

Let print distance.

# compute the difference
d <-  my_data$weight[my_data$group == "before"] - my_data$weight[my_data$group == "after"]
d

##  [1] -192.8 -202.3 -152.4 -180.0 -192.6 -231.0 -249.8 -198.4 -187.1 -158.5

shapiro.test(d)

## 
##  Shapiro-Wilk normality test
## 
## data:  d
## W = 0.94536, p-value = 0.6141

Since p-value is bigger than the significance level of 0.05, so distance is normally distributed.

# Compute t-test
t.test(weight ~ group, data = my_data, paired = TRUE) # we write paired=T

## 
##  Paired t-test
## 
## data:  weight by group
## t = 20.883, df = 9, p-value = 6.2e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  173.4219 215.5581
## sample estimates:
## mean of the differences 
##                  194.49

NON-PARAMETRIC TEST IN R

When the assumptions of the parametric t-test, non-parametric version of the tests are used. The common assumptions in nonparametric tests are randomness and independence. The chi-square test is one of the nonparametric tests for testing three types of statistical tests: the goodness of fit, independence, and homogeneity. In nonparametric analysis, the Mann-Whitney U test is used for comparing two groups of cases on one variable. The Kruskal-Wallis test is considered as an alternative test to the parametric one-way analysis of variance (ANOVA) for comparing more than two groups on one variable. The Wilcoxon Signed-Rank test is an alternative test to the parametric “Paired-samples T-Test” to test the statistical differences in the mean between two related/dependent random samples.

Sign Test

Sign test is used to test if the median of the sample is equal to specific value or not. Remember for diamonds dataset normality assumptions are not met, so let apply same test using non-parametric versions.

Let’s test with the sign test whether the median of the price of diamonds is equal to 3900 or not.

\[ H_o : Median = 3900\ vs\ H_1: Median \neq 3900 \]

library(BSDA)
SIGN.test(diamonds$price, md = 3500)

## 
##  One-sample Sign-Test
## 
## data:  diamonds$price
## s = 21440, p-value < 2.2e-16
## alternative hypothesis: true median is not equal to 3500
## 95 percent confidence interval:
##  2376 2441
## sample estimates:
## median of x 
##        2401 
## 
## Achieved and Interpolated Confidence Intervals: 
## 
##                   Conf.Level L.E.pt U.E.pt
## Lower Achieved CI     0.9499   2376   2441
## Interpolated CI       0.9500   2376   2441
## Upper Achieved CI     0.9509   2376   2441

As seen median of the priced of the diamonds are not 3500, it is 2400.

Wilcoxon rank sum test

The ‘Wilcoxon Rank Sum test’ (also called ‘Mann-Whitney test’), is a distribution-free alternative to the t-test, and is used to test the hypothesis that the distributions in the two groups have the same median.

\[ H_o : Median_1 = Median_2\ vs\ H_1: Median_1 \neq Median_2 \]

# independent 2-group Mann-Whitney U Test 
wilcox.test(diamonds_cut$price~diamonds_cut$cut)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  diamonds_cut$price by diamonds_cut$cut
## W = 174286667, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0

# where y is numeric and A is A binary factor

We reject the null hypothesis because the P value is close to 0. As you can see, the median of diamond prices is 2400, not 3500. We also see the upper and lower confidence intervals for diamond prices. The confidence interval for the median is (2376, 2441).

Wilcoxon rank sum test

The ‘Wilcoxon Rank Sum test’ (also called the ‘Mann-Whitney test’) is a distribution-free alternative to the t-test (doesn’t assume normality) and is used to test the hypothesis that the distributions in two groups have the same median.

\[ H_o : Median_1 = Median_2\ vs\ H_1: Median_1 \neq Median_2 \]

# independent 2-group Mann-Whitney U Test
wilcox.test(diamonds_cut$price~diamonds_cut$cut)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  diamonds_cut$price by diamonds_cut$cut
## W = 174286667, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0

# where y is numeric and A is A binary factor

As can be seen, the actual position shift is not equal to 0 (p=0). (the difference between the medians is not 0, so the medians are not equal.)

Wilcoxon Signed Rank Test

The non-parametric equivalent of the t-test for matched pairs (dependent samples) is the ‘Wilcoxon signed rank test’.

\[ H_o : Median_{before} = Median_{after}\ vs\ H_1: Median_{before} \neq Median_{after} \]

# dependent 2-group Wilcoxon Signed Rank Test 
wilcox.test(weight ~ group, data = my_data, paired=TRUE)

## 
##  Wilcoxon signed rank exact test
## 
## data:  weight by group
## V = 55, p-value = 0.001953
## alternative hypothesis: true location shift is not equal to 0

As seen medians of the before-after is not equal (like in the means).

Note: We know that outliers in the dataset, break the normality and “mean” of the population is heavily influenced by outliers. However, median is not affected by outliers. That is why in non-parametric tests, medians are compared and tested.

ONE WAY ANOVA

The one-way analysis of variance (ANOVA) is an extension of independent two-samples t-test for comparing means in a situation where there are more than two groups. In one-way ANOVA, the data is organized into several groups base on one single grouping variable (also called factor variable).

Assumptions:

The observations are obtained independently and randomly from the population defined by the factor levels
The data of each factor level are normally distributed.
These normal populations have a common variance. (Levene’s test or bartlett test can be used to check this.)

Let test if there is significant difference between mean of the all cut groups’ prices. Remember there is 5 cut groups, so we compare more than 2 means. That is why we uses one-way ANOVA.

\[ H_o : \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5\ vs\ H_1: \text{At least one of them is different} \]

library(ggplot2)
ggplot(diamonds, aes(x=price, y=cut, fill=cut)) + geom_boxplot(outlier.colour="red", outlier.shape=18,
             outlier.size=3, notch=F)+labs(title = "Boxplot of price by cut") + scale_fill_brewer(palette = "BuPu")

The R function aov() can be used to apply ANOVA. The function summary.aov() is used to summarize the analysis of variance model.

anova<-aov(price ~ cut, data = diamonds)
summary(anova)

##                Df    Sum Sq   Mean Sq F value Pr(>F)    
## cut             4 1.104e+10 2.760e+09   175.7 <2e-16 ***
## Residuals   53935 8.474e+11 1.571e+07                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As the p-value is less than the significance level 0.05, we can conclude that there are significant differences between the groups highlighted with “*" in the model summary.

Multiple pairwise-comparison between the means of groups

In one-way ANOVA test, a significant p-value indicates that some of the group means are different, but we don’t know which pairs of groups are different.

It’s possible to perform multiple pairwise-comparison, to determine if the mean difference between specific pairs of group are statistically significant.

Tukey multiple pairwise-comparisons

As the ANOVA test is significant, we can compute Tukey HSD (Tukey Honest Significant Differences, R function: TukeyHSD()) for performing multiple pairwise-comparison between the means of groups. (If anova test is not significant, there is no need to perform pair-wise comparison)

TukeyHSD(anova)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = price ~ cut, data = diamonds)
## 
## $cut
##                          diff         lwr        upr     p adj
## Good-Fair          -429.89331  -740.44880  -119.3378 0.0014980
## Very Good-Fair     -376.99787  -663.86215   -90.1336 0.0031094
## Premium-Fair        225.49994   -59.26664   510.2665 0.1950425
## Ideal-Fair         -901.21579 -1180.57139  -621.8602 0.0000000
## Very Good-Good       52.89544  -130.15186   235.9427 0.9341158
## Premium-Good        655.39325   475.65120   835.1353 0.0000000
## Ideal-Good         -471.32248  -642.36268  -300.2823 0.0000000
## Premium-Very Good   602.49781   467.76249   737.2331 0.0000000
## Ideal-Very Good    -524.21792  -647.10467  -401.3312 0.0000000
## Ideal-Premium     -1126.71573 -1244.62267 -1008.8088 0.0000000

diff: difference between means of the two groups
lwr, upr: the lower and the upper end point of the confidence interval at 95% (default)
p adj: p-value after adjustment for the multiple comparisons.

#we can use also this function to performn pairwise comparison
pairwise.t.test(diamonds$price, diamonds$cut,
                 p.adjust.method = "BH")

## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  diamonds$price and diamonds$cut 
## 
##           Fair    Good    Very Good Premium
## Good      0.00023 -       -         -      
## Very Good 0.00042 0.43055 -         -      
## Premium   0.03419 < 2e-16 < 2e-16   -      
## Ideal     < 2e-16 9.5e-14 < 2e-16   < 2e-16
## 
## P value adjustment method: BH

Check ANOVA assumptions: test validity

The ANOVA test assumes that, the data are normally distributed and the variance across groups are homogeneous. We can check that with some diagnostic plots.

homogenity of variances

# 1. Homogeneity of variances
plot(anova, 1)

For equal variances, the points in the plot should be around the zero line, but in our case the equal variance assumption seems to be broken. Let’s check this result with the test. We used the Bartlett test to test the homogeneity of variance in the t-tests, this time let’s use the Levene test.

Levenes test is less sensitive to deviations from the Normal distribution. In other words, in cases where normality is not provided, it is more accurate to use the Levenes test instead of the Bartlett test.

\[ H_o: \text{the variance across groups is equal.}\\ H_1: \text{the variance across groups is statistically significantly different.} \]

library(car)
leveneTest(price ~ cut, data = diamonds)

## Levene's Test for Homogeneity of Variance (center = median)
##          Df F value    Pr(>F)    
## group     4   123.6 < 2.2e-16 ***
##       53935                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the output above, we can see that the p-value is less than the significance level of 0.05. This means that there is an evidence to suggest that the variance across groups is significantly different.

bartlett.test(price ~ cut, data = diamonds)

## 
##  Bartlett test of homogeneity of variances
## 
## data:  price by cut
## Bartlett's K-squared = 406.7, df = 4, p-value < 2.2e-16

We reach the same result in the Bartlett test.

ANOVA test with no assumption of equal variances

An alternative procedure (Welch one-way test), that does not require that assumption have been implemented in the function oneway.test().

oneway.test(price ~ cut, data = diamonds)

## 
##  One-way analysis of means (not assuming equal variances)
## 
## data:  price and cut
## F = 166.04, num df = 4.0, denom df = 9398.6, p-value < 2.2e-16

pairwise.t.test(diamonds$price, diamonds$cut,
                 p.adjust.method = "BH", pool.sd = FALSE)

## 
##  Pairwise comparisons using t tests with non-pooled SD 
## 
## data:  diamonds$price and diamonds$cut 
## 
##           Fair    Good    Very Good Premium
## Good      4.5e-05 -       -         -      
## Very Good 0.00011 0.40560 -         -      
## Premium   0.02122 < 2e-16 < 2e-16   -      
## Ideal     < 2e-16 1.7e-15 < 2e-16   < 2e-16
## 
## P value adjustment method: BH

NORMALITY

Normality plot of residuals. In the plot below, the quantiles of the residuals are plotted against the quantiles of the normal distribution. A 45-degree reference line is also plotted.
The normal probability plot of residuals is used to check the assumption that the residuals are normally distributed. It should approximately follow a straight line.

plot(anova,2)

aov_residuals <- residuals(object = anova )
jarque.bera.test(x = aov_residuals )

## 
##  Jarque Bera Test
## 
## data:  aov_residuals
## X-squared = 34438, df = 2, p-value < 2.2e-16

Normality assumption is also not met. In such a cases, we should use non-parametric version of the one-way anova which is “Kruskal-Wallis rank sum test”.

Kruskal-Wallis rank sum test

The Kruskal-Wallis H test (sometimes called “one-way ANOVA on ranks”) is a rank-based nonparametric test that can be used to determine whether there are statistically significant differences between two or more groups of an independent variable. It is considered a non-parametric alternative to one-way ANOVA and an extension of the Mann-Whitney U test to allow comparison of more than two independent groups.

Assumptions

1: Your dependent variable should be measured at the ordinal or continuous level.
2: Your argument must consist of two or more categorical, independent groups.
3: You must be independent of the observations, i.e. there should be no relationship between the observations in each group or between the groups themselves

Because the Kruskal-Wallis H test does not assume normality in the data and is much less sensitive to outliers, it can be used when these assumptions are violated and the use of one-way ANOVA is inappropriate. . Additionally, if your data is ordinal, one-way ANOVA is not appropriate, but the Kruskal-Wallis H test is. However, the Kruskal-Wallis H test comes with an additional assumption. This assumption;

4: To know how to interpret the results from a Kruskal-Wallis H test, you must determine whether the distributions in each group have the same shape (ie have the same variability).

Briefly;

Assumptions

Observations are independent within and between samples.
The studied variable is continuous
Populations are the same with respect to the median.

The test statistic is distributed approximately as chi-square with (k-1) degrees of freedom.

$$ H_o: \

H_1: $$

kruskal.test(price ~ cut, data=diamonds)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  price by cut
## Kruskal-Wallis chi-squared = 978.62, df = 4, p-value < 2.2e-16

As the p-value is less than the significance level 0.05, we can conclude that there are significant differences between the treatment groups.

Multiple pairwise-comparison between groups

From the output of the Kruskal-Wallis test, we know that there is a significant difference between groups, but we don’t know which pairs of groups are different.

It’s possible to use the function pairwise.wilcox.test() to calculate pairwise comparisons between group levels with corrections for multiple testing.

pairwise.wilcox.test(diamonds$price, diamonds$cut,
                 p.adjust.method = "BH")

## 
##  Pairwise comparisons using Wilcoxon rank sum test with continuity correction 
## 
## data:  diamonds$price and diamonds$cut 
## 
##           Fair    Good    Very Good Premium
## Good      7.9e-14 -       -         -      
## Very Good < 2e-16 0.023   -         -      
## Premium   1.5e-05 1.6e-12 < 2e-16   -      
## Ideal     < 2e-16 < 2e-16 < 2e-16   < 2e-16
## 
## P value adjustment method: BH

TWO-WAY ANOVA

Two-way ANOVA test is used to evaluate simultaneously the effect of two grouping variables (A and B) on a response variable. The grouping variables are also known as factors. The different categories (groups) of a factor are called levels. The number of levels can vary between factors.

When the sample sizes within cells are equal, we have the so-called balanced design. In this case the standard two-way ANOVA test can be applied.
When the sample sizes within each level of the independent variables are not the same (case of unbalanced designs), the ANOVA test should be handled differently.

Assumptions of two-way ANOVA test

Two-way ANOVA, like all ANOVA tests, assumes that - the observations within each cell are normally distributed and - have equal variances.

Two-way ANOVA test hypotheses

\[ H_o:\text{There is no difference in the means of factor A}\\ H_o\text{There is no difference in means of factor B}\\ H_o:\text{There is no interaction between factors A and B} \]

As an example we group price of diamonds by their color and cut, and we test 2 things which are:

If there is difference in means of cut group
If there is difference in means of color group

ggplot(diamonds, aes(x=price, y=cut, fill=color)) + geom_boxplot(outlier.colour="red", outlier.shape=18,
             outlier.size=0.3, notch=F)+labs(title = "Boxplot of price by cut and color") + scale_fill_brewer(palette = "BuPu")

We want to know if price depends on cut and color.

res.aov2 <- aov(price ~ cut +color , data =diamonds)
summary(res.aov2)

##                Df    Sum Sq   Mean Sq F value Pr(>F)    
## cut             4 1.104e+10 2.760e+09   181.1 <2e-16 ***
## color           6 2.551e+10 4.251e+09   278.9 <2e-16 ***
## Residuals   53929 8.219e+11 1.524e+07                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the ANOVA table we can conclude that both cut and colors are statistically significant.

# Two-way ANOVA with interaction effect
res.aov3 <- aov(price ~ cut*color , data =diamonds)
summary(res.aov3)

##                Df    Sum Sq   Mean Sq F value Pr(>F)    
## cut             4 1.104e+10 2.760e+09 181.405 <2e-16 ***
## color           6 2.551e+10 4.251e+09 279.371 <2e-16 ***
## cut:color      24 1.653e+09 6.889e+07   4.527  1e-12 ***
## Residuals   53905 8.203e+11 1.522e+07                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the ANOVA results, you can conclude the following, based on the p-values and a significance level of 0.05:

the p-value of cut is ~0(significant), which indicates that the levels of cut are associated with significant different price.
the p-value of color is < 2e-16 (significant), which indicates that colors are associated with significant different price.
the p-value for the interaction between cut*color is ~0 (significant), which indicates that the relationships between color and price depends on the cut.

#multiple comparison like anova
TukeyHSD(res.aov2)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = price ~ cut + color, data = diamonds)
## 
## $cut
##                          diff         lwr         upr     p adj
## Good-Fair          -429.89331  -735.75636  -124.03026 0.0011913
## Very Good-Fair     -376.99787  -659.52768   -94.46806 0.0025302
## Premium-Fair        225.49994   -54.96387   505.96375 0.1821643
## Ideal-Fair         -901.21579 -1176.35038  -626.08121 0.0000000
## Very Good-Good       52.89544  -127.38604   233.17692 0.9305749
## Premium-Good        655.39325   478.36707   832.41943 0.0000000
## Ideal-Good         -471.32248  -639.77829  -302.86667 0.0000000
## Premium-Very Good   602.49781   469.79831   735.19731 0.0000000
## Ideal-Very Good    -524.21792  -645.24787  -403.18797 0.0000000
## Ideal-Premium     -1126.71573 -1242.84112 -1010.59035 0.0000000
## 
## $color
##          diff        lwr        upr     p adj
## E-D -104.4894 -286.36133   77.38248 0.6201928
## F-D  537.8278  354.96509  720.69052 0.0000000
## G-D  820.6730  643.79152  997.55443 0.0000000
## H-D 1260.0195 1071.58200 1448.45698 0.0000000
## I-D 1885.6973 1675.96202 2095.43256 0.0000000
## J-D 2064.7478 1806.41658 2323.07904 0.0000000
## F-E  642.3172  476.76688  807.86758 0.0000000
## G-E  925.1624  766.24356 1084.08123 0.0000000
## H-E 1364.5089 1192.82072 1536.19711 0.0000000
## I-E 1990.1867 1795.36107 2185.01235 0.0000000
## J-E 2169.2372 1922.85710 2415.61737 0.0000000
## G-F  282.8452  122.79337  442.89696 0.0000039
## H-F  722.1917  549.45426  894.92911 0.0000000
## I-F 1347.8695 1152.11859 1543.62038 0.0000000
## J-F 1526.9200 1279.80758 1774.03243 0.0000000
## H-G  439.3465  272.95393  605.73911 0.0000000
## I-G 1065.0243  874.84890 1255.19973 0.0000000
## J-G 1244.0748 1001.35519 1486.79449 0.0000000
## I-H  625.6778  424.70932  826.64627 0.0000000
## J-H  804.7283  553.46259 1055.99405 0.0000000
## J-I  179.0505  -88.55864  446.65968 0.4322041

Check ANOVA assumptions: test validity

ANOVA assumes that the data are normally distributed and the variance across groups are homogeneous. We can check that with some diagnostic plots.

Check the homogeneity of variance assumption

# 1. Homogeneity of variances
plot(res.aov3, 1)

library(car)
leveneTest(price ~ cut*color , data =diamonds)

## Levene's Test for Homogeneity of Variance (center = median)
##          Df F value    Pr(>F)    
## group    34  50.915 < 2.2e-16 ***
##       53905                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

variances are not equal

Normality

# 2. Normality
plot(res.aov3, 2)

# Extract the residuals
aov_residuals <- residuals(object = res.aov3)
# Run Shapiro-Wilk test
jarque.bera.test(x = aov_residuals )

## 
##  Jarque Bera Test
## 
## data:  aov_residuals
## X-squared = 33271, df = 2, p-value < 2.2e-16

The conclusion above, is supported by the jarque.bera test on the ANOVA residuals (X-squared = 33271, p= 2.2e-16) which finds indication that normality is violated.

Compute two-way ANOVA test in R for unbalanced designs

An unbalanced design includes an unequal number of subjects in each group. When we group the 2 factors (cut and color) we use in two-way ANOVA, let’s see how many observations there are in each cell.

table(diamonds$cut,diamonds$color)

##            
##                D    E    F    G    H    I    J
##   Fair       163  224  312  314  303  175  119
##   Good       662  933  909  871  702  522  307
##   Very Good 1513 2400 2164 2299 1824 1204  678
##   Premium   1603 2337 2331 2924 2360 1428  808
##   Ideal     2834 3903 3826 4884 3115 2093  896

As seen in each cell, there is not equal number of observations. Thus, we have unbalanced design.

To set up an unbalanced design in two-way ANOVA, we need to write Anova(type = “III”).

library(car)
my_anova <- aov(price ~ cut*color , data =diamonds)
Anova(my_anova, type = "III")

## Anova Table (Type III tests)
## 
## Response: price
##                 Sum Sq    Df    F value    Pr(>F)    
## (Intercept) 3.7606e+11     1 24713.0009 < 2.2e-16 ***
## cut         8.7891e+09     4   144.3969 < 2.2e-16 ***
## color       9.4602e+09     6   103.6142 < 2.2e-16 ***
## cut:color   1.6535e+09    24     4.5274 1.001e-12 ***
## Residuals   8.2027e+11 53905                         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

FRIEDMAN TEST

The Friedman test is a non-parametric alternative to the one-way repeated measures ANOVA test. It extends the Sign test in the situation where there are more than two groups to compare.

Friedman test is used to assess whether there are any statistically significant differences between the distributions of three or more paired groups. It’s recommended when the normality assumptions of the one-way repeated measures ANOVA test is not met or when the dependent variable is measured on an ordinal scale.

Assumptions: • Two-way data arranged in an unreplicated complete block design • Dependent variable is ordinal, interval, or ratio • Treatment or group independent variable is a factor with two or more levels. That is, two or more groups • Blocking variable is a factor with two or more levels • Blocks are independent of each other and have no interaction with treatments • In order to be a test of medians, the distribution of the differences in scores between each pair of groups are all symmetrical, or the distributions of values for each group have similar shape and spread. Otherwise the test is a test of distributions.

Hypotheses: If the distribution of the differences in scores between each pair of groups are all symmetrical, or the distributions of values for each group have similar shape and spread: • Null hypothesis: The medians of values for each group are equal. • Alternative hypothesis (two-sided): The medians of values for each group are not equal.

Post-hoc tests: The outcome of the Friedman test tells you if there are differences among the groups, but doesn’t tell you which groups are different from other groups. In order to determine which groups are different from others, post-hoc testing can be conducted.

#friedman.test(price~ cut|color, diamonds)

Friedman rank sum test

Friedman chi-squared = 23.139, df = 4, p-value = 0.0001188

MANOVA

In the situation where there multiple response variables you can test them simultaneously using a multivariate analysis of variance (MANOVA). (we have two continuous dependent variable and one independent categoric variable)

Assumptions of MANOVA

MANOVA can be used in certain conditions:

The dependent variables should be normally distribute within groups. The R function mshapiro.test( )[in the mvnormtest package] can be used to perform the Shapiro-Wilk test for multivariate normality. This is useful in the case of MANOVA, which assumes multivariate normality.
Homogeneity of variances across the range of predictors.
Linearity between all pairs of dependent variables, all pairs of covariates, and all dependent variable-covariate pairs in each cell

Hypotheses:

\[ H_0: \text{Treatment variable does not have significant effect on two continuous variable} \]

we want to know if there is significant difference price and carat when they grouped by cut.

# MANOVA test
res.man <- manova(cbind(price, carat) ~ cut, data = diamonds)
summary(res.man)

##              Df   Pillai approx F num Df den Df    Pr(>F)    
## cut           4 0.081688   574.18      8 107870 < 2.2e-16 ***
## Residuals 53935                                              
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

we see that p-value is smaller than the significance level of 0.05. That means, cut(treatment) has significance effect on price and carat of diamonds.

# Look to see which differ
summary.aov(res.man)

##  Response price :
##                Df     Sum Sq    Mean Sq F value    Pr(>F)    
## cut             4 1.1042e+10 2760436340  175.69 < 2.2e-16 ***
## Residuals   53935 8.4743e+11   15712087                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Response carat :
##                Df  Sum Sq Mean Sq F value    Pr(>F)    
## cut             4   429.7 107.435  495.69 < 2.2e-16 ***
## Residuals   53935 11689.6   0.217                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## [1] 1996

check multivariate normality

library(MVN)
result <- mvn(data = diamonds2[,c(1,7)], mvnTest = "royston")
result$multivariateNormality

##      Test        H      p value MVN
## 1 Royston 251.0394 6.571757e-56  NO

# create univariate Q-Q plots
result <- mvn(data = diamonds2[,c(1,7)], mvnTest = "royston", univariatePlot = "qqplot")

we see that MVN= No, so multivariate normality is not met.

LATIN SQUARE DESIGN

The Latin square design is used where the researcher desires to control the variation in an experiment that is related to rows and columns in the field.

Assumptions:

Treatments are assigned at random within rows and columns, with each treatment once per row and once per column.
There are equal numbers of rows, columns, and treatments.
Useful where the experimenter desires to control variation in two different directions

The formula used for this kind of three-way ANOVA are:

For Example:

we have 3 categorical variable(row effect, column effect and treatment effect), and 1 continuous varible(target).

#create new dataset

fertil <- c(rep("fertil1",1), rep("fertil2",1), rep("fertil3",1), rep("fertil4",1), rep("fertil5",1))
treat <- c(rep("treatA",5), rep("treatB",5), rep("treatC",5), rep("treatD",5), rep("treatE",5))
seed <- c("A","E","C","B","D", "C","B","A","D","E", "B","C","D","E","A", "D","A","E","C","B", "E","D","B","A","C")
freq <- c(42,45,41,56,47, 47,54,46,52,49, 55,52,57,49,45, 51,44,47,50,54, 44,50,48,43,46)
 
mydata <- data.frame(treat, fertil, seed, freq)

kbl(head(mydata), caption="Dataset", booktabs = T) %>% kable_styling(latex_options = c("striped", "hold_position","scale_down"),font_size=12, stripe_index = c(1,3,5,7))

Dataset
treat	fertil	seed	freq
treatA	fertil1	A	42
treatA	fertil2	E	45
treatA	fertil3	C	41
treatA	fertil4	B	56
treatA	fertil5	D	47
treatB	fertil1	C	47

Design:

	treatA	treatB	treatC	treatD	treatE
fertilizer1	A42	C47	B55	D51	E44
fertilizer2	E45	B54	C52	A44	D50
fertilizer3	C41	A46	D57	E47	B48
fertilizer4	B56	D52	E49	C50	A43
fertilizer5	D47\|	E49	A45	B54	C46

#visualize data
mydata %>% 
  ggplot(aes(x= freq, y=treat, fill=fertil)) +
  geom_boxplot() +
  geom_jitter(width=0.1,alpha=0.2) +
  xlab("freq")+ 
  facet_wrap(~seed) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Hypothesis:

\[ H_0: \text{Row has no effect on Response}\\ H_0: \text{Column has no effect on Response}\\ H_0: \text{No Treatment effect on Response} \]

myfit <- lm(freq ~ fertil+treat+seed, mydata)
anova(myfit)

## Analysis of Variance Table
## 
## Response: freq
##           Df Sum Sq Mean Sq F value   Pr(>F)    
## fertil     4  17.76   4.440  0.7967 0.549839    
## treat      4 109.36  27.340  4.9055 0.014105 *  
## seed       4 286.16  71.540 12.8361 0.000271 ***
## Residuals 12  66.88   5.573                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The difference between group considering the fertilizer is not significant (p-value > 0.1).
The difference between group considering the tillage is quite significant (p-value < 0.05).
The difference between group considering the seed is very significant (p-value < 0.001).

TESTING POPULATION VARIANCE

Single sample test of variance

We estimate the variance. Using the chi-square test, where the variance is equal to a user-specified value, we test the null hypothesis and construct a confidence interval for the variance.

Let’s test if the variance of the diamond price is equal to 90.

\[ H_o: \sigma^2=90\\ H_1: \sigma^2 \neq 90 \]

library(EnvStats)

varTest(diamonds$price ,alternative = "two.sided", conf.level = 0.95,
    sigma.squared = 90)

## 
##  Chi-Squared Test on Variance
## 
## data:  diamonds$price
## Chi-Squared = 9538590395, df = 53939, p-value < 2.2e-16
## alternative hypothesis: true variance is not equal to 90
## 95 percent confidence interval:
##  15727376 16107299
## sample estimates:
## variance 
## 15915629

As can be seen, the sample estimate for the variance is 15915629, not 90. (p=0)

There are two-sample.test

When should you use the F-test?

Comparing two variances is useful in many situations, including:

When we want to run a two-sample t-test to check the equality of variances of two samples.
When we want to compare the variability of a new measurement method with an old measurement method. To test if the new method reduces the variability of the measure.

Test statistics can be obtained by calculating the ratio of the two variances $S^2_A\ and \ S^2_B$.

\[ F= {\frac{S^2_A}{S^2_B}} \]

The degrees of freedom are $n_A-1$ for the first variance and $n_B-1$ for the second variance.

Note, the more this ratio deviates from 1, the stronger the evidence for unequal population variances.
Note that the F test requires a normal distribution of two samples. (we already checked normality for these variables)

Let’s test whether the variances of the ideal cut price and the premium cut are equal.

\[ H_o: \sigma^2_1= \sigma^2_2\\ H_1: \sigma^2 \neq \sigma^2_2 \]

var.test(price~cut, diamonds_cut, alternative = "two.sided",alpha=0.05)

## 
##  F test to compare two variances
## 
## data:  price by cut
## F = 1.3042, num df = 13790, denom df = 21550, p-value < 2.2e-16
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  1.265403 1.344263
## sample estimates:
## ratio of variances 
##            1.30417

As can be seen, the variance ratios are greater than 1, which means that the variances are not equal. (also specified at p-value=0)

IMPORTANT NOTE

There are different types of tests that can be used to evaluate the equality of variances.

1. F-test:- It is used to compare two groups of variance. The data should be normally distributed.
1. Bartlett test: - It is used for two or more groups of variance comparison. The data should be normally distributed.
1. Levene’s test: - It is an alternative to Bartlett’s test for non-normally distributed data.
1. Figner-Killeen test: - It is a non-parametric test for abnormal data.

We conclude that the nonparametric version of the F test is the levene test. We have already shown how to do the Levene test.

library(car)
leveneTest(price ~ cut, data = diamonds_cut)

## Levene's Test for Homogeneity of Variance (center = median)
##          Df F value    Pr(>F)    
## group     1  424.97 < 2.2e-16 ***
##       35340                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the above output, we can see that the p-value is less than the 0.05 significance level. This means that there is evidence to suggest that the variance between groups is significantly different.

TESTING POPULATION PROPORTION

“prop.test” can be used to test the hypothesis that the odds (probability of success) in several groups are the same or equal to certain given values. Our main goal is to find out if there is a difference between the sample rate $\hat p$ and the claimed value of the population ratio $p_0 $. To perform this test, the sampling distribution of the sample rate must be approximately normally distributed, with its mean $\hat p$ and its standard deviation $\sqrt{\frac{\hat p(1- \hat p)}{n}}= \sigma_p$ should be.

Assumptions:

The following 3 conditions must be met:

The sample should be obtained by a simple random sampling process. (randomness)
n* p * (1 – p) ≥ 10
n ≤ 0.05*N where n is the sample size and N is the population size.

Single sample proportion

Let’s test if the ratio of ideally cut diamonds is equal to 0.1. First, let’s see the number of all cuts and the ideal cut.

\[ H_o: p=0.1\ vs\ H_1: p\neq0.1 \]

table(diamonds$cut)

## 
##      Fair      Good Very Good   Premium     Ideal 
##      1610      4906     12082     13791     21551

Syntax for the proportion test; is prop.test(x, n , p, alternative). X is the number of categories tested, n is the number of all samples, p is the rate tested on the null hypothesis.

In our case;

x: number of ideal cut (21551)
n: sum of all numbers (1610 +4906 +12082+13791+21551 = 53940)
p: 0.1

 prop.test(x = 21551, n = 53940, p = 0.1, correct = FALSE,alternative = "two.sided")

## 
##  1-sample proportions test without continuity correction
## 
## data:  21551 out of 53940, null probability 0.1
## X-squared = 53773, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.1
## 95 percent confidence interval:
##  0.3954104 0.4036770
## sample estimates:
##         p 
## 0.3995365

We see that the ideal cut ratio is 0.4, not 0.1.

Proportion of two samples

For statistical purposes, we can compare two populations or groups when the variable is categorical (smokers/non-smoker, Democrat/Republican, support/opposes an opinion, etc.) or the proportion of individuals with a particular trait (such as the proportion of smokers).

In order to make this comparison, an independent (separate) random sample must be selected from each population.

Note that the Z-statistics formula is valid only when the sample size (n) is large enough.

Situation for Large sample sizes

The test statistics (z-test) are calculated as:

\[ z= {\frac{p_A - p_B}{\sqrt {pq/n_A\ +pq/n_B}}} \]

$p_A$: rates observed in group A.(size $n_A$)
$p_A$: rates observed in group B. (size $n_B$)
p and q are exact ratios.

Let’s test whether there is a significant difference between the ideal and premium cut diamond rates. In the first part, we see the numbers of these categories.

\[ H_o: p_1=p_2\ vs\ H_1: p_1\neq p_2 \]

prop.test(x = c( 13791, 21551 ), n =c(53940, 53940),
              alternative = "two.sided")

## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(13791, 21551) out of c(53940, 53940)
## X-squared = 2533.4, df = 1, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.1494173 -0.1383098
## sample estimates:
##    prop 1    prop 2 
## 0.2556730 0.3995365

We see that the premium cut rate is 0.26, and the ideal cut rate is 0.40. Therefore, the ratios of these two samples are different. (p=0)

SUMMARY

Non-parametric tests corresponding to parametric tests
Parametric Test	Non-Parametric Test
One-sample T-test	Sign test
Two-sample T-test	Wilcoxon rank sum test
Dependent samples T-test	Wilcoxon Signed Rank Test
One-way Anova	Kruskal-Wallis rank sum test
Two-way Anova	FRIEDMAN TEST
F-test	Levene test

STATISTICAL TESTS IN R

İrem Tanrıverdi

2021-11-14

TESTING POPULATION MEAN

ONE SAMPLE T-TEST

one-sided

two-sided

TWO SAMPLE T-TEST

Normality

First Sample

Second Sample

Homogeneity of the variances

T-TEST FOR DEPENDENT SAMLPLES (PAIRED SAMPLES T-TEST)

NON-PARAMETRIC TEST IN R

Sign Test

Wilcoxon rank sum test

Wilcoxon rank sum test

Wilcoxon Signed Rank Test

ONE WAY ANOVA

homogenity of variances

NORMALITY

Kruskal-Wallis rank sum test

TWO-WAY ANOVA

MANOVA

LATIN SQUARE DESIGN

TESTING POPULATION VARIANCE

Single sample test of variance

There are two-sample.test

TESTING POPULATION PROPORTION

Single sample proportion

Proportion of two samples

SUMMARY