At the end of the session, the participants are expected to:
Statistical inference is the process through which inferences about a population are made based on certain statistics calculated from a sample of data drawn from that population.
There are basically two types of statistical inference:
estimation (a process of estimating the values of population parameters based on measured empirical data that has a random component)
hypothesis testing (a process used to assess the plausibility of a hypothesis by using sample data)
The concept of sampling can be illustrated using the cake tasting example.
Suppose that you wish to know whether a chocolate cake is delicious. To find out, all you have to do is take and eat a slice of it. You don't have to consume the whole cake to know whether it is delicious.
This is the most basic principle of sampling. We take a small sample from a large population, analyze the sample data, and draw a conclusion about the characteristics of the population.
However, we must use RANDOMIZATION in the sampling process to ensure that all the characteristics of the population are well represented in the sample.
Research questions and corresponding statistical tests
The most popular research questions include:
Each of these questions can be answered using the following statistical tests:
Many statistical procedures, including correlation, regression, t-test, and analysis of variance, assume certain characteristics about the data. Generally, they assume that:
These assumptions should be taken seriously to draw reliable interpretations and conclusions from the research.
These tests - correlation, t-test and ANOVA - are called parametric tests, because their validity depends on the distribution of the data.
How to assess the normality of the data?
We can use a normal Q-Q plot or the Shapiro-Wilk significance test, which compares the sample distribution to a normal one, to ascertain whether the data show a serious deviation from normality.
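As a quick sketch (x stands for a hypothetical numeric sample; both checks are used later in this document):
# Visual check: points close to the reference line suggest approximate normality
qqnorm(x); qqline(x)
# Formal check: p > 0.05 means no significant deviation from normality
shapiro.test(x)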
How to assess the equality of variances?
The standard Student’s t-test (comparing two independent samples) and the ANOVA test (comparing multiple samples) assume also that the samples to be compared have equal variances.
We can use the following method to assess the equality of variance of the samples:
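For example, the F-test implemented in var.test() (used later in this document) compares the variances of two samples; x and y stand for hypothetical numeric samples:
# F test to compare two variances (assumes normally distributed samples)
var.test(x, y)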
What is correlation test?
A correlation test is used to evaluate the association between two or more variables.
For instance, if we are interested in knowing whether there is a relationship between the heights of fathers and sons, a correlation coefficient can be calculated to answer this question.
If there is no relationship between the two variables (father and son heights), the average height of the sons should be the same regardless of the height of the fathers, and vice versa.
Methods for correlation analyses
Here are different methods to perform correlation analysis:
Pearson correlation (r), which measures a linear dependence between two variables (x and y). It is also known as a parametric correlation test because it depends on the distribution of the data. It can be used only when \(x\) and \(y\) are from a normal distribution. The plot of \(y = f(x)\) is called the linear regression curve.
Kendall tau and Spearman rho, which are rank-based correlation coefficients (non-parametric)
Correlation formula
Pearson correlation formula
\[ r = \frac{\sum (x-m_x)(y-m_y)}{\sqrt{\sum (x-m_x)^2 \sum (y-m_y)^2}}\] where \(m_x\) and \(m_y\) are the means of \(x\) and \(y\), respectively.
The p-value (significance level) of the correlation can be determined by using the correlation coefficient table for the degrees of freedom \(df = n - 2\), where \(n\) is the number of observations in the x and y variables.
Note: If the p-value is < 5%, then the correlation between x and y is significant.
Spearman correlation formula
\[ \rho = \frac{\sum (x'-m_{x'})(y'-m_{y'})}{\sqrt{\sum (x'-m_{x'})^2 \sum (y'-m_{y'})^2}}\] where \(x' = rank(x)\) and \(y' = rank(y)\).
Kendall correlation formula
The Kendall correlation method measures the correspondence between the ranking of \(x\) and \(y\) variables. The total number of possible pairings of \(x\) with \(y\) observations is \(n(n−1)/2\), where \(n\) is the size of \(x\) and \(y\).
The procedure is as follows:
Begin by ordering the pairs by the \(x\) values. If \(x\) and \(y\) are correlated, then they would have the same relative rank orders.
Now, for each \(y_i\), count the number of \(y_j > y_i\) for \(j > i\) (concordant pairs, c) and the number of \(y_j < y_i\) for \(j > i\) (discordant pairs, d).
The Kendall correlation is defined as follows:
\[ \tau = \frac{n_c - n_d}{\frac{1}{2}n(n-1)}\] where \(n_c\) is the total number of concordant pairs, \(n_d\) is the total number of discordant pairs, and \(n\) is the size of \(x\) and \(y\).
Compute correlations in R
The correlation coefficient can be computed using the functions cor() or cor.test():
The simplified formats are:
cor(x, y, method = c("pearson", "kendall", "spearman"))
cor.test(x, y, method = c("pearson", "kendall", "spearman"))
If your data contain missing values, use the following R code to handle missing values by case-wise deletion.
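A sketch of such a call; the argument use = "complete.obs" tells cor() to drop cases with missing values:
cor(x, y, method = "pearson", use = "complete.obs")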
Here, we’ll use the built-in R data set mtcars as an example.
We want to compute the correlation between mpg and wt variables.
We can start by visualizing our data using scatter plots:
library("ggpubr")
ggscatter(my_data, x = "mpg", y = "wt",
add = "reg.line", conf.int = TRUE,
cor.coef = TRUE, cor.method = "pearson",
xlab = "Miles/(US) gallon", ylab = "Weight (1000 lbs)")Preliminary test to check the test assumptions
Is the relationship or association linear? Yes, from the plot above, the relationship is linear. In situations where the scatter plot shows curved patterns, we are dealing with a nonlinear association between the two variables.
Do the data from each of the 2 variables (x, y) follow a normal distribution?
The Shapiro-Wilk test can be performed as follows:
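The output below was presumably produced by calls of this form:
# Shapiro-Wilk normality test for mpg
shapiro.test(my_data$mpg)
# Shapiro-Wilk normality test for wt
shapiro.test(my_data$wt)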
##
## Shapiro-Wilk normality test
##
## data: my_data$mpg
## W = 0.94756, p-value = 0.1229
##
## Shapiro-Wilk normality test
##
## data: my_data$wt
## W = 0.94326, p-value = 0.09265
From the output, the two p-values are greater than the significance level 0.05, implying that the distributions of the data are not significantly different from a normal distribution. In other words, we can assume normality.
Pearson correlation test
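The output below was presumably produced by a call of this form:
cor.test(my_data$wt, my_data$mpg, method = "pearson")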
##
## Pearson's product-moment correlation
##
## data: my_data$wt and my_data$mpg
## t = -9.559, df = 30, p-value = 1.294e-10
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.9338264 -0.7440872
## sample estimates:
## cor
## -0.8676594
In the result above:
Interpretation of the result
The p-value of the test is \(1.294 \times 10^{-10}\), which is less than the significance level alpha = 0.05. We can conclude that wt and mpg are significantly correlated, with a correlation coefficient of -0.87.
To compute the Kendall tau and Spearman rho correlations:
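The Kendall output below was presumably produced by:
cor.test(my_data$wt, my_data$mpg, method = "kendall")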
##
## Kendall's rank correlation tau
##
## data: my_data$wt and my_data$mpg
## z = -5.7981, p-value = 6.706e-09
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau
## -0.7278321
tau is the Kendall correlation coefficient.
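The Spearman output below was presumably produced by:
cor.test(my_data$wt, my_data$mpg, method = "spearman")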
##
## Spearman's rank correlation rho
##
## data: my_data$wt and my_data$mpg
## S = 10292, p-value = 1.488e-11
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## -0.886422
rho is the Spearman’s correlation coefficient.
Interpret correlation coefficient
The correlation coefficient ranges between -1 and 1:
A correlation matrix is used to investigate the dependence between multiple variables at the same time. The result is a table containing the correlation coefficients between each variable and the others.
Here, we’ll use data derived from the built-in R data set mtcars as an example:
# Load data
data("mtcars")
my_data <- mtcars[, c(1,3,4,5,6,7)]
# print the first 6 rows
head(my_data, 6)
Compute correlation matrix
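The matrix below was presumably obtained with cor() and rounded to two decimals:
res <- cor(my_data)
round(res, 2)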
## mpg disp hp drat wt qsec
## mpg 1.00 -0.85 -0.78 0.68 -0.87 0.42
## disp -0.85 1.00 0.79 -0.71 0.89 -0.43
## hp -0.78 0.79 1.00 -0.45 0.66 -0.71
## drat 0.68 -0.71 -0.45 1.00 -0.71 0.09
## wt -0.87 0.89 0.66 -0.71 1.00 -0.17
## qsec 0.42 -0.43 -0.71 0.09 -0.17 1.00
In the table above correlation coefficients between the possible pairs of variables are shown.
Note that, if your data contain missing values, use the following R code to handle missing values by case-wise deletion.
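A sketch of such a call, again relying on use = "complete.obs" for case-wise deletion:
cor(my_data, use = "complete.obs")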
Unfortunately, the function cor() returns only the correlation coefficients between variables. In the next section, we will use the Hmisc R package to compute the correlation p-values.
Use the rcorr() function
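The output below was presumably produced by a call of this form (rcorr() expects a matrix):
library("Hmisc")
res2 <- rcorr(as.matrix(my_data))
res2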
## mpg disp hp drat wt qsec
## mpg 1.00 -0.85 -0.78 0.68 -0.87 0.42
## disp -0.85 1.00 0.79 -0.71 0.89 -0.43
## hp -0.78 0.79 1.00 -0.45 0.66 -0.71
## drat 0.68 -0.71 -0.45 1.00 -0.71 0.09
## wt -0.87 0.89 0.66 -0.71 1.00 -0.17
## qsec 0.42 -0.43 -0.71 0.09 -0.17 1.00
##
## n= 32
##
##
## P
## mpg disp hp drat wt qsec
## mpg 0.0000 0.0000 0.0000 0.0000 0.0171
## disp 0.0000 0.0000 0.0000 0.0000 0.0131
## hp 0.0000 0.0000 0.0100 0.0000 0.0000
## drat 0.0000 0.0000 0.0100 0.0000 0.6196
## wt 0.0000 0.0000 0.0000 0.0000 0.3389
## qsec 0.0171 0.0131 0.0000 0.6196 0.3389
The output of the function rcorr() is a list containing the following elements: r (the correlation matrix), n (the number of observations used to compute each correlation), and P (the p-values corresponding to the significance levels of the correlations).
Use corrplot() function: Draw a correlogram
The function corrplot() takes the correlation matrix as the first argument. The second argument (type = "upper") is used to display only the upper triangle of the correlation matrix.
Positive correlations are displayed in blue and negative correlations in red color. Color intensity and the size of the circle are proportional to the correlation coefficients. In the right side of the correlogram, the legend color shows the correlation coefficients and the corresponding colors.
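A minimal sketch of such a call, using the correlation matrix res computed above (additional label and ordering arguments are optional):
library(corrplot)
corrplot(res, type = "upper")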
The function chart.Correlation() [in the package PerformanceAnalytics] can be used to display a chart of a correlation matrix.
library("PerformanceAnalytics")
my_data <- mtcars[, c(1,3,4,5,6,7)]
chart.Correlation(my_data, histogram = TRUE, pch = 19)
In the above plot:
This section contains articles describing statistical tests to use for comparing means. These tests include:
The one-sample t-test is used to compare the mean of one sample to a known standard (or theoretical/hypothetical) mean \(\mu\):
Generally, the theoretical mean comes from:
Research questions and statistical hypotheses
Typical research questions are:
The t-statistic can be calculated as follows:
\[ t = \frac{m-\mu}{s/\sqrt{n}}\]
where \(m\) is the sample mean, \(n\) is the sample size, and \(s\) is the standard deviation of the sample.
We can compute the p-value corresponding to the absolute value of the t-test statistics (|t|) for the degrees of freedom (df): \(df=n−1\).
How to interpret the results?
To perform a one-sample t-test, the R function t.test() can be used as follows:
Suppose that we want to know whether the average weight of the mice differs from 25 g. We will generate random weights for the mice:
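A sketch of the simplified format, where x is the numeric vector of data values, mu the theoretical mean (0 by default), and alternative one of "two.sided", "less" or "greater":
t.test(x, mu = 0, alternative = "two.sided")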
set.seed(1234)
my_data <- data.frame(
name = paste0(rep("M_", 10), 1:10),
weight = round(rnorm(10, 20, 2), 1)
)
# Print the data
my_data
# Statistical summary of the weights (presumed call for the output below)
summary(my_data$weight)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   15.30   18.38   18.90   19.25   20.82   22.20
library(ggpubr)
ggboxplot(my_data$weight,
ylab = "Weight (g)", xlab = FALSE,
ggtheme = theme_minimal())
Preliminary test to check one-sample t-test assumptions
Briefly, it’s possible to use the Shapiro-Wilk normality test.
Shapiro-Wilk test:
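The output below was presumably produced by:
shapiro.test(my_data$weight)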
##
## Shapiro-Wilk normality test
##
## data: my_data$weight
## W = 0.9526, p-value = 0.6993
From the output, the p-value is greater than the significance level 0.05, implying that the distribution of the data is not significantly different from a normal distribution. In other words, we can assume normality.
Compute one-sample t-test
mean(my_data$weight)  # presumed call producing the output below
## [1] 19.25
# two sided test (Ho: the mean is equal to 25g)
res <- t.test(my_data$weight, mu = 25)
# Printing the results
res
##
## One Sample t-test
##
## data: my_data$weight
## t = -9.0783, df = 9, p-value = 7.953e-06
## alternative hypothesis: true mean is not equal to 25
## 95 percent confidence interval:
## 17.8172 20.6828
## sample estimates:
## mean of x
## 19.25
In the result above:
Interpretation of the result
Note that:
# one-sided test (Ho: the mean is greater than or equal to 25g)
t.test(my_data$weight, mu = 25,
alternative = "less")##
## One Sample t-test
##
## data: my_data$weight
## t = -9.0783, df = 9, p-value = 3.977e-06
## alternative hypothesis: true mean is less than 25
## 95 percent confidence interval:
## -Inf 20.41105
## sample estimates:
## mean of x
## 19.25
Interpretation of the result
Or, if you want to test whether the mean weight of mice is greater than 25g (one-tailed test), type this:
# one-sided test (Ho: the mean is less than or equal to 25g)
t.test(my_data$weight, mu = 25,
alternative = "greater")##
## One Sample t-test
##
## data: my_data$weight
## t = -9.0783, df = 9, p-value = 1
## alternative hypothesis: true mean is greater than 25
## 95 percent confidence interval:
## 18.08895 Inf
## sample estimates:
## mean of x
## 19.25
Interpretation of the result
The one-sample Wilcoxon signed rank test is a non-parametric alternative to one-sample t-test when the data cannot be assumed to be normally distributed. It’s used to determine whether the median of the sample is equal to a known standard value (i.e. theoretical value).
Note that, the data should be distributed symmetrically around the median. In other words, there should be roughly the same number of values above and below the median.
Typical research questions are:
To perform the one-sample Wilcoxon test, the R function wilcox.test() can be used as follows:
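A sketch of the simplified format, where x is the numeric vector of data values and mu the theoretical value (0 by default):
wilcox.test(x, mu = 0, alternative = "two.sided")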
For illustration purposes, we will use the randomly generated my_data. We will also use the same hypothetical question:
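We want to know whether the median weight of the mice differs from 25 g. The output below was presumably produced by:
wilcox.test(my_data$weight, mu = 25)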
##
## Wilcoxon signed rank test with continuity correction
##
## data: my_data$weight
## V = 0, p-value = 0.005793
## alternative hypothesis: true location is not equal to 25
The unpaired two-samples t-test is used to compare the mean of two independent groups.
For example, suppose that we have measured the weight of 100 individuals: 50 women (group A) and 50 men (group B). We want to know if the mean weight of women (mA) is significantly different from that of men (mB).
In this case, we have two unrelated (i.e., independent or unpaired) groups of samples. Therefore, it’s possible to use an independent t-test to evaluate whether the means are different.
Note that the unpaired two-samples t-test can be used only under certain conditions:
Typical research questions are:
Formula of unpaired two-samples t-test
Classical t-test:
\[ t = \frac{m_A-m_B}{\sqrt{\frac{S^2}{n_A}+\frac{S^2}{n_B}}}\]
where \(m_A\) and \(m_B\) are the means of groups A and B, \(n_A\) and \(n_B\) are their sample sizes, and \(S^2\) is the pooled variance estimator:
\[ S^2 = \frac{\sum_{}^{}(x-m_A)^2+\sum_{}^{}(x-m_B)^2}{n_A+n_B-2}\]
with degrees of freedom (df): \(df=n_A+n_B-2\).
Welch t-statistic:
\[ t = \frac{m_A-m_B}{\sqrt{\frac{S_A^2}{n_A}+\frac{S_B^2}{n_B}}}\]
where \(S_A\) and \(S_B\) are the standard deviations of the two groups A and B, respectively. The degrees of freedom are estimated as follows:
\[ df= \frac{(\frac{s_A^2}{n_A}+\frac{s_B^2}{n_B})^2}{\frac{S_A^4}{n_A^2 v_1}+\frac{S_B^4}{n_B^2 v_2}}\]
where \(v_1=n_A-1\) and \(v_2=n_B-1\)
Note that the Welch t-test is considered the safer one. Usually, the results of the classical t-test and the Welch t-test are very similar unless both the group sizes and the standard deviations are very different.
R function to compute unpaired two-samples t-test
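A sketch of the simplified format, where x and y are the two numeric vectors to compare; var.equal = TRUE gives the classical Student's t-test, while the default FALSE gives the Welch t-test:
t.test(x, y, alternative = "two.sided", var.equal = FALSE)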
Here, we’ll use an example data set, which contains the weight of 18 individuals (9 women and 9 men):
# Data in two numeric vectors
women_weight <- c(38.9, 61.2, 73.3, 21.8, 63.4, 64.6, 48.4, 48.8, 48.5)
men_weight <- c(67.8, 60, 63.4, 76, 89.4, 73.3, 67.3, 61.3, 62.4)
# Create a data frame
my_data <- data.frame(
group = rep(c("Woman", "Man"), each = 9),
weight = c(women_weight, men_weight)
)
print(my_data)
## group weight
## 1 Woman 38.9
## 2 Woman 61.2
## 3 Woman 73.3
## 4 Woman 21.8
## 5 Woman 63.4
## 6 Woman 64.6
## 7 Woman 48.4
## 8 Woman 48.8
## 9 Woman 48.5
## 10 Man 67.8
## 11 Man 60.0
## 12 Man 63.4
## 13 Man 76.0
## 14 Man 89.4
## 15 Man 73.3
## 16 Man 67.3
## 17 Man 61.3
## 18 Man 62.4
Compute summary statistics by groups:
library(dplyr)
group_by(my_data, group) %>%
summarise(
count = n(),
mean = mean(weight, na.rm = TRUE),
sd = sd(weight, na.rm = TRUE)
)
Visualize your data using box plots
# Plot weight by group and color by group
library("ggpubr")
ggboxplot(my_data, x = "group", y = "weight",
color = "group", palette = c("#00AFBB", "#E7B800"),
ylab = "Weight", xlab = "Groups")Preliminary test to check independent t-test assumptions
Assumption 1: Are the two samples independent? Yes, since the samples from men and women are not related.
Assumption 2: Do the data from each of the 2 groups follow a normal distribution?
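The output below was presumably produced by the same pattern used below for the women's weights:
# Shapiro-Wilk normality test for Men's weights
with(my_data, shapiro.test(weight[group == "Man"]))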
##
## Shapiro-Wilk normality test
##
## data: weight[group == "Man"]
## W = 0.86425, p-value = 0.1066
# Shapiro-Wilk normality test for Women's weights
with(my_data, shapiro.test(weight[group == "Woman"]))
##
## Shapiro-Wilk normality test
##
## data: weight[group == "Woman"]
## W = 0.94266, p-value = 0.6101
Assumption 3. Do the two populations have the same variances?
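The F-test output below was presumably produced by:
var.test(weight ~ group, data = my_data)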
##
## F test to compare two variances
##
## data: weight by group
## F = 0.36134, num df = 8, denom df = 8, p-value = 0.1714
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.08150656 1.60191315
## sample estimates:
## ratio of variances
## 0.3613398
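The F-test p-value (0.1714) is above 0.05, so there is no significant difference between the variances and the classical t-test with var.equal = TRUE can be used. The output below was presumably produced by:
res <- t.test(women_weight, men_weight, var.equal = TRUE)
res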
##
## Two Sample t-test
##
## data: women_weight and men_weight
## t = -2.7842, df = 16, p-value = 0.01327
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -29.748019 -4.029759
## sample estimates:
## mean of x mean of y
## 52.10000 68.98889
In the result above:
Interpretation of the result
Note that:
Or, if you want to test whether the average men’s weight is greater than the average women’s weight, type this:
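A sketch of such a call; with x = women_weight and y = men_weight, testing whether the men's mean is greater corresponds to alternative = "less":
t.test(women_weight, men_weight,
       var.equal = TRUE, alternative = "less")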
The unpaired two-samples Wilcoxon test (also known as Wilcoxon rank sum test or Mann-Whitney test) is a non-parametric alternative to the unpaired two-samples t-test, which can be used to compare two independent groups of samples. It’s used when your data are not normally distributed.
R function to compute Wilcoxon test
For illustration purposes, we will use the same data set, which contains the weight of 18 individuals (9 women and 9 men).
We want to know whether the median women's weight differs from the median men's weight.
Compute summary statistics by groups:
library(dplyr)
group_by(my_data, group) %>%
summarise(
count = n(),
median = median(weight, na.rm = TRUE),
IQR = IQR(weight, na.rm = TRUE)
)
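The output below was presumably produced by:
wilcox.test(women_weight, men_weight)
##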
## Wilcoxon rank sum test with continuity correction
##
## data: women_weight and men_weight
## W = 15, p-value = 0.02712
## alternative hypothesis: true location shift is not equal to 0
The p-value of the test is 0.02712, which is less than the significance level alpha = 0.05. We can conclude that men's median weight is significantly different from women's median weight.
Note that:
- Or, if you want to test whether the median men's weight is greater than the median women's weight, type this:
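A sketch of such a call; with x = women_weight and y = men_weight, testing whether the men's median is greater corresponds to alternative = "less":
wilcox.test(women_weight, men_weight, alternative = "less")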
The paired samples t-test is used to compare the means between two related groups of samples. In this case, you have two values (i.e., pair of values) for the same samples.
t-test statistic value can be calculated using the following formula:
\[ t = \frac{m}{s/\sqrt{n}} \]
where,
m is the mean of the differences (d) between the paired values
n is the sample size (i.e., the number of differences d)
s is the standard deviation of d
The degrees of freedom are \(df = n - 1\).
To perform a paired samples t-test comparing the means of two paired samples (x & y), the R function t.test() can be used as follows:
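A sketch of the simplified format, where x and y are the two related numeric vectors:
t.test(x, y, paired = TRUE, alternative = "two.sided")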
Here, we’ll use an example data set, which contains the weight of 10 mice before and after the treatment.
# Weight of the mice before treatment
before <-c(200.1, 190.9, 192.7, 213, 241.4, 196.9, 172.2, 185.5, 205.2, 193.7)
# Weight of the mice after treatment
after <-c(392.9, 393.2, 345.1, 393, 434, 427.9, 422, 383.9, 392.3, 352.2)
# Create a data frame
my_data <- data.frame(
group = rep(c("before", "after"), each = 10),
weight = c(before, after)
)
print(my_data)
## group weight
## 1 before 200.1
## 2 before 190.9
## 3 before 192.7
## 4 before 213.0
## 5 before 241.4
## 6 before 196.9
## 7 before 172.2
## 8 before 185.5
## 9 before 205.2
## 10 before 193.7
## 11 after 392.9
## 12 after 393.2
## 13 after 345.1
## 14 after 393.0
## 15 after 434.0
## 16 after 427.9
## 17 after 422.0
## 18 after 383.9
## 19 after 392.3
## 20 after 352.2
Compute summary statistics by groups:
library("dplyr")
group_by(my_data, group) %>%
summarise(
count = n(),
mean = mean(weight, na.rm = TRUE),
sd = sd(weight, na.rm = TRUE)
)
# Plot weight by group and color by group
library("ggpubr")
ggboxplot(my_data, x = "group", y = "weight",
color = "group", palette = c("#00AFBB", "#E7B800"),
order = c("before", "after"),
ylab = "Weight", xlab = "Groups") Preliminary test to check paired t-test assumptions:
Assumption 1: Are the two samples paired? Yes, since the data have been collected from measuring twice the weight of the same mice.
Assumption 2: Are the differences between pairs normally distributed? Use the Shapiro-Wilk normality test.
# compute the difference
d <- with(my_data,
weight[group == "before"] - weight[group == "after"])
# Shapiro-Wilk normality test for the differences
shapiro.test(d)
##
## Shapiro-Wilk normality test
##
## data: d
## W = 0.94536, p-value = 0.6141
From the output, the p-value is greater than the significance level 0.05, implying that the distribution of the differences (d) is not significantly different from a normal distribution. In other words, we can assume normality.
Note that if the data are not normally distributed, it's recommended to use the non-parametric paired two-samples Wilcoxon test.
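The paired t-test output below was presumably produced by:
res <- t.test(before, after, paired = TRUE)
res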
##
## Paired t-test
##
## data: before and after
## t = -20.883, df = 9, p-value = 6.2e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -215.5581 -173.4219
## sample estimates:
## mean of the differences
## -194.49
In the result above:
Interpretation of the result
The p-value of the test is \(6.2 \times 10^{-9}\), which is less than the significance level alpha = 0.05. We can then reject the null hypothesis and conclude that the average weight of the mice before treatment is significantly different from the average weight after treatment.
Note that:
The one-way analysis of variance (ANOVA), also known as one-factor ANOVA, is an extension of the independent two-samples t-test for comparing means in a situation where there are more than two groups. In one-way ANOVA, the data are organized into several groups based on a single grouping variable (also called a factor variable).
ANOVA test hypotheses:
Assumptions of ANOVA test
ANOVA test can be applied only when:
Here, we’ll use the built-in R data set named PlantGrowth. It contains the weight of plants obtained under a control and two different treatment conditions.
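The levels shown below were presumably obtained as follows:
# Load the data and inspect the levels of the grouping variable
my_data <- PlantGrowth
levels(my_data$group)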
## [1] "ctrl" "trt1" "trt2"
If the levels are not automatically in the correct order, re-order them as follows:
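A sketch of such a re-ordering (the level names match the output above):
my_data$group <- factor(my_data$group,
                        levels = c("ctrl", "trt1", "trt2"))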
Compute summary statistics by groups - count, mean, sd:
library(dplyr)
group_by(my_data, group) %>%
summarise(
count = n(),
mean = mean(weight, na.rm = TRUE),
sd = sd(weight, na.rm = TRUE)
)
# Visualize the data
# Plot weight by group and color by group
library("ggpubr")
ggboxplot(my_data, x = "group", y = "weight",
color = "group", palette = c("#00AFBB", "#E7B800", "#FC4E07"),
order = c("ctrl", "trt1", "trt2"),
ylab = "Weight", xlab = "Treatment") Compute one-way ANOVA test
We want to know if there is any significant difference between the average weights of plants in the 3 experimental conditions.
The R function aov() can be used to answer this question. The function summary.aov() is used to summarize the analysis of variance model.
# Compute the analysis of variance
res.aov <- aov(weight ~ group, data = my_data)
# Summary of the analysis
summary(res.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 3.766 1.8832 4.846 0.0159 *
## Residuals 27 10.492 0.3886
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interpret the result of one-way ANOVA tests:
Multiple pairwise-comparison between the means of groups
In one-way ANOVA test, a significant p-value indicates that some of the group means are different, but we don’t know which pairs of groups are different.
It’s possible to perform multiple pairwise comparisons to determine whether the mean difference between specific pairs of groups is statistically significant.
Tukey multiple pairwise-comparisons
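The output below was presumably produced by:
TukeyHSD(res.aov)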
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = weight ~ group, data = my_data)
##
## $group
## diff lwr upr p adj
## trt1-ctrl -0.371 -1.0622161 0.3202161 0.3908711
## trt2-ctrl 0.494 -0.1972161 1.1852161 0.1979960
## trt2-trt1 0.865 0.1737839 1.5562161 0.0120064
It can be seen from the output, that only the difference between trt2 and trt1 is significant with an adjusted p-value of 0.012.
Check ANOVA assumptions:
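The homogeneity of variance can be checked, for example, with Levene's test from the car package; this is a sketch of such a check, and its output is not reproduced here:
library(car)
leveneTest(weight ~ group, data = my_data)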
From the output above we can see that the p-value is not less than the significance level of 0.05. This means that there is no evidence to suggest that the variance across groups is statistically significantly different. Therefore, we can assume the homogeneity of variances in the different treatment groups.
How do we carry out the analysis in a situation where the homogeneity of variance assumption is violated?
An alternative procedure (the Welch one-way test), which does not require this assumption, has been implemented in the function oneway.test().
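The output below was presumably produced by:
oneway.test(weight ~ group, data = my_data)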
##
## One-way analysis of means (not assuming equal variances)
##
## data: weight and group
## F = 5.181, num df = 2.000, denom df = 17.128, p-value = 0.01739
# Extract the residuals
aov_residuals <- residuals(object = res.aov )
# Run Shapiro-Wilk test
shapiro.test(x = aov_residuals)
##
## Shapiro-Wilk normality test
##
## data: aov_residuals
## W = 0.96607, p-value = 0.4379
Note that a non-parametric alternative to one-way ANOVA is the Kruskal-Wallis rank sum test, which can be used when the ANOVA assumptions are not met.
Kruskal-Wallis test by rank is a non-parametric alternative to one-way ANOVA test, which extends the two-samples Wilcoxon test in the situation where there are more than two groups. It’s recommended when the assumptions of one-way ANOVA test are not met.
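The output below was presumably produced by:
kruskal.test(weight ~ group, data = my_data)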
##
## Kruskal-Wallis rank sum test
##
## data: weight by group
## Kruskal-Wallis chi-squared = 7.9882, df = 2, p-value = 0.01842
Multiple pairwise-comparison between groups
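The output below was presumably produced by pairwise Wilcoxon tests with Benjamini-Hochberg (BH) adjustment:
pairwise.wilcox.test(PlantGrowth$weight, PlantGrowth$group,
                     p.adjust.method = "BH")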
##
## Pairwise comparisons using Wilcoxon rank sum test with continuity correction
##
## data: PlantGrowth$weight and PlantGrowth$group
##
## ctrl trt1
## trt1 0.199 -
## trt2 0.095 0.027
##
## P value adjustment method: BH
Two-way ANOVA test is used to evaluate simultaneously the effect of two grouping variables (A and B) on a response variable.
The grouping variables are also known as factors. The different categories (groups) of a factor are called levels. The number of levels can vary between factors. The level combinations of factors are called cells.
Two-way ANOVA test hypotheses:
Assumptions of two-way ANOVA test
Two-way ANOVA, like all ANOVA tests, assumes that the observations within each cell are normally distributed and have equal variances. We’ll show you how to check these assumptions after fitting ANOVA.
Here, we’ll use the built-in R data set named ToothGrowth. It contains data from a study evaluating the effect of vitamin C on tooth growth in Guinea pigs. The experiment was performed on 60 guinea pigs, where each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded OJ) or ascorbic acid, a form of vitamin C (coded VC). Tooth length was measured, and a sample of the data is shown below.
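The structure shown below was presumably obtained as follows:
# Load the data and check its structure
my_data <- ToothGrowth
str(my_data)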
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
# Convert dose as a factor and recode the levels
# as "D0.5", "D1", "D2"
my_data$dose <- factor(my_data$dose,
levels = c(0.5, 1, 2),
labels = c("D0.5", "D1", "D2"))
head(my_data)
We want to know if tooth length depends on supp and dose.
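The frequency table below was presumably produced by:
table(my_data$supp, my_data$dose)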
##
## D0.5 D1 D2
## OJ 10 10 10
## VC 10 10 10
#Visualize your data with ggpubr:
library("ggpubr")
ggboxplot(my_data, x = "dose", y = "len", color = "supp",
palette = c("#00AFBB", "#E7B800"))
Compute two-way ANOVA test
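The output below was presumably produced by fitting the additive model:
res.aov2 <- aov(len ~ supp + dose, data = my_data)
summary(res.aov2)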
## Df Sum Sq Mean Sq F value Pr(>F)
## supp 1 205.4 205.4 14.02 0.000429 ***
## dose 2 2426.4 1213.2 82.81 < 2e-16 ***
## Residuals 56 820.4 14.7
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Note that the above fitted model is called an additive model. It makes the assumption that the two factor variables are independent. If you think that these two variables might interact to create a synergistic effect, replace the plus symbol (+) with an asterisk (*), as follows.
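A sketch of the interaction model; the object name res.aov3 matches the later Tukey and residual checks:
# Two-way ANOVA with interaction effect
res.aov3 <- aov(len ~ supp * dose, data = my_data)
summary(res.aov3)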
It can be seen that the two main effects (supp and dose) are statistically significant, as well as their interaction.
Note that, in the situation where the interaction is not significant you should use the additive model.
Interpret the results
From the ANOVA results, you can conclude the following, based on the p-values and a significance level of 0.05:
Compute some summary statistics
library("dplyr")
group_by(my_data, supp, dose) %>%
summarise(
count = n(),
mean = mean(len, na.rm = TRUE),
sd = sd(len, na.rm = TRUE)
)
Tukey multiple pairwise-comparisons
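The output below was presumably produced with which = "dose" to restrict the comparisons to the dose factor:
TukeyHSD(res.aov3, which = "dose")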
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = len ~ supp * dose, data = my_data)
##
## $dose
## diff lwr upr p adj
## D1-D0.5 9.130 6.362488 11.897512 0.0e+00
## D2-D0.5 15.495 12.727488 18.262512 0.0e+00
## D2-D1 6.365 3.597488 9.132512 2.7e-06
Check ANOVA assumptions
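As in the one-way case, the homogeneity of variances can be checked with Levene's test from the car package; this is a sketch of such a check, and its output is not reproduced here:
library(car)
leveneTest(len ~ supp * dose, data = my_data)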
From the output above we can see that the p-value is not less than the significance level of 0.05. This means that there is no evidence to suggest that the variance across groups is statistically significantly different. Therefore, we can assume the homogeneity of variances in the different treatment groups.
Check the normality assumption
# Extract the residuals
aov_residuals <- residuals(object = res.aov3)
# Run Shapiro-Wilk test
shapiro.test(x = aov_residuals)
##
## Shapiro-Wilk normality test
##
## data: aov_residuals
## W = 0.98499, p-value = 0.6694
Research questions and statistical hypotheses:
Formula of the test statistic
The test statistic (also known as the z-test) can be calculated as follows:
\[ z = \frac{p_o-p_e}{\sqrt{p_o q/n}}\]
where \(p_o\) is the observed proportion, \(p_e\) is the expected proportion, \(n\) is the sample size, and \(q = 1 - p_o\).
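The output below was presumably produced by a one-proportion test of 95 successes out of 160 trials against p = 0.5, without continuity correction:
prop.test(x = 95, n = 160, p = 0.5, correct = FALSE)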
##
## 1-sample proportions test without continuity correction
##
## data: 95 out of 160, null probability 0.5
## X-squared = 5.625, df = 1, p-value = 0.01771
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
## 0.5163169 0.6667870
## sample estimates:
## p
## 0.59375
The p-value of the test is 0.01771, which is less than the significance level alpha = 0.05. We can conclude that the proportion of males with cancer is significantly different from 0.5.
The two-proportions z-test is used to compare two observed proportions.
For example, we have two groups of individuals:
The number of smokers in each group is as follows:
We want to know whether the proportions of smokers are the same in the two groups of individuals.
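The output below was presumably produced by:
prop.test(x = c(490, 400), n = c(500, 500))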
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(490, 400) out of c(500, 500)
## X-squared = 80.909, df = 1, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## 0.1408536 0.2191464
## sample estimates:
## prop 1 prop 2
## 0.98 0.80
The p-value of the test is less than \(2.2 \times 10^{-16}\), which is less than the significance level alpha = 0.05. We can conclude that the proportion of smokers is significantly different in the two groups.
The chi-square test of independence is used to analyze a frequency table (i.e., a contingency table) formed by two categorical variables. The chi-square test evaluates whether there is a significant association between the categories of the two variables.
# Import the data
file_path <- "http://www.sthda.com/sthda/RDoc/data/housetasks.txt"
housetasks <- read.delim(file_path, row.names = 1)
head(housetasks)
The data is a contingency table containing 13 housetasks and their distribution in the couple:
rows are the different tasks
columns are the people who performed the tasks (Wife, Alternating, Husband, Jointly)
values are the frequencies of the tasks done
Chi-square test examines whether rows and columns of a contingency table are statistically significantly associated.
The Chi-square statistic is calculated as follows:
\[ \chi^2 = \sum \frac{(o-e)^2}{e}\] where \(o\) is the observed value and \(e\) is the expected value.
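The output below was presumably produced by:
chisq <- chisq.test(housetasks)
chisq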
##
## Pearson's Chi-squared test
##
## data: housetasks
## X-squared = 1944.5, df = 36, p-value < 2.2e-16
In our example, the row and the column variables are statistically significantly associated (p-value <0.01).
If you want to know the most contributing cells to the total Chi-square score, you just have to calculate the Chi-square statistic for each cell:
\[ r = \frac{o-e}{\sqrt{e}}\]
The above formula returns the so-called Pearson residuals (r) for each cell (or standardized residuals)
Cells with the highest absolute standardized residuals contribute the most to the total Chi-square score.
Pearson residuals can be easily extracted from the output of the function chisq.test():
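The residuals below were presumably extracted and rounded as follows:
round(chisq$residuals, 3)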
## Wife Alternating Husband Jointly
## Laundry 12.266 -2.298 -5.878 -6.609
## Main_meal 9.836 -0.484 -4.917 -6.084
## Dinner 6.537 -1.192 -3.416 -3.299
## Breakfeast 4.875 3.457 -2.818 -5.297
## Tidying 1.702 -1.606 -4.969 3.585
## Dishes -1.103 1.859 -4.163 3.486
## Shopping -1.289 1.321 -3.362 3.376
## Official -3.659 8.563 0.443 -2.459
## Driving -5.469 6.836 8.100 -5.898
## Finances -4.150 -0.852 -0.742 5.750
## Insurance -5.758 -4.277 4.107 5.720
## Repairs -7.534 -4.290 20.646 -6.651
## Holidays -7.419 -4.620 -4.897 15.556
Let’s visualize Pearson residuals using the package corrplot:
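A sketch of such a plot; is.corr = FALSE tells corrplot() that the input is not a correlation matrix:
library(corrplot)
corrplot(chisq$residuals, is.corr = FALSE)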
The sign of the standardized residuals is also very important to interpret the association between rows and columns as explained in the block below.