Assess the strength of evidence for/against a hypothesis; evaluate the data
Inferential statistical methods divide into 2 categories.
Hypothesis Testing: Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is most often used by scientists to test specific predictions, called hypotheses, that arise from theories.
Model Fitting: Model fitting measures how well a statistical model generalizes to data similar to that on which it was trained. A well-fitted model produces more accurate outcomes.
The process of drawing conclusions about population parameters based on a sample taken from the population.
Proposed explanation for a phenomenon.
A hypothesis is an educated guess about something in the world around you. It should be testable, either by experiment or observation.
Proposed explanation
Objectively testable
Singular - hypothesis
Plural - hypotheses
Examples
“If I…(do this to an independent variable)….then (this will happen to the dependent variable).”
Example
A good hypothesis statement should:
A p-value is a statistical measurement used to evaluate a hypothesis against observed data.
A p-value measures the probability of obtaining the observed results, assuming that the null hypothesis is true.
The lower the p-value, the greater the statistical significance of the observed difference.
If p-value > alpha: Fail to reject the null hypothesis (i.e. not significant result).
If p-value <= alpha: Reject the null hypothesis (i.e. significant result).
A commonly used significance level is α = 0.05 (5%).
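In R, this decision rule can be written out directly; the sketch below uses a hypothetical p_value of 0.03 purely to illustrate the comparison against alpha.
# Decision rule sketch (p_value is a hypothetical placeholder, not from a real test)
alpha <- 0.05
p_value <- 0.03
if(p_value > alpha){
print("Fail to reject the null hypothesis (the result is not significant)")
} else {
print("Reject the null hypothesis (the result is significant)")
}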
Statistical tests are either parametric or non-parametric tests:
Situation | Test |
---|---|
1 categorical variable | 1 sample proportion test |
2 categorical variables | chi squared test |
1 numeric variable | t-test |
1 numeric and 1 categorical variable | t-test or ANOVA |
1 numeric and 1 categorical variable with more than 2 groups | ANOVA |
2 numeric variables | correlation test |
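As a rough guide, these situations map onto the R functions demonstrated later in this material (an approximate mapping, not an exhaustive list):
# Situation                                -> R function
# 1 categorical variable                   -> prop.test()
# 2 categorical variables                  -> chisq.test() / fisher.test()
# 1 numeric variable                       -> t.test(x, mu = ...)
# 1 numeric + 1 categorical (2 groups)     -> t.test(x, y)
# 1 numeric + 1 categorical (> 2 groups)   -> aov()
# 2 numeric variables                      -> cor.test()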
The first question we need to ask is whether we are dealing with bivariate analysis or multivariate analysis.
Bivariate analysis: studying the relationship between two variables. For example:
Multivariate (regression modelling/analysis): studying the effect of multiple variables on an outcome variable. For example:
If we are doing bivariate analysis, we have to ask if we are studying a difference or a correlation.
Difference: to study the difference between two or more groups, or two or more conditions. For example:
If we are doing bivariate analysis, we have to ask if we are working with independent data or paired data.
Independent (unpaired): the observations in each sample are not related; there is no relationship between the subjects in each sample.
Dependent (Paired): paired samples include:
Whatever the analysis we are doing, it is important to identify the types of data variables we are studying. The type of data variables is very important in choosing the suitable test. The following chart helps to distinguish between different types of data variables.
Time to event data (survival data): This is a special data type that will be discussed in survival analysis.
It is important to ask if we are comparing two groups (conditions) or more than two groups (conditions). For example:
- Are we comparing two groups (diseased, not diseased), or three groups (normal, osteopenia, osteoporosis)?
- Are we comparing two conditions (pre-test, post-test), or three conditions (before the operation, during the operation, after the operation)?
It is important before doing some statistical tests to determine if a numeric variable is normally distributed or not.
This histogram shows a normally distributed variable.
For some tests, the data needs to be approximately normally distributed.
How to test for normality?
1. Plotting a histogram or QQ plot
2. Using a statistical test
The statistical tests for normality are the Shapiro-Wilk and Kolmogorov-Smirnov tests. We usually do both the graph and the statistical tests.
The hypotheses of the Shapiro-Wilk and Kolmogorov-Smirnov tests:
H0: the variable is normally distributed
H1: the variable is not normally distributed
Homogeneity of variances (similar standard deviations) means that the variable we are studying has the same variance across groups.
We need to test for the equality of variances between groups when using some statistical tests, e.g. Independent t-tests and one-way ANOVA.
Homogeneity of variances is tested using Levene’s test.
Interpretation of the test result: If the p-value is < 0.05 reject H0 and conclude that the assumption of equal variances has not been met.
We fail to reject the null hypothesis (i.e. conclude that the variances are equal) if the p-value is > 0.05.
If the homogeneity of variance assumption was not met, the standard tests cannot be done, and modified tests can be used (will be discussed with the relevant tests).
# Load packages
library(tidyverse)
library(ggplot2)
library(ggpubr)
library(gridExtra)
library(gtsummary)
library(gt)
library(datasets)
Tests whether a data sample has a Gaussian distribution.
Assumptions
Observations in each sample are independent and identically distributed (iid).
Interpretation
- H0: the sample has a Gaussian/normal distribution.
- Ha: the sample does not have a Gaussian/normal distribution.
# Normality Test in R
data <- read.csv("data/500_Person_Gender_Height_Weight_Index.csv")
# examine first few rows
head(data)
Gender Height Weight Index
1 Male 174 96 4
2 Male 189 87 2
3 Female 185 110 4
4 Female 195 104 3
5 Male 149 61 3
6 Male 189 104 3
# Check Distribution of Height
gghistogram(data, x = "Height", add = "mean", fill = "#003f5c")
ggqqplot(data, x = "Height")
# Normality Test
shapiro.test(data$Height)
Shapiro-Wilk normality test
data: data$Height
W = 0.96065, p-value = 2.665e-10
# Interpretation
test <- shapiro.test(data$Height)
# Set the significance level
alpha = 0.05
if(test$p.value > alpha){
print("The sample has a Gaussian/normal distribution(Fail to reject the null hypothesis, the result is not significant)")
} else {
print("The sample does not have a Gaussian/normal distribution(Reject the null hypothesis, the result is significant)")
}
[1] "The sample does not have a Gaussian/normal distribution(Reject the null hypothesis, the result is significant)"
Interpretation of the result: If the p-value is < 0.05 (or another chosen significance level), then there is evidence that the sample does not have a Gaussian/normal distribution.
Reporting significant results: A Shapiro-Wilk test was used to check whether the data sample has a Gaussian distribution. A significant result was found (the sample does not have a Gaussian/normal distribution; p-value < 0.05).
Reporting non-significant results: A Shapiro-Wilk test was used to check whether the data sample has a Gaussian distribution. No significant result was found (the sample has a Gaussian/normal distribution; p > 0.05).
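The Kolmogorov-Smirnov test mentioned earlier can be run in a similar way. A minimal sketch is shown below; note that estimating the mean and SD from the same sample, and the ties present in Height, make this version of the test only approximate.
# Kolmogorov-Smirnov test against a normal distribution with estimated parameters
ks.test(data$Height, "pnorm", mean = mean(data$Height), sd = sd(data$Height))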
Tests whether two or more samples have equal variances.
Assumptions
Observations in each sample are independent and identically distributed (iid).
Interpretation
- H0: the variances of the samples are equal.
- Ha: the variances of the samples are not equal.
library(car)
leveneTest(Height ~ Gender, data = data)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 3.2219 0.07327 .
498
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
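Mirroring the interpretation block used for the Shapiro-Wilk test, the p-value returned by leveneTest() can also be checked programmatically (a sketch; the p-value sits in the first row of the Pr(>F) column):
# Interpretation
lev_test <- leveneTest(Height ~ Gender, data = data)
alpha <- 0.05
if(lev_test$`Pr(>F)`[1] > alpha){
print("The samples have equal variances (Fail to reject H0, the result is not significant)")
} else {
print("The samples do not have equal variances (Reject H0, the result is significant)")
}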
Interpretation of the result: If the p-value is < 0.05 (or another chosen significance level), then there is evidence that the samples do not have equal variances.
Reporting significant results: Levene's test was used to check whether the samples have equal variances. A significant result was found (the samples do not have equal variances; p-value < 0.05).
Reporting non-significant results: Levene's test was used to check whether the samples have equal variances. No significant result was found (the samples have equal variances; p > 0.05).
Is there a difference in the number of men and women in the population?
# Frequency Table
table(data$Gender)
Female Male
255 245
# Proportion Table
prop.table(table(data$Gender))
Female Male
0.51 0.49
# 1 Sample proportion test
x <- table(data$Gender)
prop.test(x)
1-sample proportions test with continuity correction
data: x, null probability 0.5
X-squared = 0.162, df = 1, p-value = 0.6873
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.4652797 0.5545644
sample estimates:
p
0.51
prop_test <- prop.test(x, p = 0.5)
alpha <- 0.05
if(prop_test$p.value > alpha){
print("There is no difference. (Fail to reject H0, the result is not significant)")
} else{
print("There is a difference. (Reject H0, the result is significant)")
}
[1] "There is no difference. (Fail to reject H0, the result is not significant)"
Interpretation of the result: If p-value < 0.05 (or another chosen significance level), then there is evidence that there is a difference in the number of men and women in the population.
Reporting significant results: A proportion test was used to check whether there is a difference in the number of men and women in the population. A significant result was found (there is a difference in the number of men and women in the population; p-value < 0.05).
Reporting non-significant results: A proportion test was used to check whether there is a difference in the number of men and women in the population. No significant result was found (there is no difference in the number of men and women in the population; p > 0.05).
The unpaired t-test also has two categories: Student's t-test (equal variances) and Welch's t-test (unequal variances).
# One sample t-test
t.test(data$Height)
One Sample t-test
data: data$Height
t = 232.06, df = 499, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
168.5052 171.3828
sample estimates:
mean of x
169.944
# One sample t-test, set the value of mu
t.test(data$Height, mu=169)
One Sample t-test
data: data$Height
t = 1.289, df = 499, p-value = 0.198
alternative hypothesis: true mean is not equal to 169
95 percent confidence interval:
168.5052 171.3828
sample estimates:
mean of x
169.944
# One sample t-test is two tailed test by default
t.test(data$Height, mu=169, alternative = "two.sided")
One Sample t-test
data: data$Height
t = 1.289, df = 499, p-value = 0.198
alternative hypothesis: true mean is not equal to 169
95 percent confidence interval:
168.5052 171.3828
sample estimates:
mean of x
169.944
# To perform one tailed, upper tailed test
t.test(data$Height, mu=169, alternative = "greater")
One Sample t-test
data: data$Height
t = 1.289, df = 499, p-value = 0.09899
alternative hypothesis: true mean is greater than 169
95 percent confidence interval:
168.7372 Inf
sample estimates:
mean of x
169.944
# To perform one tailed, lower tailed test
t.test(data$Height, mu=169, alternative = "less")
One Sample t-test
data: data$Height
t = 1.289, df = 499, p-value = 0.901
alternative hypothesis: true mean is less than 169
95 percent confidence interval:
-Inf 171.1508
sample estimates:
mean of x
169.944
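For consistency with the other tests in this section, the one-sample t-test can also be interpreted programmatically; a minimal sketch reusing the mu = 169 test above:
# Interpretation of the one-sample t-test (mu = 169)
one_t <- t.test(data$Height, mu = 169)
alpha <- 0.05
if(one_t$p.value > alpha){
print("The mean does not differ from 169 (Fail to reject H0, the result is not significant)")
} else {
print("The mean differs from 169 (Reject H0, the result is significant)")
}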
Example:
anorexia <- read.csv("data/anorexia.csv")
head(anorexia)
X Treat Prewt Postwt
1 1 Cont 80.7 80.2
2 2 Cont 89.4 80.1
3 3 Cont 91.8 86.4
4 4 Cont 74.0 86.3
5 5 Cont 78.1 76.1
6 6 Cont 88.3 78.1
x <- subset(anorexia, Treat == "Cont", Prewt, drop =TRUE)
y <- subset(anorexia, Treat == "Cont", Postwt, drop=TRUE)
# Perform paired t-test
t.test(x, y, paired = TRUE)
Paired t-test
data: x and y
t = 0.28723, df = 25, p-value = 0.7763
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
-2.776708 3.676708
sample estimates:
mean difference
0.45
# Import data
us_mortality = read.csv("data/USRegionalMortality.csv")
head(us_mortality)
X Region Status Sex Cause Rate SE
1 5 HHS Region 01 Urban Male Heart disease 188.2 1.0
2 6 HHS Region 01 Rural Male Heart disease 199.1 2.6
3 7 HHS Region 01 Urban Female Heart disease 115.1 0.6
4 8 HHS Region 01 Rural Female Heart disease 124.5 1.7
5 9 HHS Region 02 Urban Male Heart disease 226.8 0.8
6 10 HHS Region 02 Rural Male Heart disease 248.8 3.3
# Filtering Data
x <- us_mortality %>%
filter(Cause == "Heart disease" & Sex == "Male")
y <- us_mortality %>%
filter(Cause == "Heart disease" & Sex == "Female")
plot(x$Rate, y$Rate)
# Student's t-test
t.test(x$Rate, y$Rate, var.equal = TRUE)
Two Sample t-test
data: x$Rate and y$Rate
t = 9.9475, df = 38, p-value = 3.951e-12
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
63.53616 96.00384
sample estimates:
mean of x mean of y
216.60 136.83
# Student's t-test
std_test <- t.test(x$Rate, y$Rate, var.equal = TRUE)
alpha = 0.05
if(std_test$p.value > alpha) {
print("The means are equal(Fail to reject H0, the result is not significant)")
} else{
print("The means are not equal(Reject H0, the result is significant)")
}
[1] "The means are not equal(Reject H0, the result is significant)"
# Welch's t-test
t.test(x$Rate, y$Rate)
Welch Two Sample t-test
data: x$Rate and y$Rate
t = 9.9475, df = 35.141, p-value = 9.31e-12
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
63.49267 96.04733
sample estimates:
mean of x mean of y
216.60 136.83
welch_test <- t.test(x$Rate, y$Rate)
alpha = 0.05
if(welch_test$p.value > alpha) {
print("The means are equal(Fail to reject H0, the result is not significant)")
} else{
print("The means are not equal(Reject H0, the result is significant)")
}
[1] "The means are not equal(Reject H0, the result is significant)"
# One sample t-test (mu is the known/hypothesised mean, a numeric value)
t.test(x, mu = known_mean)
# Two dependent (paired) samples (Paired t-test)
t.test(x, y, paired = TRUE)
# Two independent samples (Student's t-test)
t.test(x, y, var.equal = TRUE)
# Two independent samples (Welch's t-test)
t.test(x, y)
Also known as Pearson's chi-squared test (test of independence).
The null hypothesis is that no relationship exists between the variables (the variables are independent).
A contingency table is a table with at least two rows and two columns (2x2), and it is used to present categorical data in terms of frequency counts.
If the sample size is small, we have to use Fisher's Exact Test.
Fisher's Exact Test is similar to the chi-squared test, but it is used for small samples.
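As a minimal illustration with made-up counts (not taken from the data used below), Fisher's Exact Test can be applied directly to a small 2x2 table:
# Fisher's Exact Test on a small, hypothetical 2x2 contingency table
small_table <- matrix(c(8, 2, 1, 5), nrow = 2,
                      dimnames = list(Exposure = c("yes", "no"),
                                      Outcome  = c("yes", "no")))
fisher.test(small_table)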
migraine_data <- read.csv("data/KosteckiDillon.csv")
head(migraine_data)
X id time dos hatype age airq medication headache sex
1 1 1 -11 753 Aura 30 9 continuing yes female
2 2 1 -10 754 Aura 30 7 continuing yes female
3 3 1 -9 755 Aura 30 10 continuing yes female
4 4 1 -8 756 Aura 30 13 continuing yes female
5 5 1 -7 757 Aura 30 18 continuing yes female
6 6 1 -6 758 Aura 30 19 continuing yes female
table <- table(migraine_data$sex, migraine_data$headache)
table
no yes
female 1266 2279
male 220 387
chisq.test(table)
Pearson's Chi-squared test with Yates' continuity correction
data: table
X-squared = 0.042688, df = 1, p-value = 0.8363
chisq.test(migraine_data$sex, migraine_data$headache)
Pearson's Chi-squared test with Yates' continuity correction
data: migraine_data$sex and migraine_data$headache
X-squared = 0.042688, df = 1, p-value = 0.8363
ch_test <- chisq.test(table)
alpha = 0.05
if(ch_test$p.value > alpha) {
print("Independent (Fail to reject H0, the result is not significant)")
} else{
print("Dependent (Reject H0, the result is significant)")
}
[1] "Independent (Fail to reject H0, the result is not significant)"
fisher.test(migraine_data$sex, migraine_data$medication)
Fisher's Exact Test for Count Data
data: migraine_data$sex and migraine_data$medication
p-value < 2.2e-16
alternative hypothesis: two.sided
fs_test <- fisher.test(migraine_data$sex, migraine_data$medication)
alpha = 0.05
if(fs_test$p.value > alpha) {
print("Independent (Fail to reject H0, the result is not significant)")
} else{
print("Dependent (Reject H0, the result is significant)")
}
[1] "Dependent (Reject H0, the result is significant)"
Correlation measures whether greater values of one variable correspond to greater values of the other. It is scaled to always lie between +1 and −1.
Pearson’s Test | Spearman’s Test |
---|---|
Parametric correlation | Non-parametric |
Linear relationship | Non-linear relationship |
Continuous variables | Continuous or ordinal variables |
Proportional change | Change not at constant rate |
# Import Iris Dataset
data(iris)
head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
# Drop the Species column so that only numeric columns remain for the correlation
iris$Species <- NULL
# Calculate correlation
cor(iris$Sepal.Length, iris$Sepal.Width)
[1] -0.1175698
cor(iris$Petal.Length, iris$Petal.Width)
[1] 0.9628654
# Calculate correlation using Spearman method
cor(iris$Sepal.Length, iris$Sepal.Width, method = "spearman")
[1] -0.1667777
cor(iris$Petal.Length, iris$Petal.Width, method = "spearman")
[1] 0.9376668
# Plot Sepal.Length vs Sepal.Width
plot(iris$Sepal.Length, iris$Sepal.Width)
# Plot Petal.Length vs Petal.Width
plot(iris$Petal.Length, iris$Petal.Width)
# Calculate correlation matrix
cor(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length 1.0000000 -0.1175698 0.8717538 0.8179411
Sepal.Width -0.1175698 1.0000000 -0.4284401 -0.3661259
Petal.Length 0.8717538 -0.4284401 1.0000000 0.9628654
Petal.Width 0.8179411 -0.3661259 0.9628654 1.0000000
# Calculate correlation matrix using spearman method
cor(iris, method = "spearman")
Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length 1.0000000 -0.1667777 0.8818981 0.8342888
Sepal.Width -0.1667777 1.0000000 -0.3096351 -0.2890317
Petal.Length 0.8818981 -0.3096351 1.0000000 0.9376668
Petal.Width 0.8342888 -0.2890317 0.9376668 1.0000000
# Correlation Test
cor.test(iris$Sepal.Length, iris$Petal.Length)
Pearson's product-moment correlation
data: iris$Sepal.Length and iris$Petal.Length
t = 21.646, df = 148, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.8270363 0.9055080
sample estimates:
cor
0.8717538
# Pearson's Correlation: Interpretation
pearson_cor <- cor.test(iris$Sepal.Length, iris$Petal.Length)
alpha = 0.05
if(pearson_cor$p.value > alpha) {
print("Independent(Fail to reject H0, the result is not significant)")
} else{
print("Dependent (Reject H0, the result is significant)")
}
[1] "Dependent (Reject H0, the result is significant)"
Tests whether two samples have a monotonic relationship.
Assumptions
- Observations in each sample are independent and identically distributed (iid).
- Observations in each sample can be ranked.
Interpretation
- H0: the two samples are independent.
- Ha: there is a dependency between the samples.
# Spearman's Correlation: Interpretation
spearman_cor <- cor.test(iris$Sepal.Length, iris$Petal.Length, method = "spearman")
alpha = 0.05
if(spearman_cor$p.value > alpha) {
print("Independent(Fail to reject H0, the result is not significant)")
} else{
print("Dependent (Reject H0, the result is significant)")
}
[1] "Dependent (Reject H0, the result is significant)"
Kendall's rank correlation tests whether two samples have a monotonic relationship.
Assumptions
- Observations in each sample are independent and identically distributed (iid).
- Observations in each sample can be ranked.
Interpretation
- H0: the two samples are independent.
- Ha: there is a dependency between the samples.
# Kendall's Correlation: Interpretation
kendall_cor <- cor.test(iris$Sepal.Length, iris$Petal.Length, method = "kendall")
alpha = 0.05
if(kendall_cor$p.value > alpha) {
print("Independent(Fail to reject H0, the result is not significant)")
} else{
print("Dependent (Reject H0, the result is significant)")
}
[1] "Dependent (Reject H0, the result is significant)"
One-way ANOVA tests whether the means of two or more independent samples are significantly different.
Assumptions
- Observations in each sample are independent and identically distributed (iid).
- Observations in each sample are normally distributed.
- Observations in each sample have the same variance.
Interpretation
- H0: the means of the samples are equal.
- Ha: one or more of the means of the samples are unequal.
# Effect of cadmium on growth of green alga
alga <- read.csv("data/S.capricornutum.csv")
head(alga)
X conc count
1 1 0 120.9
2 2 0 118.0
3 3 0 134.0
4 4 5 121.2
5 5 5 118.6
6 6 5 120.4
# Structure
str(alga)
'data.frame': 18 obs. of 3 variables:
$ X : int 1 2 3 4 5 6 7 8 9 10 ...
$ conc : int 0 0 0 5 5 5 10 10 10 20 ...
$ count: num 121 118 134 121 119 ...
# Dependent ~ single independent variable (as factor)
one_way <- aov(count ~ as.factor(conc), data=alga)
summary(one_way)
Df Sum Sq Mean Sq F value Pr(>F)
as.factor(conc) 5 40069 8014 217.6 2.44e-11 ***
Residuals 12 442 37
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
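As with the other tests, the overall F-test p-value can be extracted from the summary and compared against alpha before moving on to the post-hoc comparisons (a sketch; the p-value is the first entry of the Pr(>F) column):
# Interpretation of the one-way ANOVA F-test
p_val <- summary(one_way)[[1]][["Pr(>F)"]][1]
alpha <- 0.05
if(p_val > alpha){
print("The group means are equal (Fail to reject H0, the result is not significant)")
} else {
print("At least one group mean differs (Reject H0, the result is significant)")
}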
TukeyHSD(one_way)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = count ~ as.factor(conc), data = alga)
$`as.factor(conc)`
diff lwr upr p adj
5-0 -4.233333 -20.87689 12.410223 0.9505229
10-0 -48.633333 -65.27689 -31.989777 0.0000051
20-0 -80.233333 -96.87689 -63.589777 0.0000000
40-0 -110.266667 -126.91022 -93.623111 0.0000000
80-0 -119.856667 -136.50022 -103.213111 0.0000000
10-5 -44.400000 -61.04356 -27.756444 0.0000135
20-5 -76.000000 -92.64356 -59.356444 0.0000000
40-5 -106.033333 -122.67689 -89.389777 0.0000000
80-5 -115.623333 -132.26689 -98.979777 0.0000000
20-10 -31.600000 -48.24356 -14.956444 0.0003906
40-10 -61.633333 -78.27689 -44.989777 0.0000004
80-10 -71.223333 -87.86689 -54.579777 0.0000001
40-20 -30.033333 -46.67689 -13.389777 0.0006220
80-20 -39.623333 -56.26689 -22.979777 0.0000434
80-40 -9.590000 -26.23356 7.053556 0.4279326
pig_data <- read.csv("data/ToothGrowth.csv")
head(pig_data)
X len supp dose
1 1 4.2 VC 0.5
2 2 11.5 VC 0.5
3 3 7.3 VC 0.5
4 4 5.8 VC 0.5
5 5 6.4 VC 0.5
6 6 10.0 VC 0.5
# Dependent ~ multiple independent variables (as factor)
two_way <- aov(len ~ as.factor(supp)+as.factor(dose), data=pig_data)
summary(two_way)
Df Sum Sq Mean Sq F value Pr(>F)
as.factor(supp) 1 205.4 205.4 14.02 0.000429 ***
as.factor(dose) 2 2426.4 1213.2 82.81 < 2e-16 ***
Residuals 56 820.4 14.7
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(two_way, which = "as.factor(dose)")
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = len ~ as.factor(supp) + as.factor(dose), data = pig_data)
$`as.factor(dose)`
diff lwr upr p adj
1-0.5 9.130 6.215909 12.044091 0e+00
2-0.5 15.495 12.580909 18.409091 0e+00
2-1 6.365 3.450909 9.279091 7e-06