This document was composed from Dr. Snopkowski’s ANTH 504 Week 8 lecture and Danielle Navarro’s 2021 Learning statistics with R Chapter 13.
For each statistical test we discuss, I want you to note:
What type of variables do we need for this test?
What are the null and alternative hypotheses?
What test (and R code) do we need to run the statistical analysis?
What are the assumptions of the test?
_ How do we check the assumptions of the test?
_ What alternative tests do we run if the assumptions are not met?
Independent T-test
_ The simplest form of experiment that can be done is one with only one independent variable that is manipulated in only two ways and only one outcome is measured.
More often than not the manipulation of the independent variable involves having an experimental condition and a control.
E.g., Is the movie Scream 2 scarier than the original Scream? We could measure heart rates (which indicate anxiety) during both films and compare them.
_ This situation can be analysed with a t-test
The independent variable is the movie, which has 2 categories, and the dependent variable is the heart rate, which is continuous.
Two Types of T-test
_ Dependent t-test
Compares two means based on related data.
E.g., Data from the same people measured at different times.
Data from ‘matched’ samples.
_ Independent t-test
Compares two means based on independent data
E.g., data from different groups of people
_ Two samples of data are collected and the sample means calculated. These means might differ by either a little or a lot.
_ If the samples come from the same population, then we expect their means to be roughly equal. Although it is possible for their means to differ by chance alone, we would expect large differences between sample means to occur very infrequently.
_ What are our null and alternative hypotheses?
Ho: mu1 = mu2
Ha: mu1 ≠ mu2
_ We compare the difference between the sample means that we collected to the difference between the sample means that we would expect to obtain if there were no effect
_ We use the standard error as a gauge of the variability between sample means.
_ If the difference between the samples we have collected is larger than what we would expect based on the standard error, then we can assume one of two things:
There is no effect and sample means in our population fluctuate a lot and we have, by chance, collected two samples that are atypical of the population from which they came.
The two samples come from different populations but are typical of their respective parent population. In this scenario, the difference between the samples represents a genuine difference between the populations (and so the null hypothesis is incorrect).
_ As the observed difference between the sample means gets larger, the more confident we become that the second explanation is correct (i.e., that the null hypothesis should be rejected). If the null hypothesis is incorrect, then we gain confidence that the two sample means differ because of the different experimental manipulation imposed on each sample. (The simulation sketch below illustrates this logic.)
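To build intuition for this, here is a minimal simulation sketch (not part of the lecture; the population mean, SD, and sample sizes are arbitrary choices). We draw many pairs of samples from the same population and look at how much their means typically differ:
# Draw many pairs of samples from ONE population and record the
# difference between the two sample means each time
set.seed(123)
diffs <- replicate(10000,
                   mean(rnorm(15, mean = 70, sd = 8)) - mean(rnorm(18, mean = 70, sd = 8)))
hist(diffs, main = "Differences between two sample means (same population)")
sd(diffs)                           # approximates the standard error of the difference
quantile(diffs, c(0.025, 0.975))    # large differences are rare
Most differences cluster near 0; an observed difference far outside this range would be surprising if the null hypothesis were true.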
t = (observed difference between sample means − expected difference between population means (if null hypothesis is true)) / (estimate of the standard error of the difference between two sample means)
Both the independent t-test and the dependent t-test are parametric tests based on the normal distribution. Therefore, they assume:
The sampling distribution is normally distributed.
Data are measured at least at the interval level. (Continuous variable is interval)
Variances in these populations are roughly equal (homogeneity of variance).
Scores in different treatment conditions are independent (because they come from different people).
Question: Are the grades of students who are taught by different instructors / TA’s significantly different?
We have 2 TA’s: Anastasia & Bernadette
Dataset = “harpo.Rdata”
Data comes from: https://learningstatisticswithr.com/
_ What are our variables? What type of variables are they? (Binary, Categorical, Continuous)?
_ The independent variable is “tutor”, which has two categories. The dependent variable is “grade”, which is continuous.
_ How might we visually display (or conduct descriptive statistics) to see the differences (if any exist) between the TAs?
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 1.0.1
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.5.0
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
load("harpo.Rdata")
# Get summary statistics for the continuous column for each category
harpo_summary <- harpo %>%
  group_by(tutor) %>%
  summarise(n = n(),
            mean_grade = mean(grade),
            sd_grade = sd(grade),
            min_grade = min(grade),
            max_grade = max(grade)
            )
harpo_summary
## # A tibble: 2 × 6
## tutor n mean_grade sd_grade min_grade max_grade
## <fct> <int> <dbl> <dbl> <dbl> <dbl>
## 1 Anastasia 15 74.5 9.00 55 90
## 2 Bernadette 18 69.1 5.77 56 79
# Create a histogram of the grade variable
ggplot(harpo, aes(x = grade, fill = tutor)) +
geom_histogram(position = "dodge", bins = 10) +
facet_wrap(~tutor) +
labs(title = "Histogram of Grade by Tutor", x = "Grade", y = "Count") +
theme_bw()
# Create a box plot of the continuous variable
ggplot(harpo, aes(x = tutor, y = grade, fill = tutor)) +
geom_boxplot() +
labs(title = "Box plot of Grade by Tutor", x = "Tutor", y = "Grade") +
theme_bw()
# Create a customized density plot of the continuous variable
ggplot(harpo, aes(x = grade, fill = tutor)) +
geom_density(alpha = 0.5) +
labs(title = "Density plot of Value by Category", x = "Grade", y = "Density") +
theme_bw()
#separate each set of data by tutor
ana <- harpo %>%
filter(tutor=="Anastasia")
bern <- harpo %>%
filter(tutor=="Bernadette")
#calculate standard deviations
sd(ana$grade)
## [1] 8.998942
sd(bern$grade)
## [1] 5.774918
#calculate means
m_a = mean(ana$grade)
m_b = mean(bern$grade)
#get n (sample size)
n_a <- length(ana$grade)
n_a
## [1] 15
n_b <- length(bern$grade)
n_b
## [1] 18
#calculate pooled standard deviation
numerator = (n_a-1)*sd(ana$grade)^2 + (n_b-1)*sd(bern$grade)^2
sp = sqrt(numerator/(n_a+n_b-2))
sp
## [1] 7.406792
#calculate denominator of t-statistic
denominator = sqrt((sp^2/n_a) + (sp^2/n_b))
denominator
## [1] 2.589436
#calculate t-statistic
t = (m_a - m_b) / denominator
t
## [1] 2.115432
#use t-distribution to get corresponding p-value
pt(t, n_a + n_b - 2)
## [1] 0.9787353
(1-pt(t, n_a + n_b - 2))
## [1] 0.02126474
2*(1-pt(t, n_a + n_b - 2))
## [1] 0.04252949
We use the pt() function, which is similar to the pnorm() function: we give it a value and it returns the area under the curve to the left of that value. For example, pnorm(1.95) gives us the area under the standard normal curve to the left of z = 1.95. With pt() we supply our t-statistic, 2.11, and we also need to supply the degrees of freedom; the t-distribution always requires the degrees of freedom. In this case, the degrees of freedom is the number of ana observations plus the number of bern observations minus 2. Usually we subtract just one, but because we have two groups (and estimate two sample means), we subtract 2.
When we run pt() on that value, we get 0.9787. This is the area to the left of t, but we want to know the area under the right tail, so we take 1 minus that value, which is 0.02.
Because we are doing a two-tailed test, we need to include the area under the other tail. This is why we multiply by 2 (or, equivalently, add the two tail areas together): 0.0425 is our p-value.
#install.packages("lsr")
library(lsr)
independentSamplesTTest(formula = grade ~ tutor, data=harpo, var.equal=TRUE)
##
## Student's independent samples t-test
##
## Outcome variable: grade
## Grouping variable: tutor
##
## Descriptive statistics:
## Anastasia Bernadette
## mean 74.533 69.056
## std dev. 8.999 5.775
##
## Hypotheses:
## null: population means equal for both groups
## alternative: different population means in each group
##
## Test results:
## t-statistic: 2.115
## degrees of freedom: 31
## p-value: 0.043
##
## Other information:
## two-sided 95% confidence interval: [0.197, 10.759]
## estimated effect size (Cohen's d): 0.74
OR
t.test(grade ~ tutor, data=harpo, var.equal=TRUE)
##
## Two Sample t-test
##
## data: grade by tutor
## t = 2.1154, df = 31, p-value = 0.04253
## alternative hypothesis: true difference in means between group Anastasia and group Bernadette is not equal to 0
## 95 percent confidence interval:
## 0.1965873 10.7589683
## sample estimates:
## mean in group Anastasia mean in group Bernadette
## 74.53333 69.05556
The general template is: t.test(dependent ~ independent, data = name_data, var.equal = TRUE), where the dependent (continuous) variable comes before the ~ and the independent (grouping) variable comes after it.
Because the confidence interval does not contain 0, and the p-value is less than 0.05, we can conclude that there is a difference between the means. We are 95% confident that the population difference between the groups is between 0.197 and 10.759. Note that var.equal=TRUE tells R to assume equal variances (Student’s t-test).
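The lsr output above also reported an estimated effect size, Cohen’s d = 0.74. As a quick sanity check (assuming the usual pooled-standard-deviation definition of d), we can recompute it from the quantities calculated by hand earlier:
# Cohen's d = difference in means / pooled standard deviation
(m_a - m_b) / sp    # roughly 0.74, matching the lsr output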
How do we check the normality assumption? Create a histogram of each group, and run a Shapiro-Wilk test. The Shapiro-Wilk test tests the null hypothesis that the data are normal; if the p-value is significant, then you have significant deviations from normality. This test has some challenges: if you have a large sample, you are more likely to get statistical significance in the Shapiro-Wilk test, even though with a big sample the sampling distribution of the mean is more likely to be normal (central limit theorem); if you have a small sample, the Shapiro-Wilk test is less likely to tell you that you are deviating from normality. So also use your brain.
How do we check homogeneity of variance? Run leveneTest(). If the variances are not equal, remove var.equal=TRUE from t.test(dependent ~ independent, data = name_data, var.equal = TRUE); t.test() then automatically runs a Welch t-test.
hist(ana$grade)
hist(bern$grade)
ana <- harpo %>% filter(tutor == "Anastasia")
bern <- harpo %>% filter(tutor == "Bernadette")
shapiro.test(ana$grade)
##
## Shapiro-Wilk normality test
##
## data: ana$grade
## W = 0.98186, p-value = 0.9806
shapiro.test(bern$grade)
##
## Shapiro-Wilk normality test
##
## data: bern$grade
## W = 0.96908, p-value = 0.7801
Make two histograms to see if the variances look about the same
Compare the standard deviations across the groups.
Run a test: There are a few different tests out there, but we’ll use Levene’s test, which is most frequently used in the literature.
Ho: sigma1 = sigma2
Ha: sigma1 ≠ sigma2
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
## The following object is masked from 'package:purrr':
##
## some
leveneTest(grade ~ tutor, data=harpo)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 2.1287 0.1546
## 31
If the variances are not equal, run a Welch t-test.
_ The standard error of the difference between two sample means is calculated differently: SE = sqrt(s1^2/n1 + s2^2/n2)
And the degrees of freedom are calculated as (and don’t have to be an integer value): df = (s1^2/n1 + s2^2/n2)^2 / ((s1^2/n1)^2/(n1 − 1) + (s2^2/n2)^2/(n2 − 1))
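As a sketch of what R is doing under the hood, we can compute the Welch standard error and degrees of freedom by hand, reusing m_a, m_b, n_a, n_b, ana, and bern from the Student t-test section above:
# Welch t-test by hand
se2_a <- sd(ana$grade)^2 / n_a     # squared standard error, group a
se2_b <- sd(bern$grade)^2 / n_b    # squared standard error, group b
se_welch <- sqrt(se2_a + se2_b)
t_welch <- (m_a - m_b) / se_welch    # about 2.034
df_welch <- (se2_a + se2_b)^2 /
  (se2_a^2 / (n_a - 1) + se2_b^2 / (n_b - 1))    # about 23.025, not an integer
These match the t-statistic and degrees of freedom in the Welch output below.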
The R code is the same as for the Student t-test, except we drop var.equal=TRUE from independentSamplesTTest():
independentSamplesTTest(formula = grade ~ tutor, data=harpo)
##
## Welch's independent samples t-test
##
## Outcome variable: grade
## Grouping variable: tutor
##
## Descriptive statistics:
## Anastasia Bernadette
## mean 74.533 69.056
## std dev. 8.999 5.775
##
## Hypotheses:
## null: population means equal for both groups
## alternative: different population means in each group
##
## Test results:
## t-statistic: 2.034
## degrees of freedom: 23.025
## p-value: 0.054
##
## Other information:
## two-sided 95% confidence interval: [-0.092, 11.048]
## estimated effect size (Cohen's d): 0.724
OR
t.test(grade ~ tutor, data=harpo)
##
## Welch Two Sample t-test
##
## data: grade by tutor
## t = 2.0342, df = 23.025, p-value = 0.05361
## alternative hypothesis: true difference in means between group Anastasia and group Bernadette is not equal to 0
## 95 percent confidence interval:
## -0.09249349 11.04804904
## sample estimates:
## mean in group Anastasia mean in group Bernadette
## 74.53333 69.05556
When reporting the results, include:
Descriptive statistics
A description of the null hypothesis
A “stat” block
An interpretation of the results
On average, Anastasia’s students performed better (M = 74.5, SD = 9.0) than Bernadette’s students (M = 69.1, SD = 5.77). We conducted a t-test to test whether the means of the two TAs were significantly different. This difference, 5.4, was significant, t(31) = 2.115, p = 0.043. Based on this result, we can conclude that Anastasia’s students performed significantly better on average than Bernadette’s students.
Wilcoxon Rank-Sum Test (Mann-Whitney U Test)
These tests are the non-parametric equivalent of the independent t-test, meaning that you can use them if the data are NOT normal.
Use to test differences between two conditions in which different participants have been used.
library(readr)
The tests in this lecture work on the principle of ranking the data for each group:
Lowest score = a rank of 1,
Next highest score = a rank of 2, and so on.
Tied ranks are given the same rank: the average of the potential ranks (see the rank() sketch after this list).
Add up the ranks for each group. For an unequal group size, the test statistic is the sum of ranks in the smaller group; for an equal group size, it is the lowest of the two rank sums.
The analysis is carried out on the ranks rather than the actual data.
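As a small illustration (the scores here are made up), R’s rank() function implements exactly this ranking scheme, averaging the ranks of tied values:
# Two scores tied at 12 share the average of ranks 2 and 3
scores <- c(10, 12, 12, 15, 20)
rank(scores)
## [1] 1.0 2.5 2.5 4.0 5.0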
Let’s use the data: zirconium_content.csv
# Read in the data from the CSV file
zirconium_content <- read.csv("zirconium_content.csv", header = FALSE, sep = "\t")
head(zirconium_content)
## V1
## 1 zirconium_content color
## 2 131.5\t0
## 3 131.5\t0
## 4 131.6\t0
## 5 131.7\t0
## 6 131.8\t0
# Separate the single column into two columns
zirconium_content <- separate(zirconium_content, col = 1, into = c("zirconium_content", "color"), sep = "\t")
## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [1].
head(zirconium_content)
## zirconium_content color
## 1 zirconium_content color <NA>
## 2 131.5 0
## 3 131.5 0
## 4 131.6 0
## 5 131.7 0
## 6 131.8 0
zirconium_content <- zirconium_content[-1,]
# View the resulting data frame
zirconium_content
## zirconium_content color
## 2 131.5 0
## 3 131.5 0
## 4 131.6 0
## 5 131.7 0
## 6 131.8 0
## 7 131.9 0
## 8 132.1 0
## 9 138.4 0
## 10 138.6 0
## 11 138.8 0
## 12 139.1 0
## 13 140.3 0
## 14 140.9 0
## 15 144.4 0
## 16 145.5 0
## 17 145.5 0
## 18 146.8 0
## 19 128.2 1
## 20 130.1 1
## 21 130.3 1
## 22 131.5 1
## 23 132.6 1
## 24 135.1 1
## 25 135.2 1
## 26 135.7 1
## 27 135.9 1
## 28 136.2 1
## 29 136.8 1
## 30 136.9 1
## 31 137 1
## 32 138.9 1
## 33 139.2 1
## 34 139.7 1
## 35 140.1 1
## 36 142.2 1
## 37 142.2 1
sapply(zirconium_content, class)
## zirconium_content color
## "character" "character"
# Convert the zirconium_content column to numeric
zirconium_content <- mutate(zirconium_content, zirconium_content = as.numeric(zirconium_content))
# Convert the color column to a factor
zirconium_content <- mutate(zirconium_content, color = factor(color))
# Check normality (this histogram pools both groups; ideally check within each group)
hist(zirconium_content$zirconium_content)
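Since the normality assumption concerns the scores within each group, a per-group check is more informative. Here is a quick sketch using by() to run a Shapiro-Wilk test for each color:
# Shapiro-Wilk test within each color group
by(zirconium_content$zirconium_content, zirconium_content$color, shapiro.test)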
leveneTest(zirconium_content ~ color, data=zirconium_content)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 3.2741 0.07923 .
## 34
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
_ Levene’s Test for Homogeneity of Variance compares the variance of the zirconium_content values between the two levels of the color factor. The Df value indicates the degrees of freedom for the test, which is one in this case since there are two groups being compared. The F value is the test statistic, which is used to test the null hypothesis that the variances are equal. The Pr(>F) value is the p-value of the test, which indicates the probability of obtaining the observed test statistic under the null hypothesis of equal variances.
In this output, the p-value is 0.07923, which is greater than the significance level of 0.05. This indicates that there is no significant evidence to reject the null hypothesis of equal variances at the 5% level of significance. Therefore, we can assume that the variances of the two groups are equal.
Our test statistic W = (sum of the ranks in a group) − N(N+1)/2
where N is the sample size of that group and N(N+1)/2 is the minimum possible rank sum for that group (the code below calls this quantity Mean_rank).
N_b = 17 #for black
N_g = 19 #for gray
#Mean_rank = N*(N+1)/2, the minimum possible rank sum for a group of size N
Mean_rankblack <- 17*(18)/2
Mean_rankblack
## [1] 153
Mean_rankgray <- 19*(20)/2
Mean_rankgray
## [1] 190
We subtract this from the actual rank sums we observe:
Wblack <- 343-153
Wblack
## [1] 190
This all happens under the hood in R. Our test statistic is W = 190.
Wgray <- 323-190
Wgray
## [1] 133
wilcox.test(zirconium_content ~ color, data=zirconium_content, paired=FALSE, exact=F)
##
## Wilcoxon rank sum test with continuity correction
##
## data: zirconium_content by color
## W = 190, p-value = 0.3748
## alternative hypothesis: true location shift is not equal to 0
The wilcox.test() function is used to perform a Wilcoxon rank sum test (also known as the Mann-Whitney U test), which is a nonparametric test to compare the distributions of two independent groups. The function takes several arguments to specify the variables and parameters of the test:
zirconium_content ~ color: specifies the variables to be compared in formula notation. In this case, we want to compare the zirconium_content variable grouped by the levels of the color factor.
data = zirconium_content: specifies the data frame that contains the variables.
paired = FALSE: specifies that the groups are independent (i.e., not paired).
exact = FALSE: specifies whether to compute exact p-values or use asymptotic approximations. In this case, we use the asymptotic approximation, which is appropriate for larger sample sizes.
If there are ties, you need to set exact = FALSE, meaning that an exact p-value will not be calculated.
The null hypothesis for the Wilcoxon rank sum test is that the two groups have the same distribution, or equivalently, that the location parameter of the two groups is the same: the mean ranks (medians) of the two groups are equal. The alternative hypothesis is that the two groups have different location parameters.
The test statistic W is 190 and the p-value is 0.3748. We fail to reject the null hypothesis.
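The medians cited in the write-up below can be computed directly, assuming (as in the rank-sum computation above) that color 0 codes black and 1 codes gray:
# Median zirconium content per color group
tapply(zirconium_content$zirconium_content, zirconium_content$color, median)
# expect 138.6 for black (0) and 136.2 for gray (1)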
Zirconium in black obsidian artifacts (Mdn = 138.6) did not differ significantly from zirconium in gray obsidian (Mdn = 136.2), W = 190, p = .375.
_ Dependent t-test
Compares two means based on related data.
E.g., Data from the same people measured at different times.
Data from ‘matched’ samples.
Sometimes called: “Matched-samples” t-test
_ Are invisible people mischievous?
_ Manipulation
Placed participants in an enclosed community riddled with hidden cameras.
For first week participants normal behaviour was observed.
For the second week, participants were given an invisibility cloak.
_ Outcome: we measured the number of mischievous acts each participant performed in each week.
_ We will take the difference of the two scores for each participant. If there is no effect of the treatment (e.g., being invisible), then we expect the difference scores to be approximately 0, on average.
_ What are our null and alternative hypotheses?
Ho: mu = 0
Ha: mu ≠ 0
Focuses on difference scores
The sampling distribution is normally distributed. In the dependent t-test this means that the sampling distribution of the differences between scores should be normal, not the scores themselves.
Data are measured at least at the interval level. [same assumption as independent t-test]
Running the code in R (by hand)
data <- c("no_cloak" = 3, 1, 5, 4, 6, 4, 6, 2, 0, 5, 4, 5)
data2 <- c("cloak" = 4, 3, 6, 6, 8, 5, 5, 4, 2, 5, 7, 5)
#calculate difference scores
diff <- data - data2
mean(diff) #mean of differences
## [1] -1.25
sd(diff) #standard deviation of differences
## [1] 1.13818
t <- mean(diff) / (sd(diff)/sqrt(length(diff)))
#pt gives the area to the left on the t-distribution curve; t is negative here, so this is already the tail area
pt(t, df=11)
## [1] 0.001460396
#2 sided test - need to multiply by 2
2*pt(t, df=11)
## [1] 0.002920793
#check assumption - normality
hist(diff)
shapiro.test(diff)
##
## Shapiro-Wilk normality test
##
## data: diff
## W = 0.91231, p-value = 0.2284
#run t-test
t.test(data, data2, paired=T)
##
## Paired t-test
##
## data: data and data2
## t = -3.8044, df = 11, p-value = 0.002921
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -1.9731653 -0.5268347
## sample estimates:
## mean difference
## -1.25
_ The non-parametric alternative is the Wilcoxon signed-rank test, which utilizes ranks.
Steps to creating the ranks (see the sketch after this list):
Rank the absolute value of the difference scores (1 for the smallest; zero differences are dropped)
Then sum up the positive ranks
Sum up the negative ranks
Your test statistic is the smaller of the two sums
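Here is a minimal sketch of those steps, reusing the diff vector computed in the paired t-test section above; the smaller rank sum matches the V statistic reported below:
# Drop zero differences, then rank the absolute differences
d <- diff[diff != 0]
r <- rank(abs(d))    # ties get the average of the potential ranks
sum(r[d > 0])    # sum of positive ranks: 2.5 (the test statistic V)
sum(r[d < 0])    # sum of negative ranks: 52.5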
#Wilcoxon signed-rank test
wilcox.test(data, data2, paired=T, exact=F)
##
## Wilcoxon signed rank test with continuity correction
##
## data: data and data2
## V = 2.5, p-value = 0.01085
## alternative hypothesis: true location shift is not equal to 0