All analyses were conducted in R (v4.2.2; R Core Team, 2022) using the “psych” (vX.X; Author, year) and “effectsize” (version, author, year) packages.

Set up

Today we’re going to talk about effect sizes and power. Let’s first set up our environment. Don’t forget to set your working directory. You will need to install two new packages today: pwr and effectsize.

library(pwr)
library(effectsize)

Let’s read in our data. We will use the same data set we used last week.

MyData <- read.csv("Week 5 Favorability Data.csv")

Effect size

You may recall that we’ve talked about t-tests and how the t-value we get from a t-test is hard to interpret as an ‘effect size’ because it depends on the sample size as well as on the size of the difference. To make it more meaningful, we can convert that t-value into a standardized effect size, Cohen’s d. This allows us to use ‘rules of thumb’ to get a sense of the magnitude of the effect.

Recalling t-tests from last class

We will use our t-tests from last class as examples. Last week, we simply used the function t.test() and got our results. That is totally okay, but remember that it’s always best to save the results as an object in case we want to use them again later and don’t want to retype the whole function. Let’s save the first one as ‘model1’. You’ll see why soon.

One-Sample t-test

Let’s start with the one-sample t-test.

model1 <- t.test(MyData$demFav, mu = 2.5)
print(model1)
## 
##  One Sample t-test
## 
## data:  MyData$demFav
## t = -10.473, df = 2522, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 2.5
## 95 percent confidence interval:
##  2.301568 2.364155
## sample estimates:
## mean of x 
##  2.332861
library(report)
report(model1)
## Warning: Missing values detected. NAs dropped.
## Effect sizes were labelled following Cohen's (1988) recommendations.
## 
## The One Sample t-test testing the difference between MyData$demFav (mean =
## 2.33) and mu = 2.5 suggests that the effect is negative, statistically
## significant, and small (difference = -0.17, 95% CI [2.30, 2.36], t(2522) =
## -10.47, p < .001; Cohen's d = -0.21, 95% CI [-0.25, -0.17])

t(2522) = -10.47, p < .001; Cohen’s d = -0.21, 95% CI [-0.25, -0.17]

We have our t-score, but that doesn’t tell us anything about the magnitude of the effect we’re trying to observe. That’s where Cohen’s d comes in handy.

To get Cohen’s d for a saved t-test, we will use the cohens_d() function from the effectsize package.

cohens_d(model1)
## Warning: Missing values detected. NAs dropped.
## Cohen's d |         95% CI
## --------------------------
## -0.21     | [-0.25, -0.17]
## 
## - Deviation from a difference of 2.5.
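
To see where this number comes from, here is a minimal base R sketch. It assumes simulated scores (the vector `x` is hypothetical, not the course data) and shows that, for a one-sample test, Cohen's d is the distance between the sample mean and mu in SD units, which is the same as t divided by the square root of n.

```r
# Minimal sketch on simulated data (hypothetical values, not the course data)
set.seed(1)
x  <- rnorm(200, mean = 2.3, sd = 0.8)  # hypothetical favorability scores
mu <- 2.5                               # same test value as in model1

fit <- t.test(x, mu = mu)

# Cohen's d for a one-sample test: standardized distance from mu
d_from_data <- (mean(x) - mu) / sd(x)

# The same value recovered from the t statistic: d = t / sqrt(n)
d_from_t <- unname(fit$statistic) / sqrt(length(x))

c(d_from_data = d_from_data, d_from_t = d_from_t)

# Applied to the rounded values printed for model1 (t = -10.473, n = df + 1 = 2523)
d_model1 <- -10.473 / sqrt(2523)
round(d_model1, 2)
```

The last line should reproduce the -0.21 that cohens_d(model1) reports.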

Independent Samples t-test

Now let’s look at our independent samples t-test. Remember that we first have to isolate the groups we want. In this case, we only want to look at Democrats and Republicans and see how they differ in how they feel about the Republican party. Remember that we have to set the var.equal argument to FALSE because when we tested the variances last week, we found that they were significantly different from one another. We also have to get the SD for each group, since the t-test output only gives the means.

df1 <- subset(MyData, PARTY == "Republican" |
                PARTY == "Democrat") # Subsetting groups of interest
model2 <- t.test(repFav ~ PARTY, data = df1, var.equal = FALSE) # Running t-test
print(model2)
## 
##  Welch Two Sample t-test
## 
## data:  repFav by PARTY
## t = -41.05, df = 1206.8, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group Democrat and group Republican is not equal to 0
## 95 percent confidence interval:
##  -1.302398 -1.183584
## sample estimates:
##   mean in group Democrat mean in group Republican 
##                 1.495334                 2.738325
aggregate(repFav~PARTY, df1, sd) # Getting SD for groups
##        PARTY    repFav
## 1   Democrat 0.5296485
## 2 Republican 0.6117943
report(model2)
## Warning: Unable to retrieve data from htest object. Returning an approximate
##   effect size using t_to_d().
## Effect sizes were labelled following Cohen's (1988) recommendations.
## 
## The Welch Two Sample t-test testing the difference of repFav by PARTY (mean in
## group Democrat = 1.50, mean in group Republican = 2.74) suggests that the
## effect is negative, statistically significant, and large (difference = -1.24,
## 95% CI [-1.30, -1.18], t(1206.84) = -41.05, p < .001; Cohen's d = -2.36, 95% CI
## [-2.51, -2.22])

And now we extract Cohen’s d.

cohens_d(model2)
## Warning: Unable to retrieve data from htest object. Returning an approximate
##   effect size using t_to_d().
## d     |         95% CI
## ----------------------
## -2.36 | [-2.51, -2.22]
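
The warning says this d is an approximation computed from the t-value rather than from the raw scores. As a rough sketch of what that approximation looks like, the code below applies the usual t-to-d conversion for two independent groups (d is about 2t divided by the square root of df) to the values printed above. The guarded call at the end is an assumption: it only runs if `df1` is in memory and effectsize is installed, and it asks cohens_d() to work from the raw data with unpooled SDs to match the Welch test.

```r
# Values copied from the printed Welch output for model2
t_val  <- -41.05
df_val <- 1206.8

# Approximate conversion for two independent groups: d ~ 2 * t / sqrt(df)
d_approx <- 2 * t_val / sqrt(df_val)
round(d_approx, 2)

# Assumption: df1 from above exists and effectsize is installed.
# Passing the formula and data lets cohens_d() use the raw scores instead.
if (exists("df1") && requireNamespace("effectsize", quietly = TRUE)) {
  print(effectsize::cohens_d(repFav ~ PARTY, data = df1, pooled_sd = FALSE))
}
```

The first result should match the -2.36 above. The value computed from the raw data may differ somewhat, so say which one you are reporting.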

Paired-Samples t-test

Now we’ll look back at our paired-samples t-test. In this case, the var.equal argument is set to TRUE because our variance analysis last week revealed that the two variances were not significantly different from each other (R ignores var.equal once paired is TRUE, so it does no harm here). We also have to remember to set the paired argument to TRUE; otherwise, R will automatically run it as an independent-samples t-test. And don’t forget to extract Cohen’s d.

model3 <- t.test(MyData$demFav, MyData$repFav, var.equal = TRUE, paired = TRUE)
print(model3)
## 
##  Paired t-test
## 
## data:  MyData$demFav and MyData$repFav
## t = 11.935, df = 2510, p-value < 2.2e-16
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  0.2835196 0.3950007
## sample estimates:
## mean difference 
##       0.3392602
cohens_d(model3)
## Warning: Missing values detected. NAs dropped.
## Cohen's d |       95% CI
## ------------------------
## 0.24      | [0.20, 0.28]
report(model3)
## Warning: Missing values detected. NAs dropped.
## Effect sizes were labelled following Cohen's (1988) recommendations.
## 
## The Paired t-test testing the difference between MyData$demFav and
## MyData$repFav (mean difference = 0.34) suggests that the effect is positive,
## statistically significant, and small (difference = 0.34, 95% CI [0.28, 0.40],
## t(2510) = 11.93, p < .001; Cohen's d = 0.24, 95% CI [0.20, 0.28])
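
A paired t-test is a one-sample t-test on the difference scores, so Cohen's d here is the mean difference divided by the SD of the differences. The sketch below illustrates this with simulated ratings (`before` and `after` are hypothetical names and values, not the course data), then checks the idea against the rounded numbers printed for model3.

```r
# Minimal sketch on simulated data (hypothetical values, not the course data)
set.seed(2)
n <- 150
before <- rnorm(n, mean = 2.3, sd = 0.8)         # hypothetical first rating
after  <- before - rnorm(n, mean = 0.3, sd = 1)  # hypothetical second rating

paired_fit <- t.test(before, after, paired = TRUE)
diff_fit   <- t.test(before - after, mu = 0)

# Both tests give the same t statistic
c(paired = unname(paired_fit$statistic), one_sample = unname(diff_fit$statistic))

# Cohen's d for paired data, standardized by the SD of the differences
diffs    <- before - after
d_paired <- mean(diffs) / sd(diffs)
d_paired

# Check against the rounded values printed for model3 (t = 11.935, n = df + 1 = 2511)
round(11.935 / sqrt(2511), 2)
```

The last line should reproduce the 0.24 reported for model3.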

You still need to describe each variable included in an independent or paired t-test with its mean and SD. Example: A paired-samples t-test showed a significant difference between Democratic politicians’ favorability (M = 2.33, SD = 0.8) and Republican politicians’ favorability (M = 1.99, SD = 0.8), t(2510) = 11.93, p < .001; Cohen’s d = 0.24, 95% CI [0.20, 0.28].

Now, to report the mean and SD for each “group”, you can just use the describe() function from the psych package.

library(psych)
## 
## Attaching package: 'psych'
## The following object is masked from 'package:effectsize':
## 
##     phi
describe(MyData$demFav)
##    vars    n mean  sd median trimmed  mad min max range  skew kurtosis   se
## X1    1 2523 2.33 0.8   2.44    2.35 0.93   1   4     3 -0.21    -1.09 0.02
describe(MyData$repFav)
##    vars    n mean  sd median trimmed  mad min max range skew kurtosis   se
## X1    1 2514 1.99 0.8      2    1.94 0.99   1   4     3 0.42    -0.75 0.02
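
If you only need the mean and SD, base R can produce them without loading psych. This is a small sketch on a toy data frame (the `toy` values are hypothetical stand-ins for MyData). Note the `na.rm = TRUE` argument: like describe(), it drops missing values one column at a time, so the n behind each mean can differ from the number of complete pairs used by the paired t-test.

```r
# Toy data frame standing in for MyData (hypothetical values)
toy <- data.frame(
  demFav = c(2.4, 3.1, NA, 1.8, 2.9),
  repFav = c(1.9, 2.2, 2.5, NA, 1.4)
)

# Mean and SD per column, dropping missing values within each column
desc <- sapply(toy, function(x) {
  c(M = mean(x, na.rm = TRUE), SD = sd(x, na.rm = TRUE))
})
round(desc, 2)
```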

Power

Annie shared the link to G*Power. However, you can also estimate power in R. To do this, we will use the pwr package.

A power analysis tells us how many participants we need to detect an effect of a given size.

We want power of at least .8, and we always run a two-sided test.

Estimating power for t-tests (one sample, 2 samples, or paired)

pwr.t.test(d = .3, sig.level = .05, power = .8, type = "two.sample", 
           alternative = "two.sided")
## 
##      Two-sample t test power calculation 
## 
##               n = 175.3847
##               d = 0.3
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

We would need a total of 352 people: 176 in each group.

Leaving the code as is lets us know that if we want to detect an effect size (Cohen’s d) of at least .3 (a small-to-medium effect by Cohen’s rules of thumb) with .8 power in a two-sample t-test (two-tailed), we would need n = 176 participants in each group after rounding up, or N = 352 in total. I’m only giving you the argument for two samples because, as Annie said, we wouldn’t normally want to do a one-sample test since we rarely know the population mean.
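
As a cross-check that does not need the pwr package, base R's power.t.test() answers the same question. This is a sketch under one assumption: setting `sd = 1` makes `delta` play the role of Cohen's d. It also shows the rounding step that turns the fractional n into a recruitable sample size.

```r
# Base R cross-check of the two-sample calculation (delta / sd plays the role of d)
res <- power.t.test(delta = 0.3, sd = 1, sig.level = 0.05, power = 0.8,
                    type = "two.sample", alternative = "two.sided")

n_per_group <- ceiling(res$n)  # round up: you cannot recruit a fraction of a person
n_total     <- 2 * n_per_group

c(per_group = n_per_group, total = n_total)
```

This should give 176 per group and 352 in total, in line with the pwr.t.test() output.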

Estimating power for t-tests (two samples with unequal n)

You can also calculate power AFTER data collection. In this case, you already have your sample sizes in each group. It’s possible that you’ll have unequal sample sizes, but you want to know, based on those sample sizes, how much power you have to detect a given effect size.

pwr.t2n.test(n1 = 80, n2 = 64, d = .3, sig.level = .05)
## 
##      t test power calculation 
## 
##              n1 = 80
##              n2 = 64
##               d = 0.3
##       sig.level = 0.05
##           power = 0.4274097
##     alternative = two.sided

You can see that that’s very low power (about .43, well below the .8 we want). Ideally, you’ll have a larger sample size. Also, FYI, you won’t need this particular function for Problem Set 3. It is for situations where you want to know your power after you’ve collected data.
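
If you are curious what is happening under the hood, the sketch below recomputes this power by hand with the noncentral t distribution in base R. It is meant as an illustration of the standard two-sample calculation, not as a replacement for pwr.t2n.test(), and the variable names are made up for this example.

```r
# Same inputs as the pwr.t2n.test() call above
n1 <- 80; n2 <- 64; d <- 0.3; alpha <- 0.05

df_err <- n1 + n2 - 2                     # degrees of freedom
ncp    <- d * sqrt(n1 * n2 / (n1 + n2))   # noncentrality parameter
t_crit <- qt(1 - alpha / 2, df_err)       # two-sided critical value

# Probability of landing beyond either critical value when the true effect is d
power <- pt(t_crit, df_err, ncp = ncp, lower.tail = FALSE) +
         pt(-t_crit, df_err, ncp = ncp)
round(power, 3)
```

The result should land at roughly .43, close to the value reported above.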

Estimating power for correlations

pwr.r.test(r = .3, sig.level = .05, power = .8)
## 
##      approximate correlation power calculation (arctangh transformation) 
## 
##               n = 84.07364
##               r = 0.3
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided

So, to detect a correlation of at least .3 with .8 power, we need at least 85 participants (84.07 rounded up).

A power analysis revealed that our target sample size needs to be 85 participants to detect a medium-size effect (r = .3) with .8 power.
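
For intuition, the required n can also be approximated by hand with the Fisher z (arctanh) transformation. The sketch below uses the simple textbook approximation, which is an assumption on my part and not the exact routine inside pwr.r.test(), so the fractional n differs slightly even though it rounds up to the same answer.

```r
# Simple Fisher z approximation (not the exact pwr.r.test() computation)
r <- 0.3; alpha <- 0.05; target_power <- 0.8

z_r <- atanh(r)  # Fisher z transform of the target correlation
n_approx <- ((qnorm(1 - alpha / 2) + qnorm(target_power)) / z_r)^2 + 3

ceiling(n_approx)
```

Rounded up, this also gives 85 participants.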

Additional info

If you want more ways to estimate power in R, you can look through the documentation of the pwr package. Just type ?pwr in the console to access the documentation.