Steps of Performing the Analysis

The general steps we will undergo to conduct this analysis include:

  1. A priori power analysis
  2. Import the Data
  3. Data Cleaning & Wrangling
  4. Descriptive Statistics
  5. Fit the model
  6. Check whether model assumptions were met
  7. Interpret model output
  8. Calculate effect size & 95%CIs

Example: Today we will be analyzing data from a 1978 study on the effect of anonymity on people’s tendency to cooperate with one another. The data is included in the {carData} package and is called Guyer. In this study, twenty participants were asked to participate in groups of four in a prisoner’s dilemma game (although only the participant was actually participating - the other three members of the group were actors). In this game, each group member gets to decide whether they want to “cooperate” or “defect”.

If everyone in the group chooses to cooperate with each other, each person in the group earns 15 dollars. If only one person in the group chooses to defect, the defector receives 20 dollars and the cooperators receive 10. If two people choose to defect, the defectors receive 15 dollars and the two cooperators receive 5. If three people defect, they each receive 10 dollars and the single cooperator receives 0. If everyone chooses to defect, they each receive 5 dollars.

Each group made these decisions either publicly or anonymously. The groups in the public condition introduced themselves to each other at the start of the experiment and learned about the other group members’ choices at the end. In the anonymous condition, the group members were never introduced to one another and never learned of the others’ choices. Higher cooperation scores indicate that the members of the group chose to cooperate a greater number of times.

The researcher wants to test whether making decisions publicly versus anonymously has an effect on people’s tendency to cooperate with one another.

Model Comparison

The model corresponding to the null hypothesis, Model C, is:

\[Model C: Cooperation_i = \beta_0 + \epsilon_i\]

The model corresponding to the alternative hypothesis, Model A, is: \[Model A: Cooperation_i = \beta_0 + \beta_1*Condition_i + \epsilon_i\] Where condition is a categorical predictor with 2 levels:

  • Public
  • Anonymous

Hypotheses

The null hypothesis states that the condition people were in (public vs. anonymous) will not predict people’s cooperation scores (aka, the slope of the model will equal 0):

\[H_0: \beta_1 = 0\] The alternative hypothesis states that the slope of the model will be different from zero (aka, which condition people were in will predict their cooperation scores):

\[H_1: \beta_1 \neq 0\]

A priori power analysis

Before conducting a study, it is wise for a researcher to perform an a priori power analysis to determine the sample size they need to collect to achieve a desired level of power. The pwrss.f.reg function from the pwrss package can be used to perform power analyses when regression models are used to test one’s hypotheses. Typically, the minimum power researchers aim to achieve is 80% (i.e., if the null hypothesis is false, there will be an 80% chance of the researcher correctly detecting an effect).

To use this function for a power analysis, all but one of the following arguments need to be specified by the researcher. The value that the researcher wants to solve for should be set to NULL:

  • r2 = the estimated r-squared effect size
  • m = the number of predictors being tested in Model A
  • k = the total number of predictors in Model A
  • alpha = 0.05 (typically, in psychology we use an alpha of .05)
  • n = sample size
    • Set to NULL when you want to solve for the sample size needed to achieve a desired level of power (called an a priori power analysis).
  • power = The desired power level
    • Set to NULL when you want to solve for the power you achieved with a given sample size (called a post-hoc power analysis).

Cohen’s (1988) Conventions for \(R^2\) Effect Sizes:

  • \(R^2\) = .02 is a small effect size
  • \(R^2\) = .13 is a medium effect size
  • \(R^2\) = .26 is a large effect size

Ideally, a researcher would develop a good idea of the expected effect size based on previous findings for their topic area. If the researcher does not have a predicted effect size based on previous literature, then the effect size conventions listed above can be used instead. (For example, the researcher could state that they want to calculate the sample size that would be necessary to detect even a small effect size).
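For instance, a minimal sketch of that conservative approach (assuming the same one-predictor model we plan to fit here) plugs the small-effect benchmark into pwrss.f.reg and solves for the sample size:

# sketch: sample size needed to detect even a small effect (R-squared = .02)
pwrss.f.reg(r2 = 0.02, 
            m = 1, 
            k = 1, 
            alpha = 0.05,
            n = NULL,
            power = 0.80)

The output (omitted here) puts the required sample size in the hundreds of participants, which is why planning around a small effect is the most cautious choice.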

For this example, let’s say the previous literature suggests there should be a large effect of condition on people’s cooperation. A large effect would be considered an \(R^2 = 0.26\). We want to solve for the sample size that is needed to have an 80% chance (power = 0.80) of detecting a large effect of condition in the above model comparison. We will set n to NULL because the sample size is what we want to solve for. (Note: Only one of the options can be set to NULL).

  • r2 = .26
    • The estimated effect size (large effect, based on Cohen’s conventions)
  • m = 1
    • We want to test the significance of a single predictor
  • k = 1
    • There is only one predictor in Model A
  • alpha = 0.05
    • Willingness to make a Type I error is 5%
  • n = sample size
    • Set to NULL so we can solve for the sample size
  • power = 0.80
    • Minimum desired power level is typically 80%
pwrss.f.reg(r2 = 0.26, 
            m = 1, 
            k = 1, 
            alpha = 0.05,
            n = NULL,
            power = 0.80)
##  Linear Regression (F test) 
##  R-squared Deviation from 0 (zero) 
##  H0: r2 = 0 
##  HA: r2 > 0 
##  ------------------------------ 
##   Statistical power = 0.8 
##   n = 25 
##  ------------------------------ 
##  Numerator degrees of freedom = 1 
##  Denominator degrees of freedom = 22.417 
##  Non-centrality parameter = 8.579 
##  Type I error rate = 0.05 
##  Type II error rate = 0.2

The power analysis says we need 25 participants in the study (if the result is a fraction, always round up) to have an 80% chance of detecting a significant, large effect of our categorical predictor in the model.

It’s ideal for the power analysis to be carried out prior to collecting one’s data. In the current scenario, we have data from a study that was conducted 45 years ago, so we can’t really tell the authors to go back and do an a priori power analysis. We can see now, though, that the original study, which had 20 total participants, was slightly underpowered for detecting a large effect (and greatly underpowered for detecting anything smaller than a large effect!).
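If you’d like to verify this, a quick post-hoc sketch (output omitted) keeps the large effect size but fixes n at the original sample size and solves for power instead:

# sketch of a post-hoc power analysis for the original study (n = 20)
pwrss.f.reg(r2 = 0.26, 
            m = 1, 
            k = 1, 
            alpha = 0.05,
            n = 20,
            power = NULL)

The achieved power comes out below the 80% benchmark, consistent with the point above.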

Now, let’s get to analyzing our hypothesis.

Import the data

# the Guyer dataset comes from the {carData} package (load it if it isn't already loaded)
library(carData)

data <- Guyer

# Look at first few rows
head(data)
##   cooperation condition    sex
## 1          49    public   male
## 2          64    public   male
## 3          37    public   male
## 4          52    public   male
## 5          68    public   male
## 6          54    public female
# Look at the entire dataset
View(data) 

Data Cleaning & Wrangling

First, check that each variable was imported as the correct type.

str(data)
## 'data.frame':    20 obs. of  3 variables:
##  $ cooperation: num  49 64 37 52 68 54 61 79 64 29 ...
##  $ condition  : Factor w/ 2 levels "anonymous","public": 2 2 2 2 2 2 2 2 2 2 ...
##  $ sex        : Factor w/ 2 levels "female","male": 2 2 2 2 2 1 1 1 1 1 ...

Looks good. We can also look at descriptive statistics to see if we need to clean the data at all.

# General descriptive statistics
describe(data)
##             vars  n mean    sd median trimmed   mad min max range skew kurtosis
## cooperation    1 20 48.3 14.29   46.5   47.69 15.57  27  79    52 0.31    -0.93
## condition*     2 20  1.5  0.51    1.5    1.50  0.74   1   2     1 0.00    -2.10
## sex*           3 20  1.5  0.51    1.5    1.50  0.74   1   2     1 0.00    -2.10
##               se
## cooperation 3.19
## condition*  0.11
## sex*        0.11
# Frequency tables
tab1(data$cooperation)

## data$cooperation : 
##         Frequency Percent Cum. percent
## 27              1       5            5
## 29              1       5           10
## 30              1       5           15
## 34              1       5           20
## 37              1       5           25
## 39              1       5           30
## 40              1       5           35
## 41              1       5           40
## 44              2      10           50
## 49              1       5           55
## 52              2      10           65
## 54              1       5           70
## 58              1       5           75
## 61              1       5           80
## 64              2      10           90
## 68              1       5           95
## 79              1       5          100
##   Total        20     100          100
tab1(data$condition)

## data$condition : 
##           Frequency Percent Cum. percent
## anonymous        10      50           50
## public           10      50          100
##   Total          20     100          100
# Visualizations
hist(data$cooperation)

ggplot(data) +
  geom_bar(aes(x = condition))

Everything looks good! The data is also already in the form we need it to be in (the IV and the DV each have their own column).
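(If your data instead arrived in wide format, say a hypothetical data frame wide_data with one column of cooperation scores per condition, you could reshape it into this long format with tidyr before proceeding. The object and column names below are made up for illustration.)

# hypothetical sketch: reshape wide data (one column per condition) into long format
long_data <- wide_data %>%
  tidyr::pivot_longer(cols = c(public, anonymous),
                      names_to = "condition",
                      values_to = "cooperation")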

Descriptive Statistics

Now, we can get descriptive statistics that we would be interested in including in a write-up of this data. For instance, we would probably want to include a table that gives the mean (M) and standard deviation (SD) on cooperation for the public and anonymous conditions. You can certainly provide even more descriptive statistics than just these, though.

data %>%
  group_by(condition) %>%
  summarise(mean = mean(cooperation),
            sd = sd(cooperation))
## # A tibble: 2 × 3
##   condition  mean    sd
##   <fct>     <dbl> <dbl>
## 1 anonymous  40.9  9.42
## 2 public     55.7 14.8

We can make this table look a little nicer in our output by storing it in an object and then passing the table to the kable function from the knitr package.

The kable function lets us customize:

  • The table title (caption =)
  • Column names (col.names =)
  • The number of digits to report (digits =)
  • Adding format.args = list(nsmall = 2) ensures that two decimal places are reported even when the second decimal is a trailing zero
descriptives_table <- data %>%
  group_by(condition) %>%
  summarise(mean = mean(cooperation),
            sd = sd(cooperation))

descriptives_table %>%
  knitr::kable(digits = 2, col.names = c("Condition", "Mean", "SD"), caption = "Descriptive Statistics for Cooperation in the Public and Anonymous Conditions", format.args = list(nsmall = 2))
Descriptive Statistics for Cooperation in the Public and Anonymous Conditions

Condition    Mean     SD
---------    -----    -----
anonymous    40.90     9.42
public       55.70    14.85

Fit the Model

We want to fit a model that predicts cooperation from condition. But first we need to (1) convert condition to a factor if it isn’t one already, and (2) assign codes to each level of condition.

class(data$condition) 
## [1] "factor"

The categorical predictor, condition, is already a factor. If we needed to transform the categorical predictor to a factor, though, we would use:

data$condition <- as.factor(data$condition)

Now, we need to choose a coding scheme and assign numerical values to each level of condition. As we discussed in lecture, contrast codes are usually the preferred coding method. Let’s assign contrast codes to the levels of condition.

# check how the variable is currently coded and see the order of the levels
contrasts(data$condition)
##           public
## anonymous      0
## public         1
# assign new contrast codes to the levels of condition
contrasts(data$condition) <- c(-1/2,1/2)

# see how the codes have been updated
contrasts(data$condition)
##           [,1]
## anonymous -0.5
## public     0.5

Now that the categorical predictor is coded, we can fit our linear model using it as a predictor. To fit our linear model, we’ll use the lm() function. This function uses the following general format:

model <- lm(Y ~ X1 + X2 + ... + Xp, data = data)

Where the outcome variable, Y, is on the left side of the tilde, ~, and the predictor variables are on the right side.

model <- lm(cooperation ~ condition, data = data)

Check whether model assumptions were met

Recall that we are making three assumptions about the model’s errors (aka, the model’s residuals):

  1. Errors are normally distributed
  2. Independence of errors
  3. Homogeneity of variance

Let’s check whether or not the model’s residuals meet each of these assumptions.

Checking whether errors are normally distributed

Distribution of the Residuals

There are two ways to check this assumption. First, we could look at a distribution of the model’s residuals.

To graph the model’s residuals, we first need to obtain the residuals and store them in an object. We can obtain the model’s residuals by passing the model to the augment function from the broom package. This results in a table with columns corresponding to our model, like the value the model predicts for each participant (.fitted) and the residual for each participant (.resid).

We’ll be graphing the values in the .resid column.

# storing table containing residuals
augment_model <- augment(model)
augment_model
## # A tibble: 20 × 8
##    cooperation condition .fitted  .resid  .hat .sigma    .cooksd .std.resid
##          <dbl> <fct>       <dbl>   <dbl> <dbl>  <dbl>      <dbl>      <dbl>
##  1          49 public       55.7  -6.7     0.1   12.7 0.0179       -0.568  
##  2          64 public       55.7   8.3     0.1   12.6 0.0275        0.704  
##  3          37 public       55.7 -18.7     0.1   11.9 0.140        -1.59   
##  4          52 public       55.7  -3.70    0.1   12.8 0.00547      -0.314  
##  5          68 public       55.7  12.3     0.1   12.4 0.0604        1.04   
##  6          54 public       55.7  -1.70    0.1   12.8 0.00115      -0.144  
##  7          61 public       55.7   5.3     0.1   12.7 0.0112        0.449  
##  8          79 public       55.7  23.3     0.1   11.3 0.217         1.98   
##  9          64 public       55.7   8.3     0.1   12.6 0.0275        0.704  
## 10          29 public       55.7 -26.7     0.1   10.8 0.285        -2.26   
## 11          27 anonymous    40.9 -13.9     0.1   12.3 0.0771       -1.18   
## 12          58 anonymous    40.9  17.1     0.1   12.0 0.117         1.45   
## 13          52 anonymous    40.9  11.1     0.1   12.5 0.0492        0.941  
## 14          41 anonymous    40.9   0.100   0.1   12.8 0.00000399    0.00848
## 15          30 anonymous    40.9 -10.9     0.1   12.5 0.0474       -0.924  
## 16          40 anonymous    40.9  -0.900   0.1   12.8 0.000323     -0.0763 
## 17          39 anonymous    40.9  -1.90    0.1   12.8 0.00144      -0.161  
## 18          44 anonymous    40.9   3.10    0.1   12.8 0.00384       0.263  
## 19          34 anonymous    40.9  -6.90    0.1   12.7 0.0190       -0.585  
## 20          44 anonymous    40.9   3.10    0.1   12.8 0.00384       0.263
# plotting the distribution of the residuals
ggplot(data = augment_model, aes(x = .resid)) + 
  geom_density(fill = "purple") + # density plot of the residuals
  stat_function(linetype = 2, # a normal distribution overlaid on top for reference
                fun = dnorm, 
                args = list(mean = mean(augment_model$.resid), 
                            sd = sd(augment_model$.resid))) 

The residuals look pretty normally distributed!

Q-Q Plot

Another way to check whether the errors are normally distributed is by looking at a Q-Q plot:

ggplot(model) +
  geom_abline() + 
  stat_qq(aes(sample = .stdresid))

To meet the assumption, the points should be close to lying on the diagonal reference line. Looks like they do!
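If you prefer base R graphics, the model’s built-in diagnostic plots include a normal Q-Q plot of the standardized residuals as well (using the same plot() function we use for the residuals plot below):

# base R alternative: the second diagnostic plot is a normal Q-Q plot
plot(model, 2)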

Checking independence of errors

Ensuring that each of the participants in the sample (and thus their errors) are independent of each other should occur during data collection.

If the research design does not guarantee independence, though, you can check for non-independence by plotting the residuals against a variable that the errors should not be related to (e.g., ID number, time).

Let’s check whether we have violated the assumption of independence of errors by plotting residuals against ID numbers for the model.

# create an ID column (let's assume the participants are listed in the order in which they participated in the study)
data$ID <- c(1:nrow(data))

# add the ID column to the table containing the residuals
augment_model$ID <- data$ID

# plot the ID numbers against the residuals
ggplot(data = augment_model, aes(x = ID, y = .resid)) + 
  geom_point() +  
  geom_smooth(se = F) +
  geom_hline(yintercept = 0)
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

We do not want to see a systematic pattern between the residuals and a variable they should be unrelated to, like ID.

Checking homogeneity of variance

Homogeneity of variance is the assumption that the variance of the outcome is approximately the same across all values of the predictor(s). We can use a residuals plot to examine whether our model has met this assumption. To obtain a residuals plot, pass the model to the plot() function and request the first plot.

plot(model, 1)

We are looking to see that the amount of spread in the data at each level of condition is approximately the same.

If you are still unsure after examining whether this assumption has been met visually, you can use a test of the homogeneity of variances assumption, like Levene’s test. We can run this test using the leveneTest() function from the car package.

leveneTest(cooperation ~ condition, data = data)
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  1    1.87 0.1883
##       18

Levene’s test is a test of whether the variance in each group is equal. We do not want the p-value to be less than .05 because that would indicate we have violated the assumption of homogeneity of variances. In this case, the p-value is greater than .05, so we have met the assumption of homogeneity of variances.

We have met the model assumptions. Now, we can finally move onto interpreting our model’s output.

Interpreting the Model Output

First, let’s pass the model to the summary() function.

summary(model)
## 
## Call:
## lm(formula = cooperation ~ condition, data = data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -26.70  -6.75  -0.40   8.30  23.30 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   48.300      2.780  17.372 1.08e-12 ***
## condition1    14.800      5.561   2.661   0.0159 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.43 on 18 degrees of freedom
## Multiple R-squared:  0.2824, Adjusted R-squared:  0.2425 
## F-statistic: 7.084 on 1 and 18 DF,  p-value: 0.0159

Let’s unpack the output:

  • Estimate
    • b0 = 48.30, the model’s y-intercept
    • b1 = 14.80, the model’s slope

Question: Can you interpret the meaning of each parameter estimate in the context of the current research scenario?

Answer: b0 is the mean of the group means, as well as the grand mean (because the group sizes are equal)

# mean of group means
descriptives_table
## # A tibble: 2 × 3
##   condition  mean    sd
##   <fct>     <dbl> <dbl>
## 1 anonymous  40.9  9.42
## 2 public     55.7 14.8
(40.9+55.7)/2
## [1] 48.3
# grand mean
mean(data$cooperation)
## [1] 48.3

b1 is the mean of the group coded +1/2 (Public) minus the mean of the group coded -1/2 (Anonymous)

descriptives_table
## # A tibble: 2 × 3
##   condition  mean    sd
##   <fct>     <dbl> <dbl>
## 1 anonymous  40.9  9.42
## 2 public     55.7 14.8
55.7-40.9
## [1] 14.8
  • Std. Error
    • We’re typically most interested in testing the significance of the predictor in our model (rather than the y-intercept). For that reason, let’s just focus on the standard error for b1, which is \(\sqrt{MSE/SS_X}\) (a quick by-hand check of this and a few other quantities appears after this list)
  • t-value
    • \(t = b_1/SE_{b_1} = 14.80/5.561 = 2.661\)
  • Pr(>|t|)
    • p = .016
    • Condition was a significant predictor of participants’ cooperation scores
  • Residual standard error = 12.43
    • \(\sqrt{SSE(A)/df_{Error}}\)
    • A description of the average error, aka, the average amount by which the model was off in trying to predict participants’ actual scores
  • Multiple R-squared = .2824
    • The model including condition as a predictor accounted for approximately 28% of the variance in participants’ cooperation scores
  • Adjusted R-squared = .2425
    • Multiple R-squared systematically overestimates the true proportion of variance accounted for by our model. Adjusted R-squared adjusts the estimate by penalizing the estimate for the number of predictors in the model (see lecture slides for the formula).
  • F-statistic and p-value
    • F(1, 18) = 7.084, p = .016
    • This F-statistic corresponds to a test of the significance of the overall model, which will differ from the test of the significance of individual predictors in the model when we have more than one predictor in the model
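As a quick by-hand check of a few of these quantities (a sketch using the standard formulas, not part of the original output), we can reproduce the standard error for b1, the residual standard error, and adjusted R-squared:

# by-hand checks (sketch) of values from summary(model)
MSE   <- sum(residuals(model)^2) / df.residual(model)  # SSE(A)/df_Error, approx. 154.61
codes <- ifelse(data$condition == "public", 1/2, -1/2)  # the contrast codes we assigned
SS_X  <- sum((codes - mean(codes))^2)                   # 20 * 0.25 = 5

sqrt(MSE / SS_X)           # approx. 5.56, the Std. Error for b1
14.800 / sqrt(MSE / SS_X)  # approx. 2.66, the t value for b1
sqrt(MSE)                  # approx. 12.43, the residual standard error
1 - (1 - 0.2824) * (20 - 1) / (20 - 1 - 1)              # approx. 0.2425, adjusted R-squared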

Second, let’s look at our model output using the Anova() function:

Anova(model)
## Anova Table (Type II tests)
## 
## Response: cooperation
##           Sum Sq Df F value Pr(>F)  
## condition 1095.2  1  7.0836 0.0159 *
## Residuals 2783.0 18                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Sum Sq
    • SSR = 1095.2
      • df_Reduced = 1
      • MSR (not shown) = SSR/df_Reduced = 1095.2
    • SSE(A) = 2783.0
      • df_Error = 18
      • MSE (not shown) = SSE(A)/df_Error = 154.61
  • F value
    • F(1, 18) = 7.08
    • Remember that F = MSR/MSE = 1095.2/154.61 = 7.08
  • Pr(>F)
    • p = .016
    • Condition was a significant predictor of participants’ cooperation scores

Note: The output using Anova() is mostly redundant with the output using summary(). The benefit of also looking at the Anova() output is that it additionally shows the SSR and SSE(A), so we can see how the calculations we are performing “by-hand” in lecture are reflected in the analysis that we perform in R.

Effect Size & Confidence Intervals

Effect Size

Eta-squared is a commonly reported measure of effect size for each predictor in one’s model. Eta-squared is equal to the SS for a particular predictor divided by the total SS. It can be interpreted as the proportion of variance in the outcome variable that’s associated with a particular predictor in the model.

etaSquared(model)
##              eta.sq eta.sq.part
## condition 0.2823991   0.2823991
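As a quick by-hand check (a sketch using the rounded sums of squares from the Anova() output above), eta-squared for condition is the SS for condition divided by the total SS:

# by-hand check: SS for condition divided by total SS
1095.2 / (1095.2 + 2783.0)  # approx. 0.2824, matching eta.sq above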

Once we start introducing more than one predictor into our models, we will discuss the difference between eta-squared and partial eta-squared.

Confidence Intervals

To obtain confidence intervals around each of your model’s parameters, pass the model to the confint() function:

confint(model)
##                 2.5 %   97.5 %
## (Intercept) 42.458622 54.14138
## condition1   3.117245 26.48276
  • For b0: 95%CI[42.46, 54.14]
  • For b1: 95%CI[3.12, 26.48]
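As a quick by-hand check (a sketch using the estimates from the summary() output), the 95%CI for b1 is b1 plus or minus the critical t value times its standard error:

# by-hand check of the 95%CI for b1: b1 +/- t_crit * SE(b1)
t_crit <- qt(.975, df = 18)           # approx. 2.10
14.800 + c(-1, 1) * t_crit * 5.561    # approx. 3.12 and 26.48, matching confint()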

See whether you can interpret the meaning of each of these 95%CIs in the context of the current research scenario.

APA-Style Summary

In this study, we examined whether people making a decision publicly or anonymously had an effect on their tendency to cooperate with one another. Using a linear regression analysis, we found that people who made their decision publicly (M = 55.70, SD = 14.85) cooperated significantly more with others compared to people who made their decision anonymously (M = 40.90, SD = 9.42), F(1, 18) = 7.08, p = .016, \(R^2\) = 0.28, b1 = 14.80, 95%CI[3.12, 26.48].