For today’s lab, we’ll be using the data set called
disagreeableness.csv on Canvas.
Example: Let’s say a researcher is interested in whether people’s relationship statuses are related to how disagreeable they are. The researcher asks 162 participants whether they are single (1), dating (2), or seriously dating (3) and also measures their disagreeableness (sample item: “I tend to find fault with others.”).
One of the researcher’s hypotheses that they would like to test is that people who are single will score significantly differently on disagreeableness compared to people who are not single.
data <- import("disagreeableness.csv")
First, let’s inspect each variable’s measure type & correct any, if needed.
str(data)
## 'data.frame': 162 obs. of 3 variables:
## $ ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ RelationshipStatus: int 3 3 2 3 3 3 2 2 2 2 ...
## $ Disagreeableness : int 5 4 7 6 5 6 6 5 5 5 ...
ID & RelationshipStatus should be
transformed into factorsdata <- data %>%
mutate(ID = as.factor(ID),
RelationshipStatus = factor(RelationshipStatus, labels = c("Single", "Dating", "Seriously Dating")))
We can use descriptive statistics to check for any data entry errors.
describe(data)
## vars n mean sd median trimmed mad min max range
## ID* 1 162 81.50 46.91 81.5 81.50 60.05 1 162 161
## RelationshipStatus* 2 162 1.88 0.88 2.0 1.85 1.48 1 3 2
## Disagreeableness 3 162 5.29 1.18 6.0 5.38 1.48 1 7 6
## skew kurtosis se
## ID* 0.00 -1.22 3.69
## RelationshipStatus* 0.24 -1.68 0.07
## Disagreeableness -0.93 1.10 0.09
tab1(data$RelationshipStatus)
## data$RelationshipStatus :
## Frequency Percent Cum. percent
## Single 74 45.7 45.7
## Dating 34 21.0 66.7
## Seriously Dating 54 33.3 100.0
## Total 162 100.0 100.0
hist(data$Disagreeableness)
Let’s start with descriptively examining the M and SD on
disagreeableness across the three levels of
RelationshipStatus.
descriptives <- data %>%
group_by(RelationshipStatus) %>%
summarize(M = mean(Disagreeableness, na.rm = TRUE),
SD = sd(Disagreeableness, na.rm = TRUE))
# Making the table look a little nicer
descriptives %>%
knitr::kable(digits = 5, col.names = c("Relationship Status", "Mean", "SD"), caption = "Descriptive Statistics for Disagreeableness Across Each Relationship Status", format.args = list(nsmall = 2))
| Relationship Status | Mean | SD |
|---|---|---|
| Single | 5.16216 | 1.23895 |
| Dating | 5.70588 | 0.87141 |
| Seriously Dating | 5.20370 | 1.23441 |
# And we can plot the descriptives
ggplot(descriptives, aes(x = RelationshipStatus, y = M, fill = RelationshipStatus)) +
geom_bar(stat = "identity") +
ggtitle("Average Disagreeableness Across Relationship Statuses") +
labs(x = "Relationship Status",
y = "Mean Disagreeableness") +
theme(plot.title = element_text(hjust = 0.5))
Recall that the number of codes that need to be included in the model
is equal to m-1, or the number of levels of the categorical predictor
minus 1. In this case, RelationshipStatus has three levels,
thus, we need to construct 2 codes to represent it in
the model. We will go with the recommended contrast
coding method.
Recall that the researcher had a hypothesis that people who are
single will score significantly differently on disagreeableness compared
to people who are not single. To test this hypothesis, let’s use
contrast codes that compare the mean of the Single
condition to the average across the Dating and
Seriously Dating conditions. For instance, we could
use:
Then, we just need to construct RelCode2 so that it meets the rules of contrast coding in combination with RelCode1. One option would be:
RelCode1 <- c(2/3, -1/3, -1/3)
RelCode2 <- c(0, 1/2, -1/2)
contrasts(data$RelationshipStatus) <- cbind(RelCode1, RelCode2)
contrasts(data$RelationshipStatus)
## RelCode1 RelCode2
## Single 0.6666667 0.0
## Dating -0.3333333 0.5
## Seriously Dating -0.3333333 -0.5
RelCode1 makes the comparison that the
researcher would like to test. Knowing this, we can construct our
desired model comparison.The model corresponding to the null hypothesis, Model C, is:
\[Model C: Disagreeableness_i = \beta_0 + \beta_2*RelCode2 + \epsilon_i\]
The model corresponding to the alternative hypothesis, Model A, is: \[Model A: Disagreeableness_i = \beta_0 + \beta_1*RelCode1 + \beta_2*RelCode2 + \epsilon_i\]
The null hypothesis is that the model parameter corresponding to
RelCode1 will be equal to zero.
\[H_0: \beta_1\ = 0\] The
alternative hypothesis is that the model parameter corresponding to
RelCode1 will not be equal to zero.
\[H_1: \beta_1\ \neq 0\]
Let’s fit the model predicting Disagreebleness from
RelationshipStatus, which we have just made custom contrast
codes for.
model <- lm(Disagreeableness ~ RelationshipStatus, data = data)
augment_model <- augment(model)
augment_model
## # A tibble: 162 × 8
## Disagreeableness RelationshipStatus .fitted .resid .hat .sigma .cooksd
## <int> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 5 Seriously Dating 5.20 -0.204 0.0185 1.17 0.000194
## 2 4 Seriously Dating 5.20 -1.20 0.0185 1.17 0.00678
## 3 7 Dating 5.71 1.29 0.0294 1.17 0.0127
## 4 6 Seriously Dating 5.20 0.796 0.0185 1.17 0.00297
## 5 5 Seriously Dating 5.20 -0.204 0.0185 1.17 0.000194
## 6 6 Seriously Dating 5.20 0.796 0.0185 1.17 0.00297
## 7 6 Dating 5.71 0.294 0.0294 1.17 0.000657
## 8 5 Dating 5.71 -0.706 0.0294 1.17 0.00378
## 9 5 Dating 5.71 -0.706 0.0294 1.17 0.00378
## 10 5 Dating 5.71 -0.706 0.0294 1.17 0.00378
## # ℹ 152 more rows
## # ℹ 1 more variable: .std.resid <dbl>
ggplot(data = augment_model, aes(x = .resid)) +
geom_density(fill = "purple") +
stat_function(linetype = 2,
fun = dnorm,
args = list(mean = mean(augment_model$.resid),
sd = sd(augment_model$.resid)))
ggplot(model) +
geom_abline() +
stat_qq(aes(sample = .stdresid))
data$ID <- c(1:nrow(data))
augment_model$ID <- data$ID
ggplot(data = augment_model, aes(x = ID, y = .resid)) +
geom_point() +
geom_smooth(se = F) +
geom_hline(yintercept = 0)
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
plot(model, 1)
leveneTest(Disagreeableness ~ RelationshipStatus, data = data)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 2.7548 0.06666 .
## 159
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
First, let’s pass the model to the summary()
function.
summary(model)
##
## Call:
## lm(formula = Disagreeableness ~ RelationshipStatus, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.2037 -0.7059 0.2941 0.8378 1.8378
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.35725 0.09672 55.389 <2e-16 ***
## RelationshipStatusRelCode1 -0.29263 0.18691 -1.566 0.1194
## RelationshipStatusRelCode2 0.50218 0.25628 1.960 0.0518 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.171 on 159 degrees of freedom
## Multiple R-squared: 0.03324, Adjusted R-squared: 0.02108
## F-statistic: 2.734 on 2 and 159 DF, p-value: 0.06803
Question: Can you interpret the meaning of each parameter estimate in the context of the current research scenario?
Question: Is the parameter estimate corresponding to
a test of the difference between the mean of the Single
condition and the average of the Dating and
Seriously Dating conditions significant or
non-significant?
We might also be interested in whether
RelationshipStatus overall mattered for
predicting scores on Disagreeableness. In other words, we
could also wish to examine the following model comparison:
\[Model C: Disagreeableness_i = \beta_0 + \epsilon_i\]
\[Model A: Disagreeableness_i = \beta_0 + \beta_1*RelCode1 + \beta_2*RelCode2 + \epsilon_i\]
The null hypothesis would be that \(\beta_1\) and \(\beta_2\) are both equal to zero.
\[H_0: \beta_1\ = \beta_2\ = 0\]
We can test this model comparison by looking at our model output
using the anova() function:
anova(model)
## Analysis of Variance Table
##
## Response: Disagreeableness
## Df Sum Sq Mean Sq F value Pr(>F)
## RelationshipStatus 2 7.492 3.7460 2.7338 0.06803 .
## Residuals 159 217.872 1.3703
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Remember that the significance of the findings does not speak to the size of the findings. Effect sizes are important to report to give the audience an idea of the practical importance of the effect or relationship being discussed, as well as for future researchers who may want to include your study in a meta-analysis.
# Eta-Squared for relationship status
etaSquared(model)
## eta.sq eta.sq.part
## RelationshipStatus 0.03324424 0.03324424
# Cohen's d for specific group comparisons
means <- emmeans(model, ~RelationshipStatus)
eff_size(means, sigma = sigma(model), edf = df.residual(model))
## contrast effect.size SE df lower.CL upper.CL
## Single - Dating -0.4645 0.209 159 -0.87690 -0.0521
## Single - Seriously Dating -0.0355 0.179 159 -0.38898 0.3180
## Dating - Seriously Dating 0.4290 0.220 159 -0.00599 0.8640
##
## sigma used for effect sizes: 1.171
## Confidence level used: 0.95
Single and Dating conditions.# Obtaining Cohen's d for Single vs the combination of the Dating and Seriously Dating conditions
single <- c(1,0,0)
dating <- c(0,1,0)
serious <- c(0,0,1)
dating_and_serious <- (dating + serious)/2
eff_size(means, method = list("Single vs Not single" = single - dating_and_serious), sigma = sigma(model), edf = df.residual(model))
## contrast effect.size SE df lower.CL upper.CL
## Single vs Not single -0.25 0.16 159 -0.567 0.0666
##
## sigma used for effect sizes: 1.171
## Confidence level used: 0.95
emmeans package.confint(model)
## 2.5 % 97.5 %
## (Intercept) 5.166226117 5.54827270
## RelationshipStatusRelCode1 -0.661782925 0.07652119
## RelationshipStatusRelCode2 -0.003964717 1.00832202
In this study, we examined whether relationship status predicted people’s disagreeableness. Specifically, we wanted to test whether people who were single had significantly different disagreeableness compared to people who were dating or seriously dating.
Using a linear regression analysis, we found that people who were single did not score significantly differently on disagreeableness compared to people who were dating or seriously dating, t(159) = -1.57, p = .119, d = 0.25, b1 = -0.29, 95%CI[-0.66, 0.08].
In fact, overall, relationship status did not significantly predict people’s disagreeableness, F(2, 159) = 2.73, p = .068, \(R^2\) = 0.03.