#install.packages("sjPlot")
library(psych)
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
library(sjPlot)
d <- read.csv(file="Data/projectdata.csv", header=TRUE)
We hypothesize that perceived stress and social support will significantly predict subjective wellbeing.
str(d)
## 'data.frame': 3162 obs. of 7 variables:
## $ ResponseID: chr "R_BJN3bQqi1zUMid3" "R_2TGbiBXmAtxywsD" "R_12G7bIqN2wB2N65" "R_39pldNoon8CePfP" ...
## $ gender : chr "f" "m" "m" "f" ...
## $ socmeduse : int 47 23 34 35 37 13 37 43 37 29 ...
## $ stress : num 3.3 3.3 4 3.2 3.1 3.5 3.3 2.4 2.9 2.7 ...
## $ swb : num 4.33 4.17 1.83 5.17 3.67 ...
## $ belong : num 2.8 4.2 3.6 4 3.4 4.2 3.9 3.6 2.9 2.5 ...
## $ support : num 6 6.75 5.17 5.58 6 ...
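# Keep only the model variables and drop rows with missing values
cont <- na.omit(subset(d, select=c(swb, stress, support)))
# Row identifier, used later if outliers need to be removed
cont$row_id <- 1:nrow(cont)
# Standardize the predictors (mean 0, SD 1); as.numeric() drops the matrix that scale() returns
cont$stress <- as.numeric(scale(cont$stress, center=TRUE, scale=TRUE))
cont$support <- as.numeric(scale(cont$support, center=TRUE, scale=TRUE))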
describe(cont)
## vars n mean sd median trimmed mad min max range
## swb 1 3162 4.48 1.32 4.67 4.53 1.48 1.00 7.00 6.00
## stress 2 3162 0.00 1.00 -0.08 -0.01 0.99 -2.92 2.75 5.67
## support 3 3162 0.00 1.00 0.19 0.11 0.88 -4.90 1.30 6.20
## row_id 4 3162 1581.50 912.94 1581.50 1581.50 1172.00 1.00 3162.00 3161.00
## skew kurtosis se
## swb -0.36 -0.45 0.02
## stress 0.03 -0.17 0.02
## support -1.10 1.43 0.02
## row_id 0.00 -1.20 16.24
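To make explicit what `scale()` does to `stress` and `support`, here is a minimal sketch on made-up numbers (the vector `x` is hypothetical, not taken from the project data). It checks that `scale()` with centering and scaling turned on matches a hand-computed z-score, and shows that the result is a one-column matrix that `as.numeric()` flattens to a plain vector.

```r
# Hypothetical values, for illustration only
x <- c(2, 4, 4, 5, 7, 8)

# z-score by hand: subtract the mean, divide by the sample SD
z_manual <- (x - mean(x)) / sd(x)

# scale() does the same computation but returns a one-column matrix
z_scale <- scale(x, center = TRUE, scale = TRUE)

is.matrix(z_scale)                        # TRUE
all.equal(z_manual, as.numeric(z_scale))  # TRUE
```

This is why the standardized predictors show a mean of 0 and an SD of 1 in the `describe()` output.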
hist(cont$swb)
hist(cont$stress)
hist(cont$support)
plot(cont$stress, cont$swb)
plot(cont$support, cont$swb)
plot(cont$stress, cont$support)
corr_output_m <- corr.test(x = cont[, c("stress", "support", "swb")])
corr_output_m
## Call:corr.test(x = cont[, c("stress", "support", "swb")])
## Correlation matrix
## stress support swb
## stress 1.00 -0.21 -0.50
## support -0.21 1.00 0.47
## swb -0.50 0.47 1.00
## Sample Size
## [1] 3162
## Probability values (Entries above the diagonal are adjusted for multiple tests.)
## stress support swb
## stress 0 0 0
## support 0 0 0
## swb 0 0 0
##
## To see confidence intervals of the correlations, print with the short=FALSE option
#cont <- subset(cont, !(row_id %in% c()))  # list the row_ids to drop inside c()
reg_model <- lm(swb ~ stress + support, data = cont)
Assumptions we’ve discussed previously:
New assumptions:
needed <- 80 + 8*2
nrow(cont) >= needed
## [1] TRUE
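The rule of thumb used in the check above (at least 80 cases plus 8 per predictor) can be wrapped in a small helper so the number of predictors is not hard-coded. `min_cases()` is a hypothetical name introduced here for illustration.

```r
# Minimum number of cases under the 80 + 8 * (number of predictors) rule
min_cases <- function(n_predictors) {
  80 + 8 * n_predictors
}

min_cases(2)          # 96, the same as 80 + 8*2
3162 >= min_cases(2)  # TRUE for the 3162 complete cases in this lab
```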
vif(reg_model)
## stress support
## 1.046669 1.046669
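For intuition about what `vif()` reports: the VIF for a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on the other predictors. With only two predictors, that R² is simply their squared correlation, which is why both VIFs are identical. The sketch below checks this on simulated data using base R only; the variables `x1` and `x2` and the seed are made up for illustration.

```r
set.seed(1)                  # arbitrary seed, for reproducibility
n  <- 500
x1 <- rnorm(n)
x2 <- -0.2 * x1 + rnorm(n)   # x2 is weakly related to x1

# Auxiliary regression: how well does x2 predict x1?
r2_aux <- summary(lm(x1 ~ x2))$r.squared
vif_x1 <- 1 / (1 - r2_aux)

# With two predictors, the same value follows from their correlation
vif_from_r <- 1 / (1 - cor(x1, x2)^2)

all.equal(vif_x1, vif_from_r)  # TRUE
```

Applying the same formula to the rounded correlation between stress and support reported earlier (r = -0.21) gives 1 / (1 - 0.21²), or about 1.05, in line with the `vif()` output.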
The plot below shows the residual for each case plotted against its fitted (predicted) value. The red line is a smoothed average of the residuals along the range of fitted values. If the assumption of linearity is met, the red line should be roughly horizontal, which indicates that the residuals average to around zero across that range. However, a bit of deviation is okay: just like with skewness and kurtosis, there’s a range that we can work in before non-linearity becomes a critical issue. For some examples of good Residuals vs Fitted plots and ones that show serious errors, check out this page.
plot(reg_model, 1)
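For readers who want to see what a Residuals vs Fitted plot is built from, the sketch below reproduces a similar plot by hand on simulated data. The data, seed, and object names are made up for illustration, and the `lowess()` smoother is intended to approximate the red line that `plot()` draws, not to guarantee an identical curve.

```r
set.seed(2)
x1  <- rnorm(300)
x2  <- rnorm(300)
y   <- 4.5 - 0.5 * x1 + 0.5 * x2 + rnorm(300)
fit <- lm(y ~ x1 + x2)

res      <- resid(fit)   # observed minus predicted
fitted_y <- fitted(fit)  # predicted values (x-axis of the plot)

plot(fitted_y, res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 3)                     # reference line at zero
lines(lowess(fitted_y, res), col = "red")  # local average of the residuals

# Residuals from a model with an intercept average to zero overall
abs(mean(res)) < 1e-10
```

The overall mean of the residuals is always zero here; the plot matters because it shows whether the local average drifts away from zero in some part of the fitted range.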
The plots below both address leverage, or how much each data point is able to influence the regression line. Outliers are points that have undue influence on the regression line, the way that Bill Gates entering the room has an undue influence on the mean income.
The first plot, Cook’s distance, is a visualization of a score called (you guessed it) Cook’s distance, calculated for each case (aka row or participant) in the dataframe. Cook’s distance tells us how much the regression would change if that case were removed. The second plot also includes the residuals in the examination of leverage. The standardized residuals are on the y-axis and leverage is on the x-axis; this shows us which points have large residuals (are far from the regression line) and high leverage. Points that have both large residuals and high leverage are especially worrisome, because they are far from the regression line but are also exerting a large influence on it.
# Cook's distance
plot(reg_model, 4)
# Residuals vs Leverage
plot(reg_model, 5)
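The values behind both plots can also be pulled out directly with base R, which helps when deciding which `row_id`s to inspect. The sketch below uses simulated data with one planted extreme case. The data, the seed, the object names, and the 4/n cutoff for Cook's distance are all assumptions for illustration (4/n is one common rule of thumb, not a rule taken from this lab).

```r
set.seed(3)
n   <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y      <- 4.5 - 0.5 * dat$x1 + 0.5 * dat$x2 + rnorm(n)
dat$row_id <- 1:n

# Plant one case that is extreme on x1 and far from the trend
dat$x1[n] <- 6
dat$y[n]  <- 12

fit <- lm(y ~ x1 + x2, data = dat)

influence_tbl <- data.frame(
  row_id    = dat$row_id,
  cooks_d   = cooks.distance(fit),  # change in the fit if the case is dropped
  leverage  = hatvalues(fit),       # how unusual the predictor values are
  std_resid = rstandard(fit)        # standardized residual
)

# Cases above the (assumed) cutoff, largest Cook's distance first
flagged <- influence_tbl[influence_tbl$cooks_d > 4 / n, ]
flagged[order(-flagged$cooks_d), ]
```

A flagged case is a prompt to look at that row more closely, not an automatic reason to delete it.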
This plot is a bit new. It’s called a Q-Q plot and shows the standardized residuals plotted against the quantiles of a normal distribution. If our residuals are perfectly normal, the points will fall on the dashed line. This page shows how different types of non-normality appear on a Q-Q plot.
It’s normal for Q-Q plots to show a bit of deviation at the ends. This page shows some examples that help us put our Q-Q plot into context.
plot(reg_model, 2)
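A Q-Q plot pairs each sorted standardized residual with the value a standard normal distribution would put at the same rank. The sketch below builds that pairing by hand on simulated data (the data, seed, and object names are made up), using `qqnorm()` with `plot.it = FALSE` to get the coordinates. The reference line here is simply y = x, so it may differ slightly from the line that `plot()` draws for a fitted model.

```r
set.seed(4)
x   <- rnorm(300)
y   <- 1 + 0.5 * x + rnorm(300)
fit <- lm(y ~ x)

z  <- rstandard(fit)              # standardized residuals
qq <- qqnorm(z, plot.it = FALSE)  # x = theoretical, y = sample quantiles

plot(qq$x, qq$y, xlab = "Theoretical quantiles",
     ylab = "Standardized residuals")
abline(0, 1, lty = 2)             # y = x reference line

# Near-normal residuals hug the line, so the correlation between
# the two sets of quantiles is close to 1
cor(qq$x, qq$y)
```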
Before interpreting our results, we assessed our variables to see if they met the assumptions for a multiple linear regression. We did not detect any serious issues with linearity in a Residuals vs Fitted plot, as the red line remained approximately horizontal across the range of fitted values. We did not detect any prominent outliers (by visually analyzing Cook’s Distance and Residuals vs Leverage plots), nor were there any serious issues with the normality of our residuals (by visually analyzing a Q-Q plot, which tracked the diagonal line closely with only minor deviations at the tails). There were also no issues of multicollinearity between our two independent variables (VIF = 1.05 for both).
summary(reg_model)
##
## Call:
## lm(formula = swb ~ stress + support, data = cont)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.1946 -0.6882 0.0509 0.6776 4.1669
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.47554 0.01834 244.04 <2e-16 ***
## stress -0.55602 0.01877 -29.63 <2e-16 ***
## support 0.50426 0.01877 26.87 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.031 on 3159 degrees of freedom
## Multiple R-squared: 0.3908, Adjusted R-squared: 0.3904
## F-statistic: 1013 on 2 and 3159 DF, p-value: < 2.2e-16
Effect size, based on Regression ß (Beta Estimate) value in our output
To test our hypothesis that perceived stress and social support would significantly predict subjective wellbeing, we used a multiple regression to model the associations between these variables. We confirmed that our data met the assumptions of a linear regression.
Our hypothesis was supported. The model was statistically significant, Adj. R² = .39, F(2, 3159) = 1013, p < .001. Our results indicate that perceived stress negatively predicts subjective wellbeing and had a large effect size (|ß| > 0.50; per Cohen, 1988), while social support positively predicts subjective wellbeing and also had a large effect size (ß > 0.50). Because both predictors were standardized, this means that people’s subjective wellbeing decreases by 0.56 units for every one standard deviation increase in their perceived stress, while it increases by 0.50 units for every one standard deviation increase in their social support. Full output from the regression model is reported in Table 1.
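Because only the predictors were standardized with `scale()`, the estimates in `summary(reg_model)` are in raw `swb` units per standard deviation of each predictor. A fully standardized beta also divides by the standard deviation of the outcome. The sketch below illustrates the relationship on simulated data (the variables, coefficients, and seed are made up); it is a minimal illustration, not a replacement for the reported analysis.

```r
set.seed(5)
n  <- 1000
x1 <- rnorm(n, mean = 3, sd = 0.6)
x2 <- rnorm(n, mean = 5, sd = 1.2)
y  <- 4.5 - 0.9 * x1 + 0.4 * x2 + rnorm(n)

# Predictors standardized, outcome left in raw units (as in this lab)
z1 <- as.numeric(scale(x1))
z2 <- as.numeric(scale(x2))
b_partial <- coef(lm(y ~ z1 + z2))[-1]   # drop the intercept

# Outcome standardized as well: fully standardized betas
zy     <- as.numeric(scale(y))
b_full <- coef(lm(zy ~ z1 + z2))[-1]

# The two sets of slopes differ only by a factor of sd(y)
all.equal(b_partial / sd(y), b_full)     # TRUE
```

With the SD of `swb` reported by `describe()` (1.32), the same conversion would give roughly -0.56 / 1.32 ≈ -0.42 for stress and 0.50 / 1.32 ≈ 0.38 for support, so a comparison against effect-size benchmarks depends on which version of the coefficient is used.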
|  | Subjective Wellbeing |  |  |  |
|---|---|---|---|---|
| Predictors | Estimates | SE | CI | p |
| Intercept | 4.48 | 0.02 | 4.44 – 4.51 | <0.001 |
| Perceived Stress | -0.56 | 0.02 | -0.59 – -0.52 | <0.001 |
| Social Support | 0.50 | 0.02 | 0.47 – 0.54 | <0.001 |
| Observations | 3162 |  |  |  |
| R² / R² adjusted | 0.391 / 0.390 |  |  |  |
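The CI column in a table like Table 1 can be checked against base R: `confint()` returns the estimate plus or minus a t critical value times the standard error. The sketch below verifies this on simulated data (the variables and seed are made up for illustration).

```r
set.seed(6)
x1  <- rnorm(400)
x2  <- rnorm(400)
y   <- 4.5 - 0.5 * x1 + 0.5 * x2 + rnorm(400)
fit <- lm(y ~ x1 + x2)

est    <- coef(summary(fit))[, "Estimate"]
se     <- coef(summary(fit))[, "Std. Error"]
t_crit <- qt(0.975, df = df.residual(fit))  # two-sided 95% interval

ci_manual  <- cbind(est - t_crit * se, est + t_crit * se)
ci_builtin <- confint(fit, level = 0.95)

all.equal(unname(ci_manual), unname(ci_builtin))  # TRUE
```

For the intercept in the `summary()` output, 4.476 ± 1.96 × 0.018 gives roughly 4.44 to 4.51, which agrees with the table.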
References
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.