1 Loading Libraries

library(psych) # for the describe() command
library(car) # for the vif() command

## Loading required package: carData

## 
## Attaching package: 'car'

## The following object is masked from 'package:psych':
## 
##     logit

library(sjPlot) # to visualize our results

## Learn more about sjPlot with 'browseVignettes("sjPlot")'.

2 Importing Data

# import the dataset you cleaned previously
# this will be the dataset you'll use throughout the rest of the semester
# use EAMMi2 data
d <- read.csv(file="eammi2_final.csv", header=TRUE)

3 State Your Hypothesis

We hypothesize that stress (measured by the PSS), need to belong (measured by the NTBS), and social media use (measured by the SMUS) will significantly predict subjective well-being (measured by the SWLS).

4 Check Your Variables

# you only need to check the variables you're using in the current analysis
# although you checked them previously, it's always a good idea to look them over again and be sure that everything is correct
str(d)

## 'data.frame':    3182 obs. of  7 variables:
##  $ ResponseId: chr  "R_BJN3bQqi1zUMid3" "R_2TGbiBXmAtxywsD" "R_12G7bIqN2wB2N65" "R_39pldNoon8CePfP" ...
##  $ gender    : chr  "f" "m" "m" "f" ...
##  $ race_rc   : chr  "white" "white" "white" "other" ...
##  $ swb       : num  4.33 4.17 1.83 5.17 3.67 ...
##  $ stress    : num  3.3 3.6 3.3 3.2 3.5 2.9 3.2 3 2.9 3.2 ...
##  $ belong    : num  3.4 3.4 3.6 3.6 3.2 3.4 3.5 3.2 3.5 2.7 ...
##  $ SocMedia  : num  4.27 2.09 3.09 3.18 3.36 ...

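# keep only the variables in this analysis and drop rows with missing values
cont <- na.omit(subset(d, select=c(swb,belong,stress,SocMedia)))
# standardize the predictors (mean 0, SD 1); as.numeric() keeps each column a plain vector
cont$belong <- as.numeric(scale(cont$belong, center=TRUE, scale=TRUE))
cont$stress <- as.numeric(scale(cont$stress, center=TRUE, scale=TRUE))
cont$SocMedia <- as.numeric(scale(cont$SocMedia, center=TRUE, scale=TRUE))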
# you can use the describe() command on an entire dataframe (d) or just on a single variable
describe(cont)

##          vars    n mean   sd median trimmed  mad   min  max range  skew
## swb         1 3165 4.47 1.32   4.67    4.53 1.48  1.00 7.00  6.00 -0.36
## belong      2 3165 0.00 1.00  -0.02    0.03 0.90 -4.67 3.41  8.08 -0.33
## stress      3 3165 0.00 1.00   0.08   -0.01 1.09 -5.55 4.25  9.80 -0.16
## SocMedia    4 3165 0.00 1.00   0.07    0.03 0.86 -2.73 2.40  5.13 -0.31
##          kurtosis   se
## swb         -0.46 0.02
## belong       0.64 0.02
## stress       2.67 0.02
## SocMedia     0.26 0.02
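
The predictors above have a mean of 0 and an SD of 1 because scale() converts them to z-scores. A minimal sketch of what that does, using simulated scores (the vector x is hypothetical, not EAMMi2 data):

```r
# Simulated scores stand in for a real scale mean (hypothetical data).
set.seed(1)
x <- rnorm(200, mean = 3.2, sd = 0.5)

# Manual z-scores: subtract the mean, then divide by the sample SD.
z_manual <- (x - mean(x)) / sd(x)

# scale() returns a one-column matrix, so as.numeric() flattens it.
z_scale <- as.numeric(scale(x, center = TRUE, scale = TRUE))

# The two approaches should agree, with mean 0 and SD 1.
all.equal(z_manual, z_scale)
round(c(mean = mean(z_scale), sd = sd(z_scale)), 2)
```

Only the predictors were standardized in this analysis, so swb keeps its original 1 to 7 metric.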

# also use histograms to examine your continuous variables
hist(cont$belong)

hist(cont$stress)

hist(cont$SocMedia)

hist(cont$swb)

# last, use scatterplots to examine your continuous variables together
plot(cont$stress, cont$swb)

plot(cont$stress, cont$belong)

plot(cont$stress, cont$SocMedia)

plot(cont$belong, cont$swb)

plot(cont$SocMedia, cont$swb)

plot(cont$belong, cont$SocMedia)

5 View Your Correlations

corr_output_m <- corr.test(cont)
corr_output_m

## Call:corr.test(x = cont)
## Correlation matrix 
##            swb belong stress SocMedia
## swb       1.00  -0.05  -0.12     0.11
## belong   -0.05   1.00   0.21     0.28
## stress   -0.12   0.21   1.00     0.15
## SocMedia  0.11   0.28   0.15     1.00
## Sample Size 
## [1] 3165
## Probability values (Entries above the diagonal are adjusted for multiple tests.) 
##           swb belong stress SocMedia
## swb      0.00   0.01      0        0
## belong   0.01   0.00      0        0
## stress   0.00   0.00      0        0
## SocMedia 0.00   0.00      0        0
## 
##  To see confidence intervals of the correlations, print with the short=FALSE option
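
The probability values above the diagonal are adjusted for multiple tests. As far as I know, corr.test() uses Holm's method by default, so the sketch below shows that adjustment in base R. The raw p-values are made up for illustration and are not taken from this dataset.

```r
# Hypothetical raw p-values for six pairwise correlations.
p_raw <- c(0.004, 0.030, 0.0001, 0.012, 0.200, 0.0005)

# Holm's step-down correction: the smallest p is multiplied by 6,
# the next smallest by 5, and so on, keeping the results monotone.
p_holm <- p.adjust(p_raw, method = "holm")

# Adjusted values are never smaller than the raw ones.
data.frame(p_raw, p_holm)
```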

6 Run a Multiple Linear Regression

# use the lm() command to run the regression
# dependent/outcome variable on the left, independent/predictor variables on the right
reg_model <- lm(swb ~ belong + stress + SocMedia, data = cont)
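
Because the predictors were standardized, the intercept of this model should equal the mean of the outcome. A small sketch with simulated data (the sim data frame and its coefficients are hypothetical) illustrates why:

```r
# Simulated data: two predictors and an outcome (all hypothetical).
set.seed(42)
n <- 500
sim <- data.frame(x1 = rnorm(n, 3, 0.6), x2 = rnorm(n, 3, 0.7))
sim$y <- 4.5 - 0.2 * sim$x1 + 0.3 * sim$x2 + rnorm(n)

# Standardize the predictors only, as in the analysis above.
sim$x1 <- as.numeric(scale(sim$x1))
sim$x2 <- as.numeric(scale(sim$x2))

fit <- lm(y ~ x1 + x2, data = sim)

# With mean-centered predictors, the intercept is the mean of y.
coef(fit)["(Intercept)"]
mean(sim$y)
```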

7 Check Your Assumptions

7.1 Multiple Linear Regression Assumptions

Observations should be independent
Number of cases should be adequate (N ≥ 80 + 8m, where m is the number of IVs)
Independent variables should not be too correlated (aka multicollinearity)
Variables should be continuous and normally distributed
Outliers should be identified and removed
Relationship between the variables should be linear
Residuals should be normal and have constant variance

7.2 Count Number of Cases

needed <- 80 + 8*3
nrow(cont) >= needed

## [1] TRUE

7.3 Check multicollinearity

Higher values indicate more multicollinearity
Cutoff is usually 5

For your homework, you will need to discuss multicollinearity and any high values, but you don’t have to drop any variables.

vif(reg_model)

##   belong   stress SocMedia 
## 1.117089 1.056837 1.093380
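
To see where these numbers come from: the VIF for a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on the other predictors. The sketch below computes it by hand in base R on simulated predictors (x1, x2, and x3 are hypothetical), so it illustrates the idea rather than reproducing car's code.

```r
# Simulated, mildly correlated predictors (hypothetical).
set.seed(7)
n <- 400
x1 <- rnorm(n)
x2 <- 0.4 * x1 + rnorm(n)
x3 <- 0.3 * x1 + 0.3 * x2 + rnorm(n)
preds <- data.frame(x1, x2, x3)

# For each predictor, regress it on the others and convert R^2 to a VIF.
vif_by_hand <- sapply(names(preds), function(v) {
  others <- setdiff(names(preds), v)
  r2 <- summary(lm(reformulate(others, response = v), data = preds))$r.squared
  1 / (1 - r2)
})

round(vif_by_hand, 3)
```

A VIF of 1 means a predictor shares no variance with the others, so values near 1.1 like the ones above are far below the cutoff of 5.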

7.4 Check linearity with Residuals vs Fitted plot

My plot resembles the ‘good’ plots more than the ‘bad’ plots. The red line stays close to the zero line throughout, with little sign of non-linearity.

plot(reg_model, 1)

7.5 Check for outliers using Cook’s distance and a Residuals vs Leverage plot

I identified the three cases with the highest Cook’s distance: participants 728, 2882, and 1663, with 728 being the highest. None of them came close to the cutoff, so I will not remove any participants.

# Cook's distance
plot(reg_model, 4)

# Residuals vs Leverage
plot(reg_model, 5)
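
The same check can be done numerically instead of by eye. This sketch uses simulated data and assumes a cutoff of 0.5, which is the first contour that plot() draws on the Residuals vs Leverage plot; the cutoff used in class may differ.

```r
# Simulated data and model (hypothetical, not the EAMMi2 model).
set.seed(3)
n <- 300
sim <- data.frame(x = rnorm(n))
sim$y <- 1 + 0.5 * sim$x + rnorm(n)
fit <- lm(y ~ x, data = sim)

# Cook's distance for every case, named by row number.
cd <- cooks.distance(fit)

# The three most influential cases, largest first.
top3 <- sort(cd, decreasing = TRUE)[1:3]
top3

# Assumed cutoff of 0.5: is any case above it?
cutoff <- 0.5
any(cd > cutoff)
```

Swapping fit for reg_model would list the same three participants labeled on the Cook's distance plot.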

7.6 Check homogeneity of variance in a Scale-Location plot

My plot looks more like the ‘good’ plots. The red line is mostly horizontal, and although it starts to deviate from the mean line, the deviation is not extreme.

plot(reg_model, 3)

7.7 Check normality of residuals with a Q-Q plot

My plot resembles the ‘good’ plot examples more than the ‘bad’ ones. The points stay close to the dashed line throughout, indicating no serious skew or kurtosis.

plot(reg_model, 2)

7.8 Issues with My Data

Before interpreting our results, we assessed our variables to see if they met the assumptions for a multiple linear regression. We analyzed a Scale-Location plot and detected some issues with homogeneity of variance, as well as some issues with linearity in a Residuals vs Fitted plot (but nothing extreme). However, we did not detect any outliers (visually analyzing a Residuals vs Leverage plot) or any serious issues with the normality of our residuals (visually analyzing a Q-Q plot).

8 View Test Output

summary(reg_model)

## 
## Call:
## lm(formula = swb ~ belong + stress + SocMedia, data = cont)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.0330 -0.9111  0.1341  0.9781  3.6096 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.47246    0.02311 193.519  < 2e-16 ***
## belong      -0.08067    0.02443  -3.302 0.000971 ***
## stress      -0.16989    0.02376  -7.149 1.08e-12 ***
## SocMedia     0.18745    0.02417   7.756 1.18e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.3 on 3161 degrees of freedom
## Multiple R-squared:  0.03331,    Adjusted R-squared:  0.03239 
## F-statistic: 36.31 on 3 and 3161 DF,  p-value: < 2.2e-16
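
The last two lines of the output are linked: adjusted R² and the F-statistic can both be recovered from Multiple R-squared, the sample size, and the number of predictors. A short check using the values printed above (R² is rounded, so the results should match only approximately):

```r
# Values copied from the summary() output above.
r2 <- 0.03331   # Multiple R-squared
n  <- 3165      # observations
k  <- 3         # predictors

# Adjusted R^2 penalizes R^2 for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

# F compares explained variance per predictor to residual variance per df.
f_stat <- (r2 / k) / ((1 - r2) / (n - k - 1))

# Upper-tail p-value on k and n - k - 1 degrees of freedom.
p_value <- pf(f_stat, k, n - k - 1, lower.tail = FALSE)

round(c(adj_r2 = adj_r2, F = f_stat), 4)
```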

# note for section below: to type lowercase Beta below (ß) you need to hold down Alt key and type 225 on numeric keypad. If that doesn't work you should be able to copy/paste it from somewhere else

9 Write Up Results

To test our hypothesis that stress (measured by the PSS), need to belong (measured by the NTBS), and social media use (measured by the SMUS) will significantly predict subjective well-being (measured by the SWLS), we used a multiple linear regression to model the relationship between the variables. We checked whether our data met the assumptions of a linear regression; although there were some small issues with homogeneity of variance and linearity, we continued with the analysis.

Our model was statistically significant, Adj. R² = .03, F(3, 3161) = 36.31, p < .001. The relationship between social media use and subjective well-being was positive, while the relationships between our remaining predictors (need to belong and perceived stress) and our outcome (subjective well-being) were negative. All three effects were small at most (per Cohen, 1988): the estimates range from -0.17 to 0.19 per standard deviation of the predictor, on an outcome with a standard deviation of 1.32. Full output from the regression model is reported in Table 1.
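
One way to gauge effect size is to put the estimates on a fully standardized scale. The predictors were already standardized, so dividing each estimate by the SD of swb (1.32 in the describe() output) gives approximate standardized betas. Comparing them to .10, .30, and .50 borrows Cohen's benchmarks for correlations, which is an assumption on my part, as is using the rounded values printed in this report.

```r
# Estimates copied from the summary() output (predictors in SD units).
b <- c(belong = -0.08067, stress = -0.16989, SocMedia = 0.18745)

# SD of the outcome, copied from the describe() output.
sd_swb <- 1.32

# Approximate standardized betas: SDs of swb per SD of each predictor.
beta_std <- b / sd_swb
round(beta_std, 2)

# Cohen's f^2 for the whole model, from Multiple R-squared.
f2 <- 0.03331 / (1 - 0.03331)
round(f2, 3)
```

For f², Cohen's (1988) benchmarks are .02 (small), .15 (medium), and .35 (large).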

Table 1: Regression model of subjective well-being
	Subjective Well-being (SWLS)
Predictors	Estimates	SE	CI	p
Intercept	4.47	0.02	4.43 – 4.52	<0.001
Need to Belong (NTBS)	-0.08	0.02	-0.13 – -0.03	0.001
Perceived Stress (PSS)	-0.17	0.02	-0.22 – -0.12	<0.001
Social Media Use (SMUS)	0.19	0.02	0.14 – 0.23	<0.001
Observations	3165
R² / R² adjusted	0.033 / 0.032
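
The CI column in Table 1 can be checked against the estimates and standard errors from summary(). This sketch assumes the intervals are 95% and based on the t distribution with the model's residual degrees of freedom; the numbers are copied from the output earlier in this report.

```r
# Estimates and standard errors copied from the summary() output.
est <- c(Intercept = 4.47246, belong = -0.08067,
         stress = -0.16989, SocMedia = 0.18745)
se  <- c(0.02311, 0.02443, 0.02376, 0.02417)

# Critical t value for a 95% interval on 3161 residual df.
df_resid <- 3161
t_crit <- qt(0.975, df_resid)

# Lower and upper bounds: estimate minus/plus t_crit times SE.
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
round(ci, 2)
```

With the fitted model available, confint(reg_model) should return the same intervals without copying numbers by hand.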

References

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.

Running a Multiple Linear Regression HW

Sophie Sabo

2023-06-18