1 Loading Libraries

library(apaTables) # to create our APA-style correlation table
library(kableExtra) # to help format tables
library(psych) # for the describe() command
library(broom) # for the augment() command
library(ggplot2) # to visualize our results
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
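
Because ggplot2 is loaded after psych, the names alpha and %+% now refer to ggplot2's versions. This is harmless here, but if psych's reliability function is needed later, a minimal sketch (using R's built-in attitude dataset as stand-in items) is to call each function with its package prefix:

# explicitly use psych's alpha() for scale reliability, even though ggplot2 masks the name
psych::alpha(attitude)
# ggplot2's alpha() is an unrelated colour-transparency helper
ggplot2::alpha("blue", 0.5)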

2 Importing Data

# import the dataset you cleaned previously
# this will be the dataset you'll use throughout the rest of the semester
# use the ARC data you downloaded previously for lab
d <- read.csv(file="Data/eammi2_final.csv", header=T)
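
A quick sanity check right after importing (a minimal sketch; not required by the lab) confirms the file was read in with the expected rows and columns:

# confirm the data imported correctly: number of rows, column names, first few rows
nrow(d)
names(d)
head(d)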

3 Correlation: State Your Hypothesis

We predict that stress (measured by the PSS), need to belong (measured by the NTBS), subjective well-being (measured by the SWLS), and social media use (measured by the SMUS) will all be correlated with each other. Furthermore, we predict that subjective well-being will be lower in participants who are higher in stress or who report more use of social media.

4 Correlation: Check Your Variables

# you only need to check the variables you're using in the current analysis
# although you checked them previously, it's always a good idea to look them over again and be sure that everything is correct
str(d)
## 'data.frame':    3182 obs. of  7 variables:
##  $ ResponseId: chr  "R_BJN3bQqi1zUMid3" "R_2TGbiBXmAtxywsD" "R_12G7bIqN2wB2N65" "R_39pldNoon8CePfP" ...
##  $ gender    : chr  "f" "m" "m" "f" ...
##  $ race_rc   : chr  "white" "white" "white" "other" ...
##  $ swb       : num  4.33 4.17 1.83 5.17 3.67 ...
##  $ stress    : num  3.3 3.6 3.3 3.2 3.5 2.9 3.2 3 2.9 3.2 ...
##  $ belong    : num  3.4 3.4 3.6 3.6 3.2 3.4 3.5 3.2 3.5 2.7 ...
##  $ SocMedia  : num  4.27 2.09 3.09 3.18 3.36 ...
# since we're focusing on our continuous variables, we're going to subset them into their own dataframe. this will make some stuff we're doing later easier.
cont <- subset(d, select=c(stress,belong,swb,SocMedia))

# you can use the describe() command on an entire dataframe (d or cont) or just on a single variable (e.g., d$stress)
describe(cont)
##          vars    n mean   sd median trimmed  mad min max range  skew kurtosis
## stress      1 3175 3.27 0.41   3.30    3.26 0.44   1   5     4 -0.16     2.67
## belong      2 3175 3.31 0.49   3.30    3.33 0.44   1   5     4 -0.33     0.64
## swb         3 3178 4.47 1.32   4.67    4.53 1.48   1   7     6 -0.36    -0.46
## SocMedia    4 3175 3.13 0.78   3.18    3.16 0.67   1   5     4 -0.31     0.26
##            se
## stress   0.01
## belong   0.01
## swb      0.02
## SocMedia 0.01
# the stress variable has somewhat high kurtosis, which I'll ignore here. you don't need to discuss univariate normality in the results write-ups for the labs/homework, but you will need to discuss it in your final manuscript

# also use histograms to examine your continuous variables
hist(d$stress)

hist(d$belong)

hist(d$swb)

hist(d$SocMedia)

# last, use scatterplots to examine your continuous variables together
plot(d$stress, d$belong)

plot(d$stress, d$swb)

plot(d$stress, d$SocMedia)

plot(d$belong, d$swb)

plot(d$belong, d$SocMedia)

plot(d$swb, d$SocMedia)
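
As an alternative to the six separate plot() calls above, base R can draw every pairwise scatterplot in a single figure. A minimal sketch using the cont dataframe created earlier:

# scatterplot matrix of all four continuous variables at once
pairs(cont)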

5 Correlation: Check Your Assumptions

5.1 Pearson’s Correlation Coefficient Assumptions

  • Should have two measurements for each participant
  • Variables should be continuous and normally distributed
  • Outliers should be identified and removed
  • Relationship between the variables should be linear

5.1.1 Checking for Outliers

d <- na.omit(d)
cont <- na.omit(cont)

d$stress_std <- scale(d$stress, center=T, scale=T)
hist(d$stress_std)

sum(d$stress_std < -3 | d$stress_std > 3)
## [1] 34
d$belong_std <- scale(d$belong, center=T, scale=T)
hist(d$belong_std)

sum(d$belong_std < -3 | d$belong_std > 3)
## [1] 17
d$swb_std <- scale(d$swb, center=T, scale=T)
hist(d$swb_std)

sum(d$swb_std < -3 | d$swb_std > 3)
## [1] 0
d$SocMedia_std <- scale(d$SocMedia, center=T, scale=T)
hist(d$SocMedia_std)

sum(d$SocMedia_std < -3 | d$SocMedia_std > 3)
## [1] 0

5.2 Issues with My Data

All assumptions of Pearson’s correlation coefficient were met, except that two variables contained outliers, which we identified using z-scores (values more than 3 standard deviations from the mean). Perceived stress (stress) had 34 outliers and need to belong (belong) had 17. Outliers can distort the relationship between two variables and pull the correlation in their direction.
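
Because the standardized scores are already stored in d, one optional sensitivity check (a sketch only; it is not part of this lab, and the flagged cases are retained in the analyses below) would be to see how many cases fall beyond 3 SD on either variable and re-examine the data without them:

# optional sensitivity check: drop cases more than 3 SD from the mean on stress or belong
d_noout <- d[abs(as.numeric(d$stress_std)) <= 3 & abs(as.numeric(d$belong_std)) <= 3, ]
nrow(d) - nrow(d_noout)  # number of cases that would be excluded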

6 Correlation: Create a Correlation Matrix

corr_output_m <- corr.test(cont)
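
Because apaTables was loaded for exactly this purpose, the same matrix can also be exported as an APA-style table (the same format as Table 1 below). A minimal sketch; the filename is just an example:

# write an APA-formatted correlation table (Ms, SDs, rs with 95% CIs) to a Word document
apa.cor.table(cont, filename = "table1_correlations.doc", table.number = 1)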

7 Correlation: View Test Output

corr_output_m
## Call:corr.test(x = cont)
## Correlation matrix 
##          stress belong   swb SocMedia
## stress     1.00   0.21 -0.12     0.15
## belong     0.21   1.00 -0.05     0.28
## swb       -0.12  -0.05  1.00     0.11
## SocMedia   0.15   0.28  0.11     1.00
## Sample Size 
## [1] 3165
## Probability values (Entries above the diagonal are adjusted for multiple tests.) 
##          stress belong  swb SocMedia
## stress        0   0.00 0.00        0
## belong        0   0.00 0.01        0
## swb           0   0.01 0.00        0
## SocMedia      0   0.00 0.00        0
## 
##  To see confidence intervals of the correlations, print with the short=FALSE option
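
As the last line of the output notes, the confidence intervals for each correlation can be displayed by printing the corr.test object with short = FALSE:

# print the full corr.test output, including 95% confidence intervals for each correlation
print(corr_output_m, short = FALSE)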

8 Correlation: Write Up Results

To test our hypothesis that stress (measured by the PSS), need to belong (measured by the NTBS), subjective well-being (measured by the SWLS), and social media use (measured by the SMUS) would be correlated with one another, we calculated a series of Pearson’s correlation coefficients. Most of our data met the assumptions of the test. Two variables, perceived stress (stress) and need to belong (belong), did have outliers, so any significant results involving these variables should be interpreted with caution; the flagged cases were retained in the analysis reported below.

As predicted, we found that all four variables were significantly correlated (all ps < .01). The effect sizes of the correlations were small (|r|s ≤ .28; Cohen, 1988). The results partially supported our second hypothesis: subjective well-being was lower in participants who were higher in stress, but, contrary to our prediction, subjective well-being was slightly higher, not lower, in participants who reported more use of social media, as can be seen in the correlation coefficients reported in Table 1.

Table 1: Means, standard deviations, and correlations with confidence intervals

  Variable                          M      SD     1              2              3
  1. Perceived stress (PSS)         3.27   0.41
  2. Need to belong (NTBS)          3.31   0.49   .21**
                                                  [.18, .24]
  3. Subjective well-being (SWLS)   4.47   1.32   -.12**         -.05**
                                                  [-.15, -.09]   [-.08, -.01]
  4. Social media use (SMUS)        3.13   0.78   .15**          .28**          .11**
                                                  [.12, .19]     [.24, .31]     [.07, .14]

Note: M and SD are used to represent mean and standard deviation, respectively. Values in square brackets indicate the 95% confidence interval. The confidence interval is a plausible range of population correlations that could have caused the sample correlation.
* indicates p < .05. ** indicates p < .01.

References

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.

Cohen, S., Kamarck, T., & Mermelstein, R. (1983). A global measure of perceived stress. Journal of Health and Social Behavior, 24, 385–396. https://doi.org/10.2307/2136404

Diener, E. (2000). Subjective well-being: The science of happiness and a proposal for a national index. American Psychologist, 55(1), 34–43. https://doi.org/10.1037/0003-066X.55.1.34

Leary, M. R., Kelly, K. M., Cottrell, C. A., & Schreindorfer, L. S. (2013). Construct validity of the Need to Belong Scale: Mapping the nomological network. Journal of Personality Assessment, 95(6), 610–624. https://doi.org/10.1080/00223891.2013.819511

Yang, C., & Brown, B. B. (2013). Motives for using Facebook, patterns of Facebook activities, and late adolescents’ social adjustment to college. Journal of Youth and Adolescence, 42, 403–416. https://doi.org/10.1007/s10964-012-9836-x

9 Regression: State Your Hypothesis

We hypothesize that stress (measured by the PSS) will significantly predict subjective well-being (measured by the SWLS), and that the relationship will be negative.

10 Regression: Check Your Variables

# you only need to check the variables you're using in the current analysis
# although you checked them previously, it's always a good idea to look them over again and be sure that everything is correct
str(d)
## 'data.frame':    3159 obs. of  11 variables:
##  $ ResponseId  : chr  "R_BJN3bQqi1zUMid3" "R_2TGbiBXmAtxywsD" "R_12G7bIqN2wB2N65" "R_39pldNoon8CePfP" ...
##  $ gender      : chr  "f" "m" "m" "f" ...
##  $ race_rc     : chr  "white" "white" "white" "other" ...
##  $ swb         : num  4.33 4.17 1.83 5.17 3.67 ...
##  $ stress      : num  3.3 3.6 3.3 3.2 3.5 2.9 3.2 3 2.9 3.2 ...
##  $ belong      : num  3.4 3.4 3.6 3.6 3.2 3.4 3.5 3.2 3.5 2.7 ...
##  $ SocMedia    : num  4.27 2.09 3.09 3.18 3.36 ...
##  $ stress_std  : num [1:3159, 1] 0.0819 0.82 0.0819 -0.1642 0.574 ...
##   ..- attr(*, "scaled:center")= num 3.27
##   ..- attr(*, "scaled:scale")= num 0.406
##  $ belong_std  : num [1:3159, 1] 0.177 0.177 0.583 0.583 -0.229 ...
##   ..- attr(*, "scaled:center")= num 3.31
##   ..- attr(*, "scaled:scale")= num 0.493
##  $ swb_std     : num [1:3159, 1] -0.108 -0.234 -2.002 0.523 -0.613 ...
##   ..- attr(*, "scaled:center")= num 4.48
##   ..- attr(*, "scaled:scale")= num 1.32
##  $ SocMedia_std: num [1:3159, 1] 1.4645 -1.3355 -0.0522 0.0645 0.2978 ...
##   ..- attr(*, "scaled:center")= num 3.13
##   ..- attr(*, "scaled:scale")= num 0.779
##  - attr(*, "na.action")= 'omit' Named int [1:23] 61 199 304 421 511 728 743 1047 1084 1141 ...
##   ..- attr(*, "names")= chr [1:23] "61" "199" "304" "421" ...
# you can use the describe() command on an entire dataframe (d) or just on a single variable
describe(d)
##              vars    n    mean     sd  median trimmed     mad   min     max
## ResponseId*     1 3159 1580.00 912.07 1580.00 1580.00 1171.25  1.00 3159.00
## gender*         2 3159    1.28   0.49    1.00    1.21    0.00  1.00    3.00
## race_rc*        3 3159    5.54   2.12    7.00    5.88    0.00  1.00    7.00
## swb             4 3159    4.48   1.32    4.67    4.53    1.48  1.00    7.00
## stress          5 3159    3.27   0.41    3.30    3.26    0.44  1.00    5.00
## belong          6 3159    3.31   0.49    3.40    3.33    0.44  1.00    5.00
## SocMedia        7 3159    3.13   0.78    3.18    3.16    0.67  1.00    5.00
## stress_std      8 3159    0.00   1.00    0.08   -0.01    1.09 -5.58    4.26
## belong_std      9 3159    0.00   1.00    0.18    0.03    0.90 -4.69    3.42
## swb_std        10 3159    0.00   1.00    0.14    0.04    1.12 -2.63    1.91
## SocMedia_std   11 3159    0.00   1.00    0.06    0.03    0.86 -2.74    2.40
##                range  skew kurtosis    se
## ResponseId*  3158.00  0.00    -1.20 16.23
## gender*         2.00  1.39     0.88  0.01
## race_rc*        6.00 -0.99    -0.67  0.04
## swb             6.00 -0.36    -0.45  0.02
## stress          4.00 -0.11     2.48  0.01
## belong          4.00 -0.30     0.54  0.01
## SocMedia        4.00 -0.31     0.26  0.01
## stress_std      9.84 -0.11     2.48  0.02
## belong_std      8.12 -0.30     0.54  0.02
## swb_std         4.55 -0.36    -0.45  0.02
## SocMedia_std    5.13 -0.31     0.26  0.02
# also use histograms to examine your continuous variables
hist(d$stress)

hist(d$swb)

# last, use scatterplots to examine your continuous variables together
plot(d$stress, d$swb)

11 Regression: Run a Simple Regression

# to calculate standardized coefficients, we have to standardize our IV
d$stress_std <- scale(d$stress, center=T, scale=T)
hist(d$stress_std)

# use the lm() command to run the regression
# dependent/outcome variable on the left, independent/predictor variable on the right
reg_model <- lm(swb ~ stress_std, data = d)
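
For comparison (a sketch only, not required for the lab), fitting the same model with the raw, unstandardized predictor gives identical t, p, and R² values; only the slope's units change (SWLS points per raw PSS point rather than per standard deviation of stress):

# same model with the unstandardized predictor, for comparison
reg_model_raw <- lm(swb ~ stress, data = d)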

12 Regression: Check Your Assumptions

12.1 Simple Regression Assumptions

  • Should have two measurements for each participant
  • Variables should be continuous and normally distributed
  • Outliers should be identified and removed
  • Relationship between the variables should be linear
  • Residuals should be normal and have constant variance

12.2 Create plots and view residuals

model.diag.metrics <- augment(reg_model)

ggplot(model.diag.metrics, aes(x = stress_std, y = swb)) +
  geom_point() +
  stat_smooth(method = lm, se = FALSE) +
  geom_segment(aes(xend = stress_std, yend = .fitted), color = "red", linewidth = 0.3)
## `geom_smooth()` using formula = 'y ~ x'

12.3 Check linearity with Residuals vs Fitted plot

plot(reg_model, 1)

The Residuals vs Fitted plot suggests some minor non-linearity, but nothing drastic enough to affect our results. The red line shows small curves, and our plot resembles the example ‘good’ plots more than the plots that demonstrate clear problems.
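
The assumptions list above also calls for residuals that are normally distributed with constant variance. Those checks are not shown in this lab, but base R's built-in diagnostic plots for lm objects cover them; a minimal sketch:

# Normal Q-Q plot of residuals (points should fall close to the diagonal line)
plot(reg_model, 2)
# Scale-Location plot (a roughly flat red line suggests constant variance)
plot(reg_model, 3)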

12.4 Check for outliers

# Cook's distance
plot(reg_model, 4)

# Residuals vs Leverage
plot(reg_model, 5)

Our data do not have any severe outliers. Using the common Cook’s distance cutoff of 0.5, our largest value (a little over 0.015) is far below the threshold. The three most influential cases are participants 1663, 2038, and 2882, with 1663 having the largest Cook’s distance, but none come close to the cutoff.
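
The Cook’s distance values behind the plot can also be extracted directly; a minimal sketch that lists the most influential cases by row label:

# extract Cook's distance for every observation and show the five largest values
cooks_d <- cooks.distance(reg_model)
sort(cooks_d, decreasing = TRUE)[1:5]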

12.5 Issues with My Data

Before interpreting our results, we assessed our variables to see if they met the assumptions for a simple linear regression. Analysis of a Residuals vs Fitted plot suggested that there is some minor non-linearity, but not enough to violate the assumption of linearity. We also checked Cook’s distance and a Residuals vs Leverage plot to detect outliers. Our highest detected outliers were all below the recommended cutoff for Cook’s distance.

13 Regression: View Test Output

summary(reg_model)
## 
## Call:
## lm(formula = swb ~ stress_std, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9832 -0.8887  0.1241  0.9847  3.0584 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.47573    0.02331 192.004  < 2e-16 ***
## stress_std  -0.16281    0.02331  -6.983 3.51e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.31 on 3157 degrees of freedom
## Multiple R-squared:  0.01521,    Adjusted R-squared:  0.0149 
## F-statistic: 48.76 on 1 and 3157 DF,  p-value: 3.505e-12
# note for section below: the standardized regression coefficient is reported with the lowercase Greek letter beta (β). If you can't type it directly, copy/paste it from another document or use your editor's symbol/Unicode input
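
Confidence intervals for the regression coefficients are not printed by summary(), but can be obtained with confint(); a minimal sketch:

# 95% confidence intervals for the intercept and the (standardized-predictor) slope
confint(reg_model)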

14 Regression: Write Up Results

To test our hypothesis that perceived stress (measured by the PSS) will significantly predict subjective well-being (measured by the SWLS), and that the relationship will be negative, we used a simple linear regression to model the relationship between the variables. We confirmed that our data met the assumptions of a linear regression, checking the linearity of the relationship using a Residuals vs Fitted plot and checking for outliers using Cook’s distance and a Residuals vs Leverage plot.

As predicted, we found that perceived stress significantly predicted subjective well-being, Adj. R² = .015, F(1, 3157) = 48.76, p < .001. The relationship between perceived stress and subjective well-being was negative, β = -0.16, t(3157) = -6.98, p < .001 (refer to Figure 1). According to Cohen (1988), this constitutes a small effect size.

References

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.

Cohen, S., Kamarck, T., & Mermelstein, R. (1983). A global measure of perceived stress. Journal of Health and Social Behavior, 24, 385–396. https://doi.org/10.2307/2136404

Diener, E. (2000). Subjective well-being: The science of happiness and a proposal for a national index. American Psychologist, 55(1), 34–43. https://doi.org/10.1037/0003-066X.55.1.34