1 Loading Libraries

library(psych) # for the describe() command
library(broom) # for the augment() command
library(ggplot2) # to visualize our results

## 
## Attaching package: 'ggplot2'

## The following objects are masked from 'package:psych':
## 
##     %+%, alpha

2 Importing Data

# import the dataset you cleaned previously
# this will be the dataset you'll use throughout the rest of the semester
# use EAMMi2 data
d <- read.csv(file = "Data/mydata.csv", header = TRUE)

3 State Your Hypothesis

We hypothesize that the need to belong will significantly predict social media use, and that the relationship will be positive.

4 Check Your Variables

# you only need to check the variables you're using in the current analysis
# although you checked them previously, it's always a good idea to look them over again and be sure that everything is correct
str(d)

## 'data.frame':    2995 obs. of  9 variables:
##  $ gender          : chr  "f" "m" "m" "f" ...
##  $ edu             : chr  "2 Currently in college" "5 Completed Bachelors Degree" "2 Currently in college" "2 Currently in college" ...
##  $ moa_independence: num  3.67 3.67 3.5 3 3.83 ...
##  $ moa_role        : num  3 2.67 2.5 2 2.67 ...
##  $ moa_safety      : num  2.75 3.25 3 1.25 2.25 2.5 4 3.25 2.75 3.5 ...
##  $ moa_maturity    : num  3.67 3.33 3.67 3 3.67 ...
##  $ mindful         : num  2.4 1.8 2.2 2.2 3.2 ...
##  $ belong          : num  2.8 4.2 3.6 4 3.4 4.2 3.9 3.6 2.9 2.5 ...
##  $ socmeduse       : int  47 23 34 35 37 13 37 43 37 29 ...

# you can use the describe() command on an entire dataframe (d) or just on a single variable
describe(d)

##                  vars    n  mean   sd median trimmed  mad   min max range  skew
## gender*             1 2995  1.28 0.49   1.00    1.21 0.00  1.00   3  2.00  1.39
## edu*                2 2995  2.51 1.25   2.00    2.19 0.00  1.00   7  6.00  2.17
## moa_independence    3 2995  3.54 0.46   3.67    3.61 0.49  1.00   4  3.00 -1.43
## moa_role            4 2995  2.96 0.72   3.00    3.00 0.74  1.00   4  3.00 -0.32
## moa_safety          5 2995  3.20 0.64   3.25    3.26 0.74  1.00   4  3.00 -0.71
## moa_maturity        6 2995  3.59 0.43   3.67    3.65 0.49  1.00   4  3.00 -1.20
## mindful             7 2995  3.71 0.84   3.73    3.72 0.79  1.13   6  4.87 -0.06
## belong              8 2995  3.23 0.61   3.30    3.25 0.59  1.30   5  3.70 -0.26
## socmeduse           9 2995 34.45 8.58  35.00   34.73 7.41 11.00  55 44.00 -0.32
##                  kurtosis   se
## gender*              0.88 0.01
## edu*                 3.58 0.02
## moa_independence     2.49 0.01
## moa_role            -0.85 0.01
## moa_safety           0.03 0.01
## moa_maturity         1.90 0.01
## mindful             -0.13 0.02
## belong              -0.13 0.01
## socmeduse            0.27 0.16

# also use histograms to examine your continuous variables
hist(d$belong)

hist(d$socmeduse)

# last, use scatterplots to examine your continuous variables together
plot(d$belong, d$socmeduse)
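Since only belong and socmeduse are used in this analysis, the checks can be limited to those two columns. The sketch below uses a small made-up data frame (d_toy, with hypothetical values) in place of the real dataset so it runs on its own. It is one possible way to do this, not a required step.

```r
# Toy stand-in for `d` (hypothetical values; the real data come from read.csv above)
set.seed(1)
d_toy <- data.frame(
  gender    = sample(c("f", "m"), 50, replace = TRUE),
  belong    = round(runif(50, 1, 5), 1),
  socmeduse = sample(11:55, 50, replace = TRUE)
)

# Keep only the variables used in this analysis
vars <- c("belong", "socmeduse")
stopifnot(all(vars %in% names(d_toy)))  # fail early if a column is missing
d_small <- d_toy[, vars]

str(d_small)             # variable types
summary(d_small)         # base-R summary; describe(d_small) from psych also works
colSums(is.na(d_small))  # number of missing values per variable
```

With the real data, replacing d_toy with d gives the same checks on the two analysis variables.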

5 Run a Simple Regression

# to calculate standardized coefficients, we have to standardize both our IV and our DV
d$belong <- scale(d$belong, center = TRUE, scale = TRUE)
d$socmeduse <- scale(d$socmeduse, center = TRUE, scale = TRUE)

# use the lm() command to run the regression
# dependent/outcome variable on the left, independent/predictor variable on the right
reg_model <- lm(socmeduse ~ belong, data = d)
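When both variables are standardized in a simple regression, the slope equals the Pearson correlation between them and the intercept is essentially zero. The sketch below illustrates this with simulated data (the toy data frame and its column names are made up for the example). It wraps scale() in as.numeric() so the standardized columns are plain vectors rather than one-column matrices, which is an optional choice.

```r
# Simulated data (hypothetical; not the EAMMi2 dataset)
set.seed(42)
n <- 200
x <- rnorm(n, mean = 3.2, sd = 0.6)
y <- 30 + 4 * x + rnorm(n, sd = 8)
toy <- data.frame(x = x, y = y)

# scale() returns a one-column matrix; as.numeric() converts it to a plain vector
toy$x_z <- as.numeric(scale(toy$x))
toy$y_z <- as.numeric(scale(toy$y))

fit_z <- lm(y_z ~ x_z, data = toy)

beta_std <- unname(coef(fit_z)["x_z"])  # standardized slope
r_xy     <- cor(toy$x, toy$y)           # Pearson correlation of the raw variables

all.equal(beta_std, r_xy)  # the two values agree
coef(fit_z)                # the intercept is zero up to rounding error
```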

6 Check Your Assumptions

6.1 Simple Regression Assumptions

Should have two measurements for each participant
Variables should be continuous and normally distributed
Outliers should be identified and removed
Relationship between the variables should be linear
Residuals should be normal and have constant variance

Note: we will not be evaluating normality or constant variance in this lab/homework. We’ll come back to them next week when we talk about multiple linear regression.

6.2 Create plots and view residuals

model.diag.metrics <- augment(reg_model)

ggplot(model.diag.metrics, aes(x = belong, y = socmeduse)) +
  geom_point() +
  stat_smooth(method = lm, se = FALSE) +
  geom_segment(aes(xend = belong, yend = .fitted), color = "red", linewidth = 0.3)

## `geom_smooth()` using formula = 'y ~ x'
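The red segments in the plot above are the residuals: the vertical distance between each observed value and its fitted value. augment() stores these in the .fitted and .resid columns. As a rough base-R illustration on simulated data (toy, fit, and diag_df are names made up for this sketch), the same two columns can be built by hand:

```r
# Simulated data (hypothetical; stands in for the standardized variables)
set.seed(7)
toy <- data.frame(x = rnorm(100))
toy$y <- 0.3 * toy$x + rnorm(100)
fit <- lm(y ~ x, data = toy)

# Base-R version of the two augment() columns used in the plot
diag_df <- data.frame(
  x       = toy$x,
  y       = toy$y,
  .fitted = unname(fitted(fit)),  # predicted value for each case
  .resid  = unname(resid(fit))    # observed minus predicted
)

# Each residual is the observed value minus the fitted value
max(abs((diag_df$y - diag_df$.fitted) - diag_df$.resid))  # effectively zero
head(diag_df)
```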

6.3 Check linearity with Residuals vs Fitted plot

This plot appears to be much closer to the ‘good’ plot than it is to the ‘serious issues’ plots, so we’ll consider our data okay and proceed with our analysis. The amount of variation around the line does not change along the length of the line, and the points seem to be roughly symmetrically scattered about the line. The red line on the Residuals vs Fitted plot is slightly curved, but nearly straight.

plot(reg_model, 1)

6.4 Check for outliers

Our data does not have any severe outliers.

# Cook's distance
plot(reg_model, 4)

# Residuals vs Leverage
plot(reg_model, 5)

6.5 Issues with My Data

Before interpreting our results, we assessed our variables to see if they met the assumptions for a simple linear regression. Analysis of a Residuals vs Fitted plot suggested that there is some minor non-linearity, but not enough to violate the assumption of linearity. We also checked Cook’s distance and a Residuals vs Leverage plot to detect outliers and found that no cases were above the recommended cutoff for Cook’s distance.
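The Cook’s distance check can also be done numerically rather than by reading the plot. The sketch below uses simulated data and assumes a cutoff of 1, which is one commonly used rule of thumb. This document does not state which cutoff was used, so the value here is an assumption and should be replaced with whatever your course recommends.

```r
# Simulated data (hypothetical; not the EAMMi2 dataset)
set.seed(3)
toy <- data.frame(x = rnorm(100))
toy$y <- 0.3 * toy$x + rnorm(100)
fit <- lm(y ~ x, data = toy)

cd <- cooks.distance(fit)  # one value per case

cutoff <- 1                # ASSUMED cutoff; not taken from this document
flagged <- which(cd > cutoff)

max(cd)          # largest Cook's distance in the sample
length(flagged)  # number of cases above the cutoff
```

If any cases are flagged, names(flagged) gives their row labels so they can be inspected before deciding what to do with them.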

7 View Test Output

Effect size cutoffs for the standardized coefficient (Cohen, 1988):

Trivial: Less than 0.10
Small: 0.10–0.29
Medium: 0.30–0.49
Large: 0.50 or greater

summary(reg_model)

## 
## Call:
## lm(formula = socmeduse ~ belong, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2175 -0.5685  0.0342  0.6422  3.2299 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 8.577e-17  1.757e-02     0.0        1    
## belong      2.758e-01  1.757e-02    15.7   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9614 on 2993 degrees of freedom
## Multiple R-squared:  0.07609,    Adjusted R-squared:  0.07578 
## F-statistic: 246.5 on 1 and 2993 DF,  p-value: < 2.2e-16
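To apply the effect size cutoffs listed at the top of this section to the standardized slope, a small helper can do the lookup. effect_size_label() is a hypothetical function written for this sketch, not part of any package. It uses the absolute value so that negative slopes are classified the same way, and it treats values between 0.29 and 0.30 as Small.

```r
# Hypothetical helper: label a standardized coefficient using the cutoffs above
effect_size_label <- function(beta) {
  as.character(cut(
    abs(beta),
    breaks = c(0, 0.10, 0.30, 0.50, Inf),
    labels = c("Trivial", "Small", "Medium", "Large"),
    right  = FALSE  # intervals are [0, .10), [.10, .30), [.30, .50), [.50, Inf)
  ))
}

effect_size_label(0.2758)  # the slope for belong reported above
effect_size_label(c(0.05, -0.35, 0.50))
```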

# note for section below: the symbol for a standardized coefficient is lowercase beta (β). On Windows, holding down the Alt key and typing 225 on the numeric keypad usually produces the look-alike character ß instead, so it is safer to copy/paste β from somewhere else

8 Write Up Results

To test our hypothesis that the need to belong will significantly and positively predict social media use, we used a simple linear regression to model the relationship between the variables. We checked the linearity of the relationship using a Residuals vs Fitted plot and checked for outliers using Cook’s distance and a Residuals vs Leverage plot; neither check indicated a violation. Note: we are skipping the assumptions of normality and homogeneity of variance for this assignment.

As predicted, we found that the need to belong significantly predicted social media use, Adj. R² = .08, F(1, 2993) = 246.5, p < .001. The relationship between need to belong and social media use was positive, β = .28, t(2993) = 15.7, p < .001 (refer to Figure 1). According to Cohen (1988), this constitutes a small effect size.
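The numbers in the write-up can be pulled directly from the summary object instead of being copied by hand. The sketch below does this on simulated data (toy and fit are made-up names, and the slope here is unstandardized, so it is labeled b). It also shows a useful check: in a simple regression with one predictor, the F statistic equals the squared t value of the slope, which matches the output above (15.7² = 246.49 ≈ 246.5).

```r
# Simulated data (hypothetical; not the EAMMi2 dataset)
set.seed(11)
toy <- data.frame(x = rnorm(150))
toy$y <- 0.3 * toy$x + rnorm(150)
fit <- lm(y ~ x, data = toy)
s <- summary(fit)

adj_r2 <- s$adj.r.squared
f_stat <- unname(s$fstatistic["value"])
df1    <- as.integer(s$fstatistic["numdf"])
df2    <- as.integer(s$fstatistic["dendf"])
slope  <- s$coefficients["x", "Estimate"]
t_val  <- s$coefficients["x", "t value"]
p_val  <- s$coefficients["x", "Pr(>|t|)"]

# With a single predictor, F is the square of the slope's t value
all.equal(t_val^2, f_stat)

cat(sprintf("Adj. R2 = %.2f, F(%d, %d) = %.1f; b = %.2f, t(%d) = %.1f, p = %.3g\n",
            adj_r2, df1, df2, f_stat, slope, df2, t_val, p_val))
```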

References

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.

Running a Simple Regression - HW

Maris Morgan

2024-07-19