Research Project Assignment (RPA) #4

Correlations and Regression

Author

Research Methods in Applied Psychology II - APSY-UE-1137

Published

January 1, 2025

Assignment Overview

Group Member Names: [Victoria Vargas, Carolyn Zazueta, Chris Ramirez Ward]

Predictor construct + measure title: [Environment Raised In / Demographic-Urbanicity]

Outcome construct + measure title: [Self-Esteem / Rosenberg Self-Esteem Scale]

For this assignment, you will focus on exploring the associations between your predictor, outcome, and other continuous variables using correlation and regression analyses.

Part 1: Identifying an Additional Construct

1a. Additional Construct Selection

Additional Construct: [Loneliness]

Measure Title: [Three-Item Loneliness Scale]

Variable Name(s): [grp2_lone_1 through grp2_lone_3]

Part 2: Preparing Your Data - Scaling

# Load packages
library(haven)      # read_sav()
library(tidyverse)  # dplyr verbs, %>%, and ggplot2

# Load the data
data <- read_sav("data/data.sav")

# Display the first few rows
head(data)

# A tibble: 6 × 357
  ID    sex        gender_identity sexual_orientation   age race_ethnicity     
  <chr> <dbl+lbl>  <dbl+lbl>       <dbl+lbl>          <dbl> <dbl+lbl>          
1 P0001 2 [Female] 1 [Man]         2 [Gay/Lesbian]       19 1 [White]          
2 P0002 1 [Male]   2 [Woman]       2 [Gay/Lesbian]       18 1 [White]          
3 P0003 1 [Male]   1 [Man]         1 [Heterosexual]      21 2 [Hispanic/Latino]
4 P0004 1 [Male]   1 [Man]         1 [Heterosexual]      21 5 [Multiracial]    
5 P0005 1 [Male]   1 [Man]         2 [Gay/Lesbian]       20 1 [White]          
6 P0006 2 [Female] 3 [Non-binary]  1 [Heterosexual]      19 3 [Asian]          
# ℹ 351 more variables: year_in_school <dbl+lbl>, income <dbl>,
#   greeklife <dbl>, intstatus <dbl>, environment <dbl>, gen <dbl>,
#   dass_1 <dbl>, dass_2 <dbl>, dass_3 <dbl>, dass_4 <dbl>, dass_5 <dbl>,
#   dass_6 <dbl>, dass_7 <dbl>, dass_8 <dbl>, dass_9 <dbl>, dass_10 <dbl>,
#   dass_11 <dbl>, dass_12 <dbl>, dass_13 <dbl>, dass_14 <dbl>, dass_15 <dbl>,
#   dass_16 <dbl>, dass_17 <dbl>, dass_18 <dbl>, dass_19 <dbl>, dass_20 <dbl>,
#   dass_21 <dbl>, swls_1 <dbl>, swls_2 <dbl>, swls_3 <dbl>, swls_4 <dbl>, …

2a. Outcome Scale Recreation

rse_score <- data %>%
  # Step 1: Select the outcome items
  select(ID, rse_1:rse_10) %>%
  # Step 2: Reverse code items 2, 5, 6, 8, and 9
  mutate(
    rse_2_reversed = case_when(
    rse_2 == 1 ~ 4,
    rse_2 == 2 ~ 3,
    rse_2 == 3 ~ 2,
    rse_2 == 4 ~ 1,
    TRUE ~ NA_real_
  ),
  rse_5_reversed = case_when(
    rse_5 == 1 ~ 4,
    rse_5 == 2 ~ 3,
    rse_5 == 3 ~ 2,
    rse_5 == 4 ~ 1,
    TRUE ~ NA_real_
  ),
  rse_6_reversed = case_when(
    rse_6 == 1 ~ 4,
    rse_6 == 2 ~ 3,
    rse_6 == 3 ~ 2,
    rse_6 == 4 ~ 1,
    TRUE ~ NA_real_
  ),
  rse_8_reversed = case_when(
    rse_8 == 1 ~ 4,
    rse_8 == 2 ~ 3,
    rse_8 == 3 ~ 2,
    rse_8 == 4 ~ 1,
    TRUE ~ NA_real_
  ),
  rse_9_reversed = case_when(
    rse_9 == 1 ~ 4,
    rse_9 == 2 ~ 3,
    rse_9 == 3 ~ 2,
    rse_9 == 4 ~ 1,
    TRUE ~ NA_real_)
  ) %>%
    # Step 3: Create your outcome scale
  mutate(
    self_esteem = rowSums(
      select(., rse_1, rse_2_reversed, rse_3, rse_4, rse_5_reversed, rse_6_reversed, rse_7, rse_8_reversed, rse_9_reversed, rse_10),
      na.rm = TRUE
    )
  ) %>% 
  dplyr::select(ID, self_esteem)

# Check your outcome scale
summary(rse_score)

      ID             self_esteem   
 Length:200         Min.   :15.00  
 Class :character   1st Qu.:22.75  
 Mode  :character   Median :29.00  
                    Mean   :28.95  
                    3rd Qu.:35.00  
                    Max.   :39.00
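
The five case_when() blocks above all apply the same mapping (1 to 4, 2 to 3, 3 to 2, 4 to 1). As a minimal sketch, assuming every item is scored 1 to 4 with no other codes, the same reversal can be written as 5 - x. The toy data frame, its column names, and the helper reverse_1_to_4() are made up for illustration and are not part of the class data.

```r
# Toy responses on a 1-4 scale (hypothetical values, not the class data)
toy <- data.frame(
  item_a = c(1, 2, 3, 4, NA),
  item_b = c(4, 4, 1, 2, 3)
)

# Reverse a 1-4 item: (min + max) - x, so 1 -> 4, 2 -> 3, 3 -> 2, 4 -> 1
reverse_1_to_4 <- function(x) 5 - x

toy$item_a_reversed <- reverse_1_to_4(toy$item_a)

# Missing responses stay missing after reversal
toy
```

Unlike case_when() with a TRUE ~ NA_real_ fallback, this shortcut does not turn out-of-range codes into NA, so it is worth checking the item ranges first.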

2b. Predictor Variable Assessment

Is your predictor a multi-item scale or single item variable? [Single item]

If multi-item, which items need to be reverse coded? [None]

# HINT: Only complete this if your predictor is a multi-item scale
# Reverse code items if needed

# Your code here:
# data_with_scores <- data_with_scores %>%
#   mutate(
#     # Add reverse coding for predictor items
#   )

2c. Predictor Scale Creation

Numeric function for predictor scale: [N/A (single-item predictor)]

# HINT: Create your predictor scale score
# Use reverse-coded items if necessary

# Your code here:
# data_with_scores <- data_with_scores %>%
#   mutate(
#     # Create predictor scale using rowMeans() or rowSums()
#   )

# Check your predictor scale
# summary(data_with_scores$predictor_scale)

2d. Predictor Descriptive Statistics (Single Item)

# Create a character variable with category labels
envrionment <- data %>% 
  select(ID, environment)

environment_categories <- 
  envrionment %>%
  mutate(environment_category = case_when(
    environment == 1 ~ "Rural", 
    environment == 2 ~ "Suburban", 
    environment == 3 ~ "Urban", 
    .default = NA
  ))

environment_categories %>% count(environment_category)

# A tibble: 3 × 2
  environment_category     n
  <chr>                <int>
1 Rural                    2
2 Suburban                93
3 Urban                  105

Sample size (n): [Rural = 2; Suburban = 93; Urban = 105]
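
The chunk above stores the category labels as character strings. If a true factor is wanted, for example to control the order of categories in tables and plots, base R's factor() can do the recoding in one step. The sketch below uses a short made-up vector of codes, assuming the same 1/2/3 coding as above.

```r
# Hypothetical numeric codes, using the same 1/2/3 coding as the chunk above
environment_code <- c(3, 2, 3, 1, 2, NA)

environment_factor <- factor(
  environment_code,
  levels = c(1, 2, 3),
  labels = c("Rural", "Suburban", "Urban")
)

# Frequency table; useNA = "ifany" also counts missing values
table(environment_factor, useNA = "ifany")
```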

2e. Additional Construct Assessment

Is your additional construct a multi-item scale or single item variable? [Multi-item]

If multi-item, which items need to be reverse coded? [None]

# HINT: Only complete this if your additional construct is a multi-item scale
# Reverse code items if needed

# Your code here:
# data_with_scores <- data_with_scores %>%
#   mutate(
#     # Add reverse coding for additional construct items
#   )

2f. Additional Construct Scale Creation

Numeric function for additional construct scale: [Sum all items: 1, 2 & 3]

# HINT: Create your additional construct scale score
# Use reverse-coded items if necessary

# Your code here:
grp_score <- data %>%
  select(ID, grp2_lone_1:grp2_lone_3) %>%
  mutate(
    loneliness = rowSums(
      select(., grp2_lone_1, grp2_lone_2, grp2_lone_3),
      na.rm = TRUE
    )
  ) %>% 
  select(ID, loneliness)
        
# Check your additional construct scale
summary(grp_score$loneliness)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  3.000   6.000   7.000   7.665   9.000  13.000
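
One caveat about summing with na.rm = TRUE: a skipped item contributes 0 to the total, which can make a respondent look less lonely than they are. The sketch below shows the difference on a made-up three-item data frame (the column names are hypothetical). Whether this matters for the class data depends on whether any items are actually missing.

```r
# Toy three-item scale with some skipped items (hypothetical values)
toy <- data.frame(
  lone_1 = c(1, 2, 3),
  lone_2 = c(2, NA, 3),
  lone_3 = c(3, 2, NA)
)

# na.rm = TRUE silently treats a skipped item as contributing 0
sum_ignoring_na <- rowSums(toy, na.rm = TRUE)

# na.rm = FALSE returns NA for any respondent with a missing item
sum_strict <- rowSums(toy, na.rm = FALSE)

# Count missing items per respondent to see who is affected
n_missing <- rowSums(is.na(toy))

data.frame(sum_ignoring_na, sum_strict, n_missing)
```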

2g. Additional Construct Descriptive Statistics (Single Item)

# HINT: Only complete this if your additional construct is a single item
# Run descriptive statistics on your additional construct variable

# Your code here:
# data_with_scores %>%
#   summarise(
#     n = sum(!is.na(additional_variable)),
#     mean = mean(additional_variable, na.rm = TRUE),
#     sd = sd(additional_variable, na.rm = TRUE)
#   )

Sample size (n): [N/A]

Mean: [N/A]

Standard deviation: [N/A]

Part 3: Bivariate Correlations

3a. Correlation Matrix

data_with_scores <- 
  left_join(envrionment, rse_score, by = "ID") %>% 
  left_join(grp_score, by = "ID") %>% 
  select(-ID)

# HINT: Run bivariate correlations between all three variables
# You can use cor(), cor.test(), or GGally::ggpairs()

# Your code here:
# Spearman rank correlations (the cor.test() calls that follow use the Pearson default)
cor.matrix <- cor(data_with_scores, method = "spearman")
# See results
cor.matrix

            environment self_esteem  loneliness
environment  1.00000000   0.1233366 -0.04379368
self_esteem  0.12333662   1.0000000 -0.32983703
loneliness  -0.04379368  -0.3298370  1.00000000

# For significance tests:
# HINT
# cor.test(data_with_scores$outcome_scale, data_with_scores$predictor_scale)

# Test for correlation between outcome and predictor
cor.test(data_with_scores$self_esteem, data_with_scores$environment)


    Pearson's product-moment correlation

data:  data_with_scores$self_esteem and data_with_scores$environment
t = 1.5326, df = 198, p-value = 0.127
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.03093202  0.24335894
sample estimates:
      cor 
0.1082737

# Test for correlation between outcome and additional construct
cor.test(data_with_scores$self_esteem, data_with_scores$loneliness)


    Pearson's product-moment correlation

data:  data_with_scores$self_esteem and data_with_scores$loneliness
t = -5.6278, df = 198, p-value = 6.175e-08
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.4851008 -0.2452477
sample estimates:
      cor 
-0.371353

# Test for correlation between predictor and additional construct
cor.test(data_with_scores$loneliness, data_with_scores$environment)


    Pearson's product-moment correlation

data:  data_with_scores$loneliness and data_with_scores$environment
t = -0.70273, df = 198, p-value = 0.4831
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.18732313  0.08948182
sample estimates:
        cor 
-0.04987843
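
The matrix in 3a was computed with method = "spearman", while cor.test() defaults to Pearson, which is why the two sets of coefficients differ slightly. The sketch below, run on simulated data with hypothetical variable names, shows how to request the same method in both calls so that the matrix and the significance tests agree.

```r
set.seed(1137)  # arbitrary seed so the toy data are reproducible
toy <- data.frame(x = rnorm(50))
toy$y <- 0.5 * toy$x + rnorm(50)

# Matrix and test using the same (Pearson) method
pearson_matrix <- cor(toy, method = "pearson")
pearson_test   <- cor.test(toy$x, toy$y, method = "pearson")

# Matrix and test using the same (Spearman) method
spearman_matrix <- cor(toy, method = "spearman")
spearman_test   <- cor.test(toy$x, toy$y, method = "spearman")

# Within each method, the matrix entry matches the test estimate
c(matrix = pearson_matrix["x", "y"],  test = unname(pearson_test$estimate))
c(matrix = spearman_matrix["x", "y"], test = unname(spearman_test$estimate))
```

With tied ranks, which are likely for summed scale scores, cor.test() cannot compute an exact Spearman p-value and warns about it. Passing exact = FALSE requests the approximate p-value directly.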

3b. APA Style Correlation Descriptions

Correlation between Predictor and Outcome:

[A Pearson’s correlation was conducted to examine the relationship between self-esteem and environment. Results indicated a small, positive correlation between self-esteem and environment, r(198) = .11, p = .13, which was not statistically significant.]

Correlation between Predictor and Additional Construct:

[A Pearson’s correlation was conducted to examine the relationship between loneliness and environment. Results indicated a very weak, negative correlation between loneliness and environment, r(198) = -.05, p = .48, which was not statistically significant.]

Correlation between Outcome and Additional Construct:

[A Pearson’s correlation was conducted to examine the relationship between self-esteem and loneliness. There was a moderate, negative correlation between self-esteem and loneliness, r(198) = -.37, p < .001, indicating that higher levels of self-esteem were associated with lower levels of loneliness.]

3c. Strongest Association

Which construct has the strongest association with your outcome? [Additional Construct]

How do you know? [Of the two constructs, loneliness showed the stronger association with self-esteem (r = -.37), a moderate, negative relationship that was statistically significant. In contrast, the correlation between self-esteem and environment was small and not statistically significant (r = .11).]

3d. R Square Calculations

R Square for Predictor and Outcome: [r = .11; R² = (.11)² = .0121]

R Square for Predictor and Additional Construct: [r = -.05; R² = (-.05)² = .0025]

R Square for Outcome and Additional Construct: [r = -.37; R² = (-.37)² = .1369]
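
The hand calculations above square the rounded correlations. As a quick check, the same arithmetic can be done in R with the unrounded values from the cor.test() output in 3a. The names given to the vector elements are only labels chosen for this sketch.

```r
# Unrounded Pearson correlations copied from the cor.test() output in 3a
r <- c(
  environment_self_esteem = 0.1082737,
  environment_loneliness  = -0.04987843,
  self_esteem_loneliness  = -0.371353
)

# R-squared is the squared correlation
r_squared <- r^2

round(r_squared, 4)
```

Squaring the unrounded r for environment and self-esteem gives about .0117, which matches the Multiple R-squared of 0.01172 reported for Model 1 in Part 4. The small gap from .0121 comes from rounding r before squaring.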

3e. R Square Interpretation

Definition of R Square:

[The coefficient of determination (R²) is the proportion of variance that two variables share, that is, the proportion of variance in one variable that is accounted for by the other.]

R Square Description for Predictor and Outcome:

[This means 1.2% of the variance in self-esteem is explained by environment (or vice versa).]

R Square Description for Predictor and Additional Construct:

[This means 0.25% of the variance in environment is explained by loneliness (or vice versa).]

R Square Description for Outcome and Additional Construct:

[This means 14% of the variance in self-esteem is explained by loneliness (or vice versa).]

Part 4: Simple Regression

4a. Simple Linear Regression

# HINT: Run simple regression predicting outcome from predictor
# Use lm() function: lm(outcome ~ predictor, data = dataset)
# Standardized version: lm(scale(outcome) ~ scale(predictor))

# Your code here:

# Model 1

model1 <- lm(self_esteem ~ environment, 
             data = data_with_scores)

# Display results Model 1
summary(model1)


Call:
lm(formula = self_esteem ~ environment, data = data_with_scores)

Residuals:
     Min       1Q   Median       3Q      Max 
-14.7158  -6.6128   0.8528   6.8528  11.4215 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   25.010      2.629   9.515   <2e-16 ***
environment    1.569      1.024   1.533    0.127    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.518 on 198 degrees of freedom
Multiple R-squared:  0.01172,   Adjusted R-squared:  0.006732 
F-statistic: 2.349 on 1 and 198 DF,  p-value: 0.127

# Model 1.2 - Standardized Coefficients

model1.2 <- lm(scale(self_esteem) ~ scale(environment), 
               data = data_with_scores)

# Display results Model 1 - Standardized Coefficients
summary(model1.2)


Call:
lm(formula = scale(self_esteem) ~ scale(environment), data = data_with_scores)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.9507 -0.8766  0.1130  0.9084  1.5140 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)        2.697e-16  7.047e-02   0.000    1.000
scale(environment) 1.083e-01  7.065e-02   1.533    0.127

Residual standard error: 0.9966 on 198 degrees of freedom
Multiple R-squared:  0.01172,   Adjusted R-squared:  0.006732 
F-statistic: 2.349 on 1 and 198 DF,  p-value: 0.127

# Display effect size
parameters::model_parameters(model1)

Parameter   | Coefficient |   SE |         95% CI | t(198) |      p
-------------------------------------------------------------------
(Intercept) |       25.01 | 2.63 | [19.83, 30.19] |   9.51 | < .001
environment |        1.57 | 1.02 | [-0.45,  3.59] |   1.53 | 0.127

parameters::model_parameters(model1.2)

Parameter   | Coefficient |   SE |        95% CI |   t(198) |      p
--------------------------------------------------------------------
(Intercept) |    2.70e-16 | 0.07 | [-0.14, 0.14] | 3.83e-15 | > .999
environment |        0.11 | 0.07 | [-0.03, 0.25] |     1.53 | 0.127
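
The standardized model above gives the same slope (.108) as the Pearson correlation in 3a. That is expected: in simple regression, the standardized beta equals r, and it can also be recovered from b by rescaling with the two standard deviations. A minimal sketch on simulated data (all values and names here are hypothetical):

```r
set.seed(42)  # arbitrary seed for reproducible toy data
x <- rnorm(100, mean = 2, sd = 0.5)
y <- 25 + 1.5 * x + rnorm(100, sd = 7)

raw_model <- lm(y ~ x)
std_model <- lm(scale(y) ~ scale(x))

b    <- unname(coef(raw_model)["x"])
beta <- unname(coef(std_model)[2])

# Beta can be recovered from b by rescaling with the two standard deviations
beta_from_b <- b * sd(x) / sd(y)

# In simple regression, beta also equals Pearson's r
c(beta = beta, beta_from_b = beta_from_b, r = cor(x, y))
```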

4b. Regression Output Interpretation

Piece of R Output	Definition	Interpretation for Your Analysis
R square	[The proportion of variance in the outcome (self-esteem) that is explained by the predictor (environment raised in).]	[R² = .012 means that about 1.2% of the variance in self-esteem is explained by environment, indicating a very weak relationship between the two.]
Regression F-test	[The F-test evaluates whether the overall regression model predicts the outcome significantly better than a model with no predictors.]	[F(1, 198) = 2.35, p = .13, shows that the overall model was not statistically significant; environment does not significantly predict self-esteem.]
Intercept	[The intercept is the expected value of self-esteem when environment equals zero.]	[The intercept of 25.01 is the predicted self-esteem score when environment equals 0. Because environment is coded 1 to 3, this value lies outside the observed range and is an extrapolation.]
The b coefficient	[The unstandardized b coefficient is the expected change in self-esteem for a one-unit increase in environment.]	[b = 1.57 indicates that each one-unit increase in environment is associated with a predicted increase of 1.57 points in self-esteem, though this effect is not statistically significant (p = .13).]
The Beta coefficient	[The standardized beta coefficient expresses the relationship in standard deviation units, allowing comparison of the relative strength of predictors.]	[β = .11 means that for every one standard deviation increase in environment, self-esteem is predicted to increase by 0.11 standard deviations. This is a small, positive, and nonsignificant effect.]
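
The F-test and the t-test for the slope carry the same information when a model has a single predictor, because F equals t squared. The sketch below checks this with the values printed by summary(model1); the object names are chosen only for this illustration.

```r
# Values copied from summary(model1) above
t_environment <- 1.533
f_reported    <- 2.349

# With a single predictor, the model F statistic is the slope's t squared
f_from_t <- t_environment^2

# p-value for F on 1 and 198 degrees of freedom
p_from_f <- pf(f_from_t, df1 = 1, df2 = 198, lower.tail = FALSE)

c(f_from_t = f_from_t, p_from_f = p_from_f)
```

This is why the slope and the overall model share the same p-value (.127) in the Model 1 output.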

4c. APA Style Write-Up

Write your results in APA style:

[The regression model examining environment as a predictor of self-esteem was not statistically significant, F(1, 198) = 2.35, p = .13, and explained only about 1% of the variance in self-esteem (R² = .01). Although the relationship was positive, both the unstandardized coefficient (b = 1.57) and the standardized coefficient (β = .11) indicated a small, nonsignificant effect.]

4d. Plain Language Translation

Take-home message for someone outside the class:

[People raised in more urban environments tended to report slightly higher self-esteem, but this pattern was weak and not statistically significant. In other words, the environment someone was raised in does not appear to be a strong or reliable predictor of self-esteem in this dataset.]

Part 5: Hierarchical Multiple Regression

5a. Hierarchical Multiple Regression

# HINT: Run Multiple regression
# Model 2: outcome ~ predictor + additional_construct

# Your code here:

# Model 2
model2 <- lm(self_esteem ~ environment + loneliness,
             data = data_with_scores)
# Display results
summary(model2)


Call:
lm(formula = self_esteem ~ environment + loneliness, data = data_with_scores)

Residuals:
     Min       1Q   Median       3Q      Max 
-15.5529  -5.2193   0.8694   6.1436  11.5952 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  36.1233     3.1605  11.430  < 2e-16 ***
environment   1.3035     0.9551   1.365    0.174    
loneliness   -1.3629     0.2449  -5.565 8.49e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.007 on 197 degrees of freedom
Multiple R-squared:  0.146, Adjusted R-squared:  0.1373 
F-statistic: 16.84 on 2 and 197 DF,  p-value: 1.777e-07

# Model 2 - Standardized Coefficients
model2.2 <- lm(scale(self_esteem) ~ scale(environment) + scale(loneliness),
               data = data_with_scores)

# Display results
summary(model2.2)


Call:
lm(formula = scale(self_esteem) ~ scale(environment) + scale(loneliness), 
    data = data_with_scores)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.0617 -0.6919  0.1152  0.8144  1.5371 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)         2.523e-16  6.568e-02   0.000    1.000    
scale(environment)  8.997e-02  6.592e-02   1.365    0.174    
scale(loneliness)  -3.669e-01  6.592e-02  -5.565 8.49e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9288 on 197 degrees of freedom
Multiple R-squared:  0.146, Adjusted R-squared:  0.1373 
F-statistic: 16.84 on 2 and 197 DF,  p-value: 1.777e-07

# Display effect size
parameters::model_parameters(model2)

Parameter   | Coefficient |   SE |         95% CI | t(197) |      p
-------------------------------------------------------------------
(Intercept) |       36.12 | 3.16 | [29.89, 42.36] |  11.43 | < .001
environment |        1.30 | 0.96 | [-0.58,  3.19] |   1.36 | 0.174 
loneliness  |       -1.36 | 0.24 | [-1.85, -0.88] |  -5.56 | < .001

parameters::model_parameters(model2.2)

Parameter   | Coefficient |   SE |         95% CI |   t(197) |      p
---------------------------------------------------------------------
(Intercept) |    2.52e-16 | 0.07 | [-0.13,  0.13] | 3.84e-15 | > .999
environment |        0.09 | 0.07 | [-0.04,  0.22] |     1.36 | 0.174 
loneliness  |       -0.37 | 0.07 | [-0.50, -0.24] |    -5.56 | < .001

5b. R Square Comparison

Why is there a difference in R square across the two models?

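[Model 1 included only environment and explained about 1% of the variance in self-esteem (R² = .012). Adding loneliness in Model 2 raised this to about 15% (R² = .146), a change in R² of .134. R² rose because loneliness accounts for variance in self-esteem that environment does not explain, F-change(1, 197) ≈ 30.97, p < .001.]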
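
As a check on the arithmetic, the change in R² and its F test can be computed from the two R² values printed by summary(model1) and summary(model2). This is a sketch using the standard F-change formula; the object names are chosen only for illustration.

```r
# Values copied from summary(model1) and summary(model2) above
r2_model1 <- 0.01172
r2_model2 <- 0.146
n         <- 200
k_added   <- 1   # predictors added in Model 2 (loneliness)
k_model2  <- 2   # total predictors in Model 2

delta_r2 <- r2_model2 - r2_model1

# F-change = (delta R^2 / predictors added) / (unexplained variance / residual df)
df_resid <- n - k_model2 - 1
f_change <- (delta_r2 / k_added) / ((1 - r2_model2) / df_resid)
p_change <- pf(f_change, df1 = k_added, df2 = df_resid, lower.tail = FALSE)

c(delta_r2 = delta_r2, f_change = f_change, p_change = p_change)
```

The result is a change in R² of about .134 and an F-change of about 31, which is the squared t value for loneliness in Model 2 (up to rounding). With the fitted models available, anova(model1, model2) reports the same test directly.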

5c. Intercept Comparison

Why is there a difference in the y intercept (constant) across the two models?

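[The intercept is the predicted self-esteem score when every predictor in the model equals 0. In Model 1 that is the prediction at environment = 0 (25.01); in Model 2 it is the prediction when both environment and loneliness equal 0 (36.12). Because loneliness is negatively related to self-esteem and no participant scored below 3, extrapolating down to a loneliness score of 0 raises the predicted value.]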
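
Because the intercept depends on where zero falls for each predictor, it can be made easier to interpret by centering. The sketch below uses simulated data (hypothetical names and values) with a predictor coded 1 to 3, so that zero is never observed, and shows that centering moves the intercept to the mean of the outcome without changing the slope.

```r
set.seed(7)  # arbitrary seed for reproducible toy data
x <- sample(1:3, size = 100, replace = TRUE)  # predictor that is never 0
y <- 25 + 1.5 * x + rnorm(100, sd = 5)

raw_model      <- lm(y ~ x)
x_centered     <- x - mean(x)
centered_model <- lm(y ~ x_centered)

# The slope is unchanged; only the intercept moves
coef(raw_model)
coef(centered_model)

# After centering, the intercept equals the mean of the outcome
mean(y)
```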

5d. Coefficient Comparison

Why is there a difference in the coefficients for your predictor across the two models?

[In Model 1, the environment slope (b = 1.57) is its total association with self-esteem. In Model 2, each slope is a partial association that holds the other predictor constant, so the environment slope (b = 1.30) reflects only its unique association with self-esteem after controlling for loneliness. Because environment and loneliness are only weakly correlated (r = -.05), the change is small.]

5e. Predictor Association Comparison

Which predictor is more highly associated with your outcome? [Loneliness]

How can you assess this from the output? [It has the larger standardized coefficient in absolute value (|β| = .37 vs. .09), and its t-test is significant (p < .001), whereas the t-test for environment is not (p = .17).]

5f. APA Style Write-Up

Write your results in APA style:

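[A multiple regression examined whether environment and loneliness predicted self-esteem. The overall model was significant, F(2, 197) = 16.84, p < .001, R² = .146, adjusted R² = .137. Loneliness was a significant negative predictor (b = -1.36, β = -.37, p < .001), whereas environment was not a significant predictor (b = 1.30, β = .09, p = .17).]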

5g. Plain Language Translation

Take-home message for someone outside the class:

[Students who feel lonelier tend to report lower self-esteem. The environment a student was raised in is not reliably related to self-esteem, whether or not loneliness is taken into account.]

5h. Causation Inference

Can you determine if your predictors cause your outcome in this study?

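[No. The data are correlational, so the associations could reflect reverse causation or unmeasured third variables. Supporting a causal claim would require an experiment with random assignment or, at minimum, a longitudinal design that establishes which variable comes first.]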

5i. Potential Confound Variable

Identify another variable from the Class Data Set that could be a confound:

[A potential confounding variable is socioeconomic status (for example, the income variable), since it may influence both the environment a person is raised in and self-esteem, for instance through stress levels and access to resources.]

5j. Extending the Analysis

How could you extend the hierarchical regression to account for this confound?

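[Enter SES as a covariate in the first step of the hierarchical regression, then add environment and loneliness in later steps. The change in R² and the coefficients in the final step show whether the predictors explain variance in self-esteem beyond SES.]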
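
A minimal sketch of what such an extension could look like, using simulated data. The variable ses, the simulated effect sizes, and the object names are all hypothetical; in the class data the confound would be whichever measured variable the group selects.

```r
set.seed(2025)  # arbitrary seed for reproducible toy data
n <- 200
toy <- data.frame(
  ses         = rnorm(n),                       # hypothetical SES score
  environment = sample(1:3, n, replace = TRUE),
  loneliness  = rnorm(n, mean = 7.5, sd = 2)
)
toy$self_esteem <- 30 + 1.0 * toy$ses - 1.3 * toy$loneliness + rnorm(n, sd = 6)

# Step 1: confound only
step1 <- lm(self_esteem ~ ses, data = toy)
# Step 2: add the predictors of interest
step2 <- lm(self_esteem ~ ses + environment + loneliness, data = toy)

# Change in R-squared and its F test
delta_r2 <- summary(step2)$r.squared - summary(step1)$r.squared
delta_r2
anova(step1, step2)
```

Entering the confound first means the R² change for the second step reflects only variance that the predictors explain beyond it.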

5k. Unmeasured Confounds

Are there other confounds not measured in the Class Data Set?

[Cultural values could be another confound, since they may shape both how people experience their environment and how they view themselves. Race/ethnicity is related but is already measured in the Class Data Set.]

Visualization (Optional but Recommended)

# HINT: Create visualizations to help interpret your results
# Consider scatterplots, correlation plots, or regression diagnostic plots

# Example scatterplot:

ggplot(data_with_scores, aes(x = environment, y = self_esteem )) +
    geom_point() +
    geom_smooth(method = "lm") +
    labs(
        title = "Environment vs Self Esteem",
        x = "Environment",
        y = "Self Esteem"
    ) +
    theme_minimal()

Submission Instructions

Complete all code chunks and text responses in this document
Ensure all code runs without errors
Save the document as RPA_4_YourTeamName.qmd
Render the document - this will automatically create both HTML and DOCX versions
Submit the .qmd file along with either the .html or .docx file (or both if preferred)
Make sure your team name is clearly indicated at the top of the document

This document was created for Research Methods in Applied Psychology II (APSY-UE-1137) - Fall 2025