Correlation HW

Author

Evie Bottoms

Loading Libraries

library(psych) # for the describe() command and the corr.test() command
library(apaTables) # to create our correlation table
library(kableExtra) # to create our correlation table

Importing Data

d <- read.csv(file="Data/mydata.csv", header=T)
# Since we're focusing on our continuous variables, we drop the categorical variables here. This makes the later steps easier.
d <- subset(d, select=-c(gender, phys_sym))

State Your Hypotheses - PART OF YOUR WRITEUP

We predict that markers of adulthood, satisfaction with life, and need to belong will be positively correlated, and that all three of these variables will be negatively correlated with perceived stress.

Check Your Assumptions

Pearson’s Correlation Coefficient Assumptions

  • Should have two measurements for each participant for each variable (confirmed by earlier procedures – we dropped any participants with missing data)
  • Variables should be continuous and normally distributed, or assessments of the relationship may be inaccurate (confirmed above – if issues, make a note and continue)
  • Outliers should be identified and removed, or results will be inaccurate (will do below)
  • Relationship between the variables should be linear, or it will not be detected (will do below)

Checking for Outliers

Outliers can mask potential effects and cause Type II error (i.e., a false negative: you conclude there is no relationship when there really is one).

# using the scale() command to standardize our variable, viewing a histogram, and then counting statistical outliers
d$moa_independence <- scale(d$moa_independence, center=T, scale=T)
hist(d$moa_independence)

sum(d$moa_independence < -3 | d$moa_independence > 3)
[1] 51
d$swb <- scale(d$swb, center=T, scale=T)
hist(d$swb)

sum(d$swb < -3 | d$swb > 3)
[1] 0
d$belong <- scale(d$belong, center=T, scale=T)
hist(d$belong)

sum(d$belong < -3 | d$belong > 3)
[1] 7
d$stress <- scale(d$stress, center=T, scale=T)
hist(d$stress)

sum(d$stress < -3 | d$stress > 3)
[1] 0
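The standardize-and-count steps above can be wrapped in a small helper. This is a minimal sketch rather than part of the assignment template: `count_outliers()` and the toy vector `x` are hypothetical names, and the ±3 cutoff is the one used above. Unlike the code above, the helper does not overwrite the raw scores, so the original means and standard deviations stay available for later tables.

```r
# Hypothetical helper: count values whose z-score is beyond +/- cutoff.
count_outliers <- function(x, cutoff = 3) {
  z <- as.numeric(scale(x, center = TRUE, scale = TRUE))  # z-scores as a plain vector
  sum(abs(z) > cutoff, na.rm = TRUE)
}

# Toy data (not from mydata.csv): 90 ordinary values plus one extreme score.
x <- c(rep(c(-1, 0, 1), 30), 12)
n_out <- count_outliers(x)
print(n_out)
```

With the real data, the same call would look like `count_outliers(d$belong)`, assuming `d$belong` still holds the raw scores.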

Checking for Linear Relationships

Non-linear relationships cannot be detected by Pearson’s correlation (the type of correlation we’re doing here). This means that you may underestimate the relationship between a pair of variables if they have a non-linear relationship, and thus your understanding of what’s happening in your data will be inaccurate.

# use scatterplots to examine your continuous variables together
plot(d$moa_independence, d$swb)

plot(d$moa_independence, d$belong)

plot(d$moa_independence, d$stress)

plot(d$swb, d$belong)

plot(d$swb, d$stress)

plot(d$belong, d$stress)
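To illustrate why linearity matters, the sketch below uses toy data (not the homework data set) in which `y` depends perfectly on `x` but in a U shape. Pearson's r for that pair is essentially zero, while an exactly linear pair gives r = 1. The variable names are made up for this example.

```r
# Toy data (not from mydata.csv): a perfectly U-shaped relationship.
x <- seq(-3, 3, by = 0.5)
y <- x^2

r_curved <- cor(x, y)          # Pearson's r for the curved relationship
r_linear <- cor(x, 2 * x + 1)  # Pearson's r for an exactly linear relationship

print(round(c(curved = r_curved, linear = r_linear), 2))
```

On real data, adding a smoothed line to a scatterplot, for example `lines(lowess(x, y))` right after `plot(x, y)`, can make curvature easier to spot.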

Check Your Variables

describe(d)
                 vars    n mean sd median trimmed  mad   min  max range  skew
moa_independence    1 3090    0  1   0.28    0.15 1.06 -5.46 0.99  6.45 -1.44
swb                 2 3090    0  1   0.15    0.04 1.12 -2.62 1.91  4.53 -0.37
belong              3 3090    0  1   0.11    0.03 0.98 -3.19 2.91  6.10 -0.26
stress              4 3090    0  1  -0.08   -0.01 0.99 -2.91 2.74  5.66  0.03
                 kurtosis   se
moa_independence     2.52 0.02
swb                 -0.45 0.02
belong              -0.13 0.02
stress              -0.17 0.02
# also use histograms to examine your continuous variables
hist(d$moa_independence)

hist(d$swb)

hist(d$belong)

hist(d$stress)
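The skew and kurtosis columns of the describe() output can also be screened with a few lines of code. The sketch below retypes the values printed above into a small data frame and flags anything outside -2 to +2, which is the range this assignment uses. The object names (`shape`, `limit`, `flagged`) are hypothetical.

```r
# Skew and kurtosis copied from the describe() output above.
shape <- data.frame(
  variable = c("moa_independence", "swb", "belong", "stress"),
  skew     = c(-1.44, -0.37, -0.26, 0.03),
  kurtosis = c(2.52, -0.45, -0.13, -0.17),
  stringsAsFactors = FALSE
)

limit <- 2  # assumed cutoff: flag values outside -2 to +2
shape$flag_skew     <- abs(shape$skew) > limit
shape$flag_kurtosis <- abs(shape$kurtosis) > limit

# Variables with at least one flagged statistic.
flagged <- shape$variable[shape$flag_skew | shape$flag_kurtosis]
print(shape)
print(flagged)
```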

Issues with My Data - PART OF YOUR WRITEUP

Two of my variables have outliers (standardized scores beyond ±3): markers of adulthood has 51 and need to belong has 7.

Three pairs of variables show signs of non-linearity: markers of adulthood with satisfaction with life, markers of adulthood with need to belong, and markers of adulthood with perceived stress.

One variable has kurtosis outside the range of -2 to +2: markers of adulthood, with a kurtosis of 2.52. Skew is within that range for all four variables.

The data set presents three issues that may affect the interpretation of results. First, three variable pairs (all involving markers of adulthood) show signs of non-linearity, so Pearson's correlation may underestimate those relationships. Second, outliers in two variables (markers of adulthood and need to belong) may distort the correlation estimates. Lastly, markers of adulthood has kurtosis outside the -2 to +2 range (2.52), a deviation from normality that could affect tests that assume normally distributed variables.

Run Pearson’s Correlation

There are two ways to run Pearson’s correlation in R. You can calculate each correlation one-at-a-time using multiple commands, or you can calculate them all at once and report the scores in a matrix. The matrix output can be confusing at first, but it’s more efficient. We’ll do it both ways.

Run a Single Correlation

corr_output <- corr.test(d$moa_independence, d$swb)

View Single Correlation

  • Strong effect: between |0.50| and |1|
  • Moderate effect: between |0.30| and |0.49|
  • Weak effect: between |0.10| and |0.29|
  • Trivial effect: less than |0.10|
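These effect-size bands can be applied in code so that every correlation gets a consistent label. The sketch below is one possible way to do it: `label_effect()` is a hypothetical helper, the input values are illustrative, and it assumes cut points at .10, .30, and .50 so that no value falls between two bands.

```r
# Hypothetical helper: label the size of a correlation from its absolute value.
# Assumes cut points at .10, .30, and .50, so anything below .10 is "trivial".
label_effect <- function(r) {
  labels <- c("trivial", "weak", "moderate", "strong")
  labels[findInterval(abs(r), c(0.10, 0.30, 0.50)) + 1]
}

r_values <- c(0.02, 0.10, -0.15, 0.30, -0.51)  # illustrative values
effect_labels <- label_effect(r_values)
print(data.frame(r = r_values, effect = effect_labels))
```

Using the absolute value means the sign of the correlation does not change the label, only the direction of the relationship.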

corr_output
Call:corr.test(x = d$moa_independence, y = d$swb)
Correlation matrix 
     [,1]
[1,]  0.1
Sample Size 
[1] 3090
These are the unadjusted probability values.
  The probability values  adjusted for multiple tests are in the p.adj object. 
     [,1]
[1,]    0

 To see confidence intervals of the correlations, print with the short=FALSE option

Create a Correlation Matrix

corr_output_m <- corr.test(d)

View Test Output

  • Strong effect: between |0.50| and |1|
  • Moderate effect: between |0.30| and |0.49|
  • Weak effect: between |0.10| and |0.29|
  • Trivial effect: less than |0.10|

corr_output_m
Call:corr.test(x = d)
Correlation matrix 
                 moa_independence   swb belong stress
moa_independence             1.00  0.10   0.02  -0.03
swb                          0.10  1.00  -0.15  -0.51
belong                       0.02 -0.15   1.00   0.30
stress                      -0.03 -0.51   0.30   1.00
Sample Size 
[1] 3090
Probability values (Entries above the diagonal are adjusted for multiple tests.) 
                 moa_independence swb belong stress
moa_independence             0.00   0   0.32   0.32
swb                          0.00   0   0.00   0.00
belong                       0.30   0   0.00   0.00
stress                       0.16   0   0.00   0.00

 To see confidence intervals of the correlations, print with the short=FALSE option
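The matrix reports unadjusted p-values below the diagonal and adjusted ones above it. As a rough illustration of where a value like 0.32 can come from, the sketch below applies base R's `p.adjust()` with the Holm method (the default adjustment in `corr.test()`) to the six below-diagonal p-values as printed. These inputs are rounded to two decimals, so this approximates what `corr.test()` computes rather than reproducing it. The vector names are made up for this example.

```r
# Unadjusted p-values as printed below the diagonal (rounded to two decimals).
p_raw <- c(
  moa_swb       = 0.00,
  moa_belong    = 0.30,
  moa_stress    = 0.16,
  swb_belong    = 0.00,
  swb_stress    = 0.00,
  belong_stress = 0.00
)

# Holm's step-down correction across the six tests.
p_holm <- p.adjust(p_raw, method = "holm")
print(round(p_holm, 2))
```

Holm multiplies the smallest p-value by the number of tests, the next smallest by one fewer, and so on, and never lets an adjusted value drop below the one before it. That is why the two non-significant pairs end up with the same adjusted value.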

Write Up Results

Write up your results. Again, make sure to maintain an appropriate tone, and follow APA guidelines for reporting statistical results. I recommend following the below outline:

We predicted that markers of adulthood, satisfaction with life, and need to belong would be positively correlated with one another, and that all three would be negatively correlated with perceived stress. Three issues in the data may affect the interpretation of the results. First, three variable pairs (all involving markers of adulthood) showed signs of non-linearity, so Pearson's correlation may underestimate those relationships. Second, two variables had outliers (51 for markers of adulthood and 7 for need to belong), which may distort the correlation estimates. Lastly, markers of adulthood had kurtosis outside the -2 to +2 range (2.52), indicating a deviation from normality.

Four of the six correlations were statistically significant (p < .01; see Table 1). Satisfaction with life was positively correlated with markers of adulthood, r = .10, 95% CI [.07, .14], and negatively correlated with perceived stress, r = -.51, 95% CI [-.53, -.48], both as predicted. Contrary to our hypothesis, need to belong was negatively correlated with satisfaction with life, r = -.15, 95% CI [-.18, -.11], and positively correlated with perceived stress, r = .30, 95% CI [.27, .34]. Markers of adulthood was not significantly correlated with need to belong, r = .02, 95% CI [-.02, .05], or with perceived stress, r = -.03, 95% CI [-.06, .01] (adjusted p = .32 for both).

Following Cohen (1988), the correlation between satisfaction with life and perceived stress was a large effect, the correlation between need to belong and perceived stress was a medium effect, and the two remaining significant correlations were small effects.

Table 1: Means, standard deviations, and correlations with confidence intervals
Variable                     M     SD    1            2             3
1. Markers of Adulthood      0.00  1.00
2. Satisfaction with Life    0.00  1.00  .10**
                                         [.07, .14]
3. Need to Belong            0.00  1.00  .02          -.15**
                                         [-.02, .05]  [-.18, -.11]
4. Perceived Stress (PSS)    0.00  1.00  -.03         -.51**        .30**
                                         [-.06, .01]  [-.53, -.48]  [.27, .34]
Note:
M and SD are used to represent mean and standard deviation, respectively. Values in square brackets indicate the 95% confidence interval. The confidence interval is a plausible range of population correlations that could have caused the sample correlation.
* indicates p < .05
** indicates p < .01.
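The confidence intervals in Table 1 can be approximated by hand with the Fisher z transformation, a standard approach for a Pearson correlation. The sketch below is an illustration under assumptions: `fisher_ci()` is a hypothetical helper, and the input r = -.51 is the rounded value from Table 1, so the result may differ from the table in the second decimal.

```r
# Hypothetical helper: confidence interval for Pearson's r via Fisher's z.
fisher_ci <- function(r, n, conf_level = 0.95) {
  z      <- atanh(r)                         # Fisher z transformation
  se     <- 1 / sqrt(n - 3)                  # standard error of z
  z_crit <- qnorm(1 - (1 - conf_level) / 2)  # about 1.96 for a 95% interval
  c(lower = tanh(z - z_crit * se), upper = tanh(z + z_crit * se))
}

# Rounded r for satisfaction with life and perceived stress, n = 3090.
ci <- fisher_ci(r = -0.51, n = 3090)
print(round(ci, 2))
```

Because the sample is large (n = 3090), the standard error is small and the interval is narrow, which is why even a correlation as small as .10 has an interval that excludes zero.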

References

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.