1 Loading Libraries

#install.packages("apaTables")
#install.packages("kableExtra")

library(psych) # for the describe() command and the corr.test() command
library(apaTables) # to create our correlation table
library(kableExtra) # to create our correlation table

2 Importing Data

d <- read.csv(file="Data/projectdata.csv", header=T)

# For HW, import your project dataset that you cleaned previously; this will be the dataset you'll use throughout the rest of the semester

3 State Your Hypothesis

We predict that conscientiousness, depression, support, and mental well-being will be correlated with each other. Additionally, we predict that there will be a significant, positive correlation between conscientiousness and mental well-being.

4 Check Your Variables

# you only need to check the variables you're using in the current analysis
# it's always a good idea to look at them to be sure that everything is correct
str(d)
## 'data.frame':    917 obs. of  7 variables:
##  $ X          : int  20 30 31 33 57 58 81 104 113 117 ...
##  $ age        : chr  "1 under 18" "1 under 18" "4 between 36 and 45" "4 between 36 and 45" ...
##  $ urban_rural: chr  "city" "city" "town" "city" ...
##  $ big5_con   : num  3.33 5.33 5.67 6 3.33 ...
##  $ phq        : num  3.33 1 2.33 1.11 2.33 ...
##  $ support    : num  2.17 5 2.5 3.67 4.17 ...
##  $ swemws     : num  2.29 4.29 3.29 4 3.29 ...
# Our dataset already contains the four continuous variables we need for this analysis (big5_con, phq, support, swemws).

# Since we're focusing only on our continuous variables, we're going to subset them into their own dataframe. This makes the later steps (standardizing each variable and building the correlation matrix) easier.

d2 <- subset(d, select=c(big5_con, phq, support, swemws))

# You can use the describe() command on an entire dataframe (d2) or just on a single variable (d2$phq)

describe(d2)
##          vars   n mean   sd median trimmed  mad min max range  skew kurtosis
## big5_con    1 917 4.84 1.20   5.00    4.87 1.48   1   7     6 -0.27    -0.31
## phq         2 917 2.05 0.85   1.89    1.97 0.99   1   4     3  0.66    -0.61
## support     3 917 3.61 0.94   3.67    3.66 0.99   1   5     4 -0.45    -0.56
## swemws      4 917 3.18 0.84   3.29    3.19 0.85   1   5     4 -0.19    -0.38
##            se
## big5_con 0.04
## phq      0.03
## support  0.03
## swemws   0.03
# NOTE: None of our variables shows a serious departure from normality. Every skew and kurtosis value above falls between -1 and 1, a commonly used rule of thumb. You don't need to discuss univariate normality in the results write-ups for the labs/homework, but you will need to discuss it in your final project manuscript.
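To make the skew and kurtosis columns less of a black box, here is a minimal base-R sketch of the simple moment-based formulas (skewness g1 and excess kurtosis g2). The helper names `skewness_g1` and `kurtosis_g2` are hypothetical. psych's describe() may use a slightly different small-sample definition, so its values can differ a little from these.

```r
# Hypothetical helpers: simple moment-based skewness (g1) and excess kurtosis (g2).
# psych::describe() may use a slightly different small-sample formula.
skewness_g1 <- function(x) {
  x <- x[!is.na(x)]            # drop missing values
  m <- mean(x)
  m2 <- mean((x - m)^2)        # second central moment (spread)
  m3 <- mean((x - m)^3)        # third central moment (asymmetry)
  m3 / m2^1.5
}

kurtosis_g2 <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  m2 <- mean((x - m)^2)
  m4 <- mean((x - m)^4)        # fourth central moment (tail weight)
  m4 / m2^2 - 3                # subtract 3 so a normal curve sits near 0
}

# Toy example (hypothetical data, not the project dataset)
toy <- c(1, 2, 3, 4, 5)
skewness_g1(toy)   # 0: perfectly symmetric
kurtosis_g2(toy)   # -1.3: flatter than a normal curve
```

For the project data, something like `sapply(d2, skewness_g1)` would apply the helper to every column at once.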

# also use histograms to examine your continuous variables

hist(d$big5_con)

hist(d$phq)

hist(d$support)

hist(d$swemws)

# last, use scatterplots to examine your continuous variables together, for each pairing

plot(d$big5_con, d$phq)

plot(d$big5_con, d$support)

plot(d$big5_con, d$swemws)

plot(d$phq, d$support)

plot(d$phq, d$swemws)

plot(d$swemws, d$support)
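With four variables there are choose(4, 2) = 6 pairings, which is why there are six plot() calls above. The sketch below generates every pairing automatically and labels the axes. It uses a hypothetical toy dataframe in place of d2. The second step uses base R's pairs(), which draws all the pairings in a single scatterplot matrix.

```r
# Sketch: one scatterplot per pair of variables, without typing each pair by hand.
# `toy` is a hypothetical stand-in for d2 (four numeric columns).
set.seed(123)
toy <- data.frame(var1 = rnorm(50), var2 = rnorm(50),
                  var3 = rnorm(50), var4 = rnorm(50))

pairs_to_plot <- combn(names(toy), 2)   # 2 x 6 matrix: each unique pair once
ncol(pairs_to_plot)                     # 6 pairings for 4 variables

for (i in seq_len(ncol(pairs_to_plot))) {
  xvar <- pairs_to_plot[1, i]
  yvar <- pairs_to_plot[2, i]
  plot(toy[[xvar]], toy[[yvar]], xlab = xvar, ylab = yvar)  # labeled axes
}

# Base R's pairs() shows every pairing in one scatterplot matrix
pairs(toy)
```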

5 Check Your Assumptions

5.1 Pearson’s Correlation Coefficient Assumptions

  • Should have two measurements for each participant
  • Variables should be continuous and normally distributed
  • Outliers should be identified and removed
  • Relationship between the variables should be linear

5.1.1 Checking for Outliers

Note: You are not required to screen out outliers or take any action based on what you see here. This is something you will check and then discuss in your write-up.

# We are going to standardize (z-score) all of our variables, and check them for outliers.

d2$big5_con <- scale(d2$big5_con, center=T, scale=T)
hist(d2$big5_con)

sum(d2$big5_con < -3 | d2$big5_con > 3)
## [1] 1
d2$phq <- scale(d2$phq, center=T, scale=T)
hist(d2$phq)

sum(d2$phq < -3 | d2$phq > 3)
## [1] 0
d2$support <- scale(d2$support, center=T, scale=T)
hist(d2$support)

sum(d2$support < -3 | d2$support > 3)
## [1] 0
d2$swemws <- scale(d2$swemws, center=T, scale=T)
hist(d2$swemws)

sum(d2$swemws < -3 | d2$swemws > 3)
## [1] 0
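The same z-score check can be run on every column in one call. Below is a minimal sketch using a hypothetical helper, `count_outliers` (not part of psych or base R), and made-up toy data. `abs(z) > 3` is equivalent to the `z < -3 | z > 3` test used above.

```r
# Hypothetical helper: count cases with |z| > cutoff in every column of a dataframe.
count_outliers <- function(df, cutoff = 3) {
  sapply(df, function(x) {
    z <- as.numeric(scale(x, center = TRUE, scale = TRUE))  # z-scores
    sum(abs(z) > cutoff, na.rm = TRUE)                      # same as z < -3 | z > 3
  })
}

# Hypothetical toy data: `spiky` has one extreme value (1000); `even` has none
toy <- data.frame(spiky = c(1:99, 1000), even = 1:100)
count_outliers(toy)   # spiky = 1, even = 0
```

On the project data, `count_outliers(d)` restricted to the four continuous columns should reproduce the counts above.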

5.2 Issues with My Data

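Three of my variables (depression, support, and mental well-being) meet all of the assumptions of Pearson’s correlation coefficient. One variable, conscientiousness, has one outlier (a case more than 3 standard deviations from the mean). Outliers can distort the relationship between two variables. Because Pearson’s r is built from deviations from the mean, a single extreme case can make a correlation look stronger or weaker than it is for the rest of the sample.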
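To see how much one case can matter, here is a small sketch with made-up toy data (not the project dataset). Twenty points lie on a perfect line. Adding one extreme point that breaks the pattern drags r from 1 to well below .50.

```r
# Sketch with hypothetical toy data: one extreme case can change Pearson's r dramatically.
x <- 1:20
y <- 1:20
r_without <- cor(x, y)          # 1: perfect positive linear relationship

x_out <- c(x, 40)               # add one case far to the right on x...
y_out <- c(y, 0)                # ...and at the very bottom on y
r_with <- cor(x_out, y_out)     # falls to roughly a third of its former size

c(without = r_without, with = r_with)
```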

6 Run a Single Correlation

corr_output <- corr.test(d2$big5_con, d2$swemws)

7 View Single Correlation

corr_output
## Call:corr.test(x = d2$big5_con, y = d2$swemws)
## Correlation matrix 
##      [,1]
## [1,] 0.38
## Sample Size 
## [1] 917
## These are the unadjusted probability values.
##   The probability values  adjusted for multiple tests are in the p.adj object. 
##      [,1]
## [1,]    0
## 
##  To see confidence intervals of the correlations, print with the short=FALSE option
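The p-value and confidence interval behind a single correlation come from two standard formulas. The first is a t statistic with n - 2 degrees of freedom, t = r * sqrt(n - 2) / sqrt(1 - r^2). The second is a 95% interval built on the Fisher z-transformation, atanh(r), with standard error 1 / sqrt(n - 3). The sketch below computes both by hand on simulated data standing in for d2$big5_con and d2$swemws, and checks them against base R's cor.test(). We believe psych's corr.test() uses the same approach, but treat that as an assumption.

```r
# Sketch: what a Pearson correlation test computes, checked against base R's cor.test().
# Simulated data are a hypothetical stand-in for d2$big5_con and d2$swemws.
set.seed(2024)
n <- 200
x <- rnorm(n)
y <- 0.4 * x + rnorm(n)

r <- cor(x, y)

# t statistic with n - 2 degrees of freedom, and its two-sided p-value
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_val  <- 2 * pt(-abs(t_stat), df = n - 2)

# 95% CI via the Fisher z-transformation: z = atanh(r), SE = 1 / sqrt(n - 3)
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)   # back-transform to the r scale

ct <- cor.test(x, y)   # base R's Pearson test, for comparison
c(t_stat = t_stat, p_value = p_val)
ci
```

The interval is asymmetric around r on the original scale because it is built symmetrically on the z scale and then transformed back.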

8 Create a Correlation Matrix

  • Strong: Between |0.50| and |1|
  • Moderate: Between |0.30| and |0.49|
  • Weak: Between |0.10| and |0.29|
  • Trivial: Less than |0.10|

Remember, Pearson’s r is also an effect size!
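Applying the size guidelines above by hand is easy to get wrong for negative correlations, because the label depends on |r|, not r. The sketch below uses a hypothetical helper, `label_r`, that applies the same cut points to the absolute value. The example values are the coefficients from this lab's correlation matrix.

```r
# Hypothetical helper: label correlations using the |.10|, |.30|, |.50| cut points.
label_r <- function(r) {
  as.character(cut(abs(r),                        # size depends on |r|, not the sign
                   breaks = c(0, 0.10, 0.30, 0.50, 1),
                   labels = c("Trivial", "Weak", "Moderate", "Strong"),
                   right = FALSE,                 # intervals are [lower, upper)
                   include.lowest = TRUE))        # ...so |r| = 1 still counts as "Strong"
}

label_r(c(0.27, -0.41, 0.38, -0.52, 0.59, -0.76))
# "Weak" "Moderate" "Moderate" "Strong" "Strong" "Strong"
```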

corr_output_m <- corr.test(d2)

9 View Test Output

corr_output_m
## Call:corr.test(x = d2)
## Correlation matrix 
##          big5_con   phq support swemws
## big5_con     1.00 -0.41    0.27   0.38
## phq         -0.41  1.00   -0.52  -0.76
## support      0.27 -0.52    1.00   0.59
## swemws       0.38 -0.76    0.59   1.00
## Sample Size 
## [1] 917
## Probability values (Entries above the diagonal are adjusted for multiple tests.) 
##          big5_con phq support swemws
## big5_con        0   0       0      0
## phq             0   0       0      0
## support         0   0       0      0
## swemws          0   0       0      0
## 
##  To see confidence intervals of the correlations, print with the short=FALSE option
# Remember to report the p-values from the matrix that are ABOVE the diagonal; those entries are adjusted for multiple tests
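Why are the p-values above the diagonal the ones to report? With four variables we run six tests at once, which inflates the chance of a false positive. The adjusted p-values correct for that. The sketch below shows one way to build the same layout in base R on simulated toy data: unadjusted p-values below the diagonal, adjusted ones above. It assumes Holm's method, which we believe is corr.test()'s default `adjust` setting.

```r
# Sketch: correlation matrix plus raw (below diagonal) and Holm-adjusted (above) p-values.
# `toy` is hypothetical simulated data standing in for d2.
set.seed(7)
toy <- data.frame(v1 = rnorm(100))
toy$v2 <- toy$v1 + rnorm(100)
toy$v3 <- rnorm(100)
toy$v4 <- toy$v2 + rnorm(100)

r_mat <- cor(toy)                               # Pearson correlation matrix

vars      <- names(toy)
var_pairs <- combn(vars, 2)                     # each unique pair once (6 pairs)
p_raw <- apply(var_pairs, 2, function(pr) cor.test(toy[[pr[1]]], toy[[pr[2]]])$p.value)
p_adj <- p.adjust(p_raw, method = "holm")       # correct for running 6 tests

# Same layout as corr.test(): unadjusted below the diagonal, adjusted above
p_mat <- matrix(0, length(vars), length(vars), dimnames = list(vars, vars))
for (k in seq_len(ncol(var_pairs))) {
  i <- var_pairs[1, k]
  j <- var_pairs[2, k]
  p_mat[j, i] <- p_raw[k]   # lower triangle: unadjusted
  p_mat[i, j] <- p_adj[k]   # upper triangle: adjusted
}
round(r_mat, 2)
round(p_mat, 3)
```

An adjusted p-value is never smaller than its unadjusted counterpart, so reporting the upper triangle is the conservative choice.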

10 Write Up Results

To test our hypothesis that conscientiousness, depression, support, and mental well-being would be correlated with one another, we calculated a series of Pearson’s correlation coefficients. Three of the variables (depression, support, and mental well-being) met the required assumptions of the test: all three met the standards of normality and contained no outliers. One variable, conscientiousness, had one outlier, so any significant results involving conscientiousness should be interpreted with caution.

As predicted, we found that all four variables were significantly correlated with one another (all ps < .001). The correlation between support and conscientiousness had a weak effect size (r = .27), while two correlations had moderate effect sizes (depression and conscientiousness, r = -.41; mental well-being and conscientiousness, r = .38). The remaining three correlations had strong effect sizes (support and depression, r = -.52; support and mental well-being, r = .59; depression and mental well-being, r = -.76; Cohen, 1988). Our second hypothesis, that conscientiousness and mental well-being would be significantly and positively correlated, was also supported: participants who reported higher levels of conscientiousness also reported higher mental well-being (r = .38, p < .001). All correlation coefficients are reported in Table 1.

Table 1: Means, standard deviations, and correlations with confidence intervals

Variable               M      SD     1              2              3
1. Conscientiousness   4.84   1.20
2. Depression          2.05   0.85   -.41**
                                     [-.46, -.35]
3. Support             3.61   0.94   .27**          -.52**
                                     [.21, .33]     [-.57, -.47]
4. Mental Well-being   3.18   0.84   .38**          -.76**         .59**
                                     [.33, .44]     [-.78, -.73]   [.55, .63]

Note:
M and SD are used to represent mean and standard deviation, respectively. Values in square brackets indicate the 95% confidence interval for each correlation. The confidence interval is a plausible range of population correlations that could have caused the sample correlation.
* indicates p < .05. ** indicates p < .01.

References

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New York, NY: Routledge Academic.