Correlation HW

Author

Nat Haseltine

Loading Libraries

library(psych) # for the describe() command and the corr.test() command
library(apaTables) # to create our correlation table
library(kableExtra) # to create our correlation table

Importing Data

d <- read.csv(file="Data/mydata.csv", header=T)

# since we're focusing on our continuous variables, we're going to drop our categorical variables. this will make some stuff we're doing later easier.
d <- subset(d, select=-c(gender, age))

State Your Hypotheses - PART OF YOUR WRITEUP

We predict that social well-being and support will be positively correlated, and that stress will be negatively correlated with both of those variables. We do not expect social media use to be meaningfully related to the other variables.

We predict that intolerance of uncertainty, depression score, and perceived stress will be positively correlated, and all three mental health variables will be negatively correlated with self-esteem.

Check Your Assumptions

Pearson’s Correlation Coefficient Assumptions

  • Should have two measurements for each participant for each variable (confirmed by earlier procedures – we dropped any participants with missing data)
  • Variables should be continuous and normally distributed, or assessments of the relationship may be inaccurate (will check below; if there are issues, make a note and continue)
  • Outliers should be identified and removed, or results will be inaccurate (will do below)
  • Relationship between the variables should be linear, or it will not be detected (will do below)

Checking for Outliers

Outliers can mask potential effects and cause Type II error (you conclude there is no relationship when there really is one, i.e., a false negative).

Note: You are not required to screen out outliers or take any action based on what you see here. This is something you will check and then discuss in your write-up.

# using the scale() command to standardize our variable, viewing a histogram, and then counting statistical outliers
d$swb <- scale(d$swb, center=T, scale=T)
hist(d$swb)

sum(d$swb < -3 | d$swb > 3)
[1] 0
d$support <- scale(d$support, center=T, scale=T)
hist(d$support)

sum(d$support < -3 | d$support > 3)
[1] 27
d$socmeduse <- scale(d$socmeduse, center=T, scale=T)
hist(d$socmeduse)

sum(d$socmeduse < -3 | d$socmeduse > 3)
[1] 0
d$stress <- scale(d$stress, center=T, scale=T)
hist(d$stress)

sum(d$stress < -3 | d$stress > 3)
[1] 0
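
The standardize-and-count steps above repeat the same logic for each variable, so they could also be wrapped in a small helper. The sketch below is only an illustration: count_outliers() is a hypothetical function name, d_demo is simulated data standing in for the real file, and unlike the code above it leaves the original columns unstandardized.

```r
# Simulated stand-in for the homework data (the real d comes from Data/mydata.csv).
set.seed(1)
d_demo <- data.frame(
  swb       = rnorm(200),
  support   = c(rnorm(195), rep(-8, 5)),  # five planted extreme low scores
  socmeduse = rnorm(200),
  stress    = rnorm(200)
)

# count_outliers() is a hypothetical helper: standardize a numeric vector
# and count values more than `cutoff` standard deviations from the mean.
count_outliers <- function(x, cutoff = 3) {
  z <- as.numeric(scale(x, center = TRUE, scale = TRUE))
  sum(abs(z) > cutoff, na.rm = TRUE)
}

# Apply the helper to every column at once and show the counts.
outlier_counts <- sapply(d_demo, count_outliers)
print(outlier_counts)
```

Keeping the raw columns intact makes it easier to report means and standard deviations in the original units later on.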

Checking for Linear Relationships

From a visual standpoint, all six relationships appear linear. However, we should be wary of potential nonlinearity in the relationship between support and social media use and in the relationship between support and stress, as both of these scatterplots show large clumps.

# use scatterplots to examine your continuous variables together
plot(d$swb, d$support)

plot(d$swb, d$socmeduse)

plot(d$swb, d$stress)

plot(d$support, d$socmeduse)

plot(d$support, d$stress)

plot(d$socmeduse, d$stress)
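
Judging linearity by eye can be hard when points clump together. One optional aid, sketched here on simulated data (x and y are made-up variables, not columns from the homework file), is to overlay a lowess smoother and a straight regression line on the same scatterplot. If the two lines roughly track each other, a linear relationship is plausible.

```r
# Simulated pair of variables with a linear relationship (illustration only).
set.seed(2)
x <- rnorm(300)
y <- 0.5 * x + rnorm(300)

# Scatterplot with a lowess smoother (red) and a straight-line fit (blue, dashed).
plot(x, y, main = "Simulated example")
smooth_fit <- lowess(x, y)
lines(smooth_fit, col = "red", lwd = 2)
abline(lm(y ~ x), col = "blue", lty = 2)
```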

Check Your Variables

describe(d)
          vars    n mean sd median trimmed  mad   min  max range  skew kurtosis
swb          1 2161    0  1   0.05    0.04 1.12 -2.59 1.93  4.52 -0.35    -0.49
support      2 2161    0  1   0.19    0.11 0.87 -4.89 1.30  6.18 -1.08     1.31
socmeduse    3 2161    0  1   0.09    0.03 0.86 -2.71 2.42  5.12 -0.31     0.20
stress       4 2161    0  1   0.06    0.00 0.99 -2.96 2.57  5.53 -0.02    -0.15
            se
swb       0.02
support   0.02
socmeduse 0.02
stress    0.02
# also use histograms to examine your continuous variables
hist(d$swb)

hist(d$support)

hist(d$socmeduse)

hist(d$stress)

Issues with My Data - PART OF YOUR WRITEUP

The variable ‘support’ has outliers below -3 (27 participants), while the other variables (social well-being, social media use, and stress) have none. Outliers can distort a correlation, so any result that pairs support with one of the other three variables should be interpreted with caution. One possible remedy is to exclude participants whose standardized support score falls below -3.
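
If we did choose to screen out the outliers, one way to do it is sketched below. This is an illustration on simulated data, not a step the assignment requires: d_demo, support_z, and d_trim are hypothetical names, and the three extreme scores are planted on purpose.

```r
# Simulated data with three planted extreme low support scores.
set.seed(3)
d_demo <- data.frame(
  swb     = rnorm(100),
  support = c(rnorm(97), -9, -9, -9)
)

# Standardize support in a new column so the raw scores are preserved.
d_demo$support_z <- as.numeric(scale(d_demo$support))

# Keep only rows whose standardized support score is within 3 SDs of the mean.
keep   <- abs(d_demo$support_z) <= 3
d_trim <- d_demo[keep, ]

# Number of rows dropped, and the correlation before and after trimming.
n_dropped <- nrow(d_demo) - nrow(d_trim)
r_all     <- cor(d_demo$swb, d_demo$support)
r_trim    <- cor(d_trim$swb, d_trim$support)
print(c(n_dropped = n_dropped, r_all = r_all, r_trim = r_trim))
```

Comparing the correlation with and without the flagged rows shows how much the outliers actually matter for the result.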

Run Pearson’s Correlation

There are two ways to run Pearson’s correlation in R. You can calculate each correlation one-at-a-time using multiple commands, or you can calculate them all at once and report the scores in a matrix. The matrix output can be confusing at first, but it’s more efficient. We’ll do it both ways.

Run a Single Correlation

corr_output <- corr.test(d$swb, d$support)

View Single Correlation

  • Strong effect: Between |0.50| and |1|
  • Moderate effect: Between |0.30| and |0.49|
  • Weak effect: Between |0.10| and |0.29|
  • Trivial effect: Less than |0.10|
corr_output
Call:corr.test(x = d$swb, y = d$support)
Correlation matrix 
     [,1]
[1,] 0.46
Sample Size 
[1] 2161
These are the unadjusted probability values.
  The probability values  adjusted for multiple tests are in the p.adj object. 
     [,1]
[1,]    0

 To see confidence intervals of the correlations, print with the short=FALSE option
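
The probability value of 0 in the output above is a rounded display; a p-value cannot be exactly zero, so it is usually reported as p < .001. As a cross-check, base R's cor.test() returns the unrounded p-value together with a 95% confidence interval. The sketch below uses simulated vectors x and y (made-up data, not the homework variables).

```r
# Simulated pair of correlated variables (illustration only).
set.seed(4)
x <- rnorm(500)
y <- 0.5 * x + rnorm(500)

# Pearson correlation with its significance test and confidence interval.
single_r <- cor.test(x, y, method = "pearson")

single_r$estimate                          # the correlation coefficient r
single_r$conf.int                          # 95% confidence interval for r
format.pval(single_r$p.value, digits = 3)  # p-value without rounding to 0
```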

Create a Correlation Matrix

corr_output_m <- corr.test(d)

View Test Output

  • Strong effect: Between |0.50| and |1|
  • Moderate effect: Between |0.30| and |0.49|
  • Weak effect: Between |0.10| and |0.29|
  • Trivial effect: Less than |0.10|
corr_output_m
Call:corr.test(x = d)
Correlation matrix 
            swb support socmeduse stress
swb        1.00    0.46      0.09  -0.49
support    0.46    1.00      0.19  -0.20
socmeduse  0.09    0.19      1.00   0.11
stress    -0.49   -0.20      0.11   1.00
Sample Size 
[1] 2161
Probability values (Entries above the diagonal are adjusted for multiple tests.) 
          swb support socmeduse stress
swb         0       0         0      0
support     0       0         0      0
socmeduse   0       0         0      0
stress      0       0         0      0

 To see confidence intervals of the correlations, print with the short=FALSE option
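
Reading six effect sizes off the matrix by hand is error-prone, so the labels can also be assigned in code. The sketch below re-enters the correlations reported above and applies the strong/moderate/weak/trivial thresholds. label_effect() is a hypothetical helper, and it assumes cut-points at .10, .30, and .50, with anything below .10 treated as trivial.

```r
# Correlations copied from the matrix output above.
vars  <- c("swb", "support", "socmeduse", "stress")
r_mat <- matrix(
  c( 1.00,  0.46, 0.09, -0.49,
     0.46,  1.00, 0.19, -0.20,
     0.09,  0.19, 1.00,  0.11,
    -0.49, -0.20, 0.11,  1.00),
  nrow = 4, byrow = TRUE, dimnames = list(vars, vars)
)

# label_effect() is a hypothetical helper that bins |r| into effect-size labels.
label_effect <- function(r) {
  cut(abs(r),
      breaks = c(0, 0.10, 0.30, 0.50, Inf),
      labels = c("trivial", "weak", "moderate", "strong"),
      right  = FALSE)
}

# One row per unique pair of variables (lower triangle of the matrix).
pairs_idx <- which(lower.tri(r_mat), arr.ind = TRUE)
effects <- data.frame(
  var1 = rownames(r_mat)[pairs_idx[, "row"]],
  var2 = colnames(r_mat)[pairs_idx[, "col"]],
  r    = r_mat[lower.tri(r_mat)]
)
effects$size <- label_effect(effects$r)
print(effects)
```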

Write Up Results

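We hypothesized that social well-being and support would be positively correlated, that stress would be negatively correlated with both, and that social media use would not be meaningfully related to the other variables. The only issue we found in the data was the presence of outliers in the variable ‘support,’ with 27 participants scoring below -3 after standardization.

All six correlations were statistically significant (all ps < .01; see Table 1). As predicted, social well-being and support were positively correlated (r = .46), and stress was negatively correlated with both social well-being (r = -.49) and support (r = -.20). Contrary to our prediction, social media use was significantly correlated with social well-being (r = .09), support (r = .19), and stress (r = .11), although these relationships were small. According to Cohen (1988), the effect sizes were moderate for social well-being with support and with stress, weak for support with stress, support with social media use, and social media use with stress, and trivial for social well-being with social media use.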

Table 1: Means, standard deviations, and correlations with confidence intervals
Variable                        M     SD    1             2             3
Social Well-Being (SWB)         0.00  1.00
Support (SUPPORT)               0.00  1.00  .46**
                                            [.43, .50]
Social Media Use (SOCMEDUSE)    0.00  1.00  .09**         .19**
                                            [.05, .13]    [.15, .23]
Stress (STRESS)                 0.00  1.00  -.49**        -.20**        .11**
                                            [-.52, -.46]  [-.24, -.16]  [.07, .15]
Note:
M and SD are used to represent mean and standard deviation, respectively. Values in square brackets indicate the 95% confidence interval. The confidence interval is a plausible range of population correlations that could have caused the sample correlation.
* indicates p < .05
** indicates p < .01.

References

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.