Correlation HW

Author

Regina Cai

Loading Libraries

library(psych) # for the describe() command and the corr.test() command
library(apaTables) # to create our correlation table
library(kableExtra) # to create our correlation table

Importing Data

d <- read.csv(file="Data/mydata.csv", header=T)

# since we're focusing on our continuous variables, we're going to drop our categorical variables. this will make some stuff we're doing later easier.
d <- subset(d, select=-c(income, race_rc))

State Your Hypotheses - PART OF YOUR WRITEUP

We predict that scores on the Mindful Attention Awareness Scale (Mindful), the Inventory of the Dimensions of Emerging Adulthood (Idea), and satisfaction with life (SWB) will be positively correlated, and that all three of these variables will be negatively correlated with social media usage (Socmeduse).

Check Your Assumptions

Pearson’s Correlation Coefficient Assumptions

  • Should have two measurements for each participant for each variable (confirmed by earlier procedures – we dropped any participants with missing data)
  • Variables should be continuous and normally distributed, or assessments of the relationship may be inaccurate (confirmed above – if issues, make a note and continue)
  • Outliers should be identified and removed, or results will be inaccurate (will do below)
  • Relationship between the variables should be linear, or it will not be detected (will do below)
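
The first assumption in the list above (a score on every variable for every participant) can be double-checked with a couple of base R commands. This is only a minimal sketch: the data frame toy and its values are made up for illustration and stand in for d.

```r
# Toy data frame standing in for d; the values are made up for illustration.
toy <- data.frame(
  swb     = c(4.2, 5.1, NA, 3.3),
  mindful = c(3.0, 2.5, 4.1, NA)
)

# Count missing values per variable.
missing_per_var <- colSums(is.na(toy))
print(missing_per_var)

# Keep only participants with a score on every variable.
toy_complete <- toy[complete.cases(toy), ]
print(nrow(toy_complete))
```

If every count is zero, no participants need to be dropped before running the correlations.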

Checking for Outliers

Outliers can mask potential effects and cause Type II error (you conclude there is no relationship when there really is one, i.e., a false negative).

Note: You are not required to screen out outliers or take any action based on what you see here. This is something you will check and then discuss in your write-up.

# using the scale() command to standardize our variable, viewing a histogram, and then counting statistical outliers
d$swb <- scale(d$swb, center=T, scale=T)
hist(d$swb)

sum(d$swb < -3 | d$swb > 3)
[1] 0
d$socmeduse <- scale(d$socmeduse, center=T, scale=T)
hist(d$socmeduse)

sum(d$socmeduse < -3 | d$socmeduse > 3)
[1] 0
d$idea <- scale(d$idea, center=T, scale=T)
hist(d$idea)

sum(d$idea < -3 | d$idea > 3)
[1] 36
d$mindful <- scale(d$mindful, center=T, scale=T)
hist(d$mindful)

sum(d$mindful < -3 | d$mindful > 3)
[1] 2
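
The commands above overwrite each raw variable with its z-scores, which is why every mean is 0 and every SD is 1 in the descriptives later on. One possible alternative, sketched here under assumptions, is to count outliers from a temporary standardized copy so the original units are kept. The helper count_outliers() is hypothetical (it is not part of psych or base R), and the scores are simulated.

```r
# Count values more than `cutoff` standard deviations from the mean,
# without overwriting the raw variable. count_outliers() is a
# hypothetical helper, not part of psych or base R.
count_outliers <- function(x, cutoff = 3) {
  z <- as.vector(scale(x))  # scale() returns a one-column matrix
  sum(abs(z) > cutoff, na.rm = TRUE)
}

# Simulated scores (assumption): 200 typical values plus one extreme value.
set.seed(1)
scores <- c(rnorm(200, mean = 5, sd = 1), 15)

# Number of statistical outliers; `scores` itself is left unchanged.
n_out <- count_outliers(scores)
print(n_out)
```

With this approach, describe() would report means and standard deviations in the original scale units rather than in z-score units.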

Checking for Linear Relationships

Non-linear relationships cannot be detected by Pearson’s correlation (the type of correlation we’re doing here). This means that you may underestimate the relationship between a pair of variables if they have a non-linear relationship, and thus your understanding of what’s happening in your data will be inaccurate.

Visually check that relationships are linear and write a brief description of any potential nonlinearity. You will have to use your judgement. There are no penalties for answering ‘wrong’, so try not to stress out about it too much – just do your best.

# use scatterplots to examine your continuous variables together
plot(d$swb, d$socmeduse)

plot(d$swb, d$idea)

plot(d$swb, d$mindful)

plot(d$socmeduse, d$idea)

plot(d$socmeduse, d$mindful)

plot(d$idea, d$mindful)
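
Judging linearity by eye can be easier with a smoothed trend line drawn over the scatterplot. The sketch below uses base R's lowess() on simulated data (an assumption, not the homework data) in which the true relationship is deliberately curved.

```r
# Simulated data (assumption): y depends on x through a curve, not a line.
set.seed(42)
x <- rnorm(300)
y <- x^2 + rnorm(300, sd = 0.5)

# Scatterplot with a lowess smoother; a clearly bent line suggests nonlinearity.
plot(x, y)
smooth <- lowess(x, y)
lines(smooth, col = "red", lwd = 2)

# Pearson's r for the same data; it can be small even though
# x and y are strongly (but nonlinearly) related.
r_curved <- cor(x, y)
print(r_curved)
```

A roughly straight smoother supports the linearity assumption; a U-shape or a sharp bend is worth noting in the write-up.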

Check Your Variables

describe(d)
          vars    n mean sd median trimmed  mad   min  max range  skew kurtosis
mindful      1 3140    0  1   0.03    0.01 0.94 -3.06 2.72  5.78 -0.06    -0.14
swb          2 3140    0  1   0.15    0.04 1.12 -2.63 1.91  4.54 -0.36    -0.45
socmeduse    3 3140    0  1   0.06    0.03 0.87 -2.74 2.40  5.14 -0.32     0.26
idea         4 3140    0  1   0.14    0.13 0.97 -6.76 1.12  7.88 -1.52     4.30
            se
mindful   0.02
swb       0.02
socmeduse 0.02
idea      0.02
# also use histograms to examine your continuous variables
hist(d$swb)

hist(d$socmeduse)

hist(d$idea)

hist(d$mindful)

Issues with My Data - PART OF YOUR WRITEUP

After standardizing each variable and counting values beyond ±3 standard deviations, we found 36 outliers in idea and 2 outliers in mindful. There were no outliers in swb or socmeduse.

There are no indications of non-linearity in any of the variable pairings.

The variable “idea” has a kurtosis of 4.30 and a skew of -1.52.

The high kurtosis and the outliers in our data indicate that idea deviates from a normal distribution. These extreme values violate the assumption of normality and could bias our correlations, which would affect the validity of the statistical analyses.

Run Pearson’s Correlation

There are two ways to run Pearson’s correlation in R. You can calculate each correlation one-at-a-time using multiple commands, or you can calculate them all at once and report the scores in a matrix. The matrix output can be confusing at first, but it’s more efficient. We’ll do it both ways.

Run a Single Correlation

corr_output <- corr.test(d$swb, d$socmeduse)

View Single Correlation

Strong effect: Between |0.50| and |1|

Moderate effect: Between |0.30| and |0.49|

Weak effect: Between |0.10| and |0.29|

Trivial effect: Less than |0.10|

corr_output
Call:corr.test(x = d$swb, y = d$socmeduse)
Correlation matrix 
     [,1]
[1,]  0.1
Sample Size 
[1] 3140
These are the unadjusted probability values.
  The probability values  adjusted for multiple tests are in the p.adj object. 
     [,1]
[1,]    0

 To see confidence intervals of the correlations, print with the short=FALSE option
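
The printed p-value of 0 above is rounded, not exactly zero. As a rough sketch of how to pull out the unrounded numbers and label the effect size, the example below uses base R's cor.test() as a stand-in for psych's corr.test(), on simulated data (an assumption). The helper label_effect() is hypothetical; it applies the effect size cutoffs listed above and treats anything below |0.10| as trivial.

```r
# Simulated standardized scores (assumption) with a weak positive relationship.
set.seed(123)
n <- 3140
swb <- rnorm(n)
socmeduse <- 0.1 * swb + rnorm(n)

# Base R's cor.test() stands in here for psych::corr.test().
res <- cor.test(swb, socmeduse, method = "pearson")

r  <- unname(res$estimate)  # correlation coefficient
p  <- res$p.value           # unrounded p-value
ci <- res$conf.int          # 95% confidence interval for r

# label_effect() is a hypothetical helper applying the cutoffs listed above.
label_effect <- function(r) {
  a <- abs(r)
  if (a >= 0.50) {
    "strong"
  } else if (a >= 0.30) {
    "moderate"
  } else if (a >= 0.10) {
    "weak"
  } else {
    "trivial"
  }
}

cat(sprintf("r = %.2f, p = %.3g, 95%% CI [%.2f, %.2f], %s effect\n",
            r, p, ci[1], ci[2], label_effect(r)))
```

Reporting a very small p-value as p < .001 rather than p = 0 is the usual convention.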

Create a Correlation Matrix

corr_output_m <- corr.test(d)

View Test Output

Strong effect: Between |0.50| and |1|

Moderate effect: Between |0.30| and |0.49|

Weak effect: Between |0.10| and |0.29|

Trivial effect: Less than |0.10|

corr_output_m
Call:corr.test(x = d)
Correlation matrix 
          mindful   swb socmeduse  idea
mindful      1.00  0.29     -0.10 -0.12
swb          0.29  1.00      0.10 -0.01
socmeduse   -0.10  0.10      1.00  0.23
idea        -0.12 -0.01      0.23  1.00
Sample Size 
[1] 3140
Probability values (Entries above the diagonal are adjusted for multiple tests.) 
          mindful  swb socmeduse idea
mindful         0 0.00         0 0.00
swb             0 0.00         0 0.69
socmeduse       0 0.00         0 0.00
idea            0 0.69         0 0.00

 To see confidence intervals of the correlations, print with the short=FALSE option
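
In the matrix above, the p-values above the diagonal are adjusted for multiple tests; corr.test() uses the Holm method by default. The sketch below shows what that adjustment does, using base R's p.adjust() on made-up p-values (they are not taken from this analysis).

```r
# Hypothetical unadjusted p-values for six pairwise correlations (made up).
p_raw <- c(0.001, 0.004, 0.020, 0.030, 0.200, 0.690)

# Holm's step-down adjustment; adjusted values are never smaller than raw ones.
p_holm <- p.adjust(p_raw, method = "holm")
print(round(p_holm, 3))
```

Because six correlations are tested at once, the adjusted values are the safer ones to report when deciding which relationships are statistically significant.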

Write Up Results

We tested the relationships among mindfulness (Mindful Attention Awareness Scale; Mindful), emerging adulthood (Inventory of the Dimensions of Emerging Adulthood; Idea), satisfaction with life (SWB), and social media usage (Socmeduse) using Pearson’s correlation coefficient. The Inventory of the Dimensions of Emerging Adulthood assesses aspects of the emerging adulthood stage, such as identity exploration, self-sufficiency, and psychological features that distinguish this phase from adolescence and full adulthood.

We hypothesized that Mindful Attention Awareness Scale (Mindful), Inventory of the Dimensions of Emerging Adulthood (Idea), and satisfaction with life (SWB) scores would be positively correlated, and that all three of these variables would be negatively correlated with social media usage (Socmeduse).

Our data did not meet all assumptions for Pearson’s correlation coefficient. Idea was negatively skewed (skew = -1.52) with high kurtosis (4.30) and 36 outliers, and mindful had 2 outliers; we saw no evidence of non-linearity. Our hypotheses were only partially supported. As predicted, mindfulness and satisfaction with life were positively correlated (r = .29), and mindfulness was negatively correlated with social media usage (r = -.10); both effects were weak. Contrary to our predictions, idea was negatively correlated with mindfulness (r = -.12), satisfaction with life was positively correlated with social media usage (r = .10), and idea was positively correlated with social media usage (r = .23); all three effects were weak. The correlation between idea and satisfaction with life was trivial and not statistically significant (r = -.01, p = .69). All other correlations were statistically significant (p < .01; see Table 1).

Table 1: Means, standard deviations, and correlations with confidence intervals

Variable                                                     M     SD    1             2            3
1. Mindful Attention Awareness Scale (Mindful)               0.00  1.00
2. Satisfaction with Life (SWB)                              0.00  1.00  .29**
                                                                         [.26, .32]
3. Social Media Usage (Socmeduse)                            0.00  1.00  -.10**        .10**
                                                                         [-.13, -.06]  [.07, .14]
4. Inventory of the Dimensions of Emerging Adulthood (Idea)  0.00  1.00  -.12**        -.01         .23**
                                                                         [-.16, -.09]  [-.04, .03]  [.19, .26]

Note:
M and SD are used to represent mean and standard deviation, respectively. Because the variables were standardized before analysis, M is 0.00 and SD is 1.00 for every variable. Values in square brackets indicate the 95% confidence interval. The confidence interval is a plausible range of population correlations that could have caused the sample correlation.
* indicates p < .05
** indicates p < .01.

References

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.