Correlation HW
Loading Libraries
library(psych) # for the describe() command and the corr.test() command
library(apaTables) # to create our correlation table
library(kableExtra) # to create our correlation table
Importing Data
d <- read.csv(file="Data/mydata.csv", header=T)
# since we're focusing on our continuous variables, we're going to drop our categorical variables. this will make some stuff we're doing later easier.
d <- subset(d, select=-c(edu, gender))
State Your Hypotheses - PART OF YOUR WRITEUP
We predict that social media use will be negatively correlated with mindfulness and self-efficacy and positively correlated with NPI.
Check Your Assumptions
Pearson’s Correlation Coefficient Assumptions
- Should have two measurements for each participant for each variable (confirmed by earlier procedures – we dropped any participants with missing data)
- Variables should be continuous and normally distributed, or assessments of the relationship may be inaccurate (will do below)
- Outliers should be identified and removed, or results will be inaccurate (will do below)
- Relationship between the variables should be linear, or they will not be detected (will do below)
Checking for Outliers
Outliers can mask potential effects and cause Type II error (you assume there is no relationship when there really is one, i.e., a false negative).
Note: You are not required to screen out outliers or take any action based on what you see here. This is something you will check and then discuss in your write-up.
# using the scale() command to standardize our variable, viewing a histogram, and then counting statistical outliers
d$mindful <- scale(d$mindful, center=T, scale=T)
hist(d$mindful)
sum(d$mindful < -3 | d$mindful > 3)
[1] 2
d$socmeduse <- scale(d$socmeduse, center=T, scale=T)
hist(d$socmeduse)
sum(d$socmeduse < -3 | d$socmeduse > 3)
[1] 0
d$npi <- scale(d$npi, center=T, scale=T)
hist(d$npi)
sum(d$npi < -3 | d$npi > 3)
[1] 0
d$efficacy <- scale(d$efficacy, center=T, scale=T)
hist(d$efficacy)
sum(d$efficacy < -3 | d$efficacy > 3)
[1] 16
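The standardize-and-count steps above can also be wrapped in a small helper so that the raw scores are not overwritten. The following is a minimal sketch, not part of the assignment template: `count_outliers()` and the toy data frame `toy` are hypothetical names used only for illustration, and the cutoff of 3 standard deviations mirrors the check above.

```r
# Hypothetical helper: count values more than `cutoff` SDs from the mean.
# scale() returns a one-column matrix, so as.numeric() flattens it to a vector.
count_outliers <- function(x, cutoff = 3) {
  z <- as.numeric(scale(x, center = TRUE, scale = TRUE))
  sum(abs(z) > cutoff, na.rm = TRUE)
}

# Toy data (made up for illustration, not the homework dataset)
toy <- data.frame(
  a = c(rep(0, 99), 100),  # one extreme value
  b = 1:100                # evenly spread, no extreme values
)

# Apply the helper to every column at once
outlier_counts <- sapply(toy, count_outliers)
print(outlier_counts)
```

With the homework data, the same call would be `sapply(d, count_outliers)`, assuming `d` holds only the continuous variables.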
Checking for Linear Relationships
Non-linear relationships cannot be detected by Pearson’s correlation (the type of correlation we’re doing here). This means that you may underestimate the relationship between a pair of variables if they have a non-linear relationship, and thus your understanding of what’s happening in your data will be inaccurate.
Visually check that relationships are linear and write a brief description of any potential nonlinearity. You will have to use your judgement. There are no penalties for answering ‘wrong’, so try not to stress out about it too much – just do your best.
# use scatterplots to examine your continuous variables together
plot(d$mindful, d$socmeduse)
plot(d$mindful, d$npi)
plot(d$mindful, d$efficacy)
plot(d$socmeduse, d$npi)
plot(d$socmeduse, d$efficacy)
plot(d$npi, d$efficacy)
Check Your Variables
describe(d)
          vars    n mean sd median trimmed  mad   min  max range  skew kurtosis   se
mindful      1 3149    0  1   0.03    0.01 0.94 -3.06 2.72  5.78 -0.06    -0.14 0.02
socmeduse    2 3149    0  1   0.06    0.03 0.87 -2.74 2.40  5.14 -0.31     0.27 0.02
npi          3 3149    0  1  -0.41   -0.13 0.74 -0.91 2.34  3.25  0.94    -0.69 0.02
efficacy     4 3149    0  1  -0.06    0.01 1.00 -4.54 1.95  6.49 -0.24     0.46 0.02
# also use histograms to examine your continuous variables
hist(d$mindful)
hist(d$socmeduse)
hist(d$npi)
hist(d$efficacy)
Issues with My Data - PART OF YOUR WRITEUP
We found outliers in our data. Mindfulness had two outliers, both on the negative side. Self-efficacy had sixteen outliers, all on the negative end, with the most extreme at -4.54 standard deviations from the mean.
Most of the scatterplots show either no clear linear relationship or only a very weak one. Social media use and mindfulness show no clear upward or downward trend. Mindfulness and NPI also show almost no relationship, although the scatterplot looks very different from the previous one. Mindfulness and self-efficacy show a weak relationship, but this scatterplot looks slightly more linear than the previous ones. Social media use and NPI look similar to mindfulness and NPI, with no clear linear trend. Social media use and self-efficacy show little to no relationship. NPI and self-efficacy also show no clear linear trend, but the plot looks very different from the other scatterplots. Overall, our variables show weak or no visible linear relationships.
There were no issues with skew or kurtosis: the largest skew was 0.94 and the largest absolute kurtosis was 0.69, both for NPI.
Run Pearson’s Correlation
There are two ways to run Pearson’s correlation in R. You can calculate each correlation one-at-a-time using multiple commands, or you can calculate them all at once and report the scores in a matrix. The matrix output can be confusing at first, but it’s more efficient. We’ll do it both ways.
Run a Single Correlation
corr_output <- corr.test(d$npi, d$socmeduse)
View Single Correlation
Strong effect: Between |0.50| and |1|
Moderate effect: Between |0.30| and |0.49|
Weak effect: Between |0.10| and |0.29|
Trivial effect: Less than |0.10|
corr_output
Call:corr.test(x = d$npi, y = d$socmeduse)
Correlation matrix
[,1]
[1,] 0.06
Sample Size
[1] 3149
These are the unadjusted probability values.
The probability values adjusted for multiple tests are in the p.adj object.
[,1]
[1,] 0
To see confidence intervals of the correlations, print with the short=FALSE option
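For a single pair of variables, base R's `cor.test()` reports the same Pearson correlation together with its p-value and 95% confidence interval. The sketch below uses made-up vectors `x` and `y` (hypothetical data, not the homework variables) to show where each value lives in the result.

```r
# Toy data (made up for illustration, not the homework dataset)
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2, 1, 4, 3, 6, 5, 8, 7)

# Pearson correlation test from base R's stats package
res <- cor.test(x, y, method = "pearson")

r_value  <- unname(res$estimate)  # the correlation coefficient
p_value  <- res$p.value           # unadjusted p-value
ci_95    <- res$conf.int          # lower and upper bounds of the 95% CI

print(round(r_value, 2))
print(p_value)
print(ci_95)
```

With the homework data, the equivalent call would be `cor.test(d$npi, d$socmeduse)`; note that `cor.test()` does not adjust p-values for multiple tests.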
Create a Correlation Matrix
corr_output_m <- corr.test(d)
View Test Output
Strong effect: Between |0.50| and |1|
Moderate effect: Between |0.30| and |0.49|
Weak effect: Between |0.10| and |0.29|
Trivial effect: Less than |0.10|
corr_output_m
Call:corr.test(x = d)
Correlation matrix
          mindful socmeduse   npi efficacy
mindful      1.00     -0.10 -0.02     0.25
socmeduse   -0.10      1.00  0.06     0.04
npi         -0.02      0.06  1.00     0.17
efficacy     0.25      0.04  0.17     1.00
Sample Size
[1] 3149
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          mindful socmeduse  npi efficacy
mindful      0.00      0.00 0.26     0.00
socmeduse    0.00      0.00 0.00     0.05
npi          0.26      0.00 0.00     0.00
efficacy     0.00      0.03 0.00     0.00
To see confidence intervals of the correlations, print with the short=FALSE option
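The effect-size guide above can be applied to every correlation at once. This is a minimal sketch: `effect_label()` is a hypothetical helper, it treats anything below |0.10| as trivial, and the r values are typed in by hand from the rounded matrix printed above.

```r
# Hypothetical helper: map |r| to the labels in the effect-size guide.
# findInterval() counts how many cutoffs |r| has reached (0 to 3).
effect_label <- function(r) {
  labels <- c("trivial", "weak", "moderate", "strong")
  labels[findInterval(abs(r), c(0.10, 0.30, 0.50)) + 1]
}

# Rounded correlations copied from the matrix output above
r_values <- c(
  mindful_socmeduse  = -0.10,
  mindful_npi        = -0.02,
  mindful_efficacy   =  0.25,
  socmeduse_npi      =  0.06,
  socmeduse_efficacy =  0.04,
  npi_efficacy       =  0.17
)

effect_sizes <- effect_label(r_values)
names(effect_sizes) <- names(r_values)
print(effect_sizes)
```

Because the inputs are rounded to two decimals, a value printed as -0.10 sits right on the boundary between trivial and weak; the unrounded values in `corr_output_m$r` would give a more reliable label.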
Write Up Results
We predicted that social media use would be negatively correlated with mindfulness and self-efficacy and positively correlated with NPI. We encountered some issues, including a few outliers in two variables and scatterplots that did not show clear linearity.
The individual test between NPI and social media use showed a positive correlation (r = .06, p < .01). It is statistically significant, but the effect size is trivial, so it may not be meaningful in the real world.
We then computed a correlation matrix for our four continuous variables. It showed positive correlations between mindfulness and self-efficacy (r = .25) and between NPI and self-efficacy (r = .17), and a negative correlation between mindfulness and social media use (r = -.10). Each of these correlations was statistically significant (p < .01), but none was strong. As shown in Table 1, higher mindfulness was associated with higher self-efficacy and lower social media use.
Although the scatterplots show nonlinearity, there is still some correlation. This could be due to the limitations of correlation tests.
Variable | M | SD | 1 | 2 | 3 |
---|---|---|---|---|---|
1. Mindfulness | -0.00 | 1.00 | | | |
2. Social Media Use | 0.00 | 1.00 | -.10** | | |
 | | | [-.14, -.07] | | |
3. Narcissistic Personality (NPI) | 0.00 | 1.00 | -.02 | .06** | |
 | | | [-.06, .01] | [.02, .09] | |
4. Self Efficacy | -0.00 | 1.00 | .25** | .04* | .17** |
 | | | [.22, .28] | [.00, .07] | [.13, .20] |

Note: M and SD are used to represent mean and standard deviation, respectively. Values in square brackets indicate the 95% confidence interval. The confidence interval is a plausible range of population correlations that could have caused the sample correlation. * indicates p < .05. ** indicates p < .01.
References
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.