Correlation HW
Loading Libraries
library(psych) # for the describe() command and the corr.test() command
library(apaTables) # to create our correlation table
library(kableExtra) # to format our correlation table
Importing Data
d <- read.csv(file="Data/mydata.csv", header=T)
# since we're focusing on our continuous variables, we're going to drop our categorical variables. this will make some stuff we're doing later easier.
d <- subset(d, select=-c(ethnicity, education))
State Your Hypotheses - PART OF YOUR WRITEUP
We predict that generalized anxiety disorder (GAD) symptoms, eating disorder symptoms, and isolation will be positively correlated with one another. We also predict that social support will be negatively correlated with each of these three variables.
Check Your Assumptions
Pearson’s Correlation Coefficient Assumptions
- Should have two measurements for each participant for each variable (confirmed by earlier procedures – we dropped any participants with missing data)
- Variables should be continuous and normally distributed, or assessments of the relationship may be inaccurate (will do below)
- Outliers should be identified and removed, or results will be inaccurate (will do below)
- Relationship between the variables should be linear, or it will not be detected (will do below)
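The first assumption (complete data for every participant) can be double-checked directly. The sketch below is a minimal illustration that uses a small made-up data frame called toy, not the homework dataset; the column names and values are hypothetical.

```r
# Hypothetical toy data standing in for the real dataset
toy <- data.frame(
  gad     = c(5, 9, NA, 12, 7),
  support = c(20, 18, 25, NA, 22)
)

# Count participants (rows) with at least one missing value
n_incomplete <- sum(!complete.cases(toy))
n_incomplete

# Keep only participants with complete data on every variable
toy_complete <- na.omit(toy)
nrow(toy_complete)
```

If the count of incomplete rows is zero in the real data, the first assumption holds as stated.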
Checking for Outliers
Outliers can mask potential effects and cause Type II error (you conclude there is no relationship when there really is one, i.e., a false negative).
Note: You are not required to screen out outliers or take any action based on what you see here. This is something you will check and then discuss in your write-up.
# using the scale() command to standardize our variable, viewing a histogram, and then counting statistical outliers
d$edeq12 <- scale(d$edeq12, center=T, scale=T)
hist(d$edeq12)
sum(d$edeq12 < -3 | d$edeq12 > 3)
[1] 0
d$isolation <- scale(d$isolation, center=T, scale=T)
hist(d$isolation)
sum(d$isolation < -3 | d$isolation > 3)
[1] 0
d$support <- scale(d$support, center=T, scale=T)
hist(d$support)
sum(d$support < -3 | d$support > 3)
[1] 0
d$gad <- scale(d$gad, center=T, scale=T)
hist(d$gad)
sum(d$gad < -3 | d$gad > 3)
[1] 0
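Note that the commands above overwrite each raw variable with its z-score, so every later mean is 0 and every SD is 1. One possible alternative, sketched here with a hypothetical toy data frame rather than the homework data, keeps the z-scores in a separate object so the raw scores stay available for descriptive statistics.

```r
# Hypothetical toy data: 99 typical scores plus one extreme score
toy <- data.frame(gad = c(rep(8:12, length.out = 99), 40))

# scale() returns a one-column matrix; as.numeric() turns it into a plain vector
z_gad <- as.numeric(scale(toy$gad, center = TRUE, scale = TRUE))

# Count statistical outliers (more than 3 SDs from the mean)
n_outliers <- sum(abs(z_gad) > 3)
n_outliers

# The raw variable is untouched, so its mean is still on the original scale
mean(toy$gad)
```

Correlations are unaffected by standardizing, so this choice only changes what the M and SD columns of a later table report.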
Checking for Linear Relationships
After examining the scatterplots below, we found no evidence of nonlinearity in the relationship between any pair of variables.
# use scatterplots to examine your continuous variables together
plot(d$edeq12, d$isolation)
plot(d$edeq12, d$support)
plot(d$edeq12, d$gad)
plot(d$isolation, d$support)
plot(d$isolation, d$gad)
plot(d$support, d$gad)
Check Your Variables
describe(d)
vars n mean sd median trimmed mad min max range skew kurtosis
edeq12 1 1204 0 1 -0.20 -0.09 1.01 -1.22 2.87 4.09 0.67 -0.55
isolation 2 1204 0 1 -0.18 -0.03 1.32 -1.37 1.61 2.98 0.15 -1.29
support 3 1204 0 1 0.10 0.06 1.04 -2.71 1.50 4.21 -0.43 -0.56
gad 4 1204 0 1 -0.37 -0.10 0.93 -1.15 2.14 3.29 0.67 -0.74
se
edeq12 0.03
isolation 0.03
support 0.03
gad 0.03
# also use histograms to examine your continuous variables
hist(d$edeq12)
hist(d$isolation)
hist(d$support)
hist(d$gad)
Issues with My Data - PART OF YOUR WRITEUP
There were no outliers detected (no standardized scores fell beyond ±3 on any variable). As previously mentioned, the scatterplots showed no evidence of nonlinearity. Additionally, there were no issues with skewness or kurtosis, as the skew and kurtosis values for all variables fell within the range of -2 to +2.
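The ±2 rule for skew and kurtosis can also be checked programmatically instead of by reading the describe() output. The sketch below uses base R and simulated data; the helper skew_kurt() is a hypothetical function written for this illustration, and its formulas are an assumption that should closely match, but may not exactly equal, the skew and kurtosis columns that describe() prints.

```r
# Hypothetical helper: moment-based skew and excess kurtosis for one variable
skew_kurt <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  s <- sd(x)
  c(skew     = mean((x - m)^3) / s^3,
    kurtosis = mean((x - m)^4) / s^4 - 3)
}

# Simulated data: one roughly normal variable and one skewed variable
set.seed(7)
toy <- data.frame(a = rnorm(500), b = rexp(500))

# One row per variable, one column per statistic
shape <- t(sapply(toy, skew_kurt))
shape

# Names of variables with any value outside the -2 to +2 range
flagged <- rownames(shape)[apply(abs(shape) > 2, 1, any)]
flagged
```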
Run Pearson’s Correlation
There are two ways to run Pearson’s correlation in R. You can calculate each correlation one-at-a-time using multiple commands, or you can calculate them all at once and report the scores in a matrix. The matrix output can be confusing at first, but it’s more efficient. We’ll do it both ways.
Run a Single Correlation
corr_output <- corr.test(d$edeq12, d$isolation)
View Single Correlation
- Strong effect: Between |0.50| and |1|
- Moderate effect: Between |0.30| and |0.49|
- Weak effect: Between |0.10| and |0.29|
- Trivial effect: Less than |0.10|
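These cutoffs can be applied consistently with a small helper. The function effect_label() below is a hypothetical convenience written for this illustration, not part of psych; it assumes that any absolute correlation below .10 counts as trivial.

```r
# Hypothetical helper: label a correlation by the size of its absolute value
effect_label <- function(r) {
  as.character(cut(
    abs(r),
    breaks = c(0, 0.10, 0.30, 0.50, 1),
    labels = c("trivial", "weak", "moderate", "strong"),
    right = FALSE,          # intervals are closed on the left: [0.10, 0.30), etc.
    include.lowest = TRUE   # so that |r| = 1 still counts as "strong"
  ))
}

effect_label(c(0.48, -0.34, 0.66, 0.05))
# returns "moderate" "moderate" "strong" "trivial"
```

Because the helper works on the absolute value, the sign of the correlation does not affect the label.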
corr_output
Call:corr.test(x = d$edeq12, y = d$isolation)
Correlation matrix
[,1]
[1,] 0.48
Sample Size
[1] 1204
These are the unadjusted probability values.
The probability values adjusted for multiple tests are in the p.adj object.
[,1]
[1,] 0
To see confidence intervals of the correlations, print with the short=FALSE option
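The probability value printed as 0 above is rounded for display; it means the p-value is very small, not exactly zero. As a cross-check, base R's cor.test() reports the unrounded p-value and the 95% confidence interval for a single pair. The sketch below runs it on simulated data (x and y are hypothetical variables, not the homework data).

```r
# Simulated data with a positive relationship (hypothetical example)
set.seed(123)
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)

res <- cor.test(x, y, method = "pearson")

res$estimate   # Pearson's r
res$p.value    # unrounded p-value
res$conf.int   # 95% confidence interval for r
```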
Create a Correlation Matrix
corr_output_m <- corr.test(d)
View Test Output
- Strong effect: Between |0.50| and |1|
- Moderate effect: Between |0.30| and |0.49|
- Weak effect: Between |0.10| and |0.29|
- Trivial effect: Less than |0.10|
corr_output_m
Call:corr.test(x = d)
Correlation matrix
edeq12 isolation support gad
edeq12 1.00 0.48 -0.34 0.51
isolation 0.48 1.00 -0.64 0.66
support -0.34 -0.64 1.00 -0.47
gad 0.51 0.66 -0.47 1.00
Sample Size
[1] 1204
Probability values (Entries above the diagonal are adjusted for multiple tests.)
edeq12 isolation support gad
edeq12 0 0 0 0
isolation 0 0 0 0
support 0 0 0 0
gad 0 0 0 0
To see confidence intervals of the correlations, print with the short=FALSE option
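The output above notes that entries above the diagonal are adjusted for multiple tests; by default corr.test() uses the Holm method for this. To show what that adjustment does, here is a rough base R sketch on simulated data (the data frame toy and its columns are hypothetical): it computes the correlation matrix, collects the raw p-value for each pair, and applies the Holm correction.

```r
# Simulated data with three related variables (hypothetical example)
set.seed(2024)
n <- 150
toy <- data.frame(x1 = rnorm(n))
toy$x2 <- 0.6 * toy$x1 + rnorm(n)
toy$x3 <- -0.4 * toy$x1 + rnorm(n)

# Correlation matrix for all variables at once
r_mat <- cor(toy)
round(r_mat, 2)

# Every unique pair of variables, one pair per column
pairs_idx <- combn(names(toy), 2)

# Raw (unadjusted) p-value for each pair
raw_p <- apply(pairs_idx, 2, function(v) {
  cor.test(toy[[v[1]]], toy[[v[2]]])$p.value
})

# Holm adjustment for running several tests
adj_p <- p.adjust(raw_p, method = "holm")

data.frame(pair = paste(pairs_idx[1, ], pairs_idx[2, ], sep = " - "),
           raw_p = raw_p,
           adj_p = adj_p)
```

An adjusted p-value is never smaller than the raw one, so a correlation that is significant after adjustment is also significant before it.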
Write Up Results
We predicted that generalized anxiety disorder (GAD) symptoms, eating disorder symptoms, and isolation would be positively correlated with one another, and that social support would be negatively correlated with each of them. All six correlations were statistically significant (all ps < .01) and in the predicted directions. The effect sizes ranged from moderate to strong (Cohen, 1988; see Table 1).
Variable | M | SD | 1 | 2 | 3 |
---|---|---|---|---|---|
1. Eating Disorder Questionnaire (edeq12) | -0.00 | 1.00 | | | |
2. UCLA Loneliness Scale (isolation) | -0.00 | 1.00 | .48** | | |
 | | | [.43, .52] | | |
3. Social Support Measure (support) | -0.00 | 1.00 | -.34** | -.64** | |
 | | | [-.39, -.29] | [-.67, -.60] | |
4. General Anxiety Disorder (gad) | 0.00 | 1.00 | .51** | .66** | -.47** |
 | | | [.47, .55] | [.63, .69] | [-.51, -.42] |

Note: M and SD are used to represent mean and standard deviation, respectively. Values in square brackets indicate the 95% confidence interval. The confidence interval is a plausible range of population correlations that could have caused the sample correlation. * indicates p < .05. ** indicates p < .01.
References
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.