Correlation Lab

Author

Heather Perkins

Loading Libraries

library(psych) # for the describe() command and the corr.test() command
library(apaTables) # to create our correlation table
library(kableExtra) # to create our correlation table

Importing Data

d <- read.csv(file="Data/mydata.csv", header=TRUE)

# Since we're focusing on our continuous variables, we drop our categorical variables (gender and age). This makes some of the later steps easier.
d <- subset(d, select=-c(gender, age))

State Your Hypotheses - PART OF YOUR WRITEUP

I predict that the Personality Openness Subscale, Personality Extraversion Subscale, and Personality Agreeableness Subscale will be positively correlated with one another, and that all three personality variables will be negatively correlated with the Penn State Worry Scale.

State your hypotheses. Remember, you are looking at the correlations between all four of your continuous/quantitative variables. Depending on what you predict, you might need to write several sentences to describe all of the relationships. Make sure all four variables are included.

Check Your Assumptions

Pearson’s Correlation Coefficient Assumptions

Each participant should have a measurement on both variables in every pair being correlated (confirmed by earlier procedures – we dropped any participants with missing data)
Variables should be continuous and normally distributed, or assessments of the relationship may be inaccurate (will do below)
Outliers should be identified and removed, or results will be inaccurate (will do below)
Relationships between the variables should be linear, or they will not be detected (will do below)
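The first assumption can be confirmed directly rather than taken on trust. The following is a minimal sketch using a small made-up data frame (`toy` is hypothetical; only its column names mirror the lab data) that counts missing values per variable and keeps complete rows only.

```r
# Toy data standing in for the lab data (values are invented for illustration)
toy <- data.frame(
  big5_open = c(5.3, 4.0, NA, 6.1),
  pswq      = c(2.5, 3.0, 1.8, NA)
)

# Number of missing values in each column
missing_per_column <- colSums(is.na(toy))

# Keep only participants with a value on every variable
toy_complete <- toy[complete.cases(toy), ]
n_dropped <- nrow(toy) - nrow(toy_complete)

print(missing_per_column)
print(n_dropped)
```

If `n_dropped` is 0 for the real data, the earlier cleaning step already removed every incomplete case.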

Checking for Outliers

Outliers can mask potential effects and cause Type II error (i.e., a false negative: you conclude there is no relationship when there really is one).

Note: You are not required to screen out outliers or take any action based on what you see here. This is something you will check and then discuss in your write-up.

# using the scale() command to standardize our variable, viewing a histogram, and then counting statistical outliers

d$big5_open <- scale(d$big5_open, center=T, scale=T)
hist(d$big5_open)

sum(d$big5_open < -3 | d$big5_open > 3)

[1] 8

d$big5_ext <- scale(d$big5_ext, center=T, scale=T)
hist(d$big5_ext)

sum(d$big5_ext < -3 | d$big5_ext > 3)

[1] 0

d$big5_agr <- scale(d$big5_agr, center=T, scale=T)
hist(d$big5_agr)

sum(d$big5_agr < -3 | d$big5_agr > 3)

[1] 4

d$pswq <- scale(d$pswq, center=T, scale=T)
hist(d$pswq)

sum(d$pswq < -3 | d$pswq > 3)

[1] 0
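One side effect of the code above is that each variable in d is overwritten with its standardized version, which is why describe() and Table 1 later report a mean of 0 and a standard deviation of 1 for every variable. If the raw means and standard deviations are wanted in the write-up, the outlier count can be done without modifying the data. The sketch below is one way to do that; `count_outliers()` is a hypothetical helper, not part of the lab template, and the scores are invented.

```r
# Hypothetical helper: count values more than `cutoff` SDs from the mean
# without changing the original vector
count_outliers <- function(x, cutoff = 3) {
  z <- as.numeric(scale(x))        # scale() returns a one-column matrix; flatten it
  sum(abs(z) > cutoff, na.rm = TRUE)
}

# Invented scores: sixty ordinary values plus one extreme value
scores <- c(rep(c(3, 4, 5), 20), 20)

n_out <- count_outliers(scores)
print(n_out)
print(mean(scores))  # the raw scores are still on their original scale
```

With the lab data this would be called as, for example, `count_outliers(d$big5_open)`, leaving `d` untouched.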

Checking for Linear Relationships

Non-linear relationships cannot be detected by Pearson’s correlation (the type of correlation we’re doing here). This means that you may underestimate the relationship between a pair of variables if they have a non-linear relationship, and thus your understanding of what’s happening in your data will be inaccurate.
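A small constructed example makes the point concrete. In the sketch below (made-up data, not the lab data), y is completely determined by x, yet Pearson's r is essentially zero because the relationship is a symmetric curve rather than a line.

```r
# x is symmetric around zero; y depends perfectly on x, but not linearly
x <- seq(-3, 3, by = 0.1)
y_curved <- x^2
y_linear <- 2 * x + 1

r_curved <- cor(x, y_curved)  # near zero despite a perfect (curved) relationship
r_linear <- cor(x, y_linear)  # a perfect linear relationship

print(round(r_curved, 3))
print(round(r_linear, 3))
```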

Visually check that relationships are linear and write a brief description of any potential nonlinearity. You will have to use your judgement. There are no penalties for answering ‘wrong’, so try not to stress out about it too much – just do your best.

# use scatterplots to examine your continuous variables together
plot(d$big5_open, d$big5_ext)

plot(d$big5_open, d$big5_agr)

plot(d$big5_open, d$pswq)

plot(d$big5_ext, d$big5_agr)

plot(d$big5_ext, d$pswq)

plot(d$big5_agr, d$pswq)
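As an alternative to six separate plots, base R's pairs() draws every pairwise scatterplot in one grid, and panel.smooth adds a smoothed trend line to each panel, which can make curvature easier to spot. This is a sketch on simulated data (`toy` and its values are assumptions for illustration), not a replacement for the plots above.

```r
# Simulated stand-in for three of the lab variables
set.seed(42)
toy <- data.frame(
  big5_open = rnorm(100),
  big5_ext  = rnorm(100),
  pswq      = rnorm(100)
)

# Scatterplot matrix; panel.smooth overlays a lowess curve on each panel.
# A roughly straight curve is consistent with a linear relationship.
pairs(toy, panel = panel.smooth)
```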

Check Your Variables

describe(d)

          vars    n mean sd median trimmed  mad   min  max range  skew kurtosis
big5_open    1 1264    0  1   0.11    0.07 0.87 -3.72 1.58  5.30 -0.73     0.48
big5_ext     2 1264    0  1  -0.02    0.03 1.02 -2.33 1.82  4.15 -0.24    -0.78
big5_agr     3 1264    0  1   0.01    0.05 0.88 -3.53 1.78  5.31 -0.44     0.03
pswq         4 1264    0  1   0.04    0.01 1.18 -2.22 2.40  4.63 -0.08    -0.92
            se
big5_open 0.03
big5_ext  0.03
big5_agr  0.03
pswq      0.03

# also use histograms to examine your continuous variables
hist(d$big5_open)

hist(d$big5_ext)

hist(d$big5_agr)

hist(d$pswq)

Issues with My Data - PART OF YOUR WRITEUP

I found outliers (|z| > 3) on two variables: 8 on the Personality Openness Subscale and 4 on the Personality Agreeableness Subscale, all at the low end of the distribution. Outliers can distort correlation estimates and lead to misleading interpretations of the data, so results involving these two variables should be interpreted with caution.

Briefly describe any issues with your data and how they might impact the interpretation of your results. As usual, this should be written in an appropriate scientific tone.

Run Pearson’s Correlation

There are two ways to run Pearson’s correlation in R. You can calculate each correlation one-at-a-time using multiple commands, or you can calculate them all at once and report the scores in a matrix. The matrix output can be confusing at first, but it’s more efficient. We’ll do it both ways.

Run a Single Correlation

corr_output <- corr.test(d$big5_open, d$big5_ext)

View Single Correlation

Strong effect: between |0.50| and |1|
Moderate effect: between |0.30| and |0.49|
Weak effect: between |0.10| and |0.29|
Trivial effect: less than |0.10|
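The effect-size labels above can be applied mechanically. The sketch below defines a hypothetical helper, `effect_label()`, that maps the absolute value of r to a label, assuming that anything below |0.10| counts as trivial. The example values are the six correlations reported in this lab's matrix.

```r
# Hypothetical helper: label correlations by absolute size.
# Intervals are [0, .10), [.10, .30), [.30, .50), and [.50, 1].
effect_label <- function(r) {
  labels <- cut(
    abs(r),
    breaks = c(0, 0.10, 0.30, 0.50, 1),
    labels = c("trivial", "weak", "moderate", "strong"),
    right = FALSE,          # intervals are closed on the left
    include.lowest = TRUE   # so that |r| = 1 falls in the last interval
  )
  as.character(labels)
}

r_values <- c(0.18, 0.06, 0.00, 0.08, -0.25, -0.07)
print(data.frame(r = r_values, effect = effect_label(r_values)))
```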

corr_output

Call:corr.test(x = d$big5_open, y = d$big5_ext)
Correlation matrix 
     [,1]
[1,] 0.18
Sample Size 
[1] 1264
These are the unadjusted probability values.
  The probability values  adjusted for multiple tests are in the p.adj object. 
     [,1]
[1,]    0

 To see confidence intervals of the correlations, print with the short=FALSE option
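As a cross-check on a single correlation, base R's cor.test() reports the estimate, exact p-value, and 95% confidence interval in one object. The sketch below uses simulated data (x and y are invented, not lab variables) and confirms that cor.test() and cor() agree on the estimate.

```r
# Simulated pair of variables with a modest positive relationship
set.seed(123)
x <- rnorm(200)
y <- 0.3 * x + rnorm(200)

# Single Pearson correlation with p-value and confidence interval
single <- cor.test(x, y, method = "pearson")
print(single$estimate)
print(single$p.value)
print(single$conf.int)

# The same estimate taken from a correlation matrix
m <- cor(data.frame(x, y), use = "pairwise.complete.obs")
print(m["x", "y"])
```

With the lab data, the equivalent call would be `cor.test(d$big5_open, d$big5_ext)`.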

Create a Correlation Matrix

corr_output_m <- corr.test(d)

View Test Output

Strong effect: between |0.50| and |1|
Moderate effect: between |0.30| and |0.49|
Weak effect: between |0.10| and |0.29|
Trivial effect: less than |0.10|

corr_output_m

Call:corr.test(x = d)
Correlation matrix 
          big5_open big5_ext big5_agr  pswq
big5_open      1.00     0.18     0.06  0.00
big5_ext       0.18     1.00     0.08 -0.25
big5_agr       0.06     0.08     1.00 -0.07
pswq           0.00    -0.25    -0.07  1.00
Sample Size 
[1] 1264
Probability values (Entries above the diagonal are adjusted for multiple tests.) 
          big5_open big5_ext big5_agr pswq
big5_open      0.00     0.00     0.08 0.90
big5_ext       0.00     0.00     0.03 0.00
big5_agr       0.04     0.01     0.00 0.05
pswq           0.90     0.00     0.02 0.00

 To see confidence intervals of the correlations, print with the short=FALSE option
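The two halves of the probability matrix can lead to different conclusions. In the output above, the openness-agreeableness correlation has p = 0.04 below the diagonal (unadjusted) but p = 0.08 above it (adjusted). Unless its adjust argument is changed, corr.test() uses the Holm method for this adjustment, which base R exposes through p.adjust(). The sketch below applies it to six hypothetical p-values (not the lab's exact values, which are only shown rounded).

```r
# Six hypothetical unadjusted p-values, one per pair of four variables
p_raw <- c(0.001, 0.012, 0.020, 0.040, 0.300, 0.900)

# Holm adjustment: the smallest p is multiplied by 6, the next by 5, and so on,
# with each adjusted value forced to be at least as large as the one before it
p_holm <- p.adjust(p_raw, method = "holm")

print(data.frame(unadjusted = p_raw, holm = p_holm))
```

Adjusted values are never smaller than the unadjusted ones, so a result can be significant before adjustment and non-significant after it.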

Write Up Results

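I predicted that the Personality Openness Subscale, Personality Extraversion Subscale, and Personality Agreeableness Subscale would be positively correlated with one another, and that all three personality variables would be negatively correlated with the Penn State Worry Scale. During the outlier check, I found 8 outliers on the Personality Openness Subscale and 4 on the Personality Agreeableness Subscale. Outliers can distort correlation estimates and lead to misleading interpretations, so results involving these two variables should be read with caution. The hypotheses were partially supported (see Table 1). Five of the six correlations were statistically significant, but all were weak or trivial in size (Cohen, 1988). Openness and extraversion were positively correlated (r = .18, p < .01), and extraversion was negatively correlated with worry (r = -.25, p < .01); both are weak effects. The correlations of agreeableness with openness, extraversion, and worry were significant but trivial (|r| < .10). Openness was not significantly correlated with worry (r = -.00).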

Write up your results. Again, make sure to maintain an appropriate tone, and follow APA guidelines for reporting statistical results. I recommend following the below outline:

Briefly restate your hypothesis
Describe any issues with your data (you can copy/paste from above, just make sure everything flows).
Report your results. Since we are reporting our results in a table, you do NOT have to list out each individual r-value. Instead, I recommend focusing only on your significant results, and including your p-value and interpretation of effect size (trivial, small, medium, or large; don’t forget the citation).
Make sure to include a reference to Table 1 (created using the code below)

Table 1: Means, standard deviations, and correlations with confidence intervals

Variable                               M      SD     1             2              3
1. Openness Subscale (big5_open)       0.00   1.00
2. Extraversion Subscale (big5_ext)    0.00   1.00   .18**
                                                     [.12, .23]
3. Agreeableness Subscale (big5_agr)   0.00   1.00   .06*          .08**
                                                     [.00, .11]    [.02, .13]
4. Penn State Worry Scale (pswq)       -0.00  1.00   -.00          -.25**         -.07*
                                                     [-.06, .05]   [-.30, -.20]   [-.12, -.01]

Note:
M and SD are used to represent mean and standard deviation, respectively. Values in square brackets indicate the 95% confidence interval. The confidence interval is a plausible range of population correlations that could have caused the sample correlation.
* indicates p < .05.
** indicates p < .01.
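For readers who want to see where a confidence interval for a correlation comes from, the sketch below computes one with the Fisher z-transformation, a common approach. The lab does not state which method produced the intervals in Table 1, so this is an assumption, and `fisher_ci()` is a hypothetical helper. Because it starts from the rounded r of .18, the result can differ from the table in the last digit.

```r
# Hypothetical helper: confidence interval for a Pearson correlation
# using the Fisher z-transformation
fisher_ci <- function(r, n, level = 0.95) {
  z    <- atanh(r)                       # transform r to the z scale
  se   <- 1 / sqrt(n - 3)                # standard error on the z scale
  crit <- qnorm(1 - (1 - level) / 2)     # about 1.96 for a 95% interval
  tanh(z + c(-1, 1) * crit * se)         # transform the limits back to r
}

# Openness-extraversion correlation from this lab: r = .18, n = 1264
ci <- fisher_ci(0.18, 1264)
print(round(ci, 2))
```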

References

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New York, NY: Routledge Academic.