Correlation HW
Loading Libraries
library(psych) # for the describe() command and the corr.test() command
library(apaTables) # to create our correlation table
library(kableExtra) # to create our correlation table
Importing Data
d <- read.csv(file="Data/mydata.csv", header=T)
# since we're focusing on our continuous variables, we're going to drop our categorical variables. this will make some stuff we're doing later easier.
d <- subset(d, select=-c(gender, race_rc))
State Your Hypotheses - PART OF YOUR WRITEUP
We predict that social media use will be negatively correlated with both satisfaction with life and safety. We also predict that interpersonal exploitativeness will be positively correlated with social media use. Finally, interpersonal exploitativeness is hypothesized to be negatively correlated with both satisfaction with life and safety.
Check Your Assumptions
Pearson’s Correlation Coefficient Assumptions
- Should have two measurements for each participant for each variable (confirmed by earlier procedures – we dropped any participants with missing data)
- Variables should be continuous and normally distributed, or assessments of the relationship may be inaccurate (confirmed above – if issues, make a note and continue)
- Outliers should be identified and removed, or results will be inaccurate (will do below)
- Relationship between the variables should be linear, or it will not be detected (will do below)
Checking for Outliers
Outliers can mask potential effects and cause Type II error (you conclude there is no relationship when there really is one, i.e., a false negative).
Note: You are not required to screen out outliers or take any action based on what you see here. This is something you will check and then discuss in your write-up.
# using the scale() command to standardize our variable, viewing a histogram, and then counting statistical outliers
d$socmeduse <- scale(d$socmeduse, center=T, scale=T)
hist(d$socmeduse)
sum(d$socmeduse < -3 | d$socmeduse > 3)
[1] 0
d$swb <- scale(d$swb, center=T, scale=T)
hist(d$swb)
sum(d$swb < -3 | d$swb > 3)
[1] 0
d$moa_safety <- scale(d$moa_safety, center=T, scale=T)
hist(d$moa_safety)
sum(d$moa_safety < -3 | d$moa_safety > 3)
[1] 26
d$exploit <- scale(d$exploit, center=T, scale=T)
hist(d$exploit)
sum(d$exploit < -3 | d$exploit > 3)
[1] 33
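The four standardize-and-count steps above repeat the same pattern, so they can be wrapped in a small helper. The sketch below is only an illustration: `count_outliers` and the simulated `toy` data frame are hypothetical names that are not part of the assignment, and the cutoff of 3 standard deviations is taken from the code above.

```r
# Toy data standing in for the real data frame `d` (all values are simulated).
set.seed(1)
toy <- data.frame(
  a = rnorm(200),
  b = c(rnorm(198), 8, -9)  # two extreme values planted on purpose
)

# Hypothetical helper: standardize a numeric vector and count cases with |z| > cutoff.
count_outliers <- function(x, cutoff = 3) {
  z <- as.numeric(scale(x))  # as.numeric() drops the matrix wrapper that scale() returns
  sum(abs(z) > cutoff, na.rm = TRUE)
}

# Apply the helper to every column at once.
outlier_counts <- sapply(toy, count_outliers)
print(outlier_counts)
```

Unlike the code above, this version does not overwrite the original columns with their standardized values, which keeps the raw scores available for later steps.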
Checking for Linear Relationships
Non-linear relationships cannot be detected by Pearson’s correlation (the type of correlation we’re doing here). This means that you may underestimate the relationship between a pair of variables if they have a non-linear relationship, and thus your understanding of what’s happening in your data will be inaccurate.
Visually check that relationships are linear and write a brief description of any potential nonlinearity. You will have to use your judgement. There are no penalties for answering ‘wrong’, so try not to stress out about it too much – just do your best.
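To make the point about non-linearity concrete, here is a minimal sketch using simulated data (none of these variables come from the homework dataset). It compares Pearson's r for a straight-line relationship with r for a U-shaped relationship of similar noise level.

```r
# Simulated example: x is symmetric around zero.
set.seed(42)
x <- seq(-3, 3, length.out = 201)

y_linear <- 2 * x + rnorm(201, sd = 0.5)  # straight-line relationship
y_curved <- x^2 + rnorm(201, sd = 0.5)    # U-shaped relationship

r_linear <- cor(x, y_linear)  # Pearson is the default method
r_curved <- cor(x, y_curved)

print(round(c(linear = r_linear, curved = r_curved), 2))
```

In this setup the curved relationship is just as systematic as the linear one, yet its Pearson correlation should come out close to zero. Running `plot(x, y_curved)` shows the U shape that the coefficient misses, which is why the scatterplots are worth inspecting.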
# use scatterplots to examine your continuous variables together
plot(d$socmeduse, d$swb)
plot(d$socmeduse, d$moa_safety)
plot(d$socmeduse, d$exploit)
plot(d$swb, d$moa_safety)
plot(d$swb, d$exploit)
plot(d$moa_safety, d$exploit)
Check Your Variables
describe(d)
vars n mean sd median trimmed mad min max range skew
swb 1 3109 0 1 0.14 0.04 1.12 -2.63 1.91 4.54 -0.37
socmeduse 2 3109 0 1 0.06 0.03 0.86 -2.74 2.40 5.13 -0.31
moa_safety 3 3109 0 1 0.07 0.09 1.16 -3.44 1.24 4.68 -0.71
exploit 4 3109 0 1 -0.28 -0.13 1.08 -1.01 3.35 4.37 0.94
kurtosis se
swb -0.45 0.02
socmeduse 0.26 0.02
moa_safety 0.04 0.02
exploit 0.36 0.02
# also use histograms to examine your continuous variables
hist(d$socmeduse)
hist(d$swb)
hist(d$moa_safety)
hist(d$exploit)
Issues with My Data - PART OF YOUR WRITEUP
We had 26 outliers in the safety variable and 33 outliers in the interpersonal exploitativeness variable, so we should be wary of bias that these outliers may introduce into the correlational analysis. The relationship between social media use and subjective well-being shows potential non-linearity, which could lead us to underestimate the true strength of the correlation. Interpersonal exploitativeness (exploit) has a high positive skew (0.94), safety (moa_safety) has a high negative skew (-0.71), and subjective well-being (swb) shows a small negative skew (-0.37) and negative kurtosis (-0.45). We care about this because it can distort the estimated correlations and the inferences drawn from the data and, if severe enough, could warrant a change in research methods.
Run Pearson’s Correlation
There are two ways to run Pearson’s correlation in R. You can calculate each correlation one-at-a-time using multiple commands, or you can calculate them all at once and report the scores in a matrix. The matrix output can be confusing at first, but it’s more efficient. We’ll do it both ways.
Run a Single Correlation
corr_output <- corr.test(d$socmeduse, d$swb)
View Single Correlation
- Strong effect: Between |0.50| and |1|
- Moderate effect: Between |0.30| and |0.49|
- Weak effect: Between |0.10| and |0.29|
- Trivial effect: Less than |0.10|
corr_output
Call:corr.test(x = d$socmeduse, y = d$swb)
Correlation matrix
[,1]
[1,] 0.1
Sample Size
[1] 3109
These are the unadjusted probability values.
The probability values adjusted for multiple tests are in the p.adj object.
[,1]
[1,] 0
To see confidence intervals of the correlations, print with the short=FALSE option
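The output's last line points to `print(corr_output, short=FALSE)` as the way to see confidence intervals from `corr.test()`. As a cross-check that needs no extra packages, the sketch below uses base R's `cor.test()` on simulated data; the variables `x` and `y` are made up for illustration and are not the homework variables.

```r
# Simulated data with a weak positive relationship built in.
set.seed(7)
n <- 500
x <- rnorm(n)
y <- 0.1 * x + rnorm(n)

# Base R test of a single Pearson correlation, with a 95% confidence interval.
ct <- cor.test(x, y, method = "pearson", conf.level = 0.95)

r  <- unname(ct$estimate)      # the sample correlation
ci <- as.numeric(ct$conf.int)  # lower and upper bounds of the interval
p  <- ct$p.value

print(round(c(r = r, lower = ci[1], upper = ci[2], p = p), 3))
```

Note that `cor.test()` handles one pair of variables at a time and does not adjust p-values for multiple tests, so it complements rather than replaces the matrix approach used in this assignment.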
Create a Correlation Matrix
corr_output_m <- corr.test(d)
View Test Output
- Strong effect: Between |0.50| and |1|
- Moderate effect: Between |0.30| and |0.49|
- Weak effect: Between |0.10| and |0.29|
- Trivial effect: Less than |0.10|
corr_output_m
Call:corr.test(x = d)
Correlation matrix
swb socmeduse moa_safety exploit
swb 1.00 0.10 0.13 -0.08
socmeduse 0.10 1.00 0.03 0.09
moa_safety 0.13 0.03 1.00 -0.09
exploit -0.08 0.09 -0.09 1.00
Sample Size
[1] 3109
Probability values (Entries above the diagonal are adjusted for multiple tests.)
swb socmeduse moa_safety exploit
swb 0 0.00 0.00 0
socmeduse 0 0.00 0.11 0
moa_safety 0 0.11 0.00 0
exploit 0 0.00 0.00 0
To see confidence intervals of the correlations, print with the short=FALSE option
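The effect-size guidelines listed above can be applied mechanically to the r values in the matrix. The helper below is a hypothetical sketch (`effect_label` is not a function from psych or apaTables), and it assumes that any absolute correlation below .10 counts as trivial. The r values are copied from the correlation matrix above.

```r
# Hypothetical helper: map correlations to the guideline labels.
effect_label <- function(r) {
  labels <- cut(
    abs(r),
    breaks = c(0, 0.10, 0.30, 0.50, Inf),
    labels = c("trivial", "weak", "moderate", "strong"),
    right = FALSE  # intervals are closed on the left, so |r| = .10 counts as weak
  )
  setNames(as.character(labels), names(r))
}

# Significant correlations copied from the matrix output above.
r_values <- c(
  socmeduse_swb      = 0.10,
  socmeduse_exploit  = 0.09,
  swb_moa_safety     = 0.13,
  swb_exploit        = -0.08,
  moa_safety_exploit = -0.09
)

effect_sizes <- effect_label(r_values)
print(effect_sizes)
```

Under these cutoffs only the two correlations at or above |.10| are labeled weak; the other three fall in the trivial range.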
Write Up Results
We predicted that social media use would be negatively correlated with both satisfaction with life and safety. We also predicted that interpersonal exploitativeness would be positively correlated with social media use. Finally, we hypothesized that interpersonal exploitativeness would be negatively correlated with both satisfaction with life and safety.
Some slight issues occurred. We had 26 outliers in the safety variable and 33 outliers in the interpersonal exploitativeness variable, so we should be wary of bias that these outliers may introduce into the correlational analysis. The relationship between social media use and subjective well-being showed potential non-linearity, which could lead us to underestimate the true strength of the correlation. Interpersonal exploitativeness has a high positive skew, safety has a high negative skew, and subjective well-being (swb) shows a small negative skew and negative kurtosis. We care about this because it can distort the estimated correlations and the inferences drawn from the data and, if severe enough, could warrant a change in research methods.
We found the following significant correlations among our variables:
Social media use and satisfaction with life: r = 0.10, p < .01, weak positive correlation.
Social media use and exploit: r = 0.09, p < .01, trivial positive correlation.
Satisfaction with life and safety: r = 0.13, p < .01, weak positive correlation.
Satisfaction with life and exploit: r = -0.08, p < .01, trivial negative correlation.
Safety and exploit: r = -0.09, p < .01, trivial negative correlation.
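The effect sizes indicate trivial to small effects (Cohen, 1988); this citation applies to all of the results above. Contrary to our first hypothesis, social media use was positively rather than negatively correlated with satisfaction with life, and it was not significantly correlated with safety (r = 0.03, p = .11). The remaining hypotheses were supported in direction, although the effects were trivial in size. P.S. I know we don’t have to report all of the r-values; it is just easier for me to visualize the results this way.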
| Variable | M | SD | 1 | 2 | 3 |
|---|---|---|---|---|---|
| 1. Satisfaction with Life Scale (SWB) | 0.00 | 1.00 | | | |
| Overall happiness with life | | | | | |
| 2. Social Media Use (SOCMEDUSE) | 0.00 | 1.00 | .10** | | |
| Average daily time spent on social media | | | [.07, .14] | | |
| 3. Safety (MOA_SAFETY) | 0.00 | 1.00 | .13** | .03 | |
| Perceived safety in local areas | | | [.09, .16] | [-.01, .06] | |
| 4. Interpersonal Exploitativeness Scale (EXPLOIT) | 0.00 | 1.00 | -.08** | .09** | -.09** |
| Tendency to exploit others in relationships | | | [-.11, -.04] | [.06, .13] | [-.13, -.06] |
| Note: | | | | | |
| M and SD are used to represent mean and standard deviation, respectively. Values in square brackets indicate the 95% confidence interval. The confidence interval is a plausible range of population correlations that could have caused the sample correlation. | | | | | |
| * indicates p < .05 | | | | | |
| ** indicates p < .01 | | | | | |
References
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. New York, NY: Routledge Academic.