Correlation Lab
Loading Libraries
library(psych) # for the describe() command and the corr.test() command
library(apaTables) # to create our correlation table
library(kableExtra) # to create our correlation table
Importing Data
d <- read.csv(file="Data/mydata.csv", header=T)
# since we're focusing on our continuous variables, we're going to drop our categorical variables. this will make some stuff we're doing later easier.
d <- subset(d, select=-c(gender, phq))
State Your Hypotheses - PART OF YOUR WRITEUP
Self-esteem (RSE) will significantly correlate with perceived stress (PSS) and disordered eating behaviors (EDE-Q12), with lower self-esteem associated with higher stress and disordered eating.
Check Your Assumptions
Before conducting Pearson’s correlation, we must check for linearity, normality, and the absence of outliers for the three included variables: self-esteem (RSE), perceived stress (PSS), and disordered eating behaviors (EDE-Q12).
Pearson’s Correlation Coefficient Assumptions
- Should have two measurements for each participant for each variable (confirmed by earlier procedures – we dropped any participants with missing data)
- Variables should be continuous and normally distributed, or assessments of the relationship may be inaccurate (confirmed above – if issues, make a note and continue)
- Outliers should be identified and removed, or results will be inaccurate (will do below)
- The relationship between the variables should be linear, or it will not be detected (will do below)
Checking for Outliers
Outliers can mask potential effects and cause a Type II error (i.e., a false negative: you conclude there is no relationship when there really is one).
Note: You are not required to screen out outliers or take any action based on what you see here. This is something you will check and then discuss in your write-up.
# using the scale() command to standardize our variable, viewing a histogram, and then counting statistical outliers
d$rse <- scale(d$rse, center=T, scale=T)
hist(d$rse)
sum(d$rse < -3 | d$rse > 3)
[1] 0
d$pss <- scale(d$pss, center=T, scale=T)
hist(d$pss)
sum(d$pss < -3 | d$pss > 3)
[1] 0
d$edeq12 <- scale(d$edeq12, center=T, scale=T)
hist(d$edeq12)
sum(d$edeq12 < -3 | d$edeq12 > 3)
[1] 0
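The standardize-and-count step above can be wrapped in a small helper so the same check is applied to every variable. This is a minimal sketch on simulated data, not part of the lab's required code: `count_outliers` and `scores` are hypothetical names, and the cutoff of 3 standard deviations simply mirrors the code above.

```r
# Hypothetical helper: count values more than `cutoff` SDs from the mean
count_outliers <- function(x, cutoff = 3) {
  # scale() returns a one-column matrix, so convert it back to a plain vector
  z <- as.numeric(scale(x, center = TRUE, scale = TRUE))
  sum(abs(z) > cutoff, na.rm = TRUE)
}

set.seed(1)
# Simulated scores (an assumption for illustration) plus one extreme value
scores <- c(rnorm(200, mean = 20, sd = 5), 80)
count_outliers(scores)
```

Calling the helper on a real column, for example `count_outliers(d$rse)`, should give the same count as the `sum()` line above, assuming the column has not been altered in between.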
Checking for Linear Relationships
Non-linear relationships cannot be detected by Pearson’s correlation (the type of correlation we’re doing here). This means that you may underestimate the relationship between a pair of variables if they have a non-linear relationship, and thus your understanding of what’s happening in your data will be inaccurate.
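To see why this matters, the sketch below uses made-up values (not the lab data) in which `y` is completely determined by `x`, yet Pearson's r is essentially zero because the relationship is U-shaped. The variable names are illustrative assumptions.

```r
# Simulated example: y depends entirely on x, but the relationship is U-shaped
x <- seq(-3, 3, by = 0.1)
y <- x^2

r_curved <- cor(x, y)          # Pearson's r for the curved relationship
r_linear <- cor(x, 2 * x + 1)  # Pearson's r for a perfectly linear relationship

round(c(curved = r_curved, linear = r_linear), 3)
```

The curved pair yields an r near 0 while the linear pair yields an r of 1, even though both pairs are perfectly predictable from `x`.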
Visually check that relationships are linear and write a brief description of any potential nonlinearity. You will have to use your judgement. There are no penalties for answering ‘wrong’, so try not to stress out about it too much – just do your best.
# Use scatterplots to examine your continuous variables together
plot(d$rse, d$pss, main="Scatterplot of RSE vs PSS", xlab="RSE", ylab="PSS")
plot(d$rse, d$edeq12, main="Scatterplot of RSE vs EDE-Q12", xlab="RSE", ylab="EDE-Q12")
plot(d$pss, d$edeq12, main="Scatterplot of PSS vs EDE-Q12", xlab="PSS", ylab="EDE-Q12")
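One optional aid when judging linearity by eye is to overlay a smoothed trend line on the scatterplot and compare it with a straight line. The sketch below does this with base R's `lowess()` on simulated data; the data and variable names are assumptions for illustration, and this is not a required step in the lab.

```r
# Simulated data with a roughly linear negative relationship
set.seed(42)
x <- rnorm(150)
y <- -0.5 * x + rnorm(150, sd = 0.8)

plot(x, y, main = "Scatterplot with smoothed trend", xlab = "x", ylab = "y")

# lowess() returns a list with smoothed x and y values
smooth <- lowess(x, y)
lines(smooth, col = "red", lwd = 2)   # smoothed trend
abline(lm(y ~ x), lty = 2)            # straight-line fit for comparison
```

If the red smoothed line bends clearly away from the dashed straight line, that is a hint of possible non-linearity worth describing in the write-up.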
Check Your Variables
describe(d)
# also use histograms to examine your continuous variables
hist(d$rse)
hist(d$pss)
hist(d$edeq12)
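If `describe(d)` does not run, skew and kurtosis can be approximated in base R. The sketch below uses simple moment-based formulas; the helper names are hypothetical, the data are simulated, and the values may differ slightly from what `describe()` reports because packages use slightly different small-sample formulas.

```r
# Hypothetical helpers: moment-based skewness and excess kurtosis
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^4) / mean((x - m)^2)^2 - 3
}

set.seed(7)
symmetric <- c(-2, -1, 0, 1, 2)   # perfectly symmetric values
right_skewed <- rexp(500)         # simulated right-skewed variable

skewness(symmetric)
skewness(right_skewed)
excess_kurtosis(right_skewed)
```

A skewness near zero suggests a symmetric distribution, while a clearly positive value indicates a long right tail.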
Issues with My Data - PART OF YOUR WRITEUP
The data may have a problem with non-linear relationships. The describe() command was not working at all.
Run Pearson’s Correlation
Run a Single Correlation
corr_output <- cor.test(d$rse, d$pss)
print(corr_output)
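The object returned by `cor.test()` stores the pieces needed for a write-up, so they can be pulled out by name rather than copied from the printed output. This is a minimal sketch on simulated data; `self_esteem` and `stress` are made-up stand-ins, not the lab variables.

```r
# Simulated stand-ins for two continuous variables
set.seed(123)
n <- 100
self_esteem <- rnorm(n)
stress <- -0.5 * self_esteem + rnorm(n, sd = 0.9)

result <- cor.test(self_esteem, stress)  # Pearson's correlation by default

r_value <- unname(result$estimate)   # correlation coefficient
p_value <- result$p.value            # p-value for the test of r = 0
ci      <- result$conf.int           # 95% confidence interval for r
df      <- unname(result$parameter)  # degrees of freedom (n - 2)

round(c(r = r_value, p = p_value, df = df), 3)
```

The same element names (`estimate`, `p.value`, `conf.int`, `parameter`) apply to any object created by `cor.test()`, including one computed from the lab data.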
View Single Correlation
# Strong effect: Between |0.50| and |1|
# Moderate effect: Between |0.30| and |0.49|
# Weak effect: Between |0.10| and |0.29|
# Trivial effect: Less than |0.09|
corr_output
Create a Correlation Matrix for All Variables
```{r}
corr_matrix <- corr.test(d[, c("rse", "pss", "edeq12")])
print(corr_matrix)
```
View Test Output
# Strong effect: Between |0.50| and |1|
# Moderate effect: Between |0.30| and |0.49|
# Weak effect: Between |0.10| and |0.29|
# Trivial effect: Less than |0.09|
corr_output <- corr.test(d$rse, d$pss)
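The effect-size guide above can be turned into a small lookup so each coefficient is labeled consistently. This is a sketch with a hypothetical helper name, `label_effect`. It assumes that any absolute value below 0.10 counts as trivial, which closes the small gap between 0.09 and 0.10 in the guide.

```r
# Hypothetical helper: label a correlation by the size of its absolute value
label_effect <- function(r) {
  size <- abs(r)
  if (size >= 0.50) {
    "strong"
  } else if (size >= 0.30) {
    "moderate"
  } else if (size >= 0.10) {
    "weak"
  } else {
    "trivial"  # assumption: everything below |0.10| is treated as trivial
  }
}

# Example coefficients for illustration
sapply(c(-0.45, -0.38, 0.12, 0.05), label_effect)
```

The sign is ignored when labeling, since the strength of a correlation depends only on its absolute value.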
Write Up Results
The hypothesis proposed that self-esteem (RSE) would significantly correlate with perceived stress (PSS) and disordered eating behaviors (EDE-Q12), with lower self-esteem associated with higher stress and disordered eating behaviors. To test this hypothesis, Pearson’s correlation coefficients were computed for the relationships between self-esteem (RSE), perceived stress (PSS), and disordered eating behaviors (EDE-Q12).
The correlation matrix revealed that self-esteem (RSE) had a moderate negative correlation with perceived stress (PSS), r = -0.45, p < .01, indicating that lower self-esteem was associated with higher levels of perceived stress. Similarly, self-esteem was also moderately negatively correlated with disordered eating behaviors (EDE-Q12), r = -0.38, p < .01, suggesting that lower self-esteem was related to more severe disordered eating behaviors.
No significant correlation was found between perceived stress (PSS) and disordered eating behaviors (EDE-Q12) in this analysis, r = 0.12, p = .20, suggesting that these two variables were not directly related in this sample.
These findings support the hypothesis that lower self-esteem is associated with higher perceived stress and more disordered eating behaviors, with the magnitude of the effects being moderate in strength. The full correlation matrix is provided in Table 1.
```{r echo=FALSE, message=FALSE, warning=FALSE}
table_out <- apa.cor.table(d, filename = "table1.doc", table.number = 1)

# Extract the table body into a data frame
table_out2 <- as.data.frame(table_out$table.body)

num_rows <- nrow(table_out2)
table_out2$Variable <- c("Perceived stress (PSS-4)", "Self-esteem (RSE-10)", "Eating Disorder Examination Questionnaire (EDE-Q12)", rep("", num_rows - 3))

as.data.frame(table_out2) %>%
  kbl(row.names = F,
      align = c("l", "c", "c", "c", "c", "c"),
      caption = paste("Table", table_out$table.number, ": ", table_out$table.title, sep=""),
      format = "html",
      table.attr = "style='width: 75%;'") %>%
  kable_classic() %>%
  footnote(
    general = "M and SD represent mean and standard deviation, respectively. Values in square brackets indicate the 95% confidence interval.",
    symbol = c("indicates p < .05", "indicates p < .01."),
    symbol_manual = c("*", "**"),
    threeparttable = T)
```
References
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.