library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.3 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.0
✔ ggplot2 3.4.4 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.0
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
d <- read_csv("correlation_practice.csv")
Rows: 54 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): ID, Sex
dbl (3): OPIc_rating, Vocab_score, Grammar_score
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
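The message above suggests two ways to quiet the column specification output. The sketch below tries both on a small temporary file. The file contents and the names `tmp`, `d_demo`, and `d_quiet` are made up for illustration; only the column names and types are taken from the output above.

```r
library(readr)

# Hypothetical stand-in for correlation_practice.csv (all values are made up)
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "ID,Sex,OPIc_rating,Vocab_score,Grammar_score",
  "P01,F,3,41,38",
  "P02,M,5,62,57",
  "P03,F,4,55,49"
), tmp)

# Option 1: declare the column types up front, which also silences the message
d_demo <- read_csv(
  tmp,
  col_types = cols(
    ID = col_character(),
    Sex = col_character(),
    OPIc_rating = col_double(),
    Vocab_score = col_double(),
    Grammar_score = col_double()
  )
)

# Option 2: let readr guess the types but suppress the message
d_quiet <- read_csv(tmp, show_col_types = FALSE)
```

Declaring the types explicitly has the side benefit of failing loudly if a column is not what you expect, whereas `show_col_types = FALSE` only hides the message.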
Scatter Plots
Scatter plots are a very useful way to visualize the relationship between two variables. We’ll try a quick way and a pretty way to make scatter plots.
Quick way:
plot(d$OPIc_rating, d$Vocab_score)
Pretty way:
d %>%
  ggplot(aes(x = OPIc_rating, y = Vocab_score)) +
  geom_point() +
  labs(x = "Speaking Proficiency (OPIc rating)", y = "Vocabulary Knowledge") +
  theme_bw()
A nice extra step is to add a trend line:
d %>%
  ggplot(aes(x = OPIc_rating, y = Vocab_score)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "Speaking Proficiency (OPIc rating)", y = "Vocabulary Knowledge") +
  theme_bw()
`geom_smooth()` using formula = 'y ~ x'
Correlations
R has several built-in functions for running correlations. It is important to specify which type of correlation you want to run.
In our data, OPIc_rating is an ordinal rating rather than a continuous score, so we should use a Spearman correlation:
cor(d$OPIc_rating, d$Vocab_score, method = "spearman")
The Spearman correlation between OPIc ratings and Vocabulary scores is .82. This is a strong positive correlation.
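To see why Spearman suits ordinal data, it may help to know that it is simply a Pearson correlation computed on the ranks of the values, so only the ordering of the ratings matters. The sketch below checks this on a few made-up numbers; `rating` and `vocab` are hypothetical vectors, not the tutorial's data.

```r
# Hypothetical ordinal ratings and test scores (made up for illustration)
rating <- c(2, 3, 3, 4, 5, 5, 6, 7)
vocab  <- c(30, 35, 41, 40, 52, 58, 61, 70)

# Spearman correlation via the method argument
rho <- cor(rating, vocab, method = "spearman")

# The same value by hand: rank both variables (ties get average ranks),
# then run an ordinary Pearson correlation on the ranks
rho_by_hand <- cor(rank(rating), rank(vocab), method = "pearson")

# Compare the two results, allowing for floating-point rounding
all.equal(rho, rho_by_hand)
```

Because only ranks are used, the result does not change if the gaps between rating levels are unequal, which is the usual concern with ordinal scales.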
To correlate two continuous variables, you don’t need to specify the method = argument, because Pearson correlation is the default, but we will do so anyway just to practice:
cor(d$Vocab_score, d$Grammar_score, method = "pearson")
[1] 0.8756754
While you cannot directly compare a Spearman correlation and a Pearson correlation, the correlation between Vocabulary and Grammar (.88) appears to be stronger than the correlation between Speaking and Vocabulary (.82).
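One possible way to put the two relationships on the same footing is to compute the same type of correlation for every pair of variables. The sketch below does this with Spearman on a made-up data frame; `d_demo` and its values are hypothetical, and only the column names are borrowed from the data above.

```r
# Hypothetical data frame reusing the tutorial's column names (values are made up)
d_demo <- data.frame(
  OPIc_rating   = c(2, 3, 3, 4, 5, 5, 6, 7),
  Vocab_score   = c(30, 35, 41, 40, 52, 58, 61, 70),
  Grammar_score = c(28, 37, 39, 43, 50, 55, 63, 66)
)

# cor() accepts a data frame of numeric columns and returns a correlation matrix
rho_matrix <- cor(d_demo, method = "spearman")

# Pull out the two pairs discussed above, now measured with the same method
rho_matrix["OPIc_rating", "Vocab_score"]
rho_matrix["Vocab_score", "Grammar_score"]
```

With the real data, the same call would be applied to the three numeric columns of `d`. Whether the difference between the two coefficients is meaningful would still need a formal test.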