You will use college tuition and diversity data for this quiz. See below for the definition of some of the variables.
Hint: The data file is posted in Moodle under Module 5. It is named “college_tuition.csv”.
data <- read.csv("~/BusStats/data/college_tuition.csv")
head(data)
## name state state_code type
## 1 Aaniiih Nakoda College Montana MT Public
## 2 Abilene Christian University Texas TX Private
## 3 Abraham Baldwin Agricultural College Georgia GA Public
## 4 Academy College Minnesota MN For Profit
## 5 Academy of Art University California CA For Profit
## 6 Adams State University Colorado CO Public
## degree_length room_and_board in_state_tuition in_state_total
## 1 2 Year 4280 2380 2380
## 2 4 Year 10350 34850 45200
## 3 2 Year 8474 4128 12602
## 4 2 Year 8474 17661 17661
## 5 4 Year 16648 27810 44458
## 6 4 Year 8782 9440 18222
## out_of_state_tuition out_of_state_total total_enrollment percent_minority
## 1 2380 2380 291 0.8865979
## 2 34850 45200 4427 0.2737746
## 3 12550 21024 3458 0.1966455
## 4 17661 17661 127 0.2755906
## 5 27810 44458 15212 0.2584144
## 6 20456 29238 3154 0.3845910
## percent_foreign total_minority foreign_enrollment name_imp state_imp
## 1 0.00000000 258 0 FALSE FALSE
## 2 0.04178902 1212 185 FALSE FALSE
## 3 0.01417004 680 49 FALSE FALSE
## 4 0.00000000 35 0 FALSE FALSE
## 5 0.34006048 3931 5173 FALSE FALSE
## 6 0.00000000 1213 0 FALSE FALSE
## state_code_imp type_imp degree_length_imp room_and_board_imp
## 1 FALSE FALSE FALSE TRUE
## 2 FALSE FALSE FALSE FALSE
## 3 FALSE FALSE FALSE FALSE
## 4 FALSE FALSE FALSE TRUE
## 5 FALSE FALSE FALSE FALSE
## 6 FALSE FALSE FALSE FALSE
## in_state_tuition_imp in_state_total_imp out_of_state_tuition_imp
## 1 FALSE FALSE FALSE
## 2 FALSE FALSE FALSE
## 3 FALSE FALSE FALSE
## 4 FALSE FALSE FALSE
## 5 FALSE FALSE FALSE
## 6 FALSE FALSE FALSE
## out_of_state_total_imp total_enrollment_imp percent_minority_imp
## 1 FALSE FALSE FALSE
## 2 FALSE FALSE FALSE
## 3 FALSE FALSE FALSE
## 4 FALSE FALSE FALSE
## 5 FALSE FALSE FALSE
## 6 FALSE FALSE FALSE
## percent_foreign_imp total_minority_imp foreign_enrollment_imp
## 1 FALSE FALSE FALSE
## 2 FALSE FALSE FALSE
## 3 FALSE FALSE FALSE
## 4 FALSE FALSE FALSE
## 5 FALSE FALSE FALSE
## 6 FALSE FALSE FALSE
Hint: For the code, refer to one of our textbooks, Data Visualization with R: Chapter 4.2. Map in_state_total to the y-axis and percent_minority to the x-axis.
library(tidyverse)
ggplot(data,
       aes(x = percent_minority,
           y = in_state_total)) +
  geom_point()
Hint: Interpret both the direction and the strength of the correlation
cor(data$percent_minority, data$in_state_total)
## [1] -0.245447
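The correlation is negative and weak: r is about -0.25, which falls between -0.3 and -0.1. Schools with a higher share of minority students tend to have a slightly lower in-state total cost, but the association is not strong.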
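The direction-and-strength reading can be made explicit with a small helper. This is only a sketch: `describe_cor` is a hypothetical function, and the cutoffs of 0.3 and 0.7 are an assumed rule of thumb rather than a standard taken from the textbook.

```r
# Hypothetical helper: label a correlation coefficient by strength and direction.
# The cutoffs (weak < 0.3 <= moderate < 0.7 <= strong) are assumptions.
describe_cor <- function(r, weak = 0.3, strong = 0.7) {
  stopifnot(is.numeric(r), length(r) == 1, abs(r) <= 1)
  if (r == 0) {
    return("no linear correlation")
  }
  direction <- if (r > 0) "positive" else "negative"
  strength <- if (abs(r) >= strong) {
    "strong"
  } else if (abs(r) >= weak) {
    "moderate"
  } else {
    "weak"
  }
  paste(strength, direction)
}

describe_cor(-0.245447)  # "weak negative"
```

With the quiz data loaded, the same call could take `cor(data$percent_minority, data$in_state_total)` as its argument.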
Hint: For the code, refer to one of our textbooks, Data Visualization with R: Chapter 8.1.
# select numeric variables
df <- dplyr::select_if(data, is.numeric)
# calculate the correlations
r <- cor(df, use="complete.obs")
round(r,2)
## room_and_board in_state_tuition in_state_total
## room_and_board 1.00 0.72 0.78
## in_state_tuition 0.72 1.00 0.98
## in_state_total 0.78 0.98 1.00
## out_of_state_tuition 0.77 0.95 0.95
## out_of_state_total 0.82 0.93 0.97
## total_enrollment 0.10 -0.17 -0.13
## percent_minority -0.16 -0.24 -0.25
## percent_foreign 0.34 0.39 0.40
## total_minority -0.01 -0.22 -0.21
## foreign_enrollment 0.29 0.16 0.20
## out_of_state_tuition out_of_state_total total_enrollment
## room_and_board 0.77 0.82 0.10
## in_state_tuition 0.95 0.93 -0.17
## in_state_total 0.95 0.97 -0.13
## out_of_state_tuition 1.00 0.98 -0.01
## out_of_state_total 0.98 1.00 -0.01
## total_enrollment -0.01 -0.01 1.00
## percent_minority -0.25 -0.25 0.13
## percent_foreign 0.41 0.41 0.09
## total_minority -0.11 -0.12 0.85
## foreign_enrollment 0.28 0.29 0.62
## percent_minority percent_foreign total_minority
## room_and_board -0.16 0.34 -0.01
## in_state_tuition -0.24 0.39 -0.22
## in_state_total -0.25 0.40 -0.21
## out_of_state_tuition -0.25 0.41 -0.11
## out_of_state_total -0.25 0.41 -0.12
## total_enrollment 0.13 0.09 0.85
## percent_minority 1.00 -0.09 0.40
## percent_foreign -0.09 1.00 0.03
## total_minority 0.40 0.03 1.00
## foreign_enrollment -0.01 0.46 0.44
## foreign_enrollment
## room_and_board 0.29
## in_state_tuition 0.16
## in_state_total 0.20
## out_of_state_tuition 0.28
## out_of_state_total 0.29
## total_enrollment 0.62
## percent_minority -0.01
## percent_foreign 0.46
## total_minority 0.44
## foreign_enrollment 1.00
library(ggcorrplot)
ggcorrplot(r)
ggcorrplot(r,
           hc.order = TRUE,
           type = "lower",
           lab = TRUE)
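The variables with a strong positive association with in_state_total are “in_state_tuition” (0.98), “out_of_state_total” (0.97), and “out_of_state_tuition” (0.95). “room_and_board” (0.78) is also positively correlated, though less strongly.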
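The same answer can be read off programmatically instead of by eye. The sketch below copies the in_state_total correlations from the rounded matrix printed above so that it runs on its own; the 0.9 cutoff for "strong" is an assumption, and `in_state_total_cor`, `cutoff`, and `strong_vars` are illustrative names.

```r
# Correlations with in_state_total, copied from the rounded matrix printed above.
# With the real matrix, the same vector would come from r[, "in_state_total"]
# after dropping the in_state_total entry itself.
in_state_total_cor <- c(
  room_and_board       = 0.78,
  in_state_tuition     = 0.98,
  out_of_state_tuition = 0.95,
  out_of_state_total   = 0.97,
  total_enrollment     = -0.13,
  percent_minority     = -0.25,
  percent_foreign      = 0.40,
  total_minority       = -0.21,
  foreign_enrollment   = 0.20
)

# Assumed cutoff for a "strong" positive association.
cutoff <- 0.9

# Keep the variables at or above the cutoff, strongest first.
strong_vars <- sort(in_state_total_cor[in_state_total_cor >= cutoff],
                    decreasing = TRUE)
strong_vars
```

Lowering `cutoff` to 0.7 would also bring in room_and_board, so the list depends on where the line for "strong" is drawn.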
Hint: A correct answer must include all of the following: 1) direction and strength of the correlation coefficient, and 2) linear versus non-linear relationship.
Based on the correlation plot, there is a moderate positive relationship between in_state_total and percent_foreign: the coefficient is 0.40, between 0 and 0.5. Visually, the points trend upward and to the right, so schools with a larger share of foreign students tend to have a higher in-state total cost. The relationship between in_state_total and percent_foreign appears to be roughly linear.
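A correlation coefficient alone cannot show whether a relationship is linear, so one option is to overlay a straight-line fit and a loess curve on the scatterplot and compare them. This is a sketch, not the textbook's method: `plot_linearity` is a hypothetical helper, and the colors and transparency are arbitrary choices.

```r
library(ggplot2)

# Hypothetical helper: scatterplot with a straight-line fit (blue) and a
# loess curve (red). If the two fits track each other closely, describing
# the relationship as linear is reasonable.
plot_linearity <- function(df, x_var, y_var) {
  ggplot(df, aes(x = .data[[x_var]], y = .data[[y_var]])) +
    geom_point(alpha = 0.4) +
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE,
                color = "steelblue") +
    geom_smooth(method = "loess", formula = y ~ x, se = FALSE,
                color = "firebrick")
}

# Usage with the quiz data, assuming `data` was loaded with read.csv():
# plot_linearity(data, "percent_foreign", "in_state_total")
```

If the red curve bends clearly away from the blue line, the relationship is better described as non-linear, even when the correlation coefficient is moderate.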
Hint: Use message, warning, echo and results in the chunk options. Refer to the RMarkdown Reference Guide.
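The four options named in the hint can be set once for the whole document with `knitr::opts_chunk$set()`, or individually in a chunk header such as `{r, message = FALSE}`. The values below are assumptions for illustration; the quiz may call for different settings.

```r
# Assumed settings, applied to every chunk that follows this call.
knitr::opts_chunk$set(
  echo    = TRUE,      # show the R code in the knitted document
  results = "markup",  # show printed output in the usual "##" form
  message = FALSE,     # hide messages, e.g. package start-up notes
  warning = FALSE      # hide warnings
)
```

Placing this call in a setup chunk at the top of the .Rmd file keeps the settings consistent across all later chunks.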