Required packages: tidyverse
library(tidyverse)
csv_link <- 'https://raw.githubusercontent.com/pmahdi/cuny-bridge/main/causaldata_scorecard.csv'
college_data_original <- read_csv(file = csv_link)
head(college_data_original, 3)
## # A tibble: 3 × 9
## ...1 unitid inst_name state…¹ pred_…² year earni…³ count…⁴ count…⁵
## <dbl> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 100654 Alabama A & M Univ… AL 3 2007 36600 116 1139
## 2 2 100663 University of Alab… AL 3 2007 40800 366 2636
## 3 3 100690 Amridge University AL 3 2007 NA 6 25
## # … with abbreviated variable names ¹state_abbr, ²pred_degree_awarded_ipeds,
## # ³earnings_med, ⁴count_not_working, ⁵count_working
# Drop the first column, which only holds row numbers from the CSV
college_data <- college_data_original[-1]
# Give the degree column a shorter name
filt_degrees <- which(names(college_data) == 'pred_degree_awarded_ipeds')
names(college_data)[filt_degrees] <- 'degrees_awarded'
# Replace the numeric degree codes with readable labels
college_data$degrees_awarded <- as.character(college_data$degrees_awarded)
college_data[college_data$degrees_awarded == '1', 'degrees_awarded'] <- 'less than 2 yr'
college_data[college_data$degrees_awarded == '2', 'degrees_awarded'] <- '2 yr'
college_data[college_data$degrees_awarded == '3', 'degrees_awarded'] <- '4 yr or more'
# Share of graduates who are working, out of all graduates counted
college_data$working_ratio <- college_data$count_working / (college_data$count_working + college_data$count_not_working)
head(college_data, 3)
## # A tibble: 3 × 9
## unitid inst_name state…¹ degre…² year earni…³ count…⁴ count…⁵ worki…⁶
## <dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 100654 Alabama A & M Un… AL 4 yr o… 2007 36600 116 1139 0.908
## 2 100663 University of Al… AL 4 yr o… 2007 40800 366 2636 0.878
## 3 100690 Amridge Universi… AL 4 yr o… 2007 NA 6 25 0.806
## # … with abbreviated variable names ¹state_abbr, ²degrees_awarded,
## # ³earnings_med, ⁴count_not_working, ⁵count_working, ⁶working_ratio
earnings_all <- median(college_data$earnings_med, na.rm = TRUE)
earnings_target_group <- median(college_data$earnings_med[college_data$degrees_awarded == '4 yr or more'], na.rm = TRUE)
working_ratio_all <- median(college_data$working_ratio, na.rm = TRUE)
working_ratio_target_group <- median(college_data$working_ratio[college_data$degrees_awarded == '4 yr or more'], na.rm = TRUE)
print(list(
  'earnings_all' = earnings_all,
  'earnings_target_group' = earnings_target_group,
  'working_ratio_all' = working_ratio_all,
  'working_ratio_target_group' = working_ratio_target_group
))
## $earnings_all
## [1] 31600
##
## $earnings_target_group
## [1] 41700
##
## $working_ratio_all
## [1] 0.8304795
##
## $working_ratio_target_group
## [1] 0.888587
The target group's medians for both variables (earnings_med and working_ratio) exceed the overall medians: 41700 versus 31600 for earnings, and about 0.889 versus 0.830 for the working ratio. This suggests that institutions predominantly awarding 4+ year degrees (the target group) tend to have higher values in both columns.
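# Median of both variables per degree group; the formula interface drops rows with an NA in either column
result <- aggregate(cbind(earnings_med, working_ratio) ~ degrees_awarded, college_data, median)
names(result) <- c('degrees_awarded', 'median_earnings_med', 'median_working_ratio')
# Order the rows from shortest to longest degree
result <- result[c(3, 1, 2), ]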
result
## degrees_awarded median_earnings_med median_working_ratio
## 3 less than 2 yr 24200 0.7835052
## 1 2 yr 30900 0.8244955
## 2 4 yr or more 41800 0.8891667
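The grouped medians above differ slightly from the per-column medians reported earlier (for example, 41800 versus 41700 for the 4+ year group). A likely explanation is that the formula interface of aggregate() removes every row containing an NA in any of the listed variables before summarizing, whereas median(..., na.rm = TRUE) skips missing values one column at a time. The sketch below illustrates the difference on a small made-up data frame; toy and its columns are hypothetical and not part of the scorecard data.

```r
# Made-up data for illustration only; row 3 has an earnings value but no ratio.
toy <- data.frame(
  group    = c('a', 'a', 'a', 'b', 'b'),
  earnings = c(10, 20, 90, 50, 70),
  ratio    = c(0.5, 0.6, NA, 0.7, 0.8)
)

# Formula interface: row 3 is dropped entirely (default na.action = na.omit),
# so the earnings median for group 'a' is based on 10 and 20 only.
joint <- aggregate(cbind(earnings, ratio) ~ group, data = toy, FUN = median)

# Column-wise medians: only the missing value itself is skipped,
# so the earnings median for group 'a' uses 10, 20 and 90.
separate <- tapply(toy$earnings, toy$group, median, na.rm = TRUE)

joint
separate
```

If the per-column behaviour is preferred for the grouped table, one option is to summarize each variable separately, as tapply() does here.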
Taken together, the medians and the boxplots suggest that the answer to the question posed at the beginning of this report is yes. Graduates of institutions that mostly award 4+ year degrees earn more and are more likely to be working than graduates of institutions that mostly award shorter degrees: median earnings rise from 24200 to 30900 to 41800 across the three groups, and the median working ratio rises from about 0.78 to 0.82 to 0.89. The gap is much more pronounced for earnings, where the target group's boxplot also shows many more high-value outliers than the other groups' boxplots.
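A boxplot comparison of this kind can be sketched with base R's boxplot(). The example below is a minimal illustration on a small made-up data frame; demo and its values are hypothetical, and in the report college_data would presumably take its place. The groups are turned into a factor so that the boxes appear from shortest to longest degree rather than alphabetically.

```r
# Made-up stand-in for college_data; values are for illustration only.
demo <- data.frame(
  degrees_awarded = rep(c('less than 2 yr', '2 yr', '4 yr or more'), each = 4),
  earnings_med    = c(20000, 24000, 26000, 30000,
                      27000, 30000, 32000, 38000,
                      35000, 41000, 45000, 90000)
)

# Order the groups from shortest to longest degree instead of alphabetically.
level_order <- c('less than 2 yr', '2 yr', '4 yr or more')
demo$degrees_awarded <- factor(demo$degrees_awarded, levels = level_order)

# One box per group; boxplot() also returns the plotted statistics invisibly.
bp <- boxplot(earnings_med ~ degrees_awarded, data = demo,
              xlab = 'Predominant degree awarded', ylab = 'Median earnings')

# Row 3 of bp$stats holds the median of each group, in level order.
bp$stats[3, ]
```

The same call with working_ratio on the left-hand side of the formula would give the corresponding plot for the working ratio.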
cor_earnings_working <- cor(x = college_data$earnings_med, y = college_data$working_ratio, method = 'pearson', use = 'pairwise.complete.obs')
cor_earnings_working
## [1] 0.7164122
A value of about 0.72 indicates a fairly strong positive linear correlation between earnings_med and working_ratio: institutions with a larger share of working graduates tend to have higher median earnings.
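Because earnings_med contains missing values, the call above relies on use = 'pairwise.complete.obs', which computes the coefficient only from rows where both values are present. The sketch below illustrates that behaviour on two short made-up vectors (x and y are hypothetical) and, as one possible follow-up, uses cor.test() to attach a confidence interval to the estimate.

```r
# Made-up vectors for illustration only; each has one missing value.
x <- c(1, 2, 3, NA, 5, 6, 8)
y <- c(2, 3, NA, 8, 9, 13, 15)

# Pairs with an NA on either side are ignored.
r_pairwise <- cor(x, y, method = 'pearson', use = 'pairwise.complete.obs')

# Equivalent manual version: keep only rows where both values are present.
keep <- complete.cases(x, y)
r_manual <- cor(x[keep], y[keep], method = 'pearson')

# cor.test() drops incomplete pairs as well and adds a confidence interval.
ct <- cor.test(x, y, method = 'pearson')

r_pairwise
ct$conf.int
```

Applied to college_data$earnings_med and college_data$working_ratio, the same cor.test() call would show how precisely the coefficient is estimated.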