College costs are rising. Anecdotally, I saw my own tuition at Purdue more than double in the 12 years that I attended ($3,724/yr in 1999 to $9,070/yr in 2010) [1]. However, anecdotes are only the beginning of a scientific investigation. Tuition for a public 4 year school has increased at an average rate of between 3.2% and 4.4% since 1987 [2]. The rate of inflation during this time has been 3.2% [3]. Coming from a working class background, I found the cost of universities to be prohibitive without a scholarship, which motivated me to join the US Navy. However, military service should not be the only route to an affordable education. Until public policy changes in a way that makes higher education more affordable to more people, prospective students may have to make hard choices about how much debt they take on and how quickly they can pay down that debt. Employment rates and salary statistics are important for cost-conscious students who want to choose a path that does not encumber them with unmanageable debt. To address these concerns I ask:
Which college majors offer the best opportunities in terms of unemployment rate and salary?
These data were collated by the FiveThirtyEight (538) website and posted to their GitHub page [4]. They in turn used data from:
“All data is from American Community Survey 2010-2012 Public Use Microdata Series. Download data here: http://www.census.gov/programs-surveys/acs/data/pums.html Documentation here: http://www.census.gov/programs-surveys/acs/technical-documentation/pums.html Major categories are from Carnevale et al, ‘What’s It Worth?: The Economic Value of College Majors.’ Georgetown University Center on Education and the Workforce, 2011.” [5]
Details for the Georgetown data set can be found here: https://1gyhoq479ufd3yna29x7ubjn-wpengine.netdna-ssl.com/wp-content/uploads/2015/01/WIW1-Methodology.pdf
From the above methodology report:
Unique Data Characteristics
1) For the first time in this survey the Census Bureau asked individuals who indicated that their degree was a bachelor’s degree or higher, to supply their undergraduate major. Their responses were then coded and collapsed by the Census Bureau into 171 different degree majors.
2) Unlike other data sources focused on recent degree recipients, the Census data enable analysis across an individual’s full life cycle.
3) The Census data also result in robust estimates due to the very large sample involved. 531,337 persons surveyed who are representative of 51,547,518 people having Bachelor’s degrees (including those with graduate degrees), when weighted.
Since these data were collected by survey and lack experimental features like a control group and blinding, this is an observational study. We therefore cannot establish a causal link between variables. However, the sample is small relative to the population (about 1% of it), so cases can be treated as independent, and the data were collected at random. We can therefore make inferences and predictions using these data.
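The independence argument above can be checked with a line of arithmetic. The sketch below uses the sample and weighted population sizes quoted from the methodology report; the 10% cutoff is a common rule of thumb and is an assumption here, since the source does not name a specific threshold.

```r
# Sample size and weighted population size quoted from the methodology report.
n_sample     <- 531337
n_population <- 51547518

# Fraction of the population that was actually surveyed.
sampling_fraction <- n_sample / n_population
round(100 * sampling_fraction, 2)   # roughly 1 percent

# Rule of thumb (an assumption, not stated in the source): treat cases as
# independent when less than 10% of the population is sampled.
sampling_fraction < 0.10
```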
In establishing the scope of inference we must bear in mind that the data were collected on college degree holders within the US. We can only make predictions about degree holders within the US. Predictions made here are not valid for degree holders in other countries, even if they obtained their degree in the US, because social and economic conditions in other countries would invalidate them.
In many observational surveys the cases are individual people. That is not true for this study. Although the data were collected by asking individuals about their major, degree level, pay, and employment status, the data are organized so that the cases are the college majors. In the “All_ages” set, each case represents a major offered by colleges and universities in the US, and the data include holders of both undergraduate and graduate degrees. In the “Grad_students” set, each case again represents a major, but the data include only graduate degree holders aged 25 years and older. Finally, in the “Recent_grad” set, each case represents a major and the data include only undergraduate degree holders younger than 28 years. “Recent_grad” also includes gender statistics. In all sets, the same 173 majors are used.
In asking what the economic outlook is for the college majors, the response variable is the college major, which is categorical. Results will take the form of ordered lists. These lists will be created using explanatory variables such as the counts of employed and unemployed degree holders and their income statistics. These data are numerical. We will also see what effect gender has on income and employment rate; those data are categorical.
The Appendix contains tables and graphs of median salary and unemployment rate for each of the 173 majors at the three attainment categories: recent graduate, graduate degree, and all ages. These data were placed in an Appendix so that they are available to the interested reader without interrupting the flow of this paper, since the tables and graphs take up nearly 70 pages.
In this chapter, we will compare unemployment rate and median salary data based solely on attainment level.
First we will look at overall unemployment rate for the 3 categories: all ages, recent grads, and grad students.
summary(all_ages$Unemployment_rate)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.04626 0.05472 0.05736 0.06904 0.15615
summary(rct_grad$Unemployment_rate)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.05031 0.06796 0.06819 0.08756 0.17723
summary(grad_stdnt$Grad_unemployment_rate)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.02607 0.03665 0.03934 0.04805 0.13851
unempl <- cbind(all_ages$Unemployment_rate, rct_grad$Unemployment_rate, grad_stdnt$Grad_unemployment_rate)
boxplot(unempl,names = c("All", "Recent Grad", "Grad Student"), ylab = "Unemployment Rate")
It appears that people holding only a Bachelor’s degree have a median unemployment rate nearly twice as high as those with higher degrees (about 6.8% for recent graduates versus about 3.7% for graduate degree holders). This suggests that holding a graduate degree is associated with a better chance of finding a job.
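The size of that gap can be read straight off the medians. A minimal sketch, using the median values printed by `summary()` above rather than the raw data frames:

```r
# Median unemployment rates copied from the summary() output above.
median_unemp <- c(all_ages = 0.05472, recent_grad = 0.06796, grad = 0.03665)

# Ratio of each group's median rate to the graduate-degree median.
unemp_ratio <- median_unemp / median_unemp[["grad"]]
round(unemp_ratio, 2)
```

The recent-graduate ratio comes out a little under 2, which is the sense in which the rate is "nearly twice" as high.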
We will also look at median income for the three categories.
summary(all_ages$Median)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 35000 46000 53000 56816 65000 125000
hist(all_ages$Median, main = "Histogram for Median Income All Ages", xlab = "Median Income by Major All Ages (USD)", col = "dark blue")
summary(rct_grad$Median)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 22000 33000 36000 40151 45000 110000
hist(rct_grad$Median, main = "Histogram for Median Income Recent Grads", xlab = "Median Income by Major Recent Grads (USD)", col = "dark blue")
summary(grad_stdnt$Grad_median)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 47000 65000 75000 76756 90000 135000
hist(grad_stdnt$Grad_median, main = "Histogram for Median Income Grad Students", xlab = "Median Income by Major Grad Student (USD)", col = "dark blue")
medsal <- cbind(all_ages$Median, rct_grad$Median, grad_stdnt$Grad_median)
boxplot(medsal, names = c("All", "Recent Grad", "Grad Student"), ylab = "Median Salary USD")
We see from these graphs that the median salary of the graduate student set (75,000) would be a high outlier in the recent graduate set, and the median salary of the recent graduate set (36,000) falls below the lowest value (47,000) in the graduate student set. This suggests that getting a graduate degree greatly improves earning potential.
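How far apart the two distributions sit can be checked against the usual boxplot convention of flagging points more than 1.5 × IQR beyond the quartiles. This is a sketch built from the quartiles printed by `summary()` above; `boxplot()` itself uses hinges, which can differ slightly from these quartiles, so treat the fences as approximate.

```r
# Quartiles copied from the summary() output above (USD).
recent <- c(q1 = 33000, median = 36000, q3 = 45000)
grad   <- c(min = 47000, q1 = 65000, median = 75000, q3 = 90000)

# Tukey fences: values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR are flagged.
fences <- function(q1, q3) {
  iqr <- q3 - q1
  c(lower = q1 - 1.5 * iqr, upper = q3 + 1.5 * iqr)
}

recent_fences <- fences(recent[["q1"]], recent[["q3"]])  # lower 15000, upper 63000
grad_fences   <- fences(grad[["q1"]], grad[["q3"]])      # lower 27500, upper 127500

grad[["median"]] > recent_fences[["upper"]]   # TRUE: 75000 is above 63000
recent[["median"]] < grad_fences[["lower"]]   # FALSE: 36000 is above 27500
recent[["median"]] < grad[["min"]]            # TRUE: 36000 is below the grad minimum
```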
The tables and bar plots in the Appendix also show that majors that emphasize so-called “hard” skills, such as the Science, Technology, Engineering, and Math (STEM) majors, tend to outperform majors that emphasize so-called “soft” skills, such as Fine Arts, Liberal Arts, and Social Sciences.
It is not enough to simply look at graphs and draw conclusions as to whether our hypothesis is correct. Further statistical tests need to be performed to check whether what the graphs tell us is actually significant. To that end, we perform \(\chi^2\) tests for independence, as this test is used to check for significance with a categorical variable such as employed vs. unemployed [6]. Our null hypothesis is that employment status is independent of major choice. The alternative hypothesis is that employment status depends on major choice.
First for all ages:
all_age_contin <- all_ages %>% dplyr::select(Major, Employed, Unemployed) # For user-friendliness we'll pull major, number employed, number unemployed.
head(all_age_contin)
## # A tibble: 6 x 3
## Major Employed Unemployed
## <chr> <int> <int>
## 1 GENERAL AGRICULTURE 90245 2423
## 2 AGRICULTURE PRODUCTION AND MANAGEMENT 76865 2266
## 3 AGRICULTURAL ECONOMICS 26321 821
## 4 ANIMAL SCIENCES 81177 3619
## 5 FOOD SCIENCE 17281 894
## 6 PLANT SCIENCE AND AGRONOMY 63043 2070
#barplot(as.matrix(all_age_contin), beside = TRUE)
chisq.test(all_age_contin[,-1]) #We remove the major names for the chi-squared test
##
## Pearson's Chi-squared test
##
## data: all_age_contin[, -1]
## X-squared = 96644, df = 172, p-value < 2.2e-16
Since the p-value is less than 0.05, we reject the null hypothesis that employment status is independent of choice of major, and we accept the alternative hypothesis that employment status depends on choice of major in the all ages category.
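To make the test statistic less of a black box, the sketch below rebuilds a Pearson \(\chi^2\) statistic by hand. It uses only the six majors printed by `head(all_age_contin)` above, so its numbers will not match the full 173-major result; the point is the mechanics of expected counts and degrees of freedom.

```r
# Employed / unemployed counts for the six majors shown in head(all_age_contin).
counts <- matrix(
  c(90245, 2423,
    76865, 2266,
    26321,  821,
    81177, 3619,
    17281,  894,
    63043, 2070),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("Employed", "Unemployed"))
)

# Expected counts under independence: (row total * column total) / grand total.
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson statistic and degrees of freedom, (rows - 1) * (columns - 1).
x2 <- sum((counts - expected)^2 / expected)
df <- (nrow(counts) - 1) * (ncol(counts) - 1)
p  <- pchisq(x2, df = df, lower.tail = FALSE)

# The built-in test should agree with the hand calculation.
all.equal(x2, unname(chisq.test(counts)$statistic))
```

With all 173 majors the same formula gives (173 - 1) × (2 - 1) = 172 degrees of freedom, which matches the output above.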
Next, we will test for grad students:
head(grad_stdnt)
## # A tibble: 6 x 22
## Major_code Major
## <int> <chr>
## 1 1101 AGRICULTURE PRODUCTION AND MANAGEMENT
## 2 1100 GENERAL AGRICULTURE
## 3 1302 FORESTRY
## 4 1303 NATURAL RESOURCES MANAGEMENT
## 5 1105 PLANT SCIENCE AND AGRONOMY
## 6 1102 AGRICULTURAL ECONOMICS
## # ... with 20 more variables: Major_category <chr>, Grad_total <int>,
## # Grad_sample_size <int>, Grad_employed <int>,
## # Grad_full_time_year_round <int>, Grad_unemployed <int>,
## # Grad_unemployment_rate <dbl>, Grad_median <dbl>, Grad_P25 <int>,
## # Grad_P75 <dbl>, Nongrad_total <int>, Nongrad_employed <int>,
## # Nongrad_full_time_year_round <int>, Nongrad_unemployed <int>,
## # Nongrad_unemployment_rate <dbl>, Nongrad_median <dbl>,
## # Nongrad_P25 <int>, Nongrad_P75 <dbl>, Grad_share <dbl>,
## # Grad_premium <dbl>
grd_st_contin <- grad_stdnt %>% dplyr::select(Major, Grad_employed, Grad_unemployed) # For user-friendliness we'll pull major, number employed, number unemployed.
head(grd_st_contin)
## # A tibble: 6 x 3
## Major Grad_employed Grad_unemployed
## <chr> <int> <int>
## 1 AGRICULTURE PRODUCTION AND MANAGEMENT 13104 473
## 2 GENERAL AGRICULTURE 28930 874
## 3 FORESTRY 16831 725
## 4 NATURAL RESOURCES MANAGEMENT 23394 711
## 5 PLANT SCIENCE AND AGRONOMY 22782 735
## 6 AGRICULTURAL ECONOMICS 10592 216
#barplot(as.matrix(all_age_contin), beside = TRUE)
chisq.test(grd_st_contin[,-1]) #We remove the major names for the chi-squared test
##
## Pearson's Chi-squared test
##
## data: grd_st_contin[, -1]
## X-squared = 62013, df = 172, p-value < 2.2e-16
Again, p < 0.05, so we reject the null hypothesis and accept the alternative hypothesis that major choice at the grad level affects employment status.
Now for recent bachelor’s degree grads:
head(rct_grad)
## # A tibble: 6 x 21
## Rank Major_code Major Total Men Women
## <int> <int> <chr> <int> <int> <int>
## 1 22 1104 FOOD SCIENCE NA NA NA
## 2 64 1101 AGRICULTURE PRODUCTION AND MANAGEMENT 14240 9658 4582
## 3 65 1100 GENERAL AGRICULTURE 10399 6053 4346
## 4 72 1102 AGRICULTURAL ECONOMICS 2439 1749 690
## 5 108 1303 NATURAL RESOURCES MANAGEMENT 13773 8617 5156
## 6 112 1302 FORESTRY 3607 3156 451
## # ... with 15 more variables: Major_category <chr>, ShareWomen <dbl>,
## # Sample_size <int>, Employed <int>, Full_time <int>, Part_time <int>,
## # Full_time_year_round <int>, Unemployed <int>, Unemployment_rate <dbl>,
## # Median <int>, P25th <int>, P75th <int>, College_jobs <int>,
## # Non_college_jobs <int>, Low_wage_jobs <int>
# For user-friendliness we'll pull major, number employed, and number unemployed.
# One major, MILITARY TECHNOLOGIES, had 0 in both the employed and unemployed columns and is excluded.
rct_gr_contin <- rct_grad %>% dplyr::select(Major, Employed, Unemployed) %>% filter(Major != "MILITARY TECHNOLOGIES")
rct_gr_contin
## # A tibble: 172 x 3
## Major Employed Unemployed
## <chr> <int> <int>
## 1 FOOD SCIENCE 3149 338
## 2 AGRICULTURE PRODUCTION AND MANAGEMENT 12323 649
## 3 GENERAL AGRICULTURE 8884 178
## 4 AGRICULTURAL ECONOMICS 2174 182
## 5 NATURAL RESOURCES MANAGEMENT 11797 842
## 6 FORESTRY 3007 322
## 7 SOIL SCIENCE 613 0
## 8 PLANT SCIENCE AND AGRONOMY 6594 314
## 9 ANIMAL SCIENCES 17112 917
## 10 MISCELLANEOUS AGRICULTURE 1290 82
## # ... with 162 more rows
#barplot(as.matrix(all_age_contin), beside = TRUE)
chisq.test(rct_gr_contin[,-1]) #We remove the major names for the chi-squared test
##
## Pearson's Chi-squared test
##
## data: rct_gr_contin[, -1]
## X-squared = 29941, df = 171, p-value < 2.2e-16
As with the other two cases, we reject the null and accept the alternative that choice of major affects unemployment rate. Thus, regardless of degree level, your choice of major is associated with your unemployment rate. Generally speaking, you’ll have better chances of finding a job with some majors than with others.
We can also compare grad vs. undergrad:
#This will give proportions for making the bar plot.
a <- sum(grd_st_contin[,2])/(sum(grd_st_contin[,2])+sum(grd_st_contin[,3]))
b <- sum(grd_st_contin[,3])/(sum(grd_st_contin[,2])+sum(grd_st_contin[,3]))
c <- sum(rct_gr_contin[,2])/(sum(rct_gr_contin[,2])+sum(rct_gr_contin[,3]))
d <- sum(rct_gr_contin[,3])/(sum(rct_gr_contin[,2])+sum(rct_gr_contin[,3]))
#Now to make a matrix to plot
gr_ug_contin_prop <- matrix(c(a, c,b,d),byrow = TRUE, nrow = 2)
barplot(gr_ug_contin_prop, beside = TRUE, names.arg = c("Grad Students", "Undergrads"), ylab = "Proportion", main = "Employment/Unemployment")
# For the chi-squared test we will use absolute counts instead of proportions.
e <- sum(grd_st_contin[,2])
f <- sum(grd_st_contin[,3])
g <- sum(rct_gr_contin[,2])
h <- sum(rct_gr_contin[,3])
gr_ug_contin <- matrix(c(e, f,g,h),byrow = TRUE, nrow = 2)
gr_ug_contin
## [,1] [,2]
## [1,] 16268407 606612
## [2,] 5396348 418025
chisq.test(gr_ug_contin)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: gr_ug_contin
## X-squared = 129590, df = 1, p-value < 2.2e-16
For level of attainment, we reject the null hypothesis that degree level does not affect employment status and accept the alternative hypothesis that degree attainment (bachelor’s versus graduate degree) does affect unemployment rate.
Choice of major affects unemployment rate in all three data sets, and so does level of degree attainment.
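With counts in the millions, almost any difference will be statistically significant, so it helps to look at the size of the difference as well. A small sketch using the counts printed in `gr_ug_contin` above (the row and column labels are added here for readability):

```r
# Counts copied from the gr_ug_contin matrix printed above.
gr_ug <- matrix(
  c(16268407, 606612,
     5396348, 418025),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Grad", "Recent undergrad"), c("Employed", "Unemployed"))
)

# Row-wise proportions: the "Unemployed" column is each group's unemployment rate.
rates <- prop.table(gr_ug, margin = 1)
round(rates, 4)

# Gap between the two unemployment rates, in percentage points.
rate_gap <- 100 * (rates["Recent undergrad", "Unemployed"] - rates["Grad", "Unemployed"])
round(rate_gap, 2)
```

The two rates come out at roughly 3.6% and 7.2%, a gap of about 3.6 percentage points.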
As we stated in the Data section, exploration of the median salary data in the Appendix shows that quantitative analysis majors, the STEM majors, appear to have more earning potential than qualitative analysis majors, such as Liberal Arts. Since median salary is a numerical measurement, it is appropriate to use a Student’s t-test [7] or a Kolmogorov-Smirnov test [8] to compare data sets. The Student’s t-test is a parametric test that compares a test statistic against the t distribution. The Kolmogorov-Smirnov test is non-parametric, in that it does not assume the survey data are drawn from a population with a given distribution; instead, it measures similarity by the largest difference between the two data sets’ empirical cumulative distribution functions. Since the salary data are right-skewed across all attainment levels, adding a non-parametric test increases the robustness of this analysis.
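The Kolmogorov-Smirnov statistic described above can be computed by hand in a few lines. The two samples below are made-up illustrative values, not salary data from this study; the sketch only shows that the statistic is the largest vertical gap between two empirical CDFs.

```r
# Two small illustrative samples (hypothetical values, not from the data set).
x <- c(52, 61, 67, 73, 80, 94)
y <- c(38, 41, 45, 49, 55, 58)

# Empirical cumulative distribution function of each sample.
Fx <- ecdf(x)
Fy <- ecdf(y)

# The two-sided KS statistic D is the largest absolute gap between the
# two ECDFs, evaluated at every observed value.
grid <- sort(c(x, y))
d_manual <- max(abs(Fx(grid) - Fy(grid)))

# Compare with the built-in two-sample test.
d_builtin <- unname(ks.test(x, y)$statistic)
all.equal(d_manual, d_builtin)
```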
To make these comparisons, we must bear in mind that we have 42 groups (14 major categories × 3 attainment levels) that have to be compared in pairs, for a total of \(C(42,2) = \frac{42!}{2!\,40!} = 861\) combinations. This is prohibitively many given the time constraints for this project. Therefore, we will analyze 4 major categories from the all ages set, which brings us to \(C(4,2) = \frac{4!}{2!\,2!} = 6\) combinations. These major categories are Engineering, Physical Sciences, Liberal Arts, and Psychology & Social Work. We selected these categories based on observation of the median salary tables in the Appendix.
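The pair counts above are easy to confirm in R. The sketch also lists the six pairings among the four selected categories; the variable names are illustrative.

```r
# Number of unordered pairs among 42 category-by-level groups.
n_all_pairs <- choose(42, 2)   # 861

# Restricting to four major categories leaves six pairs.
categories <- c("Engineering", "Physical Sciences",
                "Liberal Arts", "Psychology & Social Work")
category_pairs <- t(combn(categories, 2))   # one row per pairing
nrow(category_pairs)                        # 6
category_pairs
```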
The null hypothesis is that there is no difference between the median salaries of Engineering majors and Physical Science majors. Initial two-sided tests, which only check that the distributions differ and not that one is greater or less than the other, showed significance in all cases. We show below the results of one-sided tests, so that we can say which degree category has the higher median salary.
boxplot(all_ages_eng$Median, all_ages_sci$Median, names = c("Engineering", "Physical Sciences"), ylab = "Median Salary USD")
t.test(all_ages_eng$Median, all_ages_sci$Median, alternative = "greater")
##
## Welch Two Sample t-test
##
## data: all_ages_eng$Median and all_ages_sci$Median
## t = 4.3198, df = 29.522, p-value = 8.094e-05
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 9321.027 Inf
## sample estimates:
## mean of x mean of y
## 77758.62 62400.00
ks.test(all_ages_eng$Median, all_ages_sci$Median, alternative = "less") #KS test has opposite sign convention than t test
## Warning in ks.test(all_ages_eng$Median, all_ages_sci$Median, alternative =
## "less"): cannot compute exact p-value with ties
##
## Two-sample Kolmogorov-Smirnov test
##
## data: all_ages_eng$Median and all_ages_sci$Median
## D^- = 0.63103, p-value = 0.00268
## alternative hypothesis: the CDF of x lies below that of y
The median salary of Engineering majors is higher than that of Physical Science majors at the 95% confidence level.
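For reference, the Welch test reported above does not assume equal variances in the two groups. With sample means \(\bar{x}_1, \bar{x}_2\), sample variances \(s_1^2, s_2^2\), and group sizes \(n_1, n_2\) (here, the number of majors in each category), the standard Welch statistic and its approximate degrees of freedom are:

\[
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
\]

This approximation is why the output shows a non-integer value such as df = 29.522. Note also that the test compares the means of the per-major median salaries in each category.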
This time we repeat the same Null and Alternative hypotheses with Engineering and Liberal Arts.
boxplot(all_ages_eng$Median, all_ages_la$Median, names = c("Engineering", "Liberal Arts"), ylab = "Median Salary USD")
t.test(all_ages_eng$Median, all_ages_la$Median, alternative = "greater")
##
## Welch Two Sample t-test
##
## data: all_ages_eng$Median and all_ages_la$Median
## t = 11.611, df = 33.361, p-value = 1.455e-13
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 27227.15 Inf
## sample estimates:
## mean of x mean of y
## 77758.62 45887.50
ks.test(all_ages_eng$Median, all_ages_la$Median, alternative = "less") #KS test has opposite sign convention than t test
## Warning in ks.test(all_ages_eng$Median, all_ages_la$Median, alternative =
## "less"): cannot compute exact p-value with ties
##
## Two-sample Kolmogorov-Smirnov test
##
## data: all_ages_eng$Median and all_ages_la$Median
## D^- = 1, p-value = 1.106e-09
## alternative hypothesis: the CDF of x lies below that of y
The median salary of Engineering majors is higher than that of Liberal Arts majors at the 95% confidence level.
This time we repeat the same Null and Alternative hypotheses with Liberal Arts and Physical Sciences.
boxplot(all_ages_la$Median, all_ages_sci$Median, names = c("Liberal Arts", "Physical Sciences"), ylab = "Median Salary USD")
t.test(all_ages_la$Median, all_ages_sci$Median, alternative = "less")
##
## Welch Two Sample t-test
##
## data: all_ages_la$Median and all_ages_sci$Median
## t = -6.4755, df = 11.198, p-value = 2.107e-05
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -11940.38
## sample estimates:
## mean of x mean of y
## 45887.5 62400.0
ks.test(all_ages_la$Median, all_ages_sci$Median, alternative = "greater") #KS test has opposite sign convention than t test
## Warning in ks.test(all_ages_la$Median, all_ages_sci$Median, alternative =
## "greater"): cannot compute exact p-value with ties
##
## Two-sample Kolmogorov-Smirnov test
##
## data: all_ages_la$Median and all_ages_sci$Median
## D^+ = 1, p-value = 4.517e-06
## alternative hypothesis: the CDF of x lies above that of y
Physical Science median salary is higher than Liberal Arts median salary at the 95% confidence level.
In terms of median pay, the three comparisons shown above give the following ranking: Engineering > Physical Sciences > Liberal Arts.
Additionally, the Industrial and Organizational Psychology major is similar in pay to Physical Sciences.
Job market pressure can have an impact on both median salary and unemployment rate. If a field has low demand but high supply, this can depress the salary and increase the unemployment rate. Conversely, a high demand/low supply field will see increased salaries and decreased unemployment rates. Another effect to consider is that people in an over-subscribed field may spend more time looking for a job, which would also decrease median salary, as they may be unemployed or underemployed during the job hunt. This effect could show in the data as a correlation between unemployment rate and salary.
To test if there is a connection between unemployment rate and median salary, we will take the “all_ages” data set and create linear regression models. If the residuals of the model do not show the necessary normal distribution and constant variance, we will perform a Box-Cox transformation to find an exponent for the response variable that improves the model.
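Before applying it to the salary data, here is a minimal sketch of how a Box-Cox search works, on synthetic data where the right answer is known. It assumes the MASS package is installed; the data, seed, and variable names are made up for illustration. The analysis below raises the response directly to the power \(\lambda\), whereas this sketch uses the textbook form \((y^\lambda - 1)/\lambda\). The two differ only by a linear rescaling, but the textbook form keeps the ordering of the response even when \(\lambda\) is negative.

```r
library(MASS)  # provides boxcox()

set.seed(1)
# Synthetic right-skewed response: log(y) is linear in x, so the ideal
# Box-Cox exponent is close to 0 (a log transform).
x <- runif(200)
y <- exp(1 + 2 * x + rnorm(200, sd = 0.3))

# y = TRUE stores the response in the fit, which boxcox() needs.
fit <- lm(y ~ x, y = TRUE)

# Profile log-likelihood over a grid of candidate exponents.
bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.1), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]

# Apply the transformation: log(y) when lambda is 0, otherwise (y^lambda - 1) / lambda.
y_trans <- if (abs(lambda_hat) < 1e-8) log(y) else (y^lambda_hat - 1) / lambda_hat
fit_trans <- lm(y_trans ~ x)
```

Residual plots of `fit_trans` can then be compared with those of `fit`, in the same way as the `hist()`, `plot()`, and `qqnorm()` calls used in this section.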
fit1<-lm(all_ages$Median ~ all_ages$Unemployment_rate)
summary(fit1)
##
## Call:
## lm(formula = all_ages$Median ~ all_ages$Unemployment_rate)
##
## Residuals:
## Min 1Q Median 3Q Max
## -23370 -8995 -3272 8079 64676
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 70097 3380 20.738 < 2e-16 ***
## all_ages$Unemployment_rate -231551 55906 -4.142 5.41e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14060 on 171 degrees of freedom
## Multiple R-squared: 0.09117, Adjusted R-squared: 0.08586
## F-statistic: 17.15 on 1 and 171 DF, p-value: 5.406e-05
ggplot(all_ages, aes(x = Unemployment_rate, y = Median)) +
geom_point(color = 'blue')+
geom_smooth(method = "lm", formula = y~x)
hist(resid(fit1))
plot(fitted(fit1), resid(fit1))
myt <- boxcox(fit1)
myt_df <- as.data.frame(myt)
optimal_lambda = myt_df[which.max(myt$y),1] #syntax from https://rpubs.com/FelipeRego/SimpleLinearRegression
optimal_lambda
## [1] -1.070707
fit2 <- lm(all_ages$Median^optimal_lambda ~ all_ages$Unemployment_rate)
summary(fit2)
##
## Call:
## lm(formula = all_ages$Median^optimal_lambda ~ all_ages$Unemployment_rate)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.646e-06 -1.475e-06 2.614e-07 1.295e-06 5.141e-06
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.755e-06 4.666e-07 14.476 < 2e-16 ***
## all_ages$Unemployment_rate 3.272e-05 7.718e-06 4.239 3.66e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.941e-06 on 171 degrees of freedom
## Multiple R-squared: 0.09509, Adjusted R-squared: 0.0898
## F-statistic: 17.97 on 1 and 171 DF, p-value: 3.664e-05
hist(resid(fit2))
plot(fitted(fit2), resid(fit2))
qqnorm(resid(fit2))
qqline(resid(fit2))
all_ages <- all_ages %>% mutate(transMedian = Median^optimal_lambda)
head(all_ages)
## # A tibble: 6 x 12
## Major_code Major
## <int> <chr>
## 1 1100 GENERAL AGRICULTURE
## 2 1101 AGRICULTURE PRODUCTION AND MANAGEMENT
## 3 1102 AGRICULTURAL ECONOMICS
## 4 1103 ANIMAL SCIENCES
## 5 1104 FOOD SCIENCE
## 6 1105 PLANT SCIENCE AND AGRONOMY
## # ... with 10 more variables: Major_category <chr>, Total <int>,
## # Employed <int>, Employed_full_time_year_round <int>, Unemployed <int>,
## # Unemployment_rate <dbl>, Median <int>, P25th <int>, P75th <dbl>,
## # transMedian <dbl>
ggplot(all_ages, aes(x = Unemployment_rate, y = transMedian)) +
geom_point(color = 'blueviolet')+
geom_smooth(method = "lm", formula = y~x)
#The correlation seems to be due to outliers
all_ages_no_outlr <- all_ages %>% filter(Unemployment_rate != max(Unemployment_rate) & Unemployment_rate != 0)
fit3 <- lm(all_ages_no_outlr$transMedian ~ all_ages_no_outlr$Unemployment_rate)
summary(fit3)
##
## Call:
## lm(formula = all_ages_no_outlr$transMedian ~ all_ages_no_outlr$Unemployment_rate)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.616e-06 -1.475e-06 1.990e-07 1.290e-06 5.153e-06
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.611e-06 5.382e-07 12.285 < 2e-16
## all_ages_no_outlr$Unemployment_rate 3.539e-05 8.999e-06 3.933 0.000122
##
## (Intercept) ***
## all_ages_no_outlr$Unemployment_rate ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.949e-06 on 168 degrees of freedom
## Multiple R-squared: 0.08432, Adjusted R-squared: 0.07887
## F-statistic: 15.47 on 1 and 168 DF, p-value: 0.0001225
hist(resid(fit3))
plot(fitted(fit3), resid(fit3))
qqnorm(resid(fit3))
qqline(resid(fit3))
ggplot(all_ages_no_outlr, aes(x = Unemployment_rate, y = transMedian)) +
geom_point(color = 'firebrick')+
geom_smooth(method = "lm", formula = y~x)
# We can also see what Majors have the most and least unemployment
mjr_umploy <- all_ages %>% dplyr::select(Major,Unemployment_rate) %>% arrange(Unemployment_rate)
head(mjr_umploy, 10)
## # A tibble: 10 x 2
## Major Unemployment_rate
## <chr> <dbl>
## 1 EDUCATIONAL ADMINISTRATION AND SUPERVISION 0.00000000
## 2 GEOLOGICAL AND GEOPHYSICAL ENGINEERING 0.00000000
## 3 PHARMACOLOGY 0.01611080
## 4 MATERIALS SCIENCE 0.02233333
## 5 MATHEMATICS AND COMPUTER SCIENCE 0.02490040
## 6 GENERAL AGRICULTURE 0.02614711
## 7 TREATMENT THERAPY PROFESSIONS 0.02629160
## 8 NURSING 0.02679682
## 9 AGRICULTURE PRODUCTION AND MANAGEMENT 0.02863606
## 10 AGRICULTURAL ECONOMICS 0.03024832
tail(mjr_umploy, 10)
## # A tibble: 10 x 2
## Major Unemployment_rate
## <chr> <dbl>
## 1 ARCHITECTURE 0.08599113
## 2 ASTRONOMY AND ASTROPHYSICS 0.08602150
## 3 SOCIAL PSYCHOLOGY 0.08733625
## 4 COMPUTER PROGRAMMING AND DATA PROCESSING 0.09026422
## 5 VISUAL AND PERFORMING ARTS 0.09465800
## 6 LIBRARY SCIENCE 0.09484299
## 7 SCHOOL STUDENT COUNSELING 0.10174594
## 8 MILITARY TECHNOLOGIES 0.10179641
## 9 CLINICAL PSYCHOLOGY 0.10271216
## 10 MISCELLANEOUS FINE ARTS 0.15614749
mjr_salary <- all_ages %>% dplyr::select(Major,Median) %>% arrange(Median)
head(mjr_salary, 10)
## # A tibble: 10 x 2
## Major Median
## <chr> <int>
## 1 NEUROSCIENCE 35000
## 2 EARLY CHILDHOOD EDUCATION 35300
## 3 STUDIO ARTS 37600
## 4 HUMAN SERVICES AND COMMUNITY ORGANIZATION 38000
## 5 COUNSELING PSYCHOLOGY 39000
## 6 VISUAL AND PERFORMING ARTS 40000
## 7 ELEMENTARY EDUCATION 40000
## 8 TEACHER EDUCATION: MULTIPLE LEVELS 40000
## 9 LIBRARY SCIENCE 40000
## 10 COMPOSITION AND RHETORIC 40000
tail(mjr_salary, 10)
## # A tibble: 10 x 2
## Major Median
## <chr> <int>
## 1 GEOLOGICAL AND GEOPHYSICAL ENGINEERING 85000
## 2 CHEMICAL ENGINEERING 86000
## 3 ELECTRICAL ENGINEERING 88000
## 4 MATHEMATICS AND COMPUTER SCIENCE 92000
## 5 MINING AND MINERAL ENGINEERING 92000
## 6 NUCLEAR ENGINEERING 95000
## 7 METALLURGICAL ENGINEERING 96000
## 8 NAVAL ARCHITECTURE AND MARINE ENGINEERING 97000
## 9 PHARMACY PHARMACEUTICAL SCIENCES AND ADMINISTRATION 106000
## 10 PETROLEUM ENGINEERING 125000
Initially, the residuals were only marginally well behaved. The Box-Cox transformation did make the residuals normal and homoscedastic. In that regard the transformed model is fit to make predictions. The slope of the initial linear regression model, -231551, shows that unemployment rate and median salary are inversely related. The slope of the transformed model is positive only because the Box-Cox exponent of -1.07 is negative, so larger values of the transformed response correspond to lower salaries. That is, majors with low unemployment rates tend to have higher median salaries, and majors with high unemployment rates tend to have lower salaries. This relationship is statistically significant, with a p-value of 0.0001225, even after influential outliers were removed. However, the effect is weak, with an R\(^2\) of 0.08432 after outliers are removed. This means that only about 8.4% of the variability of the transformed median salary can be explained by unemployment rate.
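Because the transformed model predicts Median raised to a negative power, its predictions have to be converted back to dollars before they can be read. The sketch below does that using the rounded `fit2` coefficients and exponent printed above; the helper function name and the example unemployment rates are illustrative assumptions, and the rounding means the results are approximate.

```r
# Rounded coefficients and exponent copied from the fit2 output above.
intercept <- 6.755e-06
slope     <- 3.272e-05
lambda    <- -1.070707

# fit2 models Median^lambda, so invert the transformation with the power 1 / lambda.
predict_median <- function(unemployment_rate) {
  (intercept + slope * unemployment_rate)^(1 / lambda)
}

# Predicted median salary (USD) at a few example unemployment rates.
example_rates <- c(0.03, 0.05, 0.08)
predicted <- predict_median(example_rates)
round(predicted)
```

The predicted median salary falls as the unemployment rate rises, which is the inverse relationship described above expressed in the original units.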
We suggest that students who are researching the prospects of college majors treat unemployment rates and salary statistics separately. Do not just go off of advice like, “You’ll make a mint in this field” or “They’re hiring a lot of people in that field”. It does no good for a student to accrue $100,000 in debt for a field where a job is virtually guaranteed but the pay cannot cover the debt, or for a field where the pay would cover the debt but the chances of getting a job are small.
The wage gap that exists between men and women in the labor force is well documented [9]. Millennials have also been noted for redefining gender roles [10]. The recent grads data set includes data on the number of males and females earning degrees in each major. Millennials are defined as those between the ages of 14 and 34 as of the time of this writing, and are represented in the recent graduates data set. Prospective college students may be interested in choosing majors with high gender inequity in order to correct those inequities through positive action. We will perform an analysis of the data to that end so that prospective students can make an informed decision.
For defining male or female majority majors, we must account for the fact that 57% of college students are female [11]. Therefore a gender-balanced major would be 57% female and 43% male, which would represent the underlying student population. We will use \(\pm 10\) percentage points as the threshold for gender imbalance: a major is majority female at 67% female or more, and majority male at 47% female or less.
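The thresholds above translate directly into a small classifier. This is a sketch only: the function name is hypothetical, the first two ShareWomen values are copied from the recent-graduate output in this section, and the third entry is a made-up major sitting exactly at the 57% baseline.

```r
# Thresholds from the text: the 57% baseline plus or minus 10 percentage points.
lower <- 0.47
upper <- 0.67

# Label a major by its share of women (hypothetical helper, not from the source).
classify_major <- function(share_women) {
  ifelse(share_women <= lower, "majority male",
         ifelse(share_women >= upper, "majority female", "balanced"))
}

share <- c("MECHANICAL ENGINEERING" = 0.11955890,   # from the recent-grad output
           "NURSING"                = 0.8960190,    # from the recent-grad output
           "HYPOTHETICAL MAJOR"     = 0.57)         # made-up value at the baseline
labels <- classify_major(share)
labels
```

The same cutoffs appear in the `filter(ShareWomen <= 0.47)` and `filter(ShareWomen >= 0.67)` calls used below.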
We begin by identifying the most gender-unequal majors, and then use t-tests and KS tests to verify the gender gap in pay.
gend_rct_grad <- rct_grad %>% dplyr::select(Major, ShareWomen, Median) %>% filter(Major != "FOOD SCIENCE")%>%arrange(ShareWomen)
head(gend_rct_grad, 10)
## # A tibble: 10 x 3
## Major ShareWomen Median
## <chr> <dbl> <int>
## 1 MILITARY TECHNOLOGIES 0.00000000 40000
## 2 MECHANICAL ENGINEERING RELATED TECHNOLOGIES 0.07745303 40000
## 3 CONSTRUCTION SERVICES 0.09071251 50000
## 4 MINING AND MINERAL ENGINEERING 0.10185185 75000
## 5 NAVAL ARCHITECTURE AND MARINE ENGINEERING 0.10731320 70000
## 6 MECHANICAL ENGINEERING 0.11955890 60000
## 7 PETROLEUM ENGINEERING 0.12056434 110000
## 8 TRANSPORTATION SCIENCES AND TECHNOLOGIES 0.12495049 35000
## 9 FORESTRY 0.12503465 35000
## 10 AEROSPACE ENGINEERING 0.13979280 60000
tail(gend_rct_grad, 10)
## # A tibble: 10 x 3
## Major ShareWomen Median
## <chr> <dbl> <int>
## 1 MISCELLANEOUS HEALTH MEDICAL PROFESSIONS 0.8812939 36000
## 2 NURSING 0.8960190 48000
## 3 SOCIAL WORK 0.9040745 30000
## 4 HUMAN SERVICES AND COMMUNITY ORGANIZATION 0.9055899 30000
## 5 SPECIAL NEEDS EDUCATION 0.9066773 35000
## 6 FAMILY AND CONSUMER SCIENCES 0.9109326 30000
## 7 ELEMENTARY EDUCATION 0.9237455 32000
## 8 MEDICAL ASSISTING SERVICES 0.9278072 42000
## 9 COMMUNICATION DISORDERS SCIENCES AND SERVICES 0.9679981 28000
## 10 EARLY CHILDHOOD EDUCATION 0.9689537 28000
male_major_salary <- rct_grad %>% dplyr::select(Major, ShareWomen, Median) %>% filter(ShareWomen <= 0.47)
female_major_salary <- rct_grad %>% dplyr::select(Major, ShareWomen, Median) %>% filter(ShareWomen >= 0.67)
boxplot(male_major_salary$Median, female_major_salary$Median, names = c("Majority Male","Majority Female"), ylab = "Median Salary USD")
t.test(male_major_salary$Median, female_major_salary$Median, alternative = "greater")
##
## Welch Two Sample t-test
##
## data: male_major_salary$Median and female_major_salary$Median
## t = 8.4657, df = 97.588, p-value = 1.309e-13
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 11758.75 Inf
## sample estimates:
## mean of x mean of y
## 47898.57 33270.37
ks.test(male_major_salary$Median, female_major_salary$Median, alternative = "less") #Sign Convention Different
## Warning in ks.test(male_major_salary$Median, female_major_salary$Median, :
## cannot compute exact p-value with ties
##
## Two-sample Kolmogorov-Smirnov test
##
## data: male_major_salary$Median and female_major_salary$Median
## D^- = 0.69894, p-value = 1.161e-13
## alternative hypothesis: the CDF of x lies below that of y
eng_sci <- bind_rows(rct_eng, rct_sci)
la_ssc <- bind_rows(rct_la, rct_ssc)
boxplot(eng_sci$ShareWomen, la_ssc$ShareWomen, names = c("STEM", "L.A. & Social Work"), ylab = "% Women")
t.test(eng_sci$ShareWomen, la_ssc$ShareWomen, alternative = "less")
##
## Welch Two Sample t-test
##
## data: eng_sci$ShareWomen and la_ssc$ShareWomen
## t = -7.7065, df = 56.33, p-value = 1.136e-10
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -0.2358929
## sample estimates:
## mean of x mean of y
## 0.3080668 0.6093368
ks.test(eng_sci$ShareWomen, la_ssc$ShareWomen, alternative = "greater")
##
## Two-sample Kolmogorov-Smirnov test
##
## data: eng_sci$ShareWomen and la_ssc$ShareWomen
## D^+ = 0.74359, p-value = 4.825e-08
## alternative hypothesis: the CDF of x lies above that of y
Majors with a male majority have a higher median salary than majors with a female majority at the 95% confidence level. Engineering and Physical Sciences majors have a lower percentage of women than Liberal Arts, Psychology & Social Work majors, also at the 95% confidence level. We will now fit a linear regression model to understand this phenomenon further.
fit4 <- lm(rct_grad$Median ~ rct_grad$ShareWomen)
summary(fit4)
##
## Call:
## lm(formula = rct_grad$Median ~ rct_grad$ShareWomen)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17261 -5474 -1007 3502 57604
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 56093 1705 32.90 <2e-16 ***
## rct_grad$ShareWomen -30670 2987 -10.27 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9031 on 170 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.3828, Adjusted R-squared: 0.3791
## F-statistic: 105.4 on 1 and 170 DF, p-value: < 2.2e-16
hist(resid(fit4))
plot(resid(fit4)~fitted(fit4))
qqnorm(resid(fit4))
qqline(resid(fit4))
#An outlier is affecting our linear regression. A Box-Cox transformation will be used to correct for it.
myt2 <- boxcox(fit4)
myt2_df <- as.data.frame(myt2)
optimal_lambda2 = myt2_df[which.max(myt2$y),1] #syntax from https://rpubs.com/FelipeRego/SimpleLinearRegression
optimal_lambda2
## [1] -0.9494949
fit5 <- lm(rct_grad$Median^optimal_lambda2 ~ rct_grad$ShareWomen)
summary(fit5)
##
## Call:
## lm(formula = rct_grad$Median^optimal_lambda2 ~ rct_grad$ShareWomen)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.976e-05 -4.822e-06 1.294e-07 4.417e-06 2.016e-05
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.049e-05 1.414e-06 21.57 <2e-16 ***
## rct_grad$ShareWomen 2.810e-05 2.476e-06 11.35 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.487e-06 on 170 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.431, Adjusted R-squared: 0.4276
## F-statistic: 128.8 on 1 and 170 DF, p-value: < 2.2e-16
hist(resid(fit5))
plot(resid(fit5)~fitted(fit5))
qqnorm(resid(fit5))
qqline(resid(fit5))
rct_grad_gend <- rct_grad %>% mutate(transMedian = Median^optimal_lambda2) %>% filter(Major != "FOOD SCIENCE")
ggplot(rct_grad_gend, aes(x = ShareWomen, y = transMedian)) +
geom_point(color = 'firebrick')+
geom_smooth(method = "lm", formula = y~x)
Note that the power used to transform the median salary, -0.9494949, is negative, which means that the positive slope in the graph actually shows an inverse relationship between the percent of women in a major and the median salary. The p-value of the slope is much less than 0.05, so the gender balance of a major has a significant effect on median salary at the 95% confidence level. Furthermore, the R\(^2\) value of 0.431 is high for a social science study. The residuals are also Normally distributed and have constant variance, so this linear regression is predictive for recent college graduates in the US.
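To read the fitted model in dollars, a transformed prediction can be raised to the power 1/λ. The sketch below is a rough illustration that uses the rounded coefficients printed in the `fit5` summary; the helper name `predict_median` and the two example shares (0.2 and 0.8) are assumptions made for illustration.

```r
# Rounded values copied from the fit5 summary and optimal_lambda2.
lambda    <- -0.9494949
intercept <- 3.049e-05
slope     <- 2.810e-05

# Hypothetical helper: invert Median^lambda to get back to dollars.
predict_median <- function(share_women) {
  transformed <- intercept + slope * share_women  # fitted Median^lambda
  transformed^(1 / lambda)                        # back-transform
}

shares <- c(0.2, 0.8)
predicted <- predict_median(shares)
print(data.frame(ShareWomen = shares, PredictedMedian = round(predicted)))
```

Because λ is negative, a larger transformed value maps back to a smaller salary, so the upward-sloping line corresponds to lower predicted pay as the share of women rises. These are back-transformed point predictions from rounded coefficients and should be read as rough figures only.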
Unemployment rate gives an incomplete picture of the prospects for a given major. Students may also be concerned about being stuck in a low-paying job [12]. Recent college graduates being stuck in low-paying retail and food industry jobs has been widely documented in recent years. There is some debate over whether this phenomenon should really be of long-term concern, or whether it is just a result of the Great Recession or simply part of personal growth [13][14][15]. Regardless, we highlight majors with high levels of low-paying jobs, or of people working in jobs that do not require a degree.
#Low-wage jobs: we look at the rate relative to the Total column.
rct_grad_underemp <- rct_grad %>% mutate(Under_emp_rate = Low_wage_jobs/Total) %>% dplyr::select(Major, Under_emp_rate) %>% arrange(Under_emp_rate) %>% filter(Major != "FOOD SCIENCE") #Food science returns NA
hist(rct_grad_underemp$Under_emp_rate, main = "Histogram of Recent Graduate Low Wage Job Rate",
xlab = "Low Wage Job Rate")
head(rct_grad_underemp, 10)
## # A tibble: 10 x 2
## Major Under_emp_rate
## <chr> <dbl>
## 1 SOIL SCIENCE 0.00000000
## 2 SCHOOL STUDENT COUNSELING 0.00000000
## 3 METALLURGICAL ENGINEERING 0.00000000
## 4 NAVAL ARCHITECTURE AND MARINE ENGINEERING 0.00000000
## 5 MILITARY TECHNOLOGIES 0.00000000
## 6 MATERIALS SCIENCE 0.01892966
## 7 MISCELLANEOUS AGRICULTURE 0.02083333
## 8 MATERIALS ENGINEERING AND MATERIALS SCIENCE 0.02338791
## 9 COMPUTER ENGINEERING 0.02359058
## 10 OPERATIONS LOGISTICS AND E-COMMERCE 0.02429253
tail(rct_grad_underemp, 10)
## # A tibble: 10 x 2
## Major Under_emp_rate
## <chr> <dbl>
## 1 LIBRARY SCIENCE 0.1748634
## 2 ANTHROPOLOGY AND ARCHEOLOGY 0.1767583
## 3 COMPOSITION AND RHETORIC 0.1828734
## 4 PHYSICAL SCIENCES 0.1873259
## 5 HOSPITALITY MANAGEMENT 0.2076431
## 6 STUDIO ARTS 0.2112270
## 7 CLINICAL PSYCHOLOGY 0.2191684
## 8 MISCELLANEOUS FINE ARTS 0.2260479
## 9 DRAMA AND THEATER ARTS 0.2559134
## 10 COSMETOLOGY SERVICES AND CULINARY ARTS 0.3009515
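Excluding FOOD SCIENCE by name works for this data set, but the same result can be reached by dropping rows whose computed rate is missing. The sketch below uses a small made-up data frame (the counts are invented, not from the survey) and base R rather than dplyr; only the column names `Major`, `Low_wage_jobs`, and `Total` are taken from the code above.

```r
# Toy data with invented counts; one row has a missing Total.
toy <- data.frame(
  Major         = c("MAJOR A", "MAJOR B", "MAJOR C"),
  Low_wage_jobs = c(10, 50, 5),
  Total         = c(1000, 400, NA),
  stringsAsFactors = FALSE
)

# Rate of low-wage jobs relative to Total; NA propagates for MAJOR C.
toy$Under_emp_rate <- toy$Low_wage_jobs / toy$Total

# Drop rows with a missing rate instead of naming the major explicitly.
toy_rates <- toy[!is.na(toy$Under_emp_rate), c("Major", "Under_emp_rate")]

# Sort ascending, as arrange(Under_emp_rate) does.
toy_rates <- toy_rates[order(toy_rates$Under_emp_rate), ]
print(toy_rates)
```

Filtering on the missing value itself keeps the code working if a future release of the data has a different set of incomplete rows.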
#Jobs that don't require a degree, also compared to Total
rct_grad_nodgr <- rct_grad %>% mutate(No_degree_rate = Non_college_jobs/Total) %>% dplyr::select(Major, No_degree_rate) %>% arrange(No_degree_rate) %>% filter(Major != "FOOD SCIENCE") #Food science returns NA
hist(rct_grad_nodgr$No_degree_rate, main = "Histogram of Recent Graduates % in Non-degree Jobs",
xlab = "Non-degree Job Rate")
head(rct_grad_nodgr, 10)
## # A tibble: 10 x 2
## Major No_degree_rate
## <chr> <dbl>
## 1 MILITARY TECHNOLOGIES 0.00000000
## 2 GEOLOGICAL AND GEOPHYSICAL ENGINEERING 0.06944444
## 3 NAVAL ARCHITECTURE AND MARINE ENGINEERING 0.08108108
## 4 ACTUARIAL SCIENCE 0.08313476
## 5 MATERIALS SCIENCE 0.09137649
## 6 MATERIALS ENGINEERING AND MATERIALS SCIENCE 0.10190444
## 7 MATHEMATICS AND COMPUTER SCIENCE 0.11001642
## 8 NURSING 0.12486509
## 9 SPECIAL NEEDS EDUCATION 0.13212012
## 10 ELECTRICAL ENGINEERING 0.13337913
tail(rct_grad_nodgr, 10)
## # A tibble: 10 x 2
## Major
## <chr>
## 1 INDUSTRIAL PRODUCTION TECHNOLOGIES
## 2 CRIMINOLOGY
## 3 FILM VIDEO AND PHOTOGRAPHIC ARTS
## 4 HOSPITALITY MANAGEMENT
## 5 CRIMINAL JUSTICE AND FIRE PROTECTION
## 6 DRAMA AND THEATER ARTS
## 7 MEDICAL ASSISTING SERVICES
## 8 ELECTRICAL, MECHANICAL, AND PRECISION TECHNOLOGIES AND PRODUCTION
## 9 NUCLEAR, INDUSTRIAL RADIOLOGY, AND BIOLOGICAL TECHNOLOGIES
## 10 COSMETOLOGY SERVICES AND CULINARY ARTS
## # ... with 1 more variables: No_degree_rate <dbl>
For prospective students looking at majors that have high levels of employment in jobs that do not require a degree, options such as vocational certificates or apprenticeships may be more economically viable than a college degree.
We find that choice of college major has a significant effect on median salary and unemployment rate. This effect is seen at all age levels. Higher salaries and lower unemployment tend to favor STEM majors. The gender balance of a major also has a significant effect on median salary. These findings, that STEM and gender both affect median salary, seem to be interrelated, as STEM majors tend to be majority male. There is a statistically, but not necessarily practically, significant relationship between unemployment rate and median pay.
These data represent only a single point in time. Measuring trends is important for prospective college students, as they need to be able to predict what the job market is going to look like when they graduate. These trends may also influence choices in graduate study. Therefore it is necessary to repeat these surveys at regular intervals and to add time series analysis to the analysis above.
In the graduate student data, no differentiation is made between master's degrees, doctorates, or professional degrees. Adding a column for degree level to future surveys would be useful, as it would allow more detailed analysis of how level of attainment affects earnings and unemployment rates.
As it is of interest to this paper, we list graphics of median salary by attainment (i.e., All, Recent Grad and Grad Student) and category (e.g., Humanities, Engineering).
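The plot-and-table pairs that follow repeat the same two calls for each category. One way to generate them in a loop is sketched below, assuming the per-category data frames are collected in a named list. The list `by_category`, the helper `plot_median_by_major`, and the toy values are all hypothetical stand-ins, not part of the original analysis.

```r
# Hypothetical helper: bar plot plus sorted table for one category.
plot_median_by_major <- function(df, value_col = "Median", title = "") {
  sorted <- df[order(df[[value_col]]), c("Major", value_col)]
  barplot(sorted[[value_col]], names.arg = sorted$Major,
          horiz = TRUE, cex.names = 0.3, las = 1, main = title)
  sorted
}

# Toy stand-ins for data frames such as all_ages_ag or rct_art.
by_category <- list(
  "Category X" = data.frame(Major = c("A", "B", "C"),
                            Median = c(52000, 46000, 63000),
                            stringsAsFactors = FALSE),
  "Category Y" = data.frame(Major = c("D", "E"),
                            Median = c(45000, 37600),
                            stringsAsFactors = FALSE)
)

# One plot and one sorted table per category.
tables <- lapply(names(by_category), function(nm) {
  plot_median_by_major(by_category[[nm]], title = nm)
})
names(tables) <- names(by_category)
print(tables)
```

Unlike the original calls, this helper sorts the rows before plotting, so the bars appear in the same order as the table. For the graduate-student tables, the same helper could be called with `value_col = "Grad_median"`.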
barplot(all_ages_ag$Median, names.arg = all_ages_ag$Major, horiz = TRUE, cex.names = 0.3, las =1)
all_ages_ag %>% dplyr::select(Major, Median) %>% arrange(Median)
## # A tibble: 10 x 2
## Major Median
## <chr> <int>
## 1 ANIMAL SCIENCES 46000
## 2 GENERAL AGRICULTURE 50000
## 3 PLANT SCIENCE AND AGRONOMY 50000
## 4 MISCELLANEOUS AGRICULTURE 52000
## 5 NATURAL RESOURCES MANAGEMENT 52000
## 6 AGRICULTURE PRODUCTION AND MANAGEMENT 54000
## 7 FORESTRY 58000
## 8 FOOD SCIENCE 62000
## 9 AGRICULTURAL ECONOMICS 63000
## 10 SOIL SCIENCE 63000
barplot(all_ages_art$Median, names.arg = all_ages_art$Major, horiz = TRUE, cex.names = 0.3, las =1)
all_ages_art %>% dplyr::select(Major, Median) %>% arrange(Median)
## # A tibble: 8 x 2
## Major Median
## <chr> <int>
## 1 STUDIO ARTS 37600
## 2 VISUAL AND PERFORMING ARTS 40000
## 3 DRAMA AND THEATER ARTS 42000
## 4 FINE ARTS 45000
## 5 MUSIC 45000
## 6 MISCELLANEOUS FINE ARTS 45000
## 7 COMMERCIAL ART AND GRAPHIC DESIGN 46600
## 8 FILM VIDEO AND PHOTOGRAPHIC ARTS 47000
barplot(all_ages_bio$Median, names.arg = all_ages_bio$Major, horiz = TRUE, cex.names = 0.3, las =1)
all_ages_bio %>% dplyr::select(Major, Median) %>% arrange(Median)
## # A tibble: 14 x 2
## Major Median
## <chr> <int>
## 1 NEUROSCIENCE 35000
## 2 MOLECULAR BIOLOGY 45000
## 3 ECOLOGY 47500
## 4 GENETICS 48000
## 5 BOTANY 50000
## 6 PHYSIOLOGY 50000
## 7 BIOLOGY 51000
## 8 ENVIRONMENTAL SCIENCE 52000
## 9 MISCELLANEOUS BIOLOGY 52000
## 10 BIOCHEMICAL SCIENCES 53000
## 11 COGNITIVE SCIENCE AND BIOPSYCHOLOGY 53000
## 12 ZOOLOGY 55000
## 13 MICROBIOLOGY 60000
## 14 PHARMACOLOGY 60000
barplot(all_ages_bsn$Median, names.arg = all_ages_bsn$Major, horiz = TRUE, cex.names = 0.3, las =1)
all_ages_bsn %>% dplyr::select(Major, Median) %>% arrange(Median)
## # A tibble: 13 x 2
## Major Median
## <chr> <int>
## 1 HOSPITALITY MANAGEMENT 49000
## 2 MISCELLANEOUS BUSINESS & MEDICAL ADMINISTRATION 53000
## 3 HUMAN RESOURCES AND PERSONNEL MANAGEMENT 54000
## 4 INTERNATIONAL BUSINESS 54000
## 5 MARKETING AND MARKETING RESEARCH 56000
## 6 BUSINESS MANAGEMENT AND ADMINISTRATION 58000
## 7 GENERAL BUSINESS 60000
## 8 ACCOUNTING 65000
## 9 OPERATIONS LOGISTICS AND E-COMMERCE 65000
## 10 BUSINESS ECONOMICS 65000
## 11 FINANCE 65000
## 12 ACTUARIAL SCIENCE 72000
## 13 MANAGEMENT INFORMATION SYSTEMS AND STATISTICS 72000
barplot(all_ages_cj$Median, names.arg = all_ages_cj$Major, horiz = TRUE, cex.names = 0.4, las =1)
all_ages_cj %>% dplyr::select(Major, Median) %>% arrange(Median)
## # A tibble: 4 x 2
## Major Median
## <chr> <int>
## 1 MASS MEDIA 48000
## 2 COMMUNICATIONS 50000
## 3 JOURNALISM 50000
## 4 ADVERTISING AND PUBLIC RELATIONS 50000
barplot(all_ages_com$Median, names.arg = all_ages_com$Major, horiz = TRUE, cex.names = 0.3, las =1)
all_ages_com %>% dplyr::select(Major, Median) %>% arrange(Median)
## # A tibble: 11 x 2
## Major Median
## <chr> <int>
## 1 COMMUNICATION TECHNOLOGIES 50000
## 2 COMPUTER ADMINISTRATION MANAGEMENT AND SECURITY 55000
## 3 COMPUTER NETWORKING AND TELECOMMUNICATIONS 55000
## 4 COMPUTER PROGRAMMING AND DATA PROCESSING 60000
## 5 COMPUTER AND INFORMATION SYSTEMS 65000
## 6 MATHEMATICS 66000
## 7 INFORMATION SCIENCES 68000
## 8 APPLIED MATHEMATICS 70000
## 9 STATISTICS AND DECISION SCIENCE 70000
## 10 COMPUTER SCIENCE 78000
## 11 MATHEMATICS AND COMPUTER SCIENCE 92000
barplot(all_ages_ed$Median, names.arg = all_ages_ed$Major, horiz = TRUE, cex.names = 0.3, las =1)
all_ages_ed %>% dplyr::select(Major, Median) %>% arrange(Median)
## # A tibble: 16 x 2
## Major Median
## <chr> <int>
## 1 EARLY CHILDHOOD EDUCATION 35300
## 2 ELEMENTARY EDUCATION 40000
## 3 TEACHER EDUCATION: MULTIPLE LEVELS 40000
## 4 LIBRARY SCIENCE 40000
## 5 SCHOOL STUDENT COUNSELING 41000
## 6 SPECIAL NEEDS EDUCATION 42000
## 7 LANGUAGE AND DRAMA EDUCATION 42000
## 8 ART AND MUSIC EDUCATION 42600
## 9 GENERAL EDUCATION 43000
## 10 MATHEMATICS TEACHER EDUCATION 43000
## 11 SECONDARY TEACHER EDUCATION 45000
## 12 SOCIAL SCIENCE OR HISTORY TEACHER EDUCATION 45000
## 13 SCIENCE AND COMPUTER TEACHER EDUCATION 46000
## 14 PHYSICAL AND HEALTH EDUCATION TEACHING 48400
## 15 MISCELLANEOUS EDUCATION 50000
## 16 EDUCATIONAL ADMINISTRATION AND SUPERVISION 58000
barplot(all_ages_eng$Median, names.arg = all_ages_eng$Major, horiz = TRUE, cex.names = 0.3, las =1)
all_ages_eng %>% dplyr::select(Major, Median) %>% arrange(Median)
## # A tibble: 29 x 2
## Major Median
## <chr> <int>
## 1 MECHANICAL ENGINEERING RELATED TECHNOLOGIES 60000
## 2 BIOLOGICAL ENGINEERING 62000
## 3 ARCHITECTURE 63000
## 4 ENGINEERING TECHNOLOGIES 63000
## 5 MISCELLANEOUS ENGINEERING TECHNOLOGIES 63000
## 6 BIOMEDICAL ENGINEERING 65000
## 7 ENGINEERING MECHANICS PHYSICS AND SCIENCE 65000
## 8 ELECTRICAL ENGINEERING TECHNOLOGY 67000
## 9 ENVIRONMENTAL ENGINEERING 70000
## 10 MISCELLANEOUS ENGINEERING 70000
## # ... with 19 more rows
barplot(all_ages_hlt$Median, names.arg = all_ages_hlt$Major, horiz = TRUE, cex.names = 0.3, las =1)
all_ages_hlt %>% dplyr::select(Major, Median) %>% arrange(Median)
## # A tibble: 12 x 2
## Major Median
## <chr> <int>
## 1 COMMUNICATION DISORDERS SCIENCES AND SERVICES 42000
## 2 MISCELLANEOUS HEALTH MEDICAL PROFESSIONS 45000
## 3 COMMUNITY AND PUBLIC HEALTH 47000
## 4 NUTRITION SCIENCES 49500
## 5 GENERAL MEDICAL AND HEALTH SERVICES 50000
## 6 HEALTH AND MEDICAL ADMINISTRATIVE SERVICES 50000
## 7 HEALTH AND MEDICAL PREPARATORY PROGRAMS 50000
## 8 MEDICAL ASSISTING SERVICES 55000
## 9 MEDICAL TECHNOLOGIES TECHNICIANS 60000
## 10 TREATMENT THERAPY PROFESSIONS 61000
## 11 NURSING 62000
## 12 PHARMACY PHARMACEUTICAL SCIENCES AND ADMINISTRATION 106000
barplot(all_ages_ia$Median, names.arg = all_ages_ia$Major, horiz = TRUE, cex.names = 0.3, las =1)
all_ages_ia %>% dplyr::select(Major, Median) %>% arrange(Median)
## # A tibble: 7 x 2
## Major Median
## <chr> <int>
## 1 COSMETOLOGY SERVICES AND CULINARY ARTS 40000
## 2 FAMILY AND CONSUMER SCIENCES 40500
## 3 PHYSICAL FITNESS PARKS RECREATION AND LEISURE 44000
## 4 ELECTRICAL, MECHANICAL, AND PRECISION TECHNOLOGIES AND PRODUCTION 48000
## 5 MILITARY TECHNOLOGIES 64000
## 6 CONSTRUCTION SERVICES 65000
## 7 TRANSPORTATION SCIENCES AND TECHNOLOGIES 67000
barplot(all_ages_la$Median, names.arg = all_ages_la$Major, horiz = TRUE, cex.names = 0.3, las =1)
all_ages_la %>% dplyr::select(Major, Median) %>% arrange(Median)
## # A tibble: 16 x 2
## Major Median
## <chr> <int>
## 1 COMPOSITION AND RHETORIC 40000
## 2 THEOLOGY AND RELIGIOUS VOCATIONS 40000
## 3 ANTHROPOLOGY AND ARCHEOLOGY 43000
## 4 MULTI/INTERDISCIPLINARY STUDIES 43000
## 5 ART HISTORY AND CRITICISM 44500
## 6 OTHER FOREIGN LANGUAGES 45000
## 7 INTERCULTURAL AND INTERNATIONAL STUDIES 45000
## 8 PHILOSOPHY AND RELIGIOUS STUDIES 45000
## 9 AREA ETHNIC AND CIVILIZATION STUDIES 46000
## 10 HUMANITIES 46700
## 11 LINGUISTICS AND COMPARATIVE LANGUAGE AND LITERATURE 48000
## 12 FRENCH GERMAN LATIN AND OTHER COMMON FOREIGN LANGUAGE STUDIES 48000
## 13 ENGLISH LANGUAGE AND LITERATURE 50000
## 14 LIBERAL ARTS 50000
## 15 HISTORY 50000
## 16 UNITED STATES HISTORY 50000
barplot(all_ages_law$Median, names.arg = all_ages_law$Major, horiz = TRUE, cex.names = 0.3, las =1)
all_ages_law %>% dplyr::select(Major, Median) %>% arrange(Median)
## # A tibble: 5 x 2
## Major Median
## <chr> <int>
## 1 PRE-LAW AND LEGAL STUDIES 48000
## 2 COURT REPORTING 50000
## 3 CRIMINAL JUSTICE AND FIRE PROTECTION 50000
## 4 PUBLIC ADMINISTRATION 56000
## 5 PUBLIC POLICY 60000
barplot(all_ages_sci$Median, names.arg = all_ages_sci$Major, horiz = TRUE, cex.names = 0.3, las =1)
all_ages_sci %>% dplyr::select(Major, Median) %>% arrange(Median)
## # A tibble: 10 x 2
## Major Median
## <chr> <int>
## 1 OCEANOGRAPHY 55000
## 2 MULTI-DISCIPLINARY OR GENERAL SCIENCE 56000
## 3 GEOSCIENCES 57000
## 4 CHEMISTRY 59000
## 5 PHYSICAL SCIENCES 60000
## 6 ATMOSPHERIC SCIENCES AND METEOROLOGY 60000
## 7 NUCLEAR, INDUSTRIAL RADIOLOGY, AND BIOLOGICAL TECHNOLOGIES 62000
## 8 GEOLOGY AND EARTH SCIENCE 65000
## 9 PHYSICS 70000
## 10 ASTRONOMY AND ASTROPHYSICS 80000
barplot(rct_ag$Median, names.arg = rct_ag$Major, horiz = TRUE, cex.names = 0.3, las =1)
rct_ag %>% dplyr::select(Major, Median) %>% arrange(Median)
## # A tibble: 10 x 2
## Major Median
## <chr> <int>
## 1 MISCELLANEOUS AGRICULTURE 29000
## 2 ANIMAL SCIENCES 30000
## 3 PLANT SCIENCE AND AGRONOMY 32000
## 4 NATURAL RESOURCES MANAGEMENT 35000
## 5 FORESTRY 35000
## 6 SOIL SCIENCE 35000
## 7 AGRICULTURE PRODUCTION AND MANAGEMENT 40000
## 8 GENERAL AGRICULTURE 40000
## 9 AGRICULTURAL ECONOMICS 40000
## 10 FOOD SCIENCE 53000
barplot(rct_art$Median, names.arg = rct_art$Major, horiz = TRUE, cex.names = 0.3, las =1)
rct_art %>% dplyr::select(Major, Median) %>% arrange(Median)
## # A tibble: 8 x 2
## Major Median
## <chr> <int>
## 1 DRAMA AND THEATER ARTS 27000
## 2 STUDIO ARTS 29000
## 3 VISUAL AND PERFORMING ARTS 30000
## 4 FINE ARTS 30500
## 5 MUSIC 31000
## 6 FILM VIDEO AND PHOTOGRAPHIC ARTS 32000
## 7 COMMERCIAL ART AND GRAPHIC DESIGN 35000
## 8 MISCELLANEOUS FINE ARTS 50000
barplot(rct_bio$Median, names.arg = rct_bio$Major, horiz = TRUE, cex.names = 0.3, las =1)
rct_bio %>% dplyr::select(Major, Median) %>% arrange(Median)
## # A tibble: 14 x 2
## Major Median
## <chr> <int>
## 1 ZOOLOGY 26000
## 2 ECOLOGY 33000
## 3 BIOLOGY 33400
## 4 MISCELLANEOUS BIOLOGY 33500
## 5 PHYSIOLOGY 35000
## 6 NEUROSCIENCE 35000
## 7 ENVIRONMENTAL SCIENCE 35600
## 8 BOTANY 37000
## 9 BIOCHEMICAL SCIENCES 37400
## 10 MICROBIOLOGY 38000
## 11 MOLECULAR BIOLOGY 40000
## 12 GENETICS 40000
## 13 COGNITIVE SCIENCE AND BIOPSYCHOLOGY 41000
## 14 PHARMACOLOGY 45000
barplot(rct_bsn$Median, names.arg = rct_bsn$Major, horiz = TRUE, cex.names = 0.3, las =1)
rct_bsn %>% dplyr::select(Major, Median) %>% arrange(Median)
## # A tibble: 13 x 2
## Major Median
## <chr> <int>
## 1 HOSPITALITY MANAGEMENT 33000
## 2 HUMAN RESOURCES AND PERSONNEL MANAGEMENT 36000
## 3 BUSINESS MANAGEMENT AND ADMINISTRATION 38000
## 4 MARKETING AND MARKETING RESEARCH 38000
## 5 GENERAL BUSINESS 40000
## 6 INTERNATIONAL BUSINESS 40000
## 7 MISCELLANEOUS BUSINESS & MEDICAL ADMINISTRATION 40000
## 8 ACCOUNTING 45000
## 9 BUSINESS ECONOMICS 46000
## 10 FINANCE 47000
## 11 OPERATIONS LOGISTICS AND E-COMMERCE 50000
## 12 MANAGEMENT INFORMATION SYSTEMS AND STATISTICS 51000
## 13 ACTUARIAL SCIENCE 62000
barplot(rct_cj$Median, names.arg = rct_cj$Major, horiz = TRUE, cex.names = 0.3, las =1)
rct_cj %>% dplyr::select(Major, Median) %>% arrange(Median)
## # A tibble: 4 x 2
## Major Median
## <chr> <int>
## 1 MASS MEDIA 33000
## 2 COMMUNICATIONS 35000
## 3 JOURNALISM 35000
## 4 ADVERTISING AND PUBLIC RELATIONS 35000
barplot(rct_com$Median, names.arg = rct_com$Major, horiz = TRUE, cex.names = 0.3, las =1)
rct_com %>% dplyr::select(Major, Median) %>% arrange(Median)
## # A tibble: 11 x 2
## Major Median
## <chr> <int>
## 1 COMMUNICATION TECHNOLOGIES 35000
## 2 COMPUTER NETWORKING AND TELECOMMUNICATIONS 36400
## 3 COMPUTER ADMINISTRATION MANAGEMENT AND SECURITY 37500
## 4 COMPUTER PROGRAMMING AND DATA PROCESSING 41300
## 5 MATHEMATICS AND COMPUTER SCIENCE 42000
## 6 MATHEMATICS 45000
## 7 COMPUTER AND INFORMATION SYSTEMS 45000
## 8 INFORMATION SCIENCES 45000
## 9 STATISTICS AND DECISION SCIENCE 45000
## 10 APPLIED MATHEMATICS 45000
## 11 COMPUTER SCIENCE 53000
barplot(rct_ed$Median, names.arg = rct_ed$Major, horiz = TRUE, cex.names = 0.3, las =1)
rct_ed %>% dplyr::select(Major, Median) %>% arrange(Median)
## # A tibble: 16 x 2
## Major Median
## <chr> <int>
## 1 LIBRARY SCIENCE 22000
## 2 EARLY CHILDHOOD EDUCATION 28000
## 3 TEACHER EDUCATION: MULTIPLE LEVELS 30000
## 4 PHYSICAL AND HEALTH EDUCATION TEACHING 31000
## 5 ELEMENTARY EDUCATION 32000
## 6 SCIENCE AND COMPUTER TEACHER EDUCATION 32000
## 7 ART AND MUSIC EDUCATION 32100
## 8 SECONDARY TEACHER EDUCATION 32500
## 9 LANGUAGE AND DRAMA EDUCATION 33000
## 10 MISCELLANEOUS EDUCATION 33000
## 11 GENERAL EDUCATION 34000
## 12 SOCIAL SCIENCE OR HISTORY TEACHER EDUCATION 34000
## 13 MATHEMATICS TEACHER EDUCATION 34000
## 14 EDUCATIONAL ADMINISTRATION AND SUPERVISION 34000
## 15 SPECIAL NEEDS EDUCATION 35000
## 16 SCHOOL STUDENT COUNSELING 41000
barplot(rct_eng$Median, names.arg = rct_eng$Major, horiz = TRUE, cex.names = 0.3, las =1)
rct_eng %>% dplyr::select(Major, Median) %>% arrange(Median)
## # A tibble: 29 x 2
## Major Median
## <chr> <int>
## 1 ARCHITECTURE 40000
## 2 MISCELLANEOUS ENGINEERING TECHNOLOGIES 40000
## 3 MECHANICAL ENGINEERING RELATED TECHNOLOGIES 40000
## 4 ENGINEERING AND INDUSTRIAL MANAGEMENT 44000
## 5 INDUSTRIAL PRODUCTION TECHNOLOGIES 46000
## 6 CIVIL ENGINEERING 50000
## 7 MISCELLANEOUS ENGINEERING 50000
## 8 ENVIRONMENTAL ENGINEERING 50000
## 9 ENGINEERING TECHNOLOGIES 50000
## 10 GEOLOGICAL AND GEOPHYSICAL ENGINEERING 50000
## # ... with 19 more rows
barplot(rct_hlt$Median, names.arg = rct_hlt$Major, horiz = TRUE, cex.names = 0.3, las =1)
rct_hlt %>% dplyr::select(Major, Median) %>% arrange(Median)
## # A tibble: 12 x 2
## Major Median
## <chr> <int>
## 1 COMMUNICATION DISORDERS SCIENCES AND SERVICES 28000
## 2 GENERAL MEDICAL AND HEALTH SERVICES 32400
## 3 TREATMENT THERAPY PROFESSIONS 33000
## 4 HEALTH AND MEDICAL PREPARATORY PROGRAMS 33500
## 5 COMMUNITY AND PUBLIC HEALTH 34000
## 6 NUTRITION SCIENCES 35000
## 7 HEALTH AND MEDICAL ADMINISTRATIVE SERVICES 35000
## 8 MISCELLANEOUS HEALTH MEDICAL PROFESSIONS 36000
## 9 PHARMACY PHARMACEUTICAL SCIENCES AND ADMINISTRATION 40000
## 10 MEDICAL ASSISTING SERVICES 42000
## 11 MEDICAL TECHNOLOGIES TECHNICIANS 45000
## 12 NURSING 48000
barplot(rct_ia$Median, names.arg = rct_ia$Major, horiz = TRUE, cex.names = 0.3, las =1)
rct_ia %>% dplyr::select(Major, Median) %>% arrange(Median)
## # A tibble: 7 x 2
## Major Median
## <chr> <int>
## 1 COSMETOLOGY SERVICES AND CULINARY ARTS 29000
## 2 FAMILY AND CONSUMER SCIENCES 30000
## 3 PHYSICAL FITNESS PARKS RECREATION AND LEISURE 32000
## 4 TRANSPORTATION SCIENCES AND TECHNOLOGIES 35000
## 5 ELECTRICAL, MECHANICAL, AND PRECISION TECHNOLOGIES AND PRODUCTION 38400
## 6 MILITARY TECHNOLOGIES 40000
## 7 CONSTRUCTION SERVICES 50000
barplot(rct_la$Median, names.arg = rct_la$Major, horiz = TRUE, cex.names = 0.3, las =1)
rct_la %>% dplyr::select(Major, Median) %>% arrange(Median)
## # A tibble: 16 x 2
## Major Median
## <chr> <int>
## 1 COMPOSITION AND RHETORIC 27000
## 2 OTHER FOREIGN LANGUAGES 27500
## 3 ANTHROPOLOGY AND ARCHEOLOGY 28000
## 4 THEOLOGY AND RELIGIOUS VOCATIONS 29000
## 5 HUMANITIES 30000
## 6 ART HISTORY AND CRITICISM 31000
## 7 ENGLISH LANGUAGE AND LITERATURE 32000
## 8 LIBERAL ARTS 32000
## 9 PHILOSOPHY AND RELIGIOUS STUDIES 32200
## 10 LINGUISTICS AND COMPARATIVE LANGUAGE AND LITERATURE 33000
## 11 HISTORY 34000
## 12 FRENCH GERMAN LATIN AND OTHER COMMON FOREIGN LANGUAGE STUDIES 34000
## 13 INTERCULTURAL AND INTERNATIONAL STUDIES 34000
## 14 AREA ETHNIC AND CIVILIZATION STUDIES 35000
## 15 MULTI/INTERDISCIPLINARY STUDIES 35000
## 16 UNITED STATES HISTORY 40000
barplot(rct_law$Median, names.arg = rct_law$Major, horiz = TRUE, cex.names = 0.3, las =1)
rct_law %>% dplyr::select(Major, Median) %>% arrange(Median)
## # A tibble: 5 x 2
## Major Median
## <chr> <int>
## 1 CRIMINAL JUSTICE AND FIRE PROTECTION 35000
## 2 PRE-LAW AND LEGAL STUDIES 36000
## 3 PUBLIC ADMINISTRATION 36000
## 4 PUBLIC POLICY 50000
## 5 COURT REPORTING 54000
barplot(rct_sci$Median, names.arg = rct_sci$Major, horiz = TRUE, cex.names = 0.3, las =1)
rct_sci %>% dplyr::select(Major, Median) %>% arrange(Median)
## # A tibble: 10 x 2
## Major Median
## <chr> <int>
## 1 MULTI-DISCIPLINARY OR GENERAL SCIENCE 35000
## 2 ATMOSPHERIC SCIENCES AND METEOROLOGY 35000
## 3 GEOSCIENCES 36000
## 4 GEOLOGY AND EARTH SCIENCE 36200
## 5 CHEMISTRY 39000
## 6 PHYSICAL SCIENCES 40000
## 7 OCEANOGRAPHY 44700
## 8 PHYSICS 45000
## 9 NUCLEAR, INDUSTRIAL RADIOLOGY, AND BIOLOGICAL TECHNOLOGIES 46000
## 10 ASTRONOMY AND ASTROPHYSICS 62000
barplot(grad_ag$Grad_median, names.arg = grad_ag$Major, horiz = TRUE, cex.names = 0.3, las =1)
grad_ag %>% dplyr::select(Major, Grad_median) %>% arrange(Grad_median)
## # A tibble: 10 x 2
## Major Grad_median
## <chr> <dbl>
## 1 MISCELLANEOUS AGRICULTURE 54000
## 2 SOIL SCIENCE 65000
## 3 AGRICULTURE PRODUCTION AND MANAGEMENT 67000
## 4 PLANT SCIENCE AND AGRONOMY 67000
## 5 GENERAL AGRICULTURE 68000
## 6 NATURAL RESOURCES MANAGEMENT 70000
## 7 ANIMAL SCIENCES 70300
## 8 FOOD SCIENCE 72000
## 9 FORESTRY 78000
## 10 AGRICULTURAL ECONOMICS 80000
barplot(grad_art$Grad_median, names.arg = grad_art$Major, horiz = TRUE, cex.names = 0.3, las =1)
grad_art %>% dplyr::select(Major, Grad_median) %>% arrange(Grad_median)
## # A tibble: 8 x 2
## Major Grad_median
## <chr> <dbl>
## 1 STUDIO ARTS 50750
## 2 VISUAL AND PERFORMING ARTS 53000
## 3 MISCELLANEOUS FINE ARTS 55000
## 4 FILM VIDEO AND PHOTOGRAPHIC ARTS 57000
## 5 FINE ARTS 58000
## 6 DRAMA AND THEATER ARTS 58600
## 7 COMMERCIAL ART AND GRAPHIC DESIGN 60000
## 8 MUSIC 60000
barplot(grad_bio$Grad_median, names.arg = grad_bio$Major, horiz = TRUE, cex.names = 0.3, las =1)
grad_bio %>% dplyr::select(Major, Grad_median) %>% arrange(Grad_median)
## # A tibble: 14 x 2
## Major Grad_median
## <chr> <dbl>
## 1 NEUROSCIENCE 58000
## 2 ECOLOGY 62000
## 3 MISCELLANEOUS BIOLOGY 65000
## 4 ENVIRONMENTAL SCIENCE 68000
## 5 BOTANY 70000
## 6 GENETICS 78000
## 7 MICROBIOLOGY 85000
## 8 MOLECULAR BIOLOGY 85000
## 9 PHYSIOLOGY 90000
## 10 COGNITIVE SCIENCE AND BIOPSYCHOLOGY 95000
## 11 BIOLOGY 95000
## 12 BIOCHEMICAL SCIENCES 96000
## 13 PHARMACOLOGY 105000
## 14 ZOOLOGY 110000
barplot(grad_bsn$Grad_median, names.arg = grad_bsn$Major, horiz = TRUE, cex.names = 0.3, las =1)
grad_bsn %>% dplyr::select(Major, Grad_median) %>% arrange(Grad_median)
## # A tibble: 13 x 2
## Major Grad_median
## <chr> <dbl>
## 1 HOSPITALITY MANAGEMENT 65000
## 2 HUMAN RESOURCES AND PERSONNEL MANAGEMENT 70000
## 3 INTERNATIONAL BUSINESS 72000
## 4 MISCELLANEOUS BUSINESS & MEDICAL ADMINISTRATION 75000
## 5 BUSINESS MANAGEMENT AND ADMINISTRATION 77000
## 6 MARKETING AND MARKETING RESEARCH 80000
## 7 GENERAL BUSINESS 85000
## 8 ACCOUNTING 88000
## 9 MANAGEMENT INFORMATION SYSTEMS AND STATISTICS 89000
## 10 OPERATIONS LOGISTICS AND E-COMMERCE 94000
## 11 BUSINESS ECONOMICS 94000
## 12 FINANCE 95000
## 13 ACTUARIAL SCIENCE 110000
barplot(grad_cj$Grad_median, names.arg = grad_cj$Major, horiz = TRUE, cex.names = 0.3, las =1)
grad_cj %>% dplyr::select(Major, Grad_median) %>% arrange(Grad_median)
## # A tibble: 4 x 2
## Major Grad_median
## <chr> <dbl>
## 1 MASS MEDIA 57000
## 2 ADVERTISING AND PUBLIC RELATIONS 60000
## 3 COMMUNICATIONS 65000
## 4 JOURNALISM 70000
barplot(grad_com$Grad_median, names.arg = grad_com$Major, horiz = TRUE, cex.names = 0.3, las =1)
grad_com %>% dplyr::select(Major, Grad_median) %>% arrange(Grad_median)
## # A tibble: 11 x 2
## Major Grad_median
## <chr> <dbl>
## 1 COMMUNICATION TECHNOLOGIES 57000
## 2 COMPUTER NETWORKING AND TELECOMMUNICATIONS 80000
## 3 COMPUTER AND INFORMATION SYSTEMS 80000
## 4 COMPUTER ADMINISTRATION MANAGEMENT AND SECURITY 81000
## 5 INFORMATION SCIENCES 84000
## 6 COMPUTER PROGRAMMING AND DATA PROCESSING 85000
## 7 MATHEMATICS 89000
## 8 STATISTICS AND DECISION SCIENCE 92000
## 9 COMPUTER SCIENCE 95000
## 10 MATHEMATICS AND COMPUTER SCIENCE 98000
## 11 APPLIED MATHEMATICS 100000
barplot(grad_ed$Grad_median, names.arg = grad_ed$Major, horiz = TRUE, cex.names = 0.3, las =1)
grad_ed %>% dplyr::select(Major, Grad_median) %>% arrange(Grad_median)
## # A tibble: 16 x 2
## Major Grad_median
## <chr> <dbl>
## 1 EARLY CHILDHOOD EDUCATION 50000
## 2 LIBRARY SCIENCE 52000
## 3 ELEMENTARY EDUCATION 55000
## 4 TEACHER EDUCATION: MULTIPLE LEVELS 55000
## 5 SCHOOL STUDENT COUNSELING 56000
## 6 GENERAL EDUCATION 58000
## 7 LANGUAGE AND DRAMA EDUCATION 58000
## 8 SPECIAL NEEDS EDUCATION 58000
## 9 ART AND MUSIC EDUCATION 59000
## 10 SOCIAL SCIENCE OR HISTORY TEACHER EDUCATION 60000
## 11 MATHEMATICS TEACHER EDUCATION 60000
## 12 MISCELLANEOUS EDUCATION 61000
## 13 SECONDARY TEACHER EDUCATION 61000
## 14 SCIENCE AND COMPUTER TEACHER EDUCATION 62000
## 15 PHYSICAL AND HEALTH EDUCATION TEACHING 65000
## 16 EDUCATIONAL ADMINISTRATION AND SUPERVISION 65000
barplot(grad_eng$Grad_median, names.arg = grad_eng$Major, horiz = TRUE, cex.names = 0.3, las =1)
grad_eng %>% dplyr::select(Major, Grad_median) %>% arrange(Grad_median)
## # A tibble: 29 x 2
## Major Grad_median
## <chr> <dbl>
## 1 ARCHITECTURE 72000
## 2 ENGINEERING TECHNOLOGIES 74000
## 3 MECHANICAL ENGINEERING RELATED TECHNOLOGIES 78000
## 4 ARCHITECTURAL ENGINEERING 78000
## 5 MISCELLANEOUS ENGINEERING TECHNOLOGIES 80000
## 6 BIOLOGICAL ENGINEERING 80000
## 7 ENVIRONMENTAL ENGINEERING 81000
## 8 INDUSTRIAL PRODUCTION TECHNOLOGIES 84500
## 9 ELECTRICAL ENGINEERING TECHNOLOGY 85000
## 10 MISCELLANEOUS ENGINEERING 90000
## # ... with 19 more rows
barplot(grad_hlt$Grad_median, names.arg = grad_hlt$Major, horiz = TRUE, cex.names = 0.3, las =1)
grad_hlt %>% dplyr::select(Major, Grad_median) %>% arrange(Grad_median)
## # A tibble: 12 x 2
## Major Grad_median
## <chr> <dbl>
## 1 MISCELLANEOUS HEALTH MEDICAL PROFESSIONS 60000
## 2 NUTRITION SCIENCES 65000
## 3 COMMUNICATION DISORDERS SCIENCES AND SERVICES 65000
## 4 COMMUNITY AND PUBLIC HEALTH 68500
## 5 TREATMENT THERAPY PROFESSIONS 70000
## 6 GENERAL MEDICAL AND HEALTH SERVICES 70000
## 7 MEDICAL TECHNOLOGIES TECHNICIANS 76000
## 8 HEALTH AND MEDICAL ADMINISTRATIVE SERVICES 79000
## 9 MEDICAL ASSISTING SERVICES 80000
## 10 NURSING 84000
## 11 PHARMACY PHARMACEUTICAL SCIENCES AND ADMINISTRATION 111000
## 12 HEALTH AND MEDICAL PREPARATORY PROGRAMS 135000
barplot(grad_ia$Grad_median, names.arg = grad_ia$Major, horiz = TRUE, cex.names = 0.3, las =1)
grad_ia %>% dplyr::select(Major, Grad_median) %>% arrange(Grad_median)
## # A tibble: 7 x 2
## Major
## <chr>
## 1 COSMETOLOGY SERVICES AND CULINARY ARTS
## 2 FAMILY AND CONSUMER SCIENCES
## 3 PHYSICAL FITNESS PARKS RECREATION AND LEISURE
## 4 ELECTRICAL, MECHANICAL, AND PRECISION TECHNOLOGIES AND PRODUCTION
## 5 MILITARY TECHNOLOGIES
## 6 CONSTRUCTION SERVICES
## 7 TRANSPORTATION SCIENCES AND TECHNOLOGIES
## # ... with 1 more variables: Grad_median <dbl>
barplot(grad_la$Grad_median, names.arg = grad_la$Major, horiz = TRUE, cex.names = 0.3, las =1)
grad_la %>% dplyr::select(Major, Grad_median) %>% arrange(Grad_median)
## # A tibble: 16 x 2
## Major
## <chr>
## 1 THEOLOGY AND RELIGIOUS VOCATIONS
## 2 MULTI/INTERDISCIPLINARY STUDIES
## 3 COMPOSITION AND RHETORIC
## 4 HUMANITIES
## 5 ART HISTORY AND CRITICISM
## 6 LINGUISTICS AND COMPARATIVE LANGUAGE AND LITERATURE
## 7 ANTHROPOLOGY AND ARCHEOLOGY
## 8 PHILOSOPHY AND RELIGIOUS STUDIES
## 9 ENGLISH LANGUAGE AND LITERATURE
## 10 FRENCH GERMAN LATIN AND OTHER COMMON FOREIGN LANGUAGE STUDIES
## 11 OTHER FOREIGN LANGUAGES
## 12 LIBERAL ARTS
## 13 INTERCULTURAL AND INTERNATIONAL STUDIES
## 14 AREA ETHNIC AND CIVILIZATION STUDIES
## 15 HISTORY
## 16 UNITED STATES HISTORY
## # ... with 1 more variables: Grad_median <dbl>
barplot(grad_law$Grad_median, names.arg = grad_law$Major, horiz = TRUE, cex.names = 0.3, las =1)
grad_law %>% dplyr::select(Major, Grad_median) %>% arrange(Grad_median)
## # A tibble: 5 x 2
## Major Grad_median
## <chr> <dbl>
## 1 CRIMINAL JUSTICE AND FIRE PROTECTION 68000
## 2 COURT REPORTING 75000
## 3 PUBLIC ADMINISTRATION 75000
## 4 PRE-LAW AND LEGAL STUDIES 76000
## 5 PUBLIC POLICY 89000
barplot(grad_sci$Grad_median, names.arg = grad_sci$Major, horiz = TRUE, cex.names = 0.3, las =1)
grad_sci %>% dplyr::select(Major, Grad_median) %>% arrange(Grad_median)
## # A tibble: 10 x 2
## Major Grad_median
## <chr> <dbl>
## 1 NUCLEAR, INDUSTRIAL RADIOLOGY, AND BIOLOGICAL TECHNOLOGIES 80000
## 2 PHYSICAL SCIENCES 80000
## 3 ATMOSPHERIC SCIENCES AND METEOROLOGY 82000
## 4 GEOLOGY AND EARTH SCIENCE 84000
## 5 MULTI-DISCIPLINARY OR GENERAL SCIENCE 86000
## 6 OCEANOGRAPHY 90000
## 7 GEOSCIENCES 90000
## 8 ASTRONOMY AND ASTROPHYSICS 96000
## 9 CHEMISTRY 100000
## 10 PHYSICS 100000
barplot(all_ages_ag$Unemployment_rate, names.arg = all_ages_ag$Major, horiz = TRUE, cex.names = 0.3, las =1)
all_ages_ag %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)
## # A tibble: 10 x 2
## Major Unemployment_rate
## <chr> <dbl>
## 1 GENERAL AGRICULTURE 0.02614711
## 2 AGRICULTURE PRODUCTION AND MANAGEMENT 0.02863606
## 3 AGRICULTURAL ECONOMICS 0.03024832
## 4 PLANT SCIENCE AND AGRONOMY 0.03179089
## 5 MISCELLANEOUS AGRICULTURE 0.03923042
## 6 FORESTRY 0.04256333
## 7 ANIMAL SCIENCES 0.04267890
## 8 FOOD SCIENCE 0.04918845
## 9 SOIL SCIENCE 0.05086705
## 10 NATURAL RESOURCES MANAGEMENT 0.05434128
barplot(all_ages_art$Unemployment_rate, names.arg = all_ages_art$Major, horiz = TRUE, cex.names = 0.3, las =1)
all_ages_art %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)
## # A tibble: 8 x 2
## Major Unemployment_rate
## <chr> <dbl>
## 1 MUSIC 0.05471919
## 2 FINE ARTS 0.07175327
## 3 COMMERCIAL ART AND GRAPHIC DESIGN 0.07391972
## 4 DRAMA AND THEATER ARTS 0.08027373
## 5 STUDIO ARTS 0.08371383
## 6 FILM VIDEO AND PHOTOGRAPHIC ARTS 0.08561891
## 7 VISUAL AND PERFORMING ARTS 0.09465800
## 8 MISCELLANEOUS FINE ARTS 0.15614749
barplot(all_ages_bio$Unemployment_rate, names.arg = all_ages_bio$Major, horiz = TRUE, cex.names = 0.3, las =1)
all_ages_bio %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)
## # A tibble: 14 x 2
## Major Unemployment_rate
## <chr> <dbl>
## 1 PHARMACOLOGY 0.01611080
## 2 BOTANY 0.03402351
## 3 GENETICS 0.04159095
## 4 MISCELLANEOUS BIOLOGY 0.04758244
## 5 ZOOLOGY 0.04836260
## 6 COGNITIVE SCIENCE AND BIOPSYCHOLOGY 0.04887283
## 7 ECOLOGY 0.04891699
## 8 MICROBIOLOGY 0.05088075
## 9 PHYSIOLOGY 0.05113946
## 10 ENVIRONMENTAL SCIENCE 0.05128983
## 11 BIOLOGY 0.05930117
## 12 MOLECULAR BIOLOGY 0.06053708
## 13 NEUROSCIENCE 0.06889764
## 14 BIOCHEMICAL SCIENCES 0.07159753
barplot(all_ages_bsn$Unemployment_rate, names.arg = all_ages_bsn$Major, horiz = TRUE, cex.names = 0.3, las =1)
all_ages_bsn %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)
## # A tibble: 13 x 2
## Major Unemployment_rate
## <chr> <dbl>
## 1 OPERATIONS LOGISTICS AND E-COMMERCE 0.04326826
## 2 MANAGEMENT INFORMATION SYSTEMS AND STATISTICS 0.04397714
## 3 FINANCE 0.04847293
## 4 GENERAL BUSINESS 0.05137753
## 5 HOSPITALITY MANAGEMENT 0.05144698
## 6 MISCELLANEOUS BUSINESS & MEDICAL ADMINISTRATION 0.05267856
## 7 ACCOUNTING 0.05341467
## 8 MARKETING AND MARKETING RESEARCH 0.05503289
## 9 ACTUARIAL SCIENCE 0.05606352
## 10 BUSINESS MANAGEMENT AND ADMINISTRATION 0.05886534
## 11 HUMAN RESOURCES AND PERSONNEL MANAGEMENT 0.06074809
## 12 BUSINESS ECONOMICS 0.06174857
## 13 INTERNATIONAL BUSINESS 0.07135371
barplot(all_ages_cj$Unemployment_rate, names.arg = all_ages_cj$Major, horiz = TRUE, cex.names = 0.3, las =1)
all_ages_cj %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)
## # A tibble: 4 x 2
## Major Unemployment_rate
## <chr> <dbl>
## 1 JOURNALISM 0.06191675
## 2 COMMUNICATIONS 0.06436031
## 3 ADVERTISING AND PUBLIC RELATIONS 0.06721626
## 4 MASS MEDIA 0.08300476
barplot(all_ages_com$Unemployment_rate, names.arg = all_ages_com$Major, horiz = TRUE, cex.names = 0.3, las =1)
all_ages_com %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)
## # A tibble: 11 x 2
## Major Unemployment_rate
## <chr> <dbl>
## 1 MATHEMATICS AND COMPUTER SCIENCE 0.02490040
## 2 COMPUTER SCIENCE 0.04951866
## 3 COMPUTER AND INFORMATION SYSTEMS 0.05189124
## 4 INFORMATION SCIENCES 0.05284106
## 5 MATHEMATICS 0.05293608
## 6 APPLIED MATHEMATICS 0.05565261
## 7 STATISTICS AND DECISION SCIENCE 0.05705405
## 8 COMPUTER NETWORKING AND TELECOMMUNICATIONS 0.05869412
## 9 COMPUTER ADMINISTRATION MANAGEMENT AND SECURITY 0.07504572
## 10 COMMUNICATION TECHNOLOGIES 0.08500867
## 11 COMPUTER PROGRAMMING AND DATA PROCESSING 0.09026422
barplot(all_ages_ed$Unemployment_rate, names.arg = all_ages_ed$Major, horiz = TRUE, cex.names = 0.3, las =1)
all_ages_ed %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)
## # A tibble: 16 x 2
## Major Unemployment_rate
## <chr> <dbl>
## 1 EDUCATIONAL ADMINISTRATION AND SUPERVISION 0.00000000
## 2 MATHEMATICS TEACHER EDUCATION 0.03298302
## 3 TEACHER EDUCATION: MULTIPLE LEVELS 0.03335686
## 4 ELEMENTARY EDUCATION 0.03835916
## 5 MISCELLANEOUS EDUCATION 0.03921524
## 6 ART AND MUSIC EDUCATION 0.04097337
## 7 SCIENCE AND COMPUTER TEACHER EDUCATION 0.04219989
## 8 SECONDARY TEACHER EDUCATION 0.04375568
## 9 GENERAL EDUCATION 0.04390352
## 10 SOCIAL SCIENCE OR HISTORY TEACHER EDUCATION 0.04601320
## 11 PHYSICAL AND HEALTH EDUCATION TEACHING 0.04626696
## 12 SPECIAL NEEDS EDUCATION 0.04714466
## 13 LANGUAGE AND DRAMA EDUCATION 0.04808029
## 14 EARLY CHILDHOOD EDUCATION 0.04935065
## 15 LIBRARY SCIENCE 0.09484299
## 16 SCHOOL STUDENT COUNSELING 0.10174594
barplot(all_ages_eng$Unemployment_rate, names.arg = all_ages_eng$Major, horiz = TRUE, cex.names = 0.3, las =1)
all_ages_eng %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)
## # A tibble: 29 x 2
## Major Unemployment_rate
## <chr> <dbl>
## 1 GEOLOGICAL AND GEOPHYSICAL ENGINEERING 0.00000000
## 2 MATERIALS SCIENCE 0.02233333
## 3 NAVAL ARCHITECTURE AND MARINE ENGINEERING 0.04030882
## 4 AEROSPACE ENGINEERING 0.04197131
## 5 PETROLEUM ENGINEERING 0.04220535
## 6 MECHANICAL ENGINEERING RELATED TECHNOLOGIES 0.04353327
## 7 ENGINEERING MECHANICS PHYSICS AND SCIENCE 0.04380452
## 8 MECHANICAL ENGINEERING 0.04384386
## 9 METALLURGICAL ENGINEERING 0.04487268
## 10 ENVIRONMENTAL ENGINEERING 0.04573200
## # ... with 19 more rows
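Only the first 10 of the 29 engineering majors are printed above, because tibbles shorten long tables by default. A minimal sketch of printing every row, assuming a table longer than the default limit. The `toy` table below is hypothetical and its rates are invented.

```r
library(dplyr)
library(tibble)

# Hypothetical 25-row table standing in for all_ages_eng; values are made up.
toy <- tibble(
  Major = paste("MAJOR", 1:25),
  Unemployment_rate = (25:1) / 100
)

sorted <- toy %>%
  dplyr::select(Major, Unemployment_rate) %>%
  arrange(Unemployment_rate)

# n = Inf tells the tibble print method to show every row
# rather than stopping after the first ten.
print(sorted, n = Inf)
```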
barplot(all_ages_hlt$Unemployment_rate, names.arg = all_ages_hlt$Major, horiz = TRUE, cex.names = 0.3, las =1)
all_ages_hlt %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)
## # A tibble: 12 x 2
## Major Unemployment_rate
## <chr> <dbl>
## 1 TREATMENT THERAPY PROFESSIONS 0.02629160
## 2 NURSING 0.02679682
## 3 MEDICAL ASSISTING SERVICES 0.03135685
## 4 PHARMACY PHARMACEUTICAL SCIENCES AND ADMINISTRATION 0.03435768
## 5 MEDICAL TECHNOLOGIES TECHNICIANS 0.03620987
## 6 COMMUNICATION DISORDERS SCIENCES AND SERVICES 0.04646718
## 7 MISCELLANEOUS HEALTH MEDICAL PROFESSIONS 0.05357271
## 8 GENERAL MEDICAL AND HEALTH SERVICES 0.05470063
## 9 HEALTH AND MEDICAL ADMINISTRATIVE SERVICES 0.05700398
## 10 NUTRITION SCIENCES 0.06321655
## 11 COMMUNITY AND PUBLIC HEALTH 0.06652770
## 12 HEALTH AND MEDICAL PREPARATORY PROGRAMS 0.07000979
barplot(all_ages_ia$Unemployment_rate, names.arg = all_ages_ia$Major, horiz = TRUE, cex.names = 0.3, las =1)
all_ages_ia %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)
## # A tibble: 7 x 2
## Major
## <chr>
## 1 PHYSICAL FITNESS PARKS RECREATION AND LEISURE
## 2 TRANSPORTATION SCIENCES AND TECHNOLOGIES
## 3 CONSTRUCTION SERVICES
## 4 ELECTRICAL, MECHANICAL, AND PRECISION TECHNOLOGIES AND PRODUCTION
## 5 COSMETOLOGY SERVICES AND CULINARY ARTS
## 6 FAMILY AND CONSUMER SCIENCES
## 7 MILITARY TECHNOLOGIES
## # ... with 1 more variables: Unemployment_rate <dbl>
barplot(all_ages_la$Unemployment_rate, names.arg = all_ages_la$Major, horiz = TRUE, cex.names = 0.3, las =1)
all_ages_la %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)
## # A tibble: 16 x 2
## Major
## <chr>
## 1 THEOLOGY AND RELIGIOUS VOCATIONS
## 2 FRENCH GERMAN LATIN AND OTHER COMMON FOREIGN LANGUAGE STUDIES
## 3 ART HISTORY AND CRITICISM
## 4 HISTORY
## 5 LIBERAL ARTS
## 6 AREA ETHNIC AND CIVILIZATION STUDIES
## 7 ENGLISH LANGUAGE AND LITERATURE
## 8 OTHER FOREIGN LANGUAGES
## 9 UNITED STATES HISTORY
## 10 COMPOSITION AND RHETORIC
## 11 INTERCULTURAL AND INTERNATIONAL STUDIES
## 12 PHILOSOPHY AND RELIGIOUS STUDIES
## 13 MULTI/INTERDISCIPLINARY STUDIES
## 14 HUMANITIES
## 15 ANTHROPOLOGY AND ARCHEOLOGY
## 16 LINGUISTICS AND COMPARATIVE LANGUAGE AND LITERATURE
## # ... with 1 more variables: Unemployment_rate <dbl>
barplot(all_ages_law$Unemployment_rate, names.arg = all_ages_law$Major, horiz = TRUE, cex.names = 0.3, las =1)
all_ages_law %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)
## # A tibble: 5 x 2
## Major Unemployment_rate
## <chr> <dbl>
## 1 CRIMINAL JUSTICE AND FIRE PROTECTION 0.05403559
## 2 COURT REPORTING 0.06651258
## 3 PUBLIC ADMINISTRATION 0.06965492
## 4 PRE-LAW AND LEGAL STUDIES 0.06984780
## 5 PUBLIC POLICY 0.07921692
barplot(all_ages_sci$Unemployment_rate, names.arg = all_ages_sci$Major, horiz = TRUE, cex.names = 0.3, las =1)
all_ages_sci %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)
## # A tibble: 10 x 2
## Major
## <chr>
## 1 ATMOSPHERIC SCIENCES AND METEOROLOGY
## 2 PHYSICAL SCIENCES
## 3 GEOSCIENCES
## 4 MULTI-DISCIPLINARY OR GENERAL SCIENCE
## 5 PHYSICS
## 6 OCEANOGRAPHY
## 7 CHEMISTRY
## 8 GEOLOGY AND EARTH SCIENCE
## 9 NUCLEAR, INDUSTRIAL RADIOLOGY, AND BIOLOGICAL TECHNOLOGIES
## 10 ASTRONOMY AND ASTROPHYSICS
## # ... with 1 more variables: Unemployment_rate <dbl>
barplot(rct_ag$Unemployment_rate, names.arg = rct_ag$Major, horiz = TRUE, cex.names = 0.3, las =1)
rct_ag %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)
## # A tibble: 10 x 2
## Major Unemployment_rate
## <chr> <dbl>
## 1 SOIL SCIENCE 0.00000000
## 2 GENERAL AGRICULTURE 0.01964246
## 3 PLANT SCIENCE AND AGRONOMY 0.04545454
## 4 AGRICULTURE PRODUCTION AND MANAGEMENT 0.05003084
## 5 ANIMAL SCIENCES 0.05086250
## 6 MISCELLANEOUS AGRICULTURE 0.05976676
## 7 NATURAL RESOURCES MANAGEMENT 0.06661920
## 8 AGRICULTURAL ECONOMICS 0.07724958
## 9 FORESTRY 0.09672574
## 10 FOOD SCIENCE 0.09693146
barplot(rct_art$Unemployment_rate, names.arg = rct_art$Major, horiz = TRUE, cex.names = 0.3, las =1)
rct_art %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)
## # A tibble: 8 x 2
## Major Unemployment_rate
## <chr> <dbl>
## 1 MUSIC 0.07595967
## 2 DRAMA AND THEATER ARTS 0.07754113
## 3 FINE ARTS 0.08418630
## 4 MISCELLANEOUS FINE ARTS 0.08937500
## 5 STUDIO ARTS 0.08955224
## 6 COMMERCIAL ART AND GRAPHIC DESIGN 0.09679758
## 7 VISUAL AND PERFORMING ARTS 0.10219742
## 8 FILM VIDEO AND PHOTOGRAPHIC ARTS 0.10577224
barplot(rct_bio$Unemployment_rate, names.arg = rct_bio$Major, horiz = TRUE, cex.names = 0.3, las =1)
rct_bio %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)
## # A tibble: 14 x 2
## Major Unemployment_rate
## <chr> <dbl>
## 1 BOTANY 0.00000000
## 2 GENETICS 0.03411765
## 3 ZOOLOGY 0.04632028
## 4 NEUROSCIENCE 0.04848168
## 5 ECOLOGY 0.05447519
## 6 MISCELLANEOUS BIOLOGY 0.05854546
## 7 MICROBIOLOGY 0.06677587
## 8 PHYSIOLOGY 0.06916280
## 9 BIOLOGY 0.07072473
## 10 COGNITIVE SCIENCE AND BIOPSYCHOLOGY 0.07523617
## 11 ENVIRONMENTAL SCIENCE 0.07858468
## 12 BIOCHEMICAL SCIENCES 0.08053138
## 13 MOLECULAR BIOLOGY 0.08436116
## 14 PHARMACOLOGY 0.08553157
barplot(rct_bsn$Unemployment_rate, names.arg = rct_bsn$Major, horiz = TRUE, cex.names = 0.3, las =1)
rct_bsn %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)
## # A tibble: 13 x 2
## Major Unemployment_rate
## <chr> <dbl>
## 1 OPERATIONS LOGISTICS AND E-COMMERCE 0.04785870
## 2 MANAGEMENT INFORMATION SYSTEMS AND STATISTICS 0.05823961
## 3 HUMAN RESOURCES AND PERSONNEL MANAGEMENT 0.05956965
## 4 FINANCE 0.06068636
## 5 HOSPITALITY MANAGEMENT 0.06116919
## 6 MARKETING AND MARKETING RESEARCH 0.06121506
## 7 ACCOUNTING 0.06974901
## 8 MISCELLANEOUS BUSINESS & MEDICAL ADMINISTRATION 0.07198297
## 9 BUSINESS MANAGEMENT AND ADMINISTRATION 0.07221834
## 10 GENERAL BUSINESS 0.07286147
## 11 ACTUARIAL SCIENCE 0.09565217
## 12 INTERNATIONAL BUSINESS 0.09617506
## 13 BUSINESS ECONOMICS 0.09644838
barplot(rct_cj$Unemployment_rate, names.arg = rct_cj$Major, horiz = TRUE, cex.names = 0.3, las =1)
rct_cj %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)
## # A tibble: 4 x 2
## Major Unemployment_rate
## <chr> <dbl>
## 1 ADVERTISING AND PUBLIC RELATIONS 0.06796077
## 2 JOURNALISM 0.06917644
## 3 COMMUNICATIONS 0.07517698
## 4 MASS MEDIA 0.08983683
barplot(rct_com$Unemployment_rate, names.arg = rct_com$Major, horiz = TRUE, cex.names = 0.3, las =1)
rct_com %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)
## # A tibble: 11 x 2
## Major Unemployment_rate
## <chr> <dbl>
## 1 MATHEMATICS AND COMPUTER SCIENCE 0.00000000
## 2 MATHEMATICS 0.04727714
## 3 INFORMATION SCIENCES 0.06074144
## 4 COMPUTER SCIENCE 0.06317277
## 5 STATISTICS AND DECISION SCIENCE 0.08627367
## 6 APPLIED MATHEMATICS 0.09082331
## 7 COMPUTER AND INFORMATION SYSTEMS 0.09346033
## 8 COMPUTER ADMINISTRATION MANAGEMENT AND SECURITY 0.09972338
## 9 COMPUTER PROGRAMMING AND DATA PROCESSING 0.11398259
## 10 COMMUNICATION TECHNOLOGIES 0.11951147
## 11 COMPUTER NETWORKING AND TELECOMMUNICATIONS 0.15184981
barplot(rct_ed$Unemployment_rate, names.arg = rct_ed$Major, horiz = TRUE, cex.names = 0.3, las =1)
rct_ed %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)
## # A tibble: 16 x 2
## Major Unemployment_rate
## <chr> <dbl>
## 1 EDUCATIONAL ADMINISTRATION AND SUPERVISION 0.00000000
## 2 MATHEMATICS TEACHER EDUCATION 0.01620283
## 3 TEACHER EDUCATION: MULTIPLE LEVELS 0.03654583
## 4 ART AND MUSIC EDUCATION 0.03863775
## 5 EARLY CHILDHOOD EDUCATION 0.04010498
## 6 SPECIAL NEEDS EDUCATION 0.04150782
## 7 ELEMENTARY EDUCATION 0.04658571
## 8 SCIENCE AND COMPUTER TEACHER EDUCATION 0.04726368
## 9 LANGUAGE AND DRAMA EDUCATION 0.05030643
## 10 SECONDARY TEACHER EDUCATION 0.05222898
## 11 SOCIAL SCIENCE OR HISTORY TEACHER EDUCATION 0.05408294
## 12 GENERAL EDUCATION 0.05735993
## 13 MISCELLANEOUS EDUCATION 0.05921195
## 14 PHYSICAL AND HEALTH EDUCATION TEACHING 0.07466750
## 15 LIBRARY SCIENCE 0.10494572
## 16 SCHOOL STUDENT COUNSELING 0.10757946
barplot(rct_eng$Unemployment_rate, names.arg = rct_eng$Major, horiz = TRUE, cex.names = 0.3, las =1)
rct_eng %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)
## # A tibble: 29 x 2
## Major Unemployment_rate
## <chr> <dbl>
## 1 ENGINEERING MECHANICS PHYSICS AND SCIENCE 0.006334343
## 2 PETROLEUM ENGINEERING 0.018380527
## 3 MATERIALS SCIENCE 0.023042836
## 4 METALLURGICAL ENGINEERING 0.024096386
## 5 MATERIALS ENGINEERING AND MATERIALS SCIENCE 0.027788805
## 6 INDUSTRIAL PRODUCTION TECHNOLOGIES 0.028308097
## 7 ENGINEERING AND INDUSTRIAL MANAGEMENT 0.033651660
## 8 INDUSTRIAL AND MANUFACTURING ENGINEERING 0.042875544
## 9 NAVAL ARCHITECTURE AND MARINE ENGINEERING 0.050125313
## 10 MISCELLANEOUS ENGINEERING TECHNOLOGIES 0.052538520
## # ... with 19 more rows
barplot(rct_hlt$Unemployment_rate, names.arg = rct_hlt$Major, horiz = TRUE, cex.names = 0.3, las =1)
rct_hlt %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)
## # A tibble: 12 x 2
## Major Unemployment_rate
## <chr> <dbl>
## 1 MEDICAL TECHNOLOGIES TECHNICIANS 0.03698279
## 2 MEDICAL ASSISTING SERVICES 0.04250653
## 3 NURSING 0.04486272
## 4 COMMUNICATION DISORDERS SCIENCES AND SERVICES 0.04758400
## 5 PHARMACY PHARMACEUTICAL SCIENCES AND ADMINISTRATION 0.05552083
## 6 TREATMENT THERAPY PROFESSIONS 0.05982121
## 7 NUTRITION SCIENCES 0.06870068
## 8 HEALTH AND MEDICAL PREPARATORY PROGRAMS 0.06977971
## 9 MISCELLANEOUS HEALTH MEDICAL PROFESSIONS 0.08141125
## 10 GENERAL MEDICAL AND HEALTH SERVICES 0.08210162
## 11 HEALTH AND MEDICAL ADMINISTRATIVE SERVICES 0.08962626
## 12 COMMUNITY AND PUBLIC HEALTH 0.11214439
barplot(rct_ia$Unemployment_rate, names.arg = rct_ia$Major, horiz = TRUE, cex.names = 0.3, las =1)
rct_ia %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)
## # A tibble: 7 x 2
## Major
## <chr>
## 1 MILITARY TECHNOLOGIES
## 2 ELECTRICAL, MECHANICAL, AND PRECISION TECHNOLOGIES AND PRODUCTION
## 3 PHYSICAL FITNESS PARKS RECREATION AND LEISURE
## 4 COSMETOLOGY SERVICES AND CULINARY ARTS
## 5 CONSTRUCTION SERVICES
## 6 FAMILY AND CONSUMER SCIENCES
## 7 TRANSPORTATION SCIENCES AND TECHNOLOGIES
## # ... with 1 more variables: Unemployment_rate <dbl>
barplot(rct_la$Unemployment_rate, names.arg = rct_la$Major, horiz = TRUE, cex.names = 0.3, las =1)
rct_la %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)
## # A tibble: 16 x 2
## Major
## <chr>
## 1 UNITED STATES HISTORY
## 2 ART HISTORY AND CRITICISM
## 3 THEOLOGY AND RELIGIOUS VOCATIONS
## 4 AREA ETHNIC AND CIVILIZATION STUDIES
## 5 HUMANITIES
## 6 MULTI/INTERDISCIPLINARY STUDIES
## 7 FRENCH GERMAN LATIN AND OTHER COMMON FOREIGN LANGUAGE STUDIES
## 8 LIBERAL ARTS
## 9 COMPOSITION AND RHETORIC
## 10 INTERCULTURAL AND INTERNATIONAL STUDIES
## 11 ENGLISH LANGUAGE AND LITERATURE
## 12 HISTORY
## 13 PHILOSOPHY AND RELIGIOUS STUDIES
## 14 ANTHROPOLOGY AND ARCHEOLOGY
## 15 LINGUISTICS AND COMPARATIVE LANGUAGE AND LITERATURE
## 16 OTHER FOREIGN LANGUAGES
## # ... with 1 more variables: Unemployment_rate <dbl>
barplot(rct_law$Unemployment_rate, names.arg = rct_law$Major, horiz = TRUE, cex.names = 0.3, las =1)
rct_law %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)
## # A tibble: 5 x 2
## Major Unemployment_rate
## <chr> <dbl>
## 1 COURT REPORTING 0.01168969
## 2 PRE-LAW AND LEGAL STUDIES 0.07196502
## 3 CRIMINAL JUSTICE AND FIRE PROTECTION 0.08245220
## 4 PUBLIC POLICY 0.12842630
## 5 PUBLIC ADMINISTRATION 0.15949060
barplot(rct_sci$Unemployment_rate, names.arg = rct_sci$Major, horiz = TRUE, cex.names = 0.3, las =1)
rct_sci %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)
## # A tibble: 10 x 2
## Major
## <chr>
## 1 ASTRONOMY AND ASTROPHYSICS
## 2 ATMOSPHERIC SCIENCES AND METEOROLOGY
## 3 GEOSCIENCES
## 4 PHYSICAL SCIENCES
## 5 PHYSICS
## 6 CHEMISTRY
## 7 MULTI-DISCIPLINARY OR GENERAL SCIENCE
## 8 OCEANOGRAPHY
## 9 NUCLEAR, INDUSTRIAL RADIOLOGY, AND BIOLOGICAL TECHNOLOGIES
## 10 GEOLOGY AND EARTH SCIENCE
## # ... with 1 more variables: Unemployment_rate <dbl>
barplot(grad_ag$Grad_unemployment_rate, names.arg = grad_ag$Major, horiz = TRUE, cex.names = 0.3, las =1)
grad_ag %>% dplyr::select(Major, Grad_unemployment_rate) %>% arrange(Grad_unemployment_rate)
## # A tibble: 10 x 2
## Major Grad_unemployment_rate
## <chr> <dbl>
## 1 ANIMAL SCIENCES 0.01232653
## 2 SOIL SCIENCE 0.01466782
## 3 AGRICULTURAL ECONOMICS 0.01998520
## 4 GENERAL AGRICULTURE 0.02932492
## 5 NATURAL RESOURCES MANAGEMENT 0.02949596
## 6 PLANT SCIENCE AND AGRONOMY 0.03125399
## 7 FOOD SCIENCE 0.03295627
## 8 AGRICULTURE PRODUCTION AND MANAGEMENT 0.03483833
## 9 FORESTRY 0.04129642
## 10 MISCELLANEOUS AGRICULTURE 0.08645247
barplot(grad_art$Grad_unemployment_rate, names.arg = grad_art$Major, horiz = TRUE, cex.names = 0.3, las =1)
grad_art %>% dplyr::select(Major, Grad_unemployment_rate) %>% arrange(Grad_unemployment_rate)
## # A tibble: 8 x 2
## Major Grad_unemployment_rate
## <chr> <dbl>
## 1 VISUAL AND PERFORMING ARTS 0.03842167
## 2 MISCELLANEOUS FINE ARTS 0.03846154
## 3 MUSIC 0.04147201
## 4 STUDIO ARTS 0.05036224
## 5 COMMERCIAL ART AND GRAPHIC DESIGN 0.05775585
## 6 FINE ARTS 0.06100455
## 7 DRAMA AND THEATER ARTS 0.06766724
## 8 FILM VIDEO AND PHOTOGRAPHIC ARTS 0.09647293
barplot(grad_bio$Grad_unemployment_rate, names.arg = grad_bio$Major, horiz = TRUE, cex.names = 0.3, las =1)
grad_bio %>% dplyr::select(Major, Grad_unemployment_rate) %>% arrange(Grad_unemployment_rate)
## # A tibble: 14 x 2
## Major Grad_unemployment_rate
## <chr> <dbl>
## 1 NEUROSCIENCE 0.01761115
## 2 GENETICS 0.01942385
## 3 PHYSIOLOGY 0.02092066
## 4 ZOOLOGY 0.02092797
## 5 BIOLOGY 0.02110471
## 6 BIOCHEMICAL SCIENCES 0.02421187
## 7 PHARMACOLOGY 0.02422993
## 8 MISCELLANEOUS BIOLOGY 0.02586095
## 9 MOLECULAR BIOLOGY 0.02820749
## 10 MICROBIOLOGY 0.03215909
## 11 BOTANY 0.03321006
## 12 ENVIRONMENTAL SCIENCE 0.03527623
## 13 ECOLOGY 0.03665403
## 14 COGNITIVE SCIENCE AND BIOPSYCHOLOGY 0.04534244
barplot(grad_bsn$Grad_unemployment_rate, names.arg = grad_bsn$Major, horiz = TRUE, cex.names = 0.3, las =1)
grad_bsn %>% dplyr::select(Major, Grad_unemployment_rate) %>% arrange(Grad_unemployment_rate)
## # A tibble: 13 x 2
## Major Grad_unemployment_rate
## <chr> <dbl>
## 1 OPERATIONS LOGISTICS AND E-COMMERCE 0.02284832
## 2 MANAGEMENT INFORMATION SYSTEMS AND STATISTICS 0.03871464
## 3 GENERAL BUSINESS 0.04089493
## 4 ACCOUNTING 0.04185735
## 5 FINANCE 0.04409251
## 6 MISCELLANEOUS BUSINESS & MEDICAL ADMINISTRATION 0.04661565
## 7 BUSINESS MANAGEMENT AND ADMINISTRATION 0.04903231
## 8 BUSINESS ECONOMICS 0.05125341
## 9 MARKETING AND MARKETING RESEARCH 0.05205949
## 10 INTERNATIONAL BUSINESS 0.05488957
## 11 HUMAN RESOURCES AND PERSONNEL MANAGEMENT 0.06471597
## 12 HOSPITALITY MANAGEMENT 0.07386679
## 13 ACTUARIAL SCIENCE 0.07424381
barplot(grad_cj$Grad_unemployment_rate, names.arg = grad_cj$Major, horiz = TRUE, cex.names = 0.3, las =1)
grad_cj %>% dplyr::select(Major, Grad_unemployment_rate) %>% arrange(Grad_unemployment_rate)
## # A tibble: 4 x 2
## Major Grad_unemployment_rate
## <chr> <dbl>
## 1 ADVERTISING AND PUBLIC RELATIONS 0.03056160
## 2 JOURNALISM 0.04202330
## 3 COMMUNICATIONS 0.04865767
## 4 MASS MEDIA 0.05164133
barplot(grad_com$Grad_unemployment_rate, names.arg = grad_com$Major, horiz = TRUE, cex.names = 0.3, las =1)
grad_com %>% dplyr::select(Major, Grad_unemployment_rate) %>% arrange(Grad_unemployment_rate)
## # A tibble: 11 x 2
## Major Grad_unemployment_rate
## <chr> <dbl>
## 1 COMPUTER PROGRAMMING AND DATA PROCESSING 0.02461220
## 2 APPLIED MATHEMATICS 0.02863239
## 3 COMPUTER SCIENCE 0.03619812
## 4 MATHEMATICS 0.03764496
## 5 COMPUTER AND INFORMATION SYSTEMS 0.04004921
## 6 STATISTICS AND DECISION SCIENCE 0.04235759
## 7 INFORMATION SCIENCES 0.04999576
## 8 COMMUNICATION TECHNOLOGIES 0.05841063
## 9 COMPUTER ADMINISTRATION MANAGEMENT AND SECURITY 0.05958663
## 10 COMPUTER NETWORKING AND TELECOMMUNICATIONS 0.08160569
## 11 MATHEMATICS AND COMPUTER SCIENCE 0.10289017
barplot(grad_ed$Grad_unemployment_rate, names.arg = grad_ed$Major, horiz = TRUE, cex.names = 0.3, las =1)
grad_ed %>% dplyr::select(Major, Grad_unemployment_rate) %>% arrange(Grad_unemployment_rate)
## # A tibble: 16 x 2
## Major Grad_unemployment_rate
## <chr> <dbl>
## 1 MATHEMATICS TEACHER EDUCATION 0.01424816
## 2 EDUCATIONAL ADMINISTRATION AND SUPERVISION 0.01676074
## 3 TEACHER EDUCATION: MULTIPLE LEVELS 0.01875968
## 4 ELEMENTARY EDUCATION 0.02036289
## 5 SCIENCE AND COMPUTER TEACHER EDUCATION 0.02218759
## 6 SECONDARY TEACHER EDUCATION 0.02229106
## 7 SPECIAL NEEDS EDUCATION 0.02246486
## 8 EARLY CHILDHOOD EDUCATION 0.02716594
## 9 PHYSICAL AND HEALTH EDUCATION TEACHING 0.02788530
## 10 ART AND MUSIC EDUCATION 0.02840349
## 11 LIBRARY SCIENCE 0.02993678
## 12 LANGUAGE AND DRAMA EDUCATION 0.03107441
## 13 SOCIAL SCIENCE OR HISTORY TEACHER EDUCATION 0.03175695
## 14 GENERAL EDUCATION 0.03334986
## 15 MISCELLANEOUS EDUCATION 0.03463734
## 16 SCHOOL STUDENT COUNSELING 0.05140030
barplot(grad_eng$Grad_unemployment_rate, names.arg = grad_eng$Major, horiz = TRUE, cex.names = 0.3, las =1)
grad_eng %>% dplyr::select(Major, Grad_unemployment_rate) %>% arrange(Grad_unemployment_rate)
## # A tibble: 29 x 2
## Major Grad_unemployment_rate
## <chr> <dbl>
## 1 NUCLEAR ENGINEERING 0.01153802
## 2 BIOMEDICAL ENGINEERING 0.01844758
## 3 PETROLEUM ENGINEERING 0.01947149
## 4 COMPUTER ENGINEERING 0.02130079
## 5 MATERIALS SCIENCE 0.02177728
## 6 METALLURGICAL ENGINEERING 0.02290638
## 7 AEROSPACE ENGINEERING 0.02793169
## 8 GEOLOGICAL AND GEOPHYSICAL ENGINEERING 0.02870639
## 9 ENGINEERING MECHANICS PHYSICS AND SCIENCE 0.02896772
## 10 MISCELLANEOUS ENGINEERING TECHNOLOGIES 0.03169782
## # ... with 19 more rows
barplot(grad_hlt$Grad_unemployment_rate, names.arg = grad_hlt$Major, horiz = TRUE, cex.names = 0.3, las =1)
grad_hlt %>% dplyr::select(Major, Grad_unemployment_rate) %>% arrange(Grad_unemployment_rate)
## # A tibble: 12 x 2
## Major
## <chr>
## 1 COMMUNICATION DISORDERS SCIENCES AND SERVICES
## 2 TREATMENT THERAPY PROFESSIONS
## 3 NURSING
## 4 HEALTH AND MEDICAL PREPARATORY PROGRAMS
## 5 GENERAL MEDICAL AND HEALTH SERVICES
## 6 NUTRITION SCIENCES
## 7 PHARMACY PHARMACEUTICAL SCIENCES AND ADMINISTRATION
## 8 MEDICAL ASSISTING SERVICES
## 9 MEDICAL TECHNOLOGIES TECHNICIANS
## 10 MISCELLANEOUS HEALTH MEDICAL PROFESSIONS
## 11 COMMUNITY AND PUBLIC HEALTH
## 12 HEALTH AND MEDICAL ADMINISTRATIVE SERVICES
## # ... with 1 more variables: Grad_unemployment_rate <dbl>
barplot(grad_ia$Grad_unemployment_rate, names.arg = grad_ia$Major, horiz = TRUE, cex.names = 0.3, las =1)
grad_ia %>% dplyr::select(Major, Grad_unemployment_rate) %>% arrange(Grad_unemployment_rate)
## # A tibble: 7 x 2
## Major
## <chr>
## 1 PHYSICAL FITNESS PARKS RECREATION AND LEISURE
## 2 FAMILY AND CONSUMER SCIENCES
## 3 TRANSPORTATION SCIENCES AND TECHNOLOGIES
## 4 MILITARY TECHNOLOGIES
## 5 COSMETOLOGY SERVICES AND CULINARY ARTS
## 6 CONSTRUCTION SERVICES
## 7 ELECTRICAL, MECHANICAL, AND PRECISION TECHNOLOGIES AND PRODUCTION
## # ... with 1 more variables: Grad_unemployment_rate <dbl>
barplot(grad_la$Grad_unemployment_rate, names.arg = grad_la$Major, horiz = TRUE, cex.names = 0.3, las =1)
grad_la %>% dplyr::select(Major, Grad_unemployment_rate) %>% arrange(Grad_unemployment_rate)
## # A tibble: 16 x 2
## Major
## <chr>
## 1 MULTI/INTERDISCIPLINARY STUDIES
## 2 UNITED STATES HISTORY
## 3 THEOLOGY AND RELIGIOUS VOCATIONS
## 4 FRENCH GERMAN LATIN AND OTHER COMMON FOREIGN LANGUAGE STUDIES
## 5 COMPOSITION AND RHETORIC
## 6 PHILOSOPHY AND RELIGIOUS STUDIES
## 7 HISTORY
## 8 ENGLISH LANGUAGE AND LITERATURE
## 9 LIBERAL ARTS
## 10 OTHER FOREIGN LANGUAGES
## 11 AREA ETHNIC AND CIVILIZATION STUDIES
## 12 ANTHROPOLOGY AND ARCHEOLOGY
## 13 LINGUISTICS AND COMPARATIVE LANGUAGE AND LITERATURE
## 14 ART HISTORY AND CRITICISM
## 15 HUMANITIES
## 16 INTERCULTURAL AND INTERNATIONAL STUDIES
## # ... with 1 more variables: Grad_unemployment_rate <dbl>
barplot(grad_law$Grad_unemployment_rate, names.arg = grad_law$Major, horiz = TRUE, cex.names = 0.3, las =1)
grad_law %>% dplyr::select(Major, Grad_unemployment_rate) %>% arrange(Grad_unemployment_rate)
## # A tibble: 5 x 2
## Major Grad_unemployment_rate
## <chr> <dbl>
## 1 COURT REPORTING 0.00000000
## 2 PUBLIC POLICY 0.03122649
## 3 PRE-LAW AND LEGAL STUDIES 0.03891879
## 4 CRIMINAL JUSTICE AND FIRE PROTECTION 0.04100487
## 5 PUBLIC ADMINISTRATION 0.05884949
barplot(grad_sci$Grad_unemployment_rate, names.arg = grad_sci$Major, horiz = TRUE, cex.names = 0.3, las =1)
grad_sci %>% dplyr::select(Major, Grad_unemployment_rate) %>% arrange(Grad_unemployment_rate)
## # A tibble: 10 x 2
## Major
## <chr>
## 1 NUCLEAR, INDUSTRIAL RADIOLOGY, AND BIOLOGICAL TECHNOLOGIES
## 2 GEOSCIENCES
## 3 ASTRONOMY AND ASTROPHYSICS
## 4 OCEANOGRAPHY
## 5 ATMOSPHERIC SCIENCES AND METEOROLOGY
## 6 GEOLOGY AND EARTH SCIENCE
## 7 MULTI-DISCIPLINARY OR GENERAL SCIENCE
## 8 CHEMISTRY
## 9 PHYSICS
## 10 PHYSICAL SCIENCES
## # ... with 1 more variables: Grad_unemployment_rate <dbl>
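The plot-then-table pair above is repeated for every group and every measure, and the bars are drawn in the data's original order while the tables are sorted. A minimal sketch of a helper that does both steps in one call and plots in sorted order. `plot_and_rank` is a hypothetical name, not part of the original analysis, and the `toy` values are invented. Base R is used for the sorting so the sketch has no package dependencies.

```r
# Hypothetical helper: sort one group by a chosen column, draw the
# horizontal bar plot in that order, and return the sorted table.
plot_and_rank <- function(df, value_col) {
  ranked <- df[order(df[[value_col]]), c("Major", value_col), drop = FALSE]
  # With horiz = TRUE, barplot draws the first element at the bottom,
  # so the smallest value ends up as the lowest bar.
  barplot(ranked[[value_col]], names.arg = ranked$Major,
          horiz = TRUE, cex.names = 0.3, las = 1)
  ranked
}

# Made-up values for illustration only.
toy <- data.frame(
  Major = c("MAJOR A", "MAJOR B", "MAJOR C"),
  Unemployment_rate = c(0.07, 0.03, 0.05),
  stringsAsFactors = FALSE
)

ranked <- plot_and_rank(toy, "Unemployment_rate")
print(ranked)
```

Under these assumptions a call such as `plot_and_rank(grad_sci, "Grad_unemployment_rate")` would stand in for one barplot line and one `select`/`arrange` line.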
https://trends.collegeboard.org/college-pricing/figures-tables/average-rates-growth-published-charges-decade
https://inflationdata.com/Inflation/Inflation_Rate/Long_Term_Inflation.asp
https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test
http://onlinelibrary.wiley.com/doi/10.1111/j.0950-0804.2005.00256.x/full
http://www.theatlantic.com/sponsored/prudential-sleeping-giants/millennials-and-gender-a-major-attitude-shift/467/
https://www.washingtonpost.com/local/education/the-gender-factor-in-college-admissions/2014/03/26/4996e988-b4e6-11e3-8020-b2d790b3c9e1_story.html?utm_term=.969f2cc21c83
https://www.forbes.com/sites/prestoncooper2/2017/07/13/new-york-fed-highlights-underemployment-among-college-graduates/#325a1adc40d8
https://www.theatlantic.com/business/archive/2016/09/fear-of-a-college-educated-barista/500792/
http://www.foxbusiness.com/features/2016/01/11/myth-college-grad-barista.html
https://www.forbes.com/sites/jeffreydorfman/2017/01/23/dispelling-the-myth-of-underemployed-college-graduates/#3044cccb502c