Cooper DATA 606 Final Project

Introduction

College costs are rising. Anecdotally, I saw my own tuition at Purdue more than double in the 12 years that I attended ($3724/yr in 1999 to $9070/yr in 2010¹). However, anecdotes are only the beginning of a scientific investigation. Tuition for a public 4-year school has increased at a rate averaging between 3.2% and 4.4% since 1987². The rate of inflation during this time has been between 3.2%³. Coming from a working-class background, I found the cost of universities to be prohibitive without a scholarship, which motivated me to join the US Navy. However, military service should not be the only route to an affordable education. Until public policy changes in a way that makes higher education affordable to more people, prospective students may have to make hard choices about how much debt they take on and how quickly they can pay it down. Employment rates and salary statistics are important for cost-conscious students choosing a path that will not encumber them with unmanageable debt. To address these concerns, I ask:

Which college majors offer the best opportunities in terms of unemployment rate and salary?

The Data

These data were collated by the 538 website and posted to its GitHub page⁴. 538 in turn used data from:

“All data is from American Community Survey 2010-2012 Public Use Microdata Series. Download data here: http://www.census.gov/programs-surveys/acs/data/pums.html Documentation here: http://www.census.gov/programs-surveys/acs/technical-documentation/pums.html Major categories are from Carnevale et al, ‘What’s It Worth?: The Economic Value of College Majors.’ Georgetown University Center on Education and the Workforce, 2011.”⁵

Details for the Georgetown data set can be found here: https://1gyhoq479ufd3yna29x7ubjn-wpengine.netdna-ssl.com/wp-content/uploads/2015/01/WIW1-Methodology.pdf

From the above methodology report:

Unique Data Characteristics:

1) For the first time in this survey the Census Bureau asked individuals who indicated that their degree was a bachelor’s degree or higher, to supply their undergraduate major. Their responses were then coded and collapsed by the Census Bureau into 171 different degree majors.
2) Unlike other data sources focused on recent degree recipients, the Census data enable analysis across an individual’s full life cycle.
3) The Census data also result in robust estimates due to the very large sample involved. 531,337 persons surveyed who are representative of 51,547,518 people having Bachelor’s degrees (including those with graduate degrees), when weighted.

Since these data were collected by survey and lack experimental features such as a control group and blinding, this is an observational study. We therefore cannot establish a causal link between variables. However, the sample is small relative to the population (about 1%, well under the usual 10% threshold), so the cases can be treated as independent, and the data were collected at random. We can therefore make inferences and predictions using these data.
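The independence condition can be checked with a one-line calculation. The following is a minimal R sketch using the two counts quoted from the methodology report; the 10% cutoff is the common textbook rule of thumb, not a figure from the source.

```r
# Counts quoted from the Georgetown methodology report
n_sampled    <- 531337     # persons surveyed
n_population <- 51547518   # weighted population of bachelor's degree holders

# Sampling fraction: share of the population that was actually surveyed
sampling_fraction <- n_sampled / n_population
sampling_fraction          # roughly 0.0103, i.e. about 1%

# Rule-of-thumb independence check: sample should be under 10% of population
sampling_fraction < 0.10
```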

In establishing the scope of inference we must bear in mind that the data were collected on college degree holders within the US, so we can only make predictions about degree holders within the US. Predictions made here are not valid for degree holders in other countries, even if they obtained their degree in the US, since social and economic conditions in other countries could invalidate them.

Cases

In many observational surveys the cases are individual people. That is not true for this study. Although the data were collected by asking individuals about their major, degree level, pay, and employment status, the data are organized so that the cases are college majors. In the “All_ages” set, each case represents a major offered by colleges and universities in the US, and the data include both bachelor’s and graduate degree holders. In the “Grad_students” set, each case is again a major, but the data include only graduate degree holders aged 25+ years. Finally, in the “Recent_grad” set, each case is a major, and the data include only bachelor’s degree holders aged <28 years; this set also includes gender statistics. All three sets use the same 173 majors.

In asking what the economic outlook is for each college major, the response variable is the college major, which is categorical. Results will take the form of ordered lists. These lists will be created using explanatory variables such as the counts of employed and unemployed college degree holders and statistics on their income; these data are numerical. We will also see what effect gender has on income and employment rate; those data are categorical.

Exploring the Data

The Appendix contains tables and graphs of median salary and unemployment rate for each of the 173 majors at the three attainment levels: recent graduate, graduate degree, and all ages. These were placed in an Appendix so that they are available to the interested reader without interrupting the flow of this paper, since they take up nearly 70 pages.

In this chapter, we will compare unemployment rate and median salary data based solely on attainment level.

First we will look at overall unemployment rate for the 3 categories: all ages, recent grads, and grad students.

summary(all_ages$Unemployment_rate)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.00000 0.04626 0.05472 0.05736 0.06904 0.15615

summary(rct_grad$Unemployment_rate)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.00000 0.05031 0.06796 0.06819 0.08756 0.17723

summary(grad_stdnt$Grad_unemployment_rate)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.00000 0.02607 0.03665 0.03934 0.04805 0.13851

unempl <- cbind(all_ages$Unemployment_rate, rct_grad$Unemployment_rate, grad_stdnt$Grad_unemployment_rate)
boxplot(unempl,names = c("All", "Recent Grad", "Grad Student"), ylab = "Unemployment Rate")

It appears that recent graduates holding only a bachelor’s degree have a median unemployment rate nearly twice that of graduate degree holders (6.8% vs. 3.7%). This suggests that holding a graduate degree is associated with a better chance of finding a job, although the two groups also differ in age (under 28 vs. 25+).
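The “nearly twice” comparison is simple arithmetic on the medians printed by summary(). A short R sketch, with the values copied by hand from the output above:

```r
# Median unemployment rates copied from the summary() output above
med_recent <- 0.06796  # recent grads (bachelor's only, under 28)
med_grad   <- 0.03665  # graduate degree holders (25+)

# How many times higher the recent-grad median is than the graduate median
med_recent / med_grad   # about 1.85
```

Because the recent-grad group is also younger, this ratio mixes the effect of degree level with the effect of career stage.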

We will also look at median income for the three categories.

summary(all_ages$Median)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   35000   46000   53000   56816   65000  125000

hist(all_ages$Median, main = "Histogram for Median Income All Ages", xlab = "Median Income by Major All Ages (USD)", col = "dark blue")

summary(rct_grad$Median)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   22000   33000   36000   40151   45000  110000

hist(rct_grad$Median, main = "Histogram for Median Income Recent Grads", xlab = "Median Income by Major Recent Grads (USD)", col = "dark blue")

summary(grad_stdnt$Grad_median)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   47000   65000   75000   76756   90000  135000

hist(grad_stdnt$Grad_median, main = "Histogram for Median Income Grad Students", xlab = "Median Income by Major Grad Student (USD)", col = "dark blue")

medsal <- cbind(all_ages$Median, rct_grad$Median, grad_stdnt$Grad_median)
boxplot(medsal, names = c("All", "Recent Grad", "Grad Student"), ylab = "Median Salary USD")

We see from these graphs that the median salary of the graduate student set (75,000 USD) is a high outlier relative to the recent graduate set, lying above its upper whisker, and that the median salary of the recent graduate set (36,000 USD) is lower than the median salary of every major in the graduate student set (minimum 47,000 USD). This suggests that a graduate degree is associated with substantially higher earning potential.
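The outlier comparison can be checked against the Tukey fences that boxplot() uses: whiskers extend to the most extreme point within 1.5 × IQR of the hinges, and for n = 173 the hinges coincide with the summary() quartiles. A minimal R sketch with the quartiles copied from the output above; strictly by the 1.5 × IQR rule, only the first comparison is an outlier, while the recent-grad median instead falls below the smallest graduate median.

```r
# Quartiles and medians copied from the summary() output above (USD)
rct_q1 <- 33000; rct_q3 <- 45000; rct_median <- 36000
grd_q1 <- 65000; grd_q3 <- 90000; grd_median <- 75000; grd_min <- 47000

# Tukey fences: 1.5 * IQR beyond the quartiles (boxplot() default, range = 1.5)
rct_upper_fence <- rct_q3 + 1.5 * (rct_q3 - rct_q1)   # 63000
grd_lower_fence <- grd_q1 - 1.5 * (grd_q3 - grd_q1)   # 27500

grd_median > rct_upper_fence   # TRUE: grad median is beyond the recent-grad fence
rct_median < grd_lower_fence   # FALSE: recent-grad median is inside the grad fence
rct_median < grd_min           # TRUE: but below every major's grad median
```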

The tables and bar plots in the Appendix also show that majors emphasizing so-called “hard” skills, such as the Science, Technology, Engineering, and Math (STEM) majors, tend to outperform majors emphasizing so-called “soft” skills, such as Fine Arts, Liberal Arts, and Social Sciences.

Chi Squared Tests for Independence for Employment Status

It is not enough simply to look at graphs and draw conclusions about whether our hypothesis is correct. Further statistical tests are needed to check whether what the graphs show is statistically significant. To that end, we perform $\chi^2$ tests for independence, which test whether two categorical variables, here major and employment status (employed vs. unemployed), are associated⁶. Our null hypothesis is that employment status is independent of major choice. The alternative hypothesis is that employment status depends on major choice.
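To make the mechanics of the test concrete, here is a small R sketch that computes the Pearson statistic by hand for six agriculture majors, using the Employed/Unemployed counts that head(all_age_contin) prints below, and checks it against chisq.test(). The six-row subset is only an illustration; the real test uses all 173 majors.

```r
# Employed / unemployed counts for six majors from the all_ages data
obs <- matrix(c(90245, 2423,
                76865, 2266,
                26321,  821,
                81177, 3619,
                17281,  894,
                63043, 2070),
              ncol = 2, byrow = TRUE,
              dimnames = list(NULL, c("Employed", "Unemployed")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson statistic, degrees of freedom (r - 1)(c - 1), and upper-tail p-value
x2 <- sum((obs - expected)^2 / expected)
dof <- (nrow(obs) - 1) * (ncol(obs) - 1)
p_value <- pchisq(x2, dof, lower.tail = FALSE)

# Built-in test (Yates' correction only applies to 2 x 2 tables, so none here)
builtin <- chisq.test(obs)
all.equal(x2, unname(builtin$statistic))   # TRUE
```

Majors whose observed unemployment count sits far from the count expected under independence contribute most to the statistic.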

First for all ages:

all_age_contin <- all_ages %>% dplyr::select(Major, Employed, Unemployed) # Keep the major name and the employed/unemployed counts for readability
head(all_age_contin)

## # A tibble: 6 x 3
##                                   Major Employed Unemployed
##                                   <chr>    <int>      <int>
## 1                   GENERAL AGRICULTURE    90245       2423
## 2 AGRICULTURE PRODUCTION AND MANAGEMENT    76865       2266
## 3                AGRICULTURAL ECONOMICS    26321        821
## 4                       ANIMAL SCIENCES    81177       3619
## 5                          FOOD SCIENCE    17281        894
## 6            PLANT SCIENCE AND AGRONOMY    63043       2070

#barplot(as.matrix(all_age_contin), beside = TRUE)
chisq.test(all_age_contin[,-1]) #We remove the major names for the chi-squared test

## 
##  Pearson's Chi-squared test
## 
## data:  all_age_contin[, -1]
## X-squared = 96644, df = 172, p-value < 2.2e-16

Since the p-value is far below 0.05, we reject the null hypothesis that employment status is independent of major in favor of the alternative that employment status depends on major choice in the all ages category.

Next, we will test for grad students:

head(grad_stdnt)

## # A tibble: 6 x 22
##   Major_code                                 Major
##        <int>                                 <chr>
## 1       1101 AGRICULTURE PRODUCTION AND MANAGEMENT
## 2       1100                   GENERAL AGRICULTURE
## 3       1302                              FORESTRY
## 4       1303          NATURAL RESOURCES MANAGEMENT
## 5       1105            PLANT SCIENCE AND AGRONOMY
## 6       1102                AGRICULTURAL ECONOMICS
## # ... with 20 more variables: Major_category <chr>, Grad_total <int>,
## #   Grad_sample_size <int>, Grad_employed <int>,
## #   Grad_full_time_year_round <int>, Grad_unemployed <int>,
## #   Grad_unemployment_rate <dbl>, Grad_median <dbl>, Grad_P25 <int>,
## #   Grad_P75 <dbl>, Nongrad_total <int>, Nongrad_employed <int>,
## #   Nongrad_full_time_year_round <int>, Nongrad_unemployed <int>,
## #   Nongrad_unemployment_rate <dbl>, Nongrad_median <dbl>,
## #   Nongrad_P25 <int>, Nongrad_P75 <dbl>, Grad_share <dbl>,
## #   Grad_premium <dbl>

grd_st_contin <- grad_stdnt %>% dplyr::select(Major, Grad_employed, Grad_unemployed) # Keep the major name and the employed/unemployed counts for readability
head(grd_st_contin)

## # A tibble: 6 x 3
##                                   Major Grad_employed Grad_unemployed
##                                   <chr>         <int>           <int>
## 1 AGRICULTURE PRODUCTION AND MANAGEMENT         13104             473
## 2                   GENERAL AGRICULTURE         28930             874
## 3                              FORESTRY         16831             725
## 4          NATURAL RESOURCES MANAGEMENT         23394             711
## 5            PLANT SCIENCE AND AGRONOMY         22782             735
## 6                AGRICULTURAL ECONOMICS         10592             216

#barplot(as.matrix(all_age_contin), beside = TRUE)
chisq.test(grd_st_contin[,-1]) #We remove the major names for the chi-squared test

## 
##  Pearson's Chi-squared test
## 
## data:  grd_st_contin[, -1]
## X-squared = 62013, df = 172, p-value < 2.2e-16

Again, p < 0.05, so we reject the null hypothesis in favor of the alternative that, at the graduate level, employment status depends on major choice.

Now for recent bachelor’s degree grads:

head(rct_grad)

## # A tibble: 6 x 21
##    Rank Major_code                                 Major Total   Men Women
##   <int>      <int>                                 <chr> <int> <int> <int>
## 1    22       1104                          FOOD SCIENCE    NA    NA    NA
## 2    64       1101 AGRICULTURE PRODUCTION AND MANAGEMENT 14240  9658  4582
## 3    65       1100                   GENERAL AGRICULTURE 10399  6053  4346
## 4    72       1102                AGRICULTURAL ECONOMICS  2439  1749   690
## 5   108       1303          NATURAL RESOURCES MANAGEMENT 13773  8617  5156
## 6   112       1302                              FORESTRY  3607  3156   451
## # ... with 15 more variables: Major_category <chr>, ShareWomen <dbl>,
## #   Sample_size <int>, Employed <int>, Full_time <int>, Part_time <int>,
## #   Full_time_year_round <int>, Unemployed <int>, Unemployment_rate <dbl>,
## #   Median <int>, P25th <int>, P75th <int>, College_jobs <int>,
## #   Non_college_jobs <int>, Low_wage_jobs <int>

# Keep the major name and the employed/unemployed counts. MILITARY TECHNOLOGIES is excluded
# because it has 0 in both columns, which would give it expected counts of 0 in the chi-squared test.
rct_gr_contin <- rct_grad %>% dplyr::select(Major, Employed, Unemployed) %>% filter(Major != "MILITARY TECHNOLOGIES")
rct_gr_contin

## # A tibble: 172 x 3
##                                    Major Employed Unemployed
##                                    <chr>    <int>      <int>
##  1                          FOOD SCIENCE     3149        338
##  2 AGRICULTURE PRODUCTION AND MANAGEMENT    12323        649
##  3                   GENERAL AGRICULTURE     8884        178
##  4                AGRICULTURAL ECONOMICS     2174        182
##  5          NATURAL RESOURCES MANAGEMENT    11797        842
##  6                              FORESTRY     3007        322
##  7                          SOIL SCIENCE      613          0
##  8            PLANT SCIENCE AND AGRONOMY     6594        314
##  9                       ANIMAL SCIENCES    17112        917
## 10             MISCELLANEOUS AGRICULTURE     1290         82
## # ... with 162 more rows

#barplot(as.matrix(all_age_contin), beside = TRUE)
chisq.test(rct_gr_contin[,-1]) #We remove the major names for the chi-squared test

## 
##  Pearson's Chi-squared test
## 
## data:  rct_gr_contin[, -1]
## X-squared = 29941, df = 171, p-value < 2.2e-16

As with the other two cases, we reject the null hypothesis in favor of the alternative that unemployment rate depends on choice of major. Thus, at every degree level, choice of major is associated with unemployment rate. Generally speaking, you’ll have better chances of finding a job in certain majors than in others.

We can also compare graduate degree holders with recent undergraduates:

#Totals of employed and unemployed for each attainment level, used for the bar plot proportions
grad_tot <- colSums(grd_st_contin[, -1])
ug_tot <- colSums(rct_gr_contin[, -1])
#Matrix of proportions: rows = employed/unemployed, columns = grad students/undergrads
gr_ug_contin_prop <- cbind(grad_tot / sum(grad_tot), ug_tot / sum(ug_tot))
barplot(gr_ug_contin_prop, beside = TRUE, names.arg = c("Grad Students", "Undergrads"),
        ylab = "Proportion", main = "Employment/Unemployment", legend.text = c("Employed", "Unemployed"))

#For the chi-squared test we use absolute counts instead of proportions.
e <- sum(grd_st_contin[,2])
f <- sum(grd_st_contin[,3])
g <- sum(rct_gr_contin[,2])
h <- sum(rct_gr_contin[,3])
gr_ug_contin <- matrix(c(e, f,g,h),byrow = TRUE, nrow = 2)
gr_ug_contin

##          [,1]   [,2]
## [1,] 16268407 606612
## [2,]  5396348 418025

chisq.test(gr_ug_contin)

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  gr_ug_contin
## X-squared = 129590, df = 1, p-value < 2.2e-16

For level of attainment, we reject the null hypothesis that employment status is independent of degree level, in favor of the alternative that unemployment rate differs between bachelor’s and graduate degree holders.
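The size of the difference is easier to see as pooled unemployment rates. A short R sketch using the 2 × 2 counts printed above (row 1 graduate, row 2 recent undergraduate):

```r
# Employed / unemployed totals from the 2 x 2 table printed above
gr_ug <- matrix(c(16268407, 606612,
                   5396348, 418025),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("Grad", "Recent grad"),
                                c("Employed", "Unemployed")))

# Pooled unemployment rate = unemployed / (employed + unemployed)
unemp_rate <- gr_ug[, "Unemployed"] / rowSums(gr_ug)
round(unemp_rate, 4)   # Grad about 0.0359, Recent grad about 0.0719
```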

Summary of $\chi^2$ Tests.

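Choice of major and level of degree attainment are both associated with unemployment rate at all age levels. One caveat: the Employed and Unemployed columns are weighted population estimates (totaling in the millions) rather than counts of the roughly half a million people actually surveyed, so the $\chi^2$ statistics and p-values above overstate the precision of these results.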

Student’s T and Kolmogorov-Smirnov Tests - Median Salary

As noted in the data exploration above, the median salary tables in the Appendix suggest that quantitative majors, such as the STEM majors, have more earning potential than qualitative majors, such as Liberal Arts. Since median salary is a numerical measurement, it is appropriate to use a Student’s t-test⁷ or a Kolmogorov-Smirnov⁸ test to compare data sets. The t-test is a parametric test that compares its test statistic against the t distribution; note that R’s t.test() defaults to Welch’s variant, which does not assume equal variances, as the outputs below show. The Kolmogorov-Smirnov test is non-parametric: it does not assume the data are drawn from a population with a particular distribution, and instead measures the largest difference between the two data sets’ empirical cumulative distribution functions. Since the salary data are right-skewed at all attainment levels, adding a non-parametric test increases the robustness of this analysis.
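The KS statistic, and the “opposite sign convention” noted in the code comments below, can be reproduced by hand from the two empirical CDFs. A minimal R sketch on two small hypothetical samples (not the survey data), chosen without ties so ks.test() raises no warning:

```r
# Two small illustrative samples without ties (hypothetical salaries, USD)
x <- c(51000, 63000, 72000, 84000, 90500)
y <- c(32000, 41000, 48000, 55500, 60000, 77000)

# Empirical CDFs evaluated at every pooled observation
pts <- sort(c(x, y))
Fx <- ecdf(x)(pts)
Fy <- ecdf(y)(pts)

# Two-sided statistic: largest absolute gap between the ECDFs
D_two  <- max(abs(Fx - Fy))
# alternative = "less" means the CDF of x lies below that of y (x tends to be
# larger), so its statistic is the largest gap Fy - Fx
D_less <- max(Fy - Fx)

all.equal(D_two,  unname(ks.test(x, y)$statistic))                        # TRUE
all.equal(D_less, unname(ks.test(x, y, alternative = "less")$statistic))  # TRUE
```

This is why a t-test with alternative = "greater" is paired below with a KS test using alternative = "less": a sample with larger values has an ECDF that lies below the other’s.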

To make these comparisons, we must bear in mind that there are 42 groups (14 major categories × 3 attainment levels), which would have to be compared in pairs for a total of $C(42,2) = \frac{42!}{2!\,40!} = 861$ comparisons. This is prohibitively long given the time constraints for this project. Therefore, we will analyze 4 major categories from the all ages set, which reduces the number of comparisons to $C(4,2) = \frac{4!}{2!\,2!} = 6$. These major categories are Engineering, Physical Sciences, Liberal Arts, and Psychology & Social Work. We selected these categories based on observation of the median salary tables in the Appendix.
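The comparison counts, and the six pairs actually tested below, can be generated directly in R with base functions:

```r
choose(42, 2)   # 861 pairwise comparisons across all category/level groups
choose(4, 2)    # 6 comparisons for the four selected categories

# Enumerate the six pairs of selected major categories
cats  <- c("Engineering", "Physical Sciences", "Liberal Arts",
           "Psychology & Social Work")
pairs <- combn(cats, 2)
apply(pairs, 2, paste, collapse = " vs ")
```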

Engineering vs Physical Sciences

The null hypothesis is that there is no difference between the median salaries of Engineering majors and Physical Science majors. Initial two-sided tests, which only check whether the distributions differ and not whether one is greater or less than the other, showed significance in the other comparisons but not for Liberal Arts vs. Psychology & Social Work, discussed separately below. We show below the results of one-sided tests, which test directly whether the median salary of one degree category is greater than that of the other.
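For the t-test, the one-sided and two-sided results are closely linked: when the observed t falls in the direction of the alternative, the one-sided p-value is exactly half the two-sided one. A small R sketch on hypothetical major-level salaries (not values from the data set):

```r
# Hypothetical samples of major-level median salaries (USD), for illustration
a <- c(70000, 82000, 75000, 90000, 68000, 79000)
b <- c(60000, 64000, 58000, 71000, 62000)

two_sided <- t.test(a, b)$p.value                           # Welch test by default
one_sided <- t.test(a, b, alternative = "greater")$p.value

# mean(a) > mean(b), so the observed t is positive and the one-sided
# p-value is half of the two-sided one
all.equal(one_sided, two_sided / 2)   # TRUE
```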

boxplot(all_ages_eng$Median, all_ages_sci$Median, names = c("Engineering", "Physical Sciences"), ylab = "Median Salary USD")

t.test(all_ages_eng$Median, all_ages_sci$Median, alternative = "greater")

## 
##  Welch Two Sample t-test
## 
## data:  all_ages_eng$Median and all_ages_sci$Median
## t = 4.3198, df = 29.522, p-value = 8.094e-05
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  9321.027      Inf
## sample estimates:
## mean of x mean of y 
##  77758.62  62400.00

ks.test(all_ages_eng$Median, all_ages_sci$Median, alternative = "less") #KS test has opposite sign convention than t test

## Warning in ks.test(all_ages_eng$Median, all_ages_sci$Median, alternative =
## "less"): cannot compute exact p-value with ties

## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  all_ages_eng$Median and all_ages_sci$Median
## D^- = 0.63103, p-value = 0.00268
## alternative hypothesis: the CDF of x lies below that of y

The median salary of Engineering majors is higher than that of Physical Science majors at the 95% confidence level.

Liberal Arts vs. Psychology and Social Work

The null and alternative hypotheses are similar to those above, with Liberal Arts and Psychology & Social Work as the major categories.

boxplot(all_ages_la$Median, all_ages_psy$Median, names = c("Liberal Arts", "Psychology & Social Work"), ylab = "Median Salary USD")

t.test(all_ages_la$Median, all_ages_psy$Median, alternative = "two.sided")

## 
##  Welch Two Sample t-test
## 
## data:  all_ages_la$Median and all_ages_psy$Median
## t = 0.51776, df = 9.9244, p-value = 0.616
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -4405.897  7069.785
## sample estimates:
## mean of x mean of y 
##  45887.50  44555.56

ks.test(all_ages_la$Median, all_ages_psy$Median, alternative = "two.sided") #KS test has opposite sign convention than t test

## Warning in ks.test(all_ages_la$Median, all_ages_psy$Median, alternative =
## "two.sided"): cannot compute exact p-value with ties

## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  all_ages_la$Median and all_ages_psy$Median
## D = 0.31944, p-value = 0.5992
## alternative hypothesis: two-sided

As is, there is no statistically significant difference between Liberal Arts and Psychology & Social Work in either test. However, the Industrial and Organizational Psychology major is a high outlier. Below, we remove this outlier major and perform the analysis again.

psy_no_outl <- all_ages_psy %>% dplyr::select(Median) %>% filter(Median != max(Median)) #this removed the high outlier.
boxplot(all_ages_la$Median, psy_no_outl$Median, names = c("Liberal Arts", "Psychology & Social Work"), ylab = "Median Salary USD")

t.test(all_ages_la$Median, psy_no_outl$Median, alternative = "greater")

## 
##  Welch Two Sample t-test
## 
## data:  all_ages_la$Median and psy_no_outl$Median
## t = 2.3704, df = 13.661, p-value = 0.01653
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  897.9388      Inf
## sample estimates:
## mean of x mean of y 
##   45887.5   42375.0

ks.test(all_ages_la$Median, psy_no_outl$Median, alternative = "less") #KS test has opposite sign convention than t test

## Warning in ks.test(all_ages_la$Median, psy_no_outl$Median, alternative =
## "less"): cannot compute exact p-value with ties

## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  all_ages_la$Median and psy_no_outl$Median
## D^- = 0.375, p-value = 0.2231
## alternative hypothesis: the CDF of x lies below that of y

shapiro.test(all_ages_la$Median) #The KS test disagreed with the t-test on significance, so a normality test is performed as a tie breaker.

## 
##  Shapiro-Wilk normality test
## 
## data:  all_ages_la$Median
## W = 0.91736, p-value = 0.1529

shapiro.test(psy_no_outl$Median) #The t-test assumes Normally distributed populations; its t distribution approaches the Normal distribution at high 'n'

## 
##  Shapiro-Wilk normality test
## 
## data:  psy_no_outl$Median
## W = 0.86167, p-value = 0.1248

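Upon removing the outlier from the Psychology and Social Work set, the t-test finds a significant difference (p = 0.01653), but the KS test does not (p = 0.2231). The Shapiro-Wilk test fails to reject its null hypothesis, that the data come from a Normally distributed population, for either group. Since the t-test’s normality assumption is therefore not contradicted, and the t-test is more powerful than the KS test when that assumption holds, we give more weight to the t-test result. With so few majors in each category, however, the Shapiro-Wilk test has limited power to detect non-normality, so this conclusion should be treated with some caution.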

The median salary of Liberal Arts majors is greater than that of Psychology and Social Work majors at the 95% confidence level, once the outlying major in Psychology and Social Work is removed.
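The claim that the t-test has more power than the KS test when the data are Normal can be checked with a small Monte Carlo sketch in R. The group sizes (16 and 8), the one-standard-deviation shift, the seed, and the 1,000 replications are illustrative assumptions, not values taken from the salary data.

```r
# Monte Carlo power comparison under normality (illustrative settings)
set.seed(606)
n_x <- 16; n_y <- 8; shift <- 1; reps <- 1000

# Each replicate returns whether each one-sided test rejects at the 5% level
reject <- replicate(reps, {
  x <- rnorm(n_x, mean = shift)   # group with the higher mean
  y <- rnorm(n_y, mean = 0)
  c(t  = t.test(x, y, alternative = "greater")$p.value < 0.05,
    ks = ks.test(x, y, alternative = "less")$p.value < 0.05)
})

# Estimated power: proportion of simulated data sets in which each test rejects
est_power <- rowMeans(reject)
est_power
```

Under these assumptions the t-test rejects noticeably more often, which is the sense in which it is the more powerful test for Normal data; the advantage can shrink or reverse for strongly skewed data.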

Engineering vs Liberal Arts

This time we repeat the same Null and Alternative hypotheses with Engineering and Liberal Arts.

boxplot(all_ages_eng$Median, all_ages_la$Median, names = c("Engineering", "Liberal Arts"), ylab = "Median Salary USD")

t.test(all_ages_eng$Median, all_ages_la$Median, alternative = "greater")

## 
##  Welch Two Sample t-test
## 
## data:  all_ages_eng$Median and all_ages_la$Median
## t = 11.611, df = 33.361, p-value = 1.455e-13
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  27227.15      Inf
## sample estimates:
## mean of x mean of y 
##  77758.62  45887.50

ks.test(all_ages_eng$Median, all_ages_la$Median, alternative = "less") #KS test has opposite sign convention than t test

## Warning in ks.test(all_ages_eng$Median, all_ages_la$Median, alternative =
## "less"): cannot compute exact p-value with ties

## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  all_ages_eng$Median and all_ages_la$Median
## D^- = 1, p-value = 1.106e-09
## alternative hypothesis: the CDF of x lies below that of y

The median salary of Engineering majors is higher than that of Liberal Arts majors at the 95% confidence level.

Engineering vs Psychology & Social Work

boxplot(all_ages_eng$Median, all_ages_psy$Median, names = c("Engineering", "Psychology & Social Work"), ylab = "Median Salary USD")

t.test(all_ages_eng$Median, all_ages_psy$Median, alternative = "greater")

## 
##  Welch Two Sample t-test
## 
## data:  all_ages_eng$Median and all_ages_psy$Median
## t = 9.2964, df = 26.898, p-value = 3.437e-10
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  27118.78      Inf
## sample estimates:
## mean of x mean of y 
##  77758.62  44555.56

ks.test(all_ages_eng$Median, all_ages_psy$Median, alternative = "less") #KS test has opposite sign convention than t test

## Warning in ks.test(all_ages_eng$Median, all_ages_psy$Median, alternative =
## "less"): cannot compute exact p-value with ties

## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  all_ages_eng$Median and all_ages_psy$Median
## D^- = 0.93103, p-value = 6.74e-06
## alternative hypothesis: the CDF of x lies below that of y
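
The median salary of Engineering majors is higher than that of Psychology & Social Work majors at the 95% confidence level.
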
Liberal Arts vs Physical Sciences

This time we repeat the same Null and Alternative hypotheses with Liberal Arts and Physical Sciences.

boxplot(all_ages_la$Median, all_ages_sci$Median, names = c("Liberal Arts", "Physical Sciences"), ylab = "Median Salary USD")

t.test(all_ages_la$Median, all_ages_sci$Median, alternative = "less")

## 
##  Welch Two Sample t-test
## 
## data:  all_ages_la$Median and all_ages_sci$Median
## t = -6.4755, df = 11.198, p-value = 2.107e-05
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##       -Inf -11940.38
## sample estimates:
## mean of x mean of y 
##   45887.5   62400.0

ks.test(all_ages_la$Median, all_ages_sci$Median, alternative = "greater") #KS test has opposite sign convention than t test

## Warning in ks.test(all_ages_la$Median, all_ages_sci$Median, alternative =
## "greater"): cannot compute exact p-value with ties

## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  all_ages_la$Median and all_ages_sci$Median
## D^+ = 1, p-value = 4.517e-06
## alternative hypothesis: the CDF of x lies above that of y

Physical Science median salary is higher than Liberal Arts median salary at the 95% confidence level.

Psychology & Social Work vs Physical Sciences

This time we repeat the same null and alternative hypotheses with Psychology & Social Work majors and Physical Sciences majors.

boxplot(all_ages_psy$Median, all_ages_sci$Median, names = c("Psychology & Social Work", "Physical Sciences"), ylab = "Median Salary USD")

t.test(all_ages_psy$Median, all_ages_sci$Median, alternative = "less")

## 
##  Welch Two Sample t-test
## 
## data:  all_ages_psy$Median and all_ages_sci$Median
## t = -5.2115, df = 16.92, p-value = 3.582e-05
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##      -Inf -11886.3
## sample estimates:
## mean of x mean of y 
##  44555.56  62400.00

ks.test(all_ages_psy$Median, all_ages_sci$Median, alternative = "greater") #KS test has opposite sign convention than t test

## Warning in ks.test(all_ages_psy$Median, all_ages_sci$Median, alternative =
## "greater"): cannot compute exact p-value with ties

## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  all_ages_psy$Median and all_ages_sci$Median
## D^+ = 0.88889, p-value = 0.0005612
## alternative hypothesis: the CDF of x lies above that of y

Median salary of Physical Science majors is higher than that of Psychology & Social Work majors at the 95% confidence level. Note that the Industrial and Organizational Psychology major, a high outlier in the Psychology & Social Work category, has a median salary near the median salary of the Physical Sciences category.

Summary of T-tests and KS tests

In terms of median pay the ranking is as follows:

Engineering
Physical Science
Liberal Arts
Psychology and Social Work

Additionally, the Industrial and Organizational Psychology major is similar in pay to Physical Sciences.

Linear Model - Unemployment rate vs median salary

Job market pressure can affect both median salary and unemployment rate. If a field has low demand but high supply, this can depress salaries and increase the unemployment rate. Conversely, a high-demand/low-supply field will see higher salaries and lower unemployment rates. Another effect to consider is that people in an over-subscribed field may spend more time looking for a job, which would also decrease median salary, since they may be unemployed or underemployed during the job hunt. These effects could show up in the data as a correlation between unemployment rate and salary.

To test whether there is a connection between unemployment rate and median salary, we will take the “all_ages” data set and fit linear regression models. If the residuals of the model do not show the Normal distribution and constant variance that linear regression requires, we will use a Box-Cox transformation to estimate a power λ for the response (median salary) that improves the model.
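The Box-Cox step chooses λ by maximizing a profile log-likelihood over candidate powers; MASS::boxcox(), used below, profiles the log-likelihood over a grid of λ values in the same spirit. The following base-R sketch implements that profile for a simple regression on synthetic data (not the survey data); the function name boxcox_loglik and the data-generating settings are hypothetical.

```r
# Profile log-likelihood (up to an additive constant) of the Box-Cox power
# lambda for a simple regression y ~ x; y must be strictly positive
boxcox_loglik <- function(y, x, lambda) {
  n <- length(y)
  # Box-Cox transform, with log(y) as the limiting case lambda = 0
  yt <- if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
  rss <- sum(resid(lm(yt ~ x))^2)
  # The Jacobian term (lambda - 1) * sum(log(y)) makes lambdas comparable
  -n / 2 * log(rss / n) + (lambda - 1) * sum(log(y))
}

# Synthetic data (not the survey data): 1 / y is linear in x plus noise
set.seed(1)
x <- runif(50, 0, 1)
y <- 1 / (1e-5 + 3e-5 * x + rnorm(50, sd = 1e-6))

lambda_grid <- seq(-2, 2, by = 0.1)
ll <- sapply(lambda_grid, function(l) boxcox_loglik(y, x, l))
best_lambda <- lambda_grid[which.max(ll)]
best_lambda
```

The models below regress Median^λ directly rather than (Median^λ − 1)/λ; the two differ only by a linear rescaling, but for a negative λ (the estimate below is about −1.07) Median^λ decreases as Median increases. The positive slope in fit2 therefore corresponds to higher unemployment being associated with lower median salary, consistent with the negative slope in fit1.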

fit1<-lm(all_ages$Median ~ all_ages$Unemployment_rate)
summary(fit1)

## 
## Call:
## lm(formula = all_ages$Median ~ all_ages$Unemployment_rate)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -23370  -8995  -3272   8079  64676 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   70097       3380  20.738  < 2e-16 ***
## all_ages$Unemployment_rate  -231551      55906  -4.142 5.41e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14060 on 171 degrees of freedom
## Multiple R-squared:  0.09117,    Adjusted R-squared:  0.08586 
## F-statistic: 17.15 on 1 and 171 DF,  p-value: 5.406e-05

ggplot(all_ages, aes(x = Unemployment_rate, y = Median)) +
  geom_point(color = 'blue')+
  geom_smooth(method = "lm", formula = y~x)

hist(resid(fit1))

plot(fitted(fit1), resid(fit1))

myt <- boxcox(fit1)

myt_df <- as.data.frame(myt)
optimal_lambda = myt_df[which.max(myt$y),1] #syntax from https://rpubs.com/FelipeRego/SimpleLinearRegression
optimal_lambda

## [1] -1.070707

fit2 <- lm(all_ages$Median^optimal_lambda ~ all_ages$Unemployment_rate)
summary(fit2)

## 
## Call:
## lm(formula = all_ages$Median^optimal_lambda ~ all_ages$Unemployment_rate)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -4.646e-06 -1.475e-06  2.614e-07  1.295e-06  5.141e-06 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                6.755e-06  4.666e-07  14.476  < 2e-16 ***
## all_ages$Unemployment_rate 3.272e-05  7.718e-06   4.239 3.66e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.941e-06 on 171 degrees of freedom
## Multiple R-squared:  0.09509,    Adjusted R-squared:  0.0898 
## F-statistic: 17.97 on 1 and 171 DF,  p-value: 3.664e-05

hist(resid(fit2))

plot(fitted(fit2), resid(fit2))

qqnorm(resid(fit2))
qqline(resid(fit2))

all_ages <- all_ages %>% mutate(transMedian  = Median^optimal_lambda)
head(all_ages)

## # A tibble: 6 x 12
##   Major_code                                 Major
##        <int>                                 <chr>
## 1       1100                   GENERAL AGRICULTURE
## 2       1101 AGRICULTURE PRODUCTION AND MANAGEMENT
## 3       1102                AGRICULTURAL ECONOMICS
## 4       1103                       ANIMAL SCIENCES
## 5       1104                          FOOD SCIENCE
## 6       1105            PLANT SCIENCE AND AGRONOMY
## # ... with 10 more variables: Major_category <chr>, Total <int>,
## #   Employed <int>, Employed_full_time_year_round <int>, Unemployed <int>,
## #   Unemployment_rate <dbl>, Median <int>, P25th <int>, P75th <dbl>,
## #   transMedian <dbl>

ggplot(all_ages, aes(x = Unemployment_rate, y = transMedian)) +
  geom_point(color = 'blueviolet')+
  geom_smooth(method = "lm", formula = y~x)

# Check whether the correlation is driven by outliers: drop the major with the highest unemployment rate and the majors with zero unemployment
all_ages_no_outlr <- all_ages %>% filter(Unemployment_rate != max(Unemployment_rate) & Unemployment_rate != 0)
fit3 <- lm(all_ages_no_outlr$transMedian ~ all_ages_no_outlr$Unemployment_rate)
summary(fit3)

## 
## Call:
## lm(formula = all_ages_no_outlr$transMedian ~ all_ages_no_outlr$Unemployment_rate)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -4.616e-06 -1.475e-06  1.990e-07  1.290e-06  5.153e-06 
## 
## Coefficients:
##                                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)                         6.611e-06  5.382e-07  12.285  < 2e-16
## all_ages_no_outlr$Unemployment_rate 3.539e-05  8.999e-06   3.933 0.000122
##                                        
## (Intercept)                         ***
## all_ages_no_outlr$Unemployment_rate ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.949e-06 on 168 degrees of freedom
## Multiple R-squared:  0.08432,    Adjusted R-squared:  0.07887 
## F-statistic: 15.47 on 1 and 168 DF,  p-value: 0.0001225

hist(resid(fit3))

plot(fitted(fit3), resid(fit3))

qqnorm(resid(fit3))
qqline(resid(fit3))

ggplot(all_ages_no_outlr, aes(x = Unemployment_rate, y = transMedian)) +
  geom_point(color = 'firebrick')+
  geom_smooth(method = "lm", formula = y~x)

# We can also see what Majors have the most and least unemployment
mjr_umploy <- all_ages %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)
head(mjr_umploy, 10)

## # A tibble: 10 x 2
##                                         Major Unemployment_rate
##                                         <chr>             <dbl>
##  1 EDUCATIONAL ADMINISTRATION AND SUPERVISION        0.00000000
##  2     GEOLOGICAL AND GEOPHYSICAL ENGINEERING        0.00000000
##  3                               PHARMACOLOGY        0.01611080
##  4                          MATERIALS SCIENCE        0.02233333
##  5           MATHEMATICS AND COMPUTER SCIENCE        0.02490040
##  6                        GENERAL AGRICULTURE        0.02614711
##  7              TREATMENT THERAPY PROFESSIONS        0.02629160
##  8                                    NURSING        0.02679682
##  9      AGRICULTURE PRODUCTION AND MANAGEMENT        0.02863606
## 10                     AGRICULTURAL ECONOMICS        0.03024832

tail(mjr_umploy, 10)

## # A tibble: 10 x 2
##                                       Major Unemployment_rate
##                                       <chr>             <dbl>
##  1                             ARCHITECTURE        0.08599113
##  2               ASTRONOMY AND ASTROPHYSICS        0.08602150
##  3                        SOCIAL PSYCHOLOGY        0.08733625
##  4 COMPUTER PROGRAMMING AND DATA PROCESSING        0.09026422
##  5               VISUAL AND PERFORMING ARTS        0.09465800
##  6                          LIBRARY SCIENCE        0.09484299
##  7                SCHOOL STUDENT COUNSELING        0.10174594
##  8                    MILITARY TECHNOLOGIES        0.10179641
##  9                      CLINICAL PSYCHOLOGY        0.10271216
## 10                  MISCELLANEOUS FINE ARTS        0.15614749

mjr_salary <- all_ages %>% dplyr::select(Major, Median) %>% arrange(Median)
head(mjr_salary, 10)

## # A tibble: 10 x 2
##                                        Major Median
##                                        <chr>  <int>
##  1                              NEUROSCIENCE  35000
##  2                 EARLY CHILDHOOD EDUCATION  35300
##  3                               STUDIO ARTS  37600
##  4 HUMAN SERVICES AND COMMUNITY ORGANIZATION  38000
##  5                     COUNSELING PSYCHOLOGY  39000
##  6                VISUAL AND PERFORMING ARTS  40000
##  7                      ELEMENTARY EDUCATION  40000
##  8        TEACHER EDUCATION: MULTIPLE LEVELS  40000
##  9                           LIBRARY SCIENCE  40000
## 10                  COMPOSITION AND RHETORIC  40000

tail(mjr_salary, 10)

## # A tibble: 10 x 2
##                                                  Major Median
##                                                  <chr>  <int>
##  1              GEOLOGICAL AND GEOPHYSICAL ENGINEERING  85000
##  2                                CHEMICAL ENGINEERING  86000
##  3                              ELECTRICAL ENGINEERING  88000
##  4                    MATHEMATICS AND COMPUTER SCIENCE  92000
##  5                      MINING AND MINERAL ENGINEERING  92000
##  6                                 NUCLEAR ENGINEERING  95000
##  7                           METALLURGICAL ENGINEERING  96000
##  8           NAVAL ARCHITECTURE AND MARINE ENGINEERING  97000
##  9 PHARMACY PHARMACEUTICAL SCIENCES AND ADMINISTRATION 106000
## 10                               PETROLEUM ENGINEERING 125000

The residuals of the untransformed model initially met the regression conditions only marginally. The Box-Cox transformation made the residuals approximately Normal and homoskedastic, so in that regard the transformed model is better suited to making predictions. Both the untransformed model's slope of -231551 and the transformed model's positive slope show that unemployment rate and median salary are inversely related. The Box-Cox exponent of -1.07 is negative, so Median$^{-1.07}$ shrinks as Median grows. A positive slope on the transformed scale therefore means lower salaries at higher unemployment rates. In other words, majors with low unemployment rates tend to have higher median salaries, and majors with high unemployment rates tend to have lower ones. This relationship is statistically significant, with a p-value of 0.0001225, even after the influential outliers were removed. However, the effect is weak, with an R$^2$ of 0.08432 after the outliers are removed. This means that only about 8.4% of the variability in median salary can be explained by unemployment rate.
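The transformed model predicts Median$^\lambda$ rather than Median, so its slope is hard to read directly. The sketch below back-transforms fit2's predictions into dollars. It assumes the rounded coefficients printed by `summary(fit2)` above; when rerunning the analysis, `coef(fit2)` and `optimal_lambda` should be used instead. The helper `predict_median_fit2` is hypothetical and was not part of the original analysis.

```r
# Coefficients as printed by summary(fit2) above (rounded).
# When rerunning the analysis, take them from coef(fit2) and optimal_lambda.
lambda    <- -1.070707   # optimal_lambda from boxcox(fit1)
intercept <- 6.755e-06   # (Intercept)
slope     <- 3.272e-05   # all_ages$Unemployment_rate

# Hypothetical helper: fit2 models Median^lambda as a linear function of the
# unemployment rate, so Median = (intercept + slope * rate)^(1 / lambda).
predict_median_fit2 <- function(rate) {
  (intercept + slope * rate)^(1 / lambda)
}

rates <- c(0.02, 0.05, 0.10)
pred <- predict_median_fit2(rates)
print(data.frame(Unemployment_rate = rates, Predicted_median = round(pred)))
```

The slope on the transformed scale is positive, yet the back-transformed medians fall as the unemployment rate rises. This is the inverse relationship described above. With an R$^2$ of about 0.08, individual majors scatter widely around these predicted values.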

We suggest that students researching the prospects of college majors treat unemployment rates and salary statistics separately, because the two are only weakly related. Do not rely only on advice like “You’ll make a mint in this field” or “They’re hiring a lot of people in that field.” Accruing $100,000 in debt does a student no good in either of two cases. The first is a near-guaranteed job that cannot pay the debt off. The second is a field whose salaries could pay it off, but where the chances of being hired are small.

Linear Model- Gender Wage Gap

The wage gap between men and women in the labor force is well documented⁹. Millennials have also been noted for redefining gender roles¹⁰. The recent grads data set includes the number of men and women earning degrees in each major. Millennials are defined as those between the ages of 14 and 34 at the time of this writing, and they are represented in the recent graduates data set. Some prospective college students may want to choose majors with high gender inequity in order to correct those inequities through positive action. We analyze the data to that end, so that prospective students can make an informed decision.

To define male-majority and female-majority majors, we must account for the fact that 57% of college students are female¹¹. A gender-balanced major would therefore be 57% female and 43% male, which mirrors the underlying student population. We use a band of $\pm 10\%$ around this baseline as the threshold for gender imbalance. Majors that are at least 67% female are treated as majority female, and majors that are at most 47% female are treated as majority male.
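The threshold rule can be written as a small labeling function. This is a minimal sketch: `classify_gender_balance` is a hypothetical helper. It uses the 0.47 and 0.67 cutoffs directly rather than computing them from 0.57 ± 0.10, because floating-point subtraction would put the boundary slightly off 0.47.

```r
# Hypothetical helper: label majors by ShareWomen using the thresholds above.
# ShareWomen <= 0.47 is majority male and >= 0.67 is majority female, the same
# cutoffs used later in filter() for male_major_salary and female_major_salary.
classify_gender_balance <- function(share_women, male_max = 0.47, female_min = 0.67) {
  # A missing ShareWomen (e.g. FOOD SCIENCE) yields NA
  ifelse(share_women <= male_max, "Majority Male",
         ifelse(share_women >= female_min, "Majority Female", "Balanced"))
}

# ShareWomen for two majors from the tables below, plus a hypothetical
# major sitting exactly at the 57% baseline.
shares <- c(MECHANICAL_ENGINEERING = 0.11955890,
            NURSING = 0.8960190,
            HYPOTHETICAL_BASELINE = 0.57)
print(classify_gender_balance(shares))
```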

We begin by identifying the most gender-unequal majors. We then use t-tests and Kolmogorov-Smirnov (KS) tests to verify the gender gap in pay.

# FOOD SCIENCE is dropped because its ShareWomen value is missing (NA)
gend_rct_grad <- rct_grad %>% dplyr::select(Major, ShareWomen, Median) %>% filter(Major != "FOOD SCIENCE") %>% arrange(ShareWomen)
head(gend_rct_grad, 10)

## # A tibble: 10 x 3
##                                          Major ShareWomen Median
##                                          <chr>      <dbl>  <int>
##  1                       MILITARY TECHNOLOGIES 0.00000000  40000
##  2 MECHANICAL ENGINEERING RELATED TECHNOLOGIES 0.07745303  40000
##  3                       CONSTRUCTION SERVICES 0.09071251  50000
##  4              MINING AND MINERAL ENGINEERING 0.10185185  75000
##  5   NAVAL ARCHITECTURE AND MARINE ENGINEERING 0.10731320  70000
##  6                      MECHANICAL ENGINEERING 0.11955890  60000
##  7                       PETROLEUM ENGINEERING 0.12056434 110000
##  8    TRANSPORTATION SCIENCES AND TECHNOLOGIES 0.12495049  35000
##  9                                    FORESTRY 0.12503465  35000
## 10                       AEROSPACE ENGINEERING 0.13979280  60000

tail(gend_rct_grad, 10)

## # A tibble: 10 x 3
##                                            Major ShareWomen Median
##                                            <chr>      <dbl>  <int>
##  1      MISCELLANEOUS HEALTH MEDICAL PROFESSIONS  0.8812939  36000
##  2                                       NURSING  0.8960190  48000
##  3                                   SOCIAL WORK  0.9040745  30000
##  4     HUMAN SERVICES AND COMMUNITY ORGANIZATION  0.9055899  30000
##  5                       SPECIAL NEEDS EDUCATION  0.9066773  35000
##  6                  FAMILY AND CONSUMER SCIENCES  0.9109326  30000
##  7                          ELEMENTARY EDUCATION  0.9237455  32000
##  8                    MEDICAL ASSISTING SERVICES  0.9278072  42000
##  9 COMMUNICATION DISORDERS SCIENCES AND SERVICES  0.9679981  28000
## 10                     EARLY CHILDHOOD EDUCATION  0.9689537  28000

male_major_salary <- rct_grad %>% dplyr::select(Major, ShareWomen, Median) %>% filter(ShareWomen <= 0.47)
female_major_salary <- rct_grad %>% dplyr::select(Major, ShareWomen, Median) %>% filter(ShareWomen >= 0.67)
boxplot(male_major_salary$Median, female_major_salary$Median, names = c("Majority Male","Majority Female"), ylab = "Median Salary USD")

t.test(male_major_salary$Median, female_major_salary$Median, alternative = "greater")

## 
##  Welch Two Sample t-test
## 
## data:  male_major_salary$Median and female_major_salary$Median
## t = 8.4657, df = 97.588, p-value = 1.309e-13
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  11758.75      Inf
## sample estimates:
## mean of x mean of y 
##  47898.57  33270.37

ks.test(male_major_salary$Median, female_major_salary$Median, alternative = "less") # ks.test's alternative refers to CDFs: "less" means the CDF of x lies below that of y, i.e. x tends to be larger

## Warning in ks.test(male_major_salary$Median, female_major_salary$Median, :
## cannot compute exact p-value with ties

## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  male_major_salary$Median and female_major_salary$Median
## D^- = 0.69894, p-value = 1.161e-13
## alternative hypothesis: the CDF of x lies below that of y

eng_sci <- bind_rows(rct_eng, rct_sci)
la_ssc <- bind_rows(rct_la, rct_ssc)
boxplot(eng_sci$ShareWomen, la_ssc$ShareWomen, names = c("STEM", "L.A. & Social Work"), ylab = "% Women")

t.test(eng_sci$ShareWomen, la_ssc$ShareWomen, alternative = "less")

## 
##  Welch Two Sample t-test
## 
## data:  eng_sci$ShareWomen and la_ssc$ShareWomen
## t = -7.7065, df = 56.33, p-value = 1.136e-10
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##        -Inf -0.2358929
## sample estimates:
## mean of x mean of y 
## 0.3080668 0.6093368

ks.test(eng_sci$ShareWomen, la_ssc$ShareWomen, alternative = "greater")

## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  eng_sci$ShareWomen and la_ssc$ShareWomen
## D^+ = 0.74359, p-value = 4.825e-08
## alternative hypothesis: the CDF of x lies above that of y

Majority-male majors have higher median salaries than majority-female majors at the 95% confidence level. Engineering and physical science majors have a lower share of women than Liberal Arts, Psychology & Social Work majors, also at the 95% confidence level. We now build a linear regression model to understand this phenomenon further.
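The sketch below recomputes the one-sided Welch statistic, its Welch-Satterthwaite degrees of freedom and its p-value by hand, then compares them with `t.test()`. This makes the Welch test above less of a black box. The data are only the twenty majors printed in the `head()`/`tail()` tables of `gend_rct_grad`, not the full `male_major_salary` and `female_major_salary` sets, so the numbers will not match the output above. `welch_t` is a hypothetical helper.

```r
# Median salaries of the ten most male and ten most female recent-grad majors,
# copied from the head()/tail() tables above (a subset, for illustration only).
male_med   <- c(40000, 40000, 50000, 75000, 70000, 60000, 110000, 35000, 35000, 60000)
female_med <- c(36000, 48000, 30000, 30000, 35000, 30000, 32000, 42000, 28000, 28000)

welch_t <- function(x, y) {
  vx <- var(x) / length(x)   # squared standard error of mean(x)
  vy <- var(y) / length(y)   # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  # One-sided p-value for the alternative mean(x) > mean(y)
  p <- pt(t_stat, df, lower.tail = FALSE)
  c(t = t_stat, df = df, p = p)
}

manual  <- welch_t(male_med, female_med)
builtin <- t.test(male_med, female_med, alternative = "greater")
print(manual)
print(c(builtin$statistic, builtin$parameter, p = builtin$p.value))
```

Unlike the pooled two-sample t-test, the Welch version does not assume equal variances. The majority-male group's salaries are much more spread out, so this is the safer default.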

fit4 <- lm(rct_grad$Median ~ rct_grad$ShareWomen)
summary(fit4)

## 
## Call:
## lm(formula = rct_grad$Median ~ rct_grad$ShareWomen)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -17261  -5474  -1007   3502  57604 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            56093       1705   32.90   <2e-16 ***
## rct_grad$ShareWomen   -30670       2987  -10.27   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9031 on 170 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.3828, Adjusted R-squared:  0.3791 
## F-statistic: 105.4 on 1 and 170 DF,  p-value: < 2.2e-16

hist(resid(fit4))

plot(resid(fit4)~fitted(fit4))

qqnorm(resid(fit4))
qqline(resid(fit4))

# An outlier is affecting our linear regression. A Box-Cox transformation will be used to correct for it.
myt2 <- boxcox(fit4)

myt2_df <- as.data.frame(myt2)
optimal_lambda2 = myt2_df[which.max(myt2$y),1] #syntax from https://rpubs.com/FelipeRego/SimpleLinearRegression
optimal_lambda2

## [1] -0.9494949

fit5 <- lm(rct_grad$Median^optimal_lambda2 ~ rct_grad$ShareWomen)
summary(fit5)

## 
## Call:
## lm(formula = rct_grad$Median^optimal_lambda2 ~ rct_grad$ShareWomen)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -1.976e-05 -4.822e-06  1.294e-07  4.417e-06  2.016e-05 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         3.049e-05  1.414e-06   21.57   <2e-16 ***
## rct_grad$ShareWomen 2.810e-05  2.476e-06   11.35   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.487e-06 on 170 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.431,  Adjusted R-squared:  0.4276 
## F-statistic: 128.8 on 1 and 170 DF,  p-value: < 2.2e-16

hist(resid(fit5))

plot(resid(fit5)~fitted(fit5))

qqnorm(resid(fit5))
qqline(resid(fit5))

rct_grad_gend <- rct_grad %>% mutate(transMedian = Median^optimal_lambda2) %>% filter(Major != "FOOD SCIENCE")
ggplot(rct_grad_gend, aes(x = ShareWomen, y = transMedian)) +
  geom_point(color = 'firebrick')+
  geom_smooth(method = "lm", formula = y~x)

The power used to transform the median salary, -0.9494949, is negative. The positive slope in the graph therefore indicates an inverse relationship between the share of women in a major and its median salary. The p-value of the slope is much less than 0.05, so the share of women in a major is significantly related to median salary at the 95% confidence level. Furthermore, the R$^2$ value of 0.431 is high for a social science study. The residuals are also approximately Normal with constant variance, so this linear regression is suitable for prediction for recent college graduates in the US.
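The sign flip follows from a short derivation. It assumes the power model fitted in fit5, with $x$ the share of women and the rounded coefficients from `summary(fit5)`:

$$
\text{Median}^{\lambda} = \beta_0 + \beta_1 x
\quad\Longrightarrow\quad
\text{Median} = (\beta_0 + \beta_1 x)^{1/\lambda},
\qquad
\frac{d\,\text{Median}}{dx} = \frac{\beta_1}{\lambda}\,(\beta_0 + \beta_1 x)^{1/\lambda - 1}.
$$

Since $\beta_0 + \beta_1 x = \text{Median}^{\lambda} > 0$, the power term is positive, and the derivative has the sign of $\beta_1/\lambda$. Here $\beta_1 = 2.810 \times 10^{-5} > 0$ and $\lambda = -0.949 < 0$, so median salary decreases as the share of women increases. Plugging in $\beta_0 = 3.049 \times 10^{-5}$ gives approximately $(3.611 \times 10^{-5})^{1/\lambda} \approx \$47{,}700$ at $x = 0.2$ and $(5.297 \times 10^{-5})^{1/\lambda} \approx \$31{,}900$ at $x = 0.8$.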

Under-employment

Unemployment rate gives an incomplete picture of the prospects for a given major. Students may also be concerned about being stuck in a low-paying job¹². Recent college graduates being stuck in low-paying retail and food-service jobs has been widely documented in recent years. There is some debate over whether this phenomenon is a real long-term concern, or just a result of the Great Recession or a normal part of personal growth¹³ ¹⁴ ¹⁵. Regardless, we highlight majors with high rates of graduates in low-wage jobs or in jobs that do not require a degree.

# Low-wage job rate: graduates in low-wage jobs as a fraction of all graduates in the major (Total), not of those employed
rct_grad_underemp <- rct_grad %>% mutate(Under_emp_rate = Low_wage_jobs/Total) %>% dplyr::select(Major, Under_emp_rate) %>% arrange(Under_emp_rate) %>% filter(Major != "FOOD SCIENCE") #Food science returns NA
hist(rct_grad_underemp$Under_emp_rate,  main =  "Histogram of Recent Graduate Low Wage Job Rate", 
     xlab = "Low Wage Job Rate")

head(rct_grad_underemp, 10)

## # A tibble: 10 x 2
##                                          Major Under_emp_rate
##                                          <chr>          <dbl>
##  1                                SOIL SCIENCE     0.00000000
##  2                   SCHOOL STUDENT COUNSELING     0.00000000
##  3                   METALLURGICAL ENGINEERING     0.00000000
##  4   NAVAL ARCHITECTURE AND MARINE ENGINEERING     0.00000000
##  5                       MILITARY TECHNOLOGIES     0.00000000
##  6                           MATERIALS SCIENCE     0.01892966
##  7                   MISCELLANEOUS AGRICULTURE     0.02083333
##  8 MATERIALS ENGINEERING AND MATERIALS SCIENCE     0.02338791
##  9                        COMPUTER ENGINEERING     0.02359058
## 10         OPERATIONS LOGISTICS AND E-COMMERCE     0.02429253

tail(rct_grad_underemp, 10)

## # A tibble: 10 x 2
##                                     Major Under_emp_rate
##                                     <chr>          <dbl>
##  1                        LIBRARY SCIENCE      0.1748634
##  2            ANTHROPOLOGY AND ARCHEOLOGY      0.1767583
##  3               COMPOSITION AND RHETORIC      0.1828734
##  4                      PHYSICAL SCIENCES      0.1873259
##  5                 HOSPITALITY MANAGEMENT      0.2076431
##  6                            STUDIO ARTS      0.2112270
##  7                    CLINICAL PSYCHOLOGY      0.2191684
##  8                MISCELLANEOUS FINE ARTS      0.2260479
##  9                 DRAMA AND THEATER ARTS      0.2559134
## 10 COSMETOLOGY SERVICES AND CULINARY ARTS      0.3009515

# Non-degree job rate: graduates in jobs that don't require a degree, also as a fraction of Total
rct_grad_nodgr <- rct_grad %>% mutate(No_degree_rate = Non_college_jobs/Total) %>% dplyr::select(Major, No_degree_rate) %>% arrange(No_degree_rate) %>% filter(Major != "FOOD SCIENCE") #Food science returns NA
hist(rct_grad_nodgr$No_degree_rate,  main =  "Histogram of Recent Graduates % in Non-degree Jobs", 
     xlab = "Non-degree Job Rate")

head(rct_grad_nodgr, 10)

## # A tibble: 10 x 2
##                                          Major No_degree_rate
##                                          <chr>          <dbl>
##  1                       MILITARY TECHNOLOGIES     0.00000000
##  2      GEOLOGICAL AND GEOPHYSICAL ENGINEERING     0.06944444
##  3   NAVAL ARCHITECTURE AND MARINE ENGINEERING     0.08108108
##  4                           ACTUARIAL SCIENCE     0.08313476
##  5                           MATERIALS SCIENCE     0.09137649
##  6 MATERIALS ENGINEERING AND MATERIALS SCIENCE     0.10190444
##  7            MATHEMATICS AND COMPUTER SCIENCE     0.11001642
##  8                                     NURSING     0.12486509
##  9                     SPECIAL NEEDS EDUCATION     0.13212012
## 10                      ELECTRICAL ENGINEERING     0.13337913

tail(rct_grad_nodgr, 10)

## # A tibble: 10 x 2
##                                                                Major
##                                                                <chr>
##  1                                INDUSTRIAL PRODUCTION TECHNOLOGIES
##  2                                                       CRIMINOLOGY
##  3                                  FILM VIDEO AND PHOTOGRAPHIC ARTS
##  4                                            HOSPITALITY MANAGEMENT
##  5                              CRIMINAL JUSTICE AND FIRE PROTECTION
##  6                                            DRAMA AND THEATER ARTS
##  7                                        MEDICAL ASSISTING SERVICES
##  8 ELECTRICAL, MECHANICAL, AND PRECISION TECHNOLOGIES AND PRODUCTION
##  9        NUCLEAR, INDUSTRIAL RADIOLOGY, AND BIOLOGICAL TECHNOLOGIES
## 10                            COSMETOLOGY SERVICES AND CULINARY ARTS
## # ... with 1 more variables: No_degree_rate <dbl>

Some majors have high rates of employment in jobs that do not require a degree. For prospective students considering these majors, alternatives such as vocational certificates or apprenticeships may be more economically viable than a college degree.

Conclusions

We find that the choice of college major has a significant effect on median salary and unemployment rate, and this effect is seen at all age levels. STEM majors tend to have higher salaries and lower unemployment. The gender balance of a major also has a significant effect on median salary. The STEM and gender findings appear to be interrelated, since STEM majors tend to be majority male. The relationship between unemployment rate and median pay is statistically significant but weak (R$^2$ of about 0.08), so it is not necessarily practically significant.

Future Work

These data represent only a single point in time. Measuring trends is important for prospective college students, who need to predict what the job market will look like when they graduate. These trends may also influence choices about graduate study. It is therefore necessary to repeat these surveys at regular intervals and to add time series analysis to the analysis above.

The graduate student data make no distinction between master's, doctoral, and professional degrees. Adding such a column to future surveys would allow a more detailed analysis of how the level of attainment affects earnings and unemployment rates.

References

Appendix

As they are of interest to this paper, we list plots of median salary by attainment (i.e., All, Recent Grad and Grad Student) and by category (e.g., Humanities, Engineering).
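Each per-category section below repeats the same two calls, so the plots can also be produced in a single loop. The sketch below assumes a data frame with the `Major`, `Major_category` and `Median` columns shown in `all_ages`. `plot_median_by_category` is a hypothetical helper, and the demo rows are copied from the Law and Communications tables below; the category labels are illustrative. Sorting before plotting makes each bar plot follow the same order as its `arrange(Median)` table. The `barplot()` calls below plot rows in their stored order instead.

```r
# Hypothetical helper: one horizontal bar plot of Median per Major_category,
# with majors sorted by Median. Returns the sorted tables invisibly.
plot_median_by_category <- function(df, cex_names = 0.3) {
  by_cat <- split(df[, c("Major", "Median")], df$Major_category)
  sorted <- lapply(by_cat, function(d) d[order(d$Median), ])
  for (category in names(sorted)) {
    d <- sorted[[category]]
    barplot(d$Median, names.arg = d$Major, horiz = TRUE,
            cex.names = cex_names, las = 1,
            main = category, xlab = "Median Salary USD")
  }
  invisible(sorted)
}

# Demo rows copied from the Law and Communications tables below.
demo <- data.frame(
  Major = c("PUBLIC POLICY", "PRE-LAW AND LEGAL STUDIES", "COURT REPORTING",
            "JOURNALISM", "MASS MEDIA"),
  Major_category = c("Law & Public Policy", "Law & Public Policy", "Law & Public Policy",
                     "Communications & Journalism", "Communications & Journalism"),
  Median = c(60000, 48000, 50000, 50000, 48000),
  stringsAsFactors = FALSE
)
tables <- plot_median_by_category(demo)
print(tables)
```

With the real data, the call would be `plot_median_by_category(all_ages)`.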

Median Salary

All Ages

Agriculture

barplot(all_ages_ag$Median, names.arg = all_ages_ag$Major, horiz = TRUE, cex.names = 0.3, las =1)

all_ages_ag %>% dplyr::select(Major, Median) %>% arrange(Median)

## # A tibble: 10 x 2
##                                    Major Median
##                                    <chr>  <int>
##  1                       ANIMAL SCIENCES  46000
##  2                   GENERAL AGRICULTURE  50000
##  3            PLANT SCIENCE AND AGRONOMY  50000
##  4             MISCELLANEOUS AGRICULTURE  52000
##  5          NATURAL RESOURCES MANAGEMENT  52000
##  6 AGRICULTURE PRODUCTION AND MANAGEMENT  54000
##  7                              FORESTRY  58000
##  8                          FOOD SCIENCE  62000
##  9                AGRICULTURAL ECONOMICS  63000
## 10                          SOIL SCIENCE  63000

Arts

barplot(all_ages_art$Median, names.arg = all_ages_art$Major, horiz = TRUE, cex.names = 0.3, las =1)

all_ages_art %>% dplyr::select(Major, Median) %>% arrange(Median)

## # A tibble: 8 x 2
##                               Major Median
##                               <chr>  <int>
## 1                       STUDIO ARTS  37600
## 2        VISUAL AND PERFORMING ARTS  40000
## 3            DRAMA AND THEATER ARTS  42000
## 4                         FINE ARTS  45000
## 5                             MUSIC  45000
## 6           MISCELLANEOUS FINE ARTS  45000
## 7 COMMERCIAL ART AND GRAPHIC DESIGN  46600
## 8  FILM VIDEO AND PHOTOGRAPHIC ARTS  47000

Biological Sciences

barplot(all_ages_bio$Median, names.arg = all_ages_bio$Major, horiz = TRUE, cex.names = 0.3, las =1)

all_ages_bio %>% dplyr::select(Major, Median) %>% arrange(Median)

## # A tibble: 14 x 2
##                                  Major Median
##                                  <chr>  <int>
##  1                        NEUROSCIENCE  35000
##  2                   MOLECULAR BIOLOGY  45000
##  3                             ECOLOGY  47500
##  4                            GENETICS  48000
##  5                              BOTANY  50000
##  6                          PHYSIOLOGY  50000
##  7                             BIOLOGY  51000
##  8               ENVIRONMENTAL SCIENCE  52000
##  9               MISCELLANEOUS BIOLOGY  52000
## 10                BIOCHEMICAL SCIENCES  53000
## 11 COGNITIVE SCIENCE AND BIOPSYCHOLOGY  53000
## 12                             ZOOLOGY  55000
## 13                        MICROBIOLOGY  60000
## 14                        PHARMACOLOGY  60000

Business

barplot(all_ages_bsn$Median, names.arg = all_ages_bsn$Major, horiz = TRUE, cex.names = 0.3, las =1)

all_ages_bsn %>% dplyr::select(Major, Median) %>% arrange(Median)

## # A tibble: 13 x 2
##                                              Major Median
##                                              <chr>  <int>
##  1                          HOSPITALITY MANAGEMENT  49000
##  2 MISCELLANEOUS BUSINESS & MEDICAL ADMINISTRATION  53000
##  3        HUMAN RESOURCES AND PERSONNEL MANAGEMENT  54000
##  4                          INTERNATIONAL BUSINESS  54000
##  5                MARKETING AND MARKETING RESEARCH  56000
##  6          BUSINESS MANAGEMENT AND ADMINISTRATION  58000
##  7                                GENERAL BUSINESS  60000
##  8                                      ACCOUNTING  65000
##  9             OPERATIONS LOGISTICS AND E-COMMERCE  65000
## 10                              BUSINESS ECONOMICS  65000
## 11                                         FINANCE  65000
## 12                               ACTUARIAL SCIENCE  72000
## 13   MANAGEMENT INFORMATION SYSTEMS AND STATISTICS  72000

Communications & Journalism

barplot(all_ages_cj$Median, names.arg = all_ages_cj$Major, horiz = TRUE, cex.names = 0.4, las =1)

all_ages_cj %>% dplyr::select(Major, Median) %>% arrange(Median)

## # A tibble: 4 x 2
##                              Major Median
##                              <chr>  <int>
## 1                       MASS MEDIA  48000
## 2                   COMMUNICATIONS  50000
## 3                       JOURNALISM  50000
## 4 ADVERTISING AND PUBLIC RELATIONS  50000

Computer Science & Mathematics

barplot(all_ages_com$Median, names.arg = all_ages_com$Major, horiz = TRUE, cex.names = 0.3, las =1)

all_ages_com %>% dplyr::select(Major, Median) %>% arrange(Median)

## # A tibble: 11 x 2
##                                              Major Median
##                                              <chr>  <int>
##  1                      COMMUNICATION TECHNOLOGIES  50000
##  2 COMPUTER ADMINISTRATION MANAGEMENT AND SECURITY  55000
##  3      COMPUTER NETWORKING AND TELECOMMUNICATIONS  55000
##  4        COMPUTER PROGRAMMING AND DATA PROCESSING  60000
##  5                COMPUTER AND INFORMATION SYSTEMS  65000
##  6                                     MATHEMATICS  66000
##  7                            INFORMATION SCIENCES  68000
##  8                             APPLIED MATHEMATICS  70000
##  9                 STATISTICS AND DECISION SCIENCE  70000
## 10                                COMPUTER SCIENCE  78000
## 11                MATHEMATICS AND COMPUTER SCIENCE  92000

Education

barplot(all_ages_ed$Median, names.arg = all_ages_ed$Major, horiz = TRUE, cex.names = 0.3, las =1)

all_ages_ed %>% dplyr::select(Major, Median) %>% arrange(Median)

## # A tibble: 16 x 2
##                                          Major Median
##                                          <chr>  <int>
##  1                   EARLY CHILDHOOD EDUCATION  35300
##  2                        ELEMENTARY EDUCATION  40000
##  3          TEACHER EDUCATION: MULTIPLE LEVELS  40000
##  4                             LIBRARY SCIENCE  40000
##  5                   SCHOOL STUDENT COUNSELING  41000
##  6                     SPECIAL NEEDS EDUCATION  42000
##  7                LANGUAGE AND DRAMA EDUCATION  42000
##  8                     ART AND MUSIC EDUCATION  42600
##  9                           GENERAL EDUCATION  43000
## 10               MATHEMATICS TEACHER EDUCATION  43000
## 11                 SECONDARY TEACHER EDUCATION  45000
## 12 SOCIAL SCIENCE OR HISTORY TEACHER EDUCATION  45000
## 13      SCIENCE AND COMPUTER TEACHER EDUCATION  46000
## 14      PHYSICAL AND HEALTH EDUCATION TEACHING  48400
## 15                     MISCELLANEOUS EDUCATION  50000
## 16  EDUCATIONAL ADMINISTRATION AND SUPERVISION  58000

Engineering

barplot(all_ages_eng$Median, names.arg = all_ages_eng$Major, horiz = TRUE, cex.names = 0.3, las =1)

all_ages_eng %>% dplyr::select(Major, Median) %>% arrange(Median)

## # A tibble: 29 x 2
##                                          Major Median
##                                          <chr>  <int>
##  1 MECHANICAL ENGINEERING RELATED TECHNOLOGIES  60000
##  2                      BIOLOGICAL ENGINEERING  62000
##  3                                ARCHITECTURE  63000
##  4                    ENGINEERING TECHNOLOGIES  63000
##  5      MISCELLANEOUS ENGINEERING TECHNOLOGIES  63000
##  6                      BIOMEDICAL ENGINEERING  65000
##  7   ENGINEERING MECHANICS PHYSICS AND SCIENCE  65000
##  8           ELECTRICAL ENGINEERING TECHNOLOGY  67000
##  9                   ENVIRONMENTAL ENGINEERING  70000
## 10                   MISCELLANEOUS ENGINEERING  70000
## # ... with 19 more rows

Health

barplot(all_ages_hlt$Median, names.arg = all_ages_hlt$Major, horiz = TRUE, cex.names = 0.3, las =1)

all_ages_hlt %>% dplyr::select(Major, Median) %>% arrange(Median)

## # A tibble: 12 x 2
##                                                  Major Median
##                                                  <chr>  <int>
##  1       COMMUNICATION DISORDERS SCIENCES AND SERVICES  42000
##  2            MISCELLANEOUS HEALTH MEDICAL PROFESSIONS  45000
##  3                         COMMUNITY AND PUBLIC HEALTH  47000
##  4                                  NUTRITION SCIENCES  49500
##  5                 GENERAL MEDICAL AND HEALTH SERVICES  50000
##  6          HEALTH AND MEDICAL ADMINISTRATIVE SERVICES  50000
##  7             HEALTH AND MEDICAL PREPARATORY PROGRAMS  50000
##  8                          MEDICAL ASSISTING SERVICES  55000
##  9                    MEDICAL TECHNOLOGIES TECHNICIANS  60000
## 10                       TREATMENT THERAPY PROFESSIONS  61000
## 11                                             NURSING  62000
## 12 PHARMACY PHARMACEUTICAL SCIENCES AND ADMINISTRATION 106000

Industrial Arts and Consumer Services

barplot(all_ages_ia$Median, names.arg = all_ages_ia$Major, horiz = TRUE, cex.names = 0.3, las =1)

all_ages_ia %>% dplyr::select(Major, Median) %>% arrange(Median)

## # A tibble: 7 x 2
##                                                               Major Median
##                                                               <chr>  <int>
## 1                            COSMETOLOGY SERVICES AND CULINARY ARTS  40000
## 2                                      FAMILY AND CONSUMER SCIENCES  40500
## 3                     PHYSICAL FITNESS PARKS RECREATION AND LEISURE  44000
## 4 ELECTRICAL, MECHANICAL, AND PRECISION TECHNOLOGIES AND PRODUCTION  48000
## 5                                             MILITARY TECHNOLOGIES  64000
## 6                                             CONSTRUCTION SERVICES  65000
## 7                          TRANSPORTATION SCIENCES AND TECHNOLOGIES  67000

Liberal Arts

barplot(all_ages_la$Median, names.arg = all_ages_la$Major, horiz = TRUE, cex.names = 0.3, las =1)

all_ages_la %>% dplyr::select(Major, Median) %>% arrange(Median)

## # A tibble: 16 x 2
##                                                            Major Median
##                                                            <chr>  <int>
##  1                                      COMPOSITION AND RHETORIC  40000
##  2                              THEOLOGY AND RELIGIOUS VOCATIONS  40000
##  3                                   ANTHROPOLOGY AND ARCHEOLOGY  43000
##  4                               MULTI/INTERDISCIPLINARY STUDIES  43000
##  5                                     ART HISTORY AND CRITICISM  44500
##  6                                       OTHER FOREIGN LANGUAGES  45000
##  7                       INTERCULTURAL AND INTERNATIONAL STUDIES  45000
##  8                              PHILOSOPHY AND RELIGIOUS STUDIES  45000
##  9                          AREA ETHNIC AND CIVILIZATION STUDIES  46000
## 10                                                    HUMANITIES  46700
## 11           LINGUISTICS AND COMPARATIVE LANGUAGE AND LITERATURE  48000
## 12 FRENCH GERMAN LATIN AND OTHER COMMON FOREIGN LANGUAGE STUDIES  48000
## 13                               ENGLISH LANGUAGE AND LITERATURE  50000
## 14                                                  LIBERAL ARTS  50000
## 15                                                       HISTORY  50000
## 16                                         UNITED STATES HISTORY  50000

Law

barplot(all_ages_law$Median, names.arg = all_ages_law$Major, horiz = TRUE, cex.names = 0.3, las =1)

all_ages_law %>% dplyr::select(Major, Median) %>% arrange(Median)

## # A tibble: 5 x 2
##                                  Major Median
##                                  <chr>  <int>
## 1            PRE-LAW AND LEGAL STUDIES  48000
## 2                      COURT REPORTING  50000
## 3 CRIMINAL JUSTICE AND FIRE PROTECTION  50000
## 4                PUBLIC ADMINISTRATION  56000
## 5                        PUBLIC POLICY  60000

Psychology & Social Work

barplot(all_ages_psy$Median, names.arg = all_ages_psy$Major, horiz = TRUE, cex.names = 0.3, las =1)

all_ages_psy %>% dplyr::select(Major, Median) %>% arrange(Median)

## # A tibble: 9 x 2
##                                       Major Median
##                                       <chr>  <int>
## 1 HUMAN SERVICES AND COMMUNITY ORGANIZATION  38000
## 2                     COUNSELING PSYCHOLOGY  39000
## 3                    EDUCATIONAL PSYCHOLOGY  40000
## 4                               SOCIAL WORK  40000
## 5                                PSYCHOLOGY  45000
## 6                       CLINICAL PSYCHOLOGY  45000
## 7                  MISCELLANEOUS PSYCHOLOGY  45000
## 8                         SOCIAL PSYCHOLOGY  47000
## 9  INDUSTRIAL AND ORGANIZATIONAL PSYCHOLOGY  62000

Sciences

barplot(all_ages_sci$Median, names.arg = all_ages_sci$Major, horiz = TRUE, cex.names = 0.3, las =1)

all_ages_sci %>% dplyr::select(Major, Median) %>% arrange(Median)

## # A tibble: 10 x 2
##                                                         Major Median
##                                                         <chr>  <int>
##  1                                               OCEANOGRAPHY  55000
##  2                      MULTI-DISCIPLINARY OR GENERAL SCIENCE  56000
##  3                                                GEOSCIENCES  57000
##  4                                                  CHEMISTRY  59000
##  5                                          PHYSICAL SCIENCES  60000
##  6                       ATMOSPHERIC SCIENCES AND METEOROLOGY  60000
##  7 NUCLEAR, INDUSTRIAL RADIOLOGY, AND BIOLOGICAL TECHNOLOGIES  62000
##  8                                  GEOLOGY AND EARTH SCIENCE  65000
##  9                                                    PHYSICS  70000
## 10                                 ASTRONOMY AND ASTROPHYSICS  80000

Social Sciences

barplot(all_ages_ssc$Median, names.arg = all_ages_ssc$Major, horiz = TRUE, cex.names = 0.3, las =1)

all_ages_ssc %>% dplyr::select(Major, Median) %>% arrange(Median)

## # A tibble: 9 x 2
##                               Major Median
##                               <chr>  <int>
## 1 INTERDISCIPLINARY SOCIAL SCIENCES  45000
## 2                         SOCIOLOGY  47000
## 3                       CRIMINOLOGY  49000
## 4           GENERAL SOCIAL SCIENCES  50000
## 5     MISCELLANEOUS SOCIAL SCIENCES  52000
## 6                         GEOGRAPHY  54000
## 7           INTERNATIONAL RELATIONS  55000
## 8  POLITICAL SCIENCE AND GOVERNMENT  58000
## 9                         ECONOMICS  69000

Recent Graduates

Agriculture

barplot(rct_ag$Median, names.arg = rct_ag$Major, horiz = TRUE, cex.names = 0.3, las =1)

rct_ag %>% dplyr::select(Major, Median) %>% arrange(Median)

## # A tibble: 10 x 2
##                                    Major Median
##                                    <chr>  <int>
##  1             MISCELLANEOUS AGRICULTURE  29000
##  2                       ANIMAL SCIENCES  30000
##  3            PLANT SCIENCE AND AGRONOMY  32000
##  4          NATURAL RESOURCES MANAGEMENT  35000
##  5                              FORESTRY  35000
##  6                          SOIL SCIENCE  35000
##  7 AGRICULTURE PRODUCTION AND MANAGEMENT  40000
##  8                   GENERAL AGRICULTURE  40000
##  9                AGRICULTURAL ECONOMICS  40000
## 10                          FOOD SCIENCE  53000

Arts

barplot(rct_art$Median, names.arg = rct_art$Major, horiz = TRUE, cex.names = 0.3, las =1)

rct_art %>% dplyr::select(Major, Median) %>% arrange(Median)

## # A tibble: 8 x 2
##                               Major Median
##                               <chr>  <int>
## 1            DRAMA AND THEATER ARTS  27000
## 2                       STUDIO ARTS  29000
## 3        VISUAL AND PERFORMING ARTS  30000
## 4                         FINE ARTS  30500
## 5                             MUSIC  31000
## 6  FILM VIDEO AND PHOTOGRAPHIC ARTS  32000
## 7 COMMERCIAL ART AND GRAPHIC DESIGN  35000
## 8           MISCELLANEOUS FINE ARTS  50000

Biological Sciences

barplot(rct_bio$Median, names.arg = rct_bio$Major, horiz = TRUE, cex.names = 0.3, las =1)

rct_bio %>% dplyr::select(Major, Median) %>% arrange(Median)

## # A tibble: 14 x 2
##                                  Major Median
##                                  <chr>  <int>
##  1                             ZOOLOGY  26000
##  2                             ECOLOGY  33000
##  3                             BIOLOGY  33400
##  4               MISCELLANEOUS BIOLOGY  33500
##  5                          PHYSIOLOGY  35000
##  6                        NEUROSCIENCE  35000
##  7               ENVIRONMENTAL SCIENCE  35600
##  8                              BOTANY  37000
##  9                BIOCHEMICAL SCIENCES  37400
## 10                        MICROBIOLOGY  38000
## 11                   MOLECULAR BIOLOGY  40000
## 12                            GENETICS  40000
## 13 COGNITIVE SCIENCE AND BIOPSYCHOLOGY  41000
## 14                        PHARMACOLOGY  45000

Business

barplot(rct_bsn$Median, names.arg = rct_bsn$Major, horiz = TRUE, cex.names = 0.3, las =1)

rct_bsn %>% dplyr::select(Major, Median) %>% arrange(Median)

## # A tibble: 13 x 2
##                                              Major Median
##                                              <chr>  <int>
##  1                          HOSPITALITY MANAGEMENT  33000
##  2        HUMAN RESOURCES AND PERSONNEL MANAGEMENT  36000
##  3          BUSINESS MANAGEMENT AND ADMINISTRATION  38000
##  4                MARKETING AND MARKETING RESEARCH  38000
##  5                                GENERAL BUSINESS  40000
##  6                          INTERNATIONAL BUSINESS  40000
##  7 MISCELLANEOUS BUSINESS & MEDICAL ADMINISTRATION  40000
##  8                                      ACCOUNTING  45000
##  9                              BUSINESS ECONOMICS  46000
## 10                                         FINANCE  47000
## 11             OPERATIONS LOGISTICS AND E-COMMERCE  50000
## 12   MANAGEMENT INFORMATION SYSTEMS AND STATISTICS  51000
## 13                               ACTUARIAL SCIENCE  62000

Communications & Journalism

barplot(rct_cj$Median, names.arg = rct_cj$Major, horiz = TRUE, cex.names = 0.3, las =1)

rct_cj %>% dplyr::select(Major, Median) %>% arrange(Median)

## # A tibble: 4 x 2
##                              Major Median
##                              <chr>  <int>
## 1                       MASS MEDIA  33000
## 2                   COMMUNICATIONS  35000
## 3                       JOURNALISM  35000
## 4 ADVERTISING AND PUBLIC RELATIONS  35000

Computer Science & Mathematics

barplot(rct_com$Median, names.arg = rct_com$Major, horiz = TRUE, cex.names = 0.3, las =1)

rct_com %>% dplyr::select(Major, Median) %>% arrange(Median)

## # A tibble: 11 x 2
##                                              Major Median
##                                              <chr>  <int>
##  1                      COMMUNICATION TECHNOLOGIES  35000
##  2      COMPUTER NETWORKING AND TELECOMMUNICATIONS  36400
##  3 COMPUTER ADMINISTRATION MANAGEMENT AND SECURITY  37500
##  4        COMPUTER PROGRAMMING AND DATA PROCESSING  41300
##  5                MATHEMATICS AND COMPUTER SCIENCE  42000
##  6                                     MATHEMATICS  45000
##  7                COMPUTER AND INFORMATION SYSTEMS  45000
##  8                            INFORMATION SCIENCES  45000
##  9                 STATISTICS AND DECISION SCIENCE  45000
## 10                             APPLIED MATHEMATICS  45000
## 11                                COMPUTER SCIENCE  53000

Education

barplot(rct_ed$Median, names.arg = rct_ed$Major, horiz = TRUE, cex.names = 0.3, las =1)

rct_ed %>% dplyr::select(Major, Median) %>% arrange(Median)

## # A tibble: 16 x 2
##                                          Major Median
##                                          <chr>  <int>
##  1                             LIBRARY SCIENCE  22000
##  2                   EARLY CHILDHOOD EDUCATION  28000
##  3          TEACHER EDUCATION: MULTIPLE LEVELS  30000
##  4      PHYSICAL AND HEALTH EDUCATION TEACHING  31000
##  5                        ELEMENTARY EDUCATION  32000
##  6      SCIENCE AND COMPUTER TEACHER EDUCATION  32000
##  7                     ART AND MUSIC EDUCATION  32100
##  8                 SECONDARY TEACHER EDUCATION  32500
##  9                LANGUAGE AND DRAMA EDUCATION  33000
## 10                     MISCELLANEOUS EDUCATION  33000
## 11                           GENERAL EDUCATION  34000
## 12 SOCIAL SCIENCE OR HISTORY TEACHER EDUCATION  34000
## 13               MATHEMATICS TEACHER EDUCATION  34000
## 14  EDUCATIONAL ADMINISTRATION AND SUPERVISION  34000
## 15                     SPECIAL NEEDS EDUCATION  35000
## 16                   SCHOOL STUDENT COUNSELING  41000

Engineering

barplot(rct_eng$Median, names.arg = rct_eng$Major, horiz = TRUE, cex.names = 0.3, las =1)

rct_eng %>% dplyr::select(Major, Median) %>% arrange(Median)

## # A tibble: 29 x 2
##                                          Major Median
##                                          <chr>  <int>
##  1                                ARCHITECTURE  40000
##  2      MISCELLANEOUS ENGINEERING TECHNOLOGIES  40000
##  3 MECHANICAL ENGINEERING RELATED TECHNOLOGIES  40000
##  4       ENGINEERING AND INDUSTRIAL MANAGEMENT  44000
##  5          INDUSTRIAL PRODUCTION TECHNOLOGIES  46000
##  6                           CIVIL ENGINEERING  50000
##  7                   MISCELLANEOUS ENGINEERING  50000
##  8                   ENVIRONMENTAL ENGINEERING  50000
##  9                    ENGINEERING TECHNOLOGIES  50000
## 10      GEOLOGICAL AND GEOPHYSICAL ENGINEERING  50000
## # ... with 19 more rows

Health

barplot(rct_hlt$Median, names.arg = rct_hlt$Major, horiz = TRUE, cex.names = 0.3, las =1)

rct_hlt %>% dplyr::select(Major, Median) %>% arrange(Median)

## # A tibble: 12 x 2
##                                                  Major Median
##                                                  <chr>  <int>
##  1       COMMUNICATION DISORDERS SCIENCES AND SERVICES  28000
##  2                 GENERAL MEDICAL AND HEALTH SERVICES  32400
##  3                       TREATMENT THERAPY PROFESSIONS  33000
##  4             HEALTH AND MEDICAL PREPARATORY PROGRAMS  33500
##  5                         COMMUNITY AND PUBLIC HEALTH  34000
##  6                                  NUTRITION SCIENCES  35000
##  7          HEALTH AND MEDICAL ADMINISTRATIVE SERVICES  35000
##  8            MISCELLANEOUS HEALTH MEDICAL PROFESSIONS  36000
##  9 PHARMACY PHARMACEUTICAL SCIENCES AND ADMINISTRATION  40000
## 10                          MEDICAL ASSISTING SERVICES  42000
## 11                    MEDICAL TECHNOLOGIES TECHNICIANS  45000
## 12                                             NURSING  48000

Industrial Arts and Consumer Services

barplot(rct_ia$Median, names.arg = rct_ia$Major, horiz = TRUE, cex.names = 0.3, las =1)

rct_ia %>% dplyr::select(Major, Median) %>% arrange(Median)

## # A tibble: 7 x 2
##                                                               Major Median
##                                                               <chr>  <int>
## 1                            COSMETOLOGY SERVICES AND CULINARY ARTS  29000
## 2                                      FAMILY AND CONSUMER SCIENCES  30000
## 3                     PHYSICAL FITNESS PARKS RECREATION AND LEISURE  32000
## 4                          TRANSPORTATION SCIENCES AND TECHNOLOGIES  35000
## 5 ELECTRICAL, MECHANICAL, AND PRECISION TECHNOLOGIES AND PRODUCTION  38400
## 6                                             MILITARY TECHNOLOGIES  40000
## 7                                             CONSTRUCTION SERVICES  50000

Liberal Arts

barplot(rct_la$Median, names.arg = rct_la$Major, horiz = TRUE, cex.names = 0.3, las =1)

rct_la %>% dplyr::select(Major, Median) %>% arrange(Median)

## # A tibble: 16 x 2
##                                                            Major Median
##                                                            <chr>  <int>
##  1                                      COMPOSITION AND RHETORIC  27000
##  2                                       OTHER FOREIGN LANGUAGES  27500
##  3                                   ANTHROPOLOGY AND ARCHEOLOGY  28000
##  4                              THEOLOGY AND RELIGIOUS VOCATIONS  29000
##  5                                                    HUMANITIES  30000
##  6                                     ART HISTORY AND CRITICISM  31000
##  7                               ENGLISH LANGUAGE AND LITERATURE  32000
##  8                                                  LIBERAL ARTS  32000
##  9                              PHILOSOPHY AND RELIGIOUS STUDIES  32200
## 10           LINGUISTICS AND COMPARATIVE LANGUAGE AND LITERATURE  33000
## 11                                                       HISTORY  34000
## 12 FRENCH GERMAN LATIN AND OTHER COMMON FOREIGN LANGUAGE STUDIES  34000
## 13                       INTERCULTURAL AND INTERNATIONAL STUDIES  34000
## 14                          AREA ETHNIC AND CIVILIZATION STUDIES  35000
## 15                               MULTI/INTERDISCIPLINARY STUDIES  35000
## 16                                         UNITED STATES HISTORY  40000

Law

barplot(rct_law$Median, names.arg = rct_law$Major, horiz = TRUE, cex.names = 0.3, las =1)

rct_law %>% dplyr::select(Major, Median) %>% arrange(Median)

## # A tibble: 5 x 2
##                                  Major Median
##                                  <chr>  <int>
## 1 CRIMINAL JUSTICE AND FIRE PROTECTION  35000
## 2            PRE-LAW AND LEGAL STUDIES  36000
## 3                PUBLIC ADMINISTRATION  36000
## 4                        PUBLIC POLICY  50000
## 5                      COURT REPORTING  54000

Psychology & Social Work

barplot(rct_psy$Median, names.arg = rct_psy$Major, horiz = TRUE, cex.names = 0.3, las =1)

rct_psy %>% dplyr::select(Major, Median) %>% arrange(Median)

## # A tibble: 9 x 2
##                                       Major Median
##                                       <chr>  <int>
## 1                     COUNSELING PSYCHOLOGY  23400
## 2                    EDUCATIONAL PSYCHOLOGY  25000
## 3                       CLINICAL PSYCHOLOGY  25000
## 4                               SOCIAL WORK  30000
## 5                  MISCELLANEOUS PSYCHOLOGY  30000
## 6 HUMAN SERVICES AND COMMUNITY ORGANIZATION  30000
## 7                                PSYCHOLOGY  31500
## 8                         SOCIAL PSYCHOLOGY  36000
## 9  INDUSTRIAL AND ORGANIZATIONAL PSYCHOLOGY  40000

Sciences

barplot(rct_sci$Median, names.arg = rct_sci$Major, horiz = TRUE, cex.names = 0.3, las =1)

rct_sci %>% dplyr::select(Major, Median) %>% arrange(Median)

## # A tibble: 10 x 2
##                                                         Major Median
##                                                         <chr>  <int>
##  1                      MULTI-DISCIPLINARY OR GENERAL SCIENCE  35000
##  2                       ATMOSPHERIC SCIENCES AND METEOROLOGY  35000
##  3                                                GEOSCIENCES  36000
##  4                                  GEOLOGY AND EARTH SCIENCE  36200
##  5                                                  CHEMISTRY  39000
##  6                                          PHYSICAL SCIENCES  40000
##  7                                               OCEANOGRAPHY  44700
##  8                                                    PHYSICS  45000
##  9 NUCLEAR, INDUSTRIAL RADIOLOGY, AND BIOLOGICAL TECHNOLOGIES  46000
## 10                                 ASTRONOMY AND ASTROPHYSICS  62000

Social Sciences

barplot(rct_ssc$Median, names.arg = rct_ssc$Major, horiz = TRUE, cex.names = 0.3, las =1)

rct_ssc %>% dplyr::select(Major, Median) %>% arrange(Median)

## # A tibble: 9 x 2
##                               Major Median
##                               <chr>  <int>
## 1           GENERAL SOCIAL SCIENCES  32000
## 2                         SOCIOLOGY  33000
## 3 INTERDISCIPLINARY SOCIAL SCIENCES  33000
## 4                       CRIMINOLOGY  35000
## 5  POLITICAL SCIENCE AND GOVERNMENT  38000
## 6                         GEOGRAPHY  38000
## 7     MISCELLANEOUS SOCIAL SCIENCES  40000
## 8           INTERNATIONAL RELATIONS  40100
## 9                         ECONOMICS  47000

Graduate Students

Agriculture

barplot(grad_ag$Grad_median, names.arg = grad_ag$Major, horiz = TRUE, cex.names = 0.3, las =1)

grad_ag %>% dplyr::select(Major, Grad_median) %>% arrange(Grad_median)

## # A tibble: 10 x 2
##                                    Major Grad_median
##                                    <chr>       <dbl>
##  1             MISCELLANEOUS AGRICULTURE       54000
##  2                          SOIL SCIENCE       65000
##  3 AGRICULTURE PRODUCTION AND MANAGEMENT       67000
##  4            PLANT SCIENCE AND AGRONOMY       67000
##  5                   GENERAL AGRICULTURE       68000
##  6          NATURAL RESOURCES MANAGEMENT       70000
##  7                       ANIMAL SCIENCES       70300
##  8                          FOOD SCIENCE       72000
##  9                              FORESTRY       78000
## 10                AGRICULTURAL ECONOMICS       80000

Arts

barplot(grad_art$Grad_median, names.arg = grad_art$Major, horiz = TRUE, cex.names = 0.3, las =1)

grad_art %>% dplyr::select(Major, Grad_median) %>% arrange(Grad_median)

## # A tibble: 8 x 2
##                               Major Grad_median
##                               <chr>       <dbl>
## 1                       STUDIO ARTS       50750
## 2        VISUAL AND PERFORMING ARTS       53000
## 3           MISCELLANEOUS FINE ARTS       55000
## 4  FILM VIDEO AND PHOTOGRAPHIC ARTS       57000
## 5                         FINE ARTS       58000
## 6            DRAMA AND THEATER ARTS       58600
## 7 COMMERCIAL ART AND GRAPHIC DESIGN       60000
## 8                             MUSIC       60000

Biological Sciences

barplot(grad_bio$Grad_median, names.arg = grad_bio$Major, horiz = TRUE, cex.names = 0.3, las =1)

grad_bio %>% dplyr::select(Major, Grad_median) %>% arrange(Grad_median)

## # A tibble: 14 x 2
##                                  Major Grad_median
##                                  <chr>       <dbl>
##  1                        NEUROSCIENCE       58000
##  2                             ECOLOGY       62000
##  3               MISCELLANEOUS BIOLOGY       65000
##  4               ENVIRONMENTAL SCIENCE       68000
##  5                              BOTANY       70000
##  6                            GENETICS       78000
##  7                        MICROBIOLOGY       85000
##  8                   MOLECULAR BIOLOGY       85000
##  9                          PHYSIOLOGY       90000
## 10 COGNITIVE SCIENCE AND BIOPSYCHOLOGY       95000
## 11                             BIOLOGY       95000
## 12                BIOCHEMICAL SCIENCES       96000
## 13                        PHARMACOLOGY      105000
## 14                             ZOOLOGY      110000

Business

barplot(grad_bsn$Grad_median, names.arg = grad_bsn$Major, horiz = TRUE, cex.names = 0.3, las =1)

grad_bsn %>% dplyr::select(Major, Grad_median) %>% arrange(Grad_median)

## # A tibble: 13 x 2
##                                              Major Grad_median
##                                              <chr>       <dbl>
##  1                          HOSPITALITY MANAGEMENT       65000
##  2        HUMAN RESOURCES AND PERSONNEL MANAGEMENT       70000
##  3                          INTERNATIONAL BUSINESS       72000
##  4 MISCELLANEOUS BUSINESS & MEDICAL ADMINISTRATION       75000
##  5          BUSINESS MANAGEMENT AND ADMINISTRATION       77000
##  6                MARKETING AND MARKETING RESEARCH       80000
##  7                                GENERAL BUSINESS       85000
##  8                                      ACCOUNTING       88000
##  9   MANAGEMENT INFORMATION SYSTEMS AND STATISTICS       89000
## 10             OPERATIONS LOGISTICS AND E-COMMERCE       94000
## 11                              BUSINESS ECONOMICS       94000
## 12                                         FINANCE       95000
## 13                               ACTUARIAL SCIENCE      110000

Communications & Journalism

barplot(grad_cj$Grad_median, names.arg = grad_cj$Major, horiz = TRUE, cex.names = 0.3, las =1)

grad_cj %>% dplyr::select(Major, Grad_median) %>% arrange(Grad_median)

## # A tibble: 4 x 2
##                              Major Grad_median
##                              <chr>       <dbl>
## 1                       MASS MEDIA       57000
## 2 ADVERTISING AND PUBLIC RELATIONS       60000
## 3                   COMMUNICATIONS       65000
## 4                       JOURNALISM       70000

Computer Science & Mathematics

barplot(grad_com$Grad_median, names.arg = grad_com$Major, horiz = TRUE, cex.names = 0.3, las =1)

grad_com %>% dplyr::select(Major, Grad_median) %>% arrange(Grad_median)

## # A tibble: 11 x 2
##                                              Major Grad_median
##                                              <chr>       <dbl>
##  1                      COMMUNICATION TECHNOLOGIES       57000
##  2      COMPUTER NETWORKING AND TELECOMMUNICATIONS       80000
##  3                COMPUTER AND INFORMATION SYSTEMS       80000
##  4 COMPUTER ADMINISTRATION MANAGEMENT AND SECURITY       81000
##  5                            INFORMATION SCIENCES       84000
##  6        COMPUTER PROGRAMMING AND DATA PROCESSING       85000
##  7                                     MATHEMATICS       89000
##  8                 STATISTICS AND DECISION SCIENCE       92000
##  9                                COMPUTER SCIENCE       95000
## 10                MATHEMATICS AND COMPUTER SCIENCE       98000
## 11                             APPLIED MATHEMATICS      100000

Education

barplot(grad_ed$Grad_median, names.arg = grad_ed$Major, horiz = TRUE, cex.names = 0.3, las =1)

grad_ed %>% dplyr::select(Major, Grad_median) %>% arrange(Grad_median)

## # A tibble: 16 x 2
##                                          Major Grad_median
##                                          <chr>       <dbl>
##  1                   EARLY CHILDHOOD EDUCATION       50000
##  2                             LIBRARY SCIENCE       52000
##  3                        ELEMENTARY EDUCATION       55000
##  4          TEACHER EDUCATION: MULTIPLE LEVELS       55000
##  5                   SCHOOL STUDENT COUNSELING       56000
##  6                           GENERAL EDUCATION       58000
##  7                LANGUAGE AND DRAMA EDUCATION       58000
##  8                     SPECIAL NEEDS EDUCATION       58000
##  9                     ART AND MUSIC EDUCATION       59000
## 10 SOCIAL SCIENCE OR HISTORY TEACHER EDUCATION       60000
## 11               MATHEMATICS TEACHER EDUCATION       60000
## 12                     MISCELLANEOUS EDUCATION       61000
## 13                 SECONDARY TEACHER EDUCATION       61000
## 14      SCIENCE AND COMPUTER TEACHER EDUCATION       62000
## 15      PHYSICAL AND HEALTH EDUCATION TEACHING       65000
## 16  EDUCATIONAL ADMINISTRATION AND SUPERVISION       65000

Engineering

barplot(grad_eng$Grad_median, names.arg = grad_eng$Major, horiz = TRUE, cex.names = 0.3, las =1)

grad_eng %>% dplyr::select(Major, Grad_median) %>% arrange(Grad_median)

## # A tibble: 29 x 2
##                                          Major Grad_median
##                                          <chr>       <dbl>
##  1                                ARCHITECTURE       72000
##  2                    ENGINEERING TECHNOLOGIES       74000
##  3 MECHANICAL ENGINEERING RELATED TECHNOLOGIES       78000
##  4                   ARCHITECTURAL ENGINEERING       78000
##  5      MISCELLANEOUS ENGINEERING TECHNOLOGIES       80000
##  6                      BIOLOGICAL ENGINEERING       80000
##  7                   ENVIRONMENTAL ENGINEERING       81000
##  8          INDUSTRIAL PRODUCTION TECHNOLOGIES       84500
##  9           ELECTRICAL ENGINEERING TECHNOLOGY       85000
## 10                   MISCELLANEOUS ENGINEERING       90000
## # ... with 19 more rows

Health

barplot(grad_hlt$Grad_median, names.arg = grad_hlt$Major, horiz = TRUE, cex.names = 0.3, las =1)

grad_hlt %>% dplyr::select(Major, Grad_median) %>% arrange(Grad_median)

## # A tibble: 12 x 2
##                                                  Major Grad_median
##                                                  <chr>       <dbl>
##  1            MISCELLANEOUS HEALTH MEDICAL PROFESSIONS       60000
##  2                                  NUTRITION SCIENCES       65000
##  3       COMMUNICATION DISORDERS SCIENCES AND SERVICES       65000
##  4                         COMMUNITY AND PUBLIC HEALTH       68500
##  5                       TREATMENT THERAPY PROFESSIONS       70000
##  6                 GENERAL MEDICAL AND HEALTH SERVICES       70000
##  7                    MEDICAL TECHNOLOGIES TECHNICIANS       76000
##  8          HEALTH AND MEDICAL ADMINISTRATIVE SERVICES       79000
##  9                          MEDICAL ASSISTING SERVICES       80000
## 10                                             NURSING       84000
## 11 PHARMACY PHARMACEUTICAL SCIENCES AND ADMINISTRATION      111000
## 12             HEALTH AND MEDICAL PREPARATORY PROGRAMS      135000

Industrial Arts and Consumer Services

barplot(grad_ia$Grad_median, names.arg = grad_ia$Major, horiz = TRUE, cex.names = 0.3, las =1)

grad_ia %>% dplyr::select(Major, Grad_median) %>% arrange(Grad_median)

## # A tibble: 7 x 2
##                                                               Major
##                                                               <chr>
## 1                            COSMETOLOGY SERVICES AND CULINARY ARTS
## 2                                      FAMILY AND CONSUMER SCIENCES
## 3                     PHYSICAL FITNESS PARKS RECREATION AND LEISURE
## 4 ELECTRICAL, MECHANICAL, AND PRECISION TECHNOLOGIES AND PRODUCTION
## 5                                             MILITARY TECHNOLOGIES
## 6                                             CONSTRUCTION SERVICES
## 7                          TRANSPORTATION SCIENCES AND TECHNOLOGIES
## # ... with 1 more variables: Grad_median <dbl>

Liberal Arts

barplot(grad_la$Grad_median, names.arg = grad_la$Major, horiz = TRUE, cex.names = 0.3, las =1)

grad_la %>% dplyr::select(Major, Grad_median) %>% arrange(Grad_median)

## # A tibble: 16 x 2
##                                                            Major
##                                                            <chr>
##  1                              THEOLOGY AND RELIGIOUS VOCATIONS
##  2                               MULTI/INTERDISCIPLINARY STUDIES
##  3                                      COMPOSITION AND RHETORIC
##  4                                                    HUMANITIES
##  5                                     ART HISTORY AND CRITICISM
##  6           LINGUISTICS AND COMPARATIVE LANGUAGE AND LITERATURE
##  7                                   ANTHROPOLOGY AND ARCHEOLOGY
##  8                              PHILOSOPHY AND RELIGIOUS STUDIES
##  9                               ENGLISH LANGUAGE AND LITERATURE
## 10 FRENCH GERMAN LATIN AND OTHER COMMON FOREIGN LANGUAGE STUDIES
## 11                                       OTHER FOREIGN LANGUAGES
## 12                                                  LIBERAL ARTS
## 13                       INTERCULTURAL AND INTERNATIONAL STUDIES
## 14                          AREA ETHNIC AND CIVILIZATION STUDIES
## 15                                                       HISTORY
## 16                                         UNITED STATES HISTORY
## # ... with 1 more variables: Grad_median <dbl>

Law

barplot(grad_law$Grad_median, names.arg = grad_law$Major, horiz = TRUE, cex.names = 0.3, las =1)

grad_law %>% dplyr::select(Major, Grad_median) %>% arrange(Grad_median)

## # A tibble: 5 x 2
##                                  Major Grad_median
##                                  <chr>       <dbl>
## 1 CRIMINAL JUSTICE AND FIRE PROTECTION       68000
## 2                      COURT REPORTING       75000
## 3                PUBLIC ADMINISTRATION       75000
## 4            PRE-LAW AND LEGAL STUDIES       76000
## 5                        PUBLIC POLICY       89000

Psychology & Social Work

barplot(grad_psy$Grad_median, names.arg = grad_psy$Major, horiz = TRUE, cex.names = 0.3, las =1)

grad_psy %>% dplyr::select(Major, Grad_median) %>% arrange(Grad_median)

## # A tibble: 9 x 2
##                                       Major Grad_median
##                                       <chr>       <dbl>
## 1                     COUNSELING PSYCHOLOGY       50000
## 2 HUMAN SERVICES AND COMMUNITY ORGANIZATION       50100
## 3                               SOCIAL WORK       53000
## 4                    EDUCATIONAL PSYCHOLOGY       61000
## 5                                PSYCHOLOGY       64000
## 6                  MISCELLANEOUS PSYCHOLOGY       68000
## 7                       CLINICAL PSYCHOLOGY       70000
## 8                         SOCIAL PSYCHOLOGY       71000
## 9  INDUSTRIAL AND ORGANIZATIONAL PSYCHOLOGY       75000

Sciences

barplot(grad_sci$Grad_median, names.arg = grad_sci$Major, horiz = TRUE, cex.names = 0.3, las =1)

grad_sci %>% dplyr::select(Major, Grad_median) %>% arrange(Grad_median)

## # A tibble: 10 x 2
##                                                         Major Grad_median
##                                                         <chr>       <dbl>
##  1 NUCLEAR, INDUSTRIAL RADIOLOGY, AND BIOLOGICAL TECHNOLOGIES       80000
##  2                                          PHYSICAL SCIENCES       80000
##  3                       ATMOSPHERIC SCIENCES AND METEOROLOGY       82000
##  4                                  GEOLOGY AND EARTH SCIENCE       84000
##  5                      MULTI-DISCIPLINARY OR GENERAL SCIENCE       86000
##  6                                               OCEANOGRAPHY       90000
##  7                                                GEOSCIENCES       90000
##  8                                 ASTRONOMY AND ASTROPHYSICS       96000
##  9                                                  CHEMISTRY      100000
## 10                                                    PHYSICS      100000

Social Sciences

barplot(grad_ssc$Grad_median, names.arg = grad_ssc$Major, horiz = TRUE, cex.names = 0.3, las =1)

grad_ssc %>% dplyr::select(Major, Grad_median) %>% arrange(Grad_median)

## # A tibble: 9 x 2
##                               Major Grad_median
##                               <chr>       <dbl>
## 1                         SOCIOLOGY       64000
## 2                       CRIMINOLOGY       65000
## 3 INTERDISCIPLINARY SOCIAL SCIENCES       66000
## 4           GENERAL SOCIAL SCIENCES       69000
## 5                         GEOGRAPHY       73000
## 6     MISCELLANEOUS SOCIAL SCIENCES       73000
## 7           INTERNATIONAL RELATIONS       86000
## 8  POLITICAL SCIENCE AND GOVERNMENT       92000
## 9                         ECONOMICS      100000
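Because each category lives in its own tibble (grad_law, grad_cj, and so on), comparing categories means reading many separate tables. The sketch below stacks two small categories into one frame with an assumed Category column. It then uses tapply() to summarize each category by the median of its majors' Grad_median values and by the max-minus-min spread. The values are hand-copied from the Law and Communications & Journalism tables above. The summary is unweighted: each major counts once, regardless of how many graduates it has.

```r
# Grad_median values hand-copied from the Law and Communications & Journalism tables.
# The Category column is an assumption standing in for however the per-category
# tibbles would be stacked in the full analysis.
grad <- data.frame(
  Category = rep(c("Law", "Communications & Journalism"), times = c(5, 4)),
  Major = c("CRIMINAL JUSTICE AND FIRE PROTECTION", "COURT REPORTING",
            "PUBLIC ADMINISTRATION", "PRE-LAW AND LEGAL STUDIES", "PUBLIC POLICY",
            "MASS MEDIA", "ADVERTISING AND PUBLIC RELATIONS",
            "COMMUNICATIONS", "JOURNALISM"),
  Grad_median = c(68000, 75000, 75000, 76000, 89000,
                  57000, 60000, 65000, 70000),
  stringsAsFactors = FALSE
)

# Unweighted median of the majors' medians, and the max-minus-min spread,
# computed separately for each category
cat_median <- tapply(grad$Grad_median, grad$Category, median)
cat_spread <- tapply(grad$Grad_median, grad$Category, function(x) diff(range(x)))

# Both tapply() results share the same (alphabetical) category order
summary_tbl <- data.frame(
  Category = names(cat_median),
  Median_of_majors = as.numeric(cat_median),
  Spread = as.numeric(cat_spread),
  stringsAsFactors = FALSE
)
print(summary_tbl)
```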

Unemployment Rate

All

Agriculture

barplot(all_ages_ag$Unemployment_rate, names.arg = all_ages_ag$Major, horiz = TRUE, cex.names = 0.3, las =1)

all_ages_ag %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)

## # A tibble: 10 x 2
##                                    Major Unemployment_rate
##                                    <chr>             <dbl>
##  1                   GENERAL AGRICULTURE        0.02614711
##  2 AGRICULTURE PRODUCTION AND MANAGEMENT        0.02863606
##  3                AGRICULTURAL ECONOMICS        0.03024832
##  4            PLANT SCIENCE AND AGRONOMY        0.03179089
##  5             MISCELLANEOUS AGRICULTURE        0.03923042
##  6                              FORESTRY        0.04256333
##  7                       ANIMAL SCIENCES        0.04267890
##  8                          FOOD SCIENCE        0.04918845
##  9                          SOIL SCIENCE        0.05086705
## 10          NATURAL RESOURCES MANAGEMENT        0.05434128

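The `barplot()` calls in this section draw bars in each data frame's existing row order. The printed tables, however, are sorted with `arrange()`, so a chart and the table beneath it may list majors in different orders. The sketch below shows one way to keep them aligned. It assumes dplyr is loaded. `plot_sorted()` is a hypothetical helper and is not part of the original analysis. The three sample rows are copied from the agriculture table above.

```r
library(dplyr)

# Hypothetical helper: sort a category data frame by one numeric column,
# draw a horizontal barplot in that order, and return the sorted table.
plot_sorted <- function(df, col) {
  sorted <- df %>%
    dplyr::select(Major, all_of(col)) %>%
    arrange(.data[[col]])
  # With horiz = TRUE the first bar is drawn at the bottom,
  # so the largest value ends up at the top of the chart.
  barplot(sorted[[col]], names.arg = sorted$Major,
          horiz = TRUE, cex.names = 0.3, las = 1)
  sorted
}

# Three rows copied from the agriculture (all ages) table above
ag_sample <- tibble(
  Major = c("FORESTRY", "GENERAL AGRICULTURE", "SOIL SCIENCE"),
  Unemployment_rate = c(0.04256333, 0.02614711, 0.05086705)
)

ag_sorted <- plot_sorted(ag_sample, "Unemployment_rate")
ag_sorted
```

With the helper in place, a call such as `plot_sorted(all_ages_ag, "Unemployment_rate")` could replace the two-step barplot-and-table pattern. The same helper also works for salary columns such as `Grad_median`.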
Arts

barplot(all_ages_art$Unemployment_rate, names.arg = all_ages_art$Major, horiz = TRUE, cex.names = 0.3, las =1)

all_ages_art %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)

## # A tibble: 8 x 2
##                               Major Unemployment_rate
##                               <chr>             <dbl>
## 1                             MUSIC        0.05471919
## 2                         FINE ARTS        0.07175327
## 3 COMMERCIAL ART AND GRAPHIC DESIGN        0.07391972
## 4            DRAMA AND THEATER ARTS        0.08027373
## 5                       STUDIO ARTS        0.08371383
## 6  FILM VIDEO AND PHOTOGRAPHIC ARTS        0.08561891
## 7        VISUAL AND PERFORMING ARTS        0.09465800
## 8           MISCELLANEOUS FINE ARTS        0.15614749

Biological Sciences

barplot(all_ages_bio$Unemployment_rate, names.arg = all_ages_bio$Major, horiz = TRUE, cex.names = 0.3, las =1)

all_ages_bio %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)

## # A tibble: 14 x 2
##                                  Major Unemployment_rate
##                                  <chr>             <dbl>
##  1                        PHARMACOLOGY        0.01611080
##  2                              BOTANY        0.03402351
##  3                            GENETICS        0.04159095
##  4               MISCELLANEOUS BIOLOGY        0.04758244
##  5                             ZOOLOGY        0.04836260
##  6 COGNITIVE SCIENCE AND BIOPSYCHOLOGY        0.04887283
##  7                             ECOLOGY        0.04891699
##  8                        MICROBIOLOGY        0.05088075
##  9                          PHYSIOLOGY        0.05113946
## 10               ENVIRONMENTAL SCIENCE        0.05128983
## 11                             BIOLOGY        0.05930117
## 12                   MOLECULAR BIOLOGY        0.06053708
## 13                        NEUROSCIENCE        0.06889764
## 14                BIOCHEMICAL SCIENCES        0.07159753

Business

barplot(all_ages_bsn$Unemployment_rate, names.arg = all_ages_bsn$Major, horiz = TRUE, cex.names = 0.3, las =1)

all_ages_bsn %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)

## # A tibble: 13 x 2
##                                              Major Unemployment_rate
##                                              <chr>             <dbl>
##  1             OPERATIONS LOGISTICS AND E-COMMERCE        0.04326826
##  2   MANAGEMENT INFORMATION SYSTEMS AND STATISTICS        0.04397714
##  3                                         FINANCE        0.04847293
##  4                                GENERAL BUSINESS        0.05137753
##  5                          HOSPITALITY MANAGEMENT        0.05144698
##  6 MISCELLANEOUS BUSINESS & MEDICAL ADMINISTRATION        0.05267856
##  7                                      ACCOUNTING        0.05341467
##  8                MARKETING AND MARKETING RESEARCH        0.05503289
##  9                               ACTUARIAL SCIENCE        0.05606352
## 10          BUSINESS MANAGEMENT AND ADMINISTRATION        0.05886534
## 11        HUMAN RESOURCES AND PERSONNEL MANAGEMENT        0.06074809
## 12                              BUSINESS ECONOMICS        0.06174857
## 13                          INTERNATIONAL BUSINESS        0.07135371

Communications & Journalism

barplot(all_ages_cj$Unemployment_rate, names.arg = all_ages_cj$Major, horiz = TRUE, cex.names = 0.3, las =1)

all_ages_cj %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)

## # A tibble: 4 x 2
##                              Major Unemployment_rate
##                              <chr>             <dbl>
## 1                       JOURNALISM        0.06191675
## 2                   COMMUNICATIONS        0.06436031
## 3 ADVERTISING AND PUBLIC RELATIONS        0.06721626
## 4                       MASS MEDIA        0.08300476

Computer Science & Mathematics

barplot(all_ages_com$Unemployment_rate, names.arg = all_ages_com$Major, horiz = TRUE, cex.names = 0.3, las =1)

all_ages_com %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)

## # A tibble: 11 x 2
##                                              Major Unemployment_rate
##                                              <chr>             <dbl>
##  1                MATHEMATICS AND COMPUTER SCIENCE        0.02490040
##  2                                COMPUTER SCIENCE        0.04951866
##  3                COMPUTER AND INFORMATION SYSTEMS        0.05189124
##  4                            INFORMATION SCIENCES        0.05284106
##  5                                     MATHEMATICS        0.05293608
##  6                             APPLIED MATHEMATICS        0.05565261
##  7                 STATISTICS AND DECISION SCIENCE        0.05705405
##  8      COMPUTER NETWORKING AND TELECOMMUNICATIONS        0.05869412
##  9 COMPUTER ADMINISTRATION MANAGEMENT AND SECURITY        0.07504572
## 10                      COMMUNICATION TECHNOLOGIES        0.08500867
## 11        COMPUTER PROGRAMMING AND DATA PROCESSING        0.09026422

Education

barplot(all_ages_ed$Unemployment_rate, names.arg = all_ages_ed$Major, horiz = TRUE, cex.names = 0.3, las =1)

all_ages_ed %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)

## # A tibble: 16 x 2
##                                          Major Unemployment_rate
##                                          <chr>             <dbl>
##  1  EDUCATIONAL ADMINISTRATION AND SUPERVISION        0.00000000
##  2               MATHEMATICS TEACHER EDUCATION        0.03298302
##  3          TEACHER EDUCATION: MULTIPLE LEVELS        0.03335686
##  4                        ELEMENTARY EDUCATION        0.03835916
##  5                     MISCELLANEOUS EDUCATION        0.03921524
##  6                     ART AND MUSIC EDUCATION        0.04097337
##  7      SCIENCE AND COMPUTER TEACHER EDUCATION        0.04219989
##  8                 SECONDARY TEACHER EDUCATION        0.04375568
##  9                           GENERAL EDUCATION        0.04390352
## 10 SOCIAL SCIENCE OR HISTORY TEACHER EDUCATION        0.04601320
## 11      PHYSICAL AND HEALTH EDUCATION TEACHING        0.04626696
## 12                     SPECIAL NEEDS EDUCATION        0.04714466
## 13                LANGUAGE AND DRAMA EDUCATION        0.04808029
## 14                   EARLY CHILDHOOD EDUCATION        0.04935065
## 15                             LIBRARY SCIENCE        0.09484299
## 16                   SCHOOL STUDENT COUNSELING        0.10174594

Engineering

barplot(all_ages_eng$Unemployment_rate, names.arg = all_ages_eng$Major, horiz = TRUE, cex.names = 0.3, las =1)

all_ages_eng %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)

## # A tibble: 29 x 2
##                                          Major Unemployment_rate
##                                          <chr>             <dbl>
##  1      GEOLOGICAL AND GEOPHYSICAL ENGINEERING        0.00000000
##  2                           MATERIALS SCIENCE        0.02233333
##  3   NAVAL ARCHITECTURE AND MARINE ENGINEERING        0.04030882
##  4                       AEROSPACE ENGINEERING        0.04197131
##  5                       PETROLEUM ENGINEERING        0.04220535
##  6 MECHANICAL ENGINEERING RELATED TECHNOLOGIES        0.04353327
##  7   ENGINEERING MECHANICS PHYSICS AND SCIENCE        0.04380452
##  8                      MECHANICAL ENGINEERING        0.04384386
##  9                   METALLURGICAL ENGINEERING        0.04487268
## 10                   ENVIRONMENTAL ENGINEERING        0.04573200
## # ... with 19 more rows

Health

barplot(all_ages_hlt$Unemployment_rate, names.arg = all_ages_hlt$Major, horiz = TRUE, cex.names = 0.3, las =1)

all_ages_hlt %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)

## # A tibble: 12 x 2
##                                                  Major Unemployment_rate
##                                                  <chr>             <dbl>
##  1                       TREATMENT THERAPY PROFESSIONS        0.02629160
##  2                                             NURSING        0.02679682
##  3                          MEDICAL ASSISTING SERVICES        0.03135685
##  4 PHARMACY PHARMACEUTICAL SCIENCES AND ADMINISTRATION        0.03435768
##  5                    MEDICAL TECHNOLOGIES TECHNICIANS        0.03620987
##  6       COMMUNICATION DISORDERS SCIENCES AND SERVICES        0.04646718
##  7            MISCELLANEOUS HEALTH MEDICAL PROFESSIONS        0.05357271
##  8                 GENERAL MEDICAL AND HEALTH SERVICES        0.05470063
##  9          HEALTH AND MEDICAL ADMINISTRATIVE SERVICES        0.05700398
## 10                                  NUTRITION SCIENCES        0.06321655
## 11                         COMMUNITY AND PUBLIC HEALTH        0.06652770
## 12             HEALTH AND MEDICAL PREPARATORY PROGRAMS        0.07000979

Industrial Arts and Consumer Services

barplot(all_ages_ia$Unemployment_rate, names.arg = all_ages_ia$Major, horiz = TRUE, cex.names = 0.3, las =1)

all_ages_ia %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)

## # A tibble: 7 x 2
##                                                               Major
##                                                               <chr>
## 1                     PHYSICAL FITNESS PARKS RECREATION AND LEISURE
## 2                          TRANSPORTATION SCIENCES AND TECHNOLOGIES
## 3                                             CONSTRUCTION SERVICES
## 4 ELECTRICAL, MECHANICAL, AND PRECISION TECHNOLOGIES AND PRODUCTION
## 5                            COSMETOLOGY SERVICES AND CULINARY ARTS
## 6                                      FAMILY AND CONSUMER SCIENCES
## 7                                             MILITARY TECHNOLOGIES
## # ... with 1 more variables: Unemployment_rate <dbl>
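In this table, the `Unemployment_rate` column is hidden. The same happens in the Liberal Arts and Sciences tables. By default, tibble's print method fits its output to the console width, and the long major names use up that space. The Engineering tables are likewise cut to their first 10 rows. Passing `n = Inf` and `width = Inf` to `print()` shows every row and column. The minimal sketch below demonstrates this on a small tibble whose rows are copied from the business (all ages) table above.

```r
library(dplyr)

# Rows copied from the business (all ages) unemployment table above
biz <- tibble(
  Major = c("MISCELLANEOUS BUSINESS & MEDICAL ADMINISTRATION",
            "OPERATIONS LOGISTICS AND E-COMMERCE",
            "FINANCE"),
  Unemployment_rate = c(0.05267856, 0.04326826, 0.04847293)
)

# n = Inf prints every row; width = Inf prints every column
# regardless of the console width.
biz %>%
  arrange(Unemployment_rate) %>%
  print(n = Inf, width = Inf)
```

Appending `%>% print(n = Inf, width = Inf)` to the `arrange()` calls above would apply the same fix to the truncated tables.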

Liberal Arts

barplot(all_ages_la$Unemployment_rate, names.arg = all_ages_la$Major, horiz = TRUE, cex.names = 0.3, las =1)

all_ages_la %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)

## # A tibble: 16 x 2
##                                                            Major
##                                                            <chr>
##  1                              THEOLOGY AND RELIGIOUS VOCATIONS
##  2 FRENCH GERMAN LATIN AND OTHER COMMON FOREIGN LANGUAGE STUDIES
##  3                                     ART HISTORY AND CRITICISM
##  4                                                       HISTORY
##  5                                                  LIBERAL ARTS
##  6                          AREA ETHNIC AND CIVILIZATION STUDIES
##  7                               ENGLISH LANGUAGE AND LITERATURE
##  8                                       OTHER FOREIGN LANGUAGES
##  9                                         UNITED STATES HISTORY
## 10                                      COMPOSITION AND RHETORIC
## 11                       INTERCULTURAL AND INTERNATIONAL STUDIES
## 12                              PHILOSOPHY AND RELIGIOUS STUDIES
## 13                               MULTI/INTERDISCIPLINARY STUDIES
## 14                                                    HUMANITIES
## 15                                   ANTHROPOLOGY AND ARCHEOLOGY
## 16           LINGUISTICS AND COMPARATIVE LANGUAGE AND LITERATURE
## # ... with 1 more variables: Unemployment_rate <dbl>

Law

barplot(all_ages_law$Unemployment_rate, names.arg = all_ages_law$Major, horiz = TRUE, cex.names = 0.3, las =1)

all_ages_law %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)

## # A tibble: 5 x 2
##                                  Major Unemployment_rate
##                                  <chr>             <dbl>
## 1 CRIMINAL JUSTICE AND FIRE PROTECTION        0.05403559
## 2                      COURT REPORTING        0.06651258
## 3                PUBLIC ADMINISTRATION        0.06965492
## 4            PRE-LAW AND LEGAL STUDIES        0.06984780
## 5                        PUBLIC POLICY        0.07921692

Psychology & Social Work

barplot(all_ages_psy$Unemployment_rate, names.arg = all_ages_psy$Major, horiz = TRUE, cex.names = 0.3, las =1)

all_ages_psy %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)

## # A tibble: 9 x 2
##                                       Major Unemployment_rate
##                                       <chr>             <dbl>
## 1                               SOCIAL WORK        0.05937590
## 2                     COUNSELING PSYCHOLOGY        0.06802139
## 3                                PSYCHOLOGY        0.06966658
## 4 HUMAN SERVICES AND COMMUNITY ORGANIZATION        0.07242129
## 5                    EDUCATIONAL PSYCHOLOGY        0.07563114
## 6                  MISCELLANEOUS PSYCHOLOGY        0.08200936
## 7  INDUSTRIAL AND ORGANIZATIONAL PSYCHOLOGY        0.08362907
## 8                         SOCIAL PSYCHOLOGY        0.08733625
## 9                       CLINICAL PSYCHOLOGY        0.10271216

Sciences

barplot(all_ages_sci$Unemployment_rate, names.arg = all_ages_sci$Major, horiz = TRUE, cex.names = 0.3, las =1)

all_ages_sci %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)

## # A tibble: 10 x 2
##                                                         Major
##                                                         <chr>
##  1                       ATMOSPHERIC SCIENCES AND METEOROLOGY
##  2                                          PHYSICAL SCIENCES
##  3                                                GEOSCIENCES
##  4                      MULTI-DISCIPLINARY OR GENERAL SCIENCE
##  5                                                    PHYSICS
##  6                                               OCEANOGRAPHY
##  7                                                  CHEMISTRY
##  8                                  GEOLOGY AND EARTH SCIENCE
##  9 NUCLEAR, INDUSTRIAL RADIOLOGY, AND BIOLOGICAL TECHNOLOGIES
## 10                                 ASTRONOMY AND ASTROPHYSICS
## # ... with 1 more variables: Unemployment_rate <dbl>

Social Sciences

barplot(all_ages_ssc$Unemployment_rate, names.arg = all_ages_ssc$Major, horiz = TRUE, cex.names = 0.3, las =1)

all_ages_ssc %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)

## # A tibble: 9 x 2
##                               Major Unemployment_rate
##                               <chr>             <dbl>
## 1     MISCELLANEOUS SOCIAL SCIENCES        0.05439877
## 2                         ECONOMICS        0.06131272
## 3                       CRIMINOLOGY        0.06451917
## 4 INTERDISCIPLINARY SOCIAL SCIENCES        0.06538345
## 5                         SOCIOLOGY        0.06580430
## 6                         GEOGRAPHY        0.06900849
## 7  POLITICAL SCIENCE AND GOVERNMENT        0.06937385
## 8           INTERNATIONAL RELATIONS        0.07031327
## 9           GENERAL SOCIAL SCIENCES        0.07105693

Recent Graduates

Agriculture

barplot(rct_ag$Unemployment_rate, names.arg = rct_ag$Major, horiz = TRUE, cex.names = 0.3, las =1)

rct_ag %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)

## # A tibble: 10 x 2
##                                    Major Unemployment_rate
##                                    <chr>             <dbl>
##  1                          SOIL SCIENCE        0.00000000
##  2                   GENERAL AGRICULTURE        0.01964246
##  3            PLANT SCIENCE AND AGRONOMY        0.04545454
##  4 AGRICULTURE PRODUCTION AND MANAGEMENT        0.05003084
##  5                       ANIMAL SCIENCES        0.05086250
##  6             MISCELLANEOUS AGRICULTURE        0.05976676
##  7          NATURAL RESOURCES MANAGEMENT        0.06661920
##  8                AGRICULTURAL ECONOMICS        0.07724958
##  9                              FORESTRY        0.09672574
## 10                          FOOD SCIENCE        0.09693146

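Reading the all-ages and recent-graduate tables side by side shows that some majors differ widely. Forestry, for example, is about 4.3% for all ages but about 9.7% for recent graduates. The sketch below joins the two rates by major and computes the gap. It assumes dplyr is loaded. The rows are copied from the two agriculture tables, and the column names `rate_all`, `rate_recent`, and `gap` are hypothetical.

```r
library(dplyr)

# Rows copied from the all-ages and recent-graduate agriculture tables
all_ag <- tibble(
  Major = c("GENERAL AGRICULTURE", "FORESTRY", "SOIL SCIENCE"),
  rate_all = c(0.02614711, 0.04256333, 0.05086705)
)
recent_ag <- tibble(
  Major = c("SOIL SCIENCE", "GENERAL AGRICULTURE", "FORESTRY"),
  rate_recent = c(0.00000000, 0.01964246, 0.09672574)
)

# Match majors by name, then compute how much higher (positive gap)
# or lower (negative gap) the recent-graduate rate is.
ag_gap <- inner_join(all_ag, recent_ag, by = "Major") %>%
  mutate(gap = rate_recent - rate_all) %>%
  arrange(desc(gap))
ag_gap
```

Some recent-graduate rates are exactly zero, such as Soil Science here. These may reflect a small number of sampled graduates rather than full employment, so large gaps are worth checking against the underlying counts.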
Arts

barplot(rct_art$Unemployment_rate, names.arg = rct_art$Major, horiz = TRUE, cex.names = 0.3, las =1)

rct_art %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)

## # A tibble: 8 x 2
##                               Major Unemployment_rate
##                               <chr>             <dbl>
## 1                             MUSIC        0.07595967
## 2            DRAMA AND THEATER ARTS        0.07754113
## 3                         FINE ARTS        0.08418630
## 4           MISCELLANEOUS FINE ARTS        0.08937500
## 5                       STUDIO ARTS        0.08955224
## 6 COMMERCIAL ART AND GRAPHIC DESIGN        0.09679758
## 7        VISUAL AND PERFORMING ARTS        0.10219742
## 8  FILM VIDEO AND PHOTOGRAPHIC ARTS        0.10577224

Biological Sciences

barplot(rct_bio$Unemployment_rate, names.arg = rct_bio$Major, horiz = TRUE, cex.names = 0.3, las =1)

rct_bio %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)

## # A tibble: 14 x 2
##                                  Major Unemployment_rate
##                                  <chr>             <dbl>
##  1                              BOTANY        0.00000000
##  2                            GENETICS        0.03411765
##  3                             ZOOLOGY        0.04632028
##  4                        NEUROSCIENCE        0.04848168
##  5                             ECOLOGY        0.05447519
##  6               MISCELLANEOUS BIOLOGY        0.05854546
##  7                        MICROBIOLOGY        0.06677587
##  8                          PHYSIOLOGY        0.06916280
##  9                             BIOLOGY        0.07072473
## 10 COGNITIVE SCIENCE AND BIOPSYCHOLOGY        0.07523617
## 11               ENVIRONMENTAL SCIENCE        0.07858468
## 12                BIOCHEMICAL SCIENCES        0.08053138
## 13                   MOLECULAR BIOLOGY        0.08436116
## 14                        PHARMACOLOGY        0.08553157

Business

barplot(rct_bsn$Unemployment_rate, names.arg = rct_bsn$Major, horiz = TRUE, cex.names = 0.3, las =1)

rct_bsn %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)

## # A tibble: 13 x 2
##                                              Major Unemployment_rate
##                                              <chr>             <dbl>
##  1             OPERATIONS LOGISTICS AND E-COMMERCE        0.04785870
##  2   MANAGEMENT INFORMATION SYSTEMS AND STATISTICS        0.05823961
##  3        HUMAN RESOURCES AND PERSONNEL MANAGEMENT        0.05956965
##  4                                         FINANCE        0.06068636
##  5                          HOSPITALITY MANAGEMENT        0.06116919
##  6                MARKETING AND MARKETING RESEARCH        0.06121506
##  7                                      ACCOUNTING        0.06974901
##  8 MISCELLANEOUS BUSINESS & MEDICAL ADMINISTRATION        0.07198297
##  9          BUSINESS MANAGEMENT AND ADMINISTRATION        0.07221834
## 10                                GENERAL BUSINESS        0.07286147
## 11                               ACTUARIAL SCIENCE        0.09565217
## 12                          INTERNATIONAL BUSINESS        0.09617506
## 13                              BUSINESS ECONOMICS        0.09644838

Communications & Journalism

barplot(rct_cj$Unemployment_rate, names.arg = rct_cj$Major, horiz = TRUE, cex.names = 0.3, las =1)

rct_cj %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)

## # A tibble: 4 x 2
##                              Major Unemployment_rate
##                              <chr>             <dbl>
## 1 ADVERTISING AND PUBLIC RELATIONS        0.06796077
## 2                       JOURNALISM        0.06917644
## 3                   COMMUNICATIONS        0.07517698
## 4                       MASS MEDIA        0.08983683

Computer Science & Mathematics

barplot(rct_com$Unemployment_rate, names.arg = rct_com$Major, horiz = TRUE, cex.names = 0.3, las =1)

rct_com %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)

## # A tibble: 11 x 2
##                                              Major Unemployment_rate
##                                              <chr>             <dbl>
##  1                MATHEMATICS AND COMPUTER SCIENCE        0.00000000
##  2                                     MATHEMATICS        0.04727714
##  3                            INFORMATION SCIENCES        0.06074144
##  4                                COMPUTER SCIENCE        0.06317277
##  5                 STATISTICS AND DECISION SCIENCE        0.08627367
##  6                             APPLIED MATHEMATICS        0.09082331
##  7                COMPUTER AND INFORMATION SYSTEMS        0.09346033
##  8 COMPUTER ADMINISTRATION MANAGEMENT AND SECURITY        0.09972338
##  9        COMPUTER PROGRAMMING AND DATA PROCESSING        0.11398259
## 10                      COMMUNICATION TECHNOLOGIES        0.11951147
## 11      COMPUTER NETWORKING AND TELECOMMUNICATIONS        0.15184981

Education

barplot(rct_ed$Unemployment_rate, names.arg = rct_ed$Major, horiz = TRUE, cex.names = 0.3, las =1)

rct_ed %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)

## # A tibble: 16 x 2
##                                          Major Unemployment_rate
##                                          <chr>             <dbl>
##  1  EDUCATIONAL ADMINISTRATION AND SUPERVISION        0.00000000
##  2               MATHEMATICS TEACHER EDUCATION        0.01620283
##  3          TEACHER EDUCATION: MULTIPLE LEVELS        0.03654583
##  4                     ART AND MUSIC EDUCATION        0.03863775
##  5                   EARLY CHILDHOOD EDUCATION        0.04010498
##  6                     SPECIAL NEEDS EDUCATION        0.04150782
##  7                        ELEMENTARY EDUCATION        0.04658571
##  8      SCIENCE AND COMPUTER TEACHER EDUCATION        0.04726368
##  9                LANGUAGE AND DRAMA EDUCATION        0.05030643
## 10                 SECONDARY TEACHER EDUCATION        0.05222898
## 11 SOCIAL SCIENCE OR HISTORY TEACHER EDUCATION        0.05408294
## 12                           GENERAL EDUCATION        0.05735993
## 13                     MISCELLANEOUS EDUCATION        0.05921195
## 14      PHYSICAL AND HEALTH EDUCATION TEACHING        0.07466750
## 15                             LIBRARY SCIENCE        0.10494572
## 16                   SCHOOL STUDENT COUNSELING        0.10757946

Engineering

barplot(rct_eng$Unemployment_rate, names.arg = rct_eng$Major, horiz = TRUE, cex.names = 0.3, las =1)

rct_eng %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)

## # A tibble: 29 x 2
##                                          Major Unemployment_rate
##                                          <chr>             <dbl>
##  1   ENGINEERING MECHANICS PHYSICS AND SCIENCE       0.006334343
##  2                       PETROLEUM ENGINEERING       0.018380527
##  3                           MATERIALS SCIENCE       0.023042836
##  4                   METALLURGICAL ENGINEERING       0.024096386
##  5 MATERIALS ENGINEERING AND MATERIALS SCIENCE       0.027788805
##  6          INDUSTRIAL PRODUCTION TECHNOLOGIES       0.028308097
##  7       ENGINEERING AND INDUSTRIAL MANAGEMENT       0.033651660
##  8    INDUSTRIAL AND MANUFACTURING ENGINEERING       0.042875544
##  9   NAVAL ARCHITECTURE AND MARINE ENGINEERING       0.050125313
## 10      MISCELLANEOUS ENGINEERING TECHNOLOGIES       0.052538520
## # ... with 19 more rows

Health

barplot(rct_hlt$Unemployment_rate, names.arg = rct_hlt$Major, horiz = TRUE, cex.names = 0.3, las =1)

rct_hlt %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)

## # A tibble: 12 x 2
##                                                  Major Unemployment_rate
##                                                  <chr>             <dbl>
##  1                    MEDICAL TECHNOLOGIES TECHNICIANS        0.03698279
##  2                          MEDICAL ASSISTING SERVICES        0.04250653
##  3                                             NURSING        0.04486272
##  4       COMMUNICATION DISORDERS SCIENCES AND SERVICES        0.04758400
##  5 PHARMACY PHARMACEUTICAL SCIENCES AND ADMINISTRATION        0.05552083
##  6                       TREATMENT THERAPY PROFESSIONS        0.05982121
##  7                                  NUTRITION SCIENCES        0.06870068
##  8             HEALTH AND MEDICAL PREPARATORY PROGRAMS        0.06977971
##  9            MISCELLANEOUS HEALTH MEDICAL PROFESSIONS        0.08141125
## 10                 GENERAL MEDICAL AND HEALTH SERVICES        0.08210162
## 11          HEALTH AND MEDICAL ADMINISTRATIVE SERVICES        0.08962626
## 12                         COMMUNITY AND PUBLIC HEALTH        0.11214439

Industrial Arts and Consumer Services

barplot(rct_ia$Unemployment_rate, names.arg = rct_ia$Major, horiz = TRUE, cex.names = 0.3, las =1)

rct_ia %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)

## # A tibble: 7 x 2
##                                                               Major
##                                                               <chr>
## 1                                             MILITARY TECHNOLOGIES
## 2 ELECTRICAL, MECHANICAL, AND PRECISION TECHNOLOGIES AND PRODUCTION
## 3                     PHYSICAL FITNESS PARKS RECREATION AND LEISURE
## 4                            COSMETOLOGY SERVICES AND CULINARY ARTS
## 5                                             CONSTRUCTION SERVICES
## 6                                      FAMILY AND CONSUMER SCIENCES
## 7                          TRANSPORTATION SCIENCES AND TECHNOLOGIES
## # ... with 1 more variables: Unemployment_rate <dbl>

Liberal Arts

barplot(rct_la$Unemployment_rate, names.arg = rct_la$Major, horiz = TRUE, cex.names = 0.3, las =1)

rct_la %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)

## # A tibble: 16 x 2
##                                                            Major
##                                                            <chr>
##  1                                         UNITED STATES HISTORY
##  2                                     ART HISTORY AND CRITICISM
##  3                              THEOLOGY AND RELIGIOUS VOCATIONS
##  4                          AREA ETHNIC AND CIVILIZATION STUDIES
##  5                                                    HUMANITIES
##  6                               MULTI/INTERDISCIPLINARY STUDIES
##  7 FRENCH GERMAN LATIN AND OTHER COMMON FOREIGN LANGUAGE STUDIES
##  8                                                  LIBERAL ARTS
##  9                                      COMPOSITION AND RHETORIC
## 10                       INTERCULTURAL AND INTERNATIONAL STUDIES
## 11                               ENGLISH LANGUAGE AND LITERATURE
## 12                                                       HISTORY
## 13                              PHILOSOPHY AND RELIGIOUS STUDIES
## 14                                   ANTHROPOLOGY AND ARCHEOLOGY
## 15           LINGUISTICS AND COMPARATIVE LANGUAGE AND LITERATURE
## 16                                       OTHER FOREIGN LANGUAGES
## # ... with 1 more variables: Unemployment_rate <dbl>

Law

barplot(rct_law$Unemployment_rate, names.arg = rct_law$Major, horiz = TRUE, cex.names = 0.3, las =1)

rct_law %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)

## # A tibble: 5 x 2
##                                  Major Unemployment_rate
##                                  <chr>             <dbl>
## 1                      COURT REPORTING        0.01168969
## 2            PRE-LAW AND LEGAL STUDIES        0.07196502
## 3 CRIMINAL JUSTICE AND FIRE PROTECTION        0.08245220
## 4                        PUBLIC POLICY        0.12842630
## 5                PUBLIC ADMINISTRATION        0.15949060

Psychology & Social Work

barplot(rct_psy$Unemployment_rate, names.arg = rct_psy$Major, horiz = TRUE, cex.names = 0.3, las =1)

rct_psy %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)

## # A tibble: 9 x 2
##                                       Major Unemployment_rate
##                                       <chr>             <dbl>
## 1                         SOCIAL PSYCHOLOGY        0.02964960
## 2 HUMAN SERVICES AND COMMUNITY ORGANIZATION        0.03781903
## 3                  MISCELLANEOUS PSYCHOLOGY        0.05190783
## 4                     COUNSELING PSYCHOLOGY        0.05362065
## 5                    EDUCATIONAL PSYCHOLOGY        0.06511219
## 6                               SOCIAL WORK        0.06882792
## 7                                PSYCHOLOGY        0.08381087
## 8  INDUSTRIAL AND ORGANIZATIONAL PSYCHOLOGY        0.10878661
## 9                       CLINICAL PSYCHOLOGY        0.14904820

Sciences

barplot(rct_sci$Unemployment_rate, names.arg = rct_sci$Major, horiz = TRUE, cex.names = 0.3, las =1)

rct_sci %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)

## # A tibble: 10 x 2
##                                                         Major
##                                                         <chr>
##  1                                 ASTRONOMY AND ASTROPHYSICS
##  2                       ATMOSPHERIC SCIENCES AND METEOROLOGY
##  3                                                GEOSCIENCES
##  4                                          PHYSICAL SCIENCES
##  5                                                    PHYSICS
##  6                                                  CHEMISTRY
##  7                      MULTI-DISCIPLINARY OR GENERAL SCIENCE
##  8                                               OCEANOGRAPHY
##  9 NUCLEAR, INDUSTRIAL RADIOLOGY, AND BIOLOGICAL TECHNOLOGIES
## 10                                  GEOLOGY AND EARTH SCIENCE
## # ... with 1 more variables: Unemployment_rate <dbl>

Social Sciences

barplot(rct_ssc$Unemployment_rate, names.arg = rct_ssc$Major, horiz = TRUE, cex.names = 0.3, las =1)

rct_ssc %>% dplyr::select(Major, Unemployment_rate) %>% arrange(Unemployment_rate)

## # A tibble: 9 x 2
##                               Major Unemployment_rate
##                               <chr>             <dbl>
## 1     MISCELLANEOUS SOCIAL SCIENCES        0.07307954
## 2                         SOCIOLOGY        0.08495100
## 3 INTERDISCIPLINARY SOCIAL SCIENCES        0.09230582
## 4           INTERNATIONAL RELATIONS        0.09679894
## 5                       CRIMINOLOGY        0.09724392
## 6                         ECONOMICS        0.09909232
## 7  POLITICAL SCIENCE AND GOVERNMENT        0.10117460
## 8           GENERAL SOCIAL SCIENCES        0.10345472
## 9                         GEOGRAPHY        0.11345863

Graduate Students

Agriculture

barplot(grad_ag$Grad_unemployment_rate, names.arg = grad_ag$Major, horiz = TRUE, cex.names = 0.3, las =1)

grad_ag %>% dplyr::select(Major, Grad_unemployment_rate) %>% arrange(Grad_unemployment_rate)

## # A tibble: 10 x 2
##                                    Major Grad_unemployment_rate
##                                    <chr>                  <dbl>
##  1                       ANIMAL SCIENCES             0.01232653
##  2                          SOIL SCIENCE             0.01466782
##  3                AGRICULTURAL ECONOMICS             0.01998520
##  4                   GENERAL AGRICULTURE             0.02932492
##  5          NATURAL RESOURCES MANAGEMENT             0.02949596
##  6            PLANT SCIENCE AND AGRONOMY             0.03125399
##  7                          FOOD SCIENCE             0.03295627
##  8 AGRICULTURE PRODUCTION AND MANAGEMENT             0.03483833
##  9                              FORESTRY             0.04129642
## 10             MISCELLANEOUS AGRICULTURE             0.08645247

Arts

barplot(grad_art$Grad_unemployment_rate, names.arg = grad_art$Major, horiz = TRUE, cex.names = 0.3, las = 1)

grad_art %>% dplyr::select(Major, Grad_unemployment_rate) %>% arrange(Grad_unemployment_rate)

## # A tibble: 8 x 2
##                               Major Grad_unemployment_rate
##                               <chr>                  <dbl>
## 1        VISUAL AND PERFORMING ARTS             0.03842167
## 2           MISCELLANEOUS FINE ARTS             0.03846154
## 3                             MUSIC             0.04147201
## 4                       STUDIO ARTS             0.05036224
## 5 COMMERCIAL ART AND GRAPHIC DESIGN             0.05775585
## 6                         FINE ARTS             0.06100455
## 7            DRAMA AND THEATER ARTS             0.06766724
## 8  FILM VIDEO AND PHOTOGRAPHIC ARTS             0.09647293

Biological Sciences

barplot(grad_bio$Grad_unemployment_rate, names.arg = grad_bio$Major, horiz = TRUE, cex.names = 0.3, las = 1)

grad_bio %>% dplyr::select(Major, Grad_unemployment_rate) %>% arrange(Grad_unemployment_rate)

## # A tibble: 14 x 2
##                                  Major Grad_unemployment_rate
##                                  <chr>                  <dbl>
##  1                        NEUROSCIENCE             0.01761115
##  2                            GENETICS             0.01942385
##  3                          PHYSIOLOGY             0.02092066
##  4                             ZOOLOGY             0.02092797
##  5                             BIOLOGY             0.02110471
##  6                BIOCHEMICAL SCIENCES             0.02421187
##  7                        PHARMACOLOGY             0.02422993
##  8               MISCELLANEOUS BIOLOGY             0.02586095
##  9                   MOLECULAR BIOLOGY             0.02820749
## 10                        MICROBIOLOGY             0.03215909
## 11                              BOTANY             0.03321006
## 12               ENVIRONMENTAL SCIENCE             0.03527623
## 13                             ECOLOGY             0.03665403
## 14 COGNITIVE SCIENCE AND BIOPSYCHOLOGY             0.04534244

Business

barplot(grad_bsn$Grad_unemployment_rate, names.arg = grad_bsn$Major, horiz = TRUE, cex.names = 0.3, las = 1)

grad_bsn %>% dplyr::select(Major, Grad_unemployment_rate) %>% arrange(Grad_unemployment_rate)

## # A tibble: 13 x 2
##                                              Major Grad_unemployment_rate
##                                              <chr>                  <dbl>
##  1             OPERATIONS LOGISTICS AND E-COMMERCE             0.02284832
##  2   MANAGEMENT INFORMATION SYSTEMS AND STATISTICS             0.03871464
##  3                                GENERAL BUSINESS             0.04089493
##  4                                      ACCOUNTING             0.04185735
##  5                                         FINANCE             0.04409251
##  6 MISCELLANEOUS BUSINESS & MEDICAL ADMINISTRATION             0.04661565
##  7          BUSINESS MANAGEMENT AND ADMINISTRATION             0.04903231
##  8                              BUSINESS ECONOMICS             0.05125341
##  9                MARKETING AND MARKETING RESEARCH             0.05205949
## 10                          INTERNATIONAL BUSINESS             0.05488957
## 11        HUMAN RESOURCES AND PERSONNEL MANAGEMENT             0.06471597
## 12                          HOSPITALITY MANAGEMENT             0.07386679
## 13                               ACTUARIAL SCIENCE             0.07424381

Communications & Journalism

barplot(grad_cj$Grad_unemployment_rate, names.arg = grad_cj$Major, horiz = TRUE, cex.names = 0.3, las = 1)

grad_cj %>% dplyr::select(Major, Grad_unemployment_rate) %>% arrange(Grad_unemployment_rate)

## # A tibble: 4 x 2
##                              Major Grad_unemployment_rate
##                              <chr>                  <dbl>
## 1 ADVERTISING AND PUBLIC RELATIONS             0.03056160
## 2                       JOURNALISM             0.04202330
## 3                   COMMUNICATIONS             0.04865767
## 4                       MASS MEDIA             0.05164133

Computer Science & Mathematics

barplot(grad_com$Grad_unemployment_rate, names.arg = grad_com$Major, horiz = TRUE, cex.names = 0.3, las = 1)

grad_com %>% dplyr::select(Major, Grad_unemployment_rate) %>% arrange(Grad_unemployment_rate)

## # A tibble: 11 x 2
##                                              Major Grad_unemployment_rate
##                                              <chr>                  <dbl>
##  1        COMPUTER PROGRAMMING AND DATA PROCESSING             0.02461220
##  2                             APPLIED MATHEMATICS             0.02863239
##  3                                COMPUTER SCIENCE             0.03619812
##  4                                     MATHEMATICS             0.03764496
##  5                COMPUTER AND INFORMATION SYSTEMS             0.04004921
##  6                 STATISTICS AND DECISION SCIENCE             0.04235759
##  7                            INFORMATION SCIENCES             0.04999576
##  8                      COMMUNICATION TECHNOLOGIES             0.05841063
##  9 COMPUTER ADMINISTRATION MANAGEMENT AND SECURITY             0.05958663
## 10      COMPUTER NETWORKING AND TELECOMMUNICATIONS             0.08160569
## 11                MATHEMATICS AND COMPUTER SCIENCE             0.10289017

Education

barplot(grad_ed$Grad_unemployment_rate, names.arg = grad_ed$Major, horiz = TRUE, cex.names = 0.3, las = 1)

grad_ed %>% dplyr::select(Major, Grad_unemployment_rate) %>% arrange(Grad_unemployment_rate)

## # A tibble: 16 x 2
##                                          Major Grad_unemployment_rate
##                                          <chr>                  <dbl>
##  1               MATHEMATICS TEACHER EDUCATION             0.01424816
##  2  EDUCATIONAL ADMINISTRATION AND SUPERVISION             0.01676074
##  3          TEACHER EDUCATION: MULTIPLE LEVELS             0.01875968
##  4                        ELEMENTARY EDUCATION             0.02036289
##  5      SCIENCE AND COMPUTER TEACHER EDUCATION             0.02218759
##  6                 SECONDARY TEACHER EDUCATION             0.02229106
##  7                     SPECIAL NEEDS EDUCATION             0.02246486
##  8                   EARLY CHILDHOOD EDUCATION             0.02716594
##  9      PHYSICAL AND HEALTH EDUCATION TEACHING             0.02788530
## 10                     ART AND MUSIC EDUCATION             0.02840349
## 11                             LIBRARY SCIENCE             0.02993678
## 12                LANGUAGE AND DRAMA EDUCATION             0.03107441
## 13 SOCIAL SCIENCE OR HISTORY TEACHER EDUCATION             0.03175695
## 14                           GENERAL EDUCATION             0.03334986
## 15                     MISCELLANEOUS EDUCATION             0.03463734
## 16                   SCHOOL STUDENT COUNSELING             0.05140030

Engineering

barplot(grad_eng$Grad_unemployment_rate, names.arg = grad_eng$Major, horiz = TRUE, cex.names = 0.3, las = 1)

grad_eng %>% dplyr::select(Major, Grad_unemployment_rate) %>% arrange(Grad_unemployment_rate)

## # A tibble: 29 x 2
##                                        Major Grad_unemployment_rate
##                                        <chr>                  <dbl>
##  1                       NUCLEAR ENGINEERING             0.01153802
##  2                    BIOMEDICAL ENGINEERING             0.01844758
##  3                     PETROLEUM ENGINEERING             0.01947149
##  4                      COMPUTER ENGINEERING             0.02130079
##  5                         MATERIALS SCIENCE             0.02177728
##  6                 METALLURGICAL ENGINEERING             0.02290638
##  7                     AEROSPACE ENGINEERING             0.02793169
##  8    GEOLOGICAL AND GEOPHYSICAL ENGINEERING             0.02870639
##  9 ENGINEERING MECHANICS PHYSICS AND SCIENCE             0.02896772
## 10    MISCELLANEOUS ENGINEERING TECHNOLOGIES             0.03169782
## # ... with 19 more rows
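The Engineering table above stops after ten of its 29 rows because tibbles print only a preview by default. A minimal sketch of lifting that limit with the `n` argument of tibble's `print()` method follows; it uses a placeholder table rather than the real data. In the analysis itself, appending `%>% print(n = Inf)` to the `grad_eng` pipeline should list all 29 majors.

```r
library(tibble)

# Toy stand-in for a 29-row table such as grad_eng; the majors and rates
# below are placeholders, not values from the 538 data.
toy_eng <- tibble(
  Major = paste("MAJOR", 1:29),
  Grad_unemployment_rate = seq(0.01, 0.05, length.out = 29)
)

print(toy_eng)           # preview only: first 10 rows plus a count of the rest
print(toy_eng, n = Inf)  # every row
```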

Health

barplot(grad_hlt$Grad_unemployment_rate, names.arg = grad_hlt$Major, horiz = TRUE, cex.names = 0.3, las = 1)

grad_hlt %>% dplyr::select(Major, Grad_unemployment_rate) %>% arrange(Grad_unemployment_rate)

## # A tibble: 12 x 2
##                                                  Major
##                                                  <chr>
##  1       COMMUNICATION DISORDERS SCIENCES AND SERVICES
##  2                       TREATMENT THERAPY PROFESSIONS
##  3                                             NURSING
##  4             HEALTH AND MEDICAL PREPARATORY PROGRAMS
##  5                 GENERAL MEDICAL AND HEALTH SERVICES
##  6                                  NUTRITION SCIENCES
##  7 PHARMACY PHARMACEUTICAL SCIENCES AND ADMINISTRATION
##  8                          MEDICAL ASSISTING SERVICES
##  9                    MEDICAL TECHNOLOGIES TECHNICIANS
## 10            MISCELLANEOUS HEALTH MEDICAL PROFESSIONS
## 11                         COMMUNITY AND PUBLIC HEALTH
## 12          HEALTH AND MEDICAL ADMINISTRATIVE SERVICES
## # ... with 1 more variables: Grad_unemployment_rate <dbl>
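The Health table above, like the Industrial Arts, Liberal Arts, and Sciences tables, appears to hide the rate column because the long major names fill the console width, so tibble moves `Grad_unemployment_rate` into the footer note. A minimal sketch of keeping every column on screen with the `width` argument of tibble's `print()` method, using a single placeholder row (the rate is made up):

```r
library(tibble)

# Placeholder row with one of the long Health major names; the rate is
# made up for illustration, not taken from the 538 data.
toy_hlt <- tibble(
  Major = "PHARMACY PHARMACEUTICAL SCIENCES AND ADMINISTRATION",
  Grad_unemployment_rate = 0.05
)

# width = Inf lets every column print, however narrow the console is,
# instead of listing overflow columns in a footer note.
print(toy_hlt, width = Inf)
```

Applied to the real tables, appending `%>% print(width = Inf)` to each ranking pipeline should show the rates next to the major names.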

Industrial Arts and Consumer Services

barplot(grad_ia$Grad_unemployment_rate, names.arg = grad_ia$Major, horiz = TRUE, cex.names = 0.3, las = 1)

grad_ia %>% dplyr::select(Major, Grad_unemployment_rate) %>% arrange(Grad_unemployment_rate)

## # A tibble: 7 x 2
##                                                               Major
##                                                               <chr>
## 1                     PHYSICAL FITNESS PARKS RECREATION AND LEISURE
## 2                                      FAMILY AND CONSUMER SCIENCES
## 3                          TRANSPORTATION SCIENCES AND TECHNOLOGIES
## 4                                             MILITARY TECHNOLOGIES
## 5                            COSMETOLOGY SERVICES AND CULINARY ARTS
## 6                                             CONSTRUCTION SERVICES
## 7 ELECTRICAL, MECHANICAL, AND PRECISION TECHNOLOGIES AND PRODUCTION
## # ... with 1 more variables: Grad_unemployment_rate <dbl>

Liberal Arts

barplot(grad_la$Grad_unemployment_rate, names.arg = grad_la$Major, horiz = TRUE, cex.names = 0.3, las = 1)

grad_la %>% dplyr::select(Major, Grad_unemployment_rate) %>% arrange(Grad_unemployment_rate)

## # A tibble: 16 x 2
##                                                            Major
##                                                            <chr>
##  1                               MULTI/INTERDISCIPLINARY STUDIES
##  2                                         UNITED STATES HISTORY
##  3                              THEOLOGY AND RELIGIOUS VOCATIONS
##  4 FRENCH GERMAN LATIN AND OTHER COMMON FOREIGN LANGUAGE STUDIES
##  5                                      COMPOSITION AND RHETORIC
##  6                              PHILOSOPHY AND RELIGIOUS STUDIES
##  7                                                       HISTORY
##  8                               ENGLISH LANGUAGE AND LITERATURE
##  9                                                  LIBERAL ARTS
## 10                                       OTHER FOREIGN LANGUAGES
## 11                          AREA ETHNIC AND CIVILIZATION STUDIES
## 12                                   ANTHROPOLOGY AND ARCHEOLOGY
## 13           LINGUISTICS AND COMPARATIVE LANGUAGE AND LITERATURE
## 14                                     ART HISTORY AND CRITICISM
## 15                                                    HUMANITIES
## 16                       INTERCULTURAL AND INTERNATIONAL STUDIES
## # ... with 1 more variables: Grad_unemployment_rate <dbl>

Law

barplot(grad_law$Grad_unemployment_rate, names.arg = grad_law$Major, horiz = TRUE, cex.names = 0.3, las = 1)

grad_law %>% dplyr::select(Major, Grad_unemployment_rate) %>% arrange(Grad_unemployment_rate)

## # A tibble: 5 x 2
##                                  Major Grad_unemployment_rate
##                                  <chr>                  <dbl>
## 1                      COURT REPORTING             0.00000000
## 2                        PUBLIC POLICY             0.03122649
## 3            PRE-LAW AND LEGAL STUDIES             0.03891879
## 4 CRIMINAL JUSTICE AND FIRE PROTECTION             0.04100487
## 5                PUBLIC ADMINISTRATION             0.05884949

Psychology & Social Work

barplot(grad_psy$Grad_unemployment_rate, names.arg = grad_psy$Major, horiz = TRUE, cex.names = 0.3, las = 1)

grad_psy %>% dplyr::select(Major, Grad_unemployment_rate) %>% arrange(Grad_unemployment_rate)

## # A tibble: 9 x 2
##                                       Major Grad_unemployment_rate
##                                       <chr>                  <dbl>
## 1                    EDUCATIONAL PSYCHOLOGY             0.02560025
## 2                  MISCELLANEOUS PSYCHOLOGY             0.03055566
## 3                     COUNSELING PSYCHOLOGY             0.03559968
## 4                                PSYCHOLOGY             0.03755603
## 5                               SOCIAL WORK             0.03757089
## 6                       CLINICAL PSYCHOLOGY             0.04495803
## 7  INDUSTRIAL AND ORGANIZATIONAL PSYCHOLOGY             0.04530453
## 8 HUMAN SERVICES AND COMMUNITY ORGANIZATION             0.06006150
## 9                         SOCIAL PSYCHOLOGY             0.06108066

Sciences

barplot(grad_sci$Grad_unemployment_rate, names.arg = grad_sci$Major, horiz = TRUE, cex.names = 0.3, las = 1)

grad_sci %>% dplyr::select(Major, Grad_unemployment_rate) %>% arrange(Grad_unemployment_rate)

## # A tibble: 10 x 2
##                                                         Major
##                                                         <chr>
##  1 NUCLEAR, INDUSTRIAL RADIOLOGY, AND BIOLOGICAL TECHNOLOGIES
##  2                                                GEOSCIENCES
##  3                                 ASTRONOMY AND ASTROPHYSICS
##  4                                               OCEANOGRAPHY
##  5                       ATMOSPHERIC SCIENCES AND METEOROLOGY
##  6                                  GEOLOGY AND EARTH SCIENCE
##  7                      MULTI-DISCIPLINARY OR GENERAL SCIENCE
##  8                                                  CHEMISTRY
##  9                                                    PHYSICS
## 10                                          PHYSICAL SCIENCES
## # ... with 1 more variables: Grad_unemployment_rate <dbl>

Social Sciences

barplot(grad_ssc$Grad_unemployment_rate, names.arg = grad_ssc$Major, horiz = TRUE, cex.names = 0.3, las = 1)

grad_ssc %>% dplyr::select(Major, Grad_unemployment_rate) %>% arrange(Grad_unemployment_rate)

## # A tibble: 9 x 2
##                               Major Grad_unemployment_rate
##                               <chr>                  <dbl>
## 1                         GEOGRAPHY             0.03113024
## 2  POLITICAL SCIENCE AND GOVERNMENT             0.03880891
## 3                         SOCIOLOGY             0.04187831
## 4                         ECONOMICS             0.04376842
## 5           INTERNATIONAL RELATIONS             0.04528078
## 6                       CRIMINOLOGY             0.04529982
## 7     MISCELLANEOUS SOCIAL SCIENCES             0.04805093
## 8           GENERAL SOCIAL SCIENCES             0.05847484
## 9 INTERDISCIPLINARY SOCIAL SCIENCES             0.05967189

Cooper DATA 606 Final Project

Nathan Cooper

November 30, 2017
