What’s It Worth: The Economic Value of College Education

Set Up the Clean Environment

All done behind the scenes. :)

Data Preparation

Load Libraries

# Load Libraries
library("DT")
library("knitr")
library("dplyr")
library("tidyr")
library("stringr")
library("psych")

Load All Student Data

# load data

# All Students
url1 = "https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/all-ages.csv"
all.students <- read.csv(url1, sep = ",", header = TRUE, stringsAsFactors = FALSE)
all.students <- all.students %>% 
    as_tibble() %>%   # as_tibble() replaces the deprecated tbl_df()
    arrange(Major_category)

head(all.students, 2)
## # A tibble: 2 x 11
##   Major_code Major Majo~  Total Empl~ Empl~ Unem~ Unemp~ Medi~ P25th P75th
##        <int> <chr> <chr>  <int> <int> <int> <int>  <dbl> <int> <int> <dbl>
## 1       1100 GENE~ Agri~ 128148 90245 74078  2423 0.0261 50000 34000 80000
## 2       1101 AGRI~ Agri~  95326 76865 64240  2266 0.0286 54000 36000 80000

Load Graduate Student Data

# Grad Students
url2 = "https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/grad-students.csv"
grad.students <- read.csv(url2, sep = ",", header = TRUE, stringsAsFactors = FALSE)
grad.students <- grad.students %>% 
    as_tibble() %>%   # as_tibble() replaces the deprecated tbl_df()
    arrange(Major_category)

head(grad.students, 2)
## # A tibble: 2 x 22
##   Major~ Major   Major_~ Grad_~ Grad~ Grad~ Grad~ Grad~ Grad_~ Grad~ Grad~
##    <int> <chr>   <chr>    <int> <int> <int> <int> <int>  <dbl> <dbl> <int>
## 1   1101 AGRICU~ Agricu~  17488   386 13104 11207   473 0.0348 67000 41600
## 2   1100 GENERA~ Agricu~  44306   764 28930 23024   874 0.0293 68000 45000
## # ... with 11 more variables: Grad_P75 <dbl>, Nongrad_total <int>,
## #   Nongrad_employed <int>, Nongrad_full_time_year_round <int>,
## #   Nongrad_unemployed <int>, Nongrad_unemployment_rate <dbl>,
## #   Nongrad_median <dbl>, Nongrad_P25 <int>, Nongrad_P75 <dbl>, Grad_share
## #   <dbl>, Grad_premium <dbl>

Load Undergraduate Student Data

# UnderGrad Students
url3 = "https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/recent-grads.csv"
ungrad.students <- read.csv(url3, sep = ",", header = TRUE, stringsAsFactors = FALSE)
ungrad.students <- ungrad.students %>% 
    as_tibble() %>%   # as_tibble() replaces the deprecated tbl_df()
    arrange(Major_category)

head(ungrad.students, 2)
## # A tibble: 2 x 21
##    Rank Majo~ Major Total   Men Women Majo~ Share~ Samp~ Empl~ Full~ Part~
##   <int> <int> <chr> <int> <int> <int> <chr>  <dbl> <int> <int> <int> <int>
## 1    22  1104 FOOD~    NA    NA    NA Agri~ NA        36  3149  2558  1121
## 2    64  1101 AGRI~ 14240  9658  4582 Agri~  0.322   273 12323 11119  2196
## # ... with 9 more variables: Full_time_year_round <int>, Unemployed <int>,
## #   Unemployment_rate <dbl>, Median <int>, P25th <int>, P75th <int>,
## #   College_jobs <int>, Non_college_jobs <int>, Low_wage_jobs <int>
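The first row printed above has NA in Total, Men, Women, and ShareWomen, so it may be worth checking for missing values before computing any statistics on the recent-graduate data. The sketch below rebuilds only the columns whose values are fully visible in that head() output; the names recent_head, na_per_column, and incomplete_rows are introduced here for illustration. On the full data, the same two calls could be applied to ungrad.students.

```r
# First two rows of recent-grads.csv as printed by head(ungrad.students, 2);
# only the columns whose values are fully visible are reproduced.
recent_head <- data.frame(
  Rank       = c(22L, 64L),
  Major_code = c(1104L, 1101L),
  Total      = c(NA, 14240L),
  Men        = c(NA, 9658L),
  Women      = c(NA, 4582L),
  ShareWomen = c(NA, 0.322)
)

# Count missing values per column, then keep the rows that contain any NA.
na_per_column   <- colSums(is.na(recent_head))
incomplete_rows <- recent_head[!complete.cases(recent_head), ]

print(na_per_column)
print(incomplete_rows$Major_code)
```

Whether to drop the incomplete row or keep it and pass na.rm = TRUE to later calculations is a judgment call that depends on which columns the analysis uses.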

Research question

You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.

College Majors: The Economic Guide to Picking a College Major

I come from India. There is a joke in my country which goes like this: “Indian parents give their kids FULL freedom to select a career of their choice, as long as it is an engineer, doctor or a lawyer!”

I have always been curious about how the choice of a college major influences a person’s success. I want to examine which fields are most strongly associated with financial success by analyzing employability and median incomes and then performing hypothesis tests.

Cases

What are the cases, and how many are there?

All_ages: Each case is one of the 173 majors offered by colleges in the USA, covering both undergraduate and graduate degree holders.

Grad Students: This data is a subset of the above. Each case is one of the 173 majors, restricted to graduate degree holders aged 25 and over.

Under Grad Students: This data is a subset of the above. Each case is one of the 173 majors, restricted to undergraduate degree holders under 28 years of age (recent graduates).

Each of the three files has 173 cases, one per major.
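The case description above can be checked directly from the Major_code column. The sketch below uses only the two codes visible in each head() output shown earlier; describe_cases is a hypothetical helper introduced for illustration, not part of the original analysis.

```r
# Major_code values copied from the head() output shown earlier.
all_codes  <- c(1100L, 1101L)  # from head(all.students, 2)
grad_codes <- c(1101L, 1100L)  # from head(grad.students, 2)

# Hypothetical helper: number of cases and whether each major appears once.
describe_cases <- function(codes) {
  list(n_cases = length(codes),
       one_row_per_major = !anyDuplicated(codes))
}

all_cases   <- describe_cases(all_codes)
# TRUE when both files cover exactly the same set of majors.
same_majors <- setequal(all_codes, grad_codes)

print(all_cases)
print(same_majors)
```

On the full data, describe_cases(all.students$Major_code) and setequal(all.students$Major_code, grad.students$Major_code) would apply the same checks to all majors.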

Data collection

Describe the method of data collection.

The data were obtained from the FiveThirtyEight (538) data repository: https://github.com/fivethirtyeight/data/tree/master/college-majors

The repository holds the data and code behind the story The Economic Guide To Picking A College Major. All data is from the American Community Survey 2010-2012 Public Use Microdata Series. Download the data here: http://www.census.gov/programs-surveys/acs/data/pums.html Documentation is here: http://www.census.gov/programs-surveys/acs/technical-documentation/pums.html

majors-list.csv: List of majors with their FOD1P codes and major categories. Major categories are from Carnevale et al, “What’s It Worth?: The Economic Value of College Majors.” Georgetown University Center on Education and the Workforce, 2011. http://cew.georgetown.edu/whatsitworth

Three main data files: all-ages.csv, recent-grads.csv (ages <28), and grad-students.csv (ages 25+).

All contain basic earnings and labor force information. recent-grads.csv contains a more detailed breakdown, including by sex and by the type of job they got. grad-students.csv contains details on graduate school attendees. Additionally, women-stem.csv contains the data for the scatter plot in the associated DataLab post on women in science/technology jobs. It is a subset of recent-grads.csv.

Type of study

What type of study is this (observational/experiment)?

This is an observational study.

Data Source

If you collected the data, state self-collected. If not, provide a citation/link.

The data were obtained from the FiveThirtyEight (538) data repository: https://github.com/fivethirtyeight/data/tree/master/college-majors

Response

What is the response variable, and what type is it (numerical/categorical)?

The response variable is college major, and it is categorical.

Explanatory

What is the explanatory variable, and what type is it (numerical/categorical)?

The explanatory variables are median income and the numbers of employed and unemployed degree holders. All of them are numerical.

Relevant summary statistics

Provide summary statistics relevant to your research question. For example, if you are comparing means across groups provide means, SDs, sample sizes of each group. This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.

Various Summaries

summary(all.students)
##    Major_code      Major           Major_category         Total        
##  Min.   :1100   Length:173         Length:173         Min.   :   2396  
##  1st Qu.:2403   Class :character   Class :character   1st Qu.:  24280  
##  Median :3608   Mode  :character   Mode  :character   Median :  75791  
##  Mean   :3880                                         Mean   : 230257  
##  3rd Qu.:5503                                         3rd Qu.: 205763  
##  Max.   :6403                                         Max.   :3123510  
##     Employed       Employed_full_time_year_round   Unemployed    
##  Min.   :   1492   Min.   :   1093               Min.   :     0  
##  1st Qu.:  17281   1st Qu.:  12722               1st Qu.:  1101  
##  Median :  56564   Median :  39613               Median :  3619  
##  Mean   : 166162   Mean   : 126308               Mean   :  9725  
##  3rd Qu.: 142879   3rd Qu.: 111025               3rd Qu.:  8862  
##  Max.   :2354398   Max.   :1939384               Max.   :147261  
##  Unemployment_rate     Median           P25th           P75th       
##  Min.   :0.00000   Min.   : 35000   Min.   :24900   Min.   : 45800  
##  1st Qu.:0.04626   1st Qu.: 46000   1st Qu.:32000   1st Qu.: 70000  
##  Median :0.05472   Median : 53000   Median :36000   Median : 80000  
##  Mean   :0.05736   Mean   : 56816   Mean   :38697   Mean   : 82506  
##  3rd Qu.:0.06904   3rd Qu.: 65000   3rd Qu.:42000   3rd Qu.: 95000  
##  Max.   :0.15615   Max.   :125000   Max.   :78000   Max.   :210000
summary(grad.students)
##    Major_code      Major           Major_category       Grad_total     
##  Min.   :1100   Length:173         Length:173         Min.   :   1542  
##  1st Qu.:2403   Class :character   Class :character   1st Qu.:  15284  
##  Median :3608   Mode  :character   Mode  :character   Median :  37872  
##  Mean   :3880                                         Mean   : 127672  
##  3rd Qu.:5503                                         3rd Qu.: 148255  
##  Max.   :6403                                         Max.   :1184158  
##  Grad_sample_size Grad_employed    Grad_full_time_year_round
##  Min.   :   22    Min.   :  1008   Min.   :   770           
##  1st Qu.:  314    1st Qu.: 12659   1st Qu.:  9894           
##  Median :  688    Median : 28930   Median : 22523           
##  Mean   : 2251    Mean   : 94037   Mean   : 72861           
##  3rd Qu.: 2528    3rd Qu.:109944   3rd Qu.: 80794           
##  Max.   :21994    Max.   :915341   Max.   :703347           
##  Grad_unemployed Grad_unemployment_rate  Grad_median        Grad_P25    
##  Min.   :    0   Min.   :0.00000        Min.   : 47000   Min.   :24500  
##  1st Qu.:  453   1st Qu.:0.02607        1st Qu.: 65000   1st Qu.:45000  
##  Median : 1179   Median :0.03665        Median : 75000   Median :50000  
##  Mean   : 3506   Mean   :0.03934        Mean   : 76756   Mean   :52597  
##  3rd Qu.: 3329   3rd Qu.:0.04805        3rd Qu.: 90000   3rd Qu.:60000  
##  Max.   :35718   Max.   :0.13851        Max.   :135000   Max.   :85000  
##     Grad_P75      Nongrad_total     Nongrad_employed 
##  Min.   : 65000   Min.   :   2232   Min.   :   1328  
##  1st Qu.: 93000   1st Qu.:  20564   1st Qu.:  15914  
##  Median :108000   Median :  68993   Median :  50092  
##  Mean   :112087   Mean   : 214720   Mean   : 154554  
##  3rd Qu.:130000   3rd Qu.: 184971   3rd Qu.: 129179  
##  Max.   :294000   Max.   :2996892   Max.   :2253649  
##  Nongrad_full_time_year_round Nongrad_unemployed Nongrad_unemployment_rate
##  Min.   :    980              Min.   :     0     Min.   :0.00000          
##  1st Qu.:  11755              1st Qu.:   880     1st Qu.:0.04198          
##  Median :  38384              Median :  3157     Median :0.05103          
##  Mean   : 120737              Mean   :  8486     Mean   :0.05395          
##  3rd Qu.: 103629              3rd Qu.:  7409     3rd Qu.:0.06439          
##  Max.   :1882507              Max.   :136978     Max.   :0.16091          
##  Nongrad_median    Nongrad_P25     Nongrad_P75       Grad_share     
##  Min.   : 37000   Min.   :25000   Min.   : 48000   Min.   :0.09632  
##  1st Qu.: 48700   1st Qu.:34000   1st Qu.: 72000   1st Qu.:0.26757  
##  Median : 55000   Median :38000   Median : 80000   Median :0.39875  
##  Mean   : 58584   Mean   :40078   Mean   : 84333   Mean   :0.40059  
##  3rd Qu.: 65000   3rd Qu.:44000   3rd Qu.: 97000   3rd Qu.:0.49912  
##  Max.   :126000   Max.   :80000   Max.   :215000   Max.   :0.93117  
##   Grad_premium    
##  Min.   :-0.0250  
##  1st Qu.: 0.2308  
##  Median : 0.3208  
##  Mean   : 0.3285  
##  3rd Qu.: 0.4000  
##  Max.   : 1.6471
summary(ungrad.students)
##       Rank       Major_code      Major               Total       
##  Min.   :  1   Min.   :1100   Length:173         Min.   :   124  
##  1st Qu.: 44   1st Qu.:2403   Class :character   1st Qu.:  4550  
##  Median : 87   Median :3608   Mode  :character   Median : 15104  
##  Mean   : 87   Mean   :3880                      Mean   : 39370  
##  3rd Qu.:130   3rd Qu.:5503                      3rd Qu.: 38910  
##  Max.   :173   Max.   :6403                      Max.   :393735  
##                                                  NA's   :1       
##       Men             Women        Major_category       ShareWomen    
##  Min.   :   119   Min.   :     0   Length:173         Min.   :0.0000  
##  1st Qu.:  2178   1st Qu.:  1778   Class :character   1st Qu.:0.3360  
##  Median :  5434   Median :  8386   Mode  :character   Median :0.5340  
##  Mean   : 16723   Mean   : 22647                      Mean   :0.5222  
##  3rd Qu.: 14631   3rd Qu.: 22554                      3rd Qu.:0.7033  
##  Max.   :173809   Max.   :307087                      Max.   :0.9690  
##  NA's   :1        NA's   :1                           NA's   :1       
##   Sample_size        Employed        Full_time        Part_time     
##  Min.   :   2.0   Min.   :     0   Min.   :   111   Min.   :     0  
##  1st Qu.:  39.0   1st Qu.:  3608   1st Qu.:  3154   1st Qu.:  1030  
##  Median : 130.0   Median : 11797   Median : 10048   Median :  3299  
##  Mean   : 356.1   Mean   : 31193   Mean   : 26029   Mean   :  8832  
##  3rd Qu.: 338.0   3rd Qu.: 31433   3rd Qu.: 25147   3rd Qu.:  9948  
##  Max.   :4212.0   Max.   :307933   Max.   :251540   Max.   :115172  
##                                                                     
##  Full_time_year_round   Unemployed    Unemployment_rate     Median      
##  Min.   :   111       Min.   :    0   Min.   :0.00000   Min.   : 22000  
##  1st Qu.:  2453       1st Qu.:  304   1st Qu.:0.05031   1st Qu.: 33000  
##  Median :  7413       Median :  893   Median :0.06796   Median : 36000  
##  Mean   : 19694       Mean   : 2416   Mean   :0.06819   Mean   : 40151  
##  3rd Qu.: 16891       3rd Qu.: 2393   3rd Qu.:0.08756   3rd Qu.: 45000  
##  Max.   :199897       Max.   :28169   Max.   :0.17723   Max.   :110000  
##                                                                         
##      P25th           P75th         College_jobs    Non_college_jobs
##  Min.   :18500   Min.   : 22000   Min.   :     0   Min.   :     0  
##  1st Qu.:24000   1st Qu.: 42000   1st Qu.:  1675   1st Qu.:  1591  
##  Median :27000   Median : 47000   Median :  4390   Median :  4595  
##  Mean   :29501   Mean   : 51494   Mean   : 12323   Mean   : 13284  
##  3rd Qu.:33000   3rd Qu.: 60000   3rd Qu.: 14444   3rd Qu.: 11783  
##  Max.   :95000   Max.   :125000   Max.   :151643   Max.   :148395  
##                                                                    
##  Low_wage_jobs  
##  Min.   :    0  
##  1st Qu.:  340  
##  Median : 1231  
##  Mean   : 3859  
##  3rd Qu.: 3466  
##  Max.   :48207  
## 
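The summaries above describe each file as a whole. For comparisons across groups, the per-group mean, SD, and sample size are also needed. The following is a minimal sketch in base R, assuming the grouping of interest is Major_category and the value is Median. The toy data frame and the helper summarise_by_category are hypothetical; the category labels "A" and "B" and the incomes are illustrative placeholders, not values from the data files.

```r
# Toy stand-in for all.students. The labels and incomes are illustrative
# placeholders, not values from the FiveThirtyEight files.
toy <- data.frame(
  Major_category = c("A", "A", "B", "B"),
  Median         = c(50000, 54000, 40000, 44000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of majors, mean, and SD of the median
# income within each major category.
summarise_by_category <- function(df) {
  groups <- split(df$Median, df$Major_category)
  data.frame(
    Major_category = names(groups),
    n              = unname(sapply(groups, length)),
    mean_median    = unname(sapply(groups, mean)),
    sd_median      = unname(sapply(groups, sd)),
    stringsAsFactors = FALSE
  )
}

category_summary <- summarise_by_category(toy)
print(category_summary)
```

Calling summarise_by_category(all.students) would produce the same table for the real categories. Note that n here counts majors within a category, not surveyed individuals.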

All student median income

hist(all.students$Median, main = "All Student Median Income", xlab = "Median Incomes (USD)", col = "blue")

Combined unemployment analysis

# One column per data set; each row is a major
combine.unemployment <- cbind(all.students$Unemployment_rate, ungrad.students$Unemployment_rate, grad.students$Grad_unemployment_rate)

# Plot the mean unemployment rate across majors for each data set
barplot(colMeans(combine.unemployment), names.arg = c("All", "Recent Grad", "Grad Student"), ylab = "Mean Unemployment Rate", col = heat.colors(ncol(combine.unemployment)))

The summaries and graphs above already give a flavor of the data: graduate degree holders have a clearly higher median income (a median of 75,000 USD across majors) than recent undergraduate degree holders (36,000 USD).
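One way this income comparison could be tested formally is a one-sided paired t-test on Grad_median versus Nongrad_median, pairing the two values within each major. This is only a sketch of one possible test, not the analysis the project settles on. The toy values below are illustrative placeholders, not rows from grad-students.csv, and the names toy_grad, test_result, and mean_difference are introduced here.

```r
# Toy stand-in for grad.students: one row per major. The values are
# illustrative placeholders, not taken from grad-students.csv.
toy_grad <- data.frame(
  Grad_median    = c(67000, 68000, 75000, 90000, 65000),
  Nongrad_median = c(50000, 54000, 55000, 65000, 48000)
)

# One-sided paired t-test.
# H0: the mean within-major difference (grad minus non-grad) is zero.
# HA: the mean within-major difference is greater than zero.
test_result <- t.test(toy_grad$Grad_median, toy_grad$Nongrad_median,
                      paired = TRUE, alternative = "greater")

mean_difference <- unname(test_result$estimate)
print(mean_difference)
print(test_result$p.value)
```

With the real data, the two columns of grad.students would replace the toy vectors. Each row is an aggregate for a major rather than an individual, so a result of this kind speaks to differences across majors, not to what any single graduate can expect.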