Introduction

With the rapid development of science and technology, STEM majors are becoming more popular than other majors. Given the growing demand for science-related jobs, I would like to study the relationship between median income and major category, comparing STEM, business, and liberal arts majors.

I would also like to study, among recent graduates, the relationship between median income (Median) and the share of jobs that require a college degree (Jobindex), by major and by major category (STEM, business, and liberal arts).

Data

Data Collection

Every year, the U.S. Census Bureau contacts over 3.5 million households across the country to participate in the American Community Survey (ACS). ACS is an ongoing survey that provides vital information on a yearly basis about our nation and its people. Information from the survey generates data that help determine how more than $675 billion in federal and state funds are distributed each year.

Through the ACS, we know more about jobs and occupations, educational attainment, veterans, whether people own or rent their homes, and other topics. Public officials, planners, and entrepreneurs use this information to assess the past and plan the future. (https://www.census.gov/programs-surveys/acs/about.html)

All three main .csv datasets are from the American Community Survey 2010-2012 Public Use Microdata Series. They contain basic earnings and labor force information.

Download data here: http://www.census.gov/programs-surveys/acs/data/pums.html

Documentation here: http://www.census.gov/programs-surveys/acs/technical-documentation/pums.html

The datasets I used were obtained from github: https://github.com/fivethirtyeight/data/tree/master/college-majors.

##  [1] "Agriculture & Natural Resources"    
##  [2] "Biology & Life Science"             
##  [3] "Engineering"                        
##  [4] "Humanities & Liberal Arts"          
##  [5] "Communications & Journalism"        
##  [6] "Computers & Mathematics"            
##  [7] "Industrial Arts & Consumer Services"
##  [8] "Education"                          
##  [9] "Law & Public Policy"                
## [10] "Interdisciplinary"                  
## [11] "Health"                             
## [12] "Social Science"                     
## [13] "Physical Sciences"                  
## [14] "Psychology & Social Work"           
## [15] "Arts"                               
## [16] "Business"

Cases

What are the cases, and how many are there?

All_ages: There are 173 cases in total, with no age limits in each case.

Grad_students: There are 173 cases in total, with age >25.

Recent_grads: There are 173 cases in total, with age <28. One row has some NA values, but they fall in columns I do not use, so I do not need to drop that row.

Each case represents one unique major (identified by its major_code) offered by colleges in the United States.

Variables

I will be studying the median income (Median), the type of major (Type), and the share of jobs that require a college degree (Jobindex).

Median and Jobindex are numerical and Type is categorical.

My explanatory (independent) variables are Major_category (qualitative) and Jobindex (quantitative).

My response (dependent) variable is median income (Median). It is quantitative.
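The report does not show how Jobindex is computed. As a minimal sketch, assuming it is the share of college jobs among all reported jobs (the College_jobs and Non_college_jobs columns of the recent_grads table), it could be derived as below. The analysis itself was run in R; this Python snippet and the helper name `job_index` are only illustrative.

```python
def job_index(college_jobs, non_college_jobs):
    """Share of reported jobs that require a college degree.

    Assumption: Jobindex = College_jobs / (College_jobs + Non_college_jobs).
    Returns None when no jobs are reported, since the ratio is undefined.
    """
    total = college_jobs + non_college_jobs
    if total == 0:
        return None
    return college_jobs / total


# Hypothetical counts for illustration, not a row from the dataset.
print(job_index(1500, 500))
```

Under this assumption a value of 1.0 means every reported job required a degree, which is consistent with the maximum shown in the summary statistics.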

Type of study

The type of study is observational. There is no experiment conducted and the data are collected from surveys.

Scope of Inference - generalizability

The data are collected by the American Community Survey (ACS) from households all over the country.
Although there are only 173 cases, this analysis can be generalized to the population.
However, the mix of students may differ, for example across age groups or across states and cities, which may lead to small differences when generalizing the results to the population.

Scope of Inference - causality

This is an observational study. It is not suitable to establish causal links between the variables of interest.

Exploratory Data Analysis

Listed below are summary statistics and visualizations of the data.

##  Major_category         Type              Employed         Unemployed    
##  Length:173         Length:173         Min.   :   1492   Min.   :     0  
##  Class :character   Class :character   1st Qu.:  17281   1st Qu.:  1101  
##  Mode  :character   Mode  :character   Median :  56564   Median :  3619  
##                                        Mean   : 166162   Mean   :  9725  
##                                        3rd Qu.: 142879   3rd Qu.:  8862  
##                                        Max.   :2354398   Max.   :147261  
##  Unemployment_rate     Median           P25th           P75th       
##  Min.   :0.00000   Min.   : 35000   Min.   :24900   Min.   : 45800  
##  1st Qu.:0.04626   1st Qu.: 46000   1st Qu.:32000   1st Qu.: 70000  
##  Median :0.05472   Median : 53000   Median :36000   Median : 80000  
##  Mean   :0.05736   Mean   : 56816   Mean   :38697   Mean   : 82506  
##  3rd Qu.:0.06904   3rd Qu.: 65000   3rd Qu.:42000   3rd Qu.: 95000  
##  Max.   :0.15615   Max.   :125000   Max.   :78000   Max.   :210000  
##   Mean_median   
##  Min.   :43000  
##  1st Qu.:46080  
##  Median :53222  
##  Mean   :56816  
##  3rd Qu.:62400  
##  Max.   :77759

##  Major_category         Type           Grad_employed    Grad_unemployed
##  Length:173         Length:173         Min.   :  1008   Min.   :    0  
##  Class :character   Class :character   1st Qu.: 12659   1st Qu.:  453  
##  Mode  :character   Mode  :character   Median : 28930   Median : 1179  
##                                        Mean   : 94037   Mean   : 3506  
##                                        3rd Qu.:109944   3rd Qu.: 3329  
##                                        Max.   :915341   Max.   :35718  
##  Grad_unemployment_rate  Grad_median        Grad_P25        Grad_P75     
##  Min.   :0.00000        Min.   : 47000   Min.   :24500   Min.   : 65000  
##  1st Qu.:0.02607        1st Qu.: 65000   1st Qu.:45000   1st Qu.: 93000  
##  Median :0.03665        Median : 75000   Median :50000   Median :108000  
##  Mean   :0.03934        Mean   : 76756   Mean   :52597   Mean   :112087  
##  3rd Qu.:0.04805        3rd Qu.: 90000   3rd Qu.:60000   3rd Qu.:130000  
##  Max.   :0.13851        Max.   :135000   Max.   :85000   Max.   :294000  
##  Mean_grad_median
##  Min.   :55000   
##  1st Qu.:66571   
##  Median :80292   
##  Mean   :76756   
##  3rd Qu.:85545   
##  Max.   :94328

##       Rank     Major_category         Type              Employed     
##  Min.   :  1   Length:173         Length:173         Min.   :     0  
##  1st Qu.: 44   Class :character   Class :character   1st Qu.:  3608  
##  Median : 87   Mode  :character   Mode  :character   Median : 11797  
##  Mean   : 87                                         Mean   : 31193  
##  3rd Qu.:130                                         3rd Qu.: 31433  
##  Max.   :173                                         Max.   :307933  
##    Unemployed    Unemployment_rate     Median           P25th      
##  Min.   :    0   Min.   :0.00000   Min.   : 22000   Min.   :18500  
##  1st Qu.:  304   1st Qu.:0.05031   1st Qu.: 33000   1st Qu.:24000  
##  Median :  893   Median :0.06796   Median : 36000   Median :27000  
##  Mean   : 2416   Mean   :0.06819   Mean   : 40151   Mean   :29501  
##  3rd Qu.: 2393   3rd Qu.:0.08756   3rd Qu.: 45000   3rd Qu.:33000  
##  Max.   :28169   Max.   :0.17723   Max.   :110000   Max.   :95000  
##      P75th         College_jobs    Non_college_jobs    Jobindex     
##  Min.   : 22000   Min.   :     0   Min.   :     0   Min.   :0.0708  
##  1st Qu.: 42000   1st Qu.:  1675   1st Qu.:  1591   1st Qu.:0.3635  
##  Median : 47000   Median :  4390   Median :  4595   Median :0.4674  
##  Mean   : 51494   Mean   : 12323   Mean   : 13284   Mean   :0.5106  
##  3rd Qu.: 60000   3rd Qu.: 14444   3rd Qu.: 11783   3rd Qu.:0.6963  
##  Max.   :125000   Max.   :151643   Max.   :148395   Max.   :1.0000  
##  Mean_jobindex     Mean_median   
##  Min.   :0.2973   Min.   :30100  
##  1st Qu.:0.3855   1st Qu.:33062  
##  Median :0.5161   Median :36900  
##  Mean   :0.5106   Mean   :40151  
##  3rd Qu.:0.6799   3rd Qu.:42745  
##  Max.   :0.7137   Max.   :57383

Inference

This study will investigate whether salary (median income) is independent of major category (STEM, business, and liberal arts) and of the share of jobs that require a college degree (Mean_jobindex) among the different major categories.

Check Conditions

As all 173 observations are collected by surveys from all over the country, I assume that all of them are randomly collected and are independent of each other.

The sample size 173 is larger than 30, therefore it is considered sufficiently large for the following tests.

Hypothesis Test

I will use a hypothesis test to investigate the relationship between median income and major categories.

  • H0: The mean of the median incomes is the same for STEM, Business and Liberal Arts majors.

  • H1: The mean of the median incomes differs for at least one of STEM, Business and Liberal Arts majors.
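The report does not state which test the inference function ran. One common choice for comparing a mean across three groups is a one-way ANOVA, and the sketch below computes its F statistic under that assumption. The function name and the income values are hypothetical, and the p-value would still need an F distribution from a statistics library.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical median incomes (in dollars) per major type, for illustration.
stem = [52000, 60000, 65000, 70000]
business = [45000, 50000, 58000]
liberal_arts = [32000, 35000, 40000, 41000]

f_stat, df_b, df_w = one_way_anova_f([stem, business, liberal_arts])
print(f"F = {f_stat:.2f} on {df_b} and {df_w} degrees of freedom")
```

A large F means the group means differ by more than the within-group spread would suggest, which is what a p-value near zero reflects.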

As shown in the Exploratory section, the top three median incomes among graduate students belong to STEM majors and the fourth to Business, while the top median income among recent graduates belongs to Engineering and is about $14k higher than the second, Business. Across the three bar charts, the lowest incomes mostly belong to Liberal Arts majors. It is therefore quite possible that there is a relationship between median income and major category. STEM majors tend to have higher salaries as recent graduates, and even higher salaries as graduate students with more experience. Business majors tend to have high salaries as recent graduates, but their salary growth does not keep pace with their years of experience. Liberal Arts majors tend to have lower salaries than the other two.

  • For People from All Ages

Results from the inference function:

(Ht_all_ages.png)

  • For Graduate Students

Results from the inference function:

(Ht_grad_students.png)

  • For Recent Graduates

Results from the inference function:

(Ht_recent_grads.png)

Wrap-Up

According to the boxplots and numbers above, STEM majors have the highest mean of median incomes and Liberal Arts majors have the lowest in all three datasets. The inference function also returns p-values approximately equal to zero, so we reject the null hypothesis. There is a statistically significant relationship between median income and major category.

Chi-Squared Tests for Independence for Graduate Status

Next, I will study independence for graduate status: whether the split between jobs that require a college degree and jobs that do not is independent of major. A chi-squared test of independence is used to decide whether the null hypothesis that two categorical variables are independent should be rejected. The p-value is the probability of observing a test statistic at least as extreme as the one computed from the data if the null hypothesis were true, so a small p-value is evidence against independence.

  • Job Requirement of Being College Graduates for Recent Graduates
##      Type            College_jobs    Non_college_jobs
##  Length:172         Min.   :   162   Min.   :    50  
##  Class :character   1st Qu.:  1745   1st Qu.:  1594  
##  Mode  :character   Median :  4468   Median :  4604  
##                     Mean   : 12394   Mean   : 13362  
##                     3rd Qu.: 14596   3rd Qu.: 11792  
##                     Max.   :151643   Max.   :148395
## 
##  Pearson's Chi-squared test
## 
## data:  recent_grads_test1[, -1]
## X-squared = 729289, df = 171, p-value < 2.2e-16

Since the p-value is less than 0.05, we reject the null hypothesis that the choice of major is independent of the rate of job requirement of being a college graduate. The data support the alternative hypothesis that the two are associated, which means that some majors may have lower job requirements (a college degree is not a must) and some may have higher job requirements (a college degree is a must).
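To make the test statistic concrete, here is a minimal sketch of Pearson's chi-squared statistic for a table of counts with one row per major and two columns (college jobs, non-college jobs). The function name and the counts are hypothetical, and no continuity correction is applied.

```python
def chi_squared_independence(table):
    """Pearson chi-squared statistic and degrees of freedom.

    `table` is a list of rows of observed counts. Every row and column
    total must be positive so that all expected counts are non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df


# Hypothetical counts: [college jobs, non-college jobs] for three majors.
counts = [[1500, 500], [800, 1200], [300, 900]]
stat, df = chi_squared_independence(counts)
print(f"X-squared = {stat:.1f}, df = {df}")
```

With 172 majors and 2 columns the degrees of freedom are (172 - 1) × (2 - 1) = 171, which matches the output above. Because the cells hold counts of jobs in the thousands, even modest differences between majors produce a very large statistic.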

Linear Model - Job Index vs Median Income

(Job Index is, for each major, the rate of job requirement of being a college graduate.)

The job market places different requirements on candidates’ education levels, and some job fields demand more than others. Next, I will study the relationship between median income and Jobindex. A higher required education level may lead to higher salaries. In our data, the number of college jobs and the number of non-college jobs are available only in the recent_grads table. I will use these data to fit a linear model of the relationship between the rate of job requirement and the median income of each major. We will also check whether the residuals of the model are approximately normally distributed with constant variance.

  • Median vs Jobindex
## 
## Call:
## lm(formula = Median ~ Jobindex, data = recent_grads_test2)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -20037  -6555  -2255   5815  63309 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    29101       2256  12.902  < 2e-16 ***
## Jobindex       21763       4141   5.255 4.39e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10700 on 170 degrees of freedom
## Multiple R-squared:  0.1397, Adjusted R-squared:  0.1347 
## F-statistic: 27.62 on 1 and 170 DF,  p-value: 4.387e-07

  • Wrap-Up

Since the p-values are approximately equal to 0, the relationship between the job requirement of being a college graduate and median income is statistically significant. However, the relationship between the two variables is weak, with \(R^2\) equal to 0.1397. This means that only around 13.97% of the variability in median income can be explained by the rate of job requirement of being a college graduate. Therefore, we cannot conclude that “jobs with higher education requirements provide higher salaries”.
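As a rough illustration of what the fitted model says, the sketch below shows how a least-squares line and its \(R^2\) are obtained, then plugs the coefficients reported above (intercept 29101, slope 21763) into a prediction. The function names and the small data set are hypothetical; only the two coefficients come from the model output.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; also returns R^2."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # R^2 is the share of the variance in y explained by the line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return intercept, slope, 1 - ss_res / ss_tot


def predicted_median(jobindex, intercept=29101, slope=21763):
    """Predicted median income from the coefficients reported in the output."""
    return intercept + slope * jobindex


# Hypothetical (Jobindex, Median) pairs, for illustration only.
xs = [0.2, 0.4, 0.6, 0.8]
ys = [30000, 42000, 38000, 52000]
print(fit_line(xs, ys))

# Predictions from the reported model at the two ends of the Jobindex range.
print(predicted_median(0.0), predicted_median(1.0))
```

Under the reported coefficients, moving from a major where no jobs require a degree to one where all do raises the predicted median by 21763 dollars, yet the residual standard error of 10700 shows how much individual majors scatter around that line.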

Conclusion

We found that the choice of major in college is significantly associated with median income. This association is seen across all ages, as well as among recent graduates and graduate students. These findings show that most STEM majors have higher salaries than Business and Liberal Arts majors, which matches my initial thoughts. Besides the rapid development of science and technology, higher salaries may be one of the reasons why STEM majors are becoming more popular among students than other majors. However, although there is a statistically significant relationship between the job requirement of being a college graduate and median income, it has little practical significance.

Future Analysis

Trends in the number of students in each major category and their corresponding incomes can be tracked and measured by further surveys over time. By tracking our sample, we may study the effect of higher education on the rate of salary increase in each major category.

We may also look at the relationship between unemployment rates and major categories using our current data, and also between gender and median salary if we can get our hands on the raw data.