Introduction:

After reading a recent article and watching a debate on the topic, I chose to look at the relationship between pursuing higher education and an individual’s salary. Family members, close friends, and colleagues typically encourage us to pursue higher education for the benefit of obtaining better career options. However, there is also a push from many successful individuals who dropped out or built a hybrid type of career without pursuing higher education. My research question is as follows: What role does higher education play in an individual’s pay scale? I hope to use data analysis to arrive at an answer and to continue this vital conversation.

Data:

I used data from the US Census Bureau. This is an observational study that includes data on individuals residing in New York (NY) in 2010. Both “Total Personal Income” and “Total Personal Earnings” are numerical response variables. The explanatory variables are “Worker Class” (categorical), “Sex” (categorical), “Citizenship Status” (categorical), “Age” (numerical, discrete), “Educational Attainment” (categorical), and “School Attending” (categorical). I wanted to look at this data specifically to see how education relates to how much a person earns while residing in New York. The full data set contains 279 variables and 188,767 cases. As a next step, I subset the data to the variables of interest.
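As a rough illustration of the subsetting step, here is a minimal Python sketch that keeps only a handful of columns from a CSV file and renames them. The raw column codes in `COLUMNS`, the helper name `subset_rows`, and the two sample rows are assumptions made for this example; the actual column names should be checked against the data dictionary that ships with the data.

```python
import csv
import io

# Assumed mapping from raw column codes to readable names.
# These codes are illustrative; verify them against the data dictionary.
COLUMNS = {
    "AGEP": "Age",
    "CIT": "Citizenship_Status",
    "SCHL": "Educational_Attainment",
    "SEX": "SEX",
    "PERNP": "Total_Personal_Earnings",
    "PINCP": "Total_Personal_Income",
}

def subset_rows(csv_text):
    """Keep only the columns of interest, renamed; blank cells become None."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for raw in reader:
        row = {}
        for code, name in COLUMNS.items():
            value = raw.get(code, "")
            row[name] = int(value) if value not in ("", None) else None
        rows.append(row)
    return rows

# Tiny illustrative sample (hypothetical rows, including an extra column
# that the subset should drop and blank cells that stand for missing values).
sample = (
    "AGEP,CIT,SCHL,SEX,PERNP,PINCP,OTHER\n"
    "46,4,16,2,3800,3800,x\n"
    "5,1,,1,,,y\n"
)
rows = subset_rows(sample)
print(rows[0])
print(rows[1])
```

Blank cells are kept as `None` rather than dropped, so missing values can still be counted before any filtering is applied.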

##   Age Citizenship_Status Educational_Attainment SEX
## 1  79                  4                     15   1
## 2  75                  3                      1   2
## 3  68                  4                     22   2
## 4  68                  1                     22   2
## 5  69                  1                     23   1
## 6  46                  4                     16   2
##   Total_Personal_Earnings Total_Personal_Income
## 1                       0                  9300
## 2                       0                  3600
## 3                       0                  3800
## 4                       0                 84200
## 5                       0                 92500
## 6                    3800                  3800
US Census Bureau summary
Statistic  Age    Citizen_Stat  Edu_Attainment  SEX    Total_Per_Earn  Total_Per_Income
Min.       0.00   1.000         1.00            1.000  -7400           -13200
1st Qu.    20.00  1.000         13.00           1.000  0               7000
Median     41.00  1.000         17.00           2.000  12000           22400
Mean       40.04  1.632         15.88           1.521  31315           38837
3rd Qu.    58.00  1.000         20.00           2.000  42100           50000
Max.       94.00  5.000         24.00           2.000  957000          1225000
NA’s       NA     NA            6108            NA     35877           33251

Let’s see what relationships hold, if any, between an individual’s education and how much they earn while residing in New York.

Exploratory Data Analysis:

Let’s first see the summary for the chosen variables as shown below:

variable        vars  n      mean      sd        median  trimmed   mad       min  max     range   skew   kurtosis  se
Citizen_Stat    1     97349  1.75      1.42      1       1.45      0.00      1    5       4       1.46   0.32      0.00
Edu_Attainment  2     97349  18.60     3.34      19      18.93     2.97      1    24      23      -1.58  4.98      0.01
SEX             3     97349  1.49      0.50      1       1.49      0.00      1    2       1       0.02   -2.00     0.00
Total_Per_Earn  4     97349  49189.42  69351.07  33000   37253.27  31134.60  500  957000  956500  5.35   39.03     222.27
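To help read the skew column in the summary above, here is a small sketch of one common definition of skewness (the third central moment divided by the standard deviation cubed). The function name and the toy data are made up for illustration, and the summary above may use a slightly different small-sample correction, so this is meant to show the sign convention rather than reproduce the exact values.

```python
def sample_skewness(values):
    """Moment-based skewness: third central moment over sd cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance (divide by n)
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5  # undefined if all values are identical

# Made-up toy data, not the census records.
long_right_tail = [1, 2, 2, 3, 3, 3, 4, 20]         # one very large value
long_left_tail = [-20, -4, -3, -3, -3, -2, -2, -1]  # mirror image of the above

print(sample_skewness(long_right_tail) > 0)  # positive skew: tail to the right
print(sample_skewness(long_left_tail) < 0)   # negative skew: tail to the left
```

Under this convention, a positive value such as the 5.35 reported for Total_Per_Earn indicates a long right tail, and a negative value such as the -1.58 reported for Edu_Attainment indicates a long left tail.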

Next, let’s visualize this data using histograms for both education and earnings.

In the visualizations above, one can see that the Education graph is left skewed (skew of -1.58 in the summary), while the Earnings graph is strongly right skewed (skew of 5.35, with a mean of 49,189 well above the median of 33,000). We can explore both variables further with the following table, which breaks earnings down by the level of education an individual has attained while residing in New York.

Earnings and Education
group1                          vars  n      mean      sd         median  min  max     se
a. no diploma                   1     9362   21280.27  27367.55   15400   500  591000  282.85
b. HS diploma or equivalent     1     23512  30863.62  31400.45   25000   500  957000  204.78
c. some college, no degree      1     19485  33868.34  38915.51   25000   500  691000  278.79
d. associate/bachelor degree    1     29547  57898.03  70645.20   43000   500  957000  410.98
e. higher than bachelor degree  1     15443  96678.85  114998.47  65000   500  957000  925.39
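The se column in the table above is the standard error of each group mean, which is the standard deviation divided by the square root of the group size. As a quick sanity check, the sketch below recomputes it from the sd and n columns and compares it with the reported value; the function name and the 0.01 tolerance are choices made for this example.

```python
import math

def standard_error(sd, n):
    """Standard error of a sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

# (sd, n, reported se) copied from the Earnings and Education table.
groups = {
    "a. no diploma": (27367.55, 9362, 282.85),
    "b. HS diploma or equivalent": (31400.45, 23512, 204.78),
    "c. some college, no degree": (38915.51, 19485, 278.79),
    "d. associate/bachelor degree": (70645.20, 29547, 410.98),
    "e. higher than bachelor degree": (114998.47, 15443, 925.39),
}

for name, (sd, n, reported) in groups.items():
    computed = standard_error(sd, n)
    # Allow a small tolerance because the table values are rounded.
    matches = abs(computed - reported) < 0.01
    print(f"{name}: computed {computed:.3f}, reported {reported}, match={matches}")
```

The largest standard error belongs to the most educated group, which reflects its much wider spread of earnings rather than a small sample.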

Inference:

Before performing a hypothesis test, it was important to establish that the sample size in each group is greater than 30. It was also established that the individuals were randomly chosen and that each falls into one of two categories: a high school education or lower, or an associate degree and above.

Let’s perform the hypothesis test and look at the results.

H0: No difference exists in earnings when comparing an individual having higher education and an individual having high school education or below.

HA: A difference exists between earnings when comparing an individual having higher education and an individual having high school education or below.

## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_associate degree or above = 46183, mean_associate degree or above = 69931.17, sd_associate degree or above = 89542.75
## n_high school diploma or below = 51166, mean_high school diploma or below = 30467.68, sd_high school diploma or below = 34277.59
## Observed difference between means (associate degree or above-high school diploma or below) = 39463.49
## H0: mu_associate degree or above - mu_high school diploma or below = 0 
## HA: mu_associate degree or above - mu_high school diploma or below != 0 
## Standard error = 443.368 
## Test statistic: Z =  89.008 
## p-value =  0
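The standard error and test statistic in the output above can be reproduced from the summary statistics alone. The sketch below does this under a normal approximation for the difference between two independent means; the function name is hypothetical, and the original output was produced by a different tool, so small rounding differences are expected.

```python
import math

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Z test for a difference between two independent means."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)  # unpooled standard error
    z = (mean1 - mean2) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p_value

se, z, p = two_sample_z(
    69931.17, 89542.75, 46183,  # associate degree or above
    30467.68, 34277.59, 51166,  # high school diploma or below
)
print(f"SE = {se:.3f}, Z = {z:.3f}, p-value = {p}")
```

With a Z this large, the p-value is smaller than the smallest number a double-precision float can represent, which is why it is displayed as 0 rather than as a tiny positive number.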

## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_associate degree or above = 46183, mean_associate degree or above = 69931.17, sd_associate degree or above = 89542.75
## n_high school diploma or below = 51166, mean_high school diploma or below = 30467.68, sd_high school diploma or below = 34277.59

## Observed difference between means (associate degree or above-high school diploma or below) = 39463.49
## Standard error = 443.368 
## 95 % Confidence interval = ( 38594.5079 , 40332.4783 )
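The confidence interval above follows from the observed difference plus or minus a critical value times the standard error. Here is a minimal sketch using the normal critical value from Python's standard library; the function name is an assumption for this example, and the result agrees with the reported interval only up to rounding of the inputs.

```python
from statistics import NormalDist

def diff_confidence_interval(diff, se, level=0.95):
    """Normal-based confidence interval for a difference between means."""
    z_star = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    margin = z_star * se
    return diff - margin, diff + margin

low, high = diff_confidence_interval(39463.49, 443.368)
print(f"95% CI = ({low:.2f}, {high:.2f})")
print("Interval excludes 0:", low > 0 or high < 0)
```

Because the whole interval lies above 0, it leads to the same decision as the two-sided hypothesis test at the 5% significance level.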

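Based on the hypothesis test and confidence interval above, we can reject the null hypothesis: the p-value is effectively 0, and the 95% confidence interval for the difference in mean earnings (38,594.51 to 40,332.48) does not contain 0. There is a statistically significant difference in mean earnings between individuals with higher education and individuals with a high school education or below.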

Conclusion:

In conclusion, based on the data analysis shown above, individuals residing in New York who have obtained higher education earn more on average than individuals who have a high school education or lower. Because this is an observational study, the result shows an association rather than proof that education causes the higher earnings. I enjoyed working on this project, as I was able to build on concepts learned and present an analysis of this interesting data. As a next step, I would like to continue researching this topic and look at various fields and domains to see where this pattern might not hold. I would also like to look at which careers allow individuals to earn more by comparing two factors: an individual’s experience and his or her higher education attainment.

Data Sources and References:

  1. http://www.census.gov/programs-surveys/acs/data/pums.html

  2. https://www.nytimes.com/2018/05/16/opinion/college-useful-cost-jobs.html

  3. https://www.cbsnews.com/news/when-higher-education-doesnt-mean-higher-pay/

  4. http://blog.minitab.com/blog/adventures-in-statistics-2/understanding-hypothesis-tests-confidence-intervals-and-confidence-levels

  5. https://www.chronicle.com/article/Yes-College-Is-Worth/243450