Introduction:
After reading a recent article and watching a debate on the topic, I chose to look at the relationship between pursuing higher education and an individual’s salary. Family members, close friends, and colleagues usually encourage us to pursue higher education for the sake of better future career options. However, there is also a push from many successful individuals who dropped out or built a hybrid type of career without pursuing higher education. My research question is as follows: What role does higher education play in an individual’s pay scale? I hope to use data analysis to arrive at an answer and to continue this vital conversation.
Data:
I used data from the US Census Bureau. This is an observational study and includes data from individuals residing in New York in 2010. Both “Total Personal Income” and “Total Personal Earnings” are the response variables and are numerical. The explanatory variables are “Worker Class” (categorical), “Sex” (categorical), “Citizenship Status” (categorical), “Age” (discrete numerical), “Educational Attainment” (categorical), and “School Attending” (categorical). I wanted to look at this data specifically to see how education plays a role in how much individuals earn while residing in New York. The full data set contains 279 variables and 188,767 cases. As a next step, I subset the data to the variables of interest; the first six rows and a summary of the subset follow.
## Age Citizenship_Status Educational_Attainment SEX
## 1 79 4 15 1
## 2 75 3 1 2
## 3 68 4 22 2
## 4 68 1 22 2
## 5 69 1 23 1
## 6 46 4 16 2
## Total_Personal_Earnings Total_Personal_Income
## 1 0 9300
## 2 0 3600
## 3 0 3800
## 4 0 84200
## 5 0 92500
## 6 3800 3800
| Age | Citizen_Stat | Edu_Attainment | SEX | Total_Per_Earn | Total_Per_Income |
|---|---|---|---|---|---|
| Min. : 0.00 | Min. :1.000 | Min. : 1.00 | Min. :1.000 | Min. : -7400 | Min. : -13200 |
| 1st Qu.:20.00 | 1st Qu.:1.000 | 1st Qu.:13.00 | 1st Qu.:1.000 | 1st Qu.: 0 | 1st Qu.: 7000 |
| Median :41.00 | Median :1.000 | Median :17.00 | Median :2.000 | Median : 12000 | Median : 22400 |
| Mean :40.04 | Mean :1.632 | Mean :15.88 | Mean :1.521 | Mean : 31315 | Mean : 38837 |
| 3rd Qu.:58.00 | 3rd Qu.:1.000 | 3rd Qu.:20.00 | 3rd Qu.:2.000 | 3rd Qu.: 42100 | 3rd Qu.: 50000 |
| Max. :94.00 | Max. :5.000 | Max. :24.00 | Max. :2.000 | Max. :957000 | Max. :1225000 |
| NA | NA | NA’s :6108 | NA | NA’s :35877 | NA’s :33251 |
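The subsetting step described above can be sketched as follows. This is only an illustration in Python using the standard library, not the code used for the analysis. The column names are taken from the preview above, `Other_Field` is a hypothetical placeholder standing in for the remaining variables of the full extract, and the six rows are the ones shown in the preview.

```python
import csv
import io

# Column names as they appear in the preview; the raw Census extract
# may label them differently (assumption).
KEEP = [
    "Age", "Citizenship_Status", "Educational_Attainment", "SEX",
    "Total_Personal_Earnings", "Total_Personal_Income",
]

# The six preview rows, plus a placeholder column ("Other_Field") that
# stands in for the variables dropped by the subset.
RAW = """Age,Citizenship_Status,Educational_Attainment,SEX,Total_Personal_Earnings,Total_Personal_Income,Other_Field
79,4,15,1,0,9300,x
75,3,1,2,0,3600,x
68,4,22,2,0,3800,x
68,1,22,2,0,84200,x
69,1,23,1,0,92500,x
46,4,16,2,3800,3800,x
"""


def subset_columns(text, keep):
    """Return one dict per row holding only the requested columns, as ints."""
    reader = csv.DictReader(io.StringIO(text))
    return [{name: int(row[name]) for name in keep} for row in reader]


subset = subset_columns(RAW, KEEP)
print(len(subset), "rows,", len(subset[0]), "columns")
print(subset[5])
```

Keeping only the needed columns up front makes the later summaries cheaper to compute and easier to read.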
Let’s see what relationships, if any, hold between an individual’s education and how much they earn while residing in New York.
Exploratory Data Analysis:
Let’s first see the summary for the chosen variables as shown below:
| | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Citizen_Stat | 1 | 97349 | 1.75 | 1.42 | 1 | 1.45 | 0.00 | 1 | 5 | 4 | 1.46 | 0.32 | 0.00 |
| Edu_Attainment | 2 | 97349 | 18.60 | 3.34 | 19 | 18.93 | 2.97 | 1 | 24 | 23 | -1.58 | 4.98 | 0.01 |
| SEX | 3 | 97349 | 1.49 | 0.50 | 1 | 1.49 | 0.00 | 1 | 2 | 1 | 0.02 | -2.00 | 0.00 |
| Total_Per_Earn | 4 | 97349 | 49189.42 | 69351.07 | 33000 | 37253.27 | 31134.60 | 500 | 957000 | 956500 | 5.35 | 39.03 | 222.27 |
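The direction of skew for education and earnings can be cross-checked from the mean, median, and standard deviation in the summary above. The sketch below uses Pearson's second skewness coefficient, 3 × (mean − median) / sd, which is a different measure from the moment-based value in the skew column, so only the sign is expected to agree. The function name is illustrative.

```python
def pearson_median_skew(mean, median, sd):
    """Pearson's second skewness coefficient: 3 * (mean - median) / sd."""
    return 3 * (mean - median) / sd


# (mean, median, sd) copied from the summary table above
summary = {
    "Edu_Attainment": (18.60, 19, 3.34),
    "Total_Per_Earn": (49189.42, 33000, 69351.07),
}

for name, (mean, median, sd) in summary.items():
    coef = pearson_median_skew(mean, median, sd)
    direction = "left (negative)" if coef < 0 else "right (positive)"
    print(f"{name}: {coef:+.2f}, skewed {direction}")
```

A mean below the median (education) points to a left skew, and a mean well above the median (earnings) points to a right skew, which matches the signs in the skew column (-1.58 and 5.35).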
Next, let’s visualize this data using histograms for both education and earnings.
In the visualizations above, one can see that the Education graph is left skewed while the Earnings graph is strongly right skewed, which agrees with the skew values in the summary table (-1.58 and 5.35). We can further explore both variables with the following table, which breaks down earnings by the level of education an individual residing in New York has attained.
| group1 | vars | n | mean | sd | median | min | max | se |
|---|---|---|---|---|---|---|---|---|
| a. no diploma | 1 | 9362 | 21280.27 | 27367.55 | 15400 | 500 | 591000 | 282.85 |
| b. HS diploma or equivalent | 1 | 23512 | 30863.62 | 31400.45 | 25000 | 500 | 957000 | 204.78 |
| c. some college, no degree | 1 | 19485 | 33868.34 | 38915.51 | 25000 | 500 | 691000 | 278.79 |
| d. associate/bachelor degree | 1 | 29547 | 57898.03 | 70645.20 | 43000 | 500 | 957000 | 410.98 |
| e. higher than bachelor degree | 1 | 15443 | 96678.85 | 114998.47 | 65000 | 500 | 957000 | 925.39 |
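As a consistency check on the table above, the se column should equal sd / sqrt(n) for each group, and the n-weighted average of the group means should recover the overall mean earnings. The following is a minimal sketch of that arithmetic, with all figures copied from the table; the helper name is illustrative.

```python
import math

# (label, n, mean, sd, reported se) copied from the table above
groups = [
    ("a. no diploma", 9362, 21280.27, 27367.55, 282.85),
    ("b. HS diploma or equivalent", 23512, 30863.62, 31400.45, 204.78),
    ("c. some college, no degree", 19485, 33868.34, 38915.51, 278.79),
    ("d. associate/bachelor degree", 29547, 57898.03, 70645.20, 410.98),
    ("e. higher than bachelor degree", 15443, 96678.85, 114998.47, 925.39),
]


def standard_error(sd, n):
    """Standard error of a sample mean."""
    return sd / math.sqrt(n)


for label, n, mean, sd, reported in groups:
    print(f"{label}: se = {standard_error(sd, n):.2f} (table: {reported})")

# Weighted mean across groups, weighting each group mean by its size.
total_n = sum(n for _, n, _, _, _ in groups)
weighted_mean = sum(n * mean for _, n, mean, _, _ in groups) / total_n
print(f"total n = {total_n}, weighted mean = {weighted_mean:.2f}")
```

The group sizes add up to 97,349 and the weighted mean comes out at roughly 49,189, in line with the n and mean reported for Total_Per_Earn in the earlier summary.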
Inference:
Before performing a hypothesis test, it was important to establish that the sample size is greater than 30. It was also established that the individuals were randomly chosen and fall into one of two categories: a high school education or lower, or an associate degree or above.
Let’s perform the hypothesis test and see the results.
H0: No difference exists in earnings when comparing an individual having higher education and an individual having high school education or below.
HA: A difference exists between earnings when comparing an individual having higher education and an individual having high school education or below.
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_associate degree or above = 46183, mean_associate degree or above = 69931.17, sd_associate degree or above = 89542.75
## n_high school diploma or below = 51166, mean_high school diploma or below = 30467.68, sd_high school diploma or below = 34277.59
## Observed difference between means (associate degree or above-high school diploma or below) = 39463.49
## H0: mu_associate degree or above - mu_high school diploma or below = 0
## HA: mu_associate degree or above - mu_high school diploma or below != 0
## Standard error = 443.368
## Test statistic: Z = 89.008
## p-value = 0
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_associate degree or above = 46183, mean_associate degree or above = 69931.17, sd_associate degree or above = 89542.75
## n_high school diploma or below = 51166, mean_high school diploma or below = 30467.68, sd_high school diploma or below = 34277.59
## Observed difference between means (associate degree or above-high school diploma or below) = 39463.49
## Standard error = 443.368
## 95 % Confidence interval = ( 38594.5079 , 40332.4783 )
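The figures in the output above can be reproduced from the summary statistics alone. The sketch below assumes a two-sample Z test with an unpooled standard error, sqrt(sd1²/n1 + sd2²/n2), which is consistent with the printed standard error of 443.368. It is an illustration in Python, not the code that produced the output, and the variable names are hypothetical.

```python
import math
from statistics import NormalDist

# Summary statistics copied from the output above
n1, mean1, sd1 = 46183, 69931.17, 89542.75   # associate degree or above
n2, mean2, sd2 = 51166, 30467.68, 34277.59   # high school diploma or below

# Observed difference between the two sample means
diff = mean1 - mean2

# Unpooled standard error of the difference
se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)

# Test statistic under H0: mu1 - mu2 = 0
z = diff / se

# Two-sided p-value from the standard normal distribution
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# 95% confidence interval for the difference in means
z_crit = NormalDist().inv_cdf(0.975)
ci = (diff - z_crit * se, diff + z_crit * se)

print(f"difference = {diff:.2f}")
print(f"standard error = {se:.3f}")
print(f"Z = {z:.3f}")
print(f"p-value = {p_value}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

With a test statistic near 89, the normal CDF is indistinguishable from 1 in double precision, so the p-value is effectively zero, which is why the output prints `p-value = 0`.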
Based on the hypothesis test and confidence interval above, we can reject the null hypothesis: the p-value is effectively 0, and the 95% confidence interval for the difference in means does not contain 0. There is a clear difference in earnings between individuals with higher education and individuals with a high school education or below.
Conclusion:
In conclusion, based on the analysis above, individuals residing in New York who have obtained higher education earn more on average than individuals who have a high school education or lower. Because this is an observational study, the result shows an association rather than proof that higher education causes higher earnings. I enjoyed working on this project, as I was able to build on concepts I had learned and present an analysis of this interesting data. As a next step, I would like to continue researching this topic and look at various fields and domains to see where this pattern might not hold. I would also like to look at which careers pay more when comparing two factors: an individual’s experience versus his or her higher education attainment.