In this analysis we examine the relationship between education level and income: are higher levels of education associated with higher incomes? An education beyond high school can be extremely expensive, and graduates can take years to repay it once they enter the job market. The prospect of a higher salary after graduation can motivate people to attain a higher level of education before entering the job market. By looking at how education levels relate to income, we can clarify whether continuing one's education is likely to come with an increase in income.
Data for this analysis comes from the General Social Survey (GSS), a sociological survey that collects data on the demographic characteristics and attitudes of people in the United States. Data for the survey was collected through computer-assisted personal interviews, face-to-face interviews, and telephone interviews.
Data for this study was loaded into R with the following code:
# download the GSS extract; this creates the data frame gss used below
load(url("http://bit.ly/dasi_gss_data"))
The GSS data include 57,061 cases and 114 variables collected between 1972 and 2012. Each case represents one person who participated in the survey.
The variables used from the GSS in this study are DEGREE and CONINC. DEGREE is an ordinal categorical variable with the levels LT HIGH SCHOOL, HIGH SCHOOL, JUNIOR COLLEGE, BACHELOR, and GRADUATE, and represents the highest level of education the respondent has attained. CONINC is a continuous numerical variable that represents the respondent's inflation-adjusted family income.
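Because DEGREE is ordinal, a natural R representation is an ordered factor whose levels follow the education ranking. The sketch below uses a handful of hypothetical values rather than the GSS data and assumes the level labels that appear in Table 1; it makes no assumption about how `gss$degree` itself is stored.

```r
# Education levels in increasing order (labels as they appear in Table 1)
degree_levels <- c("Lt High School", "High School", "Junior College",
                   "Bachelor", "Graduate")

# Hypothetical example values, not taken from the GSS data
degree <- factor(c("High School", "Graduate", "Lt High School", "Bachelor"),
                 levels = degree_levels, ordered = TRUE)

# An ordered factor keeps the ranking for comparisons and sorting
print(levels(degree))
print(degree[2] > degree[1])  # Graduate ranks above High School: TRUE
print(sort(degree))
```

Storing the variable this way keeps tables and plots in education order rather than alphabetical order.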
Because the data for this analysis was collected through a survey, this is an observational study rather than an experiment. Causal conclusions should not be drawn from it, and the conclusions made here describe only possible associations in the data.
Data for the GSS was collected from individuals throughout the U.S., but the results may not generalize perfectly to the U.S. population. One possible source of bias is that many people took the survey over the telephone, so the results might generalize only to people with telephones. Nonetheless, the GSS employs survey methods that produce a representative sample and high-quality data (Marsden 1994, p. 286), so the survey results are largely generalizable to the population of the United States.
A subset of the GSS data set was taken consisting of the variables DEGREE and CONINC. Observations with missing values were removed from the subset and descriptive statistics were computed with the complete observations. Listed below is the code used to subset the data, remove the missing values, and create the summary statistics data set.
library(xtable) # used to create formatted HTML tables
library(plyr) # used for grouped summaries (ddply)
library(ggplot2) # used for plots
library(scales) # used for dollar formatting
# subset of the main GSS data set
subset <- data.frame(gss$degree, gss$coninc)
# remove observations with missing values
subset <- na.omit(subset)
# descriptive statistics of income for each degree level
summary <- ddply(subset, "gss.degree", summarise, N = length(gss.coninc),
                 MEAN = mean(gss.coninc), STD = sd(gss.coninc),
                 MIN = min(gss.coninc), MAX = max(gss.coninc),
                 MEDIAN = median(gss.coninc), IQR = IQR(gss.coninc))
# rename the grouping column of the summary data frame
names(summary)[1] <- "DEGREE"
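As a small illustration of what `na.omit()` does in the code above, the sketch below applies it to a hypothetical two-column data frame: any row with a missing value in either column is dropped. It also computes, from counts stated in this report (57,061 cases in the GSS file and 50,393 complete cases, the sum of N in Table 1), how many cases the removal step discarded.

```r
# Hypothetical stand-in for the DEGREE/CONINC subset
toy <- data.frame(degree = c("High School", NA, "Bachelor", "Graduate"),
                  coninc = c(33333, 25926, NA, 60185))

# na.omit() drops every row that has a missing value in any column
complete <- na.omit(toy)
print(complete)
print(nrow(toy) - nrow(complete))  # rows removed: 2

# Cases in the GSS file minus complete cases (sum of N in Table 1)
print(57061 - 50393)  # 6668
```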
Table 1 below shows descriptive statistics for income by degree. Average income increases with each degree level, from the lowest average income of $25,439.47 for Lt High School to the highest average income of $78,132.97 for Graduate degrees. The standard deviation also increases with degree level.
#converting values to dollar format
summary <- cbind(summary[,1:2],
apply(summary[c("MEAN","STD","MIN","MAX","MEDIAN","IQR")],2,dollar))
#creating and printing table of summary statistics
stat.table <- xtable(summary, caption = "Table 1")
print(stat.table, type = "html")
| | DEGREE | N | MEAN | STD | MIN | MAX | MEDIAN | IQR |
|---|---|---|---|---|---|---|---|---|
| 1 | Lt High School | 10268 | $25,439.47 | $23,427.32 | $383 | $180,386 | $18,519 | $25,090 |
| 2 | High School | 26381 | $42,025.70 | $31,262.84 | $383 | $180,386 | $35,471 | $34,913 |
| 3 | Junior College | 2804 | $50,036.01 | $34,414.48 | $383 | $180,386 | $42,130 | $39,398 |
| 4 | Bachelor | 7374 | $64,013.17 | $41,480.80 | $383 | $180,386 | $53,507 | $49,748 |
| 5 | Graduate | 3566 | $78,132.97 | $44,760.45 | $402 | $180,386 | $68,516 | $68,504 |
The boxplot below shows the same pattern: income increases as you move from left to right across the education levels. Each group also has observations spread across nearly the entire range of incomes, from roughly $400 to $180,386 (see MIN and MAX in Table 1). The differences in average income are smallest between the High School and Junior College levels. ANOVA can be used to determine whether there is a significant difference in average income among the education levels.
#code for boxplot
g <- ggplot(subset, aes(x = gss.degree, y = gss.coninc, fill = gss.degree))
g <- g + geom_boxplot()
g <- g + xlab("Degree") + ylab("Dollars") + labs(title = "Income by Degree")
g <- g + guides(fill=guide_legend(title=NULL))
print(g)
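The remark that average income differences are smallest between High School and Junior College can be checked against the group means in Table 1. The sketch below copies those means by hand (it does not recompute them from the data) and takes the difference between each level and the level below it.

```r
# Group means of CONINC copied from Table 1, in education order
means <- c("Lt High School" = 25439.47, "High School" = 42025.70,
           "Junior College" = 50036.01, "Bachelor" = 64013.17,
           "Graduate" = 78132.97)

# Difference between each education level and the level below it
steps <- diff(means)
names(steps) <- paste(names(means)[-1], "-", names(means)[-length(means)])
print(round(steps, 2))

# Means rise at every step; report the smallest step
stopifnot(all(steps > 0))
print(names(which.min(steps)))
```

With these values the steps are about $16,586, $8,010, $13,977, and $14,120, so the High School to Junior College step is the smallest.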
ANOVA is an appropriate choice for analyzing our data because we have a categorical variable with five levels, DEGREE, and a numerical variable, CONINC. The null hypothesis for ANOVA is that all levels of education have the same mean income and that any observed differences between education levels are due to chance. The alternative hypothesis is that at least one education level has a mean income that differs from the others.
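Written out, the hypotheses and the F statistic are as follows, where k = 5 is the number of education levels, n_i and x̄_i are the size and mean income of group i, x̄ is the overall mean, and n is the total number of complete observations. This is the standard one-way ANOVA formulation, not anything specific to this report's code.

```latex
\begin{aligned}
H_0 &: \mu_{\text{Lt HS}} = \mu_{\text{HS}} = \mu_{\text{JC}} = \mu_{\text{Bach}} = \mu_{\text{Grad}} \\
H_A &: \mu_i \neq \mu_j \text{ for at least one pair } (i, j) \\[4pt]
F &= \frac{\mathrm{MSG}}{\mathrm{MSE}}
   = \frac{\mathrm{SSG}/(k-1)}{\mathrm{SSE}/(n-k)}, \qquad
\mathrm{SSG} = \sum_{i=1}^{k} n_i\,(\bar{x}_i - \bar{x})^2, \qquad
\mathrm{SSE} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2
\end{aligned}
```

With k = 5 and n = 50,393 (the sum of N in Table 1), the degrees of freedom are k - 1 = 4 and n - k = 50,388.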
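Next we check the conditions for ANOVA. First, the observations must be independent within and between groups. Since the GSS uses random sampling, there is no reason to believe that the observations are dependent on each other, and there is no pairing between groups. The sample in each education level is also less than 10% of the corresponding population in the U.S. Second, the distribution within each group should be approximately normal. Income is right-skewed within every education level (each mean in Table 1 is above its median), but each group has at least 2,804 observations, so ANOVA should be reasonably robust to this departure from normality. Third, the variability should be roughly constant across groups. The boxplots and Table 1 show standard deviations ranging from about $23,400 to $44,800; the largest is slightly less than twice the smallest, which is within a common rule of thumb for treating the variability as similar across groups.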
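The skewness and equal-variance checks can be made concrete with the summary values from Table 1. The sketch below copies those values by hand; the "largest SD less than twice the smallest" threshold is a common rule of thumb rather than a formal test, and using it here is an assumption of this sketch.

```r
edu <- c("Lt High School", "High School", "Junior College",
         "Bachelor", "Graduate")

# Summary values of CONINC copied from Table 1
group_sd     <- setNames(c(23427.32, 31262.84, 34414.48, 41480.80, 44760.45), edu)
group_mean   <- setNames(c(25439.47, 42025.70, 50036.01, 64013.17, 78132.97), edu)
group_median <- setNames(c(18519, 35471, 42130, 53507, 68516), edu)

# Rule of thumb (assumption, not a formal test): largest SD / smallest SD < 2
sd_ratio <- max(group_sd) / min(group_sd)
print(round(sd_ratio, 2))  # 1.91

# A mean above the median in every group points to right skew in income
print(group_mean > group_median)
```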
The code below fits the ANOVA model and creates a summary table.
# fit a one-way ANOVA of income on degree
fit <- aov(gss.coninc ~ gss.degree, data = subset)
# create and print the ANOVA table
anova.table <- xtable(fit, caption = "Table 2")
print(anova.table, type = "html")
| | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| gss.degree | 4 | 10812684151477.99 | 2703171037869.50 | 2496.17 | 0.0000 |
| Residuals | 50388 | 54566454176214.55 | 1082925581.02 | | |
Table 2 shows the results of the ANOVA. The F statistic is 2496.17 and the p-value is effectively 0 (it displays as 0.0000 at four decimal places). We therefore reject the null hypothesis at the alpha = 0.05 level and conclude that at least one education level has a mean income that differs from the others.
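As a consistency check on Table 2, the sketch below recomputes the mean squares, the F statistic, and its p-value from the sums of squares and degrees of freedom printed in the table, using R's F distribution function `pf()`. The group sizes are copied from Table 1 to confirm the residual degrees of freedom.

```r
# Values copied from Table 2
ss_group <- 10812684151477.99; df_group <- 4
ss_resid <- 54566454176214.55; df_resid <- 50388

# Residual df = complete cases (sum of N in Table 1) minus the 5 groups
n_total <- 10268 + 26381 + 2804 + 7374 + 3566
print(n_total - 5)  # 50388

ms_group <- ss_group / df_group  # mean square between groups
ms_resid <- ss_resid / df_resid  # mean square within groups
f_stat   <- ms_group / ms_resid
p_value  <- pf(f_stat, df_group, df_resid, lower.tail = FALSE)

print(round(f_stat, 2))  # 2496.17
print(p_value)
```

The resulting p-value is so small that it may underflow to exactly 0 in double precision.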
Next we use pairwise t-tests with the Bonferroni correction to determine which education levels differ significantly from each other. Below is the code used to run the tests and to generate the results in Table 3.
# pairwise t-tests between every pair of education levels, Bonferroni-adjusted
ttests <- pairwise.t.test(subset$gss.coninc, subset$gss.degree,
                          p.adjust.method = "bonferroni")
# table of the adjusted p-values
ttest.table <- xtable(ttests$p.value, caption = "Table 3")
print(ttest.table, type = "html")
| | Lt High School | High School | Junior College | Bachelor |
|---|---|---|---|---|
| High School | 0.00 | | | |
| Junior College | 0.00 | 0.00 | | |
| Bachelor | 0.00 | 0.00 | 0.00 | |
| Graduate | 0.00 | 0.00 | 0.00 | 0.00 |
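Table 3 above shows the results of the pairwise t-tests as a matrix of p-values, one for each pair of education levels. All p-values are near 0, so we conclude that every pair of education levels differs significantly in mean income. The p-values reported by pairwise.t.test() are already Bonferroni-adjusted (each raw p-value is multiplied by the 10 comparisons, capped at 1), so comparing them with 0.05 is equivalent to comparing the raw p-values with the Bonferroni-corrected level of 0.05/10 = 0.005.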
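The Bonferroni arithmetic can be sketched in R as below. The number of comparisons follows from the five education levels; the raw p-values are hypothetical, chosen only to show that comparing Bonferroni-adjusted p-values with 0.05 gives the same decisions as comparing raw p-values with 0.05/10.

```r
# Five education levels give choose(5, 2) pairwise comparisons
k <- 5
m <- choose(k, 2)
alpha_per_test <- 0.05 / m
print(m)               # 10
print(alpha_per_test)  # 0.005

# p.adjust() with "bonferroni" multiplies each p-value by m, capped at 1
raw_p <- c(0.0001, 0.004, 0.02, 0.3)  # hypothetical raw p-values
adj_p <- p.adjust(raw_p, method = "bonferroni", n = m)
print(adj_p)

# Both decision rules agree
print(adj_p < 0.05)
print(raw_p < alpha_per_test)
```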
Because this analysis pairs a five-level categorical variable with a numerical variable, ANOVA and pairwise t-tests are the only applicable methods.
From the GSS data and the results of the ANOVA and pairwise t-tests, we can say that there is an association between education level and income: the results suggest that income increases as education level increases. Questions that might arise from this study are how much income increases with each level of education, and whether that increase justifies the added cost of the education.
Data Citation:
Smith, Tom W., Michael Hout, and Peter V. Marsden. General Social Survey, 1972-2012 [Cumulative File]. ICPSR34802-v1. Storrs, CT: Roper Center for Public Opinion Research, University of Connecticut /Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributors], 2013-09-11. doi:10.3886/ICPSR34802.v1
Website:
http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/34802/version/1
GSS dataset Codebook:
https://d396qusza40orc.cloudfront.net/statistics%2Fproject%2Fgss1.html
The first 45 rows of columns 12 to 27 of the GSS data set, which include the variables DEGREE and CONINC, are shown below. Empty cells are missing values.
#code to print first 45 rows of the GSS data set
#includes columns 12-27 to include variables used in analysis
gss.table <- xtable(gss[1:45,12:27])
print(gss.table, type = "html")
| | degree | vetyears | sei | wrkstat | wrkslf | marital | spwrksta | sibs | childs | agekdbrn | incom16 | born | parborn | granborn | income06 | coninc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Bachelor | | | Working Fulltime | Someone Else | Never Married | | 3 | 0 | | Average | | | | | 25926 |
| 2 | Lt High School | | | Retired | Someone Else | Married | Keeping House | 4 | 5 | | Above Average | | | | | 33333 |
| 3 | High School | | | Working Parttime | Someone Else | Married | Working Fulltime | 5 | 4 | | Average | | | | | 33333 |
| 4 | Bachelor | | | Working Fulltime | Someone Else | Married | Working Fulltime | 5 | 0 | | Average | | | | | 41667 |
| 5 | High School | | | Keeping House | Someone Else | Married | Temp Not Working | 2 | 2 | | Below Average | | | | | 69444 |
| 6 | High School | | | Working Fulltime | Someone Else | Never Married | | 1 | 0 | | Average | | | | | 60185 |
| 7 | High School | | | Working Fulltime | Someone Else | Divorced | | 7 | 2 | | Above Average | | | | | 50926 |
| 8 | Bachelor | | | Working Fulltime | Someone Else | Never Married | | 1 | 0 | | Average | | | | | 18519 |
| 9 | High School | | | Working Parttime | Someone Else | Never Married | | 2 | 2 | | Average | | | | | 3704 |
| 10 | High School | | | Working Fulltime | Someone Else | Married | Working Fulltime | 7 | 4 | | Far Below Average | | | | | 25926 |
| 11 | High School | | | Keeping House | Someone Else | Married | Working Fulltime | 7 | 1 | | Below Average | | | | | 18519 |
| 12 | Lt High School | | | Working Fulltime | Someone Else | Married | Keeping House | 6 | 5 | | Average | | | | | 18519 |
| 13 | Lt High School | | | Working Fulltime | Someone Else | Married | Working Fulltime | 2 | 1 | | Below Average | | | | | 18519 |
| 14 | Lt High School | | | Working Fulltime | Someone Else | Divorced | | 2 | 2 | | Far Below Average | | | | | 18519 |
| 15 | Lt High School | | | Working Fulltime | Someone Else | Married | Working Fulltime | 0 | 5 | | Average | | | | | 25926 |
| 16 | High School | | | Working Fulltime | Someone Else | Married | Working Parttime | 7 | 2 | | Far Below Average | | | | | 18519 |
| 17 | High School | | | School | Someone Else | Married | Working Parttime | 0 | 2 | | Average | | | | | 33333 |
| 18 | Lt High School | | | Keeping House | Someone Else | Married | Working Fulltime | 2 | 3 | | Average | | | | | 25926 |
| 19 | Bachelor | | | Working Fulltime | Someone Else | Married | Working Fulltime | 2 | 3 | | Above Average | | | | | 60185 |
| 20 | High School | | | School | | Never Married | | 7 | 0 | | Above Average | | | | | 69444 |
| 21 | High School | | | Working Fulltime | Self-Employed | Married | Working Fulltime | 2 | 2 | | Above Average | | | | | 50926 |
| 22 | High School | | | Working Fulltime | Someone Else | Married | Keeping House | 1 | 2 | | Average | | | | | 83333 |
| 23 | High School | | | Working Fulltime | Someone Else | Married | Working Fulltime | 1 | 0 | | Average | | | | | 18519 |
| 24 | High School | | | Keeping House | Someone Else | Married | Working Fulltime | 7 | 1 | | Average | | | | | 25926 |
| 25 | Bachelor | | | Keeping House | Someone Else | Married | Retired | 2 | 0 | | Average | | | | | 41667 |
| 26 | High School | | | Working Fulltime | Someone Else | Married | Temp Not Working | 7 | 2 | | Average | | | | | 41667 |
| 27 | High School | | | Working Fulltime | Someone Else | Married | Working Fulltime | 5 | 2 | | Average | | | | | 41667 |
| 28 | High School | | | Working Fulltime | Someone Else | Married | Working Fulltime | 4 | 2 | | Average | | | | | 41667 |
| 29 | High School | | | Working Fulltime | Someone Else | Married | Working Fulltime | 2 | 0 | | Below Average | | | | | |
| 30 | Lt High School | | | Working Fulltime | Someone Else | Married | Working Parttime | 4 | 2 | | Average | | | | | 41667 |
| 31 | Lt High School | | | Working Fulltime | Someone Else | Married | Keeping House | 6 | 2 | | Average | | | | | 33333 |
| 32 | High School | | | Working Parttime | Someone Else | Married | Working Parttime | 2 | 1 | | Average | | | | | 33333 |
| 33 | Bachelor | | | Unempl, Laid Off | Self-Employed | Never Married | | 1 | 0 | | Average | | | | | 41667 |
| 34 | Lt High School | | | Keeping House | Someone Else | Married | Retired | 7 | 5 | | Average | | | | | 3704 |
| 35 | High School | | | Temp Not Working | Someone Else | Married | Working Parttime | 7 | 1 | | Average | | | | | 18519 |
| 36 | High School | | | Working Fulltime | Someone Else | Married | Keeping House | 7 | 2 | | Average | | | | | 41667 |
| 37 | High School | | | Keeping House | Someone Else | Married | Working Fulltime | 3 | 2 | | Below Average | | | | | 69444 |
| 38 | Lt High School | | | Keeping House | | Married | Working Fulltime | 5 | 1 | | Average | | | | | 41667 |
| 39 | Lt High School | | | Working Parttime | Self-Employed | Never Married | | 4 | 0 | | Average | | | | | 25926 |
| 40 | High School | | | Working Fulltime | Someone Else | Married | Unempl, Laid Off | 5 | 2 | | Above Average | | | | | 18519 |
| 41 | Lt High School | | | Keeping House | | Widowed | | 0 | 2 | | Average | | | | | 3704 |
| 42 | High School | | | Working Fulltime | Self-Employed | Married | Working Fulltime | 2 | 2 | | Average | | | | | 41667 |
| 43 | Lt High School | | | Working Fulltime | Someone Else | Married | Keeping House | 4 | 2 | | Average | | | | | 18519 |
| 44 | Lt High School | | | Keeping House | Someone Else | Married | Working Fulltime | 7 | 0 | | Average | | | | | 25926 |
| 45 | Lt High School | | | Keeping House | Someone Else | Married | Working Fulltime | 7 | 4 | | Average | | | | | 101852 |