Introduction:

In this analysis we examine the relationship between education level and income: are higher education levels associated with higher incomes? Education beyond high school can be extremely expensive, and graduates can take years to repay its cost once they enter the job market. The prospect of a higher salary after graduation can be a motivating factor in pursuing more education before entering the job market. Examining how education levels relate to income can inform the decision to continue one's education by showing how much additional income tends to accompany it.

Data:

Data for this analysis comes from the General Social Survey (GSS), a sociological survey that collects data on the demographic characteristics and attitudes of people in the United States. Data was collected for the survey through computer-assisted personal interviews, face-to-face interviews, and telephone interviews.

Data for this study was loaded into R with the following code:

load(url("http://bit.ly/dasi_gss_data"))

The GSS data include 57,061 cases and 114 variables, collected between 1972 and 2012. Each case represents one person who participated in the survey.

The variables used from the GSS in this study are DEGREE and CONINC. DEGREE is an ordinal categorical variable with the levels LT HIGH SCHOOL, HIGH SCHOOL, JUNIOR COLLEGE, BACHELOR, and GRADUATE; it represents the highest level of education attained by the individual. CONINC is a continuous numeric variable that represents the individual's inflation-adjusted family income.

Because data for this analysis was collected using a survey, this is an observational study. Causal conclusions are therefore not warranted; the conclusions drawn here only describe possible associations in the data.

Data for the GSS was taken from individuals throughout the U.S., and the GSS employs survey methods that produce a representative sample and high-quality data (Marsden 1994, p. 286). As a result, survey results are largely generalizable to the population of the United States. One possible source of bias that could limit generalizability is that many people took the survey over the telephone, so those responses might only generalize to people with telephones.

Exploratory data analysis:

A subset of the GSS data set was taken consisting of the variables DEGREE and CONINC. Observations with missing values were removed from the subset and descriptive statistics were computed with the complete observations. Listed below is the code used to subset the data, remove the missing values, and create the summary statistics data set.

library(xtable)             #used to create formatted html tables
library(plyr)               #used to compute grouped summary statistics (ddply)
library(ggplot2)            #used for plots
library(scales)             #used for dollar format

#subset of main data set GSS
subset <- data.frame(gss$degree, gss$coninc)

#removal of missing values 
subset <- na.omit(subset)

#data frame of descriptive statistics by degree
summary <- ddply(subset, "gss.degree", summarise, N = length(gss.coninc),
                 MEAN = mean(gss.coninc), STD = sd(gss.coninc), 
                 MIN = min(gss.coninc), MAX = max(gss.coninc),
                 MEDIAN = median(gss.coninc), IQR = IQR(gss.coninc))

#renaming first column of summary data frame
names(summary)[1] <- "DEGREE"
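
The same kind of grouped summary can be sketched in base R without plyr. The toy data frame below (`toy`, with made-up `degree` and `income` values) is purely illustrative and is not GSS data; it only shows how `na.omit()` drops incomplete rows before `aggregate()` computes per-group statistics.

```r
# Toy data (hypothetical values, not from the GSS)
toy <- data.frame(
  degree = factor(c("High School", "High School", "Bachelor", "Bachelor", NA),
                  levels = c("High School", "Bachelor")),
  income = c(20000, 40000, 50000, NA, 30000)
)

# Drop every row that has a missing value in any column
toy_complete <- na.omit(toy)

# Per-group count, mean, and median of income
toy_stats <- aggregate(income ~ degree, data = toy_complete,
                       FUN = function(x) c(N = length(x), MEAN = mean(x),
                                           MEDIAN = median(x)))

# aggregate() stores the statistics in a matrix column; flatten it
toy_stats <- do.call(data.frame, toy_stats)
names(toy_stats) <- c("DEGREE", "N", "MEAN", "MEDIAN")
print(toy_stats)
```

Only three of the five toy rows survive `na.omit()`, because one row is missing its income and another is missing its degree.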

Table 1 below shows descriptive statistics for income by degree. Average income increases with degree level, from a lowest average of $25,439.47 for Lt High School to a highest average of $78,132.97 for Graduate degrees. The standard deviation also increases with degree level.

#converting values to dollar format
summary <- cbind(summary[,1:2],
        apply(summary[c("MEAN","STD","MIN","MAX","MEDIAN","IQR")],2,dollar))

#creating and printing table of summary statistics
stat.table <- xtable(summary, caption = "Table 1")
print(stat.table, type = "html")
Table 1
  DEGREE              N        MEAN         STD   MIN       MAX   MEDIAN      IQR
1 Lt High School  10268  $25,439.47  $23,427.32  $383  $180,386  $18,519  $25,090
2 High School     26381  $42,025.70  $31,262.84  $383  $180,386  $35,471  $34,913
3 Junior College   2804  $50,036.01  $34,414.48  $383  $180,386  $42,130  $39,398
4 Bachelor         7374  $64,013.17  $41,480.80  $383  $180,386  $53,507  $49,748
5 Graduate         3566  $78,132.97  $44,760.45  $402  $180,386  $68,516  $68,504

The boxplot below again shows that each group's income increases as you move to the right along the x-axis. Each group also has observations that span nearly the full range of incomes, from the bottom to the top. Average income differences are smallest between the High School and Junior College levels. ANOVA can be used to determine whether there is a significant difference in average income among the education levels.

#code for boxplot
g <- ggplot(subset, aes(x = gss.degree, y = gss.coninc, fill = gss.degree))
g <- g + geom_boxplot()
g <- g + xlab("Degree") + ylab("Dollars") + labs(title = "Income by Degree")
g <- g + guides(fill=guide_legend(title=NULL))
print(g)

Figure: boxplot "Income by Degree" (x-axis: Degree, y-axis: Dollars)
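
The claim that the gap in average income is smallest between High School and Junior College can be checked directly from the means reported in Table 1. The sketch below is only arithmetic on those published values; the variable names (`means`, `gaps`) are introduced here for illustration.

```r
# Group means copied from Table 1, in order of increasing education level
means <- c("Lt High School" = 25439.47, "High School" = 42025.70,
           "Junior College" = 50036.01, "Bachelor" = 64013.17,
           "Graduate" = 78132.97)

# Difference between each level and the level directly below it
gaps <- unname(diff(means))
names(gaps) <- paste(head(names(means), -1), "to", tail(names(means), -1))
print(round(gaps, 2))

# Label of the smallest adjacent gap
smallest_gap <- names(gaps)[which.min(gaps)]
print(smallest_gap)
```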

Inference:

ANOVA is the appropriate choice for analysing our data because we have a categorical variable with five levels, DEGREE, and a numerical variable, CONINC. The null hypothesis is that all levels of education have the same mean income and that any observed difference between education levels is due to chance. The alternative hypothesis is that at least one education level has a mean income that differs from the others.

Next we check the conditions for ANOVA. First, the observations should be independent both within and between groups. Because survey respondents are sampled at random, there is no reason to believe that observations are not independent of each other, and there is no pairing between groups. Each level of education also contains less than 10% of the respective population of the U.S. Second, the observations in each group should be approximately normally distributed. Income within each education level is right-skewed (every mean in Table 1 exceeds its median), but with thousands of observations per group ANOVA is reasonably robust to this departure from normality. Third, variability should be similar across groups. The boxplots show spreads of a similar order, although Table 1 shows the standard deviation rising from about $23,000 to about $45,000, so this condition holds only roughly.
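
One rough way to probe the normality and equal-variance conditions is to compare the group statistics already reported in Table 1. The sketch below uses only those published values. The rule of thumb that a largest-to-smallest standard deviation ratio of about 2 or less is tolerable is a common heuristic assumed here, not something taken from this analysis.

```r
# Standard deviations, means, and medians copied from Table 1
# (order: Lt High School, High School, Junior College, Bachelor, Graduate)
sds     <- c(23427.32, 31262.84, 34414.48, 41480.80, 44760.45)
means   <- c(25439.47, 42025.70, 50036.01, 64013.17, 78132.97)
medians <- c(18519, 35471, 42130, 53507, 68516)

# Ratio of the largest to the smallest group standard deviation
sd_ratio <- max(sds) / min(sds)
print(round(sd_ratio, 2))

# A mean above the median in a group suggests right skew
right_skewed <- means > medians
print(right_skewed)
```

With the raw data available, per-group Q-Q plots or histograms would be a more direct check than these summary values.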

Using the code below we can run the ANOVA analysis on the data and create a summary table.

#code for anova analysis
fit <- aov(gss.coninc ~ gss.degree, data = subset)

#code for anova table
anova.table <- xtable(fit, caption = "Table 2")
print(anova.table, type = "html")
Table 2
               Df             Sum Sq           Mean Sq  F value  Pr(>F)
gss.degree      4  10812684151477.99  2703171037869.50  2496.17  0.0000
Residuals   50388  54566454176214.55     1082925581.02

Table 2 summarises the results of the ANOVA test. The F value is 2496.17 and the p-value is almost 0. We therefore reject the null hypothesis at the alpha = 0.05 level and conclude that at least one education level has a mean income that differs from the others.
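
The F value can be reproduced by hand from the sums of squares and degrees of freedom in Table 2, which is a useful check on how the ANOVA table fits together. The sketch below is plain arithmetic on the published values; the variable names are introduced here for illustration.

```r
# Sums of squares and degrees of freedom copied from Table 2
ss_group <- 10812684151477.99
df_group <- 4
ss_resid <- 54566454176214.55
df_resid <- 50388

# Mean squares are sums of squares divided by their degrees of freedom
ms_group <- ss_group / df_group
ms_resid <- ss_resid / df_resid

# F statistic: between-group variability relative to within-group variability
f_value <- ms_group / ms_resid
print(round(f_value, 2))

# Upper-tail probability of the F distribution with (4, 50388) degrees of freedom
p_value <- pf(f_value, df_group, df_resid, lower.tail = FALSE)
print(p_value)
```

The residual degrees of freedom also match Table 1: the five group sizes sum to 50,393 observations, and 50,393 - 5 = 50,388.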

Next we use pairwise t-tests with the Bonferroni correction to determine which education levels are significantly different from each other. Below is the code used for the tests and the code that generates the results in table 3.

#code for pairwise t-tests
ttests <- pairwise.t.test(subset$gss.coninc, subset$gss.degree,
                          p.adjust.method = "bonferroni")

#table for pairwise t-test results
ttest.table <- xtable(ttests$p.value, caption="Table 3")
print(ttest.table, type = "html")
Table 3
                Lt High School  High School  Junior College  Bachelor
High School               0.00
Junior College            0.00         0.00
Bachelor                  0.00         0.00            0.00
Graduate                  0.00         0.00            0.00      0.00

Table 3 above shows the results of the pairwise t-tests as a matrix of p-values for each pair of education levels. All p-values are near 0, so we conclude that every pair of education levels differs significantly in mean income. Because the reported p-values are already Bonferroni-adjusted (each raw p-value is multiplied by the 10 comparisons), they are compared directly with alpha = 0.05, which is equivalent to testing the unadjusted p-values at 0.05/10 = 0.005.
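
The Bonferroni arithmetic can be sketched in a few lines. The raw p-values below are hypothetical and chosen only for illustration; they are not results from the GSS. The sketch shows that comparing Bonferroni-adjusted p-values with 0.05 leads to the same decisions as comparing unadjusted p-values with 0.05/10 = 0.005.

```r
# Number of pairwise comparisons among 5 education levels
n_levels <- 5
n_tests <- choose(n_levels, 2)
print(n_tests)

# Hypothetical raw p-values, for illustration only (not from the GSS)
raw_p <- c(0.001, 0.004, 0.006, 0.03, 0.2)

# Bonferroni adjustment: multiply by the number of tests and cap at 1
adj_p <- p.adjust(raw_p, method = "bonferroni", n = n_tests)
print(adj_p)

# Two equivalent decision rules
reject_adjusted <- adj_p < 0.05
reject_raw <- raw_p < 0.05 / n_tests
print(identical(reject_adjusted, reject_raw))
```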

For this analysis only ANOVA and pairwise t-tests are applicable.

Conclusion:

From the information collected in the GSS and the results of the ANOVA and pairwise t-tests, we can say that there is an association between education level and income level. The results suggest that as education level increases, so does income. A question that arises from this study is how much income increases with each level of education, and whether that increase justifies the increased cost of the education.

References

Data Citation:

Smith, Tom W., Michael Hout, and Peter V. Marsden. General Social Survey, 1972-2012 [Cumulative File]. ICPSR34802-v1. Storrs, CT: Roper Center for Public Opinion Research, University of Connecticut /Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributors], 2013-09-11. doi:10.3886/ICPSR34802.v1

Website:

http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/34802/version/1

GSS dataset Codebook:

https://d396qusza40orc.cloudfront.net/statistics%2Fproject%2Fgss1.html

Appendix

First 45 rows of the GSS dataset columns 12-27 which include the variables DEGREE and CONINC:

#code to print first 45 rows of the GSS data set
#includes columns 12-27 to include variables used in analysis
gss.table <- xtable(gss[1:45,12:27])
print(gss.table, type = "html")
degree vetyears sei wrkstat wrkslf marital spwrksta sibs childs agekdbrn incom16 born parborn granborn income06 coninc
1 Bachelor Working Fulltime Someone Else Never Married 3 0 Average 25926
2 Lt High School Retired Someone Else Married Keeping House 4 5 Above Average 33333
3 High School Working Parttime Someone Else Married Working Fulltime 5 4 Average 33333
4 Bachelor Working Fulltime Someone Else Married Working Fulltime 5 0 Average 41667
5 High School Keeping House Someone Else Married Temp Not Working 2 2 Below Average 69444
6 High School Working Fulltime Someone Else Never Married 1 0 Average 60185
7 High School Working Fulltime Someone Else Divorced 7 2 Above Average 50926
8 Bachelor Working Fulltime Someone Else Never Married 1 0 Average 18519
9 High School Working Parttime Someone Else Never Married 2 2 Average 3704
10 High School Working Fulltime Someone Else Married Working Fulltime 7 4 Far Below Average 25926
11 High School Keeping House Someone Else Married Working Fulltime 7 1 Below Average 18519
12 Lt High School Working Fulltime Someone Else Married Keeping House 6 5 Average 18519
13 Lt High School Working Fulltime Someone Else Married Working Fulltime 2 1 Below Average 18519
14 Lt High School Working Fulltime Someone Else Divorced 2 2 Far Below Average 18519
15 Lt High School Working Fulltime Someone Else Married Working Fulltime 0 5 Average 25926
16 High School Working Fulltime Someone Else Married Working Parttime 7 2 Far Below Average 18519
17 High School School Someone Else Married Working Parttime 0 2 Average 33333
18 Lt High School Keeping House Someone Else Married Working Fulltime 2 3 Average 25926
19 Bachelor Working Fulltime Someone Else Married Working Fulltime 2 3 Above Average 60185
20 High School School Never Married 7 0 Above Average 69444
21 High School Working Fulltime Self-Employed Married Working Fulltime 2 2 Above Average 50926
22 High School Working Fulltime Someone Else Married Keeping House 1 2 Average 83333
23 High School Working Fulltime Someone Else Married Working Fulltime 1 0 Average 18519
24 High School Keeping House Someone Else Married Working Fulltime 7 1 Average 25926
25 Bachelor Keeping House Someone Else Married Retired 2 0 Average 41667
26 High School Working Fulltime Someone Else Married Temp Not Working 7 2 Average 41667
27 High School Working Fulltime Someone Else Married Working Fulltime 5 2 Average 41667
28 High School Working Fulltime Someone Else Married Working Fulltime 4 2 Average 41667
29 High School Working Fulltime Someone Else Married Working Fulltime 2 0 Below Average
30 Lt High School Working Fulltime Someone Else Married Working Parttime 4 2 Average 41667
31 Lt High School Working Fulltime Someone Else Married Keeping House 6 2 Average 33333
32 High School Working Parttime Someone Else Married Working Parttime 2 1 Average 33333
33 Bachelor Unempl, Laid Off Self-Employed Never Married 1 0 Average 41667
34 Lt High School Keeping House Someone Else Married Retired 7 5 Average 3704
35 High School Temp Not Working Someone Else Married Working Parttime 7 1 Average 18519
36 High School Working Fulltime Someone Else Married Keeping House 7 2 Average 41667
37 High School Keeping House Someone Else Married Working Fulltime 3 2 Below Average 69444
38 Lt High School Keeping House Married Working Fulltime 5 1 Average 41667
39 Lt High School Working Parttime Self-Employed Never Married 4 0 Average 25926
40 High School Working Fulltime Someone Else Married Unempl, Laid Off 5 2 Above Average 18519
41 Lt High School Keeping House Widowed 0 2 Average 3704
42 High School Working Fulltime Self-Employed Married Working Fulltime 2 2 Average 41667
43 Lt High School Working Fulltime Someone Else Married Keeping House 4 2 Average 18519
44 Lt High School Keeping House Someone Else Married Working Fulltime 7 0 Average 25926
45 Lt High School Keeping House Someone Else Married Working Fulltime 7 4 Average 101852