Income Predictor Application

Jerome Smith
27th February 2016

Developing Data Products Course Project
Johns Hopkins University
Coursera

The importance of education in overcoming poverty

  • Extensive research has identified education as a key determinant of income, a finding that guides public policies aimed at overcoming poverty.
  • Many econometric studies have used data from the United States National Longitudinal Survey of Youth 1979 (NLSY79).
  • The survey was conducted to high standards, so the NLSY79 data set is regarded as high quality.

Exploratory data analysis

The figures show scatterplots of income against years of schooling, possession of a professional degree, and work experience.

We observe a fairly clear positive correlation of income with both education and work experience.

Regression Coefficients and R Squared

The application fits a linear regression of income on years of schooling (S), a professional-degree indicator (EDUCPROF), and work experience (EXP), using the NLSY79 data set. It then uses the fitted model to make predictions.
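As an illustration of what fitting such a model involves, here is a minimal Python sketch of ordinary least squares via the normal equations. It is not the application's code: the function names `solve_linear_system` and `fit_ols` are hypothetical, and the six data rows are made-up toy values rather than NLSY79 records. The toy response is generated without noise from known coefficients, so the fit should recover them.

```python
# Minimal ordinary least squares via the normal equations (X'X) b = X'y.
# The rows below are made-up toy values, NOT NLSY79 records.


def solve_linear_system(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] so the inputs are not modified.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        # Pick the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for row in range(n):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [x - factor * p for x, p in zip(m[row], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Return [intercept, slopes...] minimising the sum of squared residuals."""
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Toy predictors: (years of schooling, professional degree 0/1, years of experience).
toy_rows = [(12, 0, 10), (16, 0, 5), (20, 1, 8), (10, 0, 20), (18, 1, 15), (14, 0, 12)]
# Toy response generated exactly from 1 + 2*S + 3*EDUCPROF + 0.5*EXP (no noise).
toy_income = [1 + 2 * s + 3 * p + 0.5 * e for s, p, e in toy_rows]

coefficients = fit_ols(toy_rows, toy_income)
print([round(c, 6) for c in coefficients])
```

With real survey data the response is noisy, so the estimates come with standard errors and p-values, which this sketch does not compute.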

The regression coefficients are the following:

(Intercept)           S    EDUCPROF         EXP 
-21.9146288   2.1988839  38.0030144   0.6480476 

The p-values of all four coefficients are far below conventional significance levels, so the coefficient estimates are statistically significant:

 (Intercept)            S     EDUCPROF          EXP 
4.956802e-07 1.034823e-20 9.020279e-16 8.588284e-07 
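The significance claim can be checked mechanically. The sketch below compares each reported p-value with a threshold; the 0.05 level and the helper name `significant_terms` are assumptions made for illustration, since the source does not state which level it has in mind.

```python
# p-values copied from the table above.
p_values = {
    "(Intercept)": 4.956802e-07,
    "S": 1.034823e-20,
    "EDUCPROF": 9.020279e-16,
    "EXP": 8.588284e-07,
}

# Assumed conventional significance level; the source does not name one.
ALPHA = 0.05


def significant_terms(p_values, alpha):
    """Return the names of the terms whose p-value is below alpha."""
    return [name for name, p in p_values.items() if p < alpha]


print(significant_terms(p_values, ALPHA))
```

All four terms pass this check, and they would still pass at a much stricter level such as 1e-06.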

However, R squared is low, at 0.29: the model explains only about 29% of the variance in income. This is because the model is very simple; in reality, income depends on many more variables than education and experience.

Application Interface

The application uses the model to calculate predicted income from the years of education, professional degree status, and work experience that the user enters.
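The point prediction is a weighted sum of the inputs. The following Python sketch reproduces that arithmetic from the coefficients reported earlier. It is not the application's code: the function name `predict_income` is hypothetical, and it assumes that EDUCPROF is coded as 1 for a professional degree and 0 otherwise. The source does not state the units of income, so none are attached to the result.

```python
# Coefficients copied from the regression output reported in this document.
COEFFICIENTS = {
    "(Intercept)": -21.9146288,
    "S": 2.1988839,
    "EDUCPROF": 38.0030144,
    "EXP": 0.6480476,
}


def predict_income(years_schooling, has_professional_degree, years_experience):
    """Return the fitted value: intercept + b_S*S + b_EDUCPROF*EDUCPROF + b_EXP*EXP."""
    # Assumption: EDUCPROF is a 0/1 indicator for holding a professional degree.
    educprof = 1 if has_professional_degree else 0
    return (
        COEFFICIENTS["(Intercept)"]
        + COEFFICIENTS["S"] * years_schooling
        + COEFFICIENTS["EDUCPROF"] * educprof
        + COEFFICIENTS["EXP"] * years_experience
    )


# Example input: 16 years of schooling, no professional degree, 10 years of experience.
print(round(predict_income(16, False, 10), 2))
```

For this example input the prediction is about 19.75. Adding a professional degree raises it by the EDUCPROF coefficient, to about 57.75.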

In addition, it shows the 80% confidence interval for the prediction.
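The source does not say how the interval is computed, nor whether it covers the mean income at the given inputs or the income of a single new individual. As a sketch under the standard linear-model assumptions, the two textbook forms are given below. The symbols are introduced here for illustration: x_0 is the input vector, X the design matrix, s the residual standard error, and n the number of observations, with 4 estimated coefficients. An 80% two-sided interval uses the 0.90 quantile of the t distribution.

```latex
% Point prediction for inputs x_0 = (1, S, EDUCPROF, EXP)^T
\hat{y}_0 = x_0^{\top} \hat{\beta}

% 80% confidence interval for the mean response at x_0
\hat{y}_0 \pm t_{0.90,\, n-4} \; s \, \sqrt{x_0^{\top} (X^{\top} X)^{-1} x_0}

% 80% prediction interval for a single new observation at x_0
\hat{y}_0 \pm t_{0.90,\, n-4} \; s \, \sqrt{1 + x_0^{\top} (X^{\top} X)^{-1} x_0}
```

The prediction interval is always the wider of the two, because it also accounts for the variation of an individual around the mean.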