Jerome Smith
27th February 2016
Developing Data Products Course Project
Johns Hopkins University
Coursera
The figures show scatterplots of income against years of schooling, professional-degree status, and work experience.
We observe a fairly clear positive correlation between income and both education and work experience.
The application fits a linear regression model of income on years of schooling (S), a professional-degree indicator (EDUCPROF), and work experience (EXP), using the NLSY97 data set, and uses the fitted model to make predictions.
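As a rough illustration of what "fitting" means here, the sketch below estimates ordinary least squares coefficients from the normal equations X'X b = X'y in plain Python. It is not the app's code: the function names `solve` and `fit_ols` are hypothetical, and the data are simulated (not the NLSY97 sample) with made-up "true" coefficients.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(rows, y):
    """OLS coefficients [intercept, b1, ..., bp] for the columns of rows, via X'X b = X'y."""
    X = [[1.0, *r] for r in rows]  # leading 1.0 is the intercept column
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Simulated data only (NOT the NLSY97 sample); the "true" coefficients are made up.
random.seed(1)
rows, y = [], []
for _ in range(500):
    s = random.randint(8, 20)                  # years of schooling
    prof = 1 if random.random() < 0.05 else 0  # professional-degree dummy
    exp = random.uniform(0, 20)                # years of work experience
    rows.append((s, prof, exp))
    y.append(-20 + 2 * s + 35 * prof + 0.6 * exp + random.gauss(0, 15))

print([round(b, 3) for b in fit_ols(rows, y)])
```

The normal equations keep the sketch short; R's `lm()` instead uses a QR decomposition, which is numerically more robust when predictors are nearly collinear.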
The regression coefficients are the following:
(Intercept)   -21.9146288
S               2.1988839
EDUCPROF       38.0030144
EXP             0.6480476
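Written out, the fitted model is (coefficients rounded to three decimals; "income" stands for the earnings variable used by the app, which is not named here):

```latex
\widehat{\text{income}} = -21.915 + 2.199\,S + 38.003\,\text{EDUCPROF} + 0.648\,\text{EXP}
```

Read literally, each extra year of schooling adds about 2.2 to predicted income and each extra year of experience about 0.65, holding the other variables fixed, while a professional degree shifts the prediction up by about 38.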
All four p-values are below 0.001, so every coefficient is statistically significant at conventional levels:
(Intercept)   4.956802e-07
S             1.034823e-20
EDUCPROF      9.020279e-16
EXP           8.588284e-07
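These are presumably the standard OLS t-test p-values (the Pr(>|t|) column of a regression summary), which test each coefficient against zero. The standard errors are not shown in this document. With n observations and k = 4 estimated coefficients:

```latex
t_j = \frac{\hat\beta_j}{\widehat{\operatorname{se}}(\hat\beta_j)}, \qquad
p_j = 2\,\Pr\!\left(T_{n-k} \ge |t_j|\right), \qquad k = 4
```

where T_{n-k} is a Student t random variable with n - k degrees of freedom.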
However, R squared is low (0.29): the model explains only about 29% of the variation in income. This is largely because the model is very simple; in reality, income depends on many more variables than education and experience.
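To make the 0.29 concrete, R squared compares the residual sum of squares with the total sum of squares around the mean. The helper name `r_squared` below is hypothetical; it is a minimal sketch of the standard definition, not the app's code.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - (residual sum of squares) / (total sum of squares)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)               # total variation around the mean
    return 1 - ss_res / ss_tot

print(r_squared([1, 2, 3], [1, 2, 4]))  # 1 - 1/2 → 0.5
```

An R squared of 0.29 therefore means roughly 71% of the variation in income remains in the residuals.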
Given the user's inputs for years of education, professional-degree status, and work experience, the application uses the model to calculate the predicted income.
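The point prediction is just the fitted equation evaluated at the user's inputs. A minimal sketch using the coefficients reported above; the function name `predict_income` is hypothetical, and the result is in whatever units the data set uses for income.

```python
# Coefficients as reported in the text (R output, 7 decimal places).
COEF = {"(Intercept)": -21.9146288, "S": 2.1988839, "EDUCPROF": 38.0030144, "EXP": 0.6480476}

def predict_income(s, educprof, exp):
    """Point prediction from the fitted linear model.

    s: years of schooling; educprof: 1 with a professional degree, else 0;
    exp: years of work experience.
    """
    if educprof not in (0, 1):
        raise ValueError("educprof must be 0 or 1")
    return (COEF["(Intercept)"] + COEF["S"] * s
            + COEF["EDUCPROF"] * educprof + COEF["EXP"] * exp)

# 16 years of schooling, no professional degree, 10 years of experience
print(round(predict_income(16, 0, 10), 2))  # → 19.75
```

Switching `educprof` from 0 to 1 raises the prediction by exactly the EDUCPROF coefficient, about 38.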
In addition, it shows the 80% confidence interval for the prediction.
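Two different 80% intervals can sit around a regression prediction: a confidence interval for the mean income of people with the given inputs, and a wider prediction interval for a single person's income. R's `predict()` for `lm` fits offers both through `interval = "confidence"` or `interval = "prediction"` together with `level = 0.8`; this document does not say which one the app uses. The sketch below assumes you already have the standard error of the fitted mean (`se_fit`) and the residual standard error (`sigma`); the helper name and the numeric inputs are hypothetical, and it uses a normal critical value as a large-sample stand-in for the t quantile.

```python
import math
from statistics import NormalDist

def intervals(y_hat, se_fit, sigma, level=0.80):
    """Return (confidence interval, prediction interval) around a point prediction.

    se_fit: standard error of the fitted mean at the input point.
    sigma:  residual standard error of the regression.
    Uses a normal critical value, approximating the t quantile for large residual df.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)          # about 1.2816 for a two-sided 80% interval
    half_ci = z * se_fit                               # uncertainty in the mean response only
    half_pi = z * math.sqrt(se_fit ** 2 + sigma ** 2)  # adds individual-level noise
    return (y_hat - half_ci, y_hat + half_ci), (y_hat - half_pi, y_hat + half_pi)

# Hypothetical inputs: se_fit and sigma are made up for illustration.
ci, pi = intervals(19.75, se_fit=1.5, sigma=18.0)
print("80% CI:", [round(v, 2) for v in ci])
print("80% PI:", [round(v, 2) for v in pi])
```

Because the prediction interval includes the residual variance, it is always at least as wide as the confidence interval; with an R squared of 0.29, the difference is likely to be large.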