Prestige Prediction App.

Noha Elprince
Nov 25, 2014

Introduction

The Prestige dataset used in my application comes from the R package ‘car’. It has 102 observations and 6 attributes; each observation is a Canadian occupation.

Of the 6 attributes, 4 were selected:

  • ‘type’: the type of occupation.
  • ‘education’: the average education of occupational incumbents.
  • ‘income’: the average income of incumbents.
  • ‘prestige’: the prestige score of the occupation.
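
As a rough sketch, the selection above might be done as follows. It assumes the Prestige data frame is available either from ‘carData’ (where current releases of ‘car’ keep their datasets) or from ‘car’ itself; the name `prestige_sel` is hypothetical and not taken from the app.

```r
# Locate the package that ships Prestige: carData in current releases,
# car itself in older ones (assumption: one of the two is installed).
pkg <- if (requireNamespace("carData", quietly = TRUE)) "carData" else "car"
data("Prestige", package = pkg)

# Keep only the four attributes used by the app (hypothetical object name).
prestige_sel <- Prestige[, c("type", "education", "income", "prestige")]

dim(Prestige)       # observations and attributes in the full dataset
str(prestige_sel)   # 'type' is a factor; the other three are numeric
```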

Quick Exploratory Analysis

[Figure: exploratory plot of the predictors against prestige]

  • The plot suggests an approximately linear relationship between the numeric predictors (‘education’, ‘income’) and the desired outcome (‘prestige’).
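
The code behind the exploratory plot is not shown in this deck. One way such a check could look, as a minimal sketch using base R graphics (the choice of `pairs()`, `boxplot()` and the object names are assumptions):

```r
# Load Prestige from carData, or from car in older installations (assumption).
pkg <- if (requireNamespace("carData", quietly = TRUE)) "carData" else "car"
data("Prestige", package = pkg)

# Scatterplot matrix of the numeric attributes against each other.
num_vars <- Prestige[, c("education", "income", "prestige")]
pairs(num_vars, pch = 19, cex = 0.6, main = "Prestige: numeric attributes")

# Prestige by occupation type; rows with a missing 'type' are dropped.
boxplot(prestige ~ type, data = Prestige, xlab = "type", ylab = "prestige")

# Pearson correlations as a numeric companion to the scatterplots.
cors <- cor(num_vars)
print(round(cors, 2))
```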

Objective and methodology

  • Predict ‘prestige’, the prestige score of an occupation, from the predictors ‘type’, ‘education’ and ‘income’.

  • Fit a linear regression model, because the predictors appear to be linearly related to the desired outcome.

  • Calculate the 95% confidence interval for each fitted value.
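
The three steps above can be sketched as follows. The model formula and the name `trainingdata` are the ones that appear in the Diagnostics output; how the training set was actually built is not stated, so the random 75/25 split, the seed, and the names `dat`, `testdata` and `ci` are assumptions made for illustration only.

```r
# Load Prestige from carData, or from car in older installations (assumption).
pkg <- if (requireNamespace("carData", quietly = TRUE)) "carData" else "car"
data("Prestige", package = pkg)

# Keep the four attributes and drop occupations with a missing 'type'.
dat <- na.omit(Prestige[, c("type", "education", "income", "prestige")])

# Assumption: a random 75/25 split; the split used by the app is unknown.
set.seed(1)
train_idx <- sample(nrow(dat), size = floor(0.75 * nrow(dat)))
trainingdata <- dat[train_idx, ]
testdata <- dat[-train_idx, ]

# Linear model with the same formula as in the Diagnostics output.
fit <- lm(prestige ~ as.numeric(education) + as.numeric(income) +
            as.factor(type), data = trainingdata)

# Fitted values with 95% confidence intervals for the held-out occupations.
ci <- predict(fit, newdata = testdata, interval = "confidence", level = 0.95)
head(ci)  # columns: fit, lwr, upr
```

Note that `interval = "confidence"` gives an interval for the mean prestige at the given predictor values; `interval = "prediction"` would give the wider interval for a single new occupation.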

Diagnostics

summary(fit)

Call:
lm(formula = prestige ~ as.numeric(education) + as.numeric(income) + 
    as.factor(type), data = trainingdata)

Residuals:
     Min       1Q   Median       3Q      Max 
-14.0956  -4.6313   0.1798   4.6625  17.9948 

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)    
(Intercept)            2.4040974  6.7085817   0.358 0.721199    
as.numeric(education)  3.2518354  0.8173297   3.979 0.000173 ***
as.numeric(income)     0.0010817  0.0002518   4.295 5.77e-05 ***
as.factor(type)prof    7.8350319  4.8904655   1.602 0.113837    
as.factor(type)wc     -0.0928479  3.3078956  -0.028 0.977691    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.55 on 67 degrees of freedom
Multiple R-squared:  0.8164,    Adjusted R-squared:  0.8055 
F-statistic:  74.5 on 4 and 67 DF,  p-value: < 2.2e-16
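
To make the coefficient table easier to read, the estimates above can be written out as a fitted equation. The coefficients are rounded from the output; the indicator terms equal 1 when the occupation has the named type and 0 otherwise, and the omitted level (‘bc’ in the Prestige data) is the reference category.

```latex
\widehat{\text{prestige}}
  = 2.404
  + 3.252 \cdot \text{education}
  + 0.0010817 \cdot \text{income}
  + 7.835 \cdot \mathbb{1}[\text{type} = \text{prof}]
  - 0.093 \cdot \mathbb{1}[\text{type} = \text{wc}]
```

For a hypothetical ‘prof’ occupation with education = 14 and income = 10000 (values chosen only for illustration), this gives 2.404 + 3.252 × 14 + 0.0010817 × 10000 + 7.835 ≈ 66.6. In the output, only the ‘education’ and ‘income’ coefficients are significant at the 0.05 level; the two ‘type’ contrasts are not.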

Residual Analysis

plot(fit, which = 1, pch = 19, cex = 0.5, col = "#00000010")

[Figure: residuals vs. fitted values for the fitted model]