1. Introduction

The Prestige.txt data set consists of 102 observations on 6 variables. The variables are described as follows:

education: The average number of years of education for occupational incumbents.
income: The average income of occupational incumbents, in dollars.
women: The percentage of women in the occupation.
prestige: The average prestige rating for the occupation.
census: The code of the occupation used in the survey.
type: The type of occupation: professional and managerial (prof), white collar (wc), blue collar (bc), or missing (NA) (Fox and Weisberg 2011).

To find out how the prestige rating is related to income, education, and the percentage of women, a multiple linear regression is performed.

library(car)
library(stargazer)

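# Read the data set; the first row holds the variable names
Prestige <- read.table("http://socserv.socsci.mcmaster.ca/jfox/books/Companion/data/Prestige.txt", header = TRUE)
# Pairwise scatter plots of the response and the three predictors
scatterplotMatrix(~ prestige + income + education + women, span = 0.7, data = Prestige)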

The first step is to examine the scatter plots among the variables. In the scatter plot matrix above, the plot of the dependent variable prestige against the predictor income shows a nonlinear pattern. Instead of using income directly, the base-2 logarithm of income is used to straighten this curvature. As a result, the scatter plot of prestige against log2(income) below shows no curvature, which makes it a good starting point for linear regression modeling (Fox and Weisberg 2011).

scatterplotMatrix(~ prestige + log2(income) + education + women, span = 0.7, data = Prestige)

2. Regression Output and Interpretation

prestige.mod1 <- lm(prestige ~ education + log2(income) + women, data = Prestige)

summary(prestige.mod1)
## 
## Call:
## lm(formula = prestige ~ education + log2(income) + women, data = Prestige)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -17.364  -4.429  -0.101   4.316  19.179 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -110.9658    14.8429  -7.476 3.27e-11 ***
## education       3.7305     0.3544  10.527  < 2e-16 ***
## log2(income)    9.3147     1.3265   7.022 2.90e-10 ***
## women           0.0469     0.0299   1.568     0.12    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.093 on 98 degrees of freedom
## Multiple R-squared:  0.8351, Adjusted R-squared:   0.83 
## F-statistic: 165.4 on 3 and 98 DF,  p-value: < 2.2e-16

From the above output, b1 = 3.7305 implies that the prestige rating is expected to increase by 3.7305 units for each additional year of education, holding the other predictors constant. The hypotheses for testing whether the prestige rating is linearly related to the education level, with the other predictors held constant, are

Ho : b1 is equal to 0 (no linear relationship)
Ha : b1 is not equal to 0 (significant linear relationship)

The test statistic is t = 10.527 and its p-value is less than 2*10^(-16). This means that, if b1 = 0 were true, a test statistic at least as extreme as 10.527 would be extremely unlikely to occur by chance. We therefore reject the null hypothesis b1 = 0; the data provide evidence of a positive linear relationship between education level and prestige rating.
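The t value and p-value in the coefficient table can be reproduced by hand. The following is a minimal sketch that uses only the rounded estimate and standard error printed above, so the result agrees with the reported t value only up to rounding; the variable names are illustrative.

```r
# Values copied from the education row of summary(prestige.mod1).
estimate  <- 3.7305
std_error <- 0.3544
df_resid  <- 98   # residual degrees of freedom reported in the output

# t statistic for H0: b1 = 0
t_value <- estimate / std_error

# Two-sided p-value from the t distribution with 98 degrees of freedom
p_value <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)

round(t_value, 2)   # about 10.53, in line with the reported 10.527
p_value             # far below any conventional significance level
```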

Similarly, a 1-unit increase in log2(income), which corresponds to doubling income, is expected to increase the prestige rating by 9.3147 units, holding the other predictors constant.
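To see why one unit of log2(income) corresponds to a doubling of income, the short sketch below uses a few hypothetical income values (they are not taken from the data set) and then applies the fitted slope to a smaller, 10% change in income.

```r
# Hypothetical incomes in dollars, chosen only for illustration.
income <- c(2000, 5000, 10000)

# Doubling income raises log2(income) by one unit
# (each difference is 1, up to floating-point rounding).
doubling_step <- log2(2 * income) - log2(income)
doubling_step

# Slope of log2(income) copied from summary(prestige.mod1).
b2 <- 9.3147

# Expected change in prestige for a 10% rise in income,
# holding the other predictors constant.
effect_10pct <- b2 * log2(1.10)
round(effect_10pct, 2)   # about 1.28
```

The same slope therefore describes any proportional change in income, not only a doubling.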

The hypotheses for testing whether the prestige rating is linearly related to log2(income), with the other predictors held constant, are

Ho : b2 is equal to 0 (no linear relationship)
Ha : b2 is not equal to 0 (significant linear relationship)

The test statistic is t = 7.022 and its p-value is 2.90*10^(-10). This means that, if b2 = 0 were true, a test statistic at least as extreme as 7.022 would be extremely unlikely to occur by chance. We therefore reject the null hypothesis b2 = 0; the data provide evidence of a positive linear relationship between log2(income) and prestige rating.

However, women has a coefficient of b3 = 0.0469 with a t value of 1.568. The p-value for this t statistic is 0.12, which is greater than the alpha = 0.05 level. This implies that there is no statistically significant linear relationship between the percentage of women in the occupation and the prestige rating, with all other regressors held constant. Therefore, a new regression model without women can also be considered for the analysis.
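An equivalent way to read this result is through a 95% confidence interval for b3. The sketch below builds the interval from the rounded estimate and standard error in the output above, so its endpoints are approximate; with the fitted model available, confint(prestige.mod1) would give the intervals directly.

```r
# Values copied from the women row of summary(prestige.mod1).
estimate  <- 0.0469
std_error <- 0.0299
df_resid  <- 98

# Critical value of the t distribution for a two-sided 95% interval
t_crit <- qt(0.975, df = df_resid)

# 95% confidence interval for b3
ci <- estimate + c(-1, 1) * t_crit * std_error
round(ci, 3)   # roughly -0.012 to 0.106
```

Because the interval contains 0, it leads to the same conclusion as the t test at the 5% level.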

The multiple R-squared value is 0.835, which implies that approximately 83.5% of the variability of the dependent variable is explained by the fitted regression model. In other words, the weighted combination of the 3 predictor variables explains approximately 83.5% of the variance of the dependent variable.
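The adjusted R-squared in the output penalizes R-squared for the number of predictors. As a small check, the sketch below recomputes it from the reported R-squared using the standard formula 1 - (1 - R^2)(n - 1)/(n - p - 1); the variable names are illustrative.

```r
# Quantities taken from the output of summary(prestige.mod1).
r_squared <- 0.8351
n <- 102   # number of observations
p <- 3     # number of predictors (education, log2(income), women)

# Adjusted R-squared
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)
round(adj_r_squared, 3)   # 0.83, matching the reported value
```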

The F-statistic is 165.4 for the null hypothesis Ho: b1 = b2 = b3 = 0 (no linear relationship between the predictors and the response variable) against the alternative hypothesis Ha: at least one coefficient is not equal to 0 (at least one predictor has a significant linear relationship with the response variable), and its p-value is less than 2.2*10^(-16). This implies that we are able to reject the null hypothesis at the 1% level of significance. In other words, at least one of the coefficients of these variables is significantly different from 0.
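The overall F-statistic is a function of R-squared and the degrees of freedom. The sketch below recomputes it from the rounded R-squared in the output, so it matches the reported value only up to rounding.

```r
# Quantities taken from the output of summary(prestige.mod1).
r_squared <- 0.8351
df_model  <- 3    # numerator degrees of freedom (number of predictors)
df_resid  <- 98   # denominator (residual) degrees of freedom

# F statistic for H0: b1 = b2 = b3 = 0
f_value <- (r_squared / df_model) / ((1 - r_squared) / df_resid)

# Upper-tail p-value from the F distribution
p_value <- pf(f_value, df_model, df_resid, lower.tail = FALSE)

round(f_value, 1)   # 165.4
p_value             # far below 0.01
```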

Next, a second regression model without the women variable is fitted. Only 2 predictors, education and log2(income), are used in this regression.
The table below compares the 2 regression models using the stargazer package (Hlavac 2014).
As expected, there is no substantial difference between model 1 with the women variable and model 2 without it, although the intercept and the slopes of education and log2(income) change slightly from model 1 to model 2.
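The comparison of the two nested models can also be expressed as a partial F test. The sketch below is an approximation built only from the rounded residual standard errors and degrees of freedom reported in the two model summaries (7.093 on 98 degrees of freedom with women, 7.145 on 99 without), so the result is accurate to about two digits; with the fitted models available, anova(prestige.mod2, prestige.mod1) would perform the same test.

```r
# Residual standard errors and degrees of freedom from the two summaries.
rse_full    <- 7.093; df_full    <- 98   # model with women
rse_reduced <- 7.145; df_reduced <- 99   # model without women

# Residual sums of squares recovered from RSE^2 * df
rss_full    <- rse_full^2 * df_full
rss_reduced <- rse_reduced^2 * df_reduced

# Partial F statistic for dropping the single predictor women
f_partial <- ((rss_reduced - rss_full) / (df_reduced - df_full)) /
  (rss_full / df_full)
p_partial <- pf(f_partial, df_reduced - df_full, df_full, lower.tail = FALSE)

round(f_partial, 2)   # about 2.46, close to the squared t value 1.568^2
round(p_partial, 2)   # about 0.12, as in the t test for women
```

For a single dropped predictor, the partial F statistic equals the square of that predictor's t statistic, which is why both tests give the same p-value.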

In conclusion, the multiple regression reveals that the prestige rating is expected to increase as the levels of education and income increase. In other words, the prestige rating has a positive linear relationship with the predictors education and log2(income).

prestige.mod2 <- lm(prestige ~ education + log2(income), data = Prestige)

summary(prestige.mod2)
## 
## Call:
## lm(formula = prestige ~ education + log2(income), data = Prestige)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -17.0346  -4.5657  -0.1857   4.0577  18.1270 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -95.1940    10.9979  -8.656 9.27e-14 ***
## education      4.0020     0.3115  12.846  < 2e-16 ***
## log2(income)   7.9278     0.9961   7.959 2.94e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.145 on 99 degrees of freedom
## Multiple R-squared:  0.831,  Adjusted R-squared:  0.8275 
## F-statistic: 243.3 on 2 and 99 DF,  p-value: < 2.2e-16
stargazer(prestige.mod1, prestige.mod2, title = "Comparison of 2 Regression outputs", type = "text", align = TRUE)
## 
## Comparison of 2 Regression outputs
## ===================================================================
##                                   Dependent variable:              
##                     -----------------------------------------------
##                                        prestige                    
##                               (1)                     (2)          
## -------------------------------------------------------------------
## education                  3.731***                4.002***        
##                             (0.354)                 (0.312)        
##                                                                    
## log2(income)               9.315***                7.928***        
##                             (1.327)                 (0.996)        
##                                                                    
## women                        0.047                                 
##                             (0.030)                                
##                                                                    
## Constant                  -110.966***             -95.194***       
##                            (14.843)                (10.998)        
##                                                                    
## -------------------------------------------------------------------
## Observations                  102                     102          
## R2                           0.835                   0.831         
## Adjusted R2                  0.830                   0.828         
## Residual Std. Error     7.093 (df = 98)         7.145 (df = 99)    
## F Statistic         165.428*** (df = 3; 98) 243.323*** (df = 2; 99)
## ===================================================================
## Note:                                   *p<0.1; **p<0.05; ***p<0.01

References

Fox, John, and Harvey Sanford Weisberg. 2011. An R Companion to Applied Regression. 2nd ed. Thousand Oaks, CA: Sage Publications.

Hlavac, Marek. 2014. stargazer: LaTeX Code and ASCII Text for Well-Formatted Regression and Summary Statistics Tables. http://CRAN.R-project.org/package=stargazer.