Introduction:

Multiple linear Regression is a useful approach for predicting a response based on more than one predictor variables. Here, we will learn how to build a mulitple linear regression model and how to visualise the results.

Occupational prestige is a way for sociologists to describe the relative social class positions people have. It refers to the consensual nature of rating a job based on the belief of its worthiness. But what really determines the prestige score? Education? Income? Let’s find out.

Dataset

The Prestige of Canadian Occupations data “prestige” is obtained from the “car” package.

Definition for these variables:

education= Average education (years) of occupational incumbents, in 1971. income = Average income (dollars) of incumbents, 1971. women= Percentage of incumbents who are women, 1971. prestige = Pineo-Porter prestige score for occupation, from a social survey conducted in the mid-1960s. census= Canadian Census occupational code. type= Type of occupation. A factor with levels (note: out of order): bc, Blue Collar; prof, Professional, Managerial, and Technical; wc, White Collar.

Loading the data

First you will need to install the “car” package

To install the package

install.package(“car”)

library(car)

## Warning: package 'car' was built under R version 3.4.4

## Loading required package: carData

## Warning: package 'carData' was built under R version 3.4.4

#load the data "Prestige" as p
p<-data.frame(Prestige)


# Explore the data
# look at the structure of the dataset
str(p)

## 'data.frame':    102 obs. of  6 variables:
##  $ education: num  13.1 12.3 12.8 11.4 14.6 ...
##  $ income   : int  12351 25879 9271 8865 8403 11030 8258 14163 11377 11023 ...
##  $ women    : num  11.16 4.02 15.7 9.11 11.68 ...
##  $ prestige : num  68.8 69.1 63.4 56.8 73.5 77.6 72.6 78.1 73.1 68.8 ...
##  $ census   : int  1113 1130 1171 1175 2111 2113 2133 2141 2143 2153 ...
##  $ type     : Factor w/ 3 levels "bc","prof","wc": 2 2 2 2 2 2 2 2 2 2 ...

By exploring the data, we can see there are 102 observations and 6 variables. The Variables are’education, income, women, prestige, census and type.

‘data.frame’: 102 obs. of 6 variables: $ education: num 13.1 12.3 12.8 11.4 14.6 … $ income : int 12351 25879 9271 8865 8403 11030 8258 14163 11377 11023 … $ women : num 11.16 4.02 15.7 9.11 11.68 … $ prestige : num 68.8 69.1 63.4 56.8 73.5 77.6 72.6 78.1 73.1 68.8 … $ census : int 1113 1130 1171 1175 2111 2113 2133 2141 2143 2153 … $ type : Factor w/ 3 levels “bc”,“prof”,“wc”: 2 2 2 2 2 2 2 2 2 2 …

summary(p)

##    education          income          women           prestige    
##  Min.   : 6.380   Min.   :  611   Min.   : 0.000   Min.   :14.80  
##  1st Qu.: 8.445   1st Qu.: 4106   1st Qu.: 3.592   1st Qu.:35.23  
##  Median :10.540   Median : 5930   Median :13.600   Median :43.60  
##  Mean   :10.738   Mean   : 6798   Mean   :28.979   Mean   :46.83  
##  3rd Qu.:12.648   3rd Qu.: 8187   3rd Qu.:52.203   3rd Qu.:59.27  
##  Max.   :15.970   Max.   :25879   Max.   :97.510   Max.   :87.20  
##      census       type   
##  Min.   :1113   bc  :44  
##  1st Qu.:3120   prof:31  
##  Median :5135   wc  :23  
##  Mean   :5402   NA's: 4  
##  3rd Qu.:8312            
##  Max.   :9517

summary() function provides a summary on each variable. Here, it will give us some insgihts of how dirty the data is and we will have clean the data if it is dirty. For example if the maximun value in education is 200 or 200 years old or having a negative value of -100, this may suggest that there maybe some invalid values in education as it doesnt make any sense for someone to study for 200 years or negative 100 years. In this case, the data seemes to look normal.

In this mulitple linear regression example, we will examine which of the 6 variables are good predictors of predicting the prestige score.

Before we do that, lets examine their correlation, by using the cor() function

#check correlation between education and income

cor(p$education,p$prestige)

## [1] 0.8501769

cor(p$income, p$prestige)

## [1] 0.7149057

cor(p$women, p$prestige)

## [1] -0.1183342

cor(p$census, p$prestige)

## [1] -0.6345103

Based on the above results, we can see that both income and education have a high postive correlation with the prestige score. The census is only a Canadian Census occupational code in numeric form, it is funny to see that there is a strong negative correlation with the prestiage code which i would assume there should be none. “Women” represents the Percentage of incumbents who are women, so this tells there is little relationship between Percentage of incumbents who are women and prestige score. As the values for “types” are not in numeric form, we cannot calculate the correlation.

Lets visualise these relationships. We can do this by using the ggplot in tidyverse package

#Create Scatter plot
library(tidyverse)

## Warning: package 'tidyverse' was built under R version 3.4.4

## -- Attaching packages ---------------------------------------------------- tidyverse 1.2.1 --

## v ggplot2 2.2.1     v purrr   0.2.4
## v tibble  1.4.2     v dplyr   0.7.4
## v tidyr   0.8.0     v stringr 1.3.0
## v readr   1.1.1     v forcats 0.3.0

## Warning: package 'ggplot2' was built under R version 3.4.4

## Warning: package 'tibble' was built under R version 3.4.4

## Warning: package 'tidyr' was built under R version 3.4.4

## Warning: package 'readr' was built under R version 3.4.4

## Warning: package 'purrr' was built under R version 3.4.4

## Warning: package 'stringr' was built under R version 3.4.4

## Warning: package 'forcats' was built under R version 3.4.4

## -- Conflicts ------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## x dplyr::recode() masks car::recode()
## x purrr::some()   masks car::some()

ggplot(data=p) + geom_point(mapping = aes(x=education, y=prestige))

ggplot(data=p) + geom_point(mapping = aes(x=income, y=prestige))

ggplot(data=p) + geom_point(mapping = aes(x=women, y=prestige))

ggplot(data=p) + geom_point(mapping = aes(x=type, y=prestige))

Seems like the scatter plots also reflect the correlation values that we calculated earlier. Even we cannot calculate the correlation between “type” and prestige, we can see from the scattor plot that professtionals and white collar have a higher prestige score than blue collar, which kind of make sense.

Let’s run a multiple linear regression model

First: You will need to install the ISLR package.

install.package(“ISLR”)

Then will need to load the package

Lets build a multiple linear regression model, by setting “prestige” as a dependent variable and other variables as independent variables. We shouldnt be including the “census” in the model given that it is only a numeric occupational code, but let see if the model can identify that it is not a good predictor of prestige.

library(ISLR)

## Warning: package 'ISLR' was built under R version 3.4.4

#Run linear regression
p_lm=lm(formula = prestige ~., data=p)
summary(p_lm)

## 
## Call:
## lm(formula = prestige ~ ., data = p)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.9863  -4.9813   0.6983   4.8690  19.2402 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.213e+01  8.018e+00  -1.513  0.13380    
## education    3.933e+00  6.535e-01   6.019 3.64e-08 ***
## income       9.946e-04  2.601e-04   3.824  0.00024 ***
## women        1.310e-02  3.018e-02   0.434  0.66524    
## census       1.156e-03  6.183e-04   1.870  0.06471 .  
## typeprof     1.077e+01  4.676e+00   2.303  0.02354 *  
## typewc       2.877e-01  3.139e+00   0.092  0.92718    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.037 on 91 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.841,  Adjusted R-squared:  0.8306 
## F-statistic: 80.25 on 6 and 91 DF,  p-value: < 2.2e-16

The variables for “education”, “income” and “typerof” have a p -value of less than 0.05 which suggests that they are statistically significant in predicting the value “prestige”. The Adjusted R-squared is a measure of fit, a higher value close to 1 means better fit. In this case, the adjusted R square is very high at 0.8306. Let’s see if we can improve our model by removing the statiscally insignificant variables with p value over 0.05.

library(ISLR)


#Run linear regression
p_lm2=lm(formula = prestige ~ education + income + type, data=p)
summary(p_lm2)

## 
## Call:
## lm(formula = prestige ~ education + income + type, data = p)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.9529  -4.4486   0.1678   5.0566  18.6320 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.6229292  5.2275255  -0.119    0.905    
## education    3.6731661  0.6405016   5.735 1.21e-07 ***
## income       0.0010132  0.0002209   4.586 1.40e-05 ***
## typeprof     6.0389707  3.8668551   1.562    0.122    
## typewc      -2.7372307  2.5139324  -1.089    0.279    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.095 on 93 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.8349, Adjusted R-squared:  0.8278 
## F-statistic: 117.5 on 4 and 93 DF,  p-value: < 2.2e-16

The Adjusted R-squared is lowered and we can also see that the variables for “type” is statiscally insignificant with p-value over 0.05. Let’s remove “type” and see if we can improve this model

p_lm3=lm(formula = prestige ~ education + income, data=p)
summary(p_lm3)

## 
## Call:
## lm(formula = prestige ~ education + income, data = p)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -19.4040  -5.3308   0.0154   4.9803  17.6889 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -6.8477787  3.2189771  -2.127   0.0359 *  
## education    4.1374444  0.3489120  11.858  < 2e-16 ***
## income       0.0013612  0.0002242   6.071 2.36e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.81 on 99 degrees of freedom
## Multiple R-squared:  0.798,  Adjusted R-squared:  0.7939 
## F-statistic: 195.6 on 2 and 99 DF,  p-value: < 2.2e-16

##calcuate Mean squared error 

#First model
mean(p_lm$residuals^2)

## [1] 45.97815

#Second Model
mean(p_lm2$residuals^2)

## [1] 47.76812

#Third Model
mean(p_lm3$residuals^2)

## [1] 59.20442

mean squared error (MSE) measures the average of the squares of the errors—that is, the average squared difference between the estimated values and what is estimated. It is always non-negative, and values closer to zero are better.

We can also plot some graphs for all the models. This can be done by using the plot() function

#plot linear regression for regression 1
layout(matrix(1:4, 2, 2))
plot(p_lm)

#plot linear regression for regression 2
layout(matrix(1:4, 2, 2))
plot(p_lm2)

#plot linear regression for regression 3
layout(matrix(1:4, 2, 2))
plot(p_lm3)

Based on the above graphs, we can see that the dependent variable “prestige” is normally distributed.

Regression Outlier and Influence Plot

To further check on outliers and influence point, we can also plot the influence plot graphs.

This function creates a “bubble” plot of Studentized residuals versus hat values, with the areas of the circles representing the observations proportional to the value Cook’s distance. Vertical reference lines are drawn at twice and three times the average hat value, horizontal reference lines at -2, 0, and 2 on the Studentized-residual scale.

#check influence for regression 1
influencePlot(p_lm, id.n=3)

## Warning in plot.window(...): "id.n" is not a graphical parameter

## Warning in plot.xy(xy, type, ...): "id.n" is not a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "id.n" is not
## a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "id.n" is not
## a graphical parameter

## Warning in box(...): "id.n" is not a graphical parameter

## Warning in title(...): "id.n" is not a graphical parameter

## Warning in plot.xy(xy.coords(x, y), type = type, ...): "id.n" is not a
## graphical parameter

##                        StudRes        Hat      CookD
## general.managers    -0.9199659 0.43760411 0.09423639
## medical.technicians  2.9700914 0.07964707 0.10042624
## electronic.workers   2.0655585 0.07809617 0.04984309
## pilots              -0.9288104 0.34407247 0.06474496

#check influence for regression 2
influencePlot(p_lm2, id.n=3)

## Warning in plot.window(...): "id.n" is not a graphical parameter

## Warning in plot.xy(xy, type, ...): "id.n" is not a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "id.n" is not
## a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "id.n" is not
## a graphical parameter

## Warning in box(...): "id.n" is not a graphical parameter

## Warning in title(...): "id.n" is not a graphical parameter

## Warning in plot.xy(xy.coords(x, y), type = type, ...): "id.n" is not a
## graphical parameter

##                        StudRes        Hat       CookD
## general.managers    -1.3134574 0.33504477 0.172503975
## physicians          -0.3953204 0.22420309 0.009115491
## medical.technicians  2.8210910 0.06858836 0.109052582
## electronic.workers   2.2251940 0.02701237 0.026372394

#check influence for regression 3
influencePlot(p_lm3, id.n=3)

## Warning in plot.window(...): "id.n" is not a graphical parameter

## Warning in plot.xy(xy, type, ...): "id.n" is not a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "id.n" is not
## a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "id.n" is not
## a graphical parameter

## Warning in box(...): "id.n" is not a graphical parameter

## Warning in title(...): "id.n" is not a graphical parameter

## Warning in plot.xy(xy.coords(x, y), type = type, ...): "id.n" is not a
## graphical parameter

##                     StudRes        Hat      CookD
## general.managers -1.5101887 0.27146172 0.27965035
## physicians       -0.9223321 0.20339599 0.07251181
## newsboys         -2.5960873 0.03105248 0.06805145
## farmers           2.3517921 0.03011295 0.05473610

## to check which one has high residuals for regression 1
qqPlot(p_lm, id.n=3)

## medical.technicians  electronic.workers 
##                  31                  82

## to check which one has high residuals for regression 2
qqPlot(p_lm2, id.n=3)

## medical.technicians  electronic.workers 
##                  31                  82

## to check which one has high residuals for regression 3
qqPlot(p_lm3, id.n=3)

## newsboys  farmers 
##       53       67

# id.n – id observations with high residual

To Create Diagnostic Plots For all 3 regression models

Cook’s distance measures how much an observation influences the overall model or predicted values

Studentizided residualis the quotient resulting from the division of a residual by an estimate of its standard deviation.

Bonferroni test is a type of multiple comparison test used in statistical analysis. When an experimenter performs enough hypothesis tests, he or she will eventually end up with a result that shows statistical significance of the dependent variable, even if there is none.

Hat-points identify influential observations (have a high impact on the predictor variables)

#Regression model 1
influenceIndexPlot(p_lm, id.n = 3)

## Warning in plot.window(...): "id.n" is not a graphical parameter

## Warning in plot.xy(xy, type, ...): "id.n" is not a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "id.n" is not
## a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "id.n" is not
## a graphical parameter

## Warning in box(...): "id.n" is not a graphical parameter

## Warning in title(...): "id.n" is not a graphical parameter

## Warning in plot.xy(xy.coords(x, y), type = type, ...): "id.n" is not a
## graphical parameter

## Warning in plot.xy(xy.coords(x, y), type = type, ...): "id.n" is not a
## graphical parameter

## Warning in plot.window(...): "id.n" is not a graphical parameter

## Warning in plot.xy(xy, type, ...): "id.n" is not a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "id.n" is not
## a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "id.n" is not
## a graphical parameter

## Warning in box(...): "id.n" is not a graphical parameter

## Warning in title(...): "id.n" is not a graphical parameter

## Warning in plot.xy(xy.coords(x, y), type = type, ...): "id.n" is not a
## graphical parameter

## Warning in plot.xy(xy.coords(x, y), type = type, ...): "id.n" is not a
## graphical parameter

## Warning in plot.window(...): "id.n" is not a graphical parameter

## Warning in plot.xy(xy, type, ...): "id.n" is not a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "id.n" is not
## a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "id.n" is not
## a graphical parameter

## Warning in box(...): "id.n" is not a graphical parameter

## Warning in title(...): "id.n" is not a graphical parameter

## Warning in plot.xy(xy.coords(x, y), type = type, ...): "id.n" is not a
## graphical parameter

## Warning in plot.xy(xy.coords(x, y), type = type, ...): "id.n" is not a
## graphical parameter

## Warning in plot.window(...): "id.n" is not a graphical parameter

## Warning in plot.xy(xy, type, ...): "id.n" is not a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "id.n" is not
## a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "id.n" is not
## a graphical parameter

## Warning in box(...): "id.n" is not a graphical parameter

## Warning in title(...): "id.n" is not a graphical parameter

## Warning in plot.xy(xy.coords(x, y), type = type, ...): "id.n" is not a
## graphical parameter

## Warning in plot.xy(xy.coords(x, y), type = type, ...): "id.n" is not a
## graphical parameter

#Regression model 2
influenceIndexPlot(p_lm2, id.n = 3)

## Warning in plot.window(...): "id.n" is not a graphical parameter

## Warning in plot.xy(xy, type, ...): "id.n" is not a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "id.n" is not
## a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "id.n" is not
## a graphical parameter

## Warning in box(...): "id.n" is not a graphical parameter

## Warning in title(...): "id.n" is not a graphical parameter

## Warning in plot.xy(xy.coords(x, y), type = type, ...): "id.n" is not a
## graphical parameter

## Warning in plot.xy(xy.coords(x, y), type = type, ...): "id.n" is not a
## graphical parameter

## Warning in plot.window(...): "id.n" is not a graphical parameter

## Warning in plot.xy(xy, type, ...): "id.n" is not a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "id.n" is not
## a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "id.n" is not
## a graphical parameter

## Warning in box(...): "id.n" is not a graphical parameter

## Warning in title(...): "id.n" is not a graphical parameter

## Warning in plot.xy(xy.coords(x, y), type = type, ...): "id.n" is not a
## graphical parameter

## Warning in plot.xy(xy.coords(x, y), type = type, ...): "id.n" is not a
## graphical parameter

## Warning in plot.window(...): "id.n" is not a graphical parameter

## Warning in plot.xy(xy, type, ...): "id.n" is not a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "id.n" is not
## a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "id.n" is not
## a graphical parameter

## Warning in box(...): "id.n" is not a graphical parameter

## Warning in title(...): "id.n" is not a graphical parameter

## Warning in plot.xy(xy.coords(x, y), type = type, ...): "id.n" is not a
## graphical parameter

## Warning in plot.xy(xy.coords(x, y), type = type, ...): "id.n" is not a
## graphical parameter

## Warning in plot.window(...): "id.n" is not a graphical parameter

## Warning in plot.xy(xy, type, ...): "id.n" is not a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "id.n" is not
## a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "id.n" is not
## a graphical parameter

## Warning in box(...): "id.n" is not a graphical parameter

## Warning in title(...): "id.n" is not a graphical parameter

## Warning in plot.xy(xy.coords(x, y), type = type, ...): "id.n" is not a
## graphical parameter

## Warning in plot.xy(xy.coords(x, y), type = type, ...): "id.n" is not a
## graphical parameter

#Regression model 3
influenceIndexPlot(p_lm3, id.n = 3)

## Warning in plot.window(...): "id.n" is not a graphical parameter

## Warning in plot.xy(xy, type, ...): "id.n" is not a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "id.n" is not
## a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "id.n" is not
## a graphical parameter

## Warning in box(...): "id.n" is not a graphical parameter

## Warning in title(...): "id.n" is not a graphical parameter

## Warning in plot.xy(xy.coords(x, y), type = type, ...): "id.n" is not a
## graphical parameter

## Warning in plot.xy(xy.coords(x, y), type = type, ...): "id.n" is not a
## graphical parameter

## Warning in plot.window(...): "id.n" is not a graphical parameter

## Warning in plot.xy(xy, type, ...): "id.n" is not a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "id.n" is not
## a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "id.n" is not
## a graphical parameter

## Warning in box(...): "id.n" is not a graphical parameter

## Warning in title(...): "id.n" is not a graphical parameter

## Warning in plot.xy(xy.coords(x, y), type = type, ...): "id.n" is not a
## graphical parameter

## Warning in plot.xy(xy.coords(x, y), type = type, ...): "id.n" is not a
## graphical parameter

## Warning in plot.window(...): "id.n" is not a graphical parameter

## Warning in plot.xy(xy, type, ...): "id.n" is not a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "id.n" is not
## a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "id.n" is not
## a graphical parameter

## Warning in box(...): "id.n" is not a graphical parameter

## Warning in title(...): "id.n" is not a graphical parameter

## Warning in plot.xy(xy.coords(x, y), type = type, ...): "id.n" is not a
## graphical parameter

## Warning in plot.window(...): "id.n" is not a graphical parameter

## Warning in plot.xy(xy, type, ...): "id.n" is not a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "id.n" is not
## a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "id.n" is not
## a graphical parameter

## Warning in box(...): "id.n" is not a graphical parameter

## Warning in title(...): "id.n" is not a graphical parameter

## Warning in plot.xy(xy.coords(x, y), type = type, ...): "id.n" is not a
## graphical parameter

## Warning in plot.xy(xy.coords(x, y), type = type, ...): "id.n" is not a
## graphical parameter

Outliers – Bonferonni test

Null for the Bonferonni adjusted outlier test is the observation is an outlier.

For Regression 1, observation related to ‘medical.technicians’ is an outlier. For Regression 2, observation related to ‘medical.technicians’ is an outlier. For Regression 3, observation related to ‘newsboys’ is an outlier.

outlierTest(p_lm)

## No Studentized residuals with Bonferonni p < 0.05
## Largest |rstudent|:
##                     rstudent unadjusted p-value Bonferonni p
## medical.technicians 2.970091          0.0038164        0.374

outlierTest(p_lm2)

## No Studentized residuals with Bonferonni p < 0.05
## Largest |rstudent|:
##                     rstudent unadjusted p-value Bonferonni p
## medical.technicians 2.821091          0.0058632      0.57459

outlierTest(p_lm3)

## No Studentized residuals with Bonferonni p < 0.05
## Largest |rstudent|:
##           rstudent unadjusted p-value Bonferonni p
## newsboys -2.596087           0.010879           NA

Conclusion

Based on the results above, a person with a higher “income” and “education” will have a higher prestige score.

Now, you should be able to build a multiple linear regression model and how to visualize the results using R packages.

References

Fox, J. (2008) Applied Regression Analysis and Generalized Linear Models, Second Edition. Sage.

Fox, J. and Weisberg, S. (2011) An R Companion to Applied Regression, Second Edition, Sage.

assign 1 2018

jacky

August 19, 2018