Simple linear regression is a straightforward approach for predicting a quantitative response y on the basis of a single predictor variable x. It assumes that the relationship between x and y is approximately linear.
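The least-squares estimates that `lm()` reports have closed forms: b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄. The following is a minimal sketch that checks this by hand. It assumes the age and salary columns are typed in from the ExecSalary.csv data printed below (the vector names `age` and `salary` are my own), and it uses the same direction as the Q1 model (x = salary, y = age).

```r
# Age and salary (thousands of USD) columns, typed in from the printed ExecSalary.csv data
salary <- c(1000, 1172, 1000, 4509, 1500, 1092, 1663, 817,
            1562, 2164, 1680, 1173, 3300, 2500, 5807)
age <- c(56, 57, 50, 75, 58, 54, 51, 49, 61, 57, 55, 53, 65, 62, 82)

# Closed-form least-squares estimates for y = b0 + b1 * x,
# with x = salary and y = age (the same direction as the model fitted in Q1)
sxy <- sum((salary - mean(salary)) * (age - mean(age)))
sxx <- sum((salary - mean(salary))^2)
b1 <- sxy / sxx
b0 <- mean(age) - b1 * mean(salary)

# lm() minimizes the same sum of squared residuals, so it should agree
fit <- lm(age ~ salary)
print(c(b0 = b0, b1 = b1))
print(coef(fit))
```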
Q1. According to Advertising Age’s annual salary review, Mark Hurd, the 49-year-old chairman, president, and CEO of Hewlett-Packard Co., received an annual salary of $817,000, a bonus of more than $5 million, and other compensation exceeding $17 million. His total compensation was slightly better than the average CEO total pay of $12.4 million. The file ExecSalary.csv shows the age and annual salary (in thousands of dollars) for Mark Hurd and 14 other executives who led publicly held companies (Advertising Age, December 5, 2006).
setwd("C:/Users/plu5638/Desktop/Business Analytics/Module 5/")
Exec<-read.csv("ExecSalary.csv")
print(Exec)
## Executive Title Company Age
## 1 Charles Prince Chmn/CEO Citigroup 56
## 2 Harold McGraw III Chmn/Pres/CEO McGraw-Hill Cos. 57
## 3 James Dimon Pres/CEO JP Morgan Chase & Co. 50
## 4 K. Rupert Murdoch Chmn/CEO News Corp. 75
## 5 Kenneth D. Lewis Chmn/Pres/CEO Bank of America 58
## 6 Kenneth I. Chenault Chmn/CEO American Express Co. 54
## 7 Louis C. Camilleri Chmn/CEO Altria Group 51
## 8 Mark V. Hurd Chmn/Pres/CEO Hewlett-Packard Co. 49
## 9 Martin S. Sorrell CEO WPP Group 61
## 10 Robert L. Nardelli Chmn/Pres/CEO Home Depot 57
## 11 Samuel J. Palmisano Chmn/Pres/CEO IBM Corp. 55
## 12 David C. Novak Chmn/Pres/CEO Yum Brands 53
## 13 Henry R. Silverman Chmn/CEO Cendant Corp. 65
## 14 Robert C. Wright Chmn/CEO NBC Universal 62
## 15 Sumner Redstone Exec Chmn/Founder Viacom 82
## Salary_in_Thousands_USD
## 1 1000
## 2 1172
## 3 1000
## 4 4509
## 5 1500
## 6 1092
## 7 1663
## 8 817
## 9 1562
## 10 2164
## 11 1680
## 12 1173
## 13 3300
## 14 2500
## 15 5807
#Create a Scatter Plot
plot(Exec$Age~Exec$Salary_in_Thousands_USD)
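Now let’s create a simple regression model relating the two variables. The question frames Age as the predictor (independent) variable and Salary as the response (dependent) variable. Note, however, that the formula used below, Age ~ Salary_in_Thousands_USD, does the reverse: it treats Age as the response and Salary as the predictor. The output and interpretation that follow refer to this fitted model.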
#Simple Linear Regression Model
linear_mod_exec<-lm(Age~Salary_in_Thousands_USD, data=Exec) # In an R formula the dependent (response) variable comes first, before ~, followed by the independent variable; here Age is the response and Salary is the predictor.
summary(linear_mod_exec) # this is the regression model output
##
## Call:
## lm(formula = Age ~ Salary_in_Thousands_USD, data = Exec)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.5806 -2.0710 0.3294 1.7972 5.0309
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.651e+01 1.362e+00 34.16 4.10e-14 ***
## Salary_in_Thousands_USD 6.054e-03 5.476e-04 11.06 5.55e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.946 on 13 degrees of freedom
## Multiple R-squared: 0.9039, Adjusted R-squared: 0.8965
## F-statistic: 122.2 on 1 and 13 DF, p-value: 5.546e-08
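The formula above regresses Age on salary, while the question text treats Salary as the response. The following is a sketch of the fit in the stated direction. It assumes the columns are typed in from the printed data, and the vector and model names are my own. In simple linear regression R-squared does not depend on which variable is the response, and the product of the two slopes equals R-squared, so this fit should report the same 0.9039.

```r
salary <- c(1000, 1172, 1000, 4509, 1500, 1092, 1663, 817,
            1562, 2164, 1680, 1173, 3300, 2500, 5807)
age <- c(56, 57, 50, 75, 58, 54, 51, 49, 61, 57, 55, 53, 65, 62, 82)

fit_age_on_salary <- lm(age ~ salary)   # same direction as the output above
fit_salary_on_age <- lm(salary ~ age)   # Salary as response, Age as predictor
print(summary(fit_salary_on_age))

r2_a <- summary(fit_age_on_salary)$r.squared
r2_b <- summary(fit_salary_on_age)$r.squared

# Both slopes share the same cross-product sum, so their product is cor(age, salary)^2
slope_product <- coef(fit_age_on_salary)[["salary"]] * coef(fit_salary_on_age)[["age"]]
```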
#Scatter Plot with best fit line
plot(Exec$Age~Exec$Salary_in_Thousands_USD)
abline(lm(Exec$Age~Exec$Salary_in_Thousands_USD), col="red", lwd=3) #Adds best fit line to the plot
#Adding R-squared value to the scatter plot
#Option 1: Reading the output and adding the R-squared value on the plot
text(2000, 80, "R-squared = 0.9039") # x and y are data coordinates and must lie inside the plotted range (salary 817 to 5807, age 49 to 82). Refer to page 129 of Stowell for selecting the position of text on the plot
#Option 2: Letting R fill in the R-squared value on the plot
summ<-summary(linear_mod_exec) # save the regression output in the object summ so that individual values can be extracted from it
rsq<-round(summ$r.squared, digits=4) # extract the multiple R-squared value from summ, round it to 4 digits, and save it as rsq
text(2000, 76, paste0("R-squared = ", rsq)) # paste rsq onto the plot
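Hard-coded `text()` coordinates only work when they fall inside the plotted data range. One alternative that avoids coordinates is `legend()`, which accepts a keyword position such as `"topleft"`. The following is a sketch under these assumptions: the vectors are typed in from the printed data and named by me, and `pdf(NULL)` only sends the drawing to a null device so the sketch runs non-interactively (drop that line and `dev.off()` when working in RStudio).

```r
salary <- c(1000, 1172, 1000, 4509, 1500, 1092, 1663, 817,
            1562, 2164, 1680, 1173, 3300, 2500, 5807)
age <- c(56, 57, 50, 75, 58, 54, 51, 49, 61, 57, 55, 53, 65, 62, 82)

fit <- lm(age ~ salary)
rsq <- round(summary(fit)$r.squared, 4)
label <- paste0("R-squared = ", rsq)

pdf(NULL)  # null graphics device, for non-interactive runs only
plot(age ~ salary, xlab = "Salary (thousands USD)", ylab = "Age")
abline(fit, col = "red", lwd = 3)
# legend() is positioned by keyword, so no data coordinates are needed
legend("topleft", legend = label, bty = "n")
invisible(dev.off())
```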
The summary(linear_mod_exec) command gives us the coefficient estimates with their standard errors, t values, and p-values, as well as the R-squared statistic and the F-statistic for the model.
The value of R-squared is a measure of the goodness of fit of the estimated regression equation. The multiple R-squared of 0.9039 indicates that the model, which uses salary as the independent variable, explains about 90.39% of the variability in Age. In other words, 90.39% of the variability in the executives’ ages in the sample can be explained by the linear relationship between age and salary. In simple linear regression, R-squared equals the squared correlation between the two variables, so the same 0.9039 would be obtained with Salary as the response.
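R-squared can be recovered in two equivalent ways: as 1 − SSE/SST, or as the squared correlation between the two variables. The following is a sketch that checks both against the value stored by `summary()`. It assumes the columns are typed in from the printed data, and the vector names are my own.

```r
salary <- c(1000, 1172, 1000, 4509, 1500, 1092, 1663, 817,
            1562, 2164, 1680, 1173, 3300, 2500, 5807)
age <- c(56, 57, 50, 75, 58, 54, 51, 49, 61, 57, 55, 53, 65, 62, 82)
fit <- lm(age ~ salary)

sse <- sum(residuals(fit)^2)        # variation left unexplained by the line
sst <- sum((age - mean(age))^2)     # total variation of the response around its mean
r2_from_ss <- 1 - sse / sst
r2_from_cor <- cor(age, salary)^2   # squared correlation, valid for simple regression only
print(c(r2_from_ss, r2_from_cor, summary(fit)$r.squared))
```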
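The intercept is 4.651e+01 (about 46.51). It is the estimated value of the dependent variable (Age) when the independent variable (salary) is equal to 0. Taken literally, an executive with a salary of $0 would be predicted to be about 46.5 years old. Because no salary in the sample is anywhere near 0 (the smallest is $817 thousand), this is an extrapolation with no practical meaning; the intercept mainly serves to position the regression line.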
The slope for Salary_in_Thousands_USD is 6.054e-03. It means that for every additional $1,000 in salary, the predicted mean age increases by about 0.006054 years, or roughly 6 years for every additional $1 million in salary.
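Because the fitted equation is linear, the slope is exactly the change in the predicted response per one-unit change in the predictor. The following sketch compares predictions at two salaries $1,000 thousand apart. Those two salary values are hypothetical inputs chosen for illustration, and the vectors are typed in from the printed data under names of my own.

```r
salary <- c(1000, 1172, 1000, 4509, 1500, 1092, 1663, 817,
            1562, 2164, 1680, 1173, 3300, 2500, 5807)
age <- c(56, 57, 50, 75, 58, 54, 51, 49, 61, 57, 55, 53, 65, 62, 82)
fit <- lm(age ~ salary)
slope <- coef(fit)[["salary"]]

# Predicted ages at two hypothetical salaries (thousands of USD), $1 million apart
pred <- predict(fit, newdata = data.frame(salary = c(1000, 2000)))
diff_years <- unname(pred[2] - pred[1])   # equals 1000 * slope
print(diff_years)
```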
The residuals are the differences between the actual response values (Age) and the values predicted by the model. We discuss residuals in the following sections.
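Now let’s write our simple linear regression equation using the intercept and slope values:
Age_72<-4.651e+01 + 6.054e-03*(72) ## Estimated equation: predicted Age = 46.51 + 0.006054 * Salary. Because Salary is the predictor in this model, 72 is treated as a salary of $72 thousand (far below the observed salaries), so this is an extrapolation, not a prediction for a 72-year-old
Age_72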
## [1] 46.94589
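Typing rounded coefficients by hand loses precision and makes it easy to plug a value into the wrong slot. `predict()` with `newdata` evaluates the fitted equation directly. The column name in `newdata` must match the predictor name used in the formula. The following sketch first reproduces the calculation above (a salary of $72 thousand in the Age ~ salary model) and then, for the question’s apparent intent, predicts salary for a 72-year-old from a salary ~ age model. The vectors and model names are my own, typed in from the printed data.

```r
salary <- c(1000, 1172, 1000, 4509, 1500, 1092, 1663, 817,
            1562, 2164, 1680, 1173, 3300, 2500, 5807)
age <- c(56, 57, 50, 75, 58, 54, 51, 49, 61, 57, 55, 53, 65, 62, 82)

fit_age <- lm(age ~ salary)      # the model fitted in Q1
fit_salary <- lm(salary ~ age)   # Salary as the response

# Predicted age for a salary of $72 thousand (extrapolation, as noted above)
age_at_72k <- predict(fit_age, newdata = data.frame(salary = 72))
# Predicted salary (thousands of USD) for an executive aged 72
salary_at_age_72 <- predict(fit_salary, newdata = data.frame(age = 72))
print(age_at_72k)
print(salary_at_age_72)
```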
Q. 2. The Dow Jones Industrial Average (DJIA) and the Standard & Poor’s 500 (S&P 500) indexes are used as measures of overall movement in the stock market. The DJIA is based on the price movements of 30 large companies; the S&P 500 is an index composed of 500 stocks. Some say the S&P 500 is a better measure of stock market performance because it is broader based. The closing prices for the DJIA and the S&P 500 for 15 weeks, beginning with January 6, 2012 (Barron’s web site, April 17, 2012), are given in DJIAS_P500.csv. [12 Points]
setwd("C:/Users/plu5638/Desktop/Business Analytics/Module 5/")
DJIA_500<-read.csv("DJIAs_P500.csv")
print(DJIA_500)
## Date DJIA S.P.500
## 1 January 6 12360 1278
## 2 January 13 12422 1289
## 3 January 20 12720 1315
## 4 January 27 12660 1316
## 5 February 3 12862 1345
## 6 February 10 12801 1343
## 7 February 17 12950 1362
## 8 February 24 12983 1366
## 9 March 2 12978 1370
## 10 March 9 12922 1371
## 11 March 16 13233 1404
## 12 March 23 13081 1397
## 13 March 30 13212 1408
## 14 April 5 13060 1398
## 15 April 13 12850 1370
#Create a Scatter Plot
plot(DJIA_500$DJIA~DJIA_500$S.P.500)
#Simple Linear Regression Model
linear_mod_DJIA_500<-lm(DJIA~S.P.500, data=DJIA_500) # The dependent variable, DJIA (which we are trying to predict), comes first, followed by the independent variable, S.P.500.
summary(linear_mod_DJIA_500) # this is the regression model output
##
## Call:
## lm(formula = DJIA ~ S.P.500, data = DJIA_500)
##
## Residuals:
## Min 1Q Median 3Q Max
## -110.59 -45.15 17.41 42.10 91.15
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4697.1066 528.0917 8.894 6.88e-07 ***
## S.P.500 6.0317 0.3894 15.488 9.29e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 59.5 on 13 degrees of freedom
## Multiple R-squared: 0.9486, Adjusted R-squared: 0.9446
## F-statistic: 239.9 on 1 and 13 DF, p-value: 9.292e-10
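With a single predictor, the overall F test and the t test on the slope test the same hypothesis (slope = 0), and F equals t². The following sketch checks this on the DJIA model. It assumes the two columns are typed in from the printed DJIAS_P500.csv data, and the vector names are my own.

```r
djia <- c(12360, 12422, 12720, 12660, 12862, 12801, 12950, 12983,
          12978, 12922, 13233, 13081, 13212, 13060, 12850)
sp500 <- c(1278, 1289, 1315, 1316, 1345, 1343, 1362, 1366,
           1370, 1371, 1404, 1397, 1408, 1398, 1370)

fit <- lm(djia ~ sp500)
s <- summary(fit)
t_slope <- coef(s)["sp500", "t value"]
f_stat <- s$fstatistic[["value"]]
# p-value of the F statistic with 1 and n - 2 degrees of freedom
p_f <- pf(f_stat, 1, df.residual(fit), lower.tail = FALSE)
print(c(t_squared = t_slope^2, F = f_stat))
```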
#Scatter Plot with best fit line
plot(DJIA_500$DJIA~DJIA_500$S.P.500)
abline(lm(DJIA_500$DJIA~DJIA_500$S.P.500), col="red", lwd=3) #Adds best fit line to the plot
#Adding R-squared value to the scatter plot
#Option 1: Reading the output and adding the R-squared value on the plot
text(1300, 13150, "R-squared = 0.9486") # x and y are data coordinates and must lie inside the plotted range (S&P 500 1278 to 1408, DJIA 12360 to 13233). Refer to page 129 of Stowell for selecting the position of text on the plot
#Option 2: Letting R fill in the R-squared value on the plot
summ_DJIA_500<-summary(linear_mod_DJIA_500) # save the regression output in summ_DJIA_500 so that individual values can be extracted from it
rsq_DJIA_500<-round(summ_DJIA_500$r.squared, digits=4) # extract the multiple R-squared value, round it to 4 digits, and save it as rsq_DJIA_500
text(1300, 13100, paste0("R-squared = ", rsq_DJIA_500)) # paste rsq_DJIA_500 onto the plot
confint(linear_mod_DJIA_500, level = 0.95)# computes a confidence interval for the coefficient estimates
## 2.5 % 97.5 %
## (Intercept) 3556.233890 5837.979265
## S.P.500 5.190417 6.873069
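`confint()` builds each interval as estimate ± t(0.975, n − 2) × standard error. The following sketch reproduces it from the coefficient table. It assumes the columns are typed in from the printed data, and the vector names are my own.

```r
djia <- c(12360, 12422, 12720, 12660, 12862, 12801, 12950, 12983,
          12978, 12922, 13233, 13081, 13212, 13060, 12850)
sp500 <- c(1278, 1289, 1315, 1316, 1345, 1343, 1362, 1366,
           1370, 1371, 1404, 1397, 1408, 1398, 1370)
fit <- lm(djia ~ sp500)

est <- coef(summary(fit))[, "Estimate"]
se <- coef(summary(fit))[, "Std. Error"]
t_crit <- qt(0.975, df = df.residual(fit))   # two-sided 95% critical value, 13 df

ci_manual <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
ci_builtin <- confint(fit, level = 0.95)
print(ci_manual)
print(ci_builtin)
```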
attributes(linear_mod_DJIA_500)# This prints the name of various attributes calculated in the linear model
## $names
## [1] "coefficients" "residuals" "effects" "rank"
## [5] "fitted.values" "assign" "qr" "df.residual"
## [9] "xlevels" "call" "terms" "model"
##
## $class
## [1] "lm"
linear_mod_DJIA_500$coefficients # the intercept and the slope on S.P.500 that we saw in the regression model output
## (Intercept) S.P.500
## 4697.106578 6.031743
linear_mod_DJIA_500$fitted.values #These are the predicted values calculated by using the regression equation
## 1 2 3 4 5 6 7 8
## 12405.67 12472.02 12628.85 12634.88 12809.80 12797.74 12912.34 12936.47
## 9 10 11 12 13 14 15
## 12960.59 12966.63 13165.67 13123.45 13189.80 13129.48 12960.59
linear_mod_DJIA_500$residuals #These are the errors
## 1 2 3 4 5 6
## -45.674299 -50.023473 91.151205 25.119462 52.198911 3.262398
## 7 8 9 10 11 12
## 37.659278 46.532306 17.405333 -44.626410 67.326067 -42.451731
## 13 14 15
## 22.199094 -69.483474 -110.594667
pred_time_DJIA_500<-linear_mod_DJIA_500$fitted.values
residual_error_DJIA_500<-linear_mod_DJIA_500$residuals
DJIA_500_pred<-cbind(DJIA_500,pred_time_DJIA_500,residual_error_DJIA_500) # the command cbind adds these two columns to the table
print(DJIA_500_pred)
## Date DJIA S.P.500 pred_time_DJIA_500 residual_error_DJIA_500
## 1 January 6 12360 1278 12405.67 -45.674299
## 2 January 13 12422 1289 12472.02 -50.023473
## 3 January 20 12720 1315 12628.85 91.151205
## 4 January 27 12660 1316 12634.88 25.119462
## 5 February 3 12862 1345 12809.80 52.198911
## 6 February 10 12801 1343 12797.74 3.262398
## 7 February 17 12950 1362 12912.34 37.659278
## 8 February 24 12983 1366 12936.47 46.532306
## 9 March 2 12978 1370 12960.59 17.405333
## 10 March 9 12922 1371 12966.63 -44.626410
## 11 March 16 13233 1404 13165.67 67.326067
## 12 March 23 13081 1397 13123.45 -42.451731
## 13 March 30 13212 1408 13189.80 22.199094
## 14 April 5 13060 1398 13129.48 -69.483474
## 15 April 13 12850 1370 12960.59 -110.594667
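Each residual is the observed response minus its fitted value, and with an intercept in the model the residuals sum to zero. The following sketch checks both properties. It assumes the columns are typed in from the printed data, and the vector names are my own.

```r
djia <- c(12360, 12422, 12720, 12660, 12862, 12801, 12950, 12983,
          12978, 12922, 13233, 13081, 13212, 13060, 12850)
sp500 <- c(1278, 1289, 1315, 1316, 1345, 1343, 1362, 1366,
           1370, 1371, 1404, 1397, 1408, 1398, 1370)
fit <- lm(djia ~ sp500)

manual_resid <- djia - fitted(fit)   # observed DJIA minus predicted DJIA
print(cbind(djia, fitted = fitted(fit), residual = manual_resid))
print(sum(residuals(fit)))           # zero up to floating-point error
```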
#sum of squares due to error (SSE)
SSE_DJIA_500<-sum(((DJIA_500$DJIA)-(linear_mod_DJIA_500$fitted.values))^2) # It measures the total sum of squared error in using our regression equation.
SSE_DJIA_500
## [1] 46028.21
n<-nrow(DJIA_500) # number of observations (15 weeks)
RSE_DJIA_500<-sqrt((1/(n-2))*SSE_DJIA_500) # residual standard error; matches the value on the regression output
RSE_DJIA_500
## [1] 59.50321
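`sigma()` in the stats package returns the residual standard error of a fitted `lm` model, which makes it a convenient check on the hand calculation sqrt(SSE / (n − 2)). Here n is the number of observations (15 weeks), not a fixed constant. The following sketch assumes the columns are typed in from the printed data, and the vector names are my own.

```r
djia <- c(12360, 12422, 12720, 12660, 12862, 12801, 12950, 12983,
          12978, 12922, 13233, 13081, 13212, 13060, 12850)
sp500 <- c(1278, 1289, 1315, 1316, 1345, 1343, 1362, 1366,
           1370, 1371, 1404, 1397, 1408, 1398, 1370)
fit <- lm(djia ~ sp500)

n <- length(djia)                    # derive n from the data instead of typing it
sse <- sum(residuals(fit)^2)
rse_manual <- sqrt(sse / (n - 2))    # two parameters (intercept, slope) are estimated
rse_builtin <- sigma(fit)
print(c(rse_manual, rse_builtin))
```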
#TOTAL SUM OF SQUARES, SST
SST_DJIA_500<-sum((DJIA_500$DJIA - mean(DJIA_500$DJIA))^2) # total sum of squared errors when the mean of y (DJIA) is used as the prediction
SST_DJIA_500
## [1] 895390.9
#SUM OF SQUARES DUE TO REGRESSION, SSR
SSR_DJIA_500<-sum((linear_mod_DJIA_500$fitted.values - mean(DJIA_500$DJIA))^2) # variation of the fitted values around the mean of y (DJIA)
SSR_DJIA_500
## [1] 849362.7
#sum of squares due to error (SSE): another formula
SSE_DJIA_500<-SST_DJIA_500-SSR_DJIA_500
SSE_DJIA_500
## [1] 46028.21
#The COEFFICIENT OF DETERMINATION (R-SQUARED, we also get this value on the linear model regression model output)
RSQ_DJIA_500<-SSR_DJIA_500/SST_DJIA_500 # see the interpretation of R-squared in the previous section
RSQ_DJIA_500
## [1] 0.9485943
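The identity SST = SSR + SSE holds only when SST and SSR are both measured around the mean of the response (DJIA), not around the mean of the predictor. The following sketch checks the identity and compares it with `anova()`, whose `Sum Sq` column lists SSR (the `sp500` row) and SSE (the `Residuals` row). It assumes the columns are typed in from the printed data, and the vector names are my own.

```r
djia <- c(12360, 12422, 12720, 12660, 12862, 12801, 12950, 12983,
          12978, 12922, 13233, 13081, 13212, 13060, 12850)
sp500 <- c(1278, 1289, 1315, 1316, 1345, 1343, 1362, 1366,
           1370, 1371, 1404, 1397, 1408, 1398, 1370)
fit <- lm(djia ~ sp500)

sst <- sum((djia - mean(djia))^2)            # total variation of DJIA
ssr <- sum((fitted(fit) - mean(djia))^2)     # variation explained by the line
sse <- sum(residuals(fit)^2)                 # unexplained variation
tab <- anova(fit)                            # rows: sp500 (regression), Residuals (error)
print(tab)
print(c(SST = sst, SSR = ssr, SSE = sse, R2 = ssr / sst))
```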
#Residual Plot Against S.P. 500
plot(DJIA_500$S.P.500, linear_mod_DJIA_500$residuals, xlab="S&P 500", ylab="Residuals", col="red", pch=19, main="Residual Plot Against S&P 500")
abline(h=0, lty=3) # horizontal line at a residual value of 0; lty=3 draws a dotted line
#Normal Probability Plot of Residual
std_residual_DJIA_500<-rstandard(linear_mod_DJIA_500)
qqnorm(std_residual_DJIA_500, xlab="Normal Scores", ylab="Standardized Residuals", col="magenta3", pch=19, main="Normal Probability Plot of Residual")
qqline(std_residual_DJIA_500)
shapiro.test(linear_mod_DJIA_500$residuals)
##
## Shapiro-Wilk normality test
##
## data: linear_mod_DJIA_500$residuals
## W = 0.95691, p-value = 0.6389
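The Shapiro-Wilk p-value (0.6389) is well above the 0.05 significance level, so we fail to reject the null hypothesis that the residuals of our regression model are normally distributed. This does not prove normality, but with 15 observations it shows no evidence against the normality assumption.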
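The Shapiro-Wilk decision rule can be written out explicitly: a p-value above the chosen alpha means we fail to reject normality, which is weaker than concluding the residuals *are* normal. The following is a sketch under these assumptions: alpha = 0.05, the columns are typed in from the printed data, and the vector names are my own.

```r
djia <- c(12360, 12422, 12720, 12660, 12862, 12801, 12950, 12983,
          12978, 12922, 13233, 13081, 13212, 13060, 12850)
sp500 <- c(1278, 1289, 1315, 1316, 1345, 1343, 1362, 1366,
           1370, 1371, 1404, 1397, 1408, 1398, 1370)
fit <- lm(djia ~ sp500)

sw <- shapiro.test(residuals(fit))   # H0: residuals come from a normal distribution
alpha <- 0.05
decision <- if (sw$p.value > alpha) "fail to reject normality" else "reject normality"
print(sw)
print(decision)
```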
Q. 3. In 2011, home prices and mortgage rates fell so far that in a number of cities the monthly cost of owning a home was less expensive than renting. The RentMortgage.csv data show the average asking rent for 10 markets and the monthly mortgage on the median-priced home (including taxes and insurance) for 10 cities where the average monthly mortgage payment was less than the average asking rent (The Wall Street Journal, November 26-27, 2011). [10 Points]
setwd("C:/Users/plu5638/Desktop/Business Analytics/Module 5/")
RentMort<-read.csv("RentMortgage.csv")
print(RentMort)
## City Rent.... Mortgage....
## 1 Atlanta 840 539
## 2 Chicago 1062 1002
## 3 Detroit 823 626
## 4 Jacksonville, Fla. 779 711
## 5 Las Vegas 796 655
## 6 Miami 1071 977
## 7 Minneapolis 953 776
## 8 Orlando, Fla. 851 695
## 9 Phoenix 762 651
## 10 St. Louis 723 654
#Simple Linear Regression Model
linear_mod_RentMort<-lm(Rent....~Mortgage...., data=RentMort) # Here the dependent variable, Rent (which we are trying to predict) comes first then comes the independent variable.
summary(linear_mod_RentMort) # this is the regression model output
##
## Call:
## lm(formula = Rent.... ~ Mortgage...., data = RentMort)
##
## Residuals:
## Min 1Q Median 3Q Max
## -90.278 -41.365 5.764 29.495 107.995
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 351.0815 105.3493 3.333 0.01035 *
## Mortgage.... 0.7067 0.1419 4.981 0.00108 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 64.03 on 8 degrees of freedom
## Multiple R-squared: 0.7561, Adjusted R-squared: 0.7257
## F-statistic: 24.81 on 1 and 8 DF, p-value: 0.001079
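The fitted rent equation can also produce interval estimates. The following sketch uses `predict()` with `interval = "confidence"` (for the mean rent at a given mortgage payment) and `interval = "prediction"` (for a single city’s rent). It assumes the following: the mortgage value of 800 is a hypothetical input chosen for illustration, not a value from the assignment, and the vectors are typed in from the printed RentMortgage.csv data under names of my own. The prediction interval is always wider because it also includes the scatter of individual cities around the line.

```r
rent <- c(840, 1062, 823, 779, 796, 1071, 953, 851, 762, 723)
mortgage <- c(539, 1002, 626, 711, 655, 977, 776, 695, 651, 654)
fit <- lm(rent ~ mortgage)

new_city <- data.frame(mortgage = 800)   # hypothetical monthly mortgage payment
ci <- predict(fit, newdata = new_city, interval = "confidence", level = 0.95)
pred_int <- predict(fit, newdata = new_city, interval = "prediction", level = 0.95)
print(ci)        # columns: fit, lwr, upr
print(pred_int)
```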
#Scatter Plot with best fit line
plot(RentMort$Rent....~RentMort$Mortgage....)
abline(lm(RentMort$Rent....~RentMort$Mortgage....), col="red", lwd=3) #Adds best fit line to the plot
#Adding R-squared value to the scatter plot
#Option 1: Reading the output and adding the R-squared value on the plot
text(650, 1050, "R-squared = 0.7561") # x and y are data coordinates and must lie inside the plotted range (mortgage 539 to 1002, rent 723 to 1071). Refer to page 129 of Stowell for selecting the position of text on the plot
#Option 2: Letting R fill in the R-squared value on the plot
summ_RentMort<-summary(linear_mod_RentMort) # save the regression output in summ_RentMort so that individual values can be extracted from it
rsq_RentMort<-round(summ_RentMort$r.squared, digits=4) # extract the multiple R-squared value, round it to 4 digits, and save it as rsq_RentMort
text(650, 1020, paste0("R-squared = ", rsq_RentMort)) # paste rsq_RentMort onto the plot
confint(linear_mod_RentMort, level = 0.95)# computes a confidence interval for the coefficient estimates
## 2.5 % 97.5 %
## (Intercept) 108.1455790 594.017520
## Mortgage.... 0.3795108 1.033935
attributes(linear_mod_RentMort)# This prints the name of various attributes calculated in the linear model
## $names
## [1] "coefficients" "residuals" "effects" "rank"
## [5] "fitted.values" "assign" "qr" "df.residual"
## [9] "xlevels" "call" "terms" "model"
##
## $class
## [1] "lm"
linear_mod_RentMort$coefficients # the intercept and the slope on Mortgage that we saw in the regression model output
## (Intercept) Mortgage....
## 351.0815492 0.7067231
linear_mod_RentMort$fitted.values #These are the predicted values calculated by using the regression equation
## 1 2 3 4 5 6 7 8
## 732.0053 1059.2181 793.4902 853.5617 813.9852 1041.5500 899.4987 842.2541
## 9 10
## 811.1583 813.2785
linear_mod_RentMort$residuals #These are the errors
## 1 2 3 4 5 6 7
## 107.994700 2.781904 29.509790 -74.561673 -17.985180 29.449982 53.501325
## 8 9 10
## 8.745896 -49.158287 -90.278457
pred_time_RentMort<-linear_mod_RentMort$fitted.values
residual_error_RentMort<-linear_mod_RentMort$residuals
RentMort_pred<-cbind(RentMort,pred_time_RentMort,residual_error_RentMort) # the command cbind adds these two columns to the table
print(RentMort_pred)
## City Rent.... Mortgage.... pred_time_RentMort
## 1 Atlanta 840 539 732.0053
## 2 Chicago 1062 1002 1059.2181
## 3 Detroit 823 626 793.4902
## 4 Jacksonville, Fla. 779 711 853.5617
## 5 Las Vegas 796 655 813.9852
## 6 Miami 1071 977 1041.5500
## 7 Minneapolis 953 776 899.4987
## 8 Orlando, Fla. 851 695 842.2541
## 9 Phoenix 762 651 811.1583
## 10 St. Louis 723 654 813.2785
## residual_error_RentMort
## 1 107.994700
## 2 2.781904
## 3 29.509790
## 4 -74.561673
## 5 -17.985180
## 6 29.449982
## 7 53.501325
## 8 8.745896
## 9 -49.158287
## 10 -90.278457
#sum of squares due to error (SSE)
SSE_RentMort<-sum(((RentMort$Rent....)-(linear_mod_RentMort$fitted.values))^2) # It measures the total sum of squared error in using our regression equation.
SSE_RentMort
## [1] 32797.25
n<-10 #number of observations
RSE_RentMort<-sqrt((1/(n-2))*SSE_RentMort) #We can find this value on regression output
RSE_RentMort
## [1] 64.02856
#TOTAL SUM OF SQUARES, SST
SST_RentMort<-sum((RentMort$Rent.... - mean(RentMort$Rent....))^2) # total sum of squared errors when the mean of y (Rent) is used as the prediction
SST_RentMort
## [1] 134494
#SUM OF SQUARES DUE TO REGRESSION, SSR
SSR_RentMort<-sum((linear_mod_RentMort$fitted.values - mean(RentMort$Rent....))^2) # variation of the fitted values around the mean of y (Rent)
SSR_RentMort
## [1] 101696.7
#sum of squares due to error (SSE): another formula
SSE_RentMort<-SST_RentMort-SSR_RentMort
SSE_RentMort
## [1] 32797.25
#The COEFFICIENT OF DETERMINATION (R-SQUARED, we also get this value on the linear model regression model output)
RSQ_RentMort<-SSR_RentMort/SST_RentMort # see the interpretation of R-squared in the previous section
RSQ_RentMort
## [1] 0.7561434
#Residual Plot Against Mortgage
plot(RentMort$Mortgage...., linear_mod_RentMort$residuals, xlab="Mortgage", ylab="Residuals", col="red", pch=19, main="Residual Plot Against Mortgage")
abline(h=0, lty=3) # horizontal line at a residual value of 0; lty=3 draws a dotted line
#Normal Probability Plot of Residual
std_residual_RentMort<-rstandard(linear_mod_RentMort)
qqnorm(std_residual_RentMort, xlab="Normal Scores", ylab="Standardized Residuals", col="magenta3", pch=19, main="Normal Probability Plot of Residual")
qqline(std_residual_RentMort)
shapiro.test(linear_mod_RentMort$residuals)
##
## Shapiro-Wilk normality test
##
## data: linear_mod_RentMort$residuals
## W = 0.97484, p-value = 0.9317
The residuals would be perfectly normal if the points fell exactly on the straight line of the normal Q-Q plot. On the plot, the points lie close to the line, although not all of them are exactly on it, so the residuals appear approximately normal. The Shapiro-Wilk p-value (0.9317) is greater than the 0.05 significance level, so we fail to reject the null hypothesis of normality: the test finds no evidence that the distribution of the residuals departs from normal.