Collect the data
## [1] "C:/MSDS_Course/Spring_2022/DATA_605/Week_12"
##   Year Month Interest_Rate Unemployment_Rate Stock_Price_Index
## 1 2017    12          2.75               5.3              1464
## 2 2017    11          2.50               5.3              1394
## 3 2017    10          2.50               5.3              1357
## 4 2017     9          2.50               5.3              1293
## 5 2017     8          2.50               5.4              1256
## 6 2017     7          2.50               5.6              1254
## [1] "Year" "Month" "Interest_Rate"
## [4] "Unemployment_Rate" "Stock_Price_Index"
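library(ggplot2)  # provides ggplot(), aes() and geom_point()
# Scatter plot: stock price index against interest rate
ggplot(data = stock_data, aes(x = Interest_Rate, y = Stock_Price_Index)) +
  geom_point()
# Scatter plot: stock price index against unemployment rate
ggplot(data = stock_data, aes(x = Unemployment_Rate, y = Stock_Price_Index)) +
  geom_point()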
model <- lm(Stock_Price_Index ~ Interest_Rate + Unemployment_Rate, data = stock_data)
summary(model)
##
## Call:
## lm(formula = Stock_Price_Index ~ Interest_Rate + Unemployment_Rate,
## data = stock_data)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -175.959  -38.459    7.664   51.635  111.670
##
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)         1461.9      942.6   1.551  0.13584
## Interest_Rate        386.5      115.3   3.351  0.00303 **
## Unemployment_Rate   -206.3      123.9  -1.664  0.11088
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 73.08 on 21 degrees of freedom
## Multiple R-squared: 0.8902, Adjusted R-squared: 0.8797
## F-statistic: 85.12 on 2 and 21 DF, p-value: 8.444e-11
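The p-value of Interest_Rate (0.00303) is less than 0.05, so interest rate is a statistically significant predictor in the multiple linear regression model. The p-value of Unemployment_Rate (0.11088) is greater than 0.05, so unemployment rate is not statistically significant at that level.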
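As a quick check on where those p-values come from, the sketch below recomputes them from the t values and the residual degrees of freedom reported in the summary. The variable names `t_values`, `df_resid` and `p_values` are illustrative and not part of the original analysis; because the t values are copied from rounded output, the results should agree with the summary only up to rounding.

```r
# t values and residual degrees of freedom copied from the summary(model) output
t_values <- c(Interest_Rate = 3.351, Unemployment_Rate = -1.664)
df_resid <- 21

# Two-sided p-value: probability of a |t| at least this large under the
# null hypothesis that the coefficient is zero
p_values <- 2 * pt(-abs(t_values), df = df_resid)
print(round(p_values, 5))

# Compare each p-value against the 0.05 threshold used in the text
print(p_values < 0.05)
```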
Using the coefficients in the summary to build the equation, where \(X_1\) is the interest rate and \(X_2\) is the unemployment rate:
Stock_Price_Index = (Intercept) + (Interest_Rate coefficient) * \(X_1\) + (Unemployment_Rate coefficient) * \(X_2\)
Stock_Price_Index = 1461.9 + 386.5 * \(X_1\) + (-206.3) * \(X_2\)
Assume the interest rate is 2.1 and the unemployment rate is 5.9. Then the Stock_Price_Index would be:
x1 <- 2.1
x2 <- 5.9
(Stock_Price_Index <- 1461.9 + (386.5* x1) + ((-206.3)* x2))
## [1] 1056.38
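The same prediction can be written as a dot product between the coefficient vector and the new observation, which scales more easily when there are many predictors. This is a minimal sketch using the rounded coefficients from the summary; the names `coefs`, `new_obs` and `predicted` are illustrative. With the fitted object available, `predict(model, newdata = data.frame(Interest_Rate = 2.1, Unemployment_Rate = 5.9))` would be the usual route, though it uses the unrounded coefficients and may differ slightly in the later decimals.

```r
# Rounded coefficients copied from the summary(model) output
coefs <- c(Intercept = 1461.9, Interest_Rate = 386.5, Unemployment_Rate = -206.3)

# New observation: a leading 1 for the intercept, then x1 and x2
new_obs <- c(1, 2.1, 5.9)

# Prediction = sum of coefficient * value over all terms
predicted <- sum(coefs * new_obs)
print(predicted)
```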
Fitted Value vs Residuals
plot(model$fitted.values, model$residuals, xlab='Fitted Values', ylab='Residuals')
abline(0,0)
When we look at the residuals against the fitted values, we can see a sine-like curve pattern, which suggests that the relationship between the predictors and the stock price index is not purely linear.
The residuals at the extremes of the fitted values may have a different spread from the rest, which would violate the constant-variance assumption; however, the pattern is not very clear. I think it is reasonable to continue with the analysis and assume similar variance of residuals.
qqnorm(model$residuals)
qqline(model$residuals)
The normal Q-Q plot of the residuals appears to follow the theoretical line. Residuals are reasonably normally distributed.
The adjusted coefficient of determination (adjusted R-squared) of a multiple linear regression model is the coefficient of determination corrected for the number of predictors in the model.
summary(model)$adj.r.squared
## [1] 0.8797363
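For reference, the adjusted value can be reproduced from the multiple R-squared with the standard formula 1 - (1 - R^2)(n - 1)/(n - p - 1). The sketch below assumes n = 24 observations, inferred from the 21 residual degrees of freedom plus the 3 estimated coefficients, and p = 2 predictors; the variable names are illustrative. Since the R-squared is taken from rounded output, the result should match the value above only to about four decimal places.

```r
# Multiple R-squared copied from the summary(model) output (rounded)
r_squared <- 0.8902

# n is inferred: 21 residual degrees of freedom + 3 estimated coefficients
n <- 24
p <- 2  # number of predictors: Interest_Rate and Unemployment_Rate

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)
print(adj_r_squared)
```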