Homework 3: Part 1
1 - Brokerage Regression Model
We previously found that the name of the brokerage was not a significant variable, so it is not included in the regression model.
bro_lm <- lm(Overall_Satisfaction_with_Electronic_Trades ~ Satisfaction_with_Trade_Price + Satisfaction_with_Speed_of_Execution, data = Brokerage)
summary(bro_lm)
##
## Call:
## lm(formula = Overall_Satisfaction_with_Electronic_Trades ~ Satisfaction_with_Trade_Price +
## Satisfaction_with_Speed_of_Execution, data = Brokerage)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.58886 -0.13863 -0.09120 0.05781 0.64613
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.6633 0.8248 -0.804 0.438318
## Satisfaction_with_Trade_Price 0.7746 0.1521 5.093 0.000348 ***
## Satisfaction_with_Speed_of_Execution 0.4897 0.2016 2.429 0.033469 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3435 on 11 degrees of freedom
## Multiple R-squared: 0.7256, Adjusted R-squared: 0.6757
## F-statistic: 14.54 on 2 and 11 DF, p-value: 0.0008157
The relationship between variables is as follows:
\[ \hat{Y} = -0.6633 + 0.7746 \cdot \text{Satisfaction\_with\_Trade\_Price} + 0.4897 \cdot \text{Satisfaction\_with\_Speed\_of\_Execution} \]
## 10 % 90 %
## (Intercept) -1.7879306 0.4612749
## Satisfaction_with_Trade_Price 0.5672241 0.9819956
## Satisfaction_with_Speed_of_Execution 0.2148115 0.7645252
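The 10 % / 90 % columns above are 80% confidence intervals for the coefficients, presumably produced by `confint(bro_lm, level = 0.80)` (that call is not shown). A minimal R sketch of where the limits come from, assuming only the rounded estimates and standard errors from the summary table: each limit is estimate ± t(0.90, 11) × SE, with 11 residual degrees of freedom (14 observations minus 3 parameters). The short names `Trade_Price` and `Speed` are labels chosen for this sketch.

```r
# Estimates and standard errors copied (rounded) from summary(bro_lm)
est <- c(Intercept = -0.6633, Trade_Price = 0.7746, Speed = 0.4897)
se  <- c(Intercept =  0.8248, Trade_Price = 0.1521, Speed = 0.2016)
df_resid <- 11     # residual degrees of freedom: n - p = 14 - 3
level    <- 0.80

# Two-sided interval: estimate +/- t_{1 - alpha/2, df} * SE
t_crit <- qt(1 - (1 - level) / 2, df = df_resid)
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(round(ci, 4))
```

The rebuilt limits agree with the `confint()` output to about three decimal places; the small differences come from rounding the inputs.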
1a.
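We are 80% confident that B1, the coefficient of Satisfaction_with_Trade_Price, lies between 0.567 and 0.982. (B1 is a fixed parameter; the 80% describes how often intervals built this way contain it.)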
1b.
We are 80% confident that B2, the coefficient of Satisfaction_with_Speed_of_Execution, lies between 0.215 and 0.765.
2 - Brokerage Predictions
2a.
obs_for_pred = data.frame(Satisfaction_with_Trade_Price = 4,Satisfaction_with_Speed_of_Execution = 3)
predict(bro_lm, obs_for_pred, interval = "prediction", level = .90, type = "response")
## fit lwr upr
## 1 3.904117 3.174452 4.633781
We are 90% confident that a new individual response at Satisfaction_with_Trade_Price = 4 and Satisfaction_with_Speed_of_Execution = 3 will fall between 3.174 and 4.634.
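As a quick check on the fitted value, substituting the new observation into the fitted equation (with the rounded coefficients from the summary table) gives:

\[ \hat{y}_0 = -0.6633 + 0.7746(4) + 0.4897(3) = -0.6633 + 3.0984 + 1.4691 = 3.9042 \]

which agrees with the fit of 3.904117 reported by `predict()` up to coefficient rounding.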
2b.
## fit lwr upr
## 1 3.904117 3.514362 4.293871
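We are 90% confident that the mean response at Satisfaction_with_Trade_Price = 4 and Satisfaction_with_Speed_of_Execution = 3 lies between 3.514 and 4.294. This interval is narrower than the prediction interval in 2a because it reflects only the uncertainty in the estimated mean, not the variability of an individual response.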
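The prediction interval in 2a and the confidence interval in 2b share the same centre and differ only in their standard error. The R sketch below uses only numbers printed above (the fit, the 2b upper limit, and the residual standard error 0.3435 from `summary(bro_lm)`) to back out the standard error of the estimated mean and then rebuild the 2a prediction interval by adding the residual variance. It assumes the standard linear-model formulas with 11 residual degrees of freedom; the variable names are illustrative.

```r
# Values copied from the output above (Trade_Price = 4, Speed = 3)
fit      <- 3.904117
ci_upper <- 4.293871            # upper limit of the 90% confidence interval (2b)
sigma    <- 0.3435              # residual standard error from summary(bro_lm)
df_resid <- 11
t_crit   <- qt(0.95, df = df_resid)   # 90% two-sided

# Standard error of the estimated mean response, recovered from the CI
se_fit <- (ci_upper - fit) / t_crit

# A new observation adds the error variance sigma^2 on top of se_fit^2
se_pred  <- sqrt(sigma^2 + se_fit^2)
pred_int <- c(lwr = fit - t_crit * se_pred, upr = fit + t_crit * se_pred)
print(round(pred_int, 3))
```

The rebuilt limits match 3.174 and 4.634 to about three decimals. The same calculation with `qt(0.925, 11)` applies to the 85% intervals in 2c and 2d.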
2c.
obs_for_pred2 = data.frame(Satisfaction_with_Trade_Price = 3,Satisfaction_with_Speed_of_Execution = 2)
predict(bro_lm, obs_for_pred2, interval = "prediction", level = .85, type = "response")
## fit lwr upr
## 1 2.639838 1.965909 3.313768
We are 85% confident that a new individual response at Satisfaction_with_Trade_Price = 3 and Satisfaction_with_Speed_of_Execution = 2 will fall between 1.966 and 3.314.
2d.
## fit lwr upr
## 1 2.639838 2.159077 3.1206
We are 85% confident that the mean response at Satisfaction_with_Trade_Price = 3 and Satisfaction_with_Speed_of_Execution = 2 lies between 2.159 and 3.121.
3 - Brokerage Unit Scaling
3.
brokerage_unit_normal = as.data.frame(apply(Brokerage[,2:4], 2, function(x){(x - mean(x))/sd(x)}))
bro_unit_normal <- lm(Overall_Satisfaction_with_Electronic_Trades ~ Satisfaction_with_Trade_Price + Satisfaction_with_Speed_of_Execution, data = brokerage_unit_normal)
bro_unit_normal
##
## Call:
## lm(formula = Overall_Satisfaction_with_Electronic_Trades ~ Satisfaction_with_Trade_Price +
## Satisfaction_with_Speed_of_Execution, data = brokerage_unit_normal)
##
## Coefficients:
## (Intercept) Satisfaction_with_Trade_Price
## 4.115e-16 8.115e-01
## Satisfaction_with_Speed_of_Execution
## 3.870e-01
##
## Call:
## lm(formula = Overall_Satisfaction_with_Electronic_Trades ~ Satisfaction_with_Trade_Price +
## Satisfaction_with_Speed_of_Execution, data = brokerage_unit_normal)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.97638 -0.22987 -0.15121 0.09586 1.07134
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.115e-16 1.522e-01 0.000 1.000000
## Satisfaction_with_Trade_Price 8.115e-01 1.593e-01 5.093 0.000348 ***
## Satisfaction_with_Speed_of_Execution 3.870e-01 1.593e-01 2.429 0.033469 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5695 on 11 degrees of freedom
## Multiple R-squared: 0.7256, Adjusted R-squared: 0.6757
## F-statistic: 14.54 on 2 and 11 DF, p-value: 0.0008157
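In both regression models, Satisfaction_with_Trade_Price has the larger coefficient. The raw coefficients (0.7746 vs 0.4897) are not strictly comparable, because each depends on the spread of its predictor; the unit normal scaled coefficients (0.8115 vs 0.3870) are, since each measures the change in the response, in standard deviations, per one standard deviation change in the predictor. On that basis Satisfaction_with_Trade_Price is roughly twice as influential as Satisfaction_with_Speed_of_Execution. Scaling does not change the t values (5.093 and 2.429), the p values, or R-squared.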
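To illustrate why the scaled coefficients are the fair basis for comparison, here is a minimal R sketch on simulated data (the variables `x1`, `x2`, `y` and their distributions are hypothetical, not the Brokerage data). It uses `scale()`, which applies the same (x - mean(x))/sd(x) transformation as the `apply()` call above, and checks that each standardized slope equals the raw slope times sd(x_j)/sd(y) while the t statistics and R-squared are unchanged.

```r
set.seed(1)
n  <- 14
x1 <- rnorm(n, mean = 3, sd = 0.8)   # hypothetical predictor 1
x2 <- rnorm(n, mean = 3, sd = 0.5)   # hypothetical predictor 2
y  <- -0.5 + 0.8 * x1 + 0.5 * x2 + rnorm(n, sd = 0.3)
raw <- data.frame(y, x1, x2)

# Unit normal scaling of every column: (value - mean) / sd
scaled <- as.data.frame(scale(raw))

fit_raw    <- lm(y ~ x1 + x2, data = raw)
fit_scaled <- lm(y ~ x1 + x2, data = scaled)

# Standardized slope = raw slope * sd(x_j) / sd(y)
b_raw        <- coef(fit_raw)[c("x1", "x2")]
b_std_manual <- b_raw * sapply(raw[c("x1", "x2")], sd) / sd(raw$y)
print(cbind(standardized = coef(fit_scaled)[c("x1", "x2")], manual = b_std_manual))

# t statistics for the slopes are unaffected by the rescaling
t_raw    <- summary(fit_raw)$coefficients[c("x1", "x2"), "t value"]
t_scaled <- summary(fit_scaled)$coefficients[c("x1", "x2"), "t value"]
```

The intercept of the scaled model is zero up to floating-point error, which is why the output above shows 4.115e-16.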
Homework 3: Part 2
1 - Rocket Regression Model
1a.
Create a linear regression to predict y based on x.
##
## Call:
## lm(formula = y ~ x, data = Rocket)
##
## Residuals:
## Min 1Q Median 3Q Max
## -215.98 -50.68 28.74 66.61 106.76
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2627.822 44.184 59.48 < 2e-16 ***
## x -37.154 2.889 -12.86 1.64e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 96.11 on 18 degrees of freedom
## Multiple R-squared: 0.9018, Adjusted R-squared: 0.8964
## F-statistic: 165.4 on 1 and 18 DF, p-value: 1.643e-10
The relationship is as follows:
\[ \hat{Y} = 2627.822 - 37.154 \cdot X \]
1b.
Create the design matrix for this regression model.
## (Intercept) x
## 1 1 15.50
## 2 1 23.75
## 3 1 8.00
## 4 1 17.00
## 5 1 5.50
## 6 1 19.00
## 7 1 24.00
## 8 1 2.50
## 9 1 7.50
## 10 1 11.00
## 11 1 13.00
## 12 1 3.75
## 13 1 25.00
## 14 1 9.75
## 15 1 22.00
## 16 1 18.00
## 17 1 6.00
## 18 1 12.50
## 19 1 2.00
## 20 1 21.50
## attr(,"assign")
## [1] 0 1
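The design matrix depends only on the predictor, so it can be rebuilt in R from the x values listed above without the response. A minimal sketch, assuming the Rocket model is `y ~ x` as in 1a; `model.matrix()` needs only the right-hand side of the formula.

```r
# x values as listed in the design matrix above
x <- c(15.50, 23.75, 8.00, 17.00, 5.50, 19.00, 24.00, 2.50, 7.50, 11.00,
       13.00, 3.75, 25.00, 9.75, 22.00, 18.00, 6.00, 12.50, 2.00, 21.50)

# One-sided formula: intercept column of 1s plus the x column
X <- model.matrix(~ x, data = data.frame(x = x))
dim(X)      # 20 rows, 2 columns
head(X, 3)
```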
2 - Analysis of Leverage
2a.
Calculate the leverage of all data points in the Rocket data set.
## 1 2 3 4 5 6 7
## 0.05412893 0.14750959 0.07598722 0.06195725 0.10586587 0.07872092 0.15225968
## 8 9 10 11 12 13 14
## 0.15663134 0.08105925 0.05504393 0.05011875 0.13350221 0.17238964 0.06179345
## 15 16 17 18 19 20
## 0.11742196 0.06943538 0.09898644 0.05067227 0.16667373 0.10984216
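The leverages are the diagonal elements of the hat matrix H = X(X'X)^{-1}X'. For simple linear regression they also follow the shortcut h_ii = 1/n + (x_i - x̄)²/S_xx. A minimal R sketch that computes both from the x values in the design matrix; `hatvalues()` on the fitted model should give the same numbers, but the response is not needed for leverage.

```r
x <- c(15.50, 23.75, 8.00, 17.00, 5.50, 19.00, 24.00, 2.50, 7.50, 11.00,
       13.00, 3.75, 25.00, 9.75, 22.00, 18.00, 6.00, 12.50, 2.00, 21.50)

# Full hat matrix H = X (X'X)^{-1} X' and its diagonal
X <- cbind(1, x)
H <- X %*% solve(crossprod(X)) %*% t(X)
h_matrix <- diag(H)

# Simple-regression shortcut for the same quantities
n   <- length(x)
Sxx <- sum((x - mean(x))^2)
h_formula <- 1 / n + (x - mean(x))^2 / Sxx

print(round(h_formula, 6))
which.max(h_formula)   # observation with the largest leverage
```

The leverages sum to p = 2, the number of model parameters, and the largest (0.1724) belongs to observation 13, where x = 25 is farthest from the mean of 13.3625.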
3 - Rocket Predictors
3a.
Predict the value of y when x is 25.5. Determine whether this point would be extrapolation.
## 1
## 1680.406
## [,1]
## [1,] 0.1831324
The predicted value is 1680.406. The leverage of the new point x = 25.5 is 0.1831, which is greater than the maximum leverage in the original data set (0.1724, observation 13), so this prediction is an extrapolation.
3b.
Predict the value of y when x is 15. Determine whether this point would be extrapolation.
## 1
## 2070.518
## [,1]
## [1,] 0.05242319
The predicted value is 2070.518. The leverage of x = 15 is 0.0524, which is smaller than the maximum leverage in the Rocket data set (0.1724), so this prediction is not an extrapolation.
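The extrapolation check in 3a and 3b compares the leverage of the new point, h00 = x0'(X'X)^{-1}x0 with x0 = (1, x), against the largest leverage in the data, h_max. A minimal R sketch of that rule, using only the x values from the design matrix; the helper `leverage_new()` is a hypothetical name for illustration.

```r
x <- c(15.50, 23.75, 8.00, 17.00, 5.50, 19.00, 24.00, 2.50, 7.50, 11.00,
       13.00, 3.75, 25.00, 9.75, 22.00, 18.00, 6.00, 12.50, 2.00, 21.50)
X <- cbind(1, x)
XtX_inv <- solve(crossprod(X))

# Diagonal of the hat matrix without forming the full 20 x 20 matrix
h_max <- max(rowSums((X %*% XtX_inv) * X))

# Leverage of a new point: h00 = x0' (X'X)^{-1} x0, with x0 = (1, x_new)
leverage_new <- function(x_new) {
  x0 <- c(1, x_new)
  drop(t(x0) %*% XtX_inv %*% x0)
}

for (x_new in c(25.5, 15)) {
  h00 <- leverage_new(x_new)
  cat(sprintf("x = %.1f: h00 = %.4f, extrapolation: %s\n",
              x_new, h00, h00 > h_max))
}
```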
4 - Cook’s Distance
4b.
Identify the point with the maximum Cook's distance.
## [1] 0.3343769
The maximum value of Cook's distance is 0.334, for observation 5.
4c.
Determine whether there are any outliers within the data set.
## 5 6
## 0.3343769 0.2290842
Cook's distance does not identify any clear outliers: no value exceeds 1, the cutoff commonly used to flag an influential observation. Two points, however, could still be considered influential. A general rule of thumb I have encountered is that any point whose Cook's distance is greater than 3 times the mean Cook's distance may need to be examined further. Points 5 (0.334) and 6 (0.229) are both above that threshold, and both the plot of Cook's distance and the direct calculation flag them.
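Cook's distance combines residual size and leverage: D_i = (r_i²/p) · h_ii/(1 - h_ii), where r_i is the studentized residual and p the number of parameters. The R sketch below checks that formula against `cooks.distance()` and applies the 3 × mean rule of thumb from 4c. It uses simulated data (hypothetical, loosely shaped like the Rocket fit), so the flagged points will not match observations 5 and 6.

```r
set.seed(42)
n <- 20
x <- runif(n, 2, 25)                      # hypothetical predictor
y <- 2600 - 37 * x + rnorm(n, sd = 90)    # hypothetical response
fit <- lm(y ~ x)

# Cook's distance from its definition, using studentized residuals
p <- length(coef(fit))
h <- hatvalues(fit)
r <- rstandard(fit)
D_manual <- (r^2 / p) * h / (1 - h)

D <- cooks.distance(fit)
cutoff  <- 3 * mean(D)                    # rule of thumb from 4c
flagged <- which(D > cutoff)
print(round(D[flagged], 4))
any(D > 1)                                # conventional D_i > 1 cutoff
```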