Homework 3: Part 1

1 - Brokerage Regression Model

We previously found that the brokerage name was not a significant variable, so it is excluded from the regression model.

bro_lm <- lm(Overall_Satisfaction_with_Electronic_Trades ~ Satisfaction_with_Trade_Price + Satisfaction_with_Speed_of_Execution , data = Brokerage)
summary(bro_lm)
## 
## Call:
## lm(formula = Overall_Satisfaction_with_Electronic_Trades ~ Satisfaction_with_Trade_Price + 
##     Satisfaction_with_Speed_of_Execution, data = Brokerage)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.58886 -0.13863 -0.09120  0.05781  0.64613 
## 
## Coefficients:
##                                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                           -0.6633     0.8248  -0.804 0.438318    
## Satisfaction_with_Trade_Price          0.7746     0.1521   5.093 0.000348 ***
## Satisfaction_with_Speed_of_Execution   0.4897     0.2016   2.429 0.033469 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3435 on 11 degrees of freedom
## Multiple R-squared:  0.7256, Adjusted R-squared:  0.6757 
## F-statistic: 14.54 on 2 and 11 DF,  p-value: 0.0008157

The fitted regression equation is:

\[ \hat{Y} = -0.6633 + 0.7746 \cdot \text{Satisfaction_with_Trade_Price} + 0.4897 \cdot \text{Satisfaction_with_Speed_of_Execution} \]

confint(bro_lm, level = 0.80)
##                                            10 %      90 %
## (Intercept)                          -1.7879306 0.4612749
## Satisfaction_with_Trade_Price         0.5672241 0.9819956
## Satisfaction_with_Speed_of_Execution  0.2148115 0.7645252

1a.

We are 80% confident that the true coefficient B1 (Satisfaction_with_Trade_Price) lies between 0.567 and 0.982.

1b.

We are 80% confident that the true coefficient B2 (Satisfaction_with_Speed_of_Execution) lies between 0.215 and 0.765.
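
As a rough check, the `confint()` bounds for the Satisfaction_with_Trade_Price coefficient can be reproduced by hand from the rounded estimate and standard error in the summary table. This is only a sketch: it assumes the usual t-based interval with 11 residual degrees of freedom, and the variable names are illustrative.

```r
# Values copied (rounded) from summary(bro_lm) above
est <- 0.7746      # estimate for Satisfaction_with_Trade_Price
se  <- 0.1521      # its standard error
df  <- 11          # residual degrees of freedom

# An 80% two-sided interval leaves 10% in each tail
t_crit <- qt(0.90, df)

ci_b1 <- c(lower = est - t_crit * se, upper = est + t_crit * se)
round(ci_b1, 3)
```

Because the inputs are rounded, the result should agree with the `confint()` output only to about three decimal places.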

2 - Brokerage Predictions

2a.

obs_for_pred = data.frame(Satisfaction_with_Trade_Price = 4,Satisfaction_with_Speed_of_Execution = 3)

predict(bro_lm,obs_for_pred, interval = "prediction", level = .90, type = "response" )
##        fit      lwr      upr
## 1 3.904117 3.174452 4.633781

We are 90% confident that the overall satisfaction rating of an individual new observation will fall between 3.174 and 4.634 when Satisfaction_with_Trade_Price = 4 and Satisfaction_with_Speed_of_Execution = 3.

2b.

predict(bro_lm, obs_for_pred, interval = "confidence", level = 0.90, type = "response")
##        fit      lwr      upr
## 1 3.904117 3.514362 4.293871

We are 90% confident that the mean overall satisfaction rating lies between 3.514 and 4.294 when Satisfaction_with_Trade_Price = 4 and Satisfaction_with_Speed_of_Execution = 3.
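
The prediction interval in 2a is wider than the confidence interval in 2b because it adds the residual variance to the uncertainty of the fitted mean. The sketch below recovers the 2a interval from the 2b output. It assumes the standard formulas used by `predict.lm()` and uses the rounded residual standard error, so it matches only approximately; the variable names are illustrative.

```r
# Values copied from the 2a/2b output and summary(bro_lm) above
fit    <- 3.904117   # point prediction at (4, 3)
ci_upr <- 4.293871   # upper 90% confidence limit for the mean response
s      <- 0.3435     # residual standard error (rounded)
df     <- 11         # residual degrees of freedom

t_crit <- qt(0.95, df)               # 90% two-sided interval
se_fit <- (ci_upr - fit) / t_crit    # standard error of the fitted mean

# A prediction interval adds the residual variance to the variance of the fit
pi_half <- t_crit * sqrt(s^2 + se_fit^2)
pi_90   <- c(lwr = fit - pi_half, upr = fit + pi_half)
round(pi_90, 3)
```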

2c.

obs_for_pred2 = data.frame(Satisfaction_with_Trade_Price = 3,Satisfaction_with_Speed_of_Execution = 2)

predict(bro_lm,obs_for_pred2, interval = "prediction", level = .85, type = "response" )
##        fit      lwr      upr
## 1 2.639838 1.965909 3.313768

We are 85% confident that the overall satisfaction rating of an individual new observation will fall between 1.966 and 3.314 when Satisfaction_with_Trade_Price = 3 and Satisfaction_with_Speed_of_Execution = 2.

2d.

predict(bro_lm, obs_for_pred2, interval = "confidence", level = 0.90, type = "response")
##        fit      lwr    upr
## 1 2.639838 2.159077 3.1206

We are 90% confident (the level used in the call above) that the mean response lies between 2.159 and 3.121 when Satisfaction_with_Trade_Price = 3 and Satisfaction_with_Speed_of_Execution = 2.

3 - Brokerage Unit Scaling

3.

brokerage_unit_normal = as.data.frame(apply(Brokerage[,2:4], 2, function(x){(x - mean(x))/sd(x)}))

bro_unit_normal <- lm(Overall_Satisfaction_with_Electronic_Trades ~ Satisfaction_with_Trade_Price + Satisfaction_with_Speed_of_Execution , data = brokerage_unit_normal)

bro_unit_normal
## 
## Call:
## lm(formula = Overall_Satisfaction_with_Electronic_Trades ~ Satisfaction_with_Trade_Price + 
##     Satisfaction_with_Speed_of_Execution, data = brokerage_unit_normal)
## 
## Coefficients:
##                          (Intercept)         Satisfaction_with_Trade_Price  
##                            4.115e-16                             8.115e-01  
## Satisfaction_with_Speed_of_Execution  
##                            3.870e-01
summary(bro_unit_normal)
## 
## Call:
## lm(formula = Overall_Satisfaction_with_Electronic_Trades ~ Satisfaction_with_Trade_Price + 
##     Satisfaction_with_Speed_of_Execution, data = brokerage_unit_normal)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.97638 -0.22987 -0.15121  0.09586  1.07134 
## 
## Coefficients:
##                                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                          4.115e-16  1.522e-01   0.000 1.000000    
## Satisfaction_with_Trade_Price        8.115e-01  1.593e-01   5.093 0.000348 ***
## Satisfaction_with_Speed_of_Execution 3.870e-01  1.593e-01   2.429 0.033469 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5695 on 11 degrees of freedom
## Multiple R-squared:  0.7256, Adjusted R-squared:  0.6757 
## F-statistic: 14.54 on 2 and 11 DF,  p-value: 0.0008157
summary(bro_lm)
## 
## Call:
## lm(formula = Overall_Satisfaction_with_Electronic_Trades ~ Satisfaction_with_Trade_Price + 
##     Satisfaction_with_Speed_of_Execution, data = Brokerage)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.58886 -0.13863 -0.09120  0.05781  0.64613 
## 
## Coefficients:
##                                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                           -0.6633     0.8248  -0.804 0.438318    
## Satisfaction_with_Trade_Price          0.7746     0.1521   5.093 0.000348 ***
## Satisfaction_with_Speed_of_Execution   0.4897     0.2016   2.429 0.033469 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3435 on 11 degrees of freedom
## Multiple R-squared:  0.7256, Adjusted R-squared:  0.6757 
## F-statistic: 14.54 on 2 and 11 DF,  p-value: 0.0008157

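In both regression models, Satisfaction_with_Trade_Price is the more influential variable: its coefficient is larger than that of Satisfaction_with_Speed_of_Execution in the original model (0.775 vs. 0.490) and in the standardized model (0.812 vs. 0.387). The standardized comparison is the more reliable one, because those coefficients are expressed in standard-deviation units and do not depend on the scale of each predictor. The t values, p-values, and R-squared are identical across the two models, as expected, since standardizing only rescales the variables.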
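
The link between the two sets of coefficients can be illustrated on simulated data. This sketch does not use the Brokerage data: the data frame `d` and its columns are hypothetical. Each standardized slope equals the raw slope multiplied by sd(x) / sd(y).

```r
# Hypothetical simulated data, used only to illustrate the relationship
set.seed(1)
n <- 50
d <- data.frame(x1 = rnorm(n, 5, 1), x2 = rnorm(n, 3, 2))
d$y <- 1 + 0.8 * d$x1 + 0.5 * d$x2 + rnorm(n)

raw <- lm(y ~ x1 + x2, data = d)                        # original units
std <- lm(y ~ x1 + x2, data = as.data.frame(scale(d)))  # z-scored variables

# Convert the raw slopes to standardized slopes by hand
b_raw  <- coef(raw)[-1]
b_conv <- b_raw * sapply(d[c("x1", "x2")], sd) / sd(d$y)

all.equal(unname(coef(std)[-1]), unname(b_conv))
```

`scale()` is used here instead of the `apply()` call above; both subtract the mean and divide by the sample standard deviation.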

Homework 3: Part 2

1 - Rocket Regression Model

1a.

Create a linear regression to predict y based on x

rock_lm <- lm(y ~ x, data = Rocket)
summary(rock_lm)
## 
## Call:
## lm(formula = y ~ x, data = Rocket)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -215.98  -50.68   28.74   66.61  106.76 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2627.822     44.184   59.48  < 2e-16 ***
## x            -37.154      2.889  -12.86 1.64e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 96.11 on 18 degrees of freedom
## Multiple R-squared:  0.9018, Adjusted R-squared:  0.8964 
## F-statistic: 165.4 on 1 and 18 DF,  p-value: 1.643e-10

The fitted regression equation is:

\[ \hat{Y} = 2627.822 - 37.154 \cdot X \]

1b.

Create the design matrix for this regression model.

Des_Mat = model.matrix(rock_lm)
Des_Mat
##    (Intercept)     x
## 1            1 15.50
## 2            1 23.75
## 3            1  8.00
## 4            1 17.00
## 5            1  5.50
## 6            1 19.00
## 7            1 24.00
## 8            1  2.50
## 9            1  7.50
## 10           1 11.00
## 11           1 13.00
## 12           1  3.75
## 13           1 25.00
## 14           1  9.75
## 15           1 22.00
## 16           1 18.00
## 17           1  6.00
## 18           1 12.50
## 19           1  2.00
## 20           1 21.50
## attr(,"assign")
## [1] 0 1
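
The design matrix is what `lm()` works with internally: the least-squares coefficients solve the normal equations (X'X) b = X'y. The sketch below shows this on hypothetical toy data (not the Rocket data, since the y values are not listed here); `x`, `y`, and `fit` are illustrative names.

```r
# Hypothetical toy data, not the Rocket data
set.seed(7)
x <- c(2, 5, 8, 12, 17, 21, 25)
y <- 2600 - 37 * x + rnorm(length(x), sd = 50)

fit <- lm(y ~ x)
X   <- model.matrix(fit)          # column of 1s plus the x column

# Solve the normal equations (X'X) b = X'y
beta_hat <- solve(t(X) %*% X, t(X) %*% y)

all.equal(as.vector(beta_hat), unname(coef(fit)))
```

`lm()` itself uses a QR decomposition rather than the normal equations, so the two results agree up to numerical tolerance rather than bit for bit.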

2 - Analysis of Leverage

2a.

Calculate the leverage of all data points in the Rocket data set.

leverage_rock <- hatvalues(rock_lm)
leverage_rock
##          1          2          3          4          5          6          7 
## 0.05412893 0.14750959 0.07598722 0.06195725 0.10586587 0.07872092 0.15225968 
##          8          9         10         11         12         13         14 
## 0.15663134 0.08105925 0.05504393 0.05011875 0.13350221 0.17238964 0.06179345 
##         15         16         17         18         19         20 
## 0.11742196 0.06943538 0.09898644 0.05067227 0.16667373 0.10984216

2b.

Find the maximum leverage for Rocket data.

max(leverage_rock)
## [1] 0.1723896

The maximum leverage is 0.172, at observation 13 (x = 25.00).
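
With a single predictor, leverage depends only on how far each x value lies from the mean of x. The sketch below recomputes the leverages from the x column of the design matrix alone, using the closed form for simple linear regression. It should agree with `hatvalues(rock_lm)`; the variable names are illustrative.

```r
# x values copied from the design matrix in 1b
x <- c(15.50, 23.75, 8.00, 17.00, 5.50, 19.00, 24.00, 2.50, 7.50, 11.00,
       13.00, 3.75, 25.00, 9.75, 22.00, 18.00, 6.00, 12.50, 2.00, 21.50)

n   <- length(x)
sxx <- sum((x - mean(x))^2)

# Leverage in simple linear regression: h_i = 1/n + (x_i - mean(x))^2 / Sxx
h <- 1 / n + (x - mean(x))^2 / sxx

which.max(h)       # index of the highest-leverage observation
round(max(h), 4)   # its leverage
sum(h)             # leverages sum to the number of coefficients
```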

3 - Rocket Predictors

3a.

Predict the value of y when x is 25.5. Determine whether this point would be extrapolation.

x_value <- 25.5
predicted_y <- predict(rock_lm, newdata = data.frame(x = x_value))
predicted_y
##        1 
## 1680.406
x_new <- c(1,25.5)
t(x_new)%*%solve(t(Des_Mat)%*%Des_Mat)%*%x_new
##           [,1]
## [1,] 0.1831324

The leverage at x = 25.5 is 0.1831, which exceeds the maximum leverage in the original data set (0.1724), so predicting at this point is extrapolation.

3b.

Predict the value of y when x is 15. Determine whether this point would be extrapolation.

x_value <- 15
predicted_y <- predict(rock_lm, newdata = data.frame(x = x_value))
predicted_y
##        1 
## 2070.518
x_new <- c(1,15)
t(x_new)%*%solve(t(Des_Mat)%*%Des_Mat)%*%x_new
##            [,1]
## [1,] 0.05242319

The leverage at x = 15 is 0.0524, which is smaller than the maximum leverage in the Rocket data set (0.1724), so predicting at this point is not extrapolation.
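
The two checks in 3a and 3b follow the same pattern, so they can be wrapped in a small helper. This is a sketch only: `new_leverage()` is a hypothetical function, not part of any package, and the design matrix is rebuilt from the x values listed in 1b so the block runs on its own.

```r
# x values copied from the design matrix in 1b
x <- c(15.50, 23.75, 8.00, 17.00, 5.50, 19.00, 24.00, 2.50, 7.50, 11.00,
       13.00, 3.75, 25.00, 9.75, 22.00, 18.00, 6.00, 12.50, 2.00, 21.50)

X       <- cbind(1, x)                      # intercept column plus x
XtX_inv <- solve(t(X) %*% X)
h_max   <- max(diag(X %*% XtX_inv %*% t(X)))  # largest leverage in the data

# Hypothetical helper: leverage of a new point x0
new_leverage <- function(x0) {
  x_new <- c(1, x0)
  drop(t(x_new) %*% XtX_inv %*% x_new)
}

x0    <- c(25.5, 15)
h_new <- sapply(x0, new_leverage)

# A new point is flagged when its leverage exceeds the largest one in the data
data.frame(x0 = x0, leverage = h_new, extrapolation = h_new > h_max)
```

With one predictor this check is equivalent to asking whether the new x is farther from the mean of x than any observed value; x = 25.5 also lies just above the largest observed x of 25.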

4 - Cook’s Distance

4a.

Calculate Cook’s Distance for all data points

cooks <- cooks.distance(rock_lm)

4b.

Identify the maximum point

max(cooks)
## [1] 0.3343769

The maximum value of Cook’s distance is 0.334, at observation 5.

4c.

Determine if there are any outliers within the data set

plot(rock_lm, which = 4)

influential <- cooks[(cooks > (3 * mean(cooks, na.rm = TRUE)))]
influential
##         5         6 
## 0.3343769 0.2290842

Cook’s distance does not identify any clear outliers, but two points could be considered influential. A rule of thumb I have encountered is that any point whose Cook’s distance is greater than 3 times the mean Cook’s distance may need to be examined further. Points 5 and 6 both exceed that threshold, and both the plot of Cook’s distance and the simple calculation above point to them.
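
Cook’s distance combines the size of each residual with the leverage of the point. The sketch below computes it by hand and compares it with `cooks.distance()`. It uses hypothetical simulated data rather than the Rocket data, and the 3-times-the-mean cutoff is the same rule of thumb as above, not a formal test.

```r
# Hypothetical simulated data, loosely shaped like the Rocket example
set.seed(42)
x <- runif(20, 2, 25)
y <- 2600 - 37 * x + rnorm(20, sd = 90)
fit <- lm(y ~ x)

e  <- resid(fit)                      # residuals
h  <- hatvalues(fit)                  # leverages
p  <- length(coef(fit))               # number of coefficients (2 here)
s2 <- sum(e^2) / df.residual(fit)     # residual variance estimate

# D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2
d_manual <- e^2 / (p * s2) * h / (1 - h)^2

all.equal(d_manual, cooks.distance(fit))

# Points above 3 times the mean Cook's distance (rule of thumb)
flagged <- d_manual[d_manual > 3 * mean(d_manual)]
flagged
```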