The Brokerage Satisfaction data is loaded into R.

library('xlsx')
Brokerage <- read.xlsx("/Users/jusimioni/Desktop/Brokerage Satisfaction.xlsx", sheetIndex = 1)
Brokerage <- as.data.frame(Brokerage)
class(Brokerage)
## [1] "data.frame"

The column names are as follows.

colnames(Brokerage)
## [1] "X..Brokerage"                               
## [2] "Satisfaction_with_Trade_Price"              
## [3] "Satisfaction_with_Speed_of_Execution"       
## [4] "Overall_Satisfaction_with_Electronic_Trades"

The first rows of the table are shown below.

head(Brokerage)
##                  X..Brokerage Satisfaction_with_Trade_Price
## 1             Scottrade, Inc.                           3.2
## 2              Charles Schwab                           3.3
## 3 Fidelity Brokerage Services                           3.1
## 4               TD Ameritrade                           2.8
## 5           E*Trade Financial                           2.9
## 6                (Not listed)                           2.4
##   Satisfaction_with_Speed_of_Execution
## 1                                  3.1
## 2                                  3.1
## 3                                  3.3
## 4                                  3.5
## 5                                  3.2
## 6                                  3.2
##   Overall_Satisfaction_with_Electronic_Trades
## 1                                         3.2
## 2                                         3.2
## 3                                         4.0
## 4                                         3.7
## 5                                         3.0
## 6                                         2.7

The first step is to fit a regression that predicts Overall Satisfaction with Electronic Trades. Because the data set contains only 14 rows, all rows are used as training data.

model1 <- lm(Overall_Satisfaction_with_Electronic_Trades ~ Satisfaction_with_Trade_Price + Satisfaction_with_Speed_of_Execution, data = Brokerage)
summary(model1)
## 
## Call:
## lm(formula = Overall_Satisfaction_with_Electronic_Trades ~ Satisfaction_with_Trade_Price + 
##     Satisfaction_with_Speed_of_Execution, data = Brokerage)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.58886 -0.13863 -0.09120  0.05781  0.64613 
## 
## Coefficients:
##                                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                           -0.6633     0.8248  -0.804 0.438318    
## Satisfaction_with_Trade_Price          0.7746     0.1521   5.093 0.000348 ***
## Satisfaction_with_Speed_of_Execution   0.4897     0.2016   2.429 0.033469 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3435 on 11 degrees of freedom
## Multiple R-squared:  0.7256, Adjusted R-squared:  0.6757 
## F-statistic: 14.54 on 2 and 11 DF,  p-value: 0.0008157
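
The figures in the summary can also be read from the summary object rather than copied from the printout. The sketch below is a minimal, self-contained version: the data frame `brokerage` and its short column names (`price`, `speed`, `overall`) are illustrative names, with values retyped from the tables printed in this report.

```r
# Values retyped from the printed tables; short column names are illustrative.
brokerage <- data.frame(
  price   = c(3.2, 3.3, 3.1, 2.8, 2.9, 2.4, 2.7, 2.4, 2.6, 2.3, 3.7, 2.5, 3.0, 1.0),
  speed   = c(3.1, 3.1, 3.3, 3.5, 3.2, 3.2, 3.8, 3.7, 2.6, 2.7, 3.9, 2.5, 3.0, 4.0),
  overall = c(3.2, 3.2, 4.0, 3.7, 3.0, 2.7, 2.7, 3.4, 2.7, 2.3, 4.0, 2.5, 3.0, 2.0)
)
fit <- lm(overall ~ price + speed, data = brokerage)
s <- summary(fit)

s$r.squared                                # share of variance explained
s$adj.r.squared                            # adjusted for the number of predictors
s$sigma                                    # residual standard error (rating units)
p_values <- s$coefficients[, "Pr(>|t|)"]   # one p-value per coefficient
p_values
```

Reading `r.squared` and `sigma` separately helps keep the two apart: the first is a proportion of variance, the second is a typical prediction error on the rating scale.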

The regression shows a strong relationship between overall satisfaction and Satisfaction_with_Trade_Price (p-value 0.000348). Satisfaction_with_Speed_of_Execution is also significant at the 5% level (p-value 0.033469, which is less than 0.05), but the evidence is weaker than for trade price. The model explains 72.56% of the variance in overall satisfaction (multiple R-squared 0.7256, adjusted 0.6757), with a residual standard error of 0.3435. Cook’s distance for observation 9 can be computed by hand from its residual and leverage:

model1$residuals[9]^2*hatvalues(model1)[9]/(3*(summary(model1)$sigma)^2*(1 - hatvalues(model1)[9])^2) 
##           9 
## 0.006244236
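
The same hand calculation can be applied to every observation at once and compared with R's built-in `cooks.distance()`. This is a minimal sketch; `brokerage` and its short column names are illustrative, with values retyped from the tables printed in this report.

```r
# Values retyped from the printed tables; short column names are illustrative.
brokerage <- data.frame(
  price   = c(3.2, 3.3, 3.1, 2.8, 2.9, 2.4, 2.7, 2.4, 2.6, 2.3, 3.7, 2.5, 3.0, 1.0),
  speed   = c(3.1, 3.1, 3.3, 3.5, 3.2, 3.2, 3.8, 3.7, 2.6, 2.7, 3.9, 2.5, 3.0, 4.0),
  overall = c(3.2, 3.2, 4.0, 3.7, 3.0, 2.7, 2.7, 3.4, 2.7, 2.3, 4.0, 2.5, 3.0, 2.0)
)
fit <- lm(overall ~ price + speed, data = brokerage)

e  <- residuals(fit)            # raw residuals
h  <- hatvalues(fit)            # leverages
p  <- length(coef(fit))         # number of coefficients (3 here)
s2 <- summary(fit)$sigma^2      # estimated error variance

# Cook's distance: squared residual scaled by leverage and error variance
cooks_by_hand <- e^2 * h / (p * s2 * (1 - h)^2)

all.equal(cooks_by_hand, cooks.distance(fit))
```

The divisor 3 in the one-line calculation above is `p`, the number of estimated coefficients including the intercept.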

A QQ-plot is used to check the normality of the residuals.

qqnorm(model1$residuals, main = "model1") 
qqline(model1$residuals)
abline(h = 0, col = "grey")

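In the plot, the points are the residuals plotted against the theoretical normal quantiles. The black line is the reference line drawn by qqline (through the first and third quartiles), and the grey line is a horizontal line at 0. The points do not follow the reference line closely, which suggests that the residuals are not normally distributed.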
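
To make the visual check more concrete, the coordinates behind the QQ-plot and the reference line can be computed without drawing anything. The sketch below assumes the default `qqline()` behaviour (a line through the first and third quartiles); `brokerage` and its short column names are illustrative, with values retyped from the tables printed in this report.

```r
# Values retyped from the printed tables; short column names are illustrative.
brokerage <- data.frame(
  price   = c(3.2, 3.3, 3.1, 2.8, 2.9, 2.4, 2.7, 2.4, 2.6, 2.3, 3.7, 2.5, 3.0, 1.0),
  speed   = c(3.1, 3.1, 3.3, 3.5, 3.2, 3.2, 3.8, 3.7, 2.6, 2.7, 3.9, 2.5, 3.0, 4.0),
  overall = c(3.2, 3.2, 4.0, 3.7, 3.0, 2.7, 2.7, 3.4, 2.7, 2.3, 4.0, 2.5, 3.0, 2.0)
)
fit <- lm(overall ~ price + speed, data = brokerage)
res <- residuals(fit)

# x = theoretical normal quantiles, y = observed residuals
qq <- qqnorm(res, plot.it = FALSE)

# Reference line through the 25% and 75% quantile pairs (qqline's default)
probs     <- c(0.25, 0.75)
slope     <- unname(diff(quantile(res, probs)) / diff(qnorm(probs)))
intercept <- unname(quantile(res, probs[1]) - slope * qnorm(probs[1]))

# Vertical distance of each point from the reference line
deviation <- qq$y - (intercept + slope * qq$x)
round(sort(deviation), 3)
```

Large deviations at the ends of the sorted vector correspond to points that bend away from the line in the tails of the plot.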

The next step is to analyse the leverage of the data points.

X <- model.matrix(model1)
X
##    (Intercept) Satisfaction_with_Trade_Price
## 1            1                           3.2
## 2            1                           3.3
## 3            1                           3.1
## 4            1                           2.8
## 5            1                           2.9
## 6            1                           2.4
## 7            1                           2.7
## 8            1                           2.4
## 9            1                           2.6
## 10           1                           2.3
## 11           1                           3.7
## 12           1                           2.5
## 13           1                           3.0
## 14           1                           1.0
##    Satisfaction_with_Speed_of_Execution
## 1                                   3.1
## 2                                   3.1
## 3                                   3.3
## 4                                   3.5
## 5                                   3.2
## 6                                   3.2
## 7                                   3.8
## 8                                   3.7
## 9                                   2.6
## 10                                  2.7
## 11                                  3.9
## 12                                  2.5
## 13                                  3.0
## 14                                  4.0
## attr(,"assign")
## [1] 0 1 2

The model matrix shown above is the starting point. The next step is to calculate the average leverage, which equals the number of model coefficients divided by the number of observations (3/14).

mean(hatvalues(model1))
## [1] 0.2142857
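
The leverages are the diagonal of the hat matrix H = X (X'X)^-1 X', so the average above can be reproduced from the model matrix. This is a minimal sketch; `brokerage` and its short column names are illustrative, with values retyped from the tables printed in this report.

```r
# Values retyped from the printed tables; short column names are illustrative.
brokerage <- data.frame(
  price   = c(3.2, 3.3, 3.1, 2.8, 2.9, 2.4, 2.7, 2.4, 2.6, 2.3, 3.7, 2.5, 3.0, 1.0),
  speed   = c(3.1, 3.1, 3.3, 3.5, 3.2, 3.2, 3.8, 3.7, 2.6, 2.7, 3.9, 2.5, 3.0, 4.0),
  overall = c(3.2, 3.2, 4.0, 3.7, 3.0, 2.7, 2.7, 3.4, 2.7, 2.3, 4.0, 2.5, 3.0, 2.0)
)
fit <- lm(overall ~ price + speed, data = brokerage)

X <- model.matrix(fit)                      # columns: intercept, price, speed
H <- X %*% solve(t(X) %*% X) %*% t(X)       # hat matrix
h <- diag(H)                                # leverage of each observation

all.equal(unname(h), unname(hatvalues(fit)))  # same values as hatvalues()
sum(h)                                        # equals the number of coefficients
mean(h)                                       # coefficients / observations
```

Because the leverages always sum to the number of coefficients, the mean leverage depends only on the model size and the sample size, not on the data values.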

The average leverage is used to decide whether a data point has high leverage: a point is classified as high leverage when its leverage is more than twice the average.

cbind(Brokerage, leverage = hatvalues(model1))
##                           X..Brokerage Satisfaction_with_Trade_Price
## 1                      Scottrade, Inc.                           3.2
## 2                       Charles Schwab                           3.3
## 3          Fidelity Brokerage Services                           3.1
## 4                        TD Ameritrade                           2.8
## 5                    E*Trade Financial                           2.9
## 6                         (Not listed)                           2.4
## 7          Vanguard Brokerage Services                           2.7
## 8              USAA Brokerage Services                           2.4
## 9                          Thinkorswim                           2.6
## 10             Wells Fargo Investments                           2.3
## 11                 Interactive Brokers                           3.7
## 12                           Zecco.com                           2.5
## 13                Firstrade Securities                           3.0
## 14 Banc of America Investment Services                           1.0
##    Satisfaction_with_Speed_of_Execution
## 1                                   3.1
## 2                                   3.1
## 3                                   3.3
## 4                                   3.5
## 5                                   3.2
## 6                                   3.2
## 7                                   3.8
## 8                                   3.7
## 9                                   2.6
## 10                                  2.7
## 11                                  3.9
## 12                                  2.5
## 13                                  3.0
## 14                                  4.0
##    Overall_Satisfaction_with_Electronic_Trades   leverage
## 1                                          3.2 0.12226809
## 2                                          3.2 0.14248379
## 3                                          4.0 0.10348052
## 4                                          3.7 0.09498002
## 5                                          3.0 0.07909281
## 6                                          2.7 0.09225511
## 7                                          2.7 0.17268548
## 8                                          3.4 0.14817354
## 9                                          2.7 0.22725402
## 10                                         2.3 0.22639250
## 11                                         4.0 0.45080022
## 12                                         2.5 0.28805237
## 13                                         3.0 0.10586879
## 14                                         2.0 0.74621276
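
The rule of twice the average leverage can also be applied programmatically instead of scanning the table by eye. This is a minimal sketch; `brokerage` and its short column names are illustrative, with values retyped from the table above.

```r
# Values retyped from the printed tables; short column names are illustrative.
brokerage <- data.frame(
  price   = c(3.2, 3.3, 3.1, 2.8, 2.9, 2.4, 2.7, 2.4, 2.6, 2.3, 3.7, 2.5, 3.0, 1.0),
  speed   = c(3.1, 3.1, 3.3, 3.5, 3.2, 3.2, 3.8, 3.7, 2.6, 2.7, 3.9, 2.5, 3.0, 4.0),
  overall = c(3.2, 3.2, 4.0, 3.7, 3.0, 2.7, 2.7, 3.4, 2.7, 2.3, 4.0, 2.5, 3.0, 2.0)
)
fit <- lm(overall ~ price + speed, data = brokerage)

h         <- hatvalues(fit)
threshold <- 2 * mean(h)          # twice the average leverage
high      <- which(h > threshold) # named by observation number

data.frame(observation = names(high), leverage = unname(h[high]))
```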

Based on the leverage table, brokerages 11 and 14 exceed twice the average leverage (0.4285714). Because these data points are unusual, the regression may need to be rebuilt. The next step is to compute Cook’s distance to measure the influence of those points.

cooks.distance(model1)
##            1            2            3            4            5            6 
## 0.0079790547 0.0243407712 0.1518660850 0.0756708677 0.0059271795 0.0012425789 
##            7            8            9           10           11           12 
## 0.2471814307 0.0888808107 0.0062442356 0.0210623352 0.0533835032 0.0000111262 
##           13           14 
## 0.0062752299 0.1601935254

None of the Cook’s distance values is greater than 1.

plot(model1)

In the Residuals vs Leverage chart, brokerage 14 lies far from the other data points. Removing brokerage 14 may help to build a stronger model.
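
One way to explore that suggestion is to refit the model without row 14 and compare it with the full fit. The sketch below only sets up the comparison and does not claim a particular outcome; `brokerage`, `fit_all`, `fit_drop14`, and the short column names are illustrative, with values retyped from the tables printed in this report.

```r
# Values retyped from the printed tables; short column names are illustrative.
brokerage <- data.frame(
  price   = c(3.2, 3.3, 3.1, 2.8, 2.9, 2.4, 2.7, 2.4, 2.6, 2.3, 3.7, 2.5, 3.0, 1.0),
  speed   = c(3.1, 3.1, 3.3, 3.5, 3.2, 3.2, 3.8, 3.7, 2.6, 2.7, 3.9, 2.5, 3.0, 4.0),
  overall = c(3.2, 3.2, 4.0, 3.7, 3.0, 2.7, 2.7, 3.4, 2.7, 2.3, 4.0, 2.5, 3.0, 2.0)
)

fit_all    <- lm(overall ~ price + speed, data = brokerage)
fit_drop14 <- lm(overall ~ price + speed, data = brokerage[-14, ])  # row 14 removed

# Side-by-side coefficients and adjusted R-squared for the two fits
cbind(all = coef(fit_all), without_14 = coef(fit_drop14))
c(all        = summary(fit_all)$adj.r.squared,
  without_14 = summary(fit_drop14)$adj.r.squared)
```

If the coefficients shift noticeably between the two columns, that is a sign that observation 14 is pulling the fit, even though its Cook’s distance is below 1.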

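The leverage of several new cases is computed next, as a check on whether predicting them with model1 would involve extrapolation. Each x_new follows the column order of the model matrix X: intercept, Satisfaction_with_Trade_Price, Satisfaction_with_Speed_of_Execution. The first case is Satisfaction_with_Trade_Price = 2, Satisfaction_with_Speed_of_Execution = 4.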

x_new = c(1, 2, 4)
t(x_new)%*%solve(t(X)%*%X)%*%x_new
##           [,1]
## [1,] 0.3236157

Case 2 - Satisfaction_with_Trade_Price = 3, Satisfaction_with_Speed_of_Execution = 5

x_new = c(1, 3, 5)
t(x_new)%*%solve(t(X)%*%X)%*%x_new
##          [,1]
## [1,] 1.169531

Case 3 - Satisfaction_with_Trade_Price = 3, Satisfaction_with_Speed_of_Execution = 4

x_new = c(1, 3, 4)
t(x_new)%*%solve(t(X)%*%X)%*%x_new
##           [,1]
## [1,] 0.2932324

Case 4 - Satisfaction_with_Trade_Price = 2, Satisfaction_with_Speed_of_Execution = 3

x_new = c(1, 2, 3)
t(x_new)%*%solve(t(X)%*%X)%*%x_new
##           [,1]
## [1,] 0.2047188
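
The quantities computed above are leverages of the new points, not predicted satisfaction scores. The predictions themselves could be obtained with `predict()`, as in the sketch below. It reads each x_new in the column order of the model matrix (intercept, trade price, speed of execution), which is an interpretation of the code above; `brokerage`, `new_cases`, and the short column names are illustrative, with values retyped from the tables printed in this report.

```r
# Values retyped from the printed tables; short column names are illustrative.
brokerage <- data.frame(
  price   = c(3.2, 3.3, 3.1, 2.8, 2.9, 2.4, 2.7, 2.4, 2.6, 2.3, 3.7, 2.5, 3.0, 1.0),
  speed   = c(3.1, 3.1, 3.3, 3.5, 3.2, 3.2, 3.8, 3.7, 2.6, 2.7, 3.9, 2.5, 3.0, 4.0),
  overall = c(3.2, 3.2, 4.0, 3.7, 3.0, 2.7, 2.7, 3.4, 2.7, 2.3, 4.0, 2.5, 3.0, 2.0)
)
fit <- lm(overall ~ price + speed, data = brokerage)

# Cases 1 to 4, read as (price, speed) from x_new = c(1, price, speed)
new_cases <- data.frame(price = c(2, 3, 3, 2),
                        speed = c(4, 5, 4, 3))

# Point prediction plus a 95% prediction interval for each case
pred <- predict(fit, newdata = new_cases, interval = "prediction")
pred
```

The width of each prediction interval grows with the leverage of the new point, so the interval for a high-leverage case is a useful companion to the leverage value itself.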

Extrapolation occurs when a prediction is made outside the region of the data used to train the model. This is “risky” because there is no prior data showing what tends to happen in the situation being predicted. To determine whether a prediction involves extrapolation, the first step is to find the largest leverage among the training data points.

# Store the leverages under a new name so the Brokerage data is not overwritten
leverage_df <- as.data.frame(hatvalues(model1))
leverage_df
##    hatvalues(model1)
## 1         0.12226809
## 2         0.14248379
## 3         0.10348052
## 4         0.09498002
## 5         0.07909281
## 6         0.09225511
## 7         0.17268548
## 8         0.14817354
## 9         0.22725402
## 10        0.22639250
## 11        0.45080022
## 12        0.28805237
## 13        0.10586879
## 14        0.74621276

Sorting the leverages in descending order:

leverage_df[order(-leverage_df[["hatvalues(model1)"]]), ]
##  [1] 0.74621276 0.45080022 0.28805237 0.22725402 0.22639250 0.17268548
##  [7] 0.14817354 0.14248379 0.12226809 0.10586879 0.10348052 0.09498002
## [13] 0.09225511 0.07909281

The largest leverage among the training data points in the Brokerage Satisfaction file is 0.74621276.
A new point is an extrapolation when its leverage exceeds this maximum. Of the cases evaluated above, only case 2 does so (1.169531 is greater than 0.74621276), so it is an extrapolation. Cases 1, 3, and 4 have leverages below the maximum and are not.
Note - predictions that involve extrapolation should be treated with caution.
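
The comparison against the largest training leverage can be done for all four cases in one pass. This is a minimal sketch; `brokerage`, `new_points`, and the short column names are illustrative, with values retyped from the tables printed in this report, and each row of `new_points` repeats the corresponding x_new vector.

```r
# Values retyped from the printed tables; short column names are illustrative.
brokerage <- data.frame(
  price   = c(3.2, 3.3, 3.1, 2.8, 2.9, 2.4, 2.7, 2.4, 2.6, 2.3, 3.7, 2.5, 3.0, 1.0),
  speed   = c(3.1, 3.1, 3.3, 3.5, 3.2, 3.2, 3.8, 3.7, 2.6, 2.7, 3.9, 2.5, 3.0, 4.0),
  overall = c(3.2, 3.2, 4.0, 3.7, 3.0, 2.7, 2.7, 3.4, 2.7, 2.3, 4.0, 2.5, 3.0, 2.0)
)
fit <- lm(overall ~ price + speed, data = brokerage)

X       <- model.matrix(fit)
XtX_inv <- solve(crossprod(X))   # (X'X)^-1

# Each row is c(intercept, price, speed), matching the x_new vectors
new_points <- rbind(case1 = c(1, 2, 4),
                    case2 = c(1, 3, 5),
                    case3 = c(1, 3, 4),
                    case4 = c(1, 2, 3))

# Leverage of each new point: x' (X'X)^-1 x
h_new <- apply(new_points, 1, function(x) drop(t(x) %*% XtX_inv %*% x))
h_max <- max(hatvalues(fit))     # largest leverage in the training data

data.frame(leverage = h_new, extrapolation = h_new > h_max)
```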