The Brokerage Satisfaction data is loaded into R.
library('xlsx')
Brokerage <- read.xlsx("/Users/jusimioni/Desktop/Brokerage Satisfaction.xlsx", sheetIndex = 1)
Brokerage <- as.data.frame(Brokerage)
Brokerage <- Brokerage[,-1]
The first step is to fit a regression model that predicts the Overall Satisfaction with Electronic Trades. Since the data set does not contain many rows, all rows will be used as training data.
model1 <- lm(Overall_Satisfaction_with_Electronic_Trades ~ Satisfaction_with_Speed_of_Execution + Satisfaction_with_Trade_Price, data = Brokerage)
summary(model1)
##
## Call:
## lm(formula = Overall_Satisfaction_with_Electronic_Trades ~ Satisfaction_with_Speed_of_Execution +
## Satisfaction_with_Trade_Price, data = Brokerage)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.58886 -0.13863 -0.09120 0.05781 0.64613
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.6633 0.8248 -0.804 0.438318
## Satisfaction_with_Speed_of_Execution 0.4897 0.2016 2.429 0.033469 *
## Satisfaction_with_Trade_Price 0.7746 0.1521 5.093 0.000348 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3435 on 11 degrees of freedom
## Multiple R-squared: 0.7256, Adjusted R-squared: 0.6757
## F-statistic: 14.54 on 2 and 11 DF, p-value: 0.0008157
The model above yields a fitted function of the form B1(Satisfaction_with_Speed_of_Execution) + B2(Satisfaction_with_Trade_Price) + B3, where B1, B2, and B3 are real numbers. From the summary, the estimates are B1 = 0.4897, B2 = 0.7746, and B3 = -0.6633.
confint(model1, level = 0.80)
## 10 % 90 %
## (Intercept) -1.7879306 0.4612749
## Satisfaction_with_Speed_of_Execution 0.2148115 0.7645252
## Satisfaction_with_Trade_Price 0.5672241 0.9819956
We are 80% confident that the true coefficient B1 lies between 0.2148115 and 0.7645252, and 80% confident that the true coefficient B2 lies between 0.5672241 and 0.9819956.
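As a rough check, the 80% limits above can be reproduced by hand from the coefficient table, using estimate ± t × standard error with 11 residual degrees of freedom. The sketch below hard-codes the rounded estimates and standard errors printed by summary(model1), so it should agree with confint() only to about three decimals; the variable names are illustrative and not part of the original analysis.

```r
# Rounded values copied from the summary(model1) coefficient table
est <- c(speed = 0.4897, price = 0.7746)
se  <- c(speed = 0.2016, price = 0.1521)
df_resid <- 11                      # residual degrees of freedom

# An 80% two-sided interval leaves 10% in each tail
t_crit <- qt(0.90, df = df_resid)

ci_80 <- cbind(lower = est - t_crit * se,
               upper = est + t_crit * se)
print(round(ci_80, 4))
```

The column labels "10 %" and "90 %" in the confint() output refer to these same tail quantiles.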
obs = data.frame(Satisfaction_with_Speed_of_Execution = c(3), Satisfaction_with_Trade_Price = c(4))
obs
## Satisfaction_with_Speed_of_Execution Satisfaction_with_Trade_Price
## 1 3 4
Obtain the prediction for the new observation.
predict(model1, obs, type = "response")
## 1
## 3.904117
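The point prediction is simply the fitted equation evaluated at the new covariate values. A minimal sketch of that arithmetic, using the rounded coefficients from the summary output (so the result matches predict() only to about three decimals; the variable names are illustrative):

```r
# Rounded coefficient estimates copied from summary(model1)
b0      <- -0.6633   # intercept
b_speed <-  0.4897   # Satisfaction_with_Speed_of_Execution
b_price <-  0.7746   # Satisfaction_with_Trade_Price

# New observation: speed = 3, trade price = 4
y_hat <- b0 + b_speed * 3 + b_price * 4
print(y_hat)
```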
Obtain the prediction interval for the new observation.
predict(model1, obs, interval = "predict", level = 0.90, type = "response")
## fit lwr upr
## 1 3.904117 3.174452 4.633781
We are 90% confident that a new observation of Overall_Satisfaction_with_Electronic_Trades with these covariate values will fall between 3.174452 and 4.633781.
b) Obtain the confidence interval for the mean response when the Satisfaction_with_Speed_of_Execution is 3 and the Satisfaction_with_Trade_Price is 4.
predict(model1, obs, interval = "confidence", level = 0.90, type = "response")
## fit lwr upr
## 1 3.904117 3.514362 4.293871
We are 90% confident that the mean response (i.e., the mean value of the target variable at these covariate values) lies between 3.514362 and 4.293871.
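The prediction interval is wider than the confidence interval because it adds the residual variance to the uncertainty of the fitted mean. The sketch below illustrates this using only numbers printed in this report: it backs out the standard error of the fitted mean from the confidence interval, then rebuilds the prediction interval. The residual standard error is rounded, so the agreement is approximate, and the variable names are illustrative.

```r
# Values printed earlier in this report
fit      <- 3.904117   # point prediction at speed = 3, trade price = 4
ci_upper <- 4.293871   # upper 90% confidence limit for the mean response
sigma    <- 0.3435     # residual standard error (rounded)
df_resid <- 11         # residual degrees of freedom

t_crit  <- qt(0.95, df = df_resid)     # 90% two-sided interval
se_mean <- (ci_upper - fit) / t_crit   # standard error of the fitted mean

# A new observation carries the residual variance on top of se_mean^2
se_pred <- sqrt(se_mean^2 + sigma^2)

pi_90 <- c(lwr = fit - t_crit * se_pred, upr = fit + t_crit * se_pred)
print(round(pi_90, 3))
```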
c) Suppose that we want to use the regression model created in the preceding question to predict the Overall_Satisfaction_with_Electronic_Trades when the Satisfaction_with_Speed_of_Execution is 2, and the Satisfaction_with_Trade_Price is 3.
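# Note: obs2 holds 3 and 4 (the same values as obs), not the 2 and 3 stated in the question, so the output below is for 3 and 4
obs2 = data.frame(Satisfaction_with_Speed_of_Execution = c(3), Satisfaction_with_Trade_Price = c(4))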
obs2
## Satisfaction_with_Speed_of_Execution Satisfaction_with_Trade_Price
## 1 3 4
Obtain the prediction intervals for the new observation.
predict(model1, obs2, interval = "predict", level = 0.85, type = "response")
## fit lwr upr
## 1 3.904117 3.275346 4.532887
We are 85% confident that a new observation with the covariate values in obs2 will fall between 3.275346 and 4.532887.
predict(model1, obs2, interval = "confidence", level = 0.85, type = "response")
## fit lwr upr
## 1 3.904117 3.568255 4.239978
We are 85% confident that the mean response at the covariate values in obs2 lies between 3.568255 and 4.239978.
Use unit normal scaling to calculate standardized regression coefficients for the model that you created in #1. Based on these coefficients, which covariate is more influential in predicting overall satisfaction? Is the Satisfaction_with_Speed_of_Execution more influential than the Satisfaction_with_Trade_Price? Or is the Satisfaction_with_Trade_Price more influential than the Satisfaction_with_Speed_of_Execution?
summary(model1)
##
## Call:
## lm(formula = Overall_Satisfaction_with_Electronic_Trades ~ Satisfaction_with_Speed_of_Execution +
## Satisfaction_with_Trade_Price, data = Brokerage)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.58886 -0.13863 -0.09120 0.05781 0.64613
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.6633 0.8248 -0.804 0.438318
## Satisfaction_with_Speed_of_Execution 0.4897 0.2016 2.429 0.033469 *
## Satisfaction_with_Trade_Price 0.7746 0.1521 5.093 0.000348 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3435 on 11 degrees of freedom
## Multiple R-squared: 0.7256, Adjusted R-squared: 0.6757
## F-statistic: 14.54 on 2 and 11 DF, p-value: 0.0008157
# Transform the data using unit normal scaling
broker_unit_normal = as.data.frame(apply(Brokerage, 2, function(X){(X - mean(X))/sd(X)}))
# Redo regression
model1_unit_normal <- lm(Overall_Satisfaction_with_Electronic_Trades ~ Satisfaction_with_Speed_of_Execution +
Satisfaction_with_Trade_Price, data = broker_unit_normal)
# Obtain standardized regression coefficients
model1_unit_normal
##
## Call:
## lm(formula = Overall_Satisfaction_with_Electronic_Trades ~ Satisfaction_with_Speed_of_Execution +
## Satisfaction_with_Trade_Price, data = broker_unit_normal)
##
## Coefficients:
## (Intercept) Satisfaction_with_Speed_of_Execution
## 4.115e-16 3.870e-01
## Satisfaction_with_Trade_Price
## 8.115e-01
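Satisfaction_with_Trade_Price is more influential in predicting the Overall Satisfaction: its standardized coefficient (0.8115) is roughly twice that of Satisfaction_with_Speed_of_Execution (0.3870).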
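Refitting on unit-normal-scaled data is equivalent to rescaling each raw slope by sd(x) / sd(y). The sketch below illustrates this on a small synthetic data set, since the Brokerage file is not reproduced here; the data, the names toy, x1, x2, y, and the simulated coefficients are all made up for illustration.

```r
# Synthetic data (hypothetical, for illustration only)
set.seed(1)
n  <- 50
x1 <- rnorm(n, mean = 3, sd = 0.5)
x2 <- rnorm(n, mean = 3, sd = 0.8)
y  <- 1 + 0.5 * x1 + 0.8 * x2 + rnorm(n, sd = 0.3)
toy <- data.frame(y, x1, x2)

# Route 1: scale every column to mean 0 and sd 1, then refit
toy_std <- as.data.frame(scale(toy))
fit_std <- lm(y ~ x1 + x2, data = toy_std)

# Route 2: rescale the raw slopes by sd(x) / sd(y)
fit_raw    <- lm(y ~ x1 + x2, data = toy)
by_formula <- coef(fit_raw)[c("x1", "x2")] *
  c(sd(toy$x1), sd(toy$x2)) / sd(toy$y)

print(rbind(refit = coef(fit_std)[c("x1", "x2")], formula = by_formula))
```

Both rows of the printed table should agree up to floating-point error, which is why the standardized coefficients can be compared directly across covariates measured on different scales.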
The following analysis uses the data_RocketProp.csv file.
rocket <- read.csv("/Users/jusimioni/Desktop/data_RocketProp.csv")
# Linear regression
model <- lm(y ~ x, data = rocket)
summary(model)
##
## Call:
## lm(formula = y ~ x, data = rocket)
##
## Residuals:
## Min 1Q Median 3Q Max
## -215.98 -50.68 28.74 66.61 106.76
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2627.822 44.184 59.48 < 2e-16 ***
## x -37.154 2.889 -12.86 1.64e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 96.11 on 18 degrees of freedom
## Multiple R-squared: 0.9018, Adjusted R-squared: 0.8964
## F-statistic: 165.4 on 1 and 18 DF, p-value: 1.643e-10
The regression model shows a strong negative relationship between x and y (slope -37.154, p-value 1.64e-10) and explains 90.18% of the variance in y.
# Design matrix
X <- model.matrix(model)
X
## (Intercept) x
## 1 1 15.50
## 2 1 23.75
## 3 1 8.00
## 4 1 17.00
## 5 1 5.50
## 6 1 19.00
## 7 1 24.00
## 8 1 2.50
## 9 1 7.50
## 10 1 11.00
## 11 1 13.00
## 12 1 3.75
## 13 1 25.00
## 14 1 9.75
## 15 1 22.00
## 16 1 18.00
## 17 1 6.00
## 18 1 12.50
## 19 1 2.00
## 20 1 21.50
## attr(,"assign")
## [1] 0 1
Calculate the leverage for the data points.
cbind(rocket, leverage = hatvalues(model))
## y x leverage
## 1 2158.70 15.50 0.05412893
## 2 1678.15 23.75 0.14750959
## 3 2316.00 8.00 0.07598722
## 4 2061.30 17.00 0.06195725
## 5 2207.50 5.50 0.10586587
## 6 1708.30 19.00 0.07872092
## 7 1784.70 24.00 0.15225968
## 8 2575.00 2.50 0.15663134
## 9 2357.90 7.50 0.08105925
## 10 2256.70 11.00 0.05504393
## 11 2165.20 13.00 0.05011875
## 12 2399.55 3.75 0.13350221
## 13 1779.80 25.00 0.17238964
## 14 2336.75 9.75 0.06179345
## 15 1765.30 22.00 0.11742196
## 16 2053.50 18.00 0.06943538
## 17 2414.40 6.00 0.09898644
## 18 2200.50 12.50 0.05067227
## 19 2654.20 2.00 0.16667373
## 20 1753.70 21.50 0.10984216
The maximum leverage value for the model:
max(hatvalues(model))
## [1] 0.1723896
The maximum leverage is 0.1723896, attained by observation 13 (x = 25.00).
Compute the leverage of a new point at x = 25.5 to check whether predicting y there would be extrapolation.
x_new = c(1, 25.5)
t(x_new)%*%solve(t(X)%*%X)%*%x_new
## [,1]
## [1,] 0.1831324
A prediction at x = 25.5 is considered extrapolation, since the leverage of the new point (0.1831324) is larger than the maximum leverage in the data (0.1723896).
Compute the leverage of a new point at x = 15.
x_new1 = c(1, 15)
t(x_new1)%*%solve(t(X)%*%X)%*%x_new1
## [,1]
## [1,] 0.05242319
A prediction at x = 15 is not extrapolation, since the leverage of the new point (0.05242319) is smaller than the maximum leverage in the data (0.1723896).
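For a simple linear regression, the matrix expression used above reduces to 1/n + (x0 - mean(x))^2 / Sxx, which makes it clear that leverage grows with distance from the mean of x. The sketch below checks this using the x values listed in the design matrix; the helper name leverage_at is illustrative and not part of the original analysis.

```r
# x values as listed in the design matrix output
x <- c(15.50, 23.75, 8.00, 17.00, 5.50, 19.00, 24.00, 2.50, 7.50, 11.00,
       13.00, 3.75, 25.00, 9.75, 22.00, 18.00, 6.00, 12.50, 2.00, 21.50)

n     <- length(x)
x_bar <- mean(x)
Sxx   <- sum((x - x_bar)^2)

# Leverage of a (possibly new) point x0 in simple linear regression
leverage_at <- function(x0) 1 / n + (x0 - x_bar)^2 / Sxx

h_max  <- max(leverage_at(x))   # largest leverage among the observed points
h_25_5 <- leverage_at(25.5)     # new point beyond the observed range
h_15   <- leverage_at(15)       # new point near the centre of the data

print(c(h_max = h_max, h_25_5 = h_25_5, h_15 = h_15))
print(c(extrapolation_25_5 = h_25_5 > h_max, extrapolation_15 = h_15 > h_max))
```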
Calculate Cook’s distance for all datapoints.
cooks.distance(model)
## 1 2 3 4 5 6
## 0.0373281981 0.0497291858 0.0010260760 0.0161482719 0.3343768993 0.2290842436
## 7 8 9 10 11 12
## 0.0270491200 0.0191323748 0.0003959877 0.0047094549 0.0012482345 0.0761514881
## 13 14 15 16 17 18
## 0.0889892211 0.0192517639 0.0166302585 0.0387158541 0.0005955991 0.0041888627
## 19 20
## 0.1317143774 0.0425721512
max(cooks.distance(model))
## [1] 0.3343769
The maximum value of Cook’s distance is 0.3343768993 (observation 5). Since no observation has a Cook’s distance above 1, there are no influential points that we should be concerned about.
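Cook’s distance combines the size of a residual with the leverage of the point: D_i = e_i^2 / (p s^2) × h_i / (1 - h_i)^2, where p is the number of coefficients and s^2 the residual variance. The sketch below recomputes it by hand and compares it with cooks.distance(). It rebuilds the data frame from the y and x values printed in the leverage table rather than reading the CSV, which assumes the printed values match the file; the name rocket_tbl is illustrative.

```r
# y and x values as printed in the leverage table
rocket_tbl <- data.frame(
  y = c(2158.70, 1678.15, 2316.00, 2061.30, 2207.50, 1708.30, 1784.70,
        2575.00, 2357.90, 2256.70, 2165.20, 2399.55, 1779.80, 2336.75,
        1765.30, 2053.50, 2414.40, 2200.50, 2654.20, 1753.70),
  x = c(15.50, 23.75, 8.00, 17.00, 5.50, 19.00, 24.00, 2.50, 7.50, 11.00,
        13.00, 3.75, 25.00, 9.75, 22.00, 18.00, 6.00, 12.50, 2.00, 21.50)
)

fit <- lm(y ~ x, data = rocket_tbl)

e  <- residuals(fit)            # raw residuals
h  <- hatvalues(fit)            # leverages
p  <- length(coef(fit))         # number of coefficients (2)
s2 <- sum(e^2) / df.residual(fit)

# Cook's distance from its definition
d_manual <- e^2 / (p * s2) * h / (1 - h)^2

print(all.equal(unname(d_manual), unname(cooks.distance(fit))))
print(which.max(d_manual))
```

Observation 5 has only moderate leverage but the largest residual in absolute value, which is why it attains the largest Cook’s distance.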