Read the Excel file titled Brokerage Satisfaction into R and create a regression model to predict Overall_Satisfaction_with_Electronic_Trades from Satisfaction_with_Trade_Price and Satisfaction_with_Speed_of_Execution. The Brokerage column was removed because it contains non-numeric data. The model formula ~ . uses every remaining column as a predictor, so leaving that column in would add it to the model.
# Read the Excel file into R
library(readxl)
brokerage3 <- read_excel("G:/Other computers/My Laptop/Documents/Richard 621/Week 3/Brokerage Satisfaction.xlsx")
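# Eliminate the non-numeric Brokerage column. Keeping only numeric columns avoids
# depending on the exact, encoding-sensitive spelling of that column's name.
brokerage3 <- brokerage3[sapply(brokerage3, is.numeric)]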
# Create a regression model
brokerage_3 <- lm(Overall_Satisfaction_with_Electronic_Trades ~ ., data = brokerage3)
summary(brokerage_3)
##
## Call:
## lm(formula = Overall_Satisfaction_with_Electronic_Trades ~ .,
## data = brokerage3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.58886 -0.13863 -0.09120 0.05781 0.64613
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.6633 0.8248 -0.804 0.438318
## Satisfaction_with_Trade_Price 0.7746 0.1521 5.093 0.000348 ***
## Satisfaction_with_Speed_of_Execution 0.4897 0.2016 2.429 0.033469 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3435 on 11 degrees of freedom
## Multiple R-squared: 0.7256, Adjusted R-squared: 0.6757
## F-statistic: 14.54 on 2 and 11 DF, p-value: 0.0008157
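Prediction equation, where B1 and B2 are the slopes for the two predictors and B3 is the intercept:
\[\widehat{\text{Overall Satisfaction}} = B_1(\text{Satisfaction with Speed of Execution}) + B_2(\text{Satisfaction with Trade Price}) + B_3\]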
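Substituting the estimates printed by summary(brokerage_3) above (rounded to four decimals) gives the fitted prediction equation:

\[\widehat{\text{Overall Satisfaction}} = 0.4897(\text{Satisfaction with Speed of Execution}) + 0.7746(\text{Satisfaction with Trade Price}) - 0.6633\]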
Below, 80% confidence intervals are calculated for B1 and B2. confint() also reports one for the intercept, B3.
# 80% confidence intervals for the regression coefficients
confint(brokerage_3, level=0.80)
## 10 % 90 %
## (Intercept) -1.7879306 0.4612749
## Satisfaction_with_Trade_Price 0.5672241 0.9819956
## Satisfaction_with_Speed_of_Execution 0.2148115 0.7645252
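The intervals above follow the usual formula estimate ± t(1 − α/2, df) × SE. The sketch below reproduces them by hand. It assumes only the rounded estimates and standard errors printed by summary(brokerage_3) and the 11 residual degrees of freedom, so the limits agree with confint() to about three decimals.

```r
# Reproduce confint(brokerage_3, level = 0.80) by hand:
# estimate +/- t_{1 - alpha/2, df} * SE, with alpha = 0.20 and df = 11.
# Estimates and standard errors are copied (rounded) from summary(brokerage_3).
est <- c(intercept = -0.6633, trade_price = 0.7746, speed = 0.4897)
se  <- c(intercept =  0.8248, trade_price = 0.1521, speed = 0.2016)
df_resid <- 11
level <- 0.80

# Two-sided critical value: qt(0.90, 11), roughly 1.363
t_crit <- qt(1 - (1 - level) / 2, df = df_resid)

# Lower and upper limits for each coefficient
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(round(ci, 4))
```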
The regression model from Problem 1 is used below to predict the response for new observations stored in a data frame. For each observation it also computes a prediction interval for a single new response and a confidence interval for the mean response.
# Data frame for new observations
Overall_Sat_predict_3=data.frame(Satisfaction_with_Trade_Price=c(4), Satisfaction_with_Speed_of_Execution=c(3))
Overall_Sat_predict_3
## Satisfaction_with_Trade_Price Satisfaction_with_Speed_of_Execution
## 1 4 3
# Prediction for new observations
predict(brokerage_3,Overall_Sat_predict_3, type="response")
## 1
## 3.904117
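The point prediction is the inner product of the new observation's row, x0 = (1, trade price, speed), and the coefficient vector. The sketch below uses the rounded coefficients from the summary output, so it matches predict() to about three decimals. If the fitted model is available, sum(c(1, 4, 3) * coef(brokerage_3)) reproduces the value exactly, because coef() returns the coefficients in the same order.

```r
# Point prediction by hand: y_hat = x0' b, with x0 = (1, trade price, speed).
# Coefficients are copied (rounded to 4 decimals) from summary(brokerage_3).
b  <- c(-0.6633, 0.7746, 0.4897)   # (Intercept), Trade_Price, Speed_of_Execution
x0 <- c(1, 4, 3)                   # intercept term, Trade_Price = 4, Speed = 3
y_hat <- sum(x0 * b)
print(y_hat)
```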
# Prediction interval for new observations
predict(brokerage_3, Overall_Sat_predict_3, interval = "prediction", level = 0.90, type = "response")
## fit lwr upr
## 1 3.904117 3.174452 4.633781
# 90% confidence interval for the mean response at the new observation
predict(brokerage_3, Overall_Sat_predict_3, interval = "confidence", level = 0.90, type = "response")
## fit lwr upr
## 1 3.904117 3.514362 4.293871
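The prediction interval is wider than the confidence interval because it also accounts for the noise in a single new response. With se_fit denoting the standard error of the fitted mean, the CI half-width is t × se_fit and the PI half-width is t × sqrt(s² + se_fit²). The sketch below recovers se_fit from the printed 90% confidence limit and rebuilds the prediction limits. It uses only numbers copied from the output above, so agreement is to about four decimals.

```r
# Relationship between the 90% confidence and prediction intervals at x0.
# CI half-width = t * se_fit;  PI half-width = t * sqrt(s^2 + se_fit^2),
# where s is the residual standard error. Values are copied from the output above.
fit    <- 3.904117        # point prediction
ci_up  <- 4.293871        # upper 90% confidence limit for the mean response
s      <- 0.3435          # residual standard error (11 df)
t_crit <- qt(0.95, df = 11)

se_fit  <- (ci_up - fit) / t_crit          # standard error of the fitted mean
pi_half <- t_crit * sqrt(s^2 + se_fit^2)   # adds the new observation's own variance
pi_limits <- c(lwr = fit - pi_half, upr = fit + pi_half)
print(round(pi_limits, 4))
```

With the model object at hand, predict(brokerage_3, Overall_Sat_predict_3, se.fit = TRUE) returns se.fit directly instead of back-solving it.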
# Data frame for new observations
Overall_Sat_predict_4=data.frame(Satisfaction_with_Trade_Price=c(3), Satisfaction_with_Speed_of_Execution=c(2))
Overall_Sat_predict_4
## Satisfaction_with_Trade_Price Satisfaction_with_Speed_of_Execution
## 1 3 2
# Point prediction for the new observation
predict(brokerage_3,Overall_Sat_predict_4, type="response")
## 1
## 2.639838
# Prediction interval for new observations
predict(brokerage_3, Overall_Sat_predict_4, interval = "prediction", level = 0.85, type = "response")
## fit lwr upr
## 1 2.639838 1.965909 3.313768
# 85% confidence interval for the mean response at the new observation
predict(brokerage_3, Overall_Sat_predict_4, interval = "confidence", level = 0.85, type = "response")
## fit lwr upr
## 1 2.639838 2.225554 3.054123
Using unit normal scaling, Satisfaction_with_Trade_Price was found to be the more influential predictor. Its standardized coefficient (0.8115) is about twice that of Satisfaction_with_Speed_of_Execution (0.3870).
# Standardized regression coefficients via unit normal scaling; first inspect the data
head(brokerage3)
## # A tibble: 6 × 3
## Satisfaction_with_Trade_Price Satisfaction_with_Speed_of_Exe… Overall_Satisfa…
## <dbl> <dbl> <dbl>
## 1 3.2 3.1 3.2
## 2 3.3 3.1 3.2
## 3 3.1 3.3 4
## 4 2.8 3.5 3.7
## 5 2.9 3.2 3
## 6 2.4 3.2 2.7
# transform the data using unit normal scaling
brokerage3_unit_normal = as.data.frame(apply(brokerage3, 2, function(x){(x - mean(x))/sd(x)}))
# redo regression
brokerage_3_unit_normal <- lm(Overall_Satisfaction_with_Electronic_Trades ~ ., data = brokerage3_unit_normal)
# obtain standardized regression coefficients
brokerage_3_unit_normal
##
## Call:
## lm(formula = Overall_Satisfaction_with_Electronic_Trades ~ .,
## data = brokerage3_unit_normal)
##
## Coefficients:
## (Intercept) Satisfaction_with_Trade_Price
## 4.115e-16 8.115e-01
## Satisfaction_with_Speed_of_Execution
## 3.870e-01
summary(brokerage_3_unit_normal)
##
## Call:
## lm(formula = Overall_Satisfaction_with_Electronic_Trades ~ .,
## data = brokerage3_unit_normal)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.97638 -0.22987 -0.15121 0.09586 1.07134
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.115e-16 1.522e-01 0.000 1.000000
## Satisfaction_with_Trade_Price 8.115e-01 1.593e-01 5.093 0.000348 ***
## Satisfaction_with_Speed_of_Execution 3.870e-01 1.593e-01 2.429 0.033469 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5695 on 11 degrees of freedom
## Multiple R-squared: 0.7256, Adjusted R-squared: 0.6757
## F-statistic: 14.54 on 2 and 11 DF, p-value: 0.0008157
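# Compare with the original regression: the estimates, standard errors, residuals and
# residual standard error change, but t values, p-values, R-squared and F are unchanged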
summary(brokerage_3)
##
## Call:
## lm(formula = Overall_Satisfaction_with_Electronic_Trades ~ .,
## data = brokerage3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.58886 -0.13863 -0.09120 0.05781 0.64613
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.6633 0.8248 -0.804 0.438318
## Satisfaction_with_Trade_Price 0.7746 0.1521 5.093 0.000348 ***
## Satisfaction_with_Speed_of_Execution 0.4897 0.2016 2.429 0.033469 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3435 on 11 degrees of freedom
## Multiple R-squared: 0.7256, Adjusted R-squared: 0.6757
## F-statistic: 14.54 on 2 and 11 DF, p-value: 0.0008157
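Unit normal scaling does not change the fit itself. Each standardized slope equals the raw slope times sd(x_j)/sd(y), which is why the t values above are identical. The brokerage data are not reproduced in this document, so the sketch below checks that identity on R's built-in mtcars data as a stand-in, regressing mpg on wt and hp. It illustrates the technique and does not reproduce the brokerage results.

```r
# Standardized coefficient identity: b_j* = b_j * sd(x_j) / sd(y).
# Stand-in data: built-in mtcars (mpg regressed on wt and hp).
dat <- mtcars[, c("mpg", "wt", "hp")]

fit_raw    <- lm(mpg ~ wt + hp, data = dat)
fit_scaled <- lm(mpg ~ wt + hp, data = as.data.frame(scale(dat)))  # unit normal scaling

# Convert the raw slopes to the standardized scale
b_raw       <- coef(fit_raw)[c("wt", "hp")]
b_converted <- b_raw * sapply(dat[c("wt", "hp")], sd) / sd(dat$mpg)
b_scaled    <- coef(fit_scaled)[c("wt", "hp")]
print(cbind(converted = b_converted, from_scaled_fit = b_scaled))

# t statistics for the slopes are unchanged by the rescaling
t_raw    <- summary(fit_raw)$coefficients[c("wt", "hp"), "t value"]
t_scaled <- summary(fit_scaled)$coefficients[c("wt", "hp"), "t value"]
print(cbind(t_raw, t_scaled))
```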
Read the CSV file titled data_RocketProp into R and create a regression model to predict y from x.
# Read the CSV file into R
rocket <- read.csv("G:/Other computers/My Laptop/Documents/Richard 621/Week 3/data_RocketProp.csv")
# Create a regression model
rocket_1 <- lm(y ~ ., data = rocket)
summary(rocket_1)
##
## Call:
## lm(formula = y ~ ., data = rocket)
##
## Residuals:
## Min 1Q Median 3Q Max
## -215.98 -50.68 28.74 66.61 106.76
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2627.822 44.184 59.48 < 2e-16 ***
## x -37.154 2.889 -12.86 1.64e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 96.11 on 18 degrees of freedom
## Multiple R-squared: 0.9018, Adjusted R-squared: 0.8964
## F-statistic: 165.4 on 1 and 18 DF, p-value: 1.643e-10
Create the design matrix for the regression model
# Create design matrix
X = model.matrix(rocket_1)
X
## (Intercept) x
## 1 1 15.50
## 2 1 23.75
## 3 1 8.00
## 4 1 17.00
## 5 1 5.50
## 6 1 19.00
## 7 1 24.00
## 8 1 2.50
## 9 1 7.50
## 10 1 11.00
## 11 1 13.00
## 12 1 3.75
## 13 1 25.00
## 14 1 9.75
## 15 1 22.00
## 16 1 18.00
## 17 1 6.00
## 18 1 12.50
## 19 1 2.00
## 20 1 21.50
## attr(,"assign")
## [1] 0 1
The leverages (hat values) for each observation in the data_RocketProp data set, and the maximum leverage, are shown below.
# Leverages
hatvalues(rocket_1)
## 1 2 3 4 5 6 7
## 0.05412893 0.14750959 0.07598722 0.06195725 0.10586587 0.07872092 0.15225968
## 8 9 10 11 12 13 14
## 0.15663134 0.08105925 0.05504393 0.05011875 0.13350221 0.17238964 0.06179345
## 15 16 17 18 19 20
## 0.11742196 0.06943538 0.09898644 0.05067227 0.16667373 0.10984216
# Maximum Leverage
max(hatvalues(rocket_1))
## [1] 0.1723896
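The hat values are the diagonal of H = X(X'X)⁻¹X'. For a simple linear regression they reduce to 1/n + (x_i − x̄)²/Sxx, so the observation whose x lies farthest from the mean has the largest leverage. The sketch below rebuilds both versions from the x column printed in the design matrix above. It assumes nothing beyond those 20 values.

```r
# Leverages h_ii = diag(X (X'X)^{-1} X'), rebuilt from the x column printed above
x <- c(15.50, 23.75, 8.00, 17.00, 5.50, 19.00, 24.00, 2.50, 7.50, 11.00,
       13.00, 3.75, 25.00, 9.75, 22.00, 18.00, 6.00, 12.50, 2.00, 21.50)
X <- cbind(1, x)                           # design matrix: intercept column and x
H <- X %*% solve(crossprod(X)) %*% t(X)    # hat matrix
h <- diag(H)

# Simple-regression shortcut: h_ii = 1/n + (x_i - mean(x))^2 / Sxx
n       <- length(x)
Sxx     <- sum((x - mean(x))^2)
h_short <- 1 / n + (x - mean(x))^2 / Sxx

print(round(h, 4))
print(max(h))   # observation 13 (x = 25.00) is farthest from mean(x)
```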
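Using the regression created in Problem 1, would predicting at x = 25.5 be considered extrapolation? Yes. The new point's leverage, h00 = x0'(X'X)⁻¹x0 = 0.1831, exceeds the maximum leverage in the data set (0.1724), so it lies outside the region covered by the original observations.
# Leverage of the new point: h00 = x0' (X'X)^{-1} x0, with x0 = (1, 25.5)
x_new <- c(1, 25.5)
t(x_new) %*% solve(t(X) %*% X) %*% x_new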
## [,1]
## [1,] 0.1831324
What if the new x value is 15? That is not extrapolation, because its leverage (0.0524) is below the maximum leverage (0.1724).
x_new=c(1,15)
t(x_new)%*%solve(t(X)%*%X)%*%x_new
## [,1]
## [1,] 0.05242319
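The two checks above apply the same rule: flag a new point x0 as extrapolation when h00 = x0'(X'X)⁻¹x0 exceeds h_max, the largest leverage in the data. The sketch below wraps that rule in two small functions. The names new_leverage and flags_extrapolation are hypothetical helpers introduced only for illustration. The x values are the ones printed in the design matrix above.

```r
# Hidden-extrapolation check: flag x_new when its leverage exceeds h_max.
x <- c(15.50, 23.75, 8.00, 17.00, 5.50, 19.00, 24.00, 2.50, 7.50, 11.00,
       13.00, 3.75, 25.00, 9.75, 22.00, 18.00, 6.00, 12.50, 2.00, 21.50)
X       <- cbind(1, x)
XtX_inv <- solve(crossprod(X))
h_max   <- max(diag(X %*% XtX_inv %*% t(X)))   # largest leverage in the data

# Hypothetical helper: leverage of a new point x0 = (1, x_new)
new_leverage <- function(x_new) {
  x0 <- c(1, x_new)
  drop(t(x0) %*% XtX_inv %*% x0)
}

# Hypothetical helper: TRUE when the new point lies outside the data's leverage range
flags_extrapolation <- function(x_new) new_leverage(x_new) > h_max

for (x_new in c(25.5, 15)) {
  cat(sprintf("x = %.1f: h00 = %.4f, h_max = %.4f, extrapolation: %s\n",
              x_new, new_leverage(x_new), h_max, flags_extrapolation(x_new)))
}
```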
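Cook's distance for each observation in the data_RocketProp data set, and the maximum Cook's distance, are calculated below. The fourth diagnostic plot produced by plot(rocket_1) (Residuals vs Leverage) shows Cook's distance as contour lines. No Cook's distance exceeds the common cutoff of 1 (the maximum is 0.334, for observation 5), so no observation is flagged as influential and none needs to be removed.
# Cook's distance for each observation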
cooks.distance(rocket_1)
## 1 2 3 4 5 6
## 0.0373281981 0.0497291858 0.0010260760 0.0161482719 0.3343768993 0.2290842436
## 7 8 9 10 11 12
## 0.0270491200 0.0191323748 0.0003959877 0.0047094549 0.0012482345 0.0761514881
## 13 14 15 16 17 18
## 0.0889892211 0.0192517639 0.0166302585 0.0387158541 0.0005955991 0.0041888627
## 19 20
## 0.1317143774 0.0425721512
# Maximum Cook's distance
max(cooks.distance(rocket_1))
## [1] 0.3343769
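Cook's distance combines the size of a residual with the point's leverage: D_i = e_i² / (p·s²) × h_ii / (1 − h_ii)², where p is the number of coefficients and s² is the residual mean square. The rocket propellant responses are not reproduced in this document, so the sketch below checks this definition against cooks.distance() on R's built-in mtcars data as a stand-in. It illustrates the formula and does not reproduce the rocket results.

```r
# Cook's distance from its definition:
# D_i = e_i^2 / (p * s^2) * h_ii / (1 - h_ii)^2
# Stand-in data: built-in mtcars (mpg regressed on wt).
fit <- lm(mpg ~ wt, data = mtcars)

e  <- residuals(fit)
h  <- hatvalues(fit)
p  <- length(coef(fit))               # number of coefficients, including intercept
s2 <- sum(e^2) / df.residual(fit)     # residual mean square
D_manual <- e^2 / (p * s2) * h / (1 - h)^2

print(head(cbind(manual = D_manual, builtin = cooks.distance(fit))))
print(names(which(D_manual > 1)))     # observations over the D_i > 1 cutoff, if any
```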
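# Standard diagnostic plots; the fourth (Residuals vs Leverage) shows Cook's distance contours
plot(rocket_1)
# Cook's distance for each observation, plotted directly
plot(rocket_1, which = 4)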