Week 3 Homework Part 1
Use the Brokerage Satisfaction excel file to answer the following questions in R. Create an R Markdown file to answer the questions, and then “knit” your file to create an HTML document. Your HTML document should contain both textual explanations of your answers, as well as all R code needed to support your work.
Model: Overall_Satisfaction_with_Electronic_Trades = B1(Satisfaction_with_Speed_of_Execution) + B2(Satisfaction_with_Trade_Price) + B3, where B1, B2, and B3 are real numbers.
# Load the readxl package, which provides read_excel()
library(readxl)
brokerage <- read_excel("/Users/kamriefoster/Downloads/BrokerageSatisfaction.xlsx")
# Drop the first column, which holds text rather than a numeric variable
brokerage <- brokerage[,-1]
brokerage
## # A tibble: 14 × 3
## TradePrice SpeedofExecution ElectronicTrades
## <dbl> <dbl> <dbl>
## 1 3.2 3.1 3.2
## 2 3.3 3.1 3.2
## 3 3.1 3.3 4
## 4 2.8 3.5 3.7
## 5 2.9 3.2 3
## 6 2.4 3.2 2.7
## 7 2.7 3.8 2.7
## 8 2.4 3.7 3.4
## 9 2.6 2.6 2.7
## 10 2.3 2.7 2.3
## 11 3.7 3.9 4
## 12 2.5 2.5 2.5
## 13 3 3 3
## 14 1 4 2
# Fit the multiple regression of ElectronicTrades on SpeedofExecution and TradePrice
model1 <- lm(ElectronicTrades ~ SpeedofExecution + TradePrice, data = brokerage)
summary(model1)
##
## Call:
## lm(formula = ElectronicTrades ~ SpeedofExecution + TradePrice,
## data = brokerage)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.58886 -0.13863 -0.09120 0.05781 0.64613
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.6633 0.8248 -0.804 0.438318
## SpeedofExecution 0.4897 0.2016 2.429 0.033469 *
## TradePrice 0.7746 0.1521 5.093 0.000348 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3435 on 11 degrees of freedom
## Multiple R-squared: 0.7256, Adjusted R-squared: 0.6757
## F-statistic: 14.54 on 2 and 11 DF, p-value: 0.0008157
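Reading the estimates off the summary above, the fitted model in the B1/B2/B3 form from the start of Part 1 is (coefficients rounded to four decimals, as printed):

```latex
\widehat{\text{ElectronicTrades}} = 0.4897\,\text{SpeedofExecution} + 0.7746\,\text{TradePrice} - 0.6633
```

So B1 is approximately 0.4897, B2 approximately 0.7746, and B3 approximately -0.6633.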
# Confidence intervals for coefficients
confint(model1, level = 0.80)
## 10 % 90 %
## (Intercept) -1.7879306 0.4612749
## SpeedofExecution 0.2148115 0.7645252
## TradePrice 0.5672241 0.9819956
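As a check on what `confint()` computes, the sketch below rebuilds the 80% intervals as estimate ± t(0.90, 11) × standard error. It assumes the data are exactly the values printed in the tibble above, re-entered by hand so the block runs without the Excel file.

```r
# Brokerage data, re-entered from the tibble printed above (assumed exact)
brokerage <- data.frame(
  TradePrice       = c(3.2, 3.3, 3.1, 2.8, 2.9, 2.4, 2.7, 2.4, 2.6, 2.3, 3.7, 2.5, 3.0, 1.0),
  SpeedofExecution = c(3.1, 3.1, 3.3, 3.5, 3.2, 3.2, 3.8, 3.7, 2.6, 2.7, 3.9, 2.5, 3.0, 4.0),
  ElectronicTrades = c(3.2, 3.2, 4.0, 3.7, 3.0, 2.7, 2.7, 3.4, 2.7, 2.3, 4.0, 2.5, 3.0, 2.0)
)
model1 <- lm(ElectronicTrades ~ SpeedofExecution + TradePrice, data = brokerage)

# An 80% two-sided interval leaves 10% in each tail, so use the 0.90 t quantile
level  <- 0.80
t_crit <- qt(1 - (1 - level) / 2, df = df.residual(model1))

est <- coef(model1)
se  <- sqrt(diag(vcov(model1)))  # standard errors of the coefficients

manual_ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(manual_ci)

# Should agree with confint() up to floating-point error
print(all.equal(unname(manual_ci), unname(confint(model1, level = level))))
```

The rows follow the order of `coef(model1)`, so the SpeedofExecution row gives the interval for B1 and the TradePrice row gives the interval for B2.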
Fill in the blanks in the following statements:
There is an 80% probability that the number B1 will fall between 0.2148115 and 0.7645252.
There is an 80% probability that the number B2 will fall between 0.5672241 and 0.9819956.
Suppose that we want to use the regression model created in the preceding question to predict the Overall_Satisfaction_with_Electronic_Trades when the Satisfaction_with_Speed_of_Execution is 3, and the Satisfaction_with_Trade_Price is 4. There is a 90% chance that this prediction will fall between 3.174452 and 4.633781.
When the Satisfaction_with_Speed_of_Execution is 3 and the Satisfaction_with_Trade_Price is 4, there is a 90% chance that the mean response (i.e., mean value of the target variable) will fall between 3.514362 and 4.293871.
# Create a data frame with new observations in order to make predictions
observations_for_pred = data.frame(SpeedofExecution = c(3), TradePrice = c(4))
observations_for_pred
## SpeedofExecution TradePrice
## 1 3 4
# Obtain prediction for new observations
predict(model1, observations_for_pred, type = "response")
## 1
## 3.904117
# Obtain prediction interval for new observations
predict(model1, observations_for_pred, interval = "prediction", level = 0.90, type = "response")
## fit lwr upr
## 1 3.904117 3.174452 4.633781
# Obtain confidence interval for the mean response at the new observations
predict(model1, observations_for_pred, interval = "confidence", level = 0.90, type = "response")
## fit lwr upr
## 1 3.904117 3.514362 4.293871
Suppose that we want to use the regression model created in the preceding question to predict the Overall_Satisfaction_with_Electronic_Trades when the Satisfaction_with_Speed_of_Execution is 2, and the Satisfaction_with_Trade_Price is 3. There is an 85% chance that this prediction will fall between 1.965909 and 3.313768.
When the Satisfaction_with_Speed_of_Execution is 2 and the Satisfaction_with_Trade_Price is 3, there is an 85% chance that the mean response will fall between 2.225554 and 3.054123.
# Create a data frame with new observations in order to make predictions
observations_for_pred = data.frame(SpeedofExecution = c(2), TradePrice = c(3))
observations_for_pred
## SpeedofExecution TradePrice
## 1 2 3
# Obtain prediction for new observations
predict(model1, observations_for_pred, type = "response")
## 1
## 2.639838
# Obtain prediction interval for new observations
predict(model1, observations_for_pred, interval = "prediction", level = 0.85, type = "response")
## fit lwr upr
## 1 2.639838 1.965909 3.313768
# Obtain confidence interval for the mean response at the new observations
predict(model1, observations_for_pred, interval = "confidence", level = 0.85, type = "response")
## fit lwr upr
## 1 2.639838 2.225554 3.054123
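Both pairs of intervals above can be reproduced from the textbook formulas: the mean-response interval uses the standard error s·sqrt(x0'(X'X)^(-1)x0), and the prediction interval adds the variance of a new error term, which is why it is always wider. The sketch below re-enters the printed data (assumed exact) and uses a hypothetical helper, `manual_intervals()`, that is not part of any package.

```r
# Brokerage data, re-entered from the tibble printed earlier (assumed exact)
brokerage <- data.frame(
  TradePrice       = c(3.2, 3.3, 3.1, 2.8, 2.9, 2.4, 2.7, 2.4, 2.6, 2.3, 3.7, 2.5, 3.0, 1.0),
  SpeedofExecution = c(3.1, 3.1, 3.3, 3.5, 3.2, 3.2, 3.8, 3.7, 2.6, 2.7, 3.9, 2.5, 3.0, 4.0),
  ElectronicTrades = c(3.2, 3.2, 4.0, 3.7, 3.0, 2.7, 2.7, 3.4, 2.7, 2.3, 4.0, 2.5, 3.0, 2.0)
)
model1 <- lm(ElectronicTrades ~ SpeedofExecution + TradePrice, data = brokerage)

# Hypothetical helper: confidence and prediction intervals at a new point x0
manual_intervals <- function(fit, x0, level) {
  t_crit  <- qt(1 - (1 - level) / 2, df = df.residual(fit))
  y_hat   <- sum(x0 * coef(fit))                     # point prediction
  se_mean <- sqrt(drop(t(x0) %*% vcov(fit) %*% x0))  # s * sqrt(x0' (X'X)^-1 x0)
  se_pred <- sqrt(summary(fit)$sigma^2 + se_mean^2)  # adds the new error's variance
  rbind(
    confidence = c(fit = y_hat, lwr = y_hat - t_crit * se_mean, upr = y_hat + t_crit * se_mean),
    prediction = c(fit = y_hat, lwr = y_hat - t_crit * se_pred, upr = y_hat + t_crit * se_pred)
  )
}

# x0 = (intercept, SpeedofExecution, TradePrice), matching the order of coef(model1)
res_90 <- manual_intervals(model1, c(1, 3, 4), level = 0.90)
res_85 <- manual_intervals(model1, c(1, 2, 3), level = 0.85)
print(res_90)
print(res_85)
```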
# Standardized regression coefficients via unit normal scaling
head(brokerage)
## # A tibble: 6 × 3
## TradePrice SpeedofExecution ElectronicTrades
## <dbl> <dbl> <dbl>
## 1 3.2 3.1 3.2
## 2 3.3 3.1 3.2
## 3 3.1 3.3 4
## 4 2.8 3.5 3.7
## 5 2.9 3.2 3
## 6 2.4 3.2 2.7
# transform the data using unit normal scaling
brokerage_unit_normal = as.data.frame(apply(brokerage, 2, function(x){(x - mean(x))/sd(x)}))
# redo regression
model1_unit_normal <- lm(ElectronicTrades ~ SpeedofExecution + TradePrice, data = brokerage_unit_normal)
#obtain standardized regression coefficients
model1_unit_normal
##
## Call:
## lm(formula = ElectronicTrades ~ SpeedofExecution + TradePrice,
## data = brokerage_unit_normal)
##
## Coefficients:
## (Intercept) SpeedofExecution TradePrice
## 4.115e-16 3.870e-01 8.115e-01
summary(model1_unit_normal)
##
## Call:
## lm(formula = ElectronicTrades ~ SpeedofExecution + TradePrice,
## data = brokerage_unit_normal)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.97638 -0.22987 -0.15121 0.09586 1.07134
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.115e-16 1.522e-01 0.000 1.000000
## SpeedofExecution 3.870e-01 1.593e-01 2.429 0.033469 *
## TradePrice 8.115e-01 1.593e-01 5.093 0.000348 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5695 on 11 degrees of freedom
## Multiple R-squared: 0.7256, Adjusted R-squared: 0.6757
## F-statistic: 14.54 on 2 and 11 DF, p-value: 0.0008157
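Satisfaction_with_Trade_Price is more influential than Satisfaction_with_Speed_of_Execution: its standardized regression coefficient (0.8115) is roughly twice that of Satisfaction_with_Speed_of_Execution (0.3870).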
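The standardized coefficients need not come from a refit: under unit normal scaling each one equals the raw slope times sd(x_j)/sd(y). The sketch below checks that identity and uses `scale()` as an alternative to the `apply()` call above; the data are re-entered from the printed tibble and assumed exact.

```r
# Brokerage data, re-entered from the tibble printed earlier (assumed exact)
brokerage <- data.frame(
  TradePrice       = c(3.2, 3.3, 3.1, 2.8, 2.9, 2.4, 2.7, 2.4, 2.6, 2.3, 3.7, 2.5, 3.0, 1.0),
  SpeedofExecution = c(3.1, 3.1, 3.3, 3.5, 3.2, 3.2, 3.8, 3.7, 2.6, 2.7, 3.9, 2.5, 3.0, 4.0),
  ElectronicTrades = c(3.2, 3.2, 4.0, 3.7, 3.0, 2.7, 2.7, 3.4, 2.7, 2.3, 4.0, 2.5, 3.0, 2.0)
)
predictors <- c("SpeedofExecution", "TradePrice")
model1 <- lm(ElectronicTrades ~ SpeedofExecution + TradePrice, data = brokerage)

# Route 1: rescale the raw slopes, b*_j = b_j * sd(x_j) / sd(y)
b    <- coef(model1)[predictors]
sd_x <- sapply(brokerage[predictors], sd)
std_from_raw <- b * sd_x / sd(brokerage$ElectronicTrades)

# Route 2: refit on scaled data; scale() subtracts each column mean and divides by its sd
brokerage_scaled <- as.data.frame(scale(brokerage))
std_from_refit <- coef(lm(ElectronicTrades ~ SpeedofExecution + TradePrice,
                          data = brokerage_scaled))[predictors]

print(rbind(std_from_raw, std_from_refit))
```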
Week 3 Homework Part 2
Use the data_RocketProp csv file to answer the following questions in R. Create an R Markdown file to answer the questions, and then “knit” your file to create an HTML document. Your HTML document should contain both textual explanations of your answers, as well as all R code needed to support your work.
# Read the CSV file into R
rocket <- read.csv("/Users/kamriefoster/Downloads/data_RocketProp.csv")
model <- lm(y ~ ., data = rocket)
summary(model)
##
## Call:
## lm(formula = y ~ ., data = rocket)
##
## Residuals:
## Min 1Q Median 3Q Max
## -215.98 -50.68 28.74 66.61 106.76
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2627.822 44.184 59.48 < 2e-16 ***
## x -37.154 2.889 -12.86 1.64e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 96.11 on 18 degrees of freedom
## Multiple R-squared: 0.9018, Adjusted R-squared: 0.8964
## F-statistic: 165.4 on 1 and 18 DF, p-value: 1.643e-10
X = model.matrix(model)
X
## (Intercept) x
## 1 1 15.50
## 2 1 23.75
## 3 1 8.00
## 4 1 17.00
## 5 1 5.50
## 6 1 19.00
## 7 1 24.00
## 8 1 2.50
## 9 1 7.50
## 10 1 11.00
## 11 1 13.00
## 12 1 3.75
## 13 1 25.00
## 14 1 9.75
## 15 1 22.00
## 16 1 18.00
## 17 1 6.00
## 18 1 12.50
## 19 1 2.00
## 20 1 21.50
## attr(,"assign")
## [1] 0 1
hatvalues(model)
## 1 2 3 4 5 6 7
## 0.05412893 0.14750959 0.07598722 0.06195725 0.10586587 0.07872092 0.15225968
## 8 9 10 11 12 13 14
## 0.15663134 0.08105925 0.05504393 0.05011875 0.13350221 0.17238964 0.06179345
## 15 16 17 18 19 20
## 0.11742196 0.06943538 0.09898644 0.05067227 0.16667373 0.10984216
max(hatvalues(model))
## [1] 0.1723896
The maximum leverage (hat value) calculated in part a is 0.17238964, for observation 13 (x = 25).
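Leverages depend only on the model matrix, so they can be checked from the x values printed above without the response. The sketch below computes them as the diagonal of H = X(X'X)^(-1)X' and, for comparison, with the simple-regression shortcut h_ii = 1/n + (x_i - mean(x))^2 / Sxx.

```r
# Predictor values, re-entered from the model matrix printed above
x <- c(15.50, 23.75, 8.00, 17.00, 5.50, 19.00, 24.00, 2.50, 7.50, 11.00,
       13.00, 3.75, 25.00, 9.75, 22.00, 18.00, 6.00, 12.50, 2.00, 21.50)
X <- cbind(1, x)  # intercept column plus x

# Leverages are the diagonal of the hat matrix H = X (X'X)^{-1} X'
H <- X %*% solve(t(X) %*% X) %*% t(X)
h_matrix <- diag(H)

# Simple-regression shortcut for the same quantities
n <- length(x)
h_formula <- 1 / n + (x - mean(x))^2 / sum((x - mean(x))^2)

print(max(h_matrix))
print(which.max(h_matrix))
```

The shortcut makes the pattern explicit: the points farthest from mean(x) (here 13.3625) have the highest leverage.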
x1_new = c(1,25.5)
t(x1_new)%*%solve(t(X)%*%X)%*%x1_new
## [,1]
## [1,] 0.1831324
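Yes. Predicting y at x = 25.5 is extrapolation, because the leverage of the new point, x0'(X'X)^(-1)x0 = 0.1831324, exceeds the maximum leverage in the data (0.1831324 > 0.17238964). This agrees with the fact that 25.5 lies beyond the largest observed x value of 25.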
x2_new = c(1,15)
t(x2_new)%*%solve(t(X)%*%X)%*%x2_new
## [,1]
## [1,] 0.05242319
No. Predicting y at x = 15 is not extrapolation, because the leverage of the new point, x0'(X'X)^(-1)x0 = 0.05242319, is below the maximum leverage in the data (0.05242319 < 0.17238964).
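The two checks above follow one rule: a new point is flagged as extrapolation when h00 = x0'(X'X)^(-1)x0 exceeds the largest leverage among the observed points. The sketch wraps that rule in a hypothetical helper, `is_extrapolation()`, using the x values re-entered from the printed model matrix.

```r
# Predictor values, re-entered from the model matrix printed earlier
x <- c(15.50, 23.75, 8.00, 17.00, 5.50, 19.00, 24.00, 2.50, 7.50, 11.00,
       13.00, 3.75, 25.00, 9.75, 22.00, 18.00, 6.00, 12.50, 2.00, 21.50)
X <- cbind(1, x)

# Hypothetical helper: leverage of a new x value compared with the maximum observed leverage
is_extrapolation <- function(x_new, X) {
  XtX_inv <- solve(t(X) %*% X)
  h_max   <- max(diag(X %*% XtX_inv %*% t(X)))
  x0      <- c(1, x_new)                       # intercept term plus the new x
  h00     <- drop(t(x0) %*% XtX_inv %*% x0)
  list(h00 = h00, h_max = h_max, extrapolation = h00 > h_max)
}

str(is_extrapolation(25.5, X))
str(is_extrapolation(15, X))
```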
cooks.distance(model)
## 1 2 3 4 5 6
## 0.0373281981 0.0497291858 0.0010260760 0.0161482719 0.3343768993 0.2290842436
## 7 8 9 10 11 12
## 0.0270491200 0.0191323748 0.0003959877 0.0047094549 0.0012482345 0.0761514881
## 13 14 15 16 17 18
## 0.0889892211 0.0192517639 0.0166302585 0.0387158541 0.0005955991 0.0041888627
## 19 20
## 0.1317143774 0.0425721512
max(cooks.distance(model))
## [1] 0.3343769
The maximum Cook’s distance is 0.3343769, for observation 5.
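Cook’s distance combines residual size and leverage: D_i = (r_i^2 / p) · h_ii / (1 - h_ii), where r_i is the internally studentized residual and p the number of coefficients. The rocket response values are not listed in this document, so the sketch below checks the formula on R’s built-in `cars` data instead; `manual_cooks()` is a hypothetical helper, not a package function.

```r
# Hypothetical helper: Cook's distance from its definition
manual_cooks <- function(fit) {
  r <- rstandard(fit)      # internally studentized residuals
  h <- hatvalues(fit)      # leverages
  p <- length(coef(fit))   # number of estimated coefficients
  r^2 / p * h / (1 - h)
}

# Illustration on the built-in cars data (the rocket y values are not shown here)
fit_cars <- lm(dist ~ speed, data = cars)
d <- manual_cooks(fit_cars)
print(all.equal(d, cooks.distance(fit_cars)))
print(max(d))
```

With the rocket data loaded, `manual_cooks(model)` should reproduce the `cooks.distance(model)` values above.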
# Default diagnostic plots: residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage
plot(model)
Based on the answer in part b, no data points are of concern as influential observations, since every Cook’s distance is below the usual cutoff of 1.