Week 3

The Brokerage Satisfaction data is loaded into R.

library('xlsx')
Brokerage <- read.xlsx("/Users/jusimioni/Desktop/Brokerage Satisfaction.xlsx", sheetIndex = 1)
Brokerage <- as.data.frame(Brokerage)
Brokerage <- Brokerage[,-1]

The first step is to fit a regression model that predicts the Overall Satisfaction with Electronic Trades. Since the data set does not contain many rows, all rows will be used as training data.

model1 <- lm(Overall_Satisfaction_with_Electronic_Trades ~ Satisfaction_with_Speed_of_Execution + Satisfaction_with_Trade_Price, data = Brokerage)  
summary(model1)

## 
## Call:
## lm(formula = Overall_Satisfaction_with_Electronic_Trades ~ Satisfaction_with_Speed_of_Execution + 
##     Satisfaction_with_Trade_Price, data = Brokerage)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.58886 -0.13863 -0.09120  0.05781  0.64613 
## 
## Coefficients:
##                                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                           -0.6633     0.8248  -0.804 0.438318    
## Satisfaction_with_Speed_of_Execution   0.4897     0.2016   2.429 0.033469 *  
## Satisfaction_with_Trade_Price          0.7746     0.1521   5.093 0.000348 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3435 on 11 degrees of freedom
## Multiple R-squared:  0.7256, Adjusted R-squared:  0.6757 
## F-statistic: 14.54 on 2 and 11 DF,  p-value: 0.0008157

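The model above estimates a function of the form B1(Satisfaction_with_Speed_of_Execution) + B2(Satisfaction_with_Trade_Price) + B3, where B1 and B2 are the slope coefficients and B3 is the intercept. From the summary output, the estimates are B1 = 0.4897, B2 = 0.7746, and B3 = -0.6633.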

confint(model1, level = 0.80)

##                                            10 %      90 %
## (Intercept)                          -1.7879306 0.4612749
## Satisfaction_with_Speed_of_Execution  0.2148115 0.7645252
## Satisfaction_with_Trade_Price         0.5672241 0.9819956

We are 80% confident that the coefficient B1 lies between 0.2148115 and 0.7645252, and 80% confident that the coefficient B2 lies between 0.5672241 and 0.9819956. In other words, intervals constructed this way cover the true coefficient in 80% of repeated samples.
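As a rough check on where these limits come from, the sketch below rebuilds the 80% intervals by hand as estimate ± t × standard error. It assumes only the rounded estimates, standard errors, and the 11 residual degrees of freedom printed by summary(model1); the variable names (est, se, t_crit, ci) are introduced here for illustration, so the result agrees with confint() only up to rounding.

```r
# Estimates and standard errors copied from the summary(model1) output (rounded)
est <- c(speed = 0.4897, price = 0.7746)
se  <- c(speed = 0.2016, price = 0.1521)
df_resid <- 11   # residual degrees of freedom reported by summary(model1)
level <- 0.80

# Two-sided interval: (1 - level) / 2 probability is left in each tail
t_crit <- qt(1 - (1 - level) / 2, df = df_resid)

# Lower and upper limits, one row per coefficient
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(round(ci, 4))
```

The limits should agree with the confint(model1, level = 0.80) output to about three or four decimal places; the small differences come from using rounded inputs.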

Suppose that we want to use the regression model created in the preceding question to predict the Overall_Satisfaction_with_Electronic_Trades when the Satisfaction_with_Speed_of_Execution is 3, and the Satisfaction_with_Trade_Price is 4.

obs = data.frame(Satisfaction_with_Speed_of_Execution = c(3), Satisfaction_with_Trade_Price = c(4))
obs

##   Satisfaction_with_Speed_of_Execution Satisfaction_with_Trade_Price
## 1                                    3                             4

Obtain the prediction for the new observation.

predict(model1, obs, type = "response")

##        1 
## 3.904117
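The same prediction can be reproduced by plugging the new observation into the fitted equation. The sketch below assumes the rounded coefficients printed by summary(model1); the names b, x_obs, and y_hat are introduced here for illustration.

```r
# Coefficients copied from the summary(model1) output (rounded to 4 decimals)
b <- c(intercept = -0.6633, speed = 0.4897, price = 0.7746)

# New observation: 1 for the intercept, then speed = 3 and price = 4
x_obs <- c(1, 3, 4)

# Fitted value: -0.6633 + 0.4897 * 3 + 0.7746 * 4
y_hat <- sum(b * x_obs)
print(y_hat)
```

The hand calculation gives 3.9042, which matches the predict() output of 3.904117 up to the rounding of the coefficients.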

Obtain the prediction interval for the new observation.

predict(model1, obs, interval = "predict", level = 0.90, type = "response")

##        fit      lwr      upr
## 1 3.904117 3.174452 4.633781

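There is a 90% chance that the overall satisfaction of a new observation with these covariate values will fall between 3.174452 and 4.633781.

b) Obtain the confidence interval for the mean response when the Satisfaction_with_Speed_of_Execution is 3 and the Satisfaction_with_Trade_Price is 4.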

predict(model1, obs, interval = "confidence", level = 0.90, type = "response")

##        fit      lwr      upr
## 1 3.904117 3.514362 4.293871

We are 90% confident that the mean response (i.e., the mean value of the target variable) at these covariate values lies between 3.514362 and 4.293871.
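The two intervals share the same center but differ in width. A sketch of the standard formulas for a linear model is given below; the symbols are shorthand introduced here, with x_0 the new observation (including the leading 1 for the intercept), X the design matrix, s the residual standard error (0.3435), and n - p the residual degrees of freedom (11).

```latex
\begin{align*}
h_0 &= x_0^{\top} (X^{\top} X)^{-1} x_0 \\
\text{confidence interval for the mean response:}\quad
  & \hat{y}_0 \pm t_{1-\alpha/2,\; n-p} \; s \sqrt{h_0} \\
\text{prediction interval for a new observation:}\quad
  & \hat{y}_0 \pm t_{1-\alpha/2,\; n-p} \; s \sqrt{1 + h_0}
\end{align*}
```

The extra 1 under the square root accounts for the noise in an individual response, which is why the prediction interval (half-width about 0.73) is wider than the confidence interval (half-width about 0.39) at the same 90% level.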

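c) Suppose that we want to use the same regression model to predict the Overall_Satisfaction_with_Electronic_Trades at the 85% level. The observation obs2 defined below sets the Satisfaction_with_Speed_of_Execution to 3 and the Satisfaction_with_Trade_Price to 4, the same values as obs, so the fitted value stays at 3.904117 and only the interval widths change.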

obs2 = data.frame(Satisfaction_with_Speed_of_Execution = c(3), Satisfaction_with_Trade_Price = c(4))
obs2

##   Satisfaction_with_Speed_of_Execution Satisfaction_with_Trade_Price
## 1                                    3                             4

Obtain the prediction interval for the new observation.

predict(model1, obs2, interval = "predict", level = 0.85, type = "response")

##        fit      lwr      upr
## 1 3.904117 3.275346 4.532887

There is an 85% chance that the overall satisfaction of a new observation with these covariate values will fall between 3.275346 and 4.532887.

Obtain the confidence interval for the mean response when the Satisfaction_with_Speed_of_Execution is 3 and the Satisfaction_with_Trade_Price is 4, the values stored in obs2.

predict(model1, obs2, interval = "confidence", level = 0.85, type = "response")

##        fit      lwr      upr
## 1 3.904117 3.568255 4.239978

We are 85% confident that the mean response at these covariate values lies between 3.568255 and 4.239978.

Use unit normal scaling to calculate standardized regression coefficients for the model that you created in #1. Based on these coefficients, which covariate is more influential in predicting overall satisfaction? Is the Satisfaction_with_Speed_of_Execution more influential than the Satisfaction_with_Trade_Price? Or is the Satisfaction_with_Trade_Price more influential than the Satisfaction_with_Speed_of_Execution?

summary(model1)

## 
## Call:
## lm(formula = Overall_Satisfaction_with_Electronic_Trades ~ Satisfaction_with_Speed_of_Execution + 
##     Satisfaction_with_Trade_Price, data = Brokerage)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.58886 -0.13863 -0.09120  0.05781  0.64613 
## 
## Coefficients:
##                                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                           -0.6633     0.8248  -0.804 0.438318    
## Satisfaction_with_Speed_of_Execution   0.4897     0.2016   2.429 0.033469 *  
## Satisfaction_with_Trade_Price          0.7746     0.1521   5.093 0.000348 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3435 on 11 degrees of freedom
## Multiple R-squared:  0.7256, Adjusted R-squared:  0.6757 
## F-statistic: 14.54 on 2 and 11 DF,  p-value: 0.0008157

# Transform the data using unit normal scaling
broker_unit_normal = as.data.frame(apply(Brokerage, 2, function(X){(X - mean(X))/sd(X)}))
# Redo regression
model1_unit_normal <- lm(Overall_Satisfaction_with_Electronic_Trades ~ Satisfaction_with_Speed_of_Execution + 
    Satisfaction_with_Trade_Price, data = broker_unit_normal)
# Obtain standardized regression coefficients
model1_unit_normal

## 
## Call:
## lm(formula = Overall_Satisfaction_with_Electronic_Trades ~ Satisfaction_with_Speed_of_Execution + 
##     Satisfaction_with_Trade_Price, data = broker_unit_normal)
## 
## Coefficients:
##                          (Intercept)  Satisfaction_with_Speed_of_Execution  
##                            4.115e-16                             3.870e-01  
##        Satisfaction_with_Trade_Price  
##                            8.115e-01

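Satisfaction with trade price is more influential in predicting the overall satisfaction: its standardized coefficient (0.8115) is roughly twice that of satisfaction with speed of execution (0.3870).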
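Refitting on scaled data is equivalent to rescaling each raw slope by sd(x) / sd(y). The sketch below illustrates that equivalence on R's built-in mtcars data, which is an assumption made only so that the example runs on its own; it is not the Brokerage data, and the object names (fit_raw, fit_std, by_hand) are introduced here.

```r
# Illustration on the built-in mtcars data (NOT the Brokerage data)
vars <- c("mpg", "wt", "hp")

# Regression on the original scale
fit_raw <- lm(mpg ~ wt + hp, data = mtcars)

# Regression after unit normal scaling; scale() centers and divides by sd()
scaled <- as.data.frame(scale(mtcars[, vars]))
fit_std <- lm(mpg ~ wt + hp, data = scaled)

# Same standardized slopes obtained by rescaling the raw slopes
by_hand <- coef(fit_raw)[-1] * sapply(mtcars[, c("wt", "hp")], sd) / sd(mtcars$mpg)

print(coef(fit_std)[-1])
print(by_hand)
```

Both print statements should show the same two numbers, so either route can be used to compare the relative influence of the covariates.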

The following analysis uses the data_RocketProp.csv file.

rocket <- read.csv("/Users/jusimioni/Desktop/data_RocketProp.csv")

# Linear regression
model <- lm(y ~ x, data = rocket)
summary(model)

## 
## Call:
## lm(formula = y ~ x, data = rocket)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -215.98  -50.68   28.74   66.61  106.76 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2627.822     44.184   59.48  < 2e-16 ***
## x            -37.154      2.889  -12.86 1.64e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 96.11 on 18 degrees of freedom
## Multiple R-squared:  0.9018, Adjusted R-squared:  0.8964 
## F-statistic: 165.4 on 1 and 18 DF,  p-value: 1.643e-10

The regression model shows a strong negative linear relationship between x and y: x explains 90.18% of the variance in y (R-squared = 0.9018).

# Design matrix
X = model.matrix(model)
X

##    (Intercept)     x
## 1            1 15.50
## 2            1 23.75
## 3            1  8.00
## 4            1 17.00
## 5            1  5.50
## 6            1 19.00
## 7            1 24.00
## 8            1  2.50
## 9            1  7.50
## 10           1 11.00
## 11           1 13.00
## 12           1  3.75
## 13           1 25.00
## 14           1  9.75
## 15           1 22.00
## 16           1 18.00
## 17           1  6.00
## 18           1 12.50
## 19           1  2.00
## 20           1 21.50
## attr(,"assign")
## [1] 0 1

Calculate the leverage for the data points.

cbind(rocket, leverage = hatvalues(model))

##          y     x   leverage
## 1  2158.70 15.50 0.05412893
## 2  1678.15 23.75 0.14750959
## 3  2316.00  8.00 0.07598722
## 4  2061.30 17.00 0.06195725
## 5  2207.50  5.50 0.10586587
## 6  1708.30 19.00 0.07872092
## 7  1784.70 24.00 0.15225968
## 8  2575.00  2.50 0.15663134
## 9  2357.90  7.50 0.08105925
## 10 2256.70 11.00 0.05504393
## 11 2165.20 13.00 0.05011875
## 12 2399.55  3.75 0.13350221
## 13 1779.80 25.00 0.17238964
## 14 2336.75  9.75 0.06179345
## 15 1765.30 22.00 0.11742196
## 16 2053.50 18.00 0.06943538
## 17 2414.40  6.00 0.09898644
## 18 2200.50 12.50 0.05067227
## 19 2654.20  2.00 0.16667373
## 20 1753.70 21.50 0.10984216

The maximum leverage value for the model:

max(hatvalues(model))

## [1] 0.1723896

The maximum leverage value is 0.1723896, attained at observation 13 (x = 25).

Predicting y when x is 25.5. To check whether this prediction is extrapolation, compute the leverage of the new point.

# New point as a row of the design matrix: 1 for the intercept, then x
x_new = c(1, 25.5)
t(x_new)%*%solve(t(X)%*%X)%*%x_new

##           [,1]
## [1,] 0.1831324

This prediction is considered extrapolation since the leverage of the new point (0.1831324) is larger than the maximum leverage in the data (0.1723896).

Predicting y when x is 15.

# New point as a row of the design matrix: 1 for the intercept, then x
x_new1 = c(1, 15)
t(x_new1)%*%solve(t(X)%*%X)%*%x_new1

##            [,1]
## [1,] 0.05242319

This prediction is not extrapolation since the leverage of the new point (0.05242319) is smaller than the maximum leverage in the data (0.1723896).
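For a simple linear regression with one covariate, the matrix expression used above reduces to 1/n + (x0 - mean(x))^2 / sum((x - mean(x))^2), so the extrapolation check depends only on how far the new x lies from the mean of the observed x values. The sketch below applies that closed form to the x values copied from the design matrix; leverage_at is a hypothetical helper introduced here, not a function from any package.

```r
# Covariate values copied from the design matrix printed earlier
x <- c(15.50, 23.75, 8.00, 17.00, 5.50, 19.00, 24.00, 2.50, 7.50, 11.00,
       13.00, 3.75, 25.00, 9.75, 22.00, 18.00, 6.00, 12.50, 2.00, 21.50)

# Hypothetical helper: leverage of a point x0 in simple linear regression
leverage_at <- function(x0, x) {
  1 / length(x) + (x0 - mean(x))^2 / sum((x - mean(x))^2)
}

h_max  <- max(leverage_at(x, x))  # largest leverage among the observed points
h_25_5 <- leverage_at(25.5, x)    # new point outside the observed range
h_15   <- leverage_at(15, x)      # new point close to the mean of x

print(c(h_max = h_max, h_25_5 = h_25_5, h_15 = h_15))

# TRUE means the new point has more leverage than any observed point
print(c(extrapolation_25_5 = h_25_5 > h_max, extrapolation_15 = h_15 > h_max))
```

The three leverage values should match hatvalues(model) and the two matrix calculations above, flagging x = 25.5 as extrapolation and x = 15 as not.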

Calculate Cook’s distance for all datapoints.

cooks.distance(model)

##            1            2            3            4            5            6 
## 0.0373281981 0.0497291858 0.0010260760 0.0161482719 0.3343768993 0.2290842436 
##            7            8            9           10           11           12 
## 0.0270491200 0.0191323748 0.0003959877 0.0047094549 0.0012482345 0.0761514881 
##           13           14           15           16           17           18 
## 0.0889892211 0.0192517639 0.0166302585 0.0387158541 0.0005955991 0.0041888627 
##           19           20 
## 0.1317143774 0.0425721512

max(cooks.distance(model))

## [1] 0.3343769

The maximum value of Cook’s distance is 0.3343769, at observation 5. Since no observation has a Cook’s distance above 1, there are no influential points that we should be concerned about.
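Cook's distance combines the size of each residual with the leverage of the point: D_i = e_i^2 h_i / (p s^2 (1 - h_i)^2), where p is the number of coefficients and s^2 is the residual variance. The sketch below recomputes it by hand, assuming the y and x values copied from the leverage table above; the object names (rocket_demo, fit_demo, d_manual) are introduced here so that the example does not depend on the CSV file.

```r
# Data copied from the leverage table printed earlier
rocket_demo <- data.frame(
  y = c(2158.70, 1678.15, 2316.00, 2061.30, 2207.50, 1708.30, 1784.70,
        2575.00, 2357.90, 2256.70, 2165.20, 2399.55, 1779.80, 2336.75,
        1765.30, 2053.50, 2414.40, 2200.50, 2654.20, 1753.70),
  x = c(15.50, 23.75, 8.00, 17.00, 5.50, 19.00, 24.00, 2.50, 7.50, 11.00,
        13.00, 3.75, 25.00, 9.75, 22.00, 18.00, 6.00, 12.50, 2.00, 21.50)
)
fit_demo <- lm(y ~ x, data = rocket_demo)

e  <- resid(fit_demo)                    # residuals
h  <- hatvalues(fit_demo)                # leverages
p  <- length(coef(fit_demo))             # number of coefficients (2)
s2 <- sum(e^2) / df.residual(fit_demo)   # residual variance

# Cook's distance by hand
d_manual <- e^2 * h / (p * s2 * (1 - h)^2)

print(all.equal(d_manual, cooks.distance(fit_demo)))
print(which.max(d_manual))
print(any(d_manual > 1))
```

Observation 5 has only moderate leverage but the largest residual in absolute value, which is why it ends up with the largest Cook's distance.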

Julia Simioni

2022-10-03