R Markdown
rm(list=ls())
bank = na.omit(read.csv(file.choose(), header = TRUE))  # file.choose() works on all platforms; choose.files() is Windows-only
attach(bank)
1.)
c.)
Deal with what you found in part (b) appropriately.
par(mfrow = c(1,3))
hist(inc)
loginc = log(inc)
hist(loginc)
sqinc = sqrt(inc)
hist(sqinc)

Of the three histograms above (inc, loginc, and sqinc), the
log-transformed income, loginc, looks the most nearly normal, so loginc
is used in place of inc in the regression models that follow.
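The choice of loginc rests on comparing histograms by eye. One way to back that up with a number is sample skewness, where values near 0 indicate a roughly symmetric distribution. The sketch below assumes a right-skewed income variable and uses simulated lognormal data in place of inc (the bank data are not reproduced here); `skew()` is a hypothetical helper defined in the block, not a base R function.

```r
# Sketch: compare sample skewness of raw, log, and square-root values.
# Data are SIMULATED right-skewed "incomes", not the real inc column.
set.seed(1)
inc_sim <- rlnorm(10000, meanlog = 10, sdlog = 0.8)

# Hypothetical helper: moment-based skewness (mean of cubed z-scores)
skew <- function(x) {
  z <- (x - mean(x)) / sqrt(mean((x - mean(x))^2))
  mean(z^3)
}

skews <- c(raw  = skew(inc_sim),
           log  = skew(log(inc_sim)),
           sqrt = skew(sqrt(inc_sim)))
round(skews, 2)  # the transformation with skewness closest to 0 is most symmetric
```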
e.)
Produce a scatterplot matrix. Do the correlations from part (d)
correspond to what you see in these plots? Describe how you know from
what you see visually with at least 2 examples. Should they correspond?
Explain.
pairs(~loan + inc + check + acc + revol + recov + rate)

The correlations from part (d) do correspond to what is shown in
these plots. This is evident in the scatterplot of loan against rate,
where the points show a weak positive trend. Likewise, the plot of recov
against rate shows a weak positive correlation, since the points drift
upward and to the right. They should correspond: a correlation
coefficient summarizes the direction and strength of the linear
relationship between two variables, which is exactly what each pairwise
scatterplot displays.
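As a rough illustration of why the correlations and scatterplots should agree, the sketch below uses simulated data (x and y are hypothetical stand-ins, not bank variables). In simple regression the fitted slope equals r * sd(y) / sd(x), so the sign of r always matches the direction of the trend seen in the plot.

```r
# Sketch on SIMULATED data: a weak positive trend plus noise.
set.seed(2)
x <- runif(200, 0, 10)                 # hypothetical predictor
y <- 5 + 0.3 * x + rnorm(200, sd = 3)  # hypothetical response

r  <- cor(x, y)
b1 <- unname(coef(lm(y ~ x))["x"])

# The simple-regression slope can be rebuilt from r and the two SDs
c(r = r, slope = b1, slope_from_r = r * sd(y) / sd(x))
```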
f.)
Fit a regression model predicting the interest rate with all of the
other quantitative variables. Call this model m1.
m1 = lm(rate ~ loan + loginc + check + acc + revol + recov)
summary(m1)
##
## Call:
## lm(formula = rate ~ loan + loginc + check + acc + revol + recov)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.3611 -2.4167 -0.0143 2.5288 8.0753
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.115e+00 3.984e+00 2.037 0.04229 *
## loan 1.338e-04 2.162e-05 6.188 1.5e-09 ***
## loginc 3.251e-01 3.735e-01 0.871 0.38452
## check 2.011e-01 7.729e-02 2.601 0.00962 **
## acc 2.865e-02 3.353e-02 0.854 0.39339
## revol -2.786e-06 6.103e-06 -0.457 0.64822
## recov 1.104e-04 8.692e-05 1.270 0.20484
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.402 on 405 degrees of freedom
## Multiple R-squared: 0.1496, Adjusted R-squared: 0.137
## F-statistic: 11.88 on 6 and 405 DF, p-value: 2.74e-12
h.)
Use m1 and do a hypothesis test to determine whether the recov
variable is a significant predictor of interest rate. Provide hypothesis
statements, test statistic, critical value, and a decision. Use alpha =
0.10.
qt(0.05, 405)
## [1] -1.648625
pt(-1.27, 405)*2
## [1] 0.2048138
H0: B6 = 0
Ha: B6 != 0
Test statistic: t = 1.27
Critical value: ±1.648625 (t with 405 df, alpha/2 = 0.05 in each tail)
Since |t| = 1.27 < 1.648625 (equivalently, p = 0.2048 > 0.10), we do
not reject the null hypothesis. There is not sufficient evidence that B6
differs from 0, so we cannot conclude that recov is a significant
predictor of interest rate at the alpha level of 0.10.
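The same test can be computed directly from the recov row of the m1 summary. This is a minimal sketch using only numbers printed above; it takes the upper critical value from qt(1 - alpha/2, df), which is the absolute value of the qt(0.05, 405) call used in this part.

```r
# Two-sided t-test for the recov coefficient in m1 (values from summary(m1))
est   <- 1.104e-04   # Estimate for recov
se    <- 8.692e-05   # Std. Error for recov
df    <- 405         # residual degrees of freedom of m1
alpha <- 0.10

t_stat <- est / se                    # about 1.27
t_crit <- qt(1 - alpha / 2, df)       # about 1.6486
p_val  <- 2 * pt(-abs(t_stat), df)    # two-sided p-value, about 0.205
decision <- if (abs(t_stat) > t_crit) "reject H0" else "fail to reject H0"
decision
```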
i.)
Using m1, is the check variable significant at alpha = 0.05? How do
you know?
qt(0.025, 405)
## [1] -1.965839
H0: B3 = 0
Ha: B3 != 0
Test statistic: t = 2.601
Critical value: ±1.965839 (t with 405 df, alpha/2 = 0.025 in each tail)
Since |t| = 2.601 > 1.965839, we reject the null hypothesis and conclude
that check is a significant predictor of interest rate at an alpha level
of 0.05. The summary output confirms this: the p-value for check is
0.00962 < 0.05, and its "**" significance code means p < 0.01, which
also implies p < 0.05.
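A sketch that repeats the check test from its printed estimate and standard error, and maps p-values to the star codes. The helper `sig_stars()` is hypothetical; it uses cut() with the cutpoints shown in the Signif. codes legend and is not how summary() builds the stars internally.

```r
# Hypothetical helper: p-value -> star code, using the legend's cutpoints
sig_stars <- function(p) {
  as.character(cut(p,
                   breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                   labels = c("***", "**", "*", ".", " "),
                   include.lowest = TRUE))
}

# check row of summary(m1): two-sided test at alpha = 0.05
t_check <- 2.011e-01 / 7.729e-02        # Estimate / Std. Error, about 2.60
t_crit  <- qt(1 - 0.05 / 2, 405)        # about 1.966
p_check <- 2 * pt(-abs(t_check), 405)   # about 0.0096

c(t = t_check, crit = t_crit, p = p_check)
sig_stars(p_check)
```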
j.)
Using m1, interpret the slope for the acc variable.
Interpret the slope for acc: the slope is 2.865e-02 (0.02865), so for
each one-unit increase in acc, the predicted interest rate increases by
about 0.02865 rate units, holding all other variables constant. (This
slope is not statistically significant in m1, p = 0.393.)
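To make "holding all other variables constant" concrete, the sketch below fits a small model on simulated data (x1, x2, and the data frame d are hypothetical) and shows that raising x1 by one unit while fixing x2 changes the prediction by exactly the x1 slope. For acc in m1, that change would be 0.02865.

```r
# Sketch on SIMULATED data: slope = change in prediction for a one-unit
# increase in one predictor with the other predictor held fixed.
set.seed(3)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
d$y <- 1 + 0.5 * d$x1 - 2 * d$x2 + rnorm(100)
fit <- lm(y ~ x1 + x2, data = d)

base   <- data.frame(x1 = 3, x2 = 1)
bumped <- data.frame(x1 = 4, x2 = 1)   # x1 + 1, x2 unchanged
change <- unname(predict(fit, bumped) - predict(fit, base))

c(change = change, slope_x1 = unname(coef(fit)["x1"]))
```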
k.)
Starting from m1, try a squared term. Tell me why you chose what you
did and whether it turned out significant at alpha = 0.05.
sqacc = acc^2
m2 = lm(rate ~ loan + loginc + check + sqacc + revol + recov)
summary(m2)
##
## Call:
## lm(formula = rate ~ loan + loginc + check + sqacc + revol + recov)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.2482 -2.4094 -0.0155 2.4835 8.1352
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.640e+00 3.985e+00 2.168 0.03073 *
## loan 1.350e-04 2.159e-05 6.254 1.02e-09 ***
## loginc 2.798e-01 3.712e-01 0.754 0.45150
## check 1.996e-01 7.698e-02 2.592 0.00988 **
## sqacc 2.120e-03 1.291e-03 1.642 0.10146
## revol -3.642e-06 6.097e-06 -0.597 0.55060
## recov 1.120e-04 8.671e-05 1.292 0.19725
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.394 on 405 degrees of freedom
## Multiple R-squared: 0.1537, Adjusted R-squared: 0.1412
## F-statistic: 12.26 on 6 and 405 DF, p-value: 1.086e-12
qt(0.025, 405)
## [1] -1.965839
I chose acc for the squared term because, in the scatterplots from
part (e), the acc plot shows points that rise steeply at first and then
level off partway through. A straight-line term in acc cannot follow
that bend, so a squared term was tried to capture the curvature.
H0: B4 = 0 (B4 is the coefficient on sqacc in m2)
Ha: B4 != 0
Test statistic: t = 1.642
Critical value: ±1.965839 (t with 405 df, alpha/2 = 0.025 in each tail)
Since |t| = 1.642 < 1.965839 (equivalently, p = 0.1015 > 0.05), we do
not reject the null hypothesis. There is not sufficient evidence that B4
differs from 0, so the squared term sqacc did not turn out significant
at the alpha level of 0.05.
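m2 replaces acc with sqacc. A common alternative, assumed here rather than taken from the assignment, keeps the linear term and adds the squared term, e.g. `lm(rate ~ loan + loginc + check + acc + I(acc^2) + revol + recov)`; a negative coefficient on the squared term is what lets the fitted curve rise and then level off. The sketch below demonstrates this on simulated data (x and y are hypothetical), including a partial F-test via anova() comparing the models with and without the squared term.

```r
# Sketch on SIMULATED data: y rises steeply, then levels off.
set.seed(4)
x <- runif(200, 0, 10)
y <- 10 * x - 0.5 * x^2 + rnorm(200, sd = 2)

lin  <- lm(y ~ x)            # linear term only
quad <- lm(y ~ x + I(x^2))   # linear plus squared term

coef(summary(quad))["I(x^2)", ]   # Estimate, Std. Error, t value, Pr(>|t|)
anova(lin, quad)                  # partial F-test for adding I(x^2)
```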
l.)
Starting from m1, pick a logical variable and find its partial
correlation coefficient. What is the interpretation of this
coefficient?
6.188/(sqrt(6.188^2 + 405))
## [1] 0.2939041
Interpretation of the coefficient from part (l): the partial
correlation between rate and loan (the predictor with the largest t
value in m1) is 0.2939. After controlling for loginc, check, acc, revol,
and recov, there is a weak positive linear association between loan and
interest rate. Squaring it gives 0.2939^2 = 0.0864: about 8.6% of the
variation in interest rate left unexplained by the other five predictors
is explained by adding loan.
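The formula used in part (l), r = t / sqrt(t^2 + df), gives the partial correlation from a coefficient's t statistic and the model's residual degrees of freedom. The sketch below applies it to loan and then checks the identity on simulated data (x1, x2, x3, and d are hypothetical) by correlating the residuals of y and x1 after each is regressed on the remaining predictors.

```r
# Partial correlation from a t statistic and residual df
partial_r_from_t <- function(t, df) t / sqrt(t^2 + df)

# loan in m1: t = 6.188 on 405 residual df
r_loan <- partial_r_from_t(6.188, 405)   # about 0.294
r_loan^2                                 # about 0.086

# Check of the identity on SIMULATED data
set.seed(5)
d <- data.frame(x1 = rnorm(150), x2 = rnorm(150), x3 = rnorm(150))
d$y <- 2 + 0.4 * d$x1 + d$x2 - d$x3 + rnorm(150)

full  <- lm(y ~ x1 + x2 + x3, data = d)
t_x1  <- coef(summary(full))["x1", "t value"]
r_t   <- partial_r_from_t(t_x1, df.residual(full))
# Correlate the parts of y and x1 that x2 and x3 do not explain
r_res <- cor(resid(lm(y ~ x2 + x3, data = d)),
             resid(lm(x1 ~ x2 + x3, data = d)))
c(from_t = r_t, from_residuals = r_res)
```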
Comment on R^2 and R^2a from m1. Interpret R^2a. Make these clearly separate.
Comment on R^2: R^2 for m1 is only 0.1496, so the six predictors together explain about 15% of the variation in interest rate. This is a weak fit and suggests that factors not included in the model account for most of the variation in rate.

Comment on R^2a: R^2a = 0.137 is also small and slightly lower than R^2, because it penalizes the model for its six predictors, several of which (loginc, acc, revol, recov) are not significant. This again indicates that m1 is not a strong model for predicting interest rate, and that other variables not in the model are likely needed.

Interpreting R^2a: After adjusting for the number of predictors in the model, about 13.7% of the variation in interest rate is explained by the predictors in m1.
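Both R^2a and the overall F statistic can be recovered from numbers printed in the m1 summary. A minimal sketch, assuming n = 405 + 6 + 1 = 412 observations (residual df plus the number of estimated coefficients); the F value differs slightly from the printed 11.88 only because R^2 is rounded to four digits.

```r
# Rebuild R^2a and the overall F statistic of m1 from its printed R^2
r2       <- 0.1496           # Multiple R-squared of m1
p        <- 6                # number of predictors
df_resid <- 405              # residual degrees of freedom
n        <- df_resid + p + 1 # 412 observations

adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid   # about 0.137
f_stat <- (r2 / p) / ((1 - r2) / df_resid)    # about 11.87
c(adj_r2 = adj_r2, F = f_stat)
```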