Data Set

The data set for this assignment is the Credit data from the ISLR package. As described in course lectures, the goal for this assignment is to better understand those individuals who keep a balance on their credit card. For that reason, we will only look at individuals who have a balance on their credit cards.

library(ISLR)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(alr4)
## Loading required package: car
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode
## Loading required package: effects
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
Credit2 <- Credit %>% filter(Balance > 0)

For this assignment you will use the Credit2 data set.

Problem 1

We want to better understand how Balance relates to Limit, Rating, and Income. We know from class that we shouldn’t use Limit and Rating in the same model. Pick only Limit or Rating (your choice) and explain why you chose it for the model. Then fit a linear regression with Balance as the response and your pick of Limit or Rating, together with Income, as predictors. Interpret the model to explain how the two predictors relate to the response and provide visual insights around what each of the slope coefficients means and its importance. As a reminder, interpreting a model means discussing the regression coefficients, R-squared value, and any other insights about the relationship between the predictors that may be of interest to a stakeholder. For instance, how does the presence of Income in the model impact the effect of Limit or Rating? To do this, use effects and/or added variable plots.

summary(Credit2)
##        ID             Income           Limit           Rating     
##  Min.   :  1.00   Min.   : 10.35   Min.   : 1160   Min.   :126.0  
##  1st Qu.: 98.25   1st Qu.: 23.15   1st Qu.: 3976   1st Qu.:304.0  
##  Median :209.50   Median : 37.14   Median : 5147   Median :380.0  
##  Mean   :202.44   Mean   : 49.98   Mean   : 5485   Mean   :405.1  
##  3rd Qu.:306.50   3rd Qu.: 63.74   3rd Qu.: 6453   3rd Qu.:469.0  
##  Max.   :400.00   Max.   :186.63   Max.   :13913   Max.   :982.0  
##      Cards            Age          Education        Gender    Student  
##  Min.   :1.000   Min.   :23.00   Min.   : 5.00    Male :145   No :271  
##  1st Qu.:2.000   1st Qu.:42.00   1st Qu.:11.00   Female:165   Yes: 39  
##  Median :3.000   Median :55.50   Median :14.00                         
##  Mean   :2.997   Mean   :55.61   Mean   :13.43                         
##  3rd Qu.:4.000   3rd Qu.:69.00   3rd Qu.:16.00                         
##  Max.   :9.000   Max.   :98.00   Max.   :20.00                         
##  Married              Ethnicity      Balance      
##  No :118   African American: 78   Min.   :   5.0  
##  Yes:192   Asian           : 74   1st Qu.: 338.0  
##            Caucasian       :158   Median : 637.5  
##                                   Mean   : 671.0  
##                                   3rd Qu.: 960.8  
##                                   Max.   :1999.0
head(Credit2)
dim(Credit2)
## [1] 310  12
o1<-lm(Balance~Limit+Income, data=Credit2)
summary(o1)
## 
## Call:
## lm(formula = Balance ~ Limit + Income, data = Credit2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -176.70  -92.93  -52.35   -4.41  521.68 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -5.165e+02  3.145e+01  -16.43   <2e-16 ***
## Limit        2.978e-01  8.424e-03   35.35   <2e-16 ***
## Income      -8.924e+00  4.564e-01  -19.55   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 167.8 on 307 degrees of freedom
## Multiple R-squared:  0.8366, Adjusted R-squared:  0.8356 
## F-statistic: 786.2 on 2 and 307 DF,  p-value: < 2.2e-16
plot(o1)

avPlots(o1)

plot(allEffects(o1))

I chose Limit and Income as the predictors because both are measured in money, which I believe makes the model more consistent. The intercept is about -516.5. It is the predicted balance when Limit and Income are both zero, which lies outside the range of the data, so it has no practical meaning on its own. Holding Income constant, a one-unit increase in Limit is associated with an average increase in Balance of about 0.298. Holding Limit constant, a one-unit increase in Income is associated with an average decrease in Balance of about 8.92. Both predictors are statistically significant, with p-values below 2e-16, far under 0.05. The multiple R-squared is 0.8366, which means the model explains about 84% of the variance in Balance.
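The interpretation above can be checked numerically. The following is a minimal sketch, assuming the ISLR package is installed; the names `fit`, `ci`, `per_1000_limit`, `r_squared`, and `limit_rating_cor` are illustrative and not part of the original analysis. It refits the same model, pulls confidence intervals for the coefficients, rescales the Limit slope to a 1,000-unit change, and reports the correlation between Limit and Rating that motivates using only one of them.

```r
library(ISLR)  # provides the Credit data

# Same subset as Credit2: cardholders with a positive balance
Credit2 <- subset(Credit, Balance > 0)

# Same model as o1 above
fit <- lm(Balance ~ Limit + Income, data = Credit2)

# 95% confidence intervals for the intercept and the two slopes
ci <- confint(fit)
print(ci)

# Expected change in Balance for a 1,000-unit increase in Limit,
# holding Income fixed (an illustrative rescaling of the slope)
per_1000_limit <- 1000 * coef(fit)[["Limit"]]
print(per_1000_limit)

# Share of the variance in Balance explained by the model
r_squared <- summary(fit)$r.squared
print(r_squared)

# Correlation between Limit and Rating; a value close to 1 would
# support keeping only one of the two as a predictor
limit_rating_cor <- cor(Credit2$Limit, Credit2$Rating)
print(limit_rating_cor)
```

Reporting the slope per 1,000 units of Limit is only a presentation choice; it does not change the fitted model.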

The added-variable plot for Limit shows a positive linear relationship with Balance after adjusting for Income, meaning that as Limit increases, so does Balance. The added-variable plot for Income shows a negative linear relationship after adjusting for Limit, meaning that as Income increases, Balance decreases. The effects plot for Limit confirms this relationship, showing that as Limit increases, so does Balance, with more uncertainty at higher values of Limit. The effects plot for Income also confirms the relationship: as Income increases, Balance decreases. It also shows more uncertainty at higher values of Income.
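To make concrete what an added-variable plot displays, here is a hedged sketch that rebuilds the Limit panel by hand. It assumes the ISLR package is installed; `full`, `e_y`, `e_x`, and `av_slope` are names introduced only for this illustration. The plot is a scatter of two sets of residuals, and the slope of the line through them equals the Limit coefficient from the full model.

```r
library(ISLR)  # provides the Credit data

# Same subset as Credit2: cardholders with a positive balance
Credit2 <- subset(Credit, Balance > 0)

# Full model, as fitted in o1
full <- lm(Balance ~ Limit + Income, data = Credit2)

# Part of Balance not explained by Income
e_y <- resid(lm(Balance ~ Income, data = Credit2))

# Part of Limit not explained by Income
e_x <- resid(lm(Limit ~ Income, data = Credit2))

# Slope of the residual-on-residual regression: this is the line
# drawn in the Limit panel of the added-variable plot
av_slope <- coef(lm(e_y ~ e_x))[["e_x"]]

# The two slopes agree up to numerical precision
print(c(av_slope = av_slope, full_model = coef(full)[["Limit"]]))
```

This is why the added-variable plot is read as the effect of Limit after adjusting for Income, rather than the raw relationship between Limit and Balance.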

Problem 2

We want to better understand how Balance relates to Limit, Income, and Education. Fit a linear regression with Balance as the response and Limit, Income, and Education as predictors. Interpret the model to explain how the three predictors relate to the response and provide visual insights around what each of the slope coefficients means and its importance. As a reminder, interpreting a model means discussing the regression coefficients, R-squared value, and any other insights about the relationship between the predictors that may be of interest to a stakeholder. For instance, how does the presence of Income in the model impact the effect of Limit and Education? To do this, use effects and/or added variable plots.

summary(Credit2)
##        ID             Income           Limit           Rating     
##  Min.   :  1.00   Min.   : 10.35   Min.   : 1160   Min.   :126.0  
##  1st Qu.: 98.25   1st Qu.: 23.15   1st Qu.: 3976   1st Qu.:304.0  
##  Median :209.50   Median : 37.14   Median : 5147   Median :380.0  
##  Mean   :202.44   Mean   : 49.98   Mean   : 5485   Mean   :405.1  
##  3rd Qu.:306.50   3rd Qu.: 63.74   3rd Qu.: 6453   3rd Qu.:469.0  
##  Max.   :400.00   Max.   :186.63   Max.   :13913   Max.   :982.0  
##      Cards            Age          Education        Gender    Student  
##  Min.   :1.000   Min.   :23.00   Min.   : 5.00    Male :145   No :271  
##  1st Qu.:2.000   1st Qu.:42.00   1st Qu.:11.00   Female:165   Yes: 39  
##  Median :3.000   Median :55.50   Median :14.00                         
##  Mean   :2.997   Mean   :55.61   Mean   :13.43                         
##  3rd Qu.:4.000   3rd Qu.:69.00   3rd Qu.:16.00                         
##  Max.   :9.000   Max.   :98.00   Max.   :20.00                         
##  Married              Ethnicity      Balance      
##  No :118   African American: 78   Min.   :   5.0  
##  Yes:192   Asian           : 74   1st Qu.: 338.0  
##            Caucasian       :158   Median : 637.5  
##                                   Mean   : 671.0  
##                                   3rd Qu.: 960.8  
##                                   Max.   :1999.0
head(Credit2)
t1<-lm(Balance~Income+Education+Limit, data=Credit2)
summary(t1)
## 
## Call:
## lm(formula = Balance ~ Income + Education + Limit, data = Credit2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -180.92  -93.75  -51.57   -4.59  531.25 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -5.536e+02  5.137e+01 -10.778   <2e-16 ***
## Income      -8.919e+00  4.566e-01 -19.535   <2e-16 ***
## Education    2.720e+00  2.978e+00   0.913    0.362    
## Limit        2.979e-01  8.427e-03  35.347   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 167.9 on 306 degrees of freedom
## Multiple R-squared:  0.8371, Adjusted R-squared:  0.8355 
## F-statistic: 524.1 on 3 and 306 DF,  p-value: < 2.2e-16
#library("alr4")
avPlots(t1)

plot(allEffects(t1))

The model summary shows that the intercept is about -553.6. Holding the other predictors constant, a one-unit increase in Income is associated with a decrease in Balance of about 8.92, a one-unit increase in Education with an increase of about 2.72, and a one-unit increase in Limit with an increase of about 0.298. The p-value for Income shows there is a significant relationship between Income and Balance because it is below 0.05. The same applies to Limit. The p-value for Education (0.362) is greater than 0.05, so there is no evidence of a significant relationship between Balance and Education once Income and Limit are in the model. The multiple R-squared is 0.8371, essentially unchanged from the 0.8366 of the two-predictor model, and the adjusted R-squared falls slightly from 0.8356 to 0.8355, so adding Education does not improve the fit.
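One way to check whether Education adds anything beyond Limit and Income is a nested-model F test. The sketch below assumes the ISLR package is installed; `small`, `large`, `cmp`, `p_value`, and `t_education` are illustrative names. Because only one term is added, the F statistic should equal the square of the Education t value from the summary.

```r
library(ISLR)  # provides the Credit data

# Same subset as Credit2: cardholders with a positive balance
Credit2 <- subset(Credit, Balance > 0)

# Two-predictor model (o1) and three-predictor model (t1)
small <- lm(Balance ~ Limit + Income, data = Credit2)
large <- lm(Balance ~ Income + Education + Limit, data = Credit2)

# F test for the single added term, Education
cmp <- anova(small, large)
print(cmp)

# p-value of the comparison (second row of the anova table)
p_value <- cmp[["Pr(>F)"]][2]

# With one added term, F equals the squared t statistic for that term
t_education <- summary(large)$coefficients["Education", "t value"]
print(c(F = cmp[["F"]][2], t_squared = t_education^2))
```

A large p-value here leads to the same conclusion as the coefficient table: the simpler model without Education fits about as well.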

Based on the added-variable plots, after adjusting for the other predictors, Balance has a positive linear relationship with Limit and a negative linear relationship with Income. So, as Limit increases, Balance does as well, and as Income increases, Balance decreases. The effects plots, which hold the other predictors at their averages, confirm both trends, with increased variability at higher values of each predictor. For Education, the added-variable plot shows a linear relationship that isn’t as clean as in the other graphs. It shows that as Education increases, Balance increases slightly. The Education effects plot shows the same upward trend with a bit more variability, which is consistent with the non-significant Education coefficient in the summary.
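The idea behind an effects plot can be sketched with `predict()`: vary one predictor and hold the others at their sample means. The example below assumes the ISLR package is installed; `fit3`, `grid`, `pred`, and `education_gap` are illustrative names, and the Education values 10 and 16 are arbitrary points chosen within the observed range.

```r
library(ISLR)  # provides the Credit data

# Same subset as Credit2: cardholders with a positive balance
Credit2 <- subset(Credit, Balance > 0)

# Same model as t1
fit3 <- lm(Balance ~ Income + Education + Limit, data = Credit2)

# Two Education values, with Income and Limit fixed at their means
grid <- data.frame(
  Education = c(10, 16),
  Income = mean(Credit2$Income),
  Limit = mean(Credit2$Limit)
)

# Fitted Balance with 95% confidence limits at each Education value
pred <- predict(fit3, newdata = grid, interval = "confidence")
print(pred)

# The gap between the two fitted values is 6 times the Education slope
education_gap <- unname(pred[2, "fit"] - pred[1, "fit"])
print(education_gap)
```

If the two confidence intervals overlap heavily, that matches the reading of the Education effects plot as a weak, uncertain upward trend.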