crdt=read.table('DATA_3.01_CREDIT.csv',sep=',',header=TRUE)
For the Credit Scoring dataset (DATA_3.01_CREDIT.csv): using the same specifications as the examples covered during the videos, which of the following claims is correct? (You can select multiple answers.)
The regression predicts correctly 98.67% of the observations
The variable education has a positive impact on the rating, everything else being equal
The variable balance has a positive impact on the rating, everything else being equal
The variable gender has a positive impact on the rating, everything else being equal
Students have, everything else being equal, a weaker rating
People with larger income have, everything else being equal, a better rating
linreg=lm(Rating~ .,data=crdt)
summary(linreg)
##
## Call:
## lm(formula = Rating ~ ., data = crdt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -81.316 -10.820 4.875 16.588 43.501
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 140.881377 9.666167 14.575 <2e-16 ***
## Income 2.094703 0.048118 43.533 <2e-16 ***
## Cards -0.762853 1.079874 -0.706 0.4805
## Age 0.144603 0.085872 1.684 0.0933 .
## Education 0.179388 0.473743 0.379 0.7052
## GenderFemale 1.770375 2.917842 0.607 0.5445
## StudentYes -98.804778 4.959789 -19.921 <2e-16 ***
## MarriedYes 3.176873 3.005535 1.057 0.2914
## EthnicityAsian -4.428289 4.006859 -1.105 0.2700
## EthnicityCaucasian -1.250612 3.533864 -0.354 0.7237
## Balance 0.231363 0.003661 63.189 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 24.91 on 289 degrees of freedom
## Multiple R-squared: 0.9736, Adjusted R-squared: 0.9727
## F-statistic: 1067 on 10 and 289 DF, p-value: < 2.2e-16
model Accuracy : %97.27, the significant variables are Income, Student, Balance
Number 3, 5,6 selected
The answer is positive NS NS
linreg=lm(Rating~ Income+Cards+Married,data=crdt)
summary(linreg)
##
## Call:
## lm(formula = Rating ~ Income + Cards + Married, data = crdt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -173.92 -77.36 -0.61 85.98 181.53
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 165.2144 16.6412 9.928 <2e-16 ***
## Income 3.4196 0.1637 20.896 <2e-16 ***
## Cards 8.2699 4.0988 2.018 0.0445 *
## MarriedYes 11.8404 11.3387 1.044 0.2972
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 95.71 on 296 degrees of freedom
## Multiple R-squared: 0.6016, Adjusted R-squared: 0.5975
## F-statistic: 149 on 3 and 296 DF, p-value: < 2.2e-16
The answer is D Positive Positive NS
datatot=read.table('DATA_3.02_HR2.csv', header = T,sep=',')
logreg = glm(left ~ ., family=binomial(logit), data=datatot)
cutoff=.5 # Cutoff to determine when P[leaving] should be considered as a leaver or not. Note you can play with it...
sum((logreg$fitted.values<=cutoff)&(datatot$left==0))/sum(datatot$left==0) # Compute the percentage of correctly classified employees who stayed
## [1] 0.9464
sum((logreg$fitted.values>cutoff)&(datatot$left==1))/sum(datatot$left==1) # Compute the percentage of correctly classified employees who left
## [1] 0.1905
mean((logreg$fitted.values>cutoff)==(datatot$left==1)) # Compute the overall percentage of correctly classified employees
## [1] 0.8204167
The answer is 95 19 82
logreg = glm(formula = left ~ ., family = binomial(logit), data = datatot)
summary(logreg)
##
## Call:
## glm(formula = left ~ ., family = binomial(logit), data = datatot)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.1727 -0.5410 -0.3535 -0.1994 3.0949
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.2412448 0.1601334 -7.751 9.09e-15 ***
## S -3.8163201 0.1207448 -31.607 < 2e-16 ***
## LPE 0.5044011 0.1809102 2.788 0.0053 **
## NP -0.3591952 0.0264709 -13.569 < 2e-16 ***
## ANH 0.0037840 0.0006237 6.067 1.30e-09 ***
## TIC 0.6187913 0.0271161 22.820 < 2e-16 ***
## Newborn -1.4851023 0.1128772 -13.157 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 10813.5 on 11999 degrees of freedom
## Residual deviance: 8508.9 on 11993 degrees of freedom
## AIC: 8522.9
##
## Number of Fisher Scoring iterations: 6
The answer is B,C,D