The dataset teengamb concerns a study of teenage gambling in Britain. Fit a regression model with the expenditure on gambling as the response and the sex, status, income and verbal score as predictors. Present the output.
rm(list=ls(all=TRUE))
library(faraway)
data(teengamb)
teengamb$sex <- factor(teengamb$sex)
attach(teengamb)
teengamb[1:3,]
## sex status income verbal gamble
## 1 1 51 2.0 8 0
## 2 1 28 2.5 8 0
## 3 1 37 2.0 6 0
gamb.lm <- lm(gamble ~ sex+status+income+verbal) # sex is a factor, so lm() encodes it as a 0/1 dummy column named sex1 (baseline level: 0)
summary(gamb.lm)
##
## Call:
## lm(formula = gamble ~ sex + status + income + verbal)
##
## Residuals:
## Min 1Q Median 3Q Max
## -51.082 -11.320 -1.451 9.452 94.252
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 22.55565 17.19680 1.312 0.1968
## sex1 -22.11833 8.21111 -2.694 0.0101 *
## status 0.05223 0.28111 0.186 0.8535
## income 4.96198 1.02539 4.839 1.79e-05 ***
## verbal -2.95949 2.17215 -1.362 0.1803
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22.69 on 42 degrees of freedom
## Multiple R-squared: 0.5267, Adjusted R-squared: 0.4816
## F-statistic: 11.69 on 4 and 42 DF, p-value: 1.815e-06
#(1) What percentage of variation in the response is explained by these predictors?
# R^2 = 0.5267, so about 52.67% of the variation in gamble is explained.
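The R-squared value can also be pulled out of the fitted model rather than read off the printout. The following is a minimal sketch, assuming the faraway package is installed; the names `fit`, `r2` and `r2_by_hand` are illustrative and not part of the original solution. It refits the model with `data = teengamb` so it does not rely on `attach()`, and recomputes R-squared as 1 - RSS/TSS as a cross-check.

```r
library(faraway)   # provides the teengamb data
data(teengamb)
teengamb$sex <- factor(teengamb$sex)

# Same model as gamb.lm, fitted without attach()
fit <- lm(gamble ~ sex + status + income + verbal, data = teengamb)

# R-squared as stored in the summary object
r2 <- summary(fit)$r.squared

# Cross-check: 1 - (residual sum of squares) / (total sum of squares)
rss <- sum(residuals(fit)^2)
tss <- sum((teengamb$gamble - mean(teengamb$gamble))^2)
r2_by_hand <- 1 - rss / tss

# Both expressed as percentages of variation explained
round(100 * c(summary = r2, by_hand = r2_by_hand), 2)
```

Both entries should agree with the Multiple R-squared reported by `summary(gamb.lm)`.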
#(2) Which observation has the largest (positive) residual? Give the case number.
# the 24th observation
gamb.lm$residuals; max(gamb.lm$residuals);
## 1 2 3 4 5 6
## 10.6507430 9.3711318 5.4630298 -17.4957487 29.5194692 -2.9846919
## 7 8 9 10 11 12
## -7.0242994 -12.3060734 6.8496267 -10.3329505 1.5934936 -3.0958161
## 13 14 15 16 17 18
## 0.1172839 9.5331344 2.8488167 17.2107726 -25.2627227 -27.7998544
## 19 20 21 22 23 24
## 13.1446553 -15.9510624 -16.0041386 -9.5801478 -27.2711657 94.2522174
## 25 26 27 28 29 30
## 0.6993361 -9.1670510 -25.8747696 -8.7455549 -6.8803097 -19.8090866
## 31 32 33 34 35 36
## 10.8793766 15.0599340 11.7462296 -3.5932770 -14.4016736 45.6051264
## 37 38 39 40 41 42
## 20.5472529 11.2429290 -51.0824078 8.8669438 -1.4513921 -3.8361619
## 43 44 45 46 47
## -4.3831786 -14.8940753 5.4506347 1.4092321 7.1662399
## [1] 94.25222
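Instead of scanning the printed residuals for the maximum, the case number can be located directly. This is a small sketch under the same assumptions as the fit above (faraway installed); `fit`, `res` and `idx` are illustrative names. `which.max()` returns the position of the largest element, and its name is the row label of the data frame.

```r
library(faraway)
data(teengamb)
teengamb$sex <- factor(teengamb$sex)
fit <- lm(gamble ~ sex + status + income + verbal, data = teengamb)

res <- residuals(fit)   # named by case number
idx <- which.max(res)   # position (and name) of the largest residual

names(idx)              # case number as a character string
res[idx]                # the residual itself
teengamb[idx, ]         # the corresponding observation
```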
#(3) mean and median of the residuals
mean(gamb.lm$residuals) # the mean is zero up to rounding error, because the model includes an intercept
## [1] -3.065293e-17
median(gamb.lm$residuals)
## [1] -1.451392
#(4) correlation of the residuals with the fitted values
fitted_value <- gamble - gamb.lm$residuals
cor(gamb.lm$residuals, fitted_value)
## [1] -1.070659e-16
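# Question: why does the fitted value above differ from the one computed by hand below?
#beta <- summary(gamb.lm)$coefficients[,1]
#fitted_value_2 <- beta[1] + beta[2]*sex1+ beta[3]*status + beta[4]*income + beta[5]*verbal
# Likely cause: sex1 is the name lm() gives to the dummy column for the factor sex,
# not a variable in teengamb. A 0/1 indicator such as as.numeric(sex == "1") is needed
# in its place; as.numeric(sex) would return the level codes 1 and 2 instead of 0 and 1.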
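The hand computation can be made to agree with `lm()` once the factor is turned into the same 0/1 column the model uses. The sketch below is one way to do it, assuming the faraway package is installed; `fit`, `b`, `sex_dummy`, `fitted_by_hand` and `fitted_matrix` are illustrative names. It shows both an explicit formula and the design-matrix form, in which `model.matrix()` supplies the `sex1` column.

```r
library(faraway)
data(teengamb)
teengamb$sex <- factor(teengamb$sex)
fit <- lm(gamble ~ sex + status + income + verbal, data = teengamb)

b <- unname(coef(fit))   # order: intercept, sex1, status, income, verbal

# 0/1 indicator for level "1"; note as.numeric(teengamb$sex) would give codes 1 and 2
sex_dummy <- as.numeric(teengamb$sex == "1")

fitted_by_hand <- b[1] + b[2] * sex_dummy + b[3] * teengamb$status +
  b[4] * teengamb$income + b[5] * teengamb$verbal

# Equivalent matrix form: design matrix times coefficient vector
fitted_matrix <- drop(model.matrix(fit) %*% coef(fit))

all.equal(fitted_by_hand, unname(fitted(fit)))
all.equal(fitted_matrix, fitted(fit))
```

Both comparisons should return `TRUE`, and both results should match `gamble - gamb.lm$residuals` from the original code.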
#(5) correlation of the residuals with the income
cor(gamb.lm$residuals, income)
## [1] -7.242382e-17
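The near-zero correlations in (4) and (5) are a property of least squares rather than of this dataset: the residuals are orthogonal to every column of the design matrix, and therefore to the fitted values, which are a linear combination of those columns. A brief check, assuming the faraway package is installed and using the illustrative names `fit`, `X`, `e` and `ortho`:

```r
library(faraway)
data(teengamb)
teengamb$sex <- factor(teengamb$sex)
fit <- lm(gamble ~ sex + status + income + verbal, data = teengamb)

X <- model.matrix(fit)   # columns: (Intercept), sex1, status, income, verbal
e <- residuals(fit)

# X'e: inner product of the residuals with each design column
ortho <- drop(crossprod(X, e))
signif(ortho, 3)         # every entry is zero up to rounding error

# The intercept entry being zero is why the residual mean is zero
cor(e, fitted(fit))
cor(e, teengamb$income)
```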
Sex=1 : female
Sex=0 : male
Coefficient of sex (sex1): -22.1183301
Based on this result, holding status, income and verbal score fixed, teenage females are estimated to spend on average about 22.12 pounds per year less on gambling than teenage males.
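The interpretation of the sex coefficient can be illustrated by predicting for two hypothetical teenagers who differ only in sex. This is a sketch, assuming the faraway package is installed; the data frames `typical`, `male` and `female` are made up for illustration, with the other predictors set to their sample means.

```r
library(faraway)
data(teengamb)
teengamb$sex <- factor(teengamb$sex)
fit <- lm(gamble ~ sex + status + income + verbal, data = teengamb)

# Hypothetical teenager with average status, income and verbal score
typical <- data.frame(status = mean(teengamb$status),
                      income = mean(teengamb$income),
                      verbal = mean(teengamb$verbal))

# Two copies that differ only in sex (0 = male, 1 = female)
male   <- data.frame(sex = factor("0", levels = levels(teengamb$sex)), typical)
female <- data.frame(sex = factor("1", levels = levels(teengamb$sex)), typical)

# Predicted female-minus-male gap in gambling expenditure
gap <- unname(predict(fit, female) - predict(fit, male))
gap
coef(fit)["sex1"]
```

The gap equals the `sex1` coefficient whatever values are chosen for the other predictors, because the model has no interaction terms.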