The dataset teengamb concerns a study of teenage gambling in Britain. Fit a regression model with the expenditure on gambling as the response and the sex, status, income and verbal score as predictors. Present the output.

rm(list=ls(all=TRUE)) 
library(faraway)
data(teengamb)
teengamb$sex <- factor(teengamb$sex)
attach(teengamb)
teengamb[1:3,]
##   sex status income verbal gamble
## 1   1     51    2.0      8      0
## 2   1     28    2.5      8      0
## 3   1     37    2.0      6      0
gamb.lm <- lm(gamble ~ sex + status + income + verbal) # sex is a factor, so lm() codes it as a 0/1 dummy (sex1) with level 0 as the baseline
summary(gamb.lm)
## 
## Call:
## lm(formula = gamble ~ sex + status + income + verbal)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -51.082 -11.320  -1.451   9.452  94.252 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  22.55565   17.19680   1.312   0.1968    
## sex1        -22.11833    8.21111  -2.694   0.0101 *  
## status        0.05223    0.28111   0.186   0.8535    
## income        4.96198    1.02539   4.839 1.79e-05 ***
## verbal       -2.95949    2.17215  -1.362   0.1803    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.69 on 42 degrees of freedom
## Multiple R-squared:  0.5267, Adjusted R-squared:  0.4816 
## F-statistic: 11.69 on 4 and 42 DF,  p-value: 1.815e-06
#(1) What percentage of variation in the response is explained by these predictors?
    # R^2 = 0.5267, so about 52.7% of the variation in gamble is explained.

#(2) Which observation has the largest (positive) residual? Give the case number.
    # Observation 24, with a residual of 94.25; which.max(gamb.lm$residuals) would return the case directly.
gamb.lm$residuals; max(gamb.lm$residuals)
##           1           2           3           4           5           6 
##  10.6507430   9.3711318   5.4630298 -17.4957487  29.5194692  -2.9846919 
##           7           8           9          10          11          12 
##  -7.0242994 -12.3060734   6.8496267 -10.3329505   1.5934936  -3.0958161 
##          13          14          15          16          17          18 
##   0.1172839   9.5331344   2.8488167  17.2107726 -25.2627227 -27.7998544 
##          19          20          21          22          23          24 
##  13.1446553 -15.9510624 -16.0041386  -9.5801478 -27.2711657  94.2522174 
##          25          26          27          28          29          30 
##   0.6993361  -9.1670510 -25.8747696  -8.7455549  -6.8803097 -19.8090866 
##          31          32          33          34          35          36 
##  10.8793766  15.0599340  11.7462296  -3.5932770 -14.4016736  45.6051264 
##          37          38          39          40          41          42 
##  20.5472529  11.2429290 -51.0824078   8.8669438  -1.4513921  -3.8361619 
##          43          44          45          46          47 
##  -4.3831786 -14.8940753   5.4506347   1.4092321   7.1662399
## [1] 94.25222
#(3) mean and median of the residuals
mean(gamb.lm$residuals) # the mean is zero up to floating-point error
## [1] -3.065293e-17
median(gamb.lm$residuals)
## [1] -1.451392
#(4) correlation of the residuals with the fitted values
fitted_value <- gamble - gamb.lm$residuals
cor(gamb.lm$residuals, fitted_value)
## [1] -1.070659e-16
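# Question: why does the manual calculation below not reproduce fitted_value?
# Answer: sex is a factor, so there is no numeric variable named sex1; the 0/1
# dummy that lm() labels "sex1" has to be built explicitly, e.g. (sex == "1"):
#beta <- summary(gamb.lm)$coefficients[,1]
#fitted_value_2 <- beta[1] + beta[2]*(sex == "1") + beta[3]*status + beta[4]*income + beta[5]*verbal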
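
The manual calculation in the comment above needs the 0/1 dummy column that `lm()` builds for the factor `sex`. A minimal sketch of one way to check this, assuming the `faraway` package is installed: take the design matrix from `model.matrix()` and multiply it by the coefficients. The names `fit`, `X` and `manual` are introduced here for illustration only, and the model is refit with `data = teengamb` so that the block does not rely on `attach()`.

```r
library(faraway)
data(teengamb)
teengamb$sex <- factor(teengamb$sex)
fit <- lm(gamble ~ sex + status + income + verbal, data = teengamb)

# Design matrix: the factor sex is expanded into a 0/1 dummy column named "sex1"
X <- model.matrix(fit)
colnames(X)

# Fitted values computed by hand as X %*% beta-hat
manual <- drop(X %*% coef(fit))

# Compare the three routes to the fitted values
all.equal(manual, fitted(fit))
all.equal(teengamb$gamble - resid(fit), fitted(fit))
```

Both comparisons should agree up to rounding error, since the response minus the residuals, `fitted()`, and the matrix product are the same quantity computed in different ways.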


#(5) correlation of the residuals with the income
cor(gamb.lm$residuals, income)
## [1] -7.242382e-17
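
The near-zero values in (3), (4) and (5) follow from the least-squares normal equations rather than from anything specific to this dataset. A short derivation, using notation not in the original (design matrix $X$ with an intercept column, estimate $\hat\beta$, residual vector $\hat e$, fitted values $\hat y$):

```latex
\begin{aligned}
X^\top \hat e &= X^\top (y - X\hat\beta) = 0
  && \text{(normal equations)} \\
\mathbf{1}^\top \hat e &= \sum_i \hat e_i = 0
  && \text{(intercept column of } X\text{, so the mean residual is 0)} \\
x_j^\top \hat e &= 0
  && \text{(any predictor column, e.g. income)} \\
\hat y^\top \hat e &= \hat\beta^\top X^\top \hat e = 0
  && \text{(fitted values } \hat y = X\hat\beta\text{)}
\end{aligned}
```

Because the residuals sum to zero, these zero inner products are also zero sample covariances, so the correlations are zero up to floating-point error. Nothing comparable constrains the median, which is why it is -1.45 rather than 0.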
  1. For all other predictors held constant, what would be the difference in predicted expenditure on gambling for a male compared to a female?

sex = 1: female

sex = 0: male (baseline level)

Coefficient of sex1: -22.118

Holding status, income and verbal score constant, the model predicts that a female teenager spends about 22.12 pounds per year less on gambling than a male teenager.
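
One way to see this interpretation concretely is to predict for two teenagers who differ only in sex. The sketch below assumes the `faraway` package is installed; the predictor values in `newdata` are arbitrary and chosen purely for illustration, and the names `fit`, `newdata`, `pred` and `diff_f_m` are not from the original analysis.

```r
library(faraway)
data(teengamb)
teengamb$sex <- factor(teengamb$sex)
fit <- lm(gamble ~ sex + status + income + verbal, data = teengamb)

# Two hypothetical teenagers, identical except for sex (row 1 male, row 2 female)
newdata <- data.frame(
  sex    = factor(c("0", "1"), levels = levels(teengamb$sex)),
  status = 45,
  income = 4,
  verbal = 7
)
pred <- predict(fit, newdata)

# Female minus male prediction, next to the sex1 coefficient
diff_f_m <- unname(pred[2] - pred[1])
diff_f_m
unname(coef(fit)["sex1"])
```

The difference does not depend on the values chosen for status, income and verbal, because the model has no interaction terms involving sex.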