Problem Set 3

Page 52
Page 58
Page 72
Page 80
Page 91
Page 100
Page 104
Page 115
Page 124

Page 52

Exercise

Test whether the slope coefficient for the father.son data is different from zero (father as predictor, son as outcome).


Call:
lm(formula = sheight ~ fheight, data = father.son)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.8772 -1.5144 -0.0079  1.6285  8.9685 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 33.88660    1.83235   18.49   <2e-16 ***
fheight      0.51409    0.02705   19.01   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.437 on 1076 degrees of freedom
Multiple R-squared:  0.2513,    Adjusted R-squared:  0.2506 
F-statistic: 361.2 on 1 and 1076 DF,  p-value: < 2.2e-16

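    Y = β_0 + β_1x + ε
      H_0: β_1 = 0
      H_a: β_1 ≠ 0

With t = 19.01 and p < 2e-16 for fheight, we reject H_0: the slope coefficient is significantly different from zero.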

Refer to question 1. Form a confidence interval for the slope coefficient.

                 2.5 %     97.5 %
(Intercept) 30.2912126 37.4819961
fheight      0.4610188  0.5671673

Refer to question 1. Form a confidence interval for the intercept (center the fathers’ heights first to get an intercept that is easier to interpret).

                                2.5 %     97.5 %
(Intercept)                68.5384554 68.8296839
I(fheight - mean(fheight))  0.4610188  0.5671673

Refer to question 1. Form a mean value interval for the expected son’s height at the average father’s height.

       fit      lwr      upr
1 68.68407 68.53846 68.82968

Refer to question 1. Form a prediction interval for the son’s height at the average father’s height.

       fit      lwr      upr
1 68.68407 63.90091 73.46723

Load the mtcars dataset. Fit a linear regression with miles per gallon as the outcome and horsepower as the predictor. Test whether or not the horsepower coefficient is statistically different from zero. Interpret your test.

               Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 30.09886054  1.6339210 18.421246 6.642736e-18
hp          -0.06822828  0.0101193 -6.742389 1.787835e-07

The extremely low p-value for the slope (1.787835 × 10^{-7}) is strong evidence against the null hypothesis. The slope coefficient differs significantly from zero, so horsepower is a statistically significant predictor of miles per gallon: each additional unit of horsepower is associated with an expected decrease of about 0.068 miles per gallon.
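
The test above can be reproduced with a minimal sketch like the following. The object names (fit, tab, t_stat, p_val) are illustrative and not taken from the original solution; the t statistic is recomputed by hand as estimate divided by standard error.

```r
# Fit mpg on hp and recompute the slope test by hand.
data(mtcars)
fit <- lm(mpg ~ hp, data = mtcars)
tab <- coef(summary(fit))   # columns: Estimate, Std. Error, t value, Pr(>|t|)

est <- tab["hp", "Estimate"]
se  <- tab["hp", "Std. Error"]
t_stat <- est / se          # t statistic for H_0: slope = 0
# Two-sided p-value from the t distribution with n - 2 degrees of freedom.
p_val <- 2 * pt(abs(t_stat), df = fit$df.residual, lower.tail = FALSE)

print(tab)
print(c(t = t_stat, p = p_val))
```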

Refer to question 6. Form a confidence interval for the slope coefficient.

                  2.5 %     97.5 %
(Intercept) 26.76194879 33.4357723
hp          -0.08889465 -0.0475619

Refer to question 6. Form a confidence interval for the intercept (center the HP variable first).

                  2.5 %     97.5 %
(Intercept) 18.69599452 21.4852555
chp         -0.08889465 -0.0475619

Refer to question 6. Form a mean value interval for the expected MPG for the average HP.

       fit      lwr      upr
1 20.09062 18.69599 21.48526

Refer to question 6. Form a prediction interval for the expected MPG for the average HP.

       fit      lwr      upr
1 20.09062 12.07908 28.10217
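
As a sketch of how the four intervals for the mtcars model could be obtained, assuming the centered variable is called chp as in the output above (the other names are illustrative):

```r
data(mtcars)
cars <- transform(mtcars, chp = hp - mean(hp))   # centered horsepower

fit <- lm(mpg ~ hp, data = cars)
print(confint(fit))                              # 95% intervals, raw scale

# After centering, the intercept is the expected mpg at the average hp.
print(confint(lm(mpg ~ chp, data = cars)))

new <- data.frame(hp = mean(cars$hp))
mean_int <- predict(fit, newdata = new, interval = "confidence")  # expected value
pred_int <- predict(fit, newdata = new, interval = "prediction")  # a new observation
print(mean_int)
print(pred_int)
```

The mean value interval coincides with the confidence interval for the centered intercept, while the prediction interval is wider because it also accounts for the residual variation of a single new observation.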

Refer to question 6. Create a plot that has the fitted regression line plus curves at the expected value and prediction intervals.
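
One possible way to draw the requested plot, sketched here with base graphics (the original may have used a different plotting package; the grid of 100 horsepower values and the colors are arbitrary choices):

```r
data(mtcars)
fit <- lm(mpg ~ hp, data = mtcars)

# Evaluate both interval types over a grid of horsepower values.
grid <- data.frame(hp = seq(min(mtcars$hp), max(mtcars$hp), length.out = 100))
conf <- predict(fit, newdata = grid, interval = "confidence")
pred <- predict(fit, newdata = grid, interval = "prediction")

plot(mpg ~ hp, data = mtcars, pch = 19, col = "grey40",
     xlab = "Horsepower", ylab = "Miles per gallon")
lines(grid$hp, conf[, "fit"], lwd = 2)                              # fitted line
matlines(grid$hp, conf[, c("lwr", "upr")], lty = 2, col = "blue")   # expected value band
matlines(grid$hp, pred[, c("lwr", "upr")], lty = 3, col = "red")    # prediction band
legend("topright", c("Fit", "Mean interval", "Prediction interval"),
       lty = 1:3, col = c("black", "blue", "red"), bty = "n")
```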

Page 58

Exercises

Load the dataset Seatbelts as part of the datasets package via data(Seatbelts). Use as.data.frame to convert the object to a dataframe. Fit a linear model of driver deaths with kms and PetrolPrice as predictors. Interpret your results.

                 Estimate   Std. Error   t value     Pr(>|t|)
(Intercept)  2.157461e+02 1.466559e+01 14.711047 3.772201e-33
kms         -1.749546e-03 6.145401e-04 -2.846919 4.902428e-03
PetrolPrice -6.437895e+02 1.482896e+02 -4.341435 2.304713e-05

             Estimate Std. Error t value Pr(>|t|)
(Intercept)  215.7461    14.6656 14.7110   0.0000
kms           -0.0017     0.0006 -2.8469   0.0049
PetrolPrice -643.7895   148.2896 -4.3414   0.0000

The intercept is the expected number of drivers killed at zero kilometers driven and a petrol price of 0, which has no practical meaning. To make it interpretable, we center both variables and rescale them: kilometers to thousands of kilometers (mkm) and petrol price to standard deviation units (ppn).

              Estimate Std. Error   t value      Pr(>|t|)
(Intercept) 122.802083  1.6628507 73.850336 2.395106e-141
mkm          -1.749546  0.6145401 -2.846919  4.902428e-03
ppn          -7.838674  1.8055491 -4.341435  2.304713e-05

Predict the number of driver deaths at the average kms and petrol levels.

       1 
122.8021

Hence, we have the number of deaths at the mean of kms and PetrolPrice.
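
A sketch of the centering, rescaling, and prediction steps. The definitions of mkm and ppn below are assumptions inferred from the reported coefficients (kms centered and divided by 1000, PetrolPrice centered and divided by its standard deviation); the original code is not shown.

```r
data(Seatbelts)
stblts <- as.data.frame(Seatbelts)

# Assumed definitions, inferred from the coefficient tables.
stblts$mkm <- (stblts$kms - mean(stblts$kms)) / 1000
stblts$ppn <- (stblts$PetrolPrice - mean(stblts$PetrolPrice)) / sd(stblts$PetrolPrice)

fit <- lm(DriversKilled ~ mkm + ppn, data = stblts)
print(round(coef(summary(fit)), 4))

# Both predictors are centered, so the averages correspond to mkm = 0, ppn = 0.
pred_avg <- predict(fit, newdata = data.frame(mkm = 0, ppn = 0))
print(pred_avg)
```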

Take the residual for DriversKilled having regressed out kms and an intercept and the residual for PetrolPrice having regressed out kms and an intercept. Fit a regression through the origin of the two residuals and show that it is the same as your coefficient obtained in question 1.


Call:
lm(formula = edk ~ epp - 1)

Residuals:
   Min     1Q Median     3Q    Max 
-51.06 -17.77  -4.15  15.67  59.33 

Coefficients:
    Estimate Std. Error t value Pr(>|t|)    
epp   -643.8      147.5  -4.364 2.09e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 22.92 on 191 degrees of freedom
Multiple R-squared:  0.09068,   Adjusted R-squared:  0.08592 
F-statistic: 19.05 on 1 and 191 DF,  p-value: 2.086e-05

     Estimate Std. Error   t value     Pr(>|t|)
epp -643.7895   147.5111 -4.364345 2.085664e-05

                 Estimate   Std. Error   t value     Pr(>|t|)
(Intercept)  2.157461e+02 1.466559e+01 14.711047 3.772201e-33
kms         -1.749546e-03 6.145401e-04 -2.846919 4.902428e-03
pp          -6.437895e+02 1.482896e+02 -4.341435 2.304713e-05

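Hence, the epp estimate matches the pp row of the full model (-643.7895). The standard errors differ slightly because the residual regression has 191 residual degrees of freedom, while the full model has 189.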

Take the residual for DriversKilled having regressed out PetrolPrice and an intercept. Take the residual for kms having regressed out PetrolPrice and an intercept. Fit a regression through the origin of the two residuals and show that it is the same as your coefficient obtained in question 1.

     Estimate Std. Error t value Pr(>|t|)
ekms  -0.0017      6e-04 -2.8619   0.0047

Thus, the coefficient we obtain is the same as the kms coefficient in question 1.
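
The two residual regressions can be sketched as follows. The names edk, epp, and ekms follow the output above; edk2, b_pp, and b_kms are illustrative.

```r
data(Seatbelts)
stblts <- as.data.frame(Seatbelts)
full <- lm(DriversKilled ~ kms + PetrolPrice, data = stblts)

# PetrolPrice coefficient: remove kms (and an intercept) from both variables.
edk <- resid(lm(DriversKilled ~ kms, data = stblts))
epp <- resid(lm(PetrolPrice ~ kms, data = stblts))
b_pp <- coef(lm(edk ~ epp - 1))

# kms coefficient: remove PetrolPrice (and an intercept) from both variables.
edk2 <- resid(lm(DriversKilled ~ PetrolPrice, data = stblts))
ekms <- resid(lm(kms ~ PetrolPrice, data = stblts))
b_kms <- coef(lm(edk2 ~ ekms - 1))

print(rbind(residual_regression = c(b_kms, b_pp),
            full_model = coef(full)[c("kms", "PetrolPrice")]))
```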

Page 72

Exercises

Do exercise 1 of the previous chapter if you have not already. Load the dataset Seatbelts as part of the datasets package via data(Seatbelts). Use as.data.frame to convert the object to a dataframe. Fit a linear model of driver deaths with kms and PetrolPrice as predictors. Interpret your results.

            Estimate Std. Error t value Pr(>|t|)
(Intercept) 122.8021     1.6629 73.8503   0.0000
mkm          -1.7495     0.6145 -2.8469   0.0049
ppn          -7.8387     1.8055 -4.3414   0.0000

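In the raw model, the intercept is the expected number of deaths at 0 kilometers and a petrol price of 0, which is not practical, so the variables are centered. The effect of a single kilometer on deaths is too small to be meaningful, so kms is rescaled to thousands of kilometers. In the centered and rescaled model above, the intercept (122.8) is the expected number of drivers killed at the average kms and petrol price; each additional 1,000 km is associated with about 1.75 fewer deaths, and each standard deviation increase in petrol price with about 7.84 fewer deaths, holding the other predictor constant.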

Repeat question 1 for the outcome being the log of the count of driver deaths. Interpret your coefficients.

               Estimate  Std. Error    t value      Pr(>|t|)
(Intercept)  4.78966306 0.013426810 356.723817 2.737888e-269
mkm         -0.01400794 0.004962149  -2.822959  5.267843e-03
ppn         -0.06412578 0.014579039  -4.398492  1.818005e-05

[1] 0.06211298

[1] 0.01391029

The interpretation of our normalized petrol price variable (ppn) is as follows: We expect a 6% decrease in the geometric mean of driver fatalities for each 1 standard deviation increase in normalized petrol price, while keeping kilometers constant.
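
The two bracketed values above are one minus the exponentiated coefficients. A sketch, with mkm and ppn defined as assumed from the reported coefficients (centered kms in thousands, PetrolPrice in standard deviation units):

```r
data(Seatbelts)
stblts <- as.data.frame(Seatbelts)
stblts$mkm <- (stblts$kms - mean(stblts$kms)) / 1000
stblts$ppn <- (stblts$PetrolPrice - mean(stblts$PetrolPrice)) / sd(stblts$PetrolPrice)

fit_log <- lm(log(DriversKilled) ~ mkm + ppn, data = stblts)

# exp(beta) is the multiplicative change in the geometric mean per unit
# increase in the predictor, so 1 - exp(beta) is the proportional decrease.
pct_drop <- 1 - exp(coef(fit_log)[c("ppn", "mkm")])
print(pct_drop)
```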

Refer to question 1. Add the dummy variable law and interpret the results. Repeat this question with a factor variable that you create called lawFactor that takes the levels No and Yes. Change the reference level from No to Yes.


Call:
lm(formula = DriversKilled ~ mkm + ppn + law, data = stblts)

Residuals:
   Min     1Q Median     3Q    Max 
-50.69 -17.29  -4.05  14.33  60.71 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 124.2263     1.8012  68.967  < 2e-16 ***
mkm          -1.2233     0.6657  -1.838 0.067676 .  
ppn          -6.9199     1.8514  -3.738 0.000246 ***
law         -11.8892     6.0258  -1.973 0.049955 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 22.87 on 188 degrees of freedom
Multiple R-squared:  0.201, Adjusted R-squared:  0.1882 
F-statistic: 15.76 on 3 and 188 DF,  p-value: 3.478e-09

To find the post-law intercept at the average petrol price and average kilometers driven, we subtract 11.89 from the intercept. This implies that, after the law took effect, we expect about 12 fewer deaths per month. In other words, when the law variable changes from 0 to 1, we expect a decrease of about 12 deaths per month, holding petrol price and kilometers driven constant.

                  Estimate Std. Error   t value      Pr(>|t|)
(Intercept)     124.226311  1.8012324 68.967399 1.976641e-135
mkm              -1.223318  0.6656567 -1.837761  6.767594e-02
ppn              -6.919949  1.8513987 -3.737687  2.463128e-04
I(factor(law))1 -11.889202  6.0257850 -1.973055  4.995497e-02

It’s worth noting that we observe identical values to the numeric dummy model: 124.226311 for the intercept, -1.223318 for mkm, -6.919949 for ppn, and -11.889202 for law. The suffix on the factor term shows which level the coefficient refers to: -11.889202 is the effect of factor level 1 relative to the reference level 0.
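
A sketch of the lawFactor part of the exercise, including the change of reference level. The mkm and ppn definitions are assumed as before, and fit_no and fit_yes are illustrative names.

```r
data(Seatbelts)
stblts <- as.data.frame(Seatbelts)
stblts$mkm <- (stblts$kms - mean(stblts$kms)) / 1000
stblts$ppn <- (stblts$PetrolPrice - mean(stblts$PetrolPrice)) / sd(stblts$PetrolPrice)

# Reference level "No": the coefficient is the Yes-versus-No difference.
stblts$lawFactor <- factor(stblts$law, levels = c(0, 1), labels = c("No", "Yes"))
fit_no <- lm(DriversKilled ~ mkm + ppn + lawFactor, data = stblts)

# Reference level "Yes": the sign flips and the intercept becomes the post-law mean.
stblts$lawFactor <- relevel(stblts$lawFactor, ref = "Yes")
fit_yes <- lm(DriversKilled ~ mkm + ppn + lawFactor, data = stblts)

print(coef(fit_no))
print(coef(fit_yes))
```

With Yes as the reference, the intercept is the pre-law intercept plus the law effect, and the factor coefficient has the same magnitude with the opposite sign.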

Discretize the PetrolPrice variable into four factor levels. Fit the linear model with this factor to see how R treats multiple level factor variables.


 1  2  3  4 
 6 96 71 19


Call:
lm(formula = DriversKilled ~ mkm + ppf + law, data = stblts)

Residuals:
    Min      1Q  Median      3Q     Max 
-53.384 -17.211  -3.421  14.849  65.613 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 109.8405     9.5066  11.554   <2e-16 ***
mkm          -1.2991     0.7668  -1.694   0.0919 .  
ppf2         10.8271     9.9462   1.089   0.2778    
ppf3         18.6904     9.9374   1.881   0.0616 .  
ppf4         25.0074    10.9163   2.291   0.0231 *  
law         -15.3445     6.0345  -2.543   0.0118 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 23.24 on 186 degrees of freedom
Multiple R-squared:  0.1833,    Adjusted R-squared:  0.1614 
F-statistic:  8.35 on 5 and 186 DF,  p-value: 3.835e-07

Perform the plot requested at the end of the last chapter.

Two levels model with interaction

Page 80

Exercises

Load the dataset Seatbelts as part of the datasets package via data(Seatbelts). Use as.data.frame to convert the object to a dataframe. Fit a linear model of driver deaths with kms and PetrolPrice as predictors. Interpret your results.

                 Estimate   Std. Error   t value     Pr(>|t|)
(Intercept)  2.157461e+02 1.466559e+01 14.711047 3.772201e-33
kms         -1.749546e-03 6.145401e-04 -2.846919 4.902428e-03
PetrolPrice -6.437895e+02 1.482896e+02 -4.341435 2.304713e-05

              Estimate Std. Error   t value      Pr(>|t|)
(Intercept) 122.802083  1.6628507 73.850336 2.395106e-141
mkm          -1.749546  0.6145401 -2.846919  4.902428e-03
ppn          -7.838674  1.8055491 -4.341435  2.304713e-05

Compare the kms coefficient with and without the inclusion of the PetrolPrice variable in the model.

Correlation between the kms and petrol price

[1] 0.3839004

              Estimate Std. Error  t value      Pr(>|t|)
(Intercept) 122.802083  1.7391997 70.60839 2.665611e-138
mkm          -2.773787  0.5935049 -4.67357  5.596266e-06

              Estimate Std. Error   t value      Pr(>|t|)
(Intercept) 122.802083  1.6628507 73.850336 2.395106e-141
mkm          -1.749546  0.6145401 -2.846919  4.902428e-03
ppn          -7.838674  1.8055491 -4.341435  2.304713e-05

In this scenario, the estimate is negative in both models, which is expected because both predictors act in the same direction and are positively correlated (0.38). However, the estimate has shifted from -2.773787 to -1.749546 once PetrolPrice is included, showing the confounding effect of PetrolPrice on the regression of DriversKilled on kilometers driven.

Compare the PetrolPrice coefficient with and without the inclusion of the kms variable in the model.

              Estimate Std. Error   t value      Pr(>|t|)
(Intercept) 122.802083   1.693656 72.507096 2.061333e-140
ppn          -9.812019   1.698084 -5.778288  3.044208e-08

              Estimate Std. Error   t value      Pr(>|t|)
(Intercept) 122.802083  1.6628507 73.850336 2.395106e-141
ppn          -7.838674  1.8055491 -4.341435  2.304713e-05
mkm          -1.749546  0.6145401 -2.846919  4.902428e-03

In this instance, the estimate remains negative, which is logical given that both predictors act in the same direction and are correlated. However, the estimate has shifted from -9.812019 to -7.838674 once kms is included, showing the confounding effect of kilometers driven on the regression of DriversKilled on PetrolPrice.
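
The comparison in both directions can be sketched as follows, again assuming the mkm and ppn definitions inferred from the tables (the model object names are illustrative):

```r
data(Seatbelts)
stblts <- as.data.frame(Seatbelts)
stblts$mkm <- (stblts$kms - mean(stblts$kms)) / 1000
stblts$ppn <- (stblts$PetrolPrice - mean(stblts$PetrolPrice)) / sd(stblts$PetrolPrice)

fit_km   <- lm(DriversKilled ~ mkm, data = stblts)
fit_pp   <- lm(DriversKilled ~ ppn, data = stblts)
fit_both <- lm(DriversKilled ~ mkm + ppn, data = stblts)

r <- cor(stblts$kms, stblts$PetrolPrice)
print(r)

# Each coefficient alone versus adjusted for the other predictor.
cmp <- rbind(alone = c(mkm = coef(fit_km)[["mkm"]], ppn = coef(fit_pp)[["ppn"]]),
             adjusted = coef(fit_both)[c("mkm", "ppn")])
print(cmp)
```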

Page 91

Exercises

Load the dataset Seatbelts as part of the datasets package via data(Seatbelts). Use as.data.frame to convert the object to a dataframe. Fit a linear model of driver deaths with kms, PetrolPrice and law as predictors.

Refer to question 1. Directly estimate the residual variation via the function resid. Compare with R’s residual variance estimate.

[1] 522.8903

[1] 522.8903
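
A sketch of the two calculations: the residual variance estimate is the residual sum of squares divided by n minus the number of fitted coefficients (by_hand and from_r are illustrative names).

```r
data(Seatbelts)
stblts <- as.data.frame(Seatbelts)
fit <- lm(DriversKilled ~ kms + PetrolPrice + law, data = stblts)

n <- nrow(stblts)
p <- length(coef(fit))                   # intercept plus three slopes
by_hand <- sum(resid(fit)^2) / (n - p)   # direct estimate from the residuals
from_r  <- summary(fit)$sigma^2          # R's residual variance estimate
print(c(by_hand = by_hand, from_r = from_r))
```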

Refer to question 1. Perform an analysis of diagnostic measures including dffits, dfbetas, influence and hat diagonals.

Evaluating the influence:

Using ggplot, we build a data frame of the dffits values, flag those whose absolute value exceeds a threshold (here 0.4), and plot them, labelling the flagged observations by id (specialPoint). The first table below shows the first rows of the dfbetas, and the summary that follows describes the hat diagonals.

  (Intercept)        mkm         ppn         law
1  -0.1022614 0.11881682 -0.09349927 -0.26195987
2  -0.1331656 0.22275746 -0.15807657 -0.56786007
3  -0.1279422 0.11452337 -0.07920934 -0.23464337
4  -0.2032275 0.13006726 -0.06413772 -0.23584650
5  -0.0524778 0.02353784 -0.01104844 -0.02756976
6  -0.1188004 0.03961875 -0.01108847 -0.02386593

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.00600 0.01175 0.01700 0.02082 0.02600 0.06700
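
A sketch of the diagnostic calculations using base R functions and base graphics rather than ggplot. The mkm and ppn definitions are assumed as before, and the 0.4 threshold is the one quoted above.

```r
data(Seatbelts)
stblts <- as.data.frame(Seatbelts)
stblts$mkm <- (stblts$kms - mean(stblts$kms)) / 1000
stblts$ppn <- (stblts$PetrolPrice - mean(stblts$PetrolPrice)) / sd(stblts$PetrolPrice)
fit <- lm(DriversKilled ~ mkm + ppn + law, data = stblts)

dff <- dffits(fit)      # change in each fitted value when its point is dropped
dfb <- dfbetas(fit)     # change in each coefficient, one column per term
hat <- hatvalues(fit)   # hat diagonals (leverage)

threshold <- 0.4
flagged <- which(abs(dff) > threshold)

plot(dff, type = "h", xlab = "Observation", ylab = "dffits")
abline(h = c(-threshold, threshold), lty = 2)
if (length(flagged) > 0) text(flagged, dff[flagged], labels = flagged, pos = 3)

print(head(dfb))
print(summary(hat))
```

The hat diagonals always sum to the number of fitted coefficients, so their mean here is 4/192, which agrees with the mean in the summary above.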

Page 100

Exercise

Load the dataset Seatbelts as part of the datasets package via data(Seatbelts). Use as.data.frame to convert the object to a dataframe. Fit a linear model of driver deaths with kms, PetrolPrice and law as predictors.

                 Estimate   Std. Error   t value     Pr(>|t|)
(Intercept)  2.157461e+02 1.466559e+01 14.711047 3.772201e-33
kms         -1.749546e-03 6.145401e-04 -2.846919 4.902428e-03
PetrolPrice -6.437895e+02 1.482896e+02 -4.341435 2.304713e-05

Perform a model selection exercise to arrive at a final model.

Analysis of Variance Table

Model 1: DriversKilled ~ law
Model 2: DriversKilled ~ law + mkm
Model 3: DriversKilled ~ law + ppn
Model 4: DriversKilled ~ law + mkm + ppn
  Res.Df    RSS Df Sum of Sq      F   Pr(>F)   
1    190 109754                                
2    189 105608  1    4145.3 7.9276 0.005388 **
3    189 100069  0    5538.9                   
4    188  98303  1    1766.0 3.3774 0.067676 . 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Analysis of Variance Table

Model 1: DriversKilled ~ law
Model 2: DriversKilled ~ law + mkm + ppn
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
1    190 109754                                  
2    188  98303  2     11450 10.949 3.177e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error   t value     Pr(>|t|)
[1,] -25.60895   5.341655 -4.794198 3.288375e-06
[2,] -17.55372   6.028888 -2.911602 4.028394e-03
[3,] -16.32618   5.555579 -2.938700 3.706585e-03
[4,] -11.88920   6.025785 -1.973055 4.995497e-02
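
The nested comparison and the table of law coefficients could be produced along these lines. The row labels are added here for readability and the object names are illustrative; mkm and ppn are defined as assumed earlier. In the first table above, models 2 and 3 are not nested in each other, which is why that row has Df 0 and no F test.

```r
data(Seatbelts)
stblts <- as.data.frame(Seatbelts)
stblts$mkm <- (stblts$kms - mean(stblts$kms)) / 1000
stblts$ppn <- (stblts$PetrolPrice - mean(stblts$PetrolPrice)) / sd(stblts$PetrolPrice)

fits <- list(
  "law"             = lm(DriversKilled ~ law, data = stblts),
  "law + mkm"       = lm(DriversKilled ~ law + mkm, data = stblts),
  "law + ppn"       = lm(DriversKilled ~ law + ppn, data = stblts),
  "law + mkm + ppn" = lm(DriversKilled ~ law + mkm + ppn, data = stblts)
)

# F test of the smallest model against the full model (a nested pair).
cmp <- anova(fits[["law"]], fits[["law + mkm + ppn"]])
print(cmp)

# How the law coefficient changes as the other predictors enter.
law_tab <- t(sapply(fits, function(f) coef(summary(f))["law", ]))
print(law_tab)
```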

Page 104

Exercises

True or false, generalized linear models transform the observed outcome. (Discuss.)

If we transform the observed outcome, \(log(Y)=β_0+β_1X+ϵ\), all we are doing is fitting a linear model to transformed data (transform the data, then fit the model). In a GLM, \(log(E[Y])=β_0+β_1X\), the link function is applied to the mean of the outcome, a parameter, and not to the observed outcome itself. Hence, the statement is false.

True or false, the interpretation of the coefficients in a GLM are on the scale of the link function. (Discuss.)

All coefficients are interpreted on the link function scale. To get back to the natural scale, you have to invert the link function (in the case of Poisson, exponentiate). With \(log(E[Y|X=x])=β_0+β_1X\), the estimate of \(β_1\) in our coefficient table is interpreted on the scale of the log of the expected value of the outcome, that is, on the link function scale. Hence, true.

True or false, the generalized linear model assumes an exponential family for the outcome. (Discuss.)

True. GLMs start by assuming a distribution for y (normal for linear regression, Bernoulli or binomial, Poisson, gamma). We then connect a parameter of that distribution to the Xs through a link function. This lets us keep modelling y on its natural scale while respecting its constraints (Bernoulli outcomes are 0 or 1; Poisson outcomes are non-negative counts). The key parameter is the mean \((μ)\), and its relationship to the predictors is mediated by a link function \((g)\), for instance: \(g(μ)=β_0+β_1X_1+β_2X_2\). This indirect connection allows GLMs to handle a wide range of distributions and relationships between predictors and the response variable. For example, in Poisson regression the natural log is often used as the link function: \(log(μ)=β_0+β_1X_1+β_2X_2\)

True or false, GLM estimates are obtained by maximizing the likelihood. (Discuss.)

True. Suppose we have a Poisson regression, \(log(E[Y_i])=log(μ_i)=β_0+β_1X_i\). This tells us that \(μ_i=e^{β_0+β_1X_i}\). Assuming the \(Y_i\) are independent Poisson with mean \(μ_i\), each has probability mass function \(μ_i^{y_i}e^{−μ_i}/y_i!\), so, up to a factor that does not involve the parameters, the likelihood takes the form \(L(β_0,β_1|Y)=∏μ_i^{y_i}e^{−μ_i}\). The likelihood depends on the parameters \(β_0\) and \(β_1\) through \(μ_i\), given the observed data Y, and the GLM estimates are the values that maximize it.

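
As an illustration of the likelihood argument, the sketch below maximizes the Poisson log-likelihood numerically with optim and compares the result with glm. It uses the Seatbelts data with law as the only predictor; the function names negloglik and grad and the starting values are choices made for this sketch, and glm itself uses Fisher scoring rather than BFGS.

```r
data(Seatbelts)
stblts <- as.data.frame(Seatbelts)
y <- stblts$DriversKilled
X <- cbind(1, stblts$law)              # intercept and law indicator

# Negative Poisson log-likelihood, dropping the constant sum(log(y!)).
negloglik <- function(beta) {
  eta <- drop(X %*% beta)              # log(mu_i)
  -sum(y * eta - exp(eta))
}
# Its gradient: -X'(y - mu).
grad <- function(beta) {
  mu <- exp(drop(X %*% beta))
  -drop(crossprod(X, y - mu))
}

start <- c(log(mean(y)), 0)
ml <- optim(start, negloglik, gr = grad, method = "BFGS",
            control = list(reltol = 1e-10))
glm_fit <- glm(DriversKilled ~ law, family = poisson, data = stblts)
print(rbind(optim = ml$par, glm = unname(coef(glm_fit))))
```
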
True or false, some GLM distributions impose restrictions on the relationship between the mean and the variance. (Discuss.)

True. There is often an implied relationship between the mean and the variance. In the Poisson distribution, the mean and the variance are the same.

Poisson Model

\(log(E[Y_i])=log(μ_i)=β_0+β_1X_i\)

Assume that,

\(Y_i∼Poisson(μ_i)⟹E[Y_i]=μ_i\)

However, for the Poisson distribution

\(var(Y_i)=E[Y_i]=μ_i\)

Thus, for the Poisson, Bernoulli distribution and many other instances of generalized linear models (GLM) there is an implied relationship between the mean and the variance.
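
The mean and variance relationships can be checked numerically. In this sketch the values mu = 3.5 and p = 0.3 are arbitrary illustrative choices, and the Poisson sums are truncated at 200, where the remaining probability mass is negligible.

```r
mu <- 3.5                       # hypothetical Poisson mean
x <- 0:200                      # truncated support
px <- dpois(x, mu)
pois_mean <- sum(x * px)
pois_var  <- sum((x - pois_mean)^2 * px)
print(c(mean = pois_mean, var = pois_var))      # both equal mu

p <- 0.3                        # hypothetical Bernoulli success probability
bern_mean <- 0 * (1 - p) + 1 * p
bern_var  <- (0 - p)^2 * (1 - p) + (1 - p)^2 * p
print(c(mean = bern_mean, var = bern_var))      # variance equals p * (1 - p)
```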

Page 115

Exercises

Load the dataset Seatbelts as part of the datasets package via data(Seatbelts). Use as.data.frame to convert the object to a dataframe. Create a new outcome variable for whether or not greater than 119 drivers were killed that month. Fit a logistic regression GLM with this variable as the outcome and kms, PetrolPrice and law as predictors. Interpret your parameters.


FALSE  TRUE 
   98    94


 0  1 
98 94

                Estimate Std. Error     z value   Pr(>|z|)
(Intercept)  0.024313512 0.16077499  0.15122695 0.87979669
ppn         -0.416407701 0.16973435 -2.45329074 0.01415559
mkm         -0.002938343 0.05984816 -0.04909663 0.96084229
law         -0.615522450 0.57780755 -1.06527242 0.28675267

            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.0243     0.1608  0.1512   0.8798
ppn          -0.4164     0.1697 -2.4533   0.0142
mkm          -0.0029     0.0598 -0.0491   0.9608
law          -0.6155     0.5778 -1.0653   0.2868

After the law came into effect, the log-odds of having more than 119 drivers killed in a month decreased by 0.6155, holding the other predictors constant. Similarly, the mkm coefficient of -0.0029 on the logit scale implies that, for every thousand-kilometer increase in driving distance, we estimate a reduction of 0.0029 in the log-odds of having more than 119 drivers killed in that month. Only the petrol price coefficient is significant at the 0.05 level.

[1] 0.5403585

[1] 0.4596415

The odds ratio comparing after the law was enacted to before is 0.54, indicating a 46% decrease in the odds of having more than 119 drivers killed in a month after the law, while holding other variables constant.

[1] 0.997066

[1] 0.00293403

This implies a roughly 0.3% reduction in the odds of more than 119 drivers being killed in a month for each additional thousand kilometers driven in that month.
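
A sketch of the logistic fit and of the odds ratios quoted above. The outcome name dkb follows the anova output later in this section; the mkm and ppn definitions are assumptions inferred from the coefficient tables.

```r
data(Seatbelts)
stblts <- as.data.frame(Seatbelts)
stblts$mkm <- (stblts$kms - mean(stblts$kms)) / 1000
stblts$ppn <- (stblts$PetrolPrice - mean(stblts$PetrolPrice)) / sd(stblts$PetrolPrice)
stblts$dkb <- 1 * (stblts$DriversKilled > 119)   # 1 if more than 119 killed

fit_logit <- glm(dkb ~ ppn + mkm + law, family = binomial, data = stblts)

# Exponentiating moves from the log-odds scale to odds ratios.
odds_ratio <- exp(coef(fit_logit))
print(odds_ratio[c("law", "mkm")])
print(1 - odds_ratio[c("law", "mkm")])           # proportional decrease in odds
```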

Fit a binomial model with DriversKilled as the outcome and drivers as the total count with kms, PetrolPrice and law as predictors, interpret your results.


Call:
glm(formula = cbind(DriversKilled, drivers - DriversKilled) ~ 
    ppn + mkm + law, family = binomial, data = stblts)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.4371  -0.7270  -0.0235   0.7111   3.0313  

Coefficients:
             Estimate Std. Error  z value Pr(>|z|)    
(Intercept) -2.536637   0.007399 -342.829   <2e-16 ***
ppn         -0.007829   0.007479   -1.047    0.295    
mkm          0.003645   0.002733    1.334    0.182    
law          0.030785   0.026527    1.161    0.246    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 234.93  on 191  degrees of freedom
Residual deviance: 229.93  on 188  degrees of freedom
AIC: 1496

Number of Fisher Scoring iterations: 3

Refer to Question 1. Use the anova function to compare models with just law, law and PetrolPrice and all three predictors.

Analysis of Deviance Table

Model 1: dkb ~ law
Model 2: dkb ~ law + PetrolPrice
Model 3: dkb ~ law + PetrolPrice + kms
  Resid. Df Resid. Dev Df Deviance
1       190     260.40            
2       189     253.62  1   6.7760
3       188     253.62  1   0.0024

               Estimate Std. Error    z value   Pr(>|z|)
(Intercept)  0.08288766  0.1539783  0.5383074 0.59036483
law         -1.12434153  0.4991985 -2.2522935 0.02430373

               Estimate Std. Error   z value    Pr(>|z|)
(Intercept)   3.5875552  1.3851773  2.589961 0.009598679
law          -0.6260282  0.5367204 -1.166396 0.243454555
PetrolPrice -34.3737200 13.4887754 -2.548320 0.010824304

                 Estimate   Std. Error     z value   Pr(>|z|)
(Intercept)  3.612261e+00 1.473634e+00  2.45126028 0.01423570
law         -6.155225e-01 5.778076e-01 -1.06527242 0.28675267
PetrolPrice -3.419952e+01 1.394026e+01 -2.45329074 0.01415559
kms         -2.938343e-06 5.984816e-05 -0.04909663 0.96084229

With law as the only predictor, its coefficient is -1.124 on the log-odds scale and is significant (p = 0.024). Once petrol price is included, the law coefficient shrinks to -0.626 and loses significance: its p-value rises tenfold to 0.243, while petrol price itself is significant (p = 0.011). Adding kilometers changes little: kms is not significant (p = 0.961), the law coefficient barely moves (-0.616), and the deviance drops by only 0.0024. For model selection, the second model, with law and petrol price but without kilometers, might be preferred. The exercise shows how the choice of variables changes the significance and interpretation of the coefficients when modelling the odds of a high-fatality month.

Page 124

Exercises

Load the dataset Seatbelts as part of the datasets package via data(Seatbelts). Use as.data.frame to convert the object to a dataframe. Fit a Poisson regression GLM with UKDriversKilled as the outcome and kms, PetrolPrice and law as predictors. Interpret your results.

                Estimate  Std. Error    z value     Pr(>|z|)
(Intercept)  4.819845137 0.007127388 676.242884 0.000000e+00
ppn         -0.055361338 0.007243262  -7.643150 2.119715e-14
mkm         -0.009980975 0.002614002  -3.818274 1.343887e-04
law         -0.114877106 0.025557951  -4.494770 6.964526e-06

(Intercept)         ppn         mkm         law 
     4.8198     -0.0554     -0.0100     -0.1149

[1] 0.8914757

[1] 0.1085243

The law decreased the expected number of drivers killed by 11%.

[1] 0.9900687

[1] 0.00993133

Each additional thousand kilometers driven is associated with an approximate 1% reduction in the expected number of drivers killed. The intercept is the expected number of drivers killed on the log scale at the average kms and petrol price before the law; exponentiating it gives the expected count.

[1] 123.9459
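
The Poisson fit and the exponentiated quantities above can be sketched as follows, with mkm and ppn defined as assumed from the coefficient tables (rate_ratio is an illustrative name):

```r
data(Seatbelts)
stblts <- as.data.frame(Seatbelts)
stblts$mkm <- (stblts$kms - mean(stblts$kms)) / 1000
stblts$ppn <- (stblts$PetrolPrice - mean(stblts$PetrolPrice)) / sd(stblts$PetrolPrice)

fit_pois <- glm(DriversKilled ~ ppn + mkm + law, family = poisson, data = stblts)

# Coefficients are on the log scale; exponentiate to get multiplicative effects.
rate_ratio <- exp(coef(fit_pois))
print(1 - rate_ratio[c("law", "mkm")])   # proportional decrease in expected deaths
print(rate_ratio[["(Intercept)"]])       # expected deaths at the averages, pre-law
```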

Refer to question 1. Fit a linear model with the log of drivers killed as the outcome. Interpret your results.


Call:
glm(formula = DriversKilled ~ ppn + mkm + law, family = poisson, 
    data = stblts)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-4.7909  -1.6247  -0.3526   1.2900   4.8720  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  4.819845   0.007127 676.243  < 2e-16 ***
ppn         -0.055361   0.007243  -7.643 2.12e-14 ***
mkm         -0.009981   0.002614  -3.818 0.000134 ***
law         -0.114877   0.025558  -4.495 6.96e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 984.50  on 191  degrees of freedom
Residual deviance: 778.32  on 188  degrees of freedom
AIC: 2059.1

Number of Fisher Scoring iterations: 4

                Estimate  Std. Error    z value     Pr(>|z|)
(Intercept)  4.819845137 0.007127388 676.242884 0.000000e+00
ppn         -0.055361338 0.007243262  -7.643150 2.119715e-14
mkm         -0.009980975 0.002614002  -3.818274 1.343887e-04
law         -0.114877106 0.025557951  -4.494770 6.964526e-06

                Estimate  Std. Error    t value      Pr(>|t|)
(Intercept)  4.805409776 0.014411832 333.435035 1.381656e-262
ppn         -0.053968063 0.014813218  -3.643237  3.481611e-04
mkm         -0.008189793 0.005325983  -1.537705  1.258020e-01
law         -0.131450856 0.048212881  -2.726468  7.007200e-03

[1] 0.1231776

Hence, there is a 12% decrease in the geometric mean number of driver deaths after enactment of the law compared with before, holding petrol price and kilometers constant.

Refer to question 1. Fit your Poisson log-linear model with drivers as a log offset (to consider the proportion of drivers killed of those killed or seriously injured.)

                Estimate  Std. Error     z value  Pr(>|z|)
(Intercept) -2.612798146 0.007122545 -366.834931 0.0000000
mkm          0.003377675 0.002630717    1.283937 0.1991640
ppn         -0.007255064 0.007199577   -1.007707 0.3135952
law          0.028484328 0.025512651    1.116479 0.2642173

                Estimate  Std. Error    z value     Pr(>|z|)
(Intercept)  4.819845137 0.007127388 676.242884 0.000000e+00
mkm         -0.009980975 0.002614002  -3.818274 1.343887e-04
ppn         -0.055361338 0.007243262  -7.643150 2.119715e-14
law         -0.114877106 0.025557951  -4.494770 6.964526e-06
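
A sketch of the offset model: including log(drivers) as an offset turns the model for the count into a model for the proportion of drivers killed among those killed or seriously injured. The mkm and ppn definitions are assumed as before, and fit_off is an illustrative name.

```r
data(Seatbelts)
stblts <- as.data.frame(Seatbelts)
stblts$mkm <- (stblts$kms - mean(stblts$kms)) / 1000
stblts$ppn <- (stblts$PetrolPrice - mean(stblts$PetrolPrice)) / sd(stblts$PetrolPrice)

# log(E[DriversKilled]) = log(drivers) + b0 + b1 mkm + b2 ppn + b3 law
fit_off <- glm(DriversKilled ~ mkm + ppn + law, family = poisson,
               offset = log(drivers), data = stblts)
print(coef(summary(fit_off)))

# exp(intercept) is the expected proportion killed at the averages, pre-law.
print(exp(coef(fit_off)[["(Intercept)"]]))
```

Once the total number of drivers killed or seriously injured is accounted for, none of the three predictors is significant at the 0.05 level, which matches the binomial model fitted earlier.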

Refer to Question 1. Use the anova function to compare models with just law, law and PetrolPrice and all three predictors.

Analysis of Deviance Table

Model 1: DriversKilled ~ law
Model 2: DriversKilled ~ law + PetrolPrice
Model 3: DriversKilled ~ law + PetrolPrice + kms
  Resid. Df Resid. Dev Df Deviance
1       190     870.06            
2       189     792.88  1   77.178
3       188     778.32  1   14.561

              Estimate  Std. Error   z value     Pr(>|z|)
(Intercept)  4.8352482 0.006856395 705.21727 0.000000e+00
law         -0.2274727 0.021923993 -10.37552 3.204779e-25

              Estimate Std. Error   z value     Pr(>|z|)
(Intercept)  5.3499077 0.05886323 90.887084 0.000000e+00
law         -0.1516301 0.02364153 -6.413720 1.420110e-10
PetrolPrice -5.0697421 0.57792672 -8.772292 1.750654e-18

                 Estimate   Std. Error   z value     Pr(>|z|)
(Intercept)  5.440656e+00 6.360114e-02 85.543371 0.000000e+00
law         -1.148771e-01 2.555795e-02 -4.494770 6.964526e-06
PetrolPrice -4.546821e+00 5.948884e-01 -7.643150 2.119715e-14
kms         -9.980975e-06 2.614002e-06 -3.818274 1.343887e-04

DEVINE FATHY MAE S. GRINO

2023-12-05
